Group Dynamic Analysis and Multi-Modal Ontologies

Task Objectives

Lymba has developed a state-of-the-art Natural Language Processing system named K-Platform for the extraction of semantically-rich knowledge from text. In the following tasks, Lymba proposes two areas of innovative research to enhance automatic comprehension of information: Group Dynamic Analysis and Multi-Modal Ontologies.
 
Task 1: Analysis of Social Group Dynamics in Textual Communications
In this task, Lymba proposes to adapt its K-Platform NLP system to research methods of capturing and modeling attributes of social groups and their members through textual communications (email, blogs, social network posts, etc.). Organizational membership, structure, and developmental stage can be extracted from communications and used to issue reports on activities or on emerging social groups. Groups tend to use language in a unique, mutually accepted way. Within a group, members will frequently discuss topics related to the group’s common interest; these topics can be aggregated to represent group beliefs.
Within a group, each member plays a certain role. In the process of identifying the group’s goals, Lymba will make use of the set of goals identified for the group’s members and of the group’s structure, as defined by the roles of and interactions between its members. We will take into account the dependencies between the individual goals of the members and weigh their objectives accordingly: aims of low-level members help achieve the goals of higher-level members, and objectives of group leaders outweigh the goals of non-leaders. We also note that, in homogeneous groups, all members try to attain the group’s goal; thus, an intention predominant among many members of the group may be classified as the group’s purpose.
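The weighting scheme above can be sketched as a simple aggregation over member goals. The role labels, weights, and goal strings below are illustrative assumptions, not outputs of any specific K-Platform module:

```python
from collections import Counter

# Hypothetical sketch: infer a group's purpose by aggregating member goals,
# weighting each goal by the member's rank in the dominance structure.
# Role names and weights are assumptions chosen for illustration.
ROLE_WEIGHT = {"leader": 3.0, "lieutenant": 2.0, "member": 1.0}

def infer_group_purpose(members):
    """members: list of (role, [goal, ...]) tuples."""
    scores = Counter()
    for role, goals in members:
        weight = ROLE_WEIGHT.get(role, 1.0)
        for goal in goals:
            scores[goal] += weight
    # The predominant (highest-weighted) intention is taken as the purpose.
    return scores.most_common(1)[0][0]

group = [
    ("leader", ["expand territory"]),
    ("member", ["expand territory", "recruit"]),
    ("member", ["recruit"]),
]
print(infer_group_purpose(group))  # "expand territory" (4.0 vs. 2.0)
```

Because the leader's objective outweighs those of non-leaders, the leader's goal prevails even when more individual members mention another goal.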
Lymba proposes to exploit the attributes, roles, and relations of individuals extracted by K-Platform processing and project them onto the group level. Specifically, we propose to build a system that recognizes groups, their stage in the group lifecycle, and their classification. We further propose to develop modules dedicated to recognizing emerging and dissolving groups. Lymba has developed a relationship detection system that identifies links between persons and groups, such as: cohabitating with, is friends with, kinship, attends school with, sold to, escorted, works with, and others. Lymba proposes to extend this work and develop algorithms to form a network of people, organizations, and groups in order to detect interpersonal relationships, associations between groups and group members, and the strength of interactions.
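One possible realization of such a network is a weighted graph whose edge weights accumulate interaction strength as relation instances are extracted. This is a minimal sketch; the entity names and relation tuples are invented examples, and the relation labels are drawn from the list above:

```python
from collections import defaultdict

# Illustrative entity network: nodes are persons, organizations, or groups;
# edge weights accumulate interaction strength as relation instances
# (e.g. "works with", "sold to") are extracted from text.
class EntityNetwork:
    def __init__(self):
        self.edges = defaultdict(float)    # (a, b) -> interaction strength
        self.relations = defaultdict(set)  # (a, b) -> relation labels seen

    def add_relation(self, a, b, label, strength=1.0):
        key = tuple(sorted((a, b)))        # undirected edge
        self.edges[key] += strength
        self.relations[key].add(label)

    def strength(self, a, b):
        return self.edges[tuple(sorted((a, b)))]

net = EntityNetwork()
net.add_relation("Alice", "Bob", "works with")
net.add_relation("Alice", "Bob", "attends school with")
net.add_relation("Bob", "Acme Corp", "sold to")
print(net.strength("Alice", "Bob"))  # 2.0
```

Repeated extractions between the same pair raise the edge weight, giving a direct measure of interaction strength.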
The recognition of emerging and dissolving groups requires special attention. Since exponentially many potential groups could form from any given set of individuals, Lymba will reduce the solution space by identifying a set of leading indicators of group formation, in particular the development of common goals, leadership, and participant interaction.
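The leading-indicator idea can be illustrated as a scoring function over candidate groups proposed by upstream analysis, rather than an enumeration of all subsets. The indicator weights and the flagging threshold below are assumptions for illustration only:

```python
# Hedged sketch: score a candidate group on the three leading indicators
# named above, each normalized to [0, 1]. Weights and threshold are
# illustrative assumptions, not tuned values.
INDICATOR_WEIGHTS = {"common_goals": 0.5, "leadership": 0.3, "interaction": 0.2}
EMERGENCE_THRESHOLD = 0.6

def formation_score(indicators):
    """indicators: dict mapping indicator name -> value in [0, 1]."""
    return sum(INDICATOR_WEIGHTS[name] * indicators.get(name, 0.0)
               for name in INDICATOR_WEIGHTS)

candidate = {"common_goals": 0.8, "leadership": 0.5, "interaction": 0.9}
score = formation_score(candidate)           # 0.4 + 0.15 + 0.18 = 0.73
print(score >= EMERGENCE_THRESHOLD)          # True: flagged as emerging
```

Only candidates exceeding the threshold would be passed on for full group analysis, keeping the solution space tractable.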
Groups differ not only in their developmental stage, but also in the composition of their defining dimensions. In this task, Lymba will use data containing attributes, roles, interpersonal relations, intentions, and beliefs. We will develop group profiles in terms of the makeup of group dimensions, including: (1) peer groups (high in common participant attributes, low in dominance structure); (2) cliques (high in common participant attributes, unstable in dominance structure); (3) gangs (focused on participant location, high in dominance structure); (4) mobs (short lifecycle, high in common goals); (5) squads (high in common goals, high in dominance structure); and (6) communities (overlapping commonalities, moderate dominance structure).
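A minimal sketch of profile matching, assuming each group type is encoded as qualitative levels on a few dimensions and an observed group is assigned the best-matching profile. Only three of the six profiles are encoded here, and the dimension encodings are simplified assumptions:

```python
# Illustrative group profiles over qualitative dimension levels.
# A real system would encode all six profiles and richer dimensions.
PROFILES = {
    "peer group": {"common_attributes": "high", "dominance": "low"},
    "squad":      {"common_goals": "high", "dominance": "high"},
    "mob":        {"lifecycle": "short", "common_goals": "high"},
}

def classify_group(observed):
    """Assign the profile whose dimension levels best match the observation."""
    def matches(profile):
        return sum(observed.get(dim) == level for dim, level in profile.items())
    return max(PROFILES, key=lambda name: matches(PROFILES[name]))

obs = {"common_goals": "high", "dominance": "high", "lifecycle": "long"}
print(classify_group(obs))  # "squad" (matches 2 of 2 squad dimensions)
```

Ties and partial matches would need a more careful treatment in practice; this sketch only conveys the dimension-profile idea.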
Task 2: Multi-modal ontologies for multimedia information analysis
Human communication is multimodal, expressed through visual, audio, and text-based channels. These channels can be used independently or jointly to express information that is often similar or identical, but subject to the attributes of the presentation method. Much research has been done on information analysis of each of these media types individually. Since contextual information is sometimes available only from a combined understanding of the various media, some research has attempted to fuse the individual analyses in an attempt to capture this combined knowledge. This approach, however, misses the contextualized information that is hidden behind an integrated analysis of the modalities. Only by joining the analysis of the modalities can this information be made available to client analysis applications and users. Consider a caption describing an image in a newspaper: information may exist that is only available through understanding the connection of the caption to the image components.
A simple example would be a caption describing a photo of a soccer player about to score a goal, with his back to the photographer, as seen in Figure 1. The caption states the player’s name, which is also written on the jersey, thereby enhancing the opportunity for rich information extraction. Information such as the color of the player’s jersey, by contrast, cannot be extracted from the caption alone; it becomes available only when the analysis of the image is joined with that of the text.
Additionally, identical or similar concepts and entities can be expressed across modalities. The nature of the modality affects how the concept or entity is presented, but the information present may be the same. The research challenge is how to join this multimodal information into a coherent and accessible framework for innovative analysis. Lymba proposes the following tasks to meet this challenge and bridge the semantic gap present between textual analysis and visual analysis.
Lymba proposes to exploit the hierarchical organization of concepts and the multi-modal manner of expressing information about concepts to develop ontologies that capture features spanning modalities. In a textual hierarchy of words such as WordNet, or those generated by Lymba’s automatic ontology generation system Jaguar, concepts have features which are generalized up the hierarchy. For example, in WordNet, the concept train has subclasses freight train, passenger train, streamliner, and others. Freight trains and streamliners are both trains but have different visual cues and different purposes (see Figure 2). However, they share some common traits of the train object, and a model for detecting train combines features used to detect the different types of trains. Thus, given annotated images for lower-level concepts in the textual ontology, the ontology-driven approach generates models for those concepts and also induces a model for higher-level concepts in the ontology. This enables an automatic image annotation tool to classify a subway train as a train even if it has not seen specific examples of subway trains in its training data. Lymba has already demonstrated that the semantics between annotation words, as derived from a text ontology, can be used to generate hierarchical models for image annotation, and has published papers on this subject.
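The induction step above can be sketched with sets of features standing in for real visual models. The ontology fragment follows the train example; the feature strings are invented placeholders, not actual visual descriptors:

```python
# Sketch of ontology-driven model induction: the model for a parent
# concept is induced from its children's models, so an unseen subtype
# (e.g. subway train) can still be labeled with the parent concept.
ONTOLOGY = {"train": ["freight train", "passenger train", "streamliner"]}

# Placeholder feature sets standing in for trained visual models.
CHILD_FEATURES = {
    "freight train":   {"boxcars", "rails", "locomotive"},
    "passenger train": {"windows", "rails", "locomotive"},
    "streamliner":     {"streamlined nose", "rails", "locomotive"},
}

def induce_parent_model(parent):
    """Union of child features approximates the higher-level model."""
    model = set()
    for child in ONTOLOGY[parent]:
        model |= CHILD_FEATURES[child]
    return model

def annotate(features, parent="train"):
    # An unseen subtype still overlaps the induced parent model.
    return parent if features & induce_parent_model(parent) else None

subway = {"rails", "locomotive", "tunnel"}
print(annotate(subway))  # "train", despite no subway-train training data
```

A real system would combine statistical models rather than feature sets, but the hierarchy-driven generalization works the same way.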
Lymba proposes extending textual hierarchies with image and video features, associating these visual features not only with the annotated words used in object recognition but also with their hypernyms in the ontology. Combining language models and image features will enable multi-modal surface forms and glosses to be associated with concepts, so that synsets and glosses may span modalities. Surface forms can be textual representations, visual terms, or visual feature blobs that express the extracted features.
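One way such a multi-modal synset could be structured is sketched below. The field names and the string representation of visual features are assumptions made for illustration; the hypernym link shows how visual evidence propagates up the hierarchy:

```python
from dataclasses import dataclass, field

# Hypothetical multi-modal synset: a concept carries textual surface
# forms, a gloss, visual feature descriptors ("blobs"), and a hypernym
# link so visual features are inherited up the ontology.
@dataclass
class MultiModalSynset:
    name: str
    gloss: str = ""
    surface_forms: list = field(default_factory=list)    # textual forms
    visual_features: list = field(default_factory=list)  # e.g. feature blobs
    hypernym: "MultiModalSynset" = None

    def inherited_visual_features(self):
        """Own features plus those generalized from ancestor concepts."""
        features = list(self.visual_features)
        if self.hypernym is not None:
            features += self.hypernym.inherited_visual_features()
        return features

train = MultiModalSynset("train", gloss="public transport on rails",
                         surface_forms=["train"],
                         visual_features=["elongated-body"])
freight = MultiModalSynset("freight train", surface_forms=["freight train"],
                           visual_features=["boxcar-texture"],
                           hypernym=train)
print(freight.inherited_visual_features())  # ['boxcar-texture', 'elongated-body']
```

Because the hypernym chain is walked at lookup time, features attached to train automatically apply to every subtype without duplication.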
These multi-modal ontologies therefore provide a mesh of concepts with methods for identifying them in textual and multimedia data, together with a collection of sample multimedia data that captures the concepts and the contexts in which they appear. Such a rich representation of multimedia information can be used in building specialized analysis systems and tools such as recognizers and query systems. A number of methods for visual feature extraction and object detection have been described in the literature; the effort proposed here is to demonstrate the integration of analysis across modalities.