A significant portion of news and social media coverage contains opinions with clear economic relevance – product and travel reviews, for example, or the articles of well-known and respected bloggers who influence consumer purchase decisions. Analyzing and acting upon user-generated content is becoming imperative for decision makers who aim to engage with large user communities.
The ever-increasing amount of user-generated content and the limits of human cognition call for automated approaches to analyzing the sentiment expressed online. As part of opinion mining, sentiment analysis identifies and aggregates polar opinions – i.e., positive or negative statements about facts. The results can be provided as data services, or in the form of visual tools that color-code the results to indicate the automated classification. Colors in the following examples range from red (negative) to grey (neutral) and green (positive). They vary in saturation, depending on the degree of polarity. Vivid colors hint at emotionally charged issues, less saturated ones at more neutral coverage.
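The color coding described above can be illustrated with a minimal sketch. It assumes a polarity score between -1 (negative) and +1 (positive); the function name `polarity_to_rgb`, the RGB anchor values, and the linear blending are illustrative assumptions, not the palette used by any particular tool.

```python
# Hypothetical anchor colors; the exact palette is an assumption.
GREY = (128, 128, 128)   # neutral
RED = (255, 0, 0)        # fully negative
GREEN = (0, 255, 0)      # fully positive


def polarity_to_rgb(score: float) -> tuple:
    """Map a polarity score in [-1, 1] to an (R, G, B) triple.

    The sign selects the hue (red or green); the magnitude controls
    how far the color moves from grey toward the fully saturated hue.
    """
    score = max(-1.0, min(1.0, score))          # clamp out-of-range input
    target = GREEN if score > 0 else RED
    strength = abs(score)                       # 0 = grey, 1 = vivid
    return tuple(round(g + (t - g) * strength) for g, t in zip(GREY, target))


if __name__ == "__main__":
    for s in (-1.0, -0.4, 0.0, 0.4, 1.0):
        print(s, polarity_to_rgb(s))
```

A score close to zero stays near grey, while strongly polar scores approach pure red or green, which mirrors the saturation effect described in the text.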
Extracting Opinions from Multilingual Resources
The user-generated content gathered from various social media platforms is noisy, unstructured, and multilingual. Processing this type of content requires robust text mining techniques, ranging from simple spelling corrections to more complex tasks such as sentiment analysis or the identification of named entities (locations, persons, organizations).
Multilingual capabilities are becoming increasingly important when applying these techniques to global networks. This goes beyond merely detecting the language(s) used, a comparatively straightforward task. It requires specific language resources (sentiment lexicons), and a system that is capable of processing grammatical structures and considering the idiosyncrasies of specific language communities.
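To make the role of language-specific resources concrete, the following sketch scores a text against a per-language sentiment lexicon and applies a very crude negation rule. The lexicon entries, the negation word lists, and the function `sentiment_score` are toy assumptions for illustration; production lexicons are far larger, and real grammatical processing is considerably more involved. The language code is assumed to be known already.

```python
import re

# Toy lexicons (assumption): term -> polarity value in [-1, 1].
LEXICONS = {
    "en": {"great": 1.0, "friendly": 0.5, "dirty": -0.5, "terrible": -1.0},
    "de": {"toll": 1.0, "freundlich": 0.5, "schmutzig": -0.5, "schrecklich": -1.0},
}

# Toy negation triggers per language (assumption).
NEGATIONS = {
    "en": {"not", "never", "no"},
    "de": {"nicht", "nie", "kein", "keine"},
}


def sentiment_score(text: str, lang: str) -> float:
    """Return the mean polarity of all lexicon terms found in the text.

    A negation word flips the polarity of the next sentiment term.
    Returns 0.0 if no lexicon term occurs.
    """
    lexicon = LEXICONS[lang]
    negations = NEGATIONS[lang]
    total, hits, negate = 0.0, 0, False
    for token in re.findall(r"\w+", text.lower()):
        if token in negations:
            negate = True
        elif token in lexicon:
            value = lexicon[token]
            total += -value if negate else value
            hits += 1
            negate = False   # a negation applies to one sentiment term only
    return total / hits if hits else 0.0


if __name__ == "__main__":
    print(sentiment_score("The staff was not friendly", "en"))
    print(sentiment_score("Das Personal war nicht freundlich", "de"))
```

Even this simplified version shows why language detection alone is not enough: each language needs its own lexicon and its own rules for constructs such as negation.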
Context-Awareness and Dealing with Ambiguity
A major challenge is the limited ability of automated systems to resolve ambiguities, for example to distinguish the term “apple” referring to an edible fruit from the brand name of the technology company. Domain-specific knowledge and the ability to identify relations between semantic concepts are essential to address this problem.
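The “apple” example can be sketched with a simple context-overlap heuristic: each sense is associated with a set of cue words, and the sense whose cues overlap most with the surrounding sentence is selected. This is a deliberately simplified stand-in, not the method of the articles discussed below; the cue sets and the function `disambiguate` are hypothetical.

```python
import re
from typing import Optional

# Hypothetical cue words per sense (assumption, for illustration only).
SENSE_PROFILES = {
    "apple": {
        "fruit": {"eat", "juice", "pie", "tree", "fresh", "orchard"},
        "company": {"iphone", "mac", "shares", "ceo", "software", "store"},
    }
}


def disambiguate(term: str, sentence: str) -> Optional[str]:
    """Pick the sense of `term` whose cue words overlap most with the sentence.

    Returns None if no cue word of any sense occurs. Ties are resolved
    in favor of the sense listed first.
    """
    tokens = set(re.findall(r"\w+", sentence.lower()))
    overlaps = {
        sense: len(tokens & cues)
        for sense, cues in SENSE_PROFILES[term].items()
    }
    best = max(overlaps, key=overlaps.get)
    return best if overlaps[best] > 0 else None


if __name__ == "__main__":
    print(disambiguate("apple", "Apple shares rose after the new iPhone launch"))
    print(disambiguate("apple", "She baked a fresh apple pie"))
```

In practice, hand-written cue lists would be replaced by domain-specific knowledge and learned relations between concepts, as noted above.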
Recent articles published in Knowledge-Based Systems and IEEE Intelligent Systems, two prestigious scientific journals, present webLyzard’s novel approach to contextualization. These articles describe how the underlying methods were developed, point out their advantages over existing approaches, and include detailed evaluation sections based on user-generated online reviews of movies (IMDb), hotels (TripAdvisor), and products (Amazon).
While the 2013 IEEE article focuses on disambiguation and the generation of contextualized sentiment lexicons, the Knowledge-Based Systems article extends the method, using knowledge graph enrichment techniques to integrate external affective resources into the opinion mining process. The 2017 IEEE article increases the granularity of the approach by focusing on specific aspects such as product features (e.g., a digital camera’s resolution) or the perceptions of a specific event (e.g., as part of a sponsorship agreement). Another important extension is emotion detection, using more fine-grained affect models to distinguish a range of human emotions such as Joy, Trust, Anger and Fear.
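The idea of aspect-level granularity can be illustrated with a small sketch that attributes the polarity of each sentence to the product features it mentions. The aspect list, the opinion lexicon, and the function `aspect_sentiment` are assumptions made for this example; the sketch does not reproduce the approach of the cited articles.

```python
import re
from collections import defaultdict

# Hypothetical aspects and opinion terms (assumption).
ASPECTS = {"resolution", "battery", "zoom"}
OPINIONS = {"excellent": 1.0, "sharp": 1.0, "weak": -1.0, "poor": -1.0}


def aspect_sentiment(review: str) -> dict:
    """Return the mean sentence polarity for each aspect mentioned.

    Each sentence is scored by summing its opinion terms; that score
    is then assigned to every aspect named in the same sentence.
    """
    scores = defaultdict(list)
    for sentence in re.split(r"[.!?]+", review.lower()):
        tokens = re.findall(r"\w+", sentence)
        polarity = sum(OPINIONS.get(token, 0.0) for token in tokens)
        for aspect in ASPECTS.intersection(tokens):
            scores[aspect].append(polarity)
    return {aspect: sum(vals) / len(vals) for aspect, vals in scores.items()}


if __name__ == "__main__":
    print(aspect_sentiment("The resolution is excellent. The battery is weak."))
```

Instead of a single score per review, the result separates a positively rated feature from a negatively rated one, which is the kind of granularity aspect-based analysis aims for.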
- Weichselbraun, A., Gindl, S., Fischer, F., Vakulenko, S. and Scharl, A. (2017). “Aspect-Based Extraction and Analysis of Affective Knowledge from Social Media Streams”, IEEE Intelligent Systems, 32(3): 80-88.
- Weichselbraun, A., Gindl, S. and Scharl, A. (2014). “Enriching Semantic Knowledge Bases for Opinion Mining in Big Data Applications”, Knowledge-Based Systems, 69: 78-85.
- Weichselbraun, A., Gindl, S. and Scharl, A. (2013). “Extracting and Grounding Contextualized Sentiment Lexicons”, IEEE Intelligent Systems, 28(2): 39-46.
- Scharl, A. and Weichselbraun, A. (2008). “An Automated Approach to Investigating the Online Media Coverage of US Presidential Elections”, Journal of Information Technology & Politics, 5(1): 121-132.