Today, sentiment analysis is a topic of great interest and development since it has many practical applications.

As the volume of public and private information on the Internet keeps growing, a large number of texts expressing opinions are available in forums, blogs, review sites and social media channels.

The growing need to extract subjective information from written text leads brands to analyse sentiment and viewpoints in order to learn about the feelings, attitudes, emotions and opinions of existing and potential buyers towards their products or services.

What is Sentiment Analysis?

Sentiment Analysis is the process of determining whether a piece of content is positive, negative or neutral in tonality.

It is based on algorithms that evaluate whether the words in a post relate to positive, negative or neutral emotions, and it combines natural language processing (NLP) and machine learning techniques.
For specific use cases that require 100% accuracy, Sensika and other media monitoring solutions also allow manual sentiment evaluation.

Usually, besides identifying the opinion, these systems extract attributes of the expression, for example:

  • Polarity: whether the speaker expresses a positive or negative opinion.
  • Subject: the thing that is being talked about.
  • Opinion holder: the person or entity that expresses the opinion.

How does it work?

There are many ways that people analyse bodies of text for sentiment or opinions, but it usually boils down to two methods.

Using Natural Language Processing (NLP) to attempt to truly “understand” the text.
This model attempts to have the machine actually understand the structure of the sentences and the context, so it focuses on the order in which the words appear. This approach usually requires the machine to have an understanding of grammar principles. To achieve this, NLP techniques are used to tag parts of speech, named entities and more, in order to understand the “language” of the text rather than just look for target words.
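Real NLP pipelines rely on part-of-speech taggers, parsers and named-entity recognisers. As a much smaller stand-in, the toy sketch below only illustrates why the order of words matters: it compares a scorer that just looks for target words with one that also notices a negation in front of them. The word values, the list of negators and the function names are all hypothetical and chosen for illustration.

```python
import re

# Hypothetical word values and negators, chosen for illustration only.
WORD_VALUES = {"good": 1.0, "happy": 1.0, "bad": -1.0, "slow": -1.0}
NEGATORS = {"not", "never", "no"}


def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def target_word_score(text):
    """Order-blind: only adds up the values of the target words it finds."""
    return sum(WORD_VALUES.get(t, 0.0) for t in tokens(text))


def order_aware_score(text):
    """Flips the value of the word that directly follows a negator."""
    score = 0.0
    negate = False
    for t in tokens(text):
        if t in NEGATORS:
            negate = True
            continue
        value = WORD_VALUES.get(t, 0.0)
        score += -value if negate else value
        negate = False
    return score


print(target_word_score("The delivery was not good"))  # 1.0
print(order_aware_score("The delivery was not good"))  # -1.0
```

A single negation rule is nowhere near real grammatical understanding, but it shows how the same words can yield opposite results once their order is taken into account.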

The NLP method works quite well for Western languages but, unfortunately, it is not very useful for languages such as Arabic, Farsi, Urdu and Hindi. The challenge comes from the specific symbols of their alphabets, their word formation and different dialects that basically don’t follow any written rules.

In Arabic, for example, the letters are written differently depending on whether they are at the beginning, in the middle or at the end of the word.

Although still challenging, the next method has proven to be more successful in the sentiment evaluation of texts written in Arabic.

“Bag of Words” model.
This model focuses completely on the words, or sometimes a string of words, but usually pays no attention to the “context”, so to speak. The “Bag of Words” model usually has a large list, probably better thought of as a sort of “dictionary”, containing the words that are considered to carry sentiment. Each of these words has its own “value” when found in the text. The values are typically added up and the result is a sentiment valuation. The equation used to derive the final number can vary, but this model mainly focuses on the words and makes no attempt to actually understand language fundamentals.
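The idea of a “dictionary” of valued words can be sketched in a few lines. This is a minimal illustration, not how any particular product works: the lexicon, its values and the function names are hypothetical, and a plain sum is only one of the possible equations.

```python
import re

# Hypothetical toy "dictionary": each word carries a hand-picked value.
LEXICON = {
    "brilliant": 2.0,
    "great": 1.5,
    "good": 1.0,
    "bad": -1.0,
    "slow": -1.0,
    "terrible": -2.0,
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words_score(text, lexicon=LEXICON):
    """Add up the values of all known words; unknown words count as 0."""
    return sum(lexicon.get(token, 0.0) for token in tokenize(text))


def label(score):
    """Map the numeric total to a positive, negative or neutral label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for sentence in [
    "The support team was great",
    "Great product, terrible delivery",
    "The parcel arrived on Monday",
]:
    score = bag_of_words_score(sentence)
    print(sentence, "->", score, label(score))
```

Here the second sentence comes out negative only because “terrible” was given a larger value than “great”, which shows how much the result depends on the values chosen for the dictionary.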

Most of the current thinking in sentiment analysis happens in a categorical framework: sentiment is analysed as belonging to a certain bucket, to a certain degree. For example, a given sentence may be 23% sad, 89% excited, 45% happy and 55% anxious. These numbers don’t add up to 100.
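One way such numbers can arise, assuming each bucket is scored on its own rather than competing with the others, is sketched below. The raw scores are hypothetical and were picked only to reproduce the percentages in the example; the sigmoid and softmax functions are standard, but nothing here is taken from a specific sentiment product.

```python
import math

# Hypothetical raw scores for one sentence, one per bucket.
RAW_SCORES = {"sad": -1.2, "excited": 2.1, "happy": -0.2, "anxious": 0.2}


def sigmoid(x):
    """Squash a raw score into the range 0..1, independently of other buckets."""
    return 1.0 / (1.0 + math.exp(-x))


def independent_percentages(raw):
    """Score every bucket on its own; the results need not add up to 100."""
    return {name: round(100 * sigmoid(value)) for name, value in raw.items()}


def softmax(raw):
    """For contrast: buckets compete, so the shares always add up to 1."""
    biggest = max(raw.values())
    exps = {name: math.exp(value - biggest) for name, value in raw.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


print(independent_percentages(RAW_SCORES))
# {'sad': 23, 'excited': 89, 'happy': 45, 'anxious': 55}
print(sum(softmax(RAW_SCORES).values()))  # approximately 1.0
```

With independent scoring a sentence can be strongly “excited” and fairly “anxious” at the same time, which is why the percentages are not shares of a single total.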

To read between the lines – the biggest challenge.

“My plane has been delayed. Brilliant!”

Most of us would be able to quickly interpret that the person was being sarcastic and to categorise the sentence as negative. But without contextual understanding, the machine will detect the word “Brilliant” and will categorise the sentence as positive.
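The failure is easy to reproduce with a word-counting scorer. In this small sketch the word values and the function name are hypothetical; even with “delayed” counted as negative, the stronger value of “brilliant” wins.

```python
import re

# Hypothetical word values, for illustration only.
WORD_VALUES = {"brilliant": 2.0, "delayed": -1.0}


def word_score(text):
    """Add up the values of known words, ignoring tone and context."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(WORD_VALUES.get(w, 0.0) for w in words)


score = word_score("My plane has been delayed. Brilliant!")
print(score)  # 1.0, so the sentence is scored as positive
```

Nothing in the total reflects the sarcasm that a human reader picks up immediately.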

Most highly trained systems can reach 80% accuracy on average. But sentiment analysis is not a perfect science. A machine can’t fully understand the complexities of human language, and sentiment definitely can’t always be categorised simply as positive, negative or neutral.

Usually, we are good at judging sentiment in a given context. Detecting feelings like irony, sarcasm, scepticism or anxiety, which can depend on something as simple as the tone of one’s voice, is a complex task even for some humans, but for machines it is impossible.

Subjectivity in sentiment analysis.

Did you know that human sentiment accuracy is not 100% either? Sentiment basically refers to the contextual polarity of a text or a document, that is, the emotional effect the text or the document has on the reader. It also indicates the attitude of the author towards the subject.
Feelings are inherently subjective: you and I may interpret the attitude of the same text differently, depending on our personal morals, values and beliefs.

Have a look at the example below:

If your analysis is meant to be in favour of the Democratic Party, what would the sentiment be? The sentiment of this article would be different for the Republicans, right?

Probably the biggest problem in this analysis is subjectivity. Someday, highly trained systems may reach 99% accuracy in analysing text. But even then, they will never be able to predict the personal biases of the reader based on gender, race and political views.

In conclusion.

Sentiment analysis is very useful in social media marketing and PR because it helps in forecasting trends and gathering public opinion about products, services or situations.

Implemented for specific use cases, both methods – NLP and “Bag of Words” – have their advantages and disadvantages. Our advice is that, when choosing a software solution for sentiment analysis, you should always have the option of manual scoring. With Sensika you can always manually change any automated sentiment, so you can rest easy knowing that you’ll always have the final say.