Shuykin S. A., Ballod B. A., Koltsova E. A.

Ivanovo State Power Engineering University, Russia

Automation Method Analysis of Sentiment in Social Media Messages

 

         The article considers one of the aspects of text analysis of natural language, namely, sentiment analysis. The study findings have a wide range of possible applications in different spheres, for instance in marketing, sociology, politics, etc. In these areas, it is important to have an idea about the mood of customers / society / electorate and their attitude to certain events or phenomena in social media. Besides, there exists an urgent need to increase the accuracy and rate of message processing as the mood and opinions of social network users are constantly changing. The results of the undertaken research could allow following and controlling these opinions.

         Thus, the present paper aims to introduce the new methodology for sentiment analysis which is able to increase the speed and accuracy of data processing.

         To begin with, let us have a look at the methods and models employed for sentimental processing. The currently existing models of data processing have proved to be unsuitable and inefficient, not meeting modern requirements. At present, all existing models of sentimental determination can be divided into 2 categories.

         The first category includes methods for vector analysis of text. The text is represented as a vector of words or their combinations (n-grams). A number of algorithms such as SVM (support vector method), Naive Bayesian Classifier, decision trees and some others are used for this purpose.

         The second category refers to the search for emotive vocabulary (words that are responsible for the overall tonality of the text) according to pre-compiled dictionaries.

         For the primary goal to be achieved, a number of methods have been applied like tokenisation, TF-IDF analysis and machine learning. These methods demonstrate the best ratio of accuracy and speed that were the key aspects for data processing and proved to be timesaving. The system developed was tested on posts in Vk.com social media. The tests were performed and the conclusion about emotional message of those text was made whether they are positive, negative or neutral.

         Turning to the analysis itself, the main steps and terms are to be distinguished. The input is text from social network. The first step is separation of lexemes and determining their properties such as part of speech and morphological features. Then, stop-words are separated that is those not causing evaluation of the overall tone. The second step is the evaluation of the tone by the TF-IDF method.

where Vt,d  is the width of the word t in a post d; Ñt,d  is a count of the word t occurring in a post d; |P| is a count of the posts with positive tone; |N| - a count of the posts with negative tone; Pt – a count of positive posts where t word occurs and Nt is a count of the negative posts where t word occurs.

         Thus, most frequent elements in positive and negative texts are determined. To function correctly, the initial stage requires manual posts evaluation for algorithm verification.

         The next stage is constructing a sentence structure in the form of a hierarchy of binary relationships based on semantic rules.

good

 

Fig. 1. An example of estimating the tonality of binary links.

         The example above shows that the word "very" reinforces the positive significance of the word "good" and "no" inverts the design behind it, thereby establishing a negative tone.

         The program was implemented in Python programming language using Django framework for the frontend.

         As a result, the information system that automates the work on analytical text processing to determine the key element has been developed. This development is aimed at improving the existing methods of processing in order to increase the efficiency. Increasing efficiency means reducing the time for processing the message, and, therefore, affecting the timeliness of the submission of the required data. In the future it is planned to introduce the developed method into the social media monitoring system.