Contribution Details

Type Bachelor's Thesis
Scope Discipline-based scholarship
Title Financial Correlation Modeling Using News and Social Media Data
Organization Unit
Authors
  • Ceren Satilmis
Supervisors
  • Thorsten Hens
  • Sven Christian Steude
Language
  • English
Institution University of Zurich
Faculty Faculty of Economics, Business Administration and Information Technology
Date 2015
Abstract This bachelor thesis compares the performance of different models that measure the relationship between news and stock price correlations. These models were introduced in the master thesis of Martin Castrischer, and this bachelor thesis extends its results. Previous research investigates the relationship between news flows and stock prices through a qualitative approach such as sentiment analysis, which classifies news according to the polarity of its words, for example good or bad. This approach can introduce errors, since subjective content is difficult to quantify. The master thesis of Martin Castrischer therefore introduces a different approach that does not rely on sentiment analysis or any other qualitative measure. It investigates the relationship between the news co-occurrences of two firms in the media and the correlation of their stock prices. The idea is that if two firms are mentioned together in the news, their stock prices should also correlate. Co-occurrence, the number of news items corresponding to the stock pair, is used as the correlation predictor. To investigate this relationship, different model specifications are constructed from different co-occurrence and correlation measures, and their performance is assessed with the area under the receiver operating characteristic curve (AUC). According to Fawcett (2006), a receiver operating characteristic (ROC) curve is a technique for visualizing, organizing and selecting classifiers based on their performance. The AUC is a summary statistic that condenses a classifier's performance into a single figure, which is desirable when comparing a number of different classification schemes (Bradley (1996)). Other summary statistics exist, but the AUC is the most statistically consistent one and better measures the overall performance of a classifier system.
Therefore we consider only the AUC values of the model specifications for the performance evaluation. This thesis also analyses how the results of the models differ when they are applied to the two datasets, Snews and RNSE, and the reasons behind these differences. The Snews dataset contains social media news, while the RNSE dataset contains news flows provided by Thomson Reuters. The news flows of these datasets have different characteristics. Social media news is much more dynamic, since it gives a wide audience easy and fast access to obtain and share information. This bachelor thesis introduces the term “echo effect” for the continuous sharing and discussion of information on social media platforms. The news flow of the RNSE dataset, on the other hand, does not spread as widely or as quickly as the social media news flow. As a result of these different characteristics, there is no single best performing model overall, but one best performing model for each dataset. The performance evaluation is conducted systematically in five steps. In the first step we find that the co-occurrence measures carry predictive information about the stock price correlations: correlation measures react to co-occurrence measures and not the other way round. The model specifications in which the co-occurrence measures react to the correlation measures are eliminated because of their weak performance. In the second step two different co-occurrence measures are compared. The measure that indicates the number of co-occurrences of the stock pair performs better than the measure that indicates the difference in co-occurrences of the stock pair between two days. The second measure normalizes the co-occurrence level in an undesired way, which can distort the results, since an important part of the data is lost and the effect of the news co-occurrences is weakened.
For this reason we look for the best performing model specifications only among those whose co-occurrence measure indicates the number of co-occurrences. In the third step we observe that for the Thomson Reuters news we get the most accurate results when we use the actual change in the correlation between two days as the correlation measure. For the social media news, however, we have to take the previous correlation level into account because of the echo effect. For this reason the model specifications with the relative change as their correlation measure perform better for the Snews dataset. In the fourth step we observe the trend of the AUC values of all remaining model specifications. The observations suggest that individuals who base their investment decisions on social media news on the Internet can react faster to the news, since they are independent in their choices. Investors who follow the RNSE news, however, are more likely to be institutional and professional; their trading takes more time because of the deeper hierarchy within the institutions and the larger amounts of money invested. In the fifth and last step we observe that for the Snews dataset it makes more sense to investigate the relationship between the news flow and the correlation volatility rather than the correlation level. For the RNSE dataset, however, the model specifications with the actual change in the correlation level perform better than the ones with the correlation volatility. After eliminating model specifications on the basis of the findings above, only one model specification remains for each dataset. For the RNSE dataset the best performing model specification uses the number of news co-occurrences of the stock pair as the classifier score and the actual change in their correlation level between two days as the binary variable.
For the Snews dataset the best performing model specification uses the number of news co-occurrences of the stock pair as the classifier score and the absolute value of the relative change in the correlation as the binary variable.
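The two final model specifications can be illustrated with a minimal Python sketch. It is not the implementation used in the thesis. The abstract does not state how the correlation change is turned into a binary variable, so the rules below are assumptions: for the actual change, a label of 1 means the correlation increased; for the absolute relative change, a label of 1 means the change exceeded a hypothetical threshold of 0.25. All function names and the toy numbers are invented for illustration only. The AUC is computed in its rank form, as the fraction of (positive, negative) pairs in which the positive observation has the higher classifier score, with ties counted as one half.

```python
def auc(scores, labels):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def label_actual_change(corr_prev, corr_next):
    """ASSUMPTION: label 1 if the correlation increased between the two days."""
    return 1 if corr_next - corr_prev > 0 else 0


def label_abs_relative_change(corr_prev, corr_next, threshold=0.25):
    """ASSUMPTION: label 1 if |relative change| exceeds a hypothetical threshold."""
    return 1 if abs((corr_next - corr_prev) / corr_prev) > threshold else 0


# Invented toy data: one entry per stock pair and day.
cooccurrence = [0, 1, 5, 3, 0, 8]                  # classifier score
corr_prev = [0.20, 0.10, 0.30, 0.40, 0.50, 0.25]   # correlation on day t-1
corr_next = [0.18, 0.12, 0.45, 0.38, 0.50, 0.40]   # correlation on day t

labels_actual = [label_actual_change(a, b) for a, b in zip(corr_prev, corr_next)]
labels_relative = [label_abs_relative_change(a, b) for a, b in zip(corr_prev, corr_next)]

auc_actual = auc(cooccurrence, labels_actual)      # RNSE-style specification
auc_relative = auc(cooccurrence, labels_relative)  # Snews-style specification
print(auc_actual, auc_relative)
```

An AUC of 0.5 corresponds to a score with no ranking ability, and values closer to 1 mean that stock pairs with more co-occurrences are more often the ones whose binary variable equals 1. The relative-change label divides by the previous correlation level, which is how the previous level enters the Snews-style specification in this sketch.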