Contribution Details

Type Master's Thesis
Scope Discipline-based scholarship
Title Big Data: Twitter sentiment as stock market predictor
Organization Unit
Authors
  • Alexandre Munday
Supervisors
  • Michel Habib
  • Jacqueline Haverals
Language
  • English
Institution University of Zurich
Faculty Faculty of Business, Economics and Informatics
Number of Pages 71
Date 2018
Abstract Text This thesis investigates whether the population mood shared on Twitter contains valuable information. Specifically, we test for relationships between the sentiment contained in Tweets and S&P 500 index returns. Rational investors in efficient markets are the foundation of traditional finance. Sentiment that influences investing behaviour violates this assumption and therefore should not exist. Moreover, if such sentiment were observable, the traditional finance hypothesis holds that rational arbitrageurs would eliminate its effect. However, such anomalies have been observed repeatedly over the years. These observations led traditional finance to evolve to include psychological and sociological aspects in order to understand empirically observed behaviour (Sewell 2007). Behavioural finance found evidence that these behavioural biases do exist, that people do not act fully rationally, and that limits to arbitrage exist. Early research on stock markets states that markets follow a random walk pattern, consistent with the Efficient Market Hypothesis (Fama 1965) (Cootner 1964). The central claim of the Efficient Market Hypothesis is that stock prices react to the arrival of new information and not to historical prices. According to this theory, the stock market follows a random walk because news is unpredictable, and the market cannot be predicted with an accuracy higher than 50 percent (Qian & Rasheed 2007). More recent studies have shown the limitations of the Efficient Market Hypothesis. Many researchers show that stock market prices do not follow a random walk and can be predicted with an accuracy higher than 50 percent (Qian & Rasheed 2007) (Gallagher & Taylor 2002). Other studies have shown that, although news itself cannot be predicted without committing insider trading, early indicators can be extracted from social media such as Twitter.
Those early indicators may be used to predict changes in different financial and commercial indicators. Extending this hypothesis to stock market prices, we think that public mood can be as important as news. We know from psychological research that emotions, in addition to information, play an important role in human decision-making (Damasio 1994) (Dolan 2002). Behavioural finance has shown that emotion and mood significantly influence stock market prices (Nofsinger 2005). We can therefore assume that sentiment and public mood can drive the market with the same strength as news, as maintained by Gilbert and Karahalios (2010). In recent years various sentiment analysis algorithms have been written; they are particularly efficient at extracting public mood directly from social media content, and more specifically from Tweets (Abbasi, Hassan & Dhar 2014). The goal of this master's thesis is to investigate whether public sentiment, measured as mood by a sentiment algorithm, influences stock market returns. The main purpose of this study is to investigate whether there is a link between how people feel and the evolution of stock markets in the United States. We collected data continuously from Thursday, 19 March 2015, midnight UTC, until Tuesday, 31 January 2017, and thus have 684 days of data. As output, we kept all categories of Tweets, namely positive, negative, and neutral. We computed the returns of the S&P 500 from adjusted closing values obtained from Yahoo! Finance. The index offers a broad view of the financial health of the USA. We generated the positive variable (Pos) by taking the mean per minute of positive Tweets for a given day. The idea behind using the mean was to avoid being overly influenced by outliers. The negative sentiment variable (Neg) was generated the same way. We created the reactional variable (Reac) by adding the positive and negative Tweets and dividing by the total number of Tweets.
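The variable construction described above can be sketched as follows. This is a minimal illustration and not the thesis code: the function names, the per-minute tuple layout, and the use of simple (rather than logarithmic) returns are assumptions made for the example.

```python
from statistics import mean


def daily_sentiment(minute_counts):
    """Build Pos, Neg and Reac for one day.

    minute_counts is a hypothetical list of (positive, negative, neutral)
    Tweet counts, one tuple per minute of the day.
    """
    pos = [p for p, _, _ in minute_counts]
    neg = [n for _, n, _ in minute_counts]
    total = sum(p + n + u for p, n, u in minute_counts)
    return {
        # Mean number of positive / negative Tweets per minute.
        "Pos": mean(pos),
        "Neg": mean(neg),
        # Share of Tweets that carry any sentiment (positive or negative).
        "Reac": (sum(pos) + sum(neg)) / total if total else 0.0,
    }


def simple_returns(adj_close):
    """Day-over-day simple returns from adjusted closing values (assumed form)."""
    return [(b - a) / a for a, b in zip(adj_close, adj_close[1:])]
```

For example, two minutes with counts (2, 1, 1) and (4, 3, 5) give Pos = 3, Neg = 2 and Reac = 10/16.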
We used a Vector Autoregression (VAR) approach for multiple reasons. It is suitable for time series, it allows us to use several endogenous variables, and it permits us to investigate feedback effects. An essential requirement for our prediction is that the value of a variable may depend on its own lags and on the lags of the other variables in the model. The results of this study, presented in section 10, show no significant relation between investor sentiment, proxied by Tweet sentiment, and stock market returns based on S&P 500 data. Neither the causality tests nor the Impulse Response Functions identify an effect of mood on stock market returns. Additionally, the results for the whole period showed no significant relation between investor sentiment and stock market returns on the same day or the following days. Several factors may explain why we could not find any effect on stock market returns. First of all, market actors might have included the bias in their investment models following the increased publicity of, and access to, social media mood indicators such as the Twitter Happiness Index. Secondly, as described in section 2, investors learn from their mistakes. Thirdly, the analysis may fail to identify the effect reported in the other research cited in the study because social media users have become less representative of investors. Additionally, the demographic distribution of social media users was found to have shifted toward lower-income and less educated households, as mentioned in section 5. That could be a reason why Twitter as a mood indicator became less representative of investors. What surprised us most is how little the models using the sum of Tweets (models one and two) differ from the models using the percentages of Tweets (models three and four). We would have expected more accuracy from the percentage models because they should be less affected by the biases discussed above.
Lastly, most similar studies were conducted with data from 2008 or soon after, that is, just after the financial crisis. It is possible that the findings of Bollen et al. (2011) are correlated with the excessive stress in the market at that time. It is also possible that the time frame of their data set was too short to obtain a representative effect, and that with more data they would arrive at the same conclusion as we do. The robustness of our results is supported by the agreement between the Granger causality tests and the Impulse Response Functions.
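For orientation, the VAR and Granger causality setup referred to above can be written in generic textbook form. The abstract does not state the exact specification, so the composition of the vector y_t (returns R_t with Pos_t and Neg_t), the lag order p, and the notation are assumptions for illustration only.

```latex
\begin{aligned}
y_t &= c + \sum_{i=1}^{p} A_i \, y_{t-i} + \varepsilon_t,
\qquad
y_t = \begin{pmatrix} R_t \\ \mathit{Pos}_t \\ \mathit{Neg}_t \end{pmatrix}, \\
H_0 &: (A_i)_{R,\mathit{Pos}} = (A_i)_{R,\mathit{Neg}} = 0
\quad \text{for all } i = 1, \dots, p .
\end{aligned}
```

Under this reading, failing to reject H_0 corresponds to the reported finding that lagged sentiment does not Granger-cause returns.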