Charting The Future of Forecasting: The Sentiment Indicator Advantage
Running an XGBoost Regression on Sentiment Data
In the dynamic landscape of financial markets, understanding market sentiment is paramount for making informed investment decisions. This article delves into the realm of sentiment analysis and its application in forecasting the daily returns of financial assets. By harnessing the power of sentiment data, we aim to uncover hidden patterns and insights that can provide traders and investors with a valuable edge in navigating the foreign exchange market’s fluctuations.
What is Sentiment Analysis?
Market sentiment analysis is a specialized application of sentiment analysis that focuses on evaluating and interpreting sentiment, opinions, and emotions expressed in financial data, news, and social media to gain insights into financial markets and make informed investment decisions. This analysis aims to gauge the collective sentiment of investors and traders and its potential impact on asset prices and market movements.
Using historical sentiment data as inputs in a regression task, with the aim of forecasting the return of the underlying asset, is an interesting experiment. For this task we will use XGBoost regression on sentiment data (inputs) to forecast the returns of CADUSD (outputs).
XGBoost regression is a powerful and versatile machine learning algorithm used for regression tasks. It belongs to the gradient boosting family of algorithms, which iteratively builds an ensemble of decision trees to make highly accurate predictions. XGBoost enhances this approach by optimizing for both computational efficiency and predictive performance. XGBoost is renowned for its ability to handle complex, high-dimensional data, and its success in various domains, from finance to healthcare, making it a popular choice for predictive modeling and forecasting tasks.
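To make the idea of iteratively building an ensemble of trees more concrete, here is a minimal sketch of plain gradient boosting with squared error, written with NumPy only. It is not XGBoost itself (which adds regularization, second-order gradients, and many engineering optimizations); the function names, the toy data, and the use of one-split "stumps" as trees are all assumptions made for illustration.

```python
import numpy as np

def fit_stump(x, residual):
    """Return (threshold, left_value, right_value) for the best single split."""
    best = None
    for threshold in np.unique(x)[:-1]:  # skip the max so the right side is never empty
        left, right = residual[x <= threshold], residual[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    return best[1:]

def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to the current residuals and adds a damped copy of it."""
    base = y.mean()
    prediction = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        residual = y - prediction  # negative gradient of the squared error
        stump = fit_stump(x, residual)
        prediction = prediction + learning_rate * stump_predict(stump, x)
        stumps.append(stump)
    return base, stumps, prediction

# Toy data (hypothetical): a noisy sine curve
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = np.sin(3 * x) + rng.normal(0, 0.1, 200)

base, stumps, fitted = boost(x, y)
mse_start = np.mean((y - base) ** 2)
mse_end = np.mean((y - fitted) ** 2)
print(f"training MSE: {mse_start:.4f} -> {mse_end:.4f}")
```

Each round corrects part of what the previous rounds got wrong, which is why the training error shrinks as trees are added. The learning rate damps every correction so that no single tree dominates the ensemble.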
The following chart shows the price of CADUSD in the first panel and a custom sentiment indicator for the CAD in the second panel.
The correlation between the two time series is 0.32, which is positive albeit not very strong.
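As a rough illustration of how such a figure can be computed, the sketch below calculates a Pearson correlation on synthetic data. The article does not state which correlation measure was used, so Pearson is an assumption here, and the two series are randomly generated stand-ins, not the real CADUSD or sentiment data.

```python
import numpy as np

# Synthetic stand-ins for the two series (hypothetical values)
rng = np.random.default_rng(42)
n = 500
sentiment = np.cumsum(rng.normal(size=n))
price = 0.75 + 0.001 * sentiment + np.cumsum(rng.normal(scale=0.002, size=n))

def pearson(a, b):
    """Pearson correlation: covariance scaled by both standard deviations."""
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())

corr_levels = pearson(price, sentiment)                     # correlation of the raw levels
corr_changes = pearson(np.diff(price), np.diff(sentiment))  # correlation of the daily changes

print(f"levels:  {corr_levels:.2f}")
print(f"changes: {corr_changes:.2f}")
```

Looking at both numbers is worthwhile because two trending series can show a sizeable correlation in levels even when their day-to-day changes are only weakly related.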
Creating the Algorithm
The plan of attack for the algorithm is as follows:
Download the historical data of CADUSD and its sentiment indicator from here.
Import the data into Python, preprocess it (by differencing it), and split it into training and test sets.
Fit the model and predict the values.
Evaluate the results.
Use the following code to run the experiment (you must pip install xgboost first):
from xgboost import XGBRegressor
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_excel('CADUSD_Sentiment.xlsx')
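# Lag the sentiment by 5 rows so each price is paired with the sentiment from 5 days earlier
data['Sentiment'] = data['Sentiment'].shift(5)
data = data.dropna()
# Difference both columns and convert the result to a NumPy array
data = data.diff()
data = np.array(data)
# Drop the first row, which is NaN after differencing
data = data[1:, ]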
from statsmodels.tsa.stattools import adfuller
print('p-value: %f' % adfuller(data[:, 1])[1])
def data_preprocessing(data, train_test_split):
    # Split the data into training and testing sets
    split_index = int(train_test_split * len(data))
    x_train = data[:split_index, 1]
    y_train = data[:split_index, 0]
    x_test = data[split_index:, 1]
    y_test = data[split_index:, 0]
    return x_train, y_train, x_test, y_test
x_train, y_train, x_test, y_test = data_preprocessing(data, 0.70)
# Create the model
model = XGBRegressor(random_state = 0, n_estimators = 100, max_depth = 100)
# Fit the model to the data
model.fit(np.reshape(x_train, (-1, 1)), y_train)
# Predict on the test set (reshaped to 2-D, matching the shape used in fit)
y_pred = model.predict(np.reshape(x_test, (-1, 1)))
# Plotting
plt.plot(y_pred[-100:], label='Predicted Data', linestyle='--', marker = '.', color = 'red')
plt.plot(y_test[-100:], label='True Data', marker = '.', alpha = 0.7, color = 'blue')
plt.legend()
plt.grid()
plt.axhline(y = 0, color = 'black', linestyle = '--')
same_sign_count = np.sum(np.sign(y_pred) == np.sign(y_test)) / len(y_test) * 100
print('Hit Ratio XGBoost = ', same_sign_count, '%')
Note that the inputs are the 5-day lagged values of the sentiment indicator.
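To see what the 5-day lag does to the alignment of the rows, here is a small sketch on a toy table. The column names follow the code above, but the values are made up and the extra column Sentiment_lag5 is a hypothetical name used only so that the original and lagged values can be compared side by side.

```python
import pandas as pd

# Toy data (hypothetical): 10 consecutive days
toy = pd.DataFrame({
    'Price': [0.750, 0.752, 0.751, 0.755, 0.754, 0.756, 0.758, 0.757, 0.760, 0.759],
    'Sentiment': list(range(10, 20)),
})

# shift(5) moves each sentiment value 5 rows down, so the row for day t
# holds the sentiment observed on day t - 5
toy['Sentiment_lag5'] = toy['Sentiment'].shift(5)

# The first 5 rows have no lagged value and are dropped
aligned = toy.dropna()
print(aligned)
```

After the shift, the price on day 5 sits next to the sentiment from day 0, which means the model only ever sees information that was available 5 days before the move it tries to predict.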
The following chart compares predicted versus true data in the test sample. Values greater than zero refer to bullish days and values below zero refer to bearish days.
The following is the output of the code:
Hit Ratio XGBoost = 54.23 %
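The hit ratio is simply the percentage of days on which the predicted change and the realized change have the same sign. The sketch below wraps that calculation in a helper (hit_ratio is a hypothetical name, and the four values are made up) so the arithmetic can be checked by hand.

```python
import numpy as np

def hit_ratio(y_true, y_pred):
    """Percentage of observations where prediction and outcome share the same sign."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.sign(y_pred) == np.sign(y_true)) * 100

# Toy values (hypothetical): the signs agree on days 1, 3, and 4, but not on day 2
toy_true = [0.2, 0.1, 0.4, -0.3]
toy_pred = [0.1, -0.2, 0.3, -0.1]

ratio = hit_ratio(toy_true, toy_pred)
print('Hit Ratio = ', ratio, '%')  # 3 matches out of 4 -> 75.0 %
```

Note that a day with exactly zero change only counts as a hit if the prediction is also exactly zero, so unchanged days slightly lower the ratio.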
For a model that has not been optimized, a hit ratio of about 54% is a decent result on a time series as noisy as the daily returns of CADUSD. You can add more features, optimize the lag, and even add technical indicators.
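One possible way to approach the lag optimization is to scan several candidate lags and compare the hit ratio of each on a held-out slice. The sketch below does this on synthetic data where the true lag is known to be 3. To keep it dependency-free, a simple linear fit stands in for the XGBRegressor used in the article; the function name scan_lags, the data, and the 70/30 split are assumptions.

```python
import numpy as np

def scan_lags(feature, target, lags, train_fraction=0.7):
    """Return {lag: hit ratio in %} using a linear fit per lag (lags must be >= 1)."""
    results = {}
    for lag in lags:
        x = feature[:-lag]  # feature observed at time t
        y = target[lag:]    # target observed at time t + lag
        split = int(train_fraction * len(x))
        slope, intercept = np.polyfit(x[:split], y[:split], 1)
        pred = slope * x[split:] + intercept
        results[lag] = np.mean(np.sign(pred) == np.sign(y[split:])) * 100
    return results

# Synthetic data (hypothetical): the target follows the feature with a 3-step delay
rng = np.random.default_rng(1)
n = 600
feature = rng.normal(size=n)
target = rng.normal(scale=0.1, size=n)
target[3:] += 0.8 * feature[:-3]

results = scan_lags(feature, target, lags=range(1, 11))
best_lag = max(results, key=results.get)
print('Best lag:', best_lag, '| hit ratio:', round(results[best_lag], 2), '%')
```

On real data, the lag should be chosen on a validation slice that is separate from the final test set, otherwise the reported hit ratio will be optimistic.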