Use CatBoost to Predict Your Time Series
Creating an Ensemble Model to Predict RSI Values in Python
This article explores how CatBoost can be combined with technical indicators and what that combination may offer traders and analysts. We will look at how CatBoost works and walk through a practical implementation on a financial time series.
CatBoost Algorithm and the Relative Strength Index
CatBoost, short for “categorical boosting,” is a machine learning algorithm with a specialized focus on categorical features. Developed by Yandex, it uses a gradient boosting framework, which makes it effective for tasks like classification and regression. Unlike many traditional gradient boosting implementations, CatBoost requires minimal pre-processing of categorical data, easing the burden on practitioners.
The algorithm handles categorical variables automatically, without the need for extensive manual encoding. This makes CatBoost an attractive choice for financial engineers and data scientists working with datasets rich in both numerical and categorical information. As we explore further, we’ll see how CatBoost can be used to predict technical indicators, providing a data-driven approach to market analysis.
On the other hand, the Relative Strength Index (RSI) is a momentum oscillator that has been a staple in technical analysis for decades. Developed by J. Welles Wilder, the RSI is designed to measure the speed and change of price movements, helping traders identify overbought or oversold conditions in a market. RSI values range from 0 to 100, with readings above 70 indicating potentially overbought conditions and readings below 30 signaling potential oversold conditions.
Traders often use the RSI to spot potential trend reversals and to confirm the strength of a prevailing trend. It provides a quantitative measure of the recent price performance of an asset, offering insights into whether it might be due for a correction or continuation.
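For reference, the standard RSI definition over a lookback of n periods can be written as follows. The symbols are notation chosen here for illustration: AvgGain is the average of the positive price changes over the last n periods and AvgLoss is the average of the absolute values of the negative price changes.

```latex
\begin{aligned}
RS_t  &= \frac{\mathrm{AvgGain}_t(n)}{\mathrm{AvgLoss}_t(n)} \\
RSI_t &= 100 - \frac{100}{1 + RS_t}
\end{aligned}
```

When gains dominate, RS grows and the RSI approaches 100; when losses dominate, RS shrinks toward 0 and so does the RSI. Wilder's original version uses n = 14 and a smoothed average.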
The aim of the article is therefore to use CatBoost to predict the next change in the RSI. If that change can be predicted with reasonable accuracy, it may also tell us something about the next change in the underlying price of the security.
Creating the Algorithm
Now, it’s time to create the algorithm that will predict the next change in RSI values. The framework is as follows:
Download a sample of RSI values from here.
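Difference the RSI values, build samples from the previous 50 changes, split them chronologically into training (90%) and test (10%) sets, and fit the model.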
Predict and evaluate the data on the test set.
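Before the full script, it may help to see what the lagging step produces on a tiny series. The sketch below is a vectorized illustration of the idea, not the code used in the article: the helper name `make_lagged_samples` and the toy series are hypothetical, and `sliding_window_view` assumes NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_lagged_samples(series, num_lags):
    """Return (x, y): each row of x holds num_lags consecutive values,
    and y holds the value that immediately follows each row."""
    series = np.asarray(series, dtype=float)
    # All windows of length num_lags: shape (len(series) - num_lags + 1, num_lags)
    windows = sliding_window_view(series, num_lags)
    x = windows[:-1]          # the last window has no following value, so drop it
    y = series[num_lags:]     # the value right after each remaining window
    return x, y

toy_series = np.arange(10)    # hypothetical stand-in for the RSI changes
x, y = make_lagged_samples(toy_series, num_lags=3)

# Chronological split: the first 90% of samples train, the rest test
split_index = int(0.9 * len(x))
x_train, x_test = x[:split_index], x[split_index:]
y_train, y_test = y[:split_index], y[split_index:]

print(x[0], y[0])  # [0. 1. 2.] 3.0
```

The split is deliberately not shuffled: with time series, the test set should contain only observations that come after the training period.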
Use the following code to create the algorithm:
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from catboost import CatBoostRegressor
from sklearn.metrics import mean_squared_error

def data_preprocessing(data, num_lags, train_test_split):
    # Build lagged samples: each row of x holds num_lags consecutive values
    # and y holds the value that immediately follows them
    x = []
    y = []
    for i in range(len(data) - num_lags):
        x.append(data[i:i + num_lags])
        y.append(data[i + num_lags])
    # Convert the data to numpy arrays
    x = np.array(x)
    y = np.array(y)
    # Split the data chronologically into training and testing sets
    split_index = int(train_test_split * len(x))
    x_train = x[:split_index]
    y_train = y[:split_index]
    x_test = x[split_index:]
    y_test = y[split_index:]
    return x_train, y_train, x_test, y_test

# Load the RSI values, flatten them into a 1-D array, and take first differences
data = pd.read_excel('RSI_Values.xlsx').values
data = np.reshape(data, (-1))
data = np.diff(data)
x_train, y_train, x_test, y_test = data_preprocessing(data, 50, 0.9)
# Create a CatBoostRegressor model
model = CatBoostRegressor(iterations = 100, learning_rate = 0.1, depth = 6, loss_function = 'RMSE')
# Fit the model to the training data
model.fit(x_train, y_train)
# Predict on the held-out test set
y_pred = model.predict(x_test)
# Plot the last 50 predicted and true RSI changes
plt.plot(y_pred[-50:], label='Predicted Data', linestyle='--', marker = 'o')
plt.plot(y_test[-50:], label='True Data', marker = 'o', alpha = 0.7)
plt.legend()
plt.grid()
plt.axhline(y = 0, color = 'black', linestyle = '--')
plt.show()
# Root mean squared error on the test set
rmse_test = math.sqrt(mean_squared_error(y_test, y_pred))
print(f"RMSE of Test: {rmse_test}")
# Hit ratio: percentage of test samples where the predicted and true changes share the same sign
same_sign_count = np.sum(np.sign(y_pred) == np.sign(y_test)) / len(y_test) * 100
print('Hit Ratio = ', same_sign_count, '%')
The following shows the predicted values compared to the real values:
The results are as follows:
RMSE of Test: 4.17
Hit Ratio = 65.23 %
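An RMSE and a hit ratio are easier to judge next to a naive benchmark. The sketch below computes the same two metrics for two simple baselines: always forecasting a change of zero, and forecasting the opposite of the previous change. It runs on synthetic random numbers, not on the article's RSI file, and the helper names `rmse` and `hit_ratio` are hypothetical. Whether the reversal rule beats 50% on real RSI changes is an empirical question for the data at hand.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def hit_ratio(y_true, y_pred):
    """Percentage of samples where prediction and outcome share the same sign."""
    return float(np.mean(np.sign(y_true) == np.sign(y_pred)) * 100)

# Synthetic stand-in for a series of RSI changes (assumption, not real data)
rng = np.random.default_rng(0)
changes = rng.normal(0.0, 4.0, size=501)
previous_change = changes[:-1]
y_true = changes[1:]

# Baseline 1: always forecast "no change"; useful as an RMSE reference
zero_forecast = np.zeros_like(y_true)

# Baseline 2: forecast the opposite of the previous change (naive mean reversion);
# useful as a hit-ratio reference
reversal_forecast = -previous_change

print("Zero-forecast RMSE:", rmse(y_true, zero_forecast))
print("Reversal hit ratio:", hit_ratio(y_true, reversal_forecast), "%")
```

To use this with the script above, replace `y_true` with `y_test` and `previous_change` with the last column of `x_test`, then compare the baseline numbers with the model's.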
Note that the RSI analyzed in this study has been calculated using a simple moving average instead of a smoothed moving average. This distinction matters: the two versions do not behave in exactly the same way, although both remain positively correlated with the underlying security.
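The difference between the two averaging schemes can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the code that produced the article's dataset: the function name `rsi`, the `smoothing` argument, the 14-period default, and the random-walk prices are all hypothetical choices.

```python
import numpy as np

def rsi(prices, period=14, smoothing="simple"):
    """RSI with either a simple moving average or Wilder's smoothed average."""
    if smoothing not in ("simple", "wilder"):
        raise ValueError("smoothing must be 'simple' or 'wilder'")
    prices = np.asarray(prices, dtype=float)
    changes = np.diff(prices)
    gains = np.where(changes > 0, changes, 0.0)
    losses = np.where(changes < 0, -changes, 0.0)

    values = np.full(len(prices), np.nan)   # the first `period` values stay undefined
    avg_gain = gains[:period].mean()         # both variants start from a simple average
    avg_loss = losses[:period].mean()
    for i in range(period, len(prices)):
        if i > period:
            if smoothing == "simple":
                # Plain average over the last `period` changes
                avg_gain = gains[i - period:i].mean()
                avg_loss = losses[i - period:i].mean()
            else:
                # Wilder's recursive smoothing: old average weighted (period - 1) / period
                avg_gain = (avg_gain * (period - 1) + gains[i - 1]) / period
                avg_loss = (avg_loss * (period - 1) + losses[i - 1]) / period
        if avg_loss == 0:
            values[i] = 100.0   # convention used here when there are no losses in the window
        else:
            values[i] = 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return values

# Hypothetical random-walk prices, for illustration only
rng = np.random.default_rng(1)
prices = 100 + np.cumsum(rng.normal(0.0, 1.0, size=200))

rsi_simple = rsi(prices, smoothing="simple")
rsi_wilder = rsi(prices, smoothing="wilder")
print("Last simple RSI:", rsi_simple[-1])
print("Last Wilder RSI:", rsi_wilder[-1])
```

The simple version forgets a price change entirely once it leaves the window, while Wilder's version lets every past change decay gradually, so the simple RSI tends to move more abruptly.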