Ultra-Fast Time Series Forecasting Using Extreme Learning Machines
Creating a Multi-Step Forecasting Machine Learning Model
This article presents the Extreme Learning Machine (ELM), a fast machine learning model. It explains the intuition behind the algorithm and how the model is structured, then builds a recursive multi-step forecasting model that feeds its own predictions back in as inputs.
Multi-step forecasting is a technique for projecting several values into the future, thus creating a trajectory. It uses previous forecasts as inputs to forecast new values, as sketched below. Naturally, errors compound because each new prediction depends on earlier ones, but it remains interesting to see the idea in action.
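In code terms, the recursive scheme looks roughly like this (a minimal sketch assuming a generic one-step model exposing a scikit-learn-style predict method; the full ELM version appears later in the article):

import numpy as np

def recursive_forecast(model, last_window, n_steps):
    # Repeatedly predict one step ahead and feed each prediction
    # back into the input window before predicting the next step.
    window = np.asarray(last_window, dtype=float).copy()
    forecasts = []
    for _ in range(n_steps):
        next_value = model.predict(window.reshape(1, -1))[0]
        forecasts.append(next_value)
        window = np.roll(window, -1)   # drop the oldest observation
        window[-1] = next_value        # append the new prediction
    return np.array(forecasts)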
Intuition of the Model
The extreme learning machine (ELM) is a type of single-layer feedforward neural network (SLFN) designed for fast learning and simplicity. Unlike traditional neural networks, where all weights are adjusted during training via iterative optimization (like backpropagation), ELMs take a radically different approach: only the output weights are trained, while the input weights and biases of the hidden layer are randomly assigned and fixed.
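In equations: given an input matrix X, randomly drawn input weights W, and biases b, the hidden-layer output and the trained output weights are

H = g\left(X W^{\top} + b\right), \qquad \hat{\beta} = H^{+} Y

where g is the activation function, H^{+} is the Moore-Penrose pseudoinverse of H, and Y holds the training targets. Training therefore reduces to a single linear least-squares solve.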
This design leads to extremely fast training times and a surprisingly strong ability to generalize, especially on classification and regression tasks with relatively small or medium-sized datasets.
The biggest strength of the ELM is speed. Because there is no iterative weight updating, training is often orders of magnitude faster than traditional backpropagation-based methods. This makes ELM well suited for real-time learning, online updates, and applications with limited computational resources. Despite their simplicity, ELMs often achieve accuracy comparable to support vector machines (SVMs), multilayer perceptrons (MLPs), and, in some cases, even deep learning models.
However, ELMs come with trade-offs. Since the hidden layer is randomly initialized and never updated, performance can vary from run to run, especially if the number of hidden neurons is not large enough. This randomization can lead to instability or overfitting, particularly in high-dimensional or noisy datasets. Also, because the method relies on computing a matrix pseudoinverse, it can become memory-intensive or numerically unstable if the hidden layer is very large or the dataset is huge. Another limitation is that ELMs lack the internal feature learning that deeper networks provide, making them less suited for tasks like image or language modeling where hierarchical feature extraction is crucial.
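A common mitigation for the pseudoinverse's numerical issues, not used in the code below, is to solve a ridge-regularized least-squares problem for the output weights. A minimal sketch (the alpha value is only illustrative, not a tuned setting):

import numpy as np

def fit_ridge_output_weights(H, y, alpha=1e-3):
    # Solve (H.T @ H + alpha * I) @ beta = H.T @ y instead of pinv(H) @ y.
    # H is the hidden-layer output matrix, y the targets; alpha controls
    # the regularization strength.
    n_hidden = H.shape[1]
    return np.linalg.solve(H.T @ H + alpha * np.eye(n_hidden), H.T @ y)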
The Mackey-Glass system is a classic time-delay differential equation used to model nonlinear, chaotic dynamics. It was originally proposed to simulate physiological processes, like blood production, but it's now widely used as a benchmark in time series prediction and nonlinear systems analysis. We will use it to create a time series.
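In its standard form, the Mackey-Glass equation (with the parameter values used in the code below: beta = 0.2, gamma = 0.1, n = 10, tau = 17) is

\frac{dx}{dt} = \frac{\beta \, x(t - \tau)}{1 + x(t - \tau)^{n}} - \gamma \, x(t)

For delays around tau = 17 the trajectory becomes chaotic, which is what makes it a popular forecasting benchmark.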
Do you want to master Deep Learning techniques tailored for time series, trading, and market analysis🔥? My book breaks it all down from basic machine learning to complex multi-period LSTM forecasting while going through concepts such as fractional differentiation and forecasting thresholds. Get your copy here 📖!
Creating a Multi-Step Forecasting Algorithm
The plan of attack will be as follows:
Generate a synthetic time series (Mackey-Glass).
Define a simple ELM.
Train it on past values to predict the next one.
Plot the prediction.
Use the following code to implement the experiment:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# Mackey-Glass time series
def mackey_glass(length=2500, tau=17, delta_t=1.0, beta=0.2, gamma=0.1, n=10):
    history = [1.2] * (tau + 1)
    for t in range(length):
        xtau = history[-tau - 1]  # delayed value x(t - tau)
        x = history[-1]
        x_dot = beta * xtau / (1 + xtau ** n) - gamma * x
        history.append(x + delta_t * x_dot)
    return np.array(history[tau + 1:])

def sigmoid(x):
    return 1 / (1 + np.exp(-x))
# Extreme Learning Machine class
class ELM:
    def __init__(self, n_inputs, n_hidden, activation=sigmoid):
        self.n_inputs = n_inputs
        self.n_hidden = n_hidden
        self.activation = activation
        # Random, fixed input weights and biases (never trained)
        self.input_weights = np.random.randn(n_hidden, n_inputs)
        self.biases = np.random.randn(n_hidden)
        self.output_weights = None

    def _hidden_layer(self, X):
        # Hidden layer output G = g(X W^T + b)
        G = self.activation(np.dot(X, self.input_weights.T) + self.biases)
        return G

    def fit(self, X, y):
        # Only the output weights are trained, in closed form via the pseudoinverse
        H = self._hidden_layer(X)
        self.output_weights = np.linalg.pinv(H).dot(y)

    def predict(self, X):
        H = self._hidden_layer(X)
        return H.dot(self.output_weights)
# Generate and normalize data
data = mackey_glass()
data = (data - np.min(data)) / (np.max(data) - np.min(data))

# Prepare windowed data
window_size = 100
X = np.array([data[i:i + window_size] for i in range(len(data) - window_size - 1)])
y = np.array([data[i + window_size] for i in range(len(data) - window_size - 1)])

# Keep the chronological order (no shuffling) so the test set is a true future segment
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Train ELM
elm = ELM(n_inputs=window_size, n_hidden=100)
elm.fit(X_train, y_train)
# Multi-step forecast
n_steps = 100                 # how many future steps to predict
window = X_test[0].copy()     # first test window
forecast = []

for _ in range(n_steps):
    pred = elm.predict(window.reshape(1, -1))[0]
    forecast.append(pred)
    window = np.roll(window, -1)  # shift left
    window[-1] = pred             # append prediction

# Ground truth (the actual future values following the first test window)
ground_truth = y_test[:n_steps]

# Plot
plt.figure(figsize=(10, 5))
plt.plot(range(n_steps), ground_truth, label="True Future")
plt.plot(range(n_steps), forecast, label="ELM Forecast")
plt.title("Extreme Learning Machine - Recursive Multi-Step Forecast")
plt.xlabel("Time Steps Ahead")
plt.ylabel("Normalized Value")
plt.legend()
plt.show()
The following is the result of the code:
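Beyond the visual comparison, it can help to quantify how quickly the recursive forecast drifts. A minimal sketch, reusing the forecast and ground_truth arrays from the code above:

# Root-mean-square error of the recursive forecast over the horizon
rmse = np.sqrt(np.mean((np.array(forecast) - ground_truth) ** 2))
print(f"Recursive {n_steps}-step RMSE: {rmse:.4f}")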
In essence, the ELM is a blunt but effective tool. It works remarkably well when speed matters and when the problem fits the SLFN framework. It sidesteps the complexity of training deep networks by relying on randomness and linear algebra, offering a compelling alternative for many practical tasks—so long as you understand its strengths, its blind spots, and its boundaries.
Every week, I analyze positioning, sentiment, and market structure. Curious what hedge funds, retail, and smart money are doing each week? Then join hundreds of readers here in the Weekly Market Sentiment Report 📜 and stay ahead of the game through chart forecasts, sentiment analysis, volatility diagnosis, and seasonality charts.
Free trial available🆓