A Primer to Logistic Regression for Time Series Analysis
Logistic Regression to the Rescue in Simple Time Series Analysis
Logistic regression is one of the simplest and most widely used classification algorithms in machine learning. Despite the name, it’s a classification method—not a regression technique.
If you're trying to predict a binary outcome (yes/no, 0/1, spam/not spam), logistic regression is often where you start.
What Is Logistic Regression?
At its core, logistic regression estimates the probability that a data point belongs to a particular class. It uses a logistic (sigmoid) function to squeeze output values between 0 and 1.
Mathematically, it models the log-odds of the outcome as a linear combination of input features:
log(p / (1 − p)) = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ
Where:
p is the probability of the positive class (e.g., spam = 1).
x’s are the input features.
β’s are the model’s coefficients.
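The log-odds relationship can be sketched in a few lines of code. The coefficient and feature values below are made up purely for illustration:

```python
import math

# Hypothetical coefficients: beta_0 (intercept), beta_1, beta_2
beta = [-3.0, 0.05, 0.0001]
features = [40, 50000]  # e.g., age and income

# The linear combination gives the log-odds
log_odds = beta[0] + beta[1] * features[0] + beta[2] * features[1]

# Applying the sigmoid to the log-odds recovers the probability p
p = 1 / (1 + math.exp(-log_odds))
print(f"log-odds = {log_odds:.2f}, p = {p:.3f}")
```

Fitting a logistic regression model amounts to finding the β values that make these probabilities match the observed labels as closely as possible.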
Typically we use logistic regression when:
The target variable is binary or categorical.
We want a fast, interpretable model.
The data is mostly linearly separable.
Logistic Regression in Python
Let’s classify whether someone buys a product based on age and income using scikit-learn.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
# Sample data
data = pd.DataFrame({
    'age': [22, 25, 47, 52, 46, 56, 55, 60],
    'income': [25000, 32000, 47000, 52000, 46000, 60000, 58000, 62000],
    'bought': [0, 0, 1, 1, 1, 1, 1, 1]
})
X = data[['age', 'income']]
y = data['bought']
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Logistic regression model (max_iter raised because the unscaled income
# values can slow down convergence of the default solver)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
# Predictions
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
You’ll get output with precision, recall, and F1 score: standard metrics for evaluating classifiers. But what do precision, recall, and F1 score mean in layman’s terms? Consider another example with a binary output of spam or not spam.
Precision = How many of the things you flagged were actually correct? “When I say it’s spam, how often am I right?”
Recall = How many of the actual spam emails did you catch? “Of all the real spam, how much did I find?”
F1 Score = A balance between precision and recall. “How good is my model overall, balancing being right and not missing stuff?”
F1 combines precision and recall into a single score (using the harmonic mean). It’s useful when you care about both:
Not missing spam (recall).
Not annoying users by flagging legit emails (precision).
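Sticking with the spam example, all three metrics can be computed from a toy set of labels (the labels below are made up purely for illustration):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# 1 = spam, 0 = not spam
y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # actual labels: 4 real spam emails
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]  # model flags 4 emails, 3 correctly

# Precision: of everything flagged as spam, how much really was spam?
precision = precision_score(y_true, y_pred)  # 3 correct out of 4 flagged

# Recall: of all the real spam, how much did we catch?
recall = recall_score(y_true, y_pred)        # 3 caught out of 4 real spam

# F1: the harmonic mean of precision and recall
f1 = f1_score(y_true, y_pred)

print(precision, recall, f1)
```

Here both precision and recall come out to 0.75, so F1 is 0.75 as well; in general F1 sits between the two and is dragged down by whichever is worse.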
Logistic regression is a go-to algorithm when you need something fast, simple, and interpretable for binary classification. It’s not the most powerful tool in the box, but it’s often good enough and makes a great baseline for comparison.
Use it when:
Your data is clean and not too complex.
You need to explain the model’s decisions.
You want a quick, reasonably accurate model for classification.
Avoid it when:
The relationship between inputs and outputs is non-linear.
You have lots of irrelevant or highly correlated features.
You need top-tier accuracy on complex tasks.
Check out my newsletter that sends weekly directional views every weekend to highlight the important trading opportunities using a mix between sentiment analysis (COT report, put-call ratio, etc.) and rules-based technical analysis.