Spotting Trends in Financial Markets: Decision Trees vs Neural Networks

Magically Trending Markets

Among human traders, there is something mystical about a trending market.  Fundamental forces, often unidentifiable, conspire to move a market week after week in one direction.  It appears completely non-random.  The Düsseldorf housing market has risen 7% per year for 7 years.  TREND!  Books tell you, “The trend is your friend,” so don’t fight it.  Many technical trading systems focus only on entry and exit points for trend following.  In a strongly falling market, traders caution each other not to “catch a falling knife.”

Trends are easy to spot… in hindsight.  But when they begin, they look like momentary trading opportunities.  German retirees who happily cashed in their apartments five years ago are bitter today.  Wouldn’t it be great if we could train a machine to give us a second opinion on what is a trend and what is short-lived volatility?


Trends and Option-trading

Before jumping into our Python code for machine learning, let us consider how trending and volatile markets affect the lowly option trader.  Imagine that you are an option trader, and you have built up a big gamma-long position (i.e. you paid cash for calls or puts from another option trader).  You will need to monetize market volatility to earn back the premium that you paid.  This can be hard work.  You monetize volatility by hedging your options with an offsetting volume of the underlying security.  If you have the right balance, then whichever direction the market moves, your portfolio gains value.  Sounds too easy?  Well, remember that you already took an upfront loss to be in this advantageous position.

See how selling the right amount of underlying security effectively rotates the value curve into an ever-positive range? This curve does not include the initial payout to buy the option.

If the market moves up, you gain value – secure it by selling more of the underlying security.  If the market moves down, you again gain value – secure it by buying more of the underlying.
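The claim that a hedged long option gains value in either direction can be checked numerically.  The following is only a sketch under stated assumptions: a Black-Scholes call with hypothetical parameters (spot 100, strike 100, 20% volatility, three months to expiry, zero interest rate), none of which come from my trading book.  It ignores time decay and the premium paid.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def call_price_and_delta(spot, strike, vol, t, rate=0.0):
    """Black-Scholes price and delta of a European call."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol ** 2) * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    price = spot * norm_cdf(d1) - strike * exp(-rate * t) * norm_cdf(d2)
    return price, norm_cdf(d1)


# Hypothetical position: long one call, short `delta0` units of the underlying.
spot0, strike, vol, t = 100.0, 100.0, 0.20, 0.25
price0, delta0 = call_price_and_delta(spot0, strike, vol, t)


def hedged_pnl(spot1):
    """Instant P&L of the hedged position if the spot jumps to spot1."""
    price1, _ = call_price_and_delta(spot1, strike, vol, t)
    option_gain = price1 - price0
    hedge_gain = -delta0 * (spot1 - spot0)
    return option_gain + hedge_gain


for move in (-5.0, -1.0, 1.0, 5.0):
    print(f"spot {spot0 + move:6.1f}: hedged P&L = {hedged_pnl(spot0 + move):.4f}")
```

Every printed P&L is positive, and the larger moves pay more than the smaller ones.  That is the convexity you paid the premium for.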

But here’s the trick: if the market goes up today and you think it will keep going up tomorrow, you can cheat a little and wait for tomorrow to do your selling.  Or the next day at an even higher price.  Then you would be making a lot more money.  Successfully spotting a trend and being patient would be a lot easier than your usual nickel-and-dime game.  On the other hand, if you believe the market will keep chopping up and down like small waves, you need to actively monetize the volatility at every chance.
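A toy calculation makes the difference between patience and nickel-and-diming concrete.  This sketch assumes a constant gamma and approximates the hedged P&L between two re-hedges as 0.5 × gamma × (price move)².  The gamma value and both price paths are hypothetical, and time decay is ignored.

```python
GAMMA = 0.04  # hypothetical, assumed constant over the whole path


def gamma_pnl(moves, rehedge_every):
    """Total P&L when the hedge is reset every `rehedge_every` steps.

    Between two re-hedges the position earns 0.5 * GAMMA * (net move)^2.
    """
    pnl = 0.0
    move_since_hedge = 0.0
    for step, move in enumerate(moves, start=1):
        move_since_hedge += move
        if step % rehedge_every == 0:
            pnl += 0.5 * GAMMA * move_since_hedge ** 2
            move_since_hedge = 0.0
    # Mark any move that is still unhedged at the end of the path.
    pnl += 0.5 * GAMMA * move_since_hedge ** 2
    return pnl


trend = [1.0, 1.0, 1.0, 1.0]    # four up-days in a row
chop = [1.0, -1.0, 1.0, -1.0]   # up and down like small waves

for name, path in (("trend", trend), ("chop", chop)):
    daily = gamma_pnl(path, rehedge_every=1)
    patient = gamma_pnl(path, rehedge_every=4)
    print(f"{name}: daily re-hedge = {daily:.2f}, wait four days = {patient:.2f}")
```

On the trending path, waiting earns 0.32 against 0.08 for daily re-hedging.  On the choppy path, daily re-hedging still earns 0.08 while waiting earns nothing, because the market ends where it started.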

To repeat: a trending market and a volatile market are best monetized by the option trader in completely different ways.  It would be advantageous for her to know which market she’s dealing with at any moment.


A Use-case for Machine Learning

Thus, what we’re after is a systematic second opinion for which type of market we’re dealing with.  Here are our 4 market types:

Some machine-learning tools are trained to recognize a stop-sign.  Ours will recognize a trend, a lack of trend, up and down volatility, and lack of volatility.  For a data set I selected the Brent Oil Fund ETF “BNO” that mimics the returns of holding and rolling front month futures contracts.  I have wrestled with this security as an option trader for my personal account.  Here are the daily closes for the last four years:

When you look at this chart, you can see periods that very much resemble our four distinct market states, and there are some periods that are not so clear.  I next did something very manual that would make a data scientist scream: I used Excel, and I visually divided this chart into my four market types.

No day was left unassigned, which meant I had to make some debatable classifications.  That’s okay: markets are messy.  Now I have “correct” market classifications that I can use to train and test some ML (that is, Machine Learning) classifiers.

And this is as good a time as any to briefly digress into a few ML concepts.  ML is about training a model to give an answer (a prediction or a description) about a system that is itself too complex to fully model in a deterministic way.  Perhaps we lack sufficient understanding of the system, or we lack the computing power that we need, or we are just lazy.  If I am forecasting the probability of mass riots in Pittsburgh next week, it would be great if I could model every actor in that system and their psychological and microeconomic state, but that is an impossible calculation.  (Or is it, Facebook?)  It is much easier to pick out the variables, or “features”, that seem to carry the greatest predictive value.  A heat-wave.  A quick decline in purchasing power.  An unwelcome judicial ruling.  A sports championship victory.  These have predictive value for riots.  ML finds the tidiest combination of these predictors.

Next: supervised ML comes in two flavors, classifiers and regressors.  Classifiers assign new instances to pre-defined groups based on some features.  Is it red and octagonal?  Then it’s a stop-sign.  Regressors, on the other hand, give a continuous-valued answer.  Between 0 and 1, how likely are we to see riots in Pittsburgh next week?  Answer: 0.07471.

As you will have gathered, we’re building market classifiers in the next few paragraphs, not regressors.  For a dataset, you need a set of features and an output (a named group) for each datapoint.  For my Brent Crude Oil ETF dataset, I created the output vector myself by hand.  This is the market state.  I created the features from the price data itself, not any external fundamentals.  I didn’t import any data about supply and demand from the EIA or OPEC.  I just made some statistics from the existing price data that you see above.  It is important not to use any features that will obviously lead to overfitting.  If I have a huge price spike on a Thursday, and I use day-of-the-week as a feature, my ML classifier may greedily choose Thursday as price-spike day.  There is nothing magical or explanatory about a Thursday – the price spike just happened to occur on a Thursday.  Overfitting leads to over-optimism and future disappointment.  Hence, I didn’t use any date-related features or features related to absolute price levels.  Rather, I took features such as the 20-day and 50-day standard deviations of day-on-day relative returns, or the number of up-days in the last 20 trading days, or the ratio of the 20-day moving average to the 50-day moving average.  These types of features seemed more universally applicable.
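The price-only features named above can be sketched with pandas rolling windows.  This is an illustration rather than my exact feature set: the function name, the column names and the synthetic random-walk price series are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def build_features(close: pd.Series) -> pd.DataFrame:
    """Level-free features computed from daily closing prices only."""
    returns = close.pct_change()  # day-on-day relative returns
    ma20 = close.rolling(20).mean()
    ma50 = close.rolling(50).mean()
    features = pd.DataFrame({
        "ret_std_20": returns.rolling(20).std(),
        "ret_std_50": returns.rolling(50).std(),
        "up_days_20": (returns > 0).astype(int).rolling(20).sum(),
        "ma20_over_ma50": ma20 / ma50,
    })
    # Drop the warm-up rows where the longest window is not yet full.
    return features.dropna()


# Synthetic closes (a geometric random walk), standing in for real prices.
rng = np.random.default_rng(0)
close = pd.Series(50.0 * np.exp(np.cumsum(rng.normal(0.0, 0.02, 300))))

features = build_features(close)
print(features.tail())
```

None of these columns depends on the calendar date or on the absolute price level, which is the property I was after.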

With my outputs and features in hand I went about building two types of classifiers: a decision tree and a neural network.  Decision trees are great because you can see how data is classified based on individual features, applied one at a time.  It’s like watching pachinko balls fall to their destination.  The logic is open for all to discuss.  The downside is that the classification is a rather simple sequence of logic gates.  Below is an example tree depicting survivorship on the Titanic.  The numbers depict survivorship and % of data ending in that “leaf.”  (Note: “sibsp” = number of siblings + spouse.)
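The same readability is available in text form.  As a small sketch, scikit-learn's export_text prints the learned rules of a fitted tree.  The toy data, the two feature names and the labelling rule here are hypothetical and only serve to show the output format.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: two features in [0, 1] and a made-up rule on the first one.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = np.where(X[:, 0] > 0.5, "Volatile", "Calm")

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Print the tree as nested if/else rules on the named features.
rules = export_text(tree, feature_names=["ret_std_20", "ma20_over_ma50"])
print(rules)
```

Because the toy labels depend on a single threshold, the tree needs only one split to describe them, and the printed rules say so directly.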

For capturing the subtle combinations of features when clear-cut rules seem impossible to hardcode, we turn to neural networks with hidden layers.

Unfortunately, with a complex topology and an activation function that blends inputs, the neural network becomes a black box.  Much like human traders, you can’t open them up to perform an autopsy every time something goes wrong.



I implemented the decision tree using the scikit-learn library’s decision tree classifier for Python.  The neural network was implemented with the Keras library and uses a TensorFlow backend.  Both setups employed a pre-shuffled 5-fold cross-validation of my Brent Oil data set.  This means that my time series of target and feature data was first reordered randomly, then split into five equally large groups.  The classifiers were trained on 4/5 of the data and tested on the remaining 1/5 of “new” data.  This was repeated 5 times, until each slice had served as the test set.  The mean accuracy across the five test sets was reported.  You can find the code copied at the end of this article.
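To make the shuffled 5-fold split concrete, here is a minimal sketch on 20 dummy rows (the row count is an assumption chosen for readability).  It prints which rows land in each test fold and counts how often each row is tested.

```python
import numpy as np
from sklearn.model_selection import KFold

n_samples = 20  # dummy row count, for illustration only
X = np.arange(n_samples).reshape(-1, 1)

kfold = KFold(n_splits=5, shuffle=True, random_state=7)

# Count how many times each row is used for testing.
test_counts = np.zeros(n_samples, dtype=int)
fold_sizes = []
for fold, (train_idx, test_idx) in enumerate(kfold.split(X), start=1):
    test_counts[test_idx] += 1
    fold_sizes.append((len(train_idx), len(test_idx)))
    print(f"fold {fold}: train on {len(train_idx)} rows, "
          f"test on rows {sorted(test_idx.tolist())}")

print("every row tested exactly once:", bool((test_counts == 1).all()))
```

Each fold trains on 16 rows and tests on the other 4, and over the five folds every row is tested exactly once.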



In both the decision tree and the neural network, each fold therefore trains the classifier on 80% of the data and tests its accuracy on the remaining 20%.  With the decision tree I varied the maximum tree depth.  A tree with 3 levels is a lot simpler to understand and less prone to overfitting than a tree of 15 levels.
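A depth sweep of this kind can be sketched as a loop over max_depth.  The data here is synthetic (make_classification with 13 features and 4 classes, standing in for my market file) and the list of depths is an assumption, so the printed accuracies say nothing about the Brent results.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the market data: 13 features, 4 classes.
X, y = make_classification(n_samples=400, n_features=13, n_informative=6,
                           n_classes=4, random_state=7)

kfold = KFold(n_splits=5, shuffle=True, random_state=7)

accuracy_by_depth = {}
for depth in (1, 3, 5, 10, 15):
    model = DecisionTreeClassifier(criterion="entropy", max_depth=depth,
                                   random_state=7)
    scores = cross_val_score(model, X, y, cv=kfold)
    accuracy_by_depth[depth] = scores.mean()
    print(f"max_depth={depth:2d}: {scores.mean() * 100:.2f}% "
          f"(+/- {scores.std() * 100:.2f}%)")
```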

Here we can see that our decision tree accuracy tops out in the low 90s (percent) with the feature set I provided.  It’s important to note that about 40% of the test data are trending days and 40% are volatile days.  The remaining 20% are either trending and volatile or just dead calm.  This means that if I had a classifier that only gave Trending as an answer, it would already be 40% accurate.  Let’s call this the Lazy Classifier.  We must significantly outperform the Lazy Classifier.  Happily, with only 3 levels in my tree, I can achieve over 70% accuracy, and the structure is still quite simple, even readable.
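The Lazy Classifier has a ready-made counterpart in scikit-learn, DummyClassifier, which always answers with the most frequent class.  In this sketch the label names and the 10%/10% split of the remaining days are assumptions; only the 40%/40% shares come from the text above.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# 100 hypothetical days: 40% trending, 40% volatile, the rest split evenly.
labels = np.array(["Trending"] * 40 + ["Volatile"] * 40
                  + ["Trending+Volatile"] * 10 + ["Calm"] * 10)
X = np.zeros((len(labels), 1))  # features are ignored by the dummy model

lazy = DummyClassifier(strategy="most_frequent")
lazy.fit(X, labels)

lazy_accuracy = lazy.score(X, labels)
print(f"Lazy Classifier accuracy: {lazy_accuracy * 100:.0f}%")
```

Any real classifier has to beat this 40% floor before its accuracy means anything.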

With the neural network, I varied two complexity parameters – the number of hidden layers and the neurons per hidden layer – and observed the following accuracy for classifying the testing set.
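The two-parameter sweep can be sketched as a nested loop.  To keep the example light, this swaps in scikit-learn's MLPClassifier for the Keras network I actually used, and runs on synthetic data.  The grid values, the feature scaling step and the iteration limit are assumptions for the example.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the market data: 13 features, 4 classes.
X, y = make_classification(n_samples=300, n_features=13, n_informative=6,
                           n_classes=4, random_state=7)
kfold = KFold(n_splits=5, shuffle=True, random_state=7)

# Small networks may stop before full convergence; silence that warning.
warnings.simplefilter("ignore", ConvergenceWarning)

accuracy_by_topology = {}
for n_layers in (1, 2, 4):
    for n_neurons in (8, 16):
        net = make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(n_neurons,) * n_layers,
                          activation="tanh", max_iter=300, random_state=7),
        )
        scores = cross_val_score(net, X, y, cv=kfold)
        accuracy_by_topology[(n_layers, n_neurons)] = scores.mean()
        print(f"{n_layers} hidden layer(s) x {n_neurons} neurons: "
              f"{scores.mean() * 100:.2f}%")
```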

Interestingly, even with 4 hidden layers and 16 neurons per layer, we do not yet approach the top accuracy of the decision tree classifier.  I find this interesting because the ability to combine feature inputs simultaneously (vs sequential application of individual features in decision trees) does not seem to earn us any gain in accuracy.  In principle, with enough hidden layers and neurons we should be able to match the accuracy of the overfit decision tree, because the neural network should be able to at least reproduce the logical flow of the decision tree.  In practice, I have even increased the hidden layers to 10 and the neurons per layer to 32, but with no improvement.  I also varied the number of “epochs” from low to high (forcing underfitting and overfitting with the given topology) but still without matching the decision tree’s accuracy.  Apparently, with this specific problem and these features, the decision tree classifier is much faster and moderately more accurate.  In the future, I would like to test a wider sample of commodity markets.

Thanks for reading.  If you are interested in articles on energy portfolio management, you can follow me on LinkedIn and visit my company website.  I also post articles related to optimization and options with Python, and I organize courses for all professionals (from new hires to experienced executives) involved in making strategic commercial decisions.

Decision Tree Python Implementation

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score, KFold

# fix random seed
seed = 7

marketData = pd.read_csv("market-state_noLevels.csv")
marketData = marketData.dropna()

features = marketData[marketData.columns[4:]]
feature_names = list(features)
targetVariable = marketData.State

model = DecisionTreeClassifier(criterion="entropy", max_depth=3)
# use scikit-learn's k-fold cross validation for model evaluation
kfold = KFold(n_splits=5, shuffle=True, random_state=seed)
results = cross_val_score(model, features, targetVariable, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))


Neural Network Python Implementation

# Note: keras.wrappers.scikit_learn and keras.utils.np_utils are only available in older Keras releases
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from sklearn.model_selection import cross_val_score, KFold
from sklearn.preprocessing import LabelEncoder

# fix random seed
seed = 7

#load dataset
marketData = pd.read_csv("market-state_noLevels.csv")
marketData = marketData.dropna()

X = marketData[marketData.columns[4:]]
feature_names = list(X)
Y = marketData.State

# encode class values as integers (the encoder must be fitted before it can transform)
encoder = LabelEncoder()
encoded_Y = encoder.fit_transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)

# network topology:
# 13 inputs -> [8 hidden nodes] -> [8 hidden nodes] -> 4 outputs (softmax activation: outputs in [0,1], highest output is the answer)

# define baseline model
def baseline_model():
    model = Sequential()
    model.add(Dense(8, input_dim=13, activation='tanh'))
    model.add(Dense(8, activation='tanh'))
    model.add(Dense(4, activation='softmax'))
    # Compile Model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)

# use scikit’s k-fold cross validation for model evaluation
kfold = KFold(n_splits=5, shuffle=True, random_state=seed)

results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))