Magically Trending Markets

Among human traders, there is something mystical about a trending market.  Fundamental forces, often unidentifiable, conspire to move a market week after week in one direction.  It appears completely non-random.  The Düsseldorf housing market has risen 7% per year for 7 years.  TREND!  Books tell you, “The trend is your friend,” so don’t fight it.  Many technical trading systems focus only on entry and exit points for trend following.  In a strongly falling market, traders caution each other not to “catch a falling knife.”

Trends are easy to spot… in hindsight.  But when they begin, they look like momentary trading opportunities.  German retirees who happily cashed in their apartments five years ago are bitter today.  Wouldn’t it be great if we could train a machine to give us a second opinion on what is a trend and what is short-lived volatility?

Trends and Option-trading

Before jumping into our Python code for machine learning, let us consider how trending and volatile markets affect the lowly option trader.  Imagine that you are an option trader, and you have built up a big gamma-long position (i.e. you paid cash for calls or puts from another option trader).  You will need to monetize market volatility to earn back the premium that you paid.  This can be hard work.  You monetize volatility by hedging your options with an offsetting volume of the underlying security.  If you have the right balance, then whatever direction the market moves, your portfolio gains value.  Sounds too easy?  Well, remember that you already took an upfront loss to be in this advantageous position.

See how selling the right amount of underlying security effectively rotates the value curve into an ever-positive range? This curve does not include the initial payout to buy the option.

If the market moves up, you gain value – secure it by selling more of the underlying security.  If the market moves down, you again gain value – secure it by buying more of the underlying.
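
To make that concrete, here is a minimal sketch, assuming a Black-Scholes call with made-up strike, volatility, rate and spot moves (none of these numbers come from the article): price the call, short the initial delta against it, and look at the hedged portfolio after a move in either direction.

import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price and delta of a European call (illustrative inputs only)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    price = S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
    return price, norm_cdf(d1)

S0, K, T, r, sigma = 100.0, 100.0, 0.25, 0.0, 0.30        # made-up market parameters
c0, delta0 = bs_call(S0, K, T, r, sigma)

# Long one call, short delta0 units of the underlying: the "gamma-long, delta-neutral" book.
for S1 in (95.0, 100.0, 105.0):
    c1, _ = bs_call(S1, K, T, r, sigma)                   # time decay ignored for clarity
    pnl = (c1 - c0) - delta0 * (S1 - S0)                  # option P&L minus hedge P&L
    print(f"spot moves to {S1:6.1f}: hedged P&L {pnl:+.2f}")

The hedged P&L comes out positive for both the up-move and the down-move; re-hedging at the new spot locks it in.  As with the curve above, this ignores the premium paid up front.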

But here’s the trick: if the market goes up today and you think it will keep going up tomorrow, you can cheat a little and wait for tomorrow to do your selling.  Or the next day at an even higher price.  Then you would be making a lot more money.  Successfully spotting a trend and being patient would be a lot easier than your usual nickel-and-dime game.  On the other hand, if you believe the market will keep chopping up and down like small waves, you need to actively monetize the volatility at every chance.

To repeat: a trending market and a volatile market are best monetized by the option trader in completely different ways.  It would be advantageous for her to know which market she’s dealing with at any moment.

A Use-case for Machine Learning

Thus, what we’re after is a systematic second opinion for which type of market we’re dealing with.  Here are our 4 market types:

  1. Trending: a sustained move in one direction with little noise
  2. Volatile: choppy up-and-down moves with no clear direction
  3. Trending and volatile: a directional move with large swings along the way
  4. Calm: neither a clear trend nor meaningful volatility

Some machine-learning tools are trained to recognize a stop-sign.  Ours will recognize a trend, a lack of trend, up and down volatility, and lack of volatility.  For a data set I selected the Brent Oil Fund ETF “BNO” that mimics the returns of holding and rolling front month futures contracts.  I have wrestled with this security as an option trader for my personal account.  Here are the daily closes for the last four years:

When you look at this chart, you can see periods that very much resemble our four distinct market states, and there are some periods that are not so clear.  I next did something very manual that would make a data scientist scream: I used Excel, and I visually divided this chart into my four market types.

No day was left unassigned, which meant I had to make some debatable classifications.  That’s okay: markets are messy.  Now I have “correct” market classifications that I can use to train and test some ML (that is, Machine Learning) classifiers.

And this is as good a time as any to briefly digress into a few ML concepts.  ML is about training a model to give an answer (a prediction or a description) about a system that is itself too complex to fully model in a deterministic way.  Perhaps we lack sufficient understanding of the system, or we lack the computing power that we need, or we are just lazy.  If I am forecasting the probability of mass riots in Pittsburgh next week, it would be great if I could model every actor in that system and their psychological and microeconomic state, but that is an impossible calculation.  (Or is it, Facebook?)  It is much easier to pick out the variables, or “features”, that seem to carry the greatest predictive value.  A heat-wave.  A quick decline in purchasing power.  An unwelcome judicial ruling.  A sports championship victory.  These have predictive value for riots.  ML finds the tidiest combination of these predictors.

ML comes in two flavors: classifiers and regressors.  Classifiers assign new instances to pre-defined groups based on some features.  Is it red and octagonal?  Then it’s a stop-sign.  Regressors, on the other hand, give a continuous-value answer.  Between 0 and 1, how likely are we to see riots in Pittsburgh next week?  Answer: 0.07471.

As you have probably gathered, we’re building market classifiers in the next few paragraphs, not regressors.  For a dataset, you need an output (a named group) for each datapoint and a set of features.  For my Brent Crude Oil ETF dataset, I created the output vector myself by hand.  This is the market state.  I created the features from the price data itself, not any external fundamentals.  I didn’t import any data about supply and demand from the EIA or OPEC.  I just made some statistics from the existing price data that you see above.

It is important not to use any features that will obviously lead to overfitting.  If I have a huge price spike on a Thursday, and I use day-of-the-week as a feature, my ML classifier may greedily choose Thursday as price-spike day.  There is nothing magical or explanatory about a Thursday – the price spike just happened to occur on a Thursday.  Overfitting leads to over-optimism and future disappointment.  Hence, I didn’t use any date-related features or features related to absolute price levels.  Rather, I took features such as the 20-day and 50-day standard deviations of day-on-day relative returns, the number of up-days in the last 20 trading days, and the ratio of the 20-day moving average to the 50-day moving average.  These types of features seemed more universally applicable.
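
For illustration, here is a minimal pandas sketch of that kind of feature construction.  The file name and column name are hypothetical; the statistics mirror the ones just described.

import pandas as pd

# Daily closes: one "close" column indexed by date (the file name is hypothetical).
px = pd.read_csv("bno_daily_closes.csv", index_col=0, parse_dates=True)["close"]

ret = px.pct_change()                                      # day-on-day relative returns
features = pd.DataFrame({
    "std_20": ret.rolling(20).std(),                       # 20-day standard deviation of returns
    "std_50": ret.rolling(50).std(),                       # 50-day standard deviation of returns
    "updays_20": (ret > 0).astype(int).rolling(20).sum(),  # up-days in the last 20 sessions
    "ma20_over_ma50": px.rolling(20).mean() / px.rolling(50).mean(),  # moving-average ratio
}).dropna()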

With my outputs and features in hand I went about building two types of classifiers: a decision tree and a neural network.  Decision trees are great because you can see how data is classified based on individual features, applied one at a time.  It’s like watching pachinko balls fall to their destination.  The logic is open for all to discuss.  The downside is that the classification is a rather simple sequence of logic gates.  Below is an example tree depicting survivorship on the Titanic.  The numbers in each node show the survival rate and the % of the data ending in that “leaf.”  (Note: “sibsp” = number of siblings and spouses aboard.)

For capturing the subtle combinations of features when clear-cut rules seem impossible to hardcode, we turn to neural networks with hidden layers.

Unfortunately, with a complex topology and an activation function that blends inputs, the neural network becomes a black box.  Much like human traders, you can’t open them up to perform an autopsy every time something goes wrong.

Implementation

I implemented the decision tree using the scikit-learn library’s decision tree classifier for Python.  The neural network was implemented with the Keras library on a TensorFlow backend.  Both setups employed a pre-shuffled 5-fold cross-validation of my Brent Oil data set.  This means that my time-series of target and feature data was first reordered randomly and then split into five equally large groups.  The classifiers were trained on 4/5 of the data and tested on the remaining 1/5 of “new” data.  This was repeated 5 times, so that each slice had a turn as the test set.  A cumulative accuracy % was reported.  You can find the code copied at the end of this article.
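
Here is a minimal sketch of that cross-validation setup with scikit-learn.  It assumes the features DataFrame from the earlier sketch and a hand-labelled market-state vector labels (a hypothetical variable standing in for my Excel classifications); the hyper-parameter values are illustrative.

from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X = features.values               # statistics built from the price series above
y = labels.values                 # hand-made market-state labels (hypothetical variable)

for depth in (3, 5, 10, 15):      # maximum tree depth, varied in the Results section below
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    cv = KFold(n_splits=5, shuffle=True, random_state=0)   # pre-shuffled 5-fold split
    scores = cross_val_score(clf, X, y, cv=cv)
    print(f"max_depth={depth}: mean accuracy {scores.mean():.1%}")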

Results

In both the Decision Tree and the Neural Network, I trained the classifiers on 80% of the data and tested the accuracy on the remaining 20%.  With the decision tree I varied the maximum tree depth.  A tree with 3 levels is a lot simpler to understand, and less overfit, than a tree with 15 levels.

Here we can see that our decision tree accuracy tops out in the low 90-percent range with the feature set I provided.  It’s important to note that about 40% of the test data are trending days and 40% are volatile days.  The remaining 20% are either trending and volatile or just dead calm.  This means that if I had a classifier that only ever answered Trending, it would already be 40% accurate.  Let’s call this the Lazy Classifier.  We must significantly outperform the Lazy Classifier.  Happily, with only 3 levels in my tree, I can achieve over 70% accuracy, and the structure is still quite simple, even readable.
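
That readability is easy to check for yourself: scikit-learn can print a fitted tree as plain-text rules.  A quick sketch, reusing the variables from the earlier snippets:

from sklearn.tree import DecisionTreeClassifier, export_text

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(features.columns)))   # human-readable split rules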

With the neural network, I varied two complexity parameters – the number of hidden layers and the neurons per hidden layer – and observed the following accuracy for classifying the testing set.
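
A minimal sketch of that sweep with Keras, reusing X and y from above.  The epoch count, activations and the specific layer/neuron grid are illustrative assumptions, not the article’s exact settings.

from tensorflow import keras
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

y_enc = LabelEncoder().fit_transform(y)        # map the four market states to integers 0..3
X_train, X_test, y_train, y_test = train_test_split(X, y_enc, test_size=0.2, random_state=0)

def build_net(hidden_layers, neurons):
    """A simple dense classifier with a configurable number of hidden layers."""
    model = keras.Sequential()
    model.add(keras.layers.Input(shape=(X.shape[1],)))
    for _ in range(hidden_layers):
        model.add(keras.layers.Dense(neurons, activation="relu"))
    model.add(keras.layers.Dense(4, activation="softmax"))    # one output per market state
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

for hidden_layers in (1, 2, 4):
    for neurons in (4, 8, 16):
        model = build_net(hidden_layers, neurons)
        model.fit(X_train, y_train, epochs=200, verbose=0)
        _, acc = model.evaluate(X_test, y_test, verbose=0)
        print(f"{hidden_layers} hidden layer(s) x {neurons} neurons: accuracy {acc:.1%}")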

Interestingly, even with 4 hidden layers and 16 neurons per layer, we do not approach the top accuracy of the decision tree classifier.  I find this interesting because the ability to combine feature inputs simultaneously (versus the sequential application of individual features in a decision tree) does not seem to earn us any gain in accuracy.  With enough hidden layers and neurons, however, we should be able to match the accuracy of the overfit decision tree, because the neural network can at least reproduce the logical flow of the decision tree.  I even increased the hidden layers to 10 and the neurons per layer to 32, but with no improvement.  I also varied the number of training epochs from low to high (forcing underfitting and overfitting with a given topology) but still without matching the decision tree’s accuracy.  Apparently, with this specific problem and these features, the decision tree classifier is much faster to train and moderately more accurate.  In the future, I would like to test a wider sample of commodity markets.

Thanks for reading.  If you are interested in articles on energy portfolio management, you can follow us here or on LinkedIn. 

Level: Intermediate.  Time: 15 min.

Starting up

A thermal power station — a gas or a coal burner — is more than a giant cash machine.  It is a giant machine, a real option.  You run it when profitable and switch it off when unprofitable.  And as you would imagine, several hundred million euros worth of power plant is more complicated than a light switch.  Oil burners must preheat the main boilers to a sufficient temperature.  Coal must be pulverized.  In some cases, gas capacity must be scheduled.  This takes time and a lot of money.  Starting up a station is a big commitment.

To help utilities make the most profitable startup and shut-down decisions, we rely on optimization algorithms to give us a pretty good schedule for how to run our assets with a given price forecast.

The Slow Approach

If we merely wanted to optimize our station’s run-schedule with the current prices as fixed inputs, we could allow ourselves the luxury of solving a slow, bulky non-linear optimization problem.  This model could include all the bells and whistles: nonlinear fuel efficiency curves as a function of output and temperature; differentiation between cold starts and warm starts; nonlinear jumps of output between zero production and Pmin (technical limit for nonzero minimum production).  Such a complete problem definition would give a very satisfactory dispatch schedule with our fixed price inputs.

But forward prices are annoying: they change constantly.  That’s why in the utility trading business we need to be able to update our asset valuation and running schedule at any time.  Knowing the full option value (and hedge-able option delta) of our real option requires knowing what a perfect dispatcher would do in all price scenarios.  Well, all is a lot.  If we believe that just 1000 or 5000 randomly generated scenarios is a sufficiently good sample, we could still use the slow nonlinear optimization model.  But this would require an overnight run and probably even distributing the problem in parallel across multiple computers.

An alternative is to find a clever way to convert the nonlinear problem into a simpler, faster linear problem that can run thousands of scenarios on a single workstation in trade-able time.

The Traveling Salesman Approach

The search for linearity brought me to the famous Traveling Salesman Problem (TSP), the problem of finding the best route through all the connected points on a map.


‘Best’ typically means fastest, shortest, or cheapest.  (One could also find the costliest route if one so desired.)  A variation of this problem is finding the shortest route from A to Z, without the requirement to hit every node.


The innovative approach that I want to share with you is how to use the geographic optimization of the Traveling Salesman Problem on the “state-space” of our power station.

First, a description of our state-space.  For every time t, we’ll say that our power station has 3 possible generation states: Pzero (i.e. “off”), Pmin, and Pmax.  These nodes connect via forward arcs to the Pzero, Pmin, and Pmax state nodes of t+1.  Moving from, say, Pmin to Pmax assumes a linear change in production over dt.

[Figure: the power station’s state-space drawn as a traveling-salesman network]

Every arc between two nodes has a well-defined efficiency and profit formula.  Hourly commodity prices are plugged in, giving every arc a simple profit (or loss) pre-calculated before the optimization solver runs.  We’ll call this step “mapping.”  Every scenario will require a new mapping as each time step will have adjusted commodity prices.  Arcs connecting Pzero at t to Pmin at t+1 contain the full startup costs.
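
Here is a minimal sketch of that mapping step.  Every plant number below (state levels in MW, efficiency, variable cost, start cost) and every price is a placeholder rather than real data; the point is only the shape of the pre-calculation.

import itertools

STATES = {"Pzero": 0.0, "Pmin": 150.0, "Pmax": 400.0}    # generation states in MW (placeholders)
EFFICIENCY = 0.50                                        # thermal efficiency (placeholder)
VOM = 2.0                                                # variable O&M in EUR/MWh (placeholder)
START_COST = 20000.0                                     # cost per start in EUR (placeholder)

def arc_profit(state_from, state_to, power_price, fuel_price, dt=1.0):
    """Profit in EUR of the arc from state_from at t to state_to at t+1 (dt hours apart).
    Production is assumed to change linearly over the step, so the average load is used."""
    avg_mw = 0.5 * (STATES[state_from] + STATES[state_to])
    energy = avg_mw * dt                                           # MWh produced on this arc
    profit = energy * (power_price - fuel_price / EFFICIENCY - VOM)
    if state_from == "Pzero" and state_to != "Pzero":
        profit -= START_COST                                       # startup arcs carry the full start cost
    return profit

# The "mapping" step: one pre-calculated profit per arc per time step, redone for every price scenario.
power = [55.0, 30.0, 80.0]        # hourly power prices for one scenario, EUR/MWh (placeholders)
fuel = [20.0, 20.0, 20.0]         # fuel prices, EUR/MWh thermal (placeholders)
arcs = {}
for t in range(len(power)):
    for s_from, s_to in itertools.product(STATES, STATES):
        arcs[(t, s_from, s_to)] = arc_profit(s_from, s_to, power[t], fuel[t])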

The magic of the Traveling Salesman Problem is that we do not need to specify an integer number (i.e. 1) of traveling salesmen to travel through our network.  This is fortunate for the salesman — the optimizer doesn’t chop him into pieces and send his parts down alternate paths like internet packets.  The whole salesman will flow down the best route.  Let me clarify where this comes from: the constraint matrix of a network-flow problem like ours is totally unimodular, so as long as the node balances and arc bounds are whole numbers (here they are 0 and 1), the linear program has an integral optimal solution, whatever the arc profits happen to be.  This is a big advantage because what typically requires binary state variables within a mixed integer problem can be solved as a linear program (LP) with very efficient algorithms.

With our TSP setup, we can apply other useful constraints on our thermal power station and maintain the linearity of the problem.  We can limit the number of starts simply by constraining the maximum times the startup arc in our solution is used.  This would be useful if our plant had a contractual service clause with GE or Siemens for a maximum number of starts per year.  Likewise, we can constrain the maximum or minimum cumulative production in our period by limiting a weighted sum of the solution arcs chosen.  This would be useful if we had volumetric fuel constraints.  Moreover, if we need our power station to supply upward flexibility to the grid for a known sub-period we can constrain the plant to run at Pmin for this time.

Implementation

I implemented the above-mentioned power station model in python on one workstation.  My code has four functional parts:

  1. Scenario generation of correlated random forward commodity prices (a sketch of this step follows the list)
  2. Extension of the scalar forward prices to a vector of hourly spot prices according to our own model
  3. Mapping of the station’s state-space network using the resulting hourly price vectors
  4. Solving the linear program with the CPLEX library for Python
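
A minimal sketch of step 1 above: correlated lognormal forward price scenarios built with a Cholesky factorisation.  The forward levels, volatilities and correlation matrix are placeholders, not the model actually used.

import numpy as np

rng = np.random.default_rng(0)

forwards = np.array([60.0, 20.0, 25.0])     # e.g. power, fuel, carbon forwards (placeholder levels)
vols = np.array([0.30, 0.40, 0.35])         # annualised lognormal volatilities (placeholders)
corr = np.array([[1.0, 0.7, 0.5],
                 [0.7, 1.0, 0.4],
                 [0.5, 0.4, 1.0]])           # placeholder correlation matrix
T = 1.0                                      # scenario horizon in years
n_scenarios = 1000

L = np.linalg.cholesky(corr)                 # turns independent normals into correlated ones
z = rng.standard_normal((n_scenarios, 3)) @ L.T
scenarios = forwards * np.exp(-0.5 * vols**2 * T + vols * np.sqrt(T) * z)   # one row per scenario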

The linear program must be given to the CPLEX solver in the following form:

Objective:
Minimize  sum over all arcs i of [ -Profit(i) * X(i) ]

Constraints:
sum of incoming X(i) - sum of outgoing X(i) = 0, for each node except the starting and ending nodes (what goes into a node must come out)
0 <= X(i) <= 1, for every arc (each arc can carry at most 1 whole traveling salesman)
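
The article hands this problem to the CPLEX Python API; to keep the sketch self-contained I use scipy.optimize.linprog as a stand-in solver instead (this is not the author’s CPLEX code).  It reuses the arcs dictionary from the mapping sketch, assumes the station starts the period offline, and adds zero-profit exit arcs so that the “salesman” may end the period in any state.

import numpy as np
from scipy.optimize import linprog

state_names = list(STATES)                 # "Pzero", "Pmin", "Pmax" from the mapping sketch
T_steps = len(power)                       # number of mapped time steps

# Decision variables: one per mapped arc, plus a zero-profit exit arc out of each final node.
arc_keys = list(arcs) + [("EXIT", s) for s in state_names]
profit = np.array([arcs.get(k, 0.0) for k in arc_keys])
col = {k: i for i, k in enumerate(arc_keys)}

def inflow(t, s):                          # columns of arcs arriving at node (t, s)
    return [col[(t - 1, sf, s)] for sf in state_names] if t > 0 else []

def outflow(t, s):                         # columns of arcs leaving node (t, s)
    return [col[(t, s, st)] for st in state_names] if t < T_steps else [col[("EXIT", s)]]

A_eq, b_eq = [], []
for t in range(T_steps + 1):
    for s in state_names:
        row = np.zeros(len(arc_keys))
        row[inflow(t, s)] -= 1.0
        row[outflow(t, s)] += 1.0
        A_eq.append(row)
        # One whole salesman leaves the offline node at t = 0; every other node conserves flow.
        b_eq.append(1.0 if (t, s) == (0, "Pzero") else 0.0)

res = linprog(c=-profit, A_eq=np.array(A_eq), b_eq=b_eq, bounds=(0, 1), method="highs")
schedule = [k for k in arc_keys if res.x[col[k]] > 0.5]    # the arcs the salesman travels
print("dispatch profit for this scenario:", -res.fun)

The extra constraints mentioned above fit the same template: a cap on the number of starts, for example, is a single inequality row that sums the Pzero-to-Pmin arcs and bounds the sum by the permitted start count.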

Depending on the flexibility of the thermal power station that you are modelling, I would recommend a 4- or 8-hour time granularity.  You can also customize the network to decrease the slope of specific arcs, for example, to implement a long start-up time.  Thus, instead of connecting Pzero at t to Pmin at t+1, connect it to Pmin at t+2, et voilà: longer start times.

Next Step: Confiscate the salesman’s crystal ball

Compared to a simple spread option formula with forward prices, such as Margrabe’s Formula with Kirk’s Approximation (which certainly has its place in our toolbox), the TSP-optimized Power Station can capture the fine details of hourly price shape as well as cumulative plant constraints.  However, one major shortcoming of the TSP-Station is that the optimization of each price scenario assumes perfect foresight of each arc’s profitability throughout the entire optimization period.  (In other words, the traveling salesman knows exactly what the traffic density and petrol price for every leg of his upcoming business trip will be.)  This of course will over-estimate the value of the power station, which is very important to remember if you are on the buy side.  An important commercial principle that I communicate to clients is to pay only up to the level that you can practically monetize (minus a required cost of capital), not the theoretical value.  This will cause you to miss out on a lot of deals, but it will also keep you profitable.

How do we limit our dispatcher’s foresight?  We can put our high-priced CPLEX solver back on the shelf and run a modification of Dijkstra’s Algorithm.  Dijkstra finds “sufficiently good” paths through the network with limited foresight, in polynomial time.  The modification we would make to Dijkstra is to extend that foresight to 7 days.  This is perfectly reasonable since weather forecasts within one week are reasonably accurate, and we know that there’s a distinct intra-week shape to power demand and spot prices.  Hence, we already have a good sense of the 7-day price shape.  I will leave this implementation for an upcoming article.  Thanks for following along.