See in Action

In this demo, I’d like to show you how to use TrendyPy on some stock data between 2018-01-01 and 2020-06-28. You can download the data from here to reproduce the demo.

Let’s say we have stock data from a mix of tech and banking companies, and we want to tell whether an unknown trend belongs to a tech stock or a banking stock. For this purpose, we’ll use FB (i.e. Facebook), GOOGL (i.e. Google), AMZN (i.e. Amazon), BAC (i.e. Bank of America) and WFC (i.e. Wells Fargo) as training data, then AAPL (i.e. Apple) and c (i.e. Citigroup) as prediction data.

But first, here is how the data looks.

In [1]: import pandas as pd

In [2]: import matplotlib.pyplot as plt

In [3]: df = pd.read_csv('stock_data.csv')

In [4]: df.plot()
Out[4]: <matplotlib.axes._subplots.AxesSubplot at 0x7f2d70389a58>
[Figure: raw stock prices, ../_images/ticks_raw.png]

If we cluster the data as it is, expensive stocks like GOOGL and AMZN will form a cluster of their own purely because of their price level, which is clearly not what we want. So, let’s scale first.

In [5]: from trendypy import utils

In [6]: df = df.apply(utils.scale_01)

In [7]: df.plot()
Out[7]: <matplotlib.axes._subplots.AxesSubplot at 0x7f2d706baeb8>
[Figure: scaled stock prices, ../_images/ticks_scaled.png]
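
For intuition, the scaling step above can be pictured as min-max scaling, which maps each series onto the range 0 to 1 so that price level no longer dominates the comparison. The sketch below is a pure-Python stand-in on made-up numbers: `min_max_scale` is a hypothetical helper, and it is only assumed (not confirmed here) that `utils.scale_01` behaves similarly.

```python
def min_max_scale(values):
    """Rescale a sequence of numbers linearly onto [0, 1].

    Hypothetical helper for illustration only; assumed, not verified,
    to resemble what `utils.scale_01` does.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series has no range; return zeros to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Two made-up series at very different price levels but with the same shape.
expensive = [1000.0, 1500.0, 1250.0, 2000.0]
cheap = [10.0, 15.0, 12.5, 20.0]

scaled_expensive = min_max_scale(expensive)
scaled_cheap = min_max_scale(cheap)
```

After scaling, the two made-up series become identical, which is the point: only the shape of the trend is left to compare.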

It’s already somewhat apparent that BAC, WFC and c behave differently from the others. Let’s put the sectors side by side to see the difference better.

In [8]: fig, axes_ = plt.subplots(nrows=1, ncols=2)

In [9]: axes_[0].set_title('Tech')
Out[9]: Text(0.5, 1.0, 'Tech')

In [10]: axes_[1].set_title('Banking')
Out[10]: Text(0.5, 1.0, 'Banking')

In [11]: df[['AAPL', 'FB', 'GOOGL', 'AMZN']].plot(ax=axes_[0])
Out[11]: <matplotlib.axes._subplots.AxesSubplot at 0x7f2d70591358>

In [12]: df[['BAC', 'WFC', 'c']].plot(ax=axes_[1])
Out[12]: <matplotlib.axes._subplots.AxesSubplot at 0x7f2d703d1898>
[Figure: scaled stock prices, Tech and Banking side by side, ../_images/ticks_scaled_subplot.png]

Now, we can fit on the training data. Remember, we’re setting AAPL and c aside to predict later and fitting only on the rest.

In [13]: from trendypy.trendy import Trendy

In [14]: trendy = Trendy(n_clusters=2) # 2 for tech and banking

In [15]: trendy.fit([df.FB, df.GOOGL, df.AMZN, df.BAC, df.WFC])

In [16]: trendy.labels_
Out[16]: [0, 0, 0, 1, 1]

You can also use the fit_predict method for this purpose; it’s essentially the same.

In [17]: trendy.fit_predict([df.FB, df.GOOGL, df.AMZN, df.BAC, df.WFC])
Out[17]: [0, 0, 0, 1, 1]

As expected, it assigns FB, GOOGL and AMZN to the first cluster (i.e. 0) and BAC and WFC to the second (i.e. 1). So, we can name 0 as tech and 1 as banking.
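
The naming step can be sketched in plain Python. The labels are copied from the output above; `cluster_names` and `sector_by_ticker` are illustrative names, not part of TrendyPy.

```python
# Training tickers in the same order they were passed to fit.
tickers = ["FB", "GOOGL", "AMZN", "BAC", "WFC"]

# Copied from the output of trendy.labels_ shown above.
labels = [0, 0, 0, 1, 1]

# Human-readable names chosen after inspecting which tickers share a label.
cluster_names = {0: "tech", 1: "banking"}

# Map every training ticker to the name of the cluster it landed in.
sector_by_ticker = {
    ticker: cluster_names[label] for ticker, label in zip(tickers, labels)
}
```

The cluster ids carry no meaning of their own, so the mapping should be chosen by looking at the labels a given run actually produced.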

Now, let’s make predictions on the prediction data that we set aside earlier (i.e. AAPL, c).

In [18]: trendy.predict([df.AAPL]) # expecting `0` since AAPL is a part of tech
Out[18]: [0]

In [19]: trendy.predict([df.c]) # expecting `1` since c is a part of banking
Out[19]: [1]

As seen above, it correctly predicts the cluster for both AAPL and c.
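
One simple way to picture prediction is nearest-center assignment: a new trend line gets the label of whichever cluster representative it is closest to. The sketch below illustrates that idea on made-up data with a plain mean absolute distance. It is an assumption for intuition only, not a description of how Trendy computes distances or centers, and both function names are hypothetical.

```python
def mean_abs_distance(a, b):
    """Average absolute gap between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def assign_to_nearest(series, centers, distance=mean_abs_distance):
    """Return the index of the center closest to `series`."""
    distances = [distance(series, center) for center in centers]
    return distances.index(min(distances))


# Made-up scaled trend lines standing in for two cluster centers.
centers = [
    [0.0, 0.3, 0.6, 1.0],  # cluster 0: steadily rising
    [1.0, 0.6, 0.3, 0.0],  # cluster 1: steadily falling
]

# Made-up unseen series to assign.
rising = [0.1, 0.2, 0.7, 0.9]
falling = [0.9, 0.7, 0.2, 0.0]

predicted = [assign_to_nearest(s, centers) for s in (rising, falling)]
```

Passing the distance function as a parameter keeps the assignment logic independent of how similarity between trend lines is measured.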

You can easily pickle the model object for later use with the to_pickle method.

In [20]: trendy.to_pickle('my_first_trendy.pkl')
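
To use a saved model later, it has to be loaded back. Assuming `to_pickle` writes a standard Python pickle file (an assumption, not confirmed in this demo), the standard library's `pickle` module could read it. The sketch below shows the round trip with a plain dictionary, `model_state`, as a hypothetical stand-in for a fitted model so that it runs on its own.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted model; a real Trendy object is not used here.
model_state = {"n_clusters": 2, "labels_": [0, 0, 0, 1, 1]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_first_trendy.pkl")

    # Write the object to disk as a standard pickle file.
    with open(path, "wb") as f:
        pickle.dump(model_state, f)

    # Read it back, as one would in a later session.
    with open(path, "rb") as f:
        restored = pickle.load(f)
```

Unpickling can execute arbitrary code, so only load pickle files that come from a source you trust.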

And that’s all.