Notes - LinkedIn course - Financial data analysis

Notes on data analysis of financial data
Author

Deepak Ramani

Published

May 10, 2023

Modified

June 14, 2023

For financial data analysis, it is important to know certain terms, their purpose, and how to calculate them.

I’m taking Matt Harrison’s LinkedIn course on Python for Finance. Here I note down all the Pandas techniques and concepts explored in the course. As always, if you find an error, don’t hesitate to contact me.

Loading data

We use the yfinance library to load our stock data. Stocks listed on the NASDAQ stock exchange are identified by a unique symbol, aka a ticker. I was interested in seeing how Pfizer (PFE) stock has fared since 2015.

import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 

# download daily OHLCV data for both tickers in one call
pharma_df = yf.download('PFE JNJ', start='2015-01-01', end='2023-04-28')
pharma_df.head()
1. Displays the first 5 rows
Adj Close Close High Low Open Volume
JNJ PFE JNJ PFE JNJ PFE JNJ PFE JNJ PFE JNJ PFE
Date
2015-01-02 83.076210 21.793076 104.519997 29.724857 105.550003 30.151802 104.129997 29.620493 105.050003 29.667933 5753600 16371571
2015-01-05 82.495979 21.674822 103.790001 29.563566 104.730003 29.800758 103.680000 29.421251 104.480003 29.743834 8079300 24786391
2015-01-06 82.090607 21.855684 103.279999 29.810247 104.989998 30.227703 102.940002 29.525618 104.339996 29.667933 7428000 29468681
2015-01-07 83.902840 22.154783 105.559998 30.218216 105.830002 30.237192 103.809998 29.962049 103.910004 30.094877 7931700 20248816
2015-01-08 84.562553 22.606922 106.389999 30.834915 106.489998 30.967743 105.750000 30.569260 106.059998 30.683111 9916000 49169522

Chaining

Matt introduces one of the features in Pandas called chaining. It allows reading the code like a recipe: one can simply go from top to bottom and understand how the code works. We leverage the pandas pipe() function, which we can use to call any function as a step in the chain.

From the two stocks, PFE and JNJ, we need only PFE. So we can try to apply the chaining principle.

Without chaining

def fix_cols(df):
    # keep only the outer level (the field names) of the MultiIndex columns
    cols = df.columns
    outer = [col[0] for col in cols]
    df.columns = outer
    return df

pfe_df1 = pharma_df.iloc[:, 1::2]  # every second column holds the PFE data
pfe_df1 = fix_cols(pfe_df1)
pfe_df1
Adj Close Close High Low Open Volume
Date
2015-01-02 21.793076 29.724857 30.151802 29.620493 29.667933 16371571
2015-01-05 21.674822 29.563566 29.800758 29.421251 29.743834 24786391
2015-01-06 21.855684 29.810247 30.227703 29.525618 29.667933 29468681
2015-01-07 22.154783 30.218216 30.237192 29.962049 30.094877 20248816
2015-01-08 22.606922 30.834915 30.967743 30.569260 30.683111 49169522
... ... ... ... ... ... ...
2023-04-21 39.779552 40.209999 40.299999 39.910000 40.090000 19227100
2023-04-24 39.482765 39.910000 40.200001 39.709999 40.189999 17633700
2023-04-25 38.908978 39.330002 39.919998 39.279999 39.750000 24492400
2023-04-26 38.216469 38.630001 39.189999 38.400002 39.160000 22401400
2023-04-27 38.325291 38.740002 38.830002 38.310001 38.619999 22434000

2094 rows × 6 columns

With chaining

pfe_df = (pharma_df
          .iloc[:,1::2]
          .pipe(fix_cols)
         )
pfe_df
1. Retrieves only PFE stock data
2. Removes the ticker level and keeps just the outer column names
Adj Close Close High Low Open Volume
Date
2015-01-02 21.793076 29.724857 30.151802 29.620493 29.667933 16371571
2015-01-05 21.674822 29.563566 29.800758 29.421251 29.743834 24786391
2015-01-06 21.855684 29.810247 30.227703 29.525618 29.667933 29468681
2015-01-07 22.154783 30.218216 30.237192 29.962049 30.094877 20248816
2015-01-08 22.606922 30.834915 30.967743 30.569260 30.683111 49169522
... ... ... ... ... ... ...
2023-04-21 39.779552 40.209999 40.299999 39.910000 40.090000 19227100
2023-04-24 39.482765 39.910000 40.200001 39.709999 40.189999 17633700
2023-04-25 38.908978 39.330002 39.919998 39.279999 39.750000 24492400
2023-04-26 38.216469 38.630001 39.189999 38.400002 39.160000 22401400
2023-04-27 38.325291 38.740002 38.830002 38.310001 38.619999 22434000

2094 rows × 6 columns

As you can see, this makes for easier reading. We use pipe() to call our fix_cols function, and the resulting dataframe has only the outer-level column names. That said, as more analysis steps are added, a chain gets complicated and harder to understand: the intermediate results are not shown in the final version, which makes it difficult to visualise each operation at a glance.
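One mitigation, a sketch of my own rather than something from the course, is to pipe a small, hypothetical debugging helper into the chain that prints the intermediate state and passes the dataframe through unchanged:

def show(df, label=""):
    # print the shape at this step of the chain, then return the dataframe as-is
    print(label, df.shape)
    return df

(pharma_df
 .pipe(show, "raw")
 .iloc[:, 1::2]
 .pipe(show, "PFE columns only")
 .pipe(fix_cols)
)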

Plotting

Since Pandas’ plotting functionality is built on top of Matplotlib, we can easily use it to plot our data.

(pharma_df
 .iloc[:,1::2] 
 .pipe(fix_cols) 
 .Close
 .plot()
)
1. Fetches only the Close column. The dataframe is now a series.
2. Date is used as the x-axis and the Close values as the y-axis.
<AxesSubplot:xlabel='Date'>

We can also adjust the plot window size using Matplotlib’s figsize parameter.

pfe_vol = (pharma_df
 .iloc[:,1::2] 
 .pipe(fix_cols) 
 .Volume
 .plot(figsize=(10,2))
)
1. Adjusting the plot window size so we can visualise the volume easily

Resampling

Since we have dates as our row index, we can leverage the .resample() feature of Pandas to convert our raw PFE data into the time periods we want to visualise. The .resample() function takes an argument appropriate for the index, in our case a date frequency: monthly ('M'), weekly ('W'), daily ('D'), quarterly ('Q'), yearly ('Y'), etc. Bi-monthly is possible with '2M'.
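For instance, a bi-monthly average would look like this (a quick sketch of my own, using the pfe_df built earlier):

(pfe_df
 .Close
 .resample('2M')
 .mean()
 .head()
)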

(pfe_df
 .Close
 .resample('M')
 .mean()
 .plot()
)
1. Returns a resampler object. Looks for an aggregate function.
2. Monthly average
<AxesSubplot:xlabel='Date'>

If we want to aggregate two columns differently, then we can use the .agg() function.

(pfe_df
 .resample('M')
 .agg({'Close': 'max', 'Open': 'first'})
 .loc['jan 2020': 'apr 2023']
 .plot()
)
1. Close column aggregated by max values and Open by the first value in each period.
<AxesSubplot:xlabel='Date'>

Candlestick chart

With stock data, it is easier to visualise patterns using a candlestick chart. A candlestick plot is not built into Pandas, so we write our own plot function using Matplotlib.

from matplotlib import dates
fig, ax = plt.subplots(figsize=(10,5))
def plot_candle(df, ax):
    #wick
    ax.vlines(x=df.index, ymin=df.Low, ymax=df.High, colors='k', linewidth=1)
    #red - decrease
    red = df.query('Close < Open')
    ax.vlines(x=red.index, ymin=red.Close, ymax=red.Open, colors='r', linewidth=3)
    #green - increase
    green = df.query('Close >= Open')
    ax.vlines(x=green.index, ymin=green.Close, ymax=green.Open, colors='g',linewidth=3)
    ax.xaxis.set_major_locator(dates.MonthLocator())
    ax.xaxis.set_major_formatter(dates.DateFormatter('%b-%y'))
    ax.xaxis.set_minor_locator(dates.DayLocator())
    ax.grid(color='grey')
    return df


(pfe_df
 .resample('d')
 .agg({'Close': 'last', 'Open': 'first', 'Low':'min', 'High':'max'})
 .loc['jan 2023': 'apr 2023']
 .pipe(plot_candle,ax)
)
Close Open Low High
Date
2023-01-01 NaN NaN NaN NaN
2023-01-02 NaN NaN NaN NaN
2023-01-03 51.259998 51.009998 50.820000 51.330002
2023-01-04 50.130001 50.290001 49.520000 50.630001
2023-01-05 49.660000 49.730000 48.919998 49.990002
... ... ... ... ...
2023-04-23 NaN NaN NaN NaN
2023-04-24 39.910000 40.189999 39.709999 40.200001
2023-04-25 39.330002 39.750000 39.279999 39.919998
2023-04-26 38.630001 39.160000 38.400002 39.189999
2023-04-27 38.740002 38.619999 38.310001 38.830002

117 rows × 4 columns

With the chart, we can identify patterns such as bearish, bullish, etc. We can change the .loc slice to the period in which we want to see the pattern.

Calculations

In this section we learn certain metrics used on financial data.

Returns

What percentage return can be expected? With pandas, we can simply use the .pct_change() function to get the values. Plotting them is as simple as shown previously.

(pfe_df
 .pct_change()
 .Close
 .plot()
)

A histogram could be an option, but it doesn’t show the negative swings over time. Somewhat more appropriate is a bar plot.

(pfe_df
 .pct_change()
 .Close
 .iloc[-100:] #last 100 rows
 .plot.bar()
)

This plot shows the negative moves, but the x-axis is illegible: we don’t know on which dates the closing prices changed. This is because pandas converts whatever is on the x-axis of a bar plot into categorical variables. For categorical variables such as elephants, dogs, and cats this works, but for dates it isn’t correct.

What if we explicitly format the x-axis as dates?

fig, ax = plt.subplots(figsize=(10,4))
(pfe_df
 .pct_change()
 .Close
 .iloc[-100:]
 .plot.bar(ax=ax)
)
ax.xaxis.set_major_locator(dates.MonthLocator())
ax.xaxis.set_major_formatter(dates.DateFormatter('%b-%y'))
ax.xaxis.set_minor_locator(dates.DayLocator())

1970?!? Pandas still converts the dates into categorical variables: the bars sit at integer positions 0, 1, 2, …, which the date formatter then interprets as days since the 1970 epoch.

The solution Matt suggests is to use Matplotlib directly.

def my_bar(series, ax):
    ax.bar(series.index, series)
    ax.xaxis.set_major_locator(dates.MonthLocator())
    ax.xaxis.set_major_formatter(dates.DateFormatter('%b-%y'))
    ax.xaxis.set_minor_locator(dates.DayLocator())
    return series

fig, ax = plt.subplots(figsize=(10,4))
_ = ( pfe_df
 .pct_change()
 .Close
 .iloc[-100:]
 .pipe(my_bar, ax)
)

Looks good now.

Cumulative returns

Cumulative return shows the amount an investment has gained or lost over time. The formula is given by \[ \text{cumulative\_return} = \frac{\text{current\_price} - \text{original\_price}}{\text{original\_price}} \]

(pfe_df
 .Close
 .sub(pfe_df.Close.iloc[0])  # current price minus the original price
 .div(pfe_df.Close.iloc[0])  # divided by the original price
 .plot()
)
<AxesSubplot:xlabel='Date'>

Another alternative is to compound the daily returns with the cumprod function.

(pfe_df
 .Close
 .pct_change()  # daily returns
 .add(1)
 .cumprod()     # compound the daily growth factors
 .sub(1)
 .plot()
)
<AxesSubplot:xlabel='Date'>

As you can see both plots give the same result.

If we want to use .pipe here, we can do it like so:

def calc_cumrets(df, col):
    ser = df[col]
    return (ser
            .sub(ser.iloc[0])
            .div(ser.iloc[0])
           )
(pfe_df
 .pipe(calc_cumrets,'Close')
 .plot()
)
<AxesSubplot:xlabel='Date'>

Lambda functions or anonymous functions

Using lambda functions we can create impromptu, anonymous functions and use them with our chaining.

We would traditionally call a function like so:

def get_returns(df):
    return calc_cumrets(df, 'Close')

get_returns(pfe_df)
Date
2015-01-02    0.000000
2015-01-05   -0.005426
2015-01-06    0.002873
2015-01-07    0.016598
2015-01-08    0.037344
                ...   
2023-04-21    0.352740
2023-04-24    0.342647
2023-04-25    0.323135
2023-04-26    0.299586
2023-04-27    0.303286
Name: Close, Length: 2094, dtype: float64

However, if we are to use lambda, then the above code can be written as:

(lambda df: get_returns(df))(pfe_df)
Date
2015-01-02    0.000000
2015-01-05   -0.005426
2015-01-06    0.002873
2015-01-07    0.016598
2015-01-08    0.037344
                ...   
2023-04-21    0.352740
2023-04-24    0.342647
2023-04-25    0.323135
2023-04-26    0.299586
2023-04-27    0.303286
Name: Close, Length: 2094, dtype: float64

Now, with the cumulative returns calculation, it would be useful to assign those values to a new column in the dataframe. This is where the Pandas .assign function is helpful: it creates new columns. We can couple .assign and lambda together.

pfe_df = (pfe_df
 .assign(cum_rets=lambda df:calc_cumrets(df, 'Close'))
)

Volatility

Volatility is a statistical measure of the dispersion of the returns for a given security or market index, in this case a stock. In most cases, the higher the volatility, the riskier the stock. It is often measured as either the standard deviation or the variance of the returns from that stock. Remember, standard deviation measures the deviation of the data relative to its mean.

Just like mean(), we can calculate std().

(pfe_df
 .Close
 #.mean()
 .std()
)
6.69077286484766

The .assign() function allows consecutive chained methods to use the newly created columns. In the code block below, the pct_change_close column created in the first line is used in the second line. Then we can calculate the 30-day rolling volatility. Rolling is nothing but a moving time window over which the volatility is calculated. We can see that for the first 29 days the volatility is NaN (not a number), and from the 30th day onwards there is an entry.

(pfe_df
 .assign(pct_change_close=pfe_df.Close.pct_change())
 .pct_change_close
 .rolling(30)
 .std()
)
Date
2015-01-02         NaN
2015-01-05         NaN
2015-01-06         NaN
2015-01-07         NaN
2015-01-08         NaN
                ...   
2023-04-21    0.008945
2023-04-24    0.009058
2023-04-25    0.009192
2023-04-26    0.009718
2023-04-27    0.009543
Name: pct_change_close, Length: 2094, dtype: float64
#rolling volatility
(pfe_df
 .assign(close_volatility=pfe_df.rolling(30).Close.std(),
         percent_volatility=pfe_df.Close.pct_change().rolling(30).std())
 .iloc[:,-2:] # fetch only the last two columns
 .plot(subplots=True)

)
array([<AxesSubplot:xlabel='Date'>, <AxesSubplot:xlabel='Date'>],
      dtype=object)

We can also use .resample to calculate 15-day volatility, as we have dates as the index.

# 15 day volatility
(pfe_df
 .assign(pct_change_close=pfe_df.Close.pct_change())
 .resample('15D')
 .std()
)
Adj Close Close High Low Open Volume cum_rets pct_change_close
Date
2015-01-02 0.423586 0.577756 0.554235 0.536322 0.565220 8.716662e+06 0.019437 0.009335
2015-01-17 0.408934 0.557768 0.552896 0.473576 0.428281 6.992238e+06 0.018764 0.011556
2015-02-01 0.864667 1.092056 1.124398 1.137237 1.027460 1.347935e+07 0.036739 0.013012
2015-02-16 0.103079 0.139368 0.153115 0.158683 0.142704 5.436585e+06 0.004689 0.006524
2015-03-03 0.232108 0.313819 0.310265 0.285296 0.329080 6.907642e+06 0.010557 0.008226
... ... ... ... ... ... ... ... ...
2023-02-19 0.912909 0.922787 0.942723 1.064260 1.025597 4.710137e+06 0.031044 0.011505
2023-03-06 0.510382 0.515904 0.386680 0.472090 0.610293 1.148802e+07 0.017356 0.011168
2023-03-21 0.414519 0.419005 0.428108 0.305953 0.396622 3.416600e+06 0.014096 0.009084
2023-04-05 0.495242 0.500601 0.483455 0.430794 0.424291 4.807599e+06 0.016841 0.008860
2023-04-20 0.647205 0.654208 0.613951 0.713320 0.629458 2.507162e+06 0.022009 0.010312

203 rows × 8 columns

# 15 day rolling volatility
(pfe_df
 .assign(pct_change_close=pfe_df.Close.pct_change())
 .rolling(window=15, min_periods=15)
 .std()
)
Adj Close Close High Low Open Volume cum_rets pct_change_close
Date
2015-01-02 NaN NaN NaN NaN NaN NaN NaN NaN
2015-01-05 NaN NaN NaN NaN NaN NaN NaN NaN
2015-01-06 NaN NaN NaN NaN NaN NaN NaN NaN
2015-01-07 NaN NaN NaN NaN NaN NaN NaN NaN
2015-01-08 NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ...
2023-04-21 0.591463 0.597864 0.638385 0.562327 0.596404 3.967966e+06 0.020113 0.009691
2023-04-24 0.657297 0.664410 0.695511 0.616351 0.626652 3.915626e+06 0.022352 0.009383
2023-04-25 0.771667 0.780017 0.784388 0.725364 0.709656 4.093416e+06 0.026241 0.008983
2023-04-26 0.957201 0.967559 0.939399 0.930587 0.843585 4.133347e+06 0.032551 0.009559
2023-04-27 1.047608 1.058944 1.047459 1.068349 1.011878 3.287318e+06 0.035625 0.008130

2094 rows × 8 columns

What happens if the assigned column name is the same as a pandas method name and we need to use it further in our analysis? We can access that ‘assigned’ column with square brackets ([]) instead of attribute access. In the example below, pct_change conflicts with the pandas method of the same name and therefore must be put inside [] to access it.

# 15 day rolling volatility
(pfe_df
 .assign(pct_change=pfe_df.Close.pct_change())
 .rolling(window=15, min_periods=15)
 .std()
 #.pct_change
 ["pct_change"]
 .plot()
)
<AxesSubplot:xlabel='Date'>

Moving averages or rolling windows

The moving average (MA) of a stock is calculated to smooth out the price data by creating a constantly updated average price. It helps mitigate the impact of random, short-term fluctuations on the stock’s price over a time period. There are two types of moving averages: the simple MA, which is just the arithmetic mean of the prices over a specified number of days, and the exponential MA, a weighted average that gives more significance to recent prices than to old ones, making it an indicator that is more responsive to new information.

The MA is used to identify the trend direction of a stock or to determine its support and resistance levels, as it depends on past prices. The longer the period of the MA, the greater the lag: a 200-day MA has much greater lag than a 20-day MA. The gold standards used by investors are the 50-day and 200-day MAs.

A shorter MA suits short-term investing and a longer MA long-term investing. A rising MA indicates an upward trend and a declining MA a downward trend.
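The rest of these notes use only simple MAs. As a quick aside, here is a minimal sketch of an exponential MA using pandas’ ewm, my own addition rather than part of the course:

ema20 = pfe_df.Close.ewm(span=20, adjust=False).mean()  # 20-day exponential MA, recent prices weighted more
sma20 = pfe_df.Close.rolling(20).mean()                 # 20-day simple MA, for comparison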

What is a Golden Cross?

A golden cross is a chart pattern in which a short-term moving average crosses above a long-term moving average. The golden cross is a bullish breakout pattern formed from a crossover involving a security’s short-term moving average such as the 15-day moving average, breaking above its long-term moving average, such as the 50-day moving average. As long-term indicators carry more weight, the golden cross indicates a bull market on the horizon and is reinforced by high trading volumes.

Lag

These lags can be calculated in Pandas using the shift function. shift(1) shifts the values one place down the index; shift(2), two places down. For example,

(pfe_df
 .assign(s1=pfe_df.Close.shift(1),
         s2=pfe_df.Close.shift(2))
 [["s1","s2"]]
)
s1 s2
Date
2015-01-02 NaN NaN
2015-01-05 29.724857 NaN
2015-01-06 29.563566 29.724857
2015-01-07 29.810247 29.563566
2015-01-08 30.218216 29.810247
... ... ...
2023-04-21 39.849998 40.240002
2023-04-24 40.209999 39.849998
2023-04-25 39.910000 40.209999
2023-04-26 39.330002 39.910000
2023-04-27 38.630001 39.330002

2094 rows × 2 columns

The Close value in the first row ends up in the second row for shift(1) and two rows down for shift(2).

Now, for a simple 3-day moving average, we need to average Close, s1, and s2. We can do it manually using a lambda, or use pandas’ rolling with window=3 specified.

(pfe_df
 .assign(s1=pfe_df.Close.shift(1),
         s2=pfe_df.Close.shift(2),
         ma3=lambda df_:df_.loc[:,["Close", "s1", "s2"]].mean(axis='columns'),
         ma3_builtin=pfe_df.Close.rolling(3).mean()
        )
[["s1","s2","ma3","ma3_builtin"]]
)
s1 s2 ma3 ma3_builtin
Date
2015-01-02 NaN NaN 29.724857 NaN
2015-01-05 29.724857 NaN 29.644212 NaN
2015-01-06 29.563566 29.724857 29.699557 29.699557
2015-01-07 29.810247 29.563566 29.864010 29.864010
2015-01-08 30.218216 29.810247 30.287793 30.287793
... ... ... ... ...
2023-04-21 39.849998 40.240002 40.100000 40.100000
2023-04-24 40.209999 39.849998 39.989999 39.989999
2023-04-25 39.910000 40.209999 39.816667 39.816667
2023-04-26 39.330002 39.910000 39.290001 39.290001
2023-04-27 38.630001 39.330002 38.900002 38.900002

2094 rows × 4 columns

Plotting MAs

We are getting comfortable with plotting. We select the columns to be plotted, ['Close', 'ma3_builtin'], for the last 200 rows.

(pfe_df
 .assign(s1=pfe_df.Close.shift(1),
         s2=pfe_df.Close.shift(2),
         ma3=lambda df_:df_.loc[:,["Close", "s1", "s2"]].mean(axis='columns'),
         ma3_builtin=pfe_df.Close.rolling(3).mean()
        )
 [['Close', 'ma3_builtin']]
 .iloc[-200:]
 .plot()
)
<AxesSubplot:xlabel='Date'>

As we can see, the MA smooths out the little peaks and troughs.

Golden Cross

Some experts say that a crossover between the 50-day and 200-day MAs is an indicator to buy or sell.

(pfe_df
 .assign(ma50=pfe_df.Close.rolling(50).mean(),
         ma200=pfe_df.Close.rolling(200).mean()
        )
 [["Close","ma50","ma200"]]
 .iloc[-650:]
 .plot()
)
<AxesSubplot:xlabel='Date'>
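To go beyond eyeballing the chart, here is a minimal sketch of my own (not from the course) that flags the dates on which the 50-day MA crosses above the 200-day MA:

ma50 = pfe_df.Close.rolling(50).mean()
ma200 = pfe_df.Close.rolling(200).mean()
# a golden cross: ma50 was at or below ma200 the day before and is above it today
golden = (ma50 > ma200) & (ma50.shift(1) <= ma200.shift(1))
print(pfe_df.index[golden])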

Technical analysis

Technical analysis studies the prices and volumes of investments. Studying the trends in prices and volumes gives analysts a way to evaluate and identify trading opportunities.

Technical analysis tools are used to scrutinize the ways supply and demand for a security will affect changes in price, volume, and implied volatility. Past prices are used to determine future prices.

OBV - On-balance Volume

OBV is one such tool used for technical analysis. It is a momentum indicator that uses volume to predict changes in stock price.

What Does On-Balance Volume Tell You?

The actual value of the OBV is unimportant; concentrate on its direction. (source: fidelity)

  • When both price and OBV are making higher peaks and higher troughs, the upward trend is likely to continue.

  • When both price and OBV are making lower peaks and lower troughs, the downward trend is likely to continue.

  • During a trading range, if the OBV is rising, accumulation may be taking place—a warning of an upward breakout.

  • During a trading range, if the OBV is falling, distribution may be taking place—a warning of a downward breakout.

  • When price continues to make higher peaks and OBV fails to make higher peaks, the upward trend is likely to stall or fail. This is called a negative divergence.

  • When price continues to make lower troughs and OBV fails to make lower troughs, the downward trend is likely to stall or fail. This is called a positive divergence.

OBV calculation

If today’s close is greater than yesterday’s close then: OBV = Yesterday’s OBV + Today’s Volume

If today’s close is less than yesterday’s close then: OBV = Yesterday’s OBV – Today’s Volume

If today’s close is equal to yesterday’s close then: OBV = Yesterday’s OBV

\[ OBV = OBV_{prev} + \begin{cases} volume, & \text{if } close > close_{prev} \\ 0, & \text{if } close = close_{prev} \\ -volume, & \text{if } close < close_{prev} \end{cases} \]

where

OBV = current on-balance volume level

OBV_prev = previous on-balance volume level

volume = latest trading volume amount

Python - Naive approach

def calc_obv(df):
    df = df.copy()
    df["OBV"] = 0.0
    obv_col = df.columns.get_loc("OBV")  # positional index of the OBV column
    for i in range(1, len(df)):
        if df["Close"].iloc[i] > df["Close"].iloc[i - 1]:
            df.iloc[i, obv_col] = df.iloc[i - 1, obv_col] + df["Volume"].iloc[i]
        elif df["Close"].iloc[i] < df["Close"].iloc[i - 1]:
            df.iloc[i, obv_col] = df.iloc[i - 1, obv_col] - df["Volume"].iloc[i]
        else:
            df.iloc[i, obv_col] = df.iloc[i - 1, obv_col]
    return df

calc_obv(pfe_df)

It takes a while to complete. That is too long. Another approach would be to use numpy’s where condition. Nesting the conditions needs some thinking and isn’t as obvious as the mathematical formula, so we avoid np.where even though it is much faster than the naive Python approach.
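For reference, a nested np.where version might look like the sketch below (my own, not from the course); the nesting is exactly what makes it harder to read:

# sign the volume by comparing each close with the previous close
signed_vol = np.where(pfe_df.Close > pfe_df.Close.shift(1), pfe_df.Volume,
                      np.where(pfe_df.Close < pfe_df.Close.shift(1), -pfe_df.Volume, 0))
obv = signed_vol.cumsum()  # the running total gives the OBV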

Preferred choice to calculate OBV - np.select

The syntax of np.select is easy to understand: we specify a list of conditions, a list of corresponding choices, and a default value for when none of the conditions are satisfied.

np.select(condlist=[pfe_df.Close < 7.5, pfe_df.Close > 70],
          choicelist=[7.55, 72], default= 34)
array([34., 34., 34., ..., 34., 34., 34.])

So our code will look like this:

(pfe_df
 .assign(vol=np.select(condlist=[pfe_df.Close > pfe_df.Close.shift(1),
                                 pfe_df.Close == pfe_df.Close.shift(1),
                                 pfe_df.Close < pfe_df.Close.shift(1)],
                       choicelist=[pfe_df.Volume, 0, -pfe_df.Volume]),
        obv=lambda df_:df_.vol.cumsum()
        )
)
Adj Close Close High Low Open Volume cum_rets vol obv
Date
2015-01-02 21.793076 29.724857 30.151802 29.620493 29.667933 16371571 0.000000 0 0
2015-01-05 21.674822 29.563566 29.800758 29.421251 29.743834 24786391 -0.005426 -24786391 -24786391
2015-01-06 21.855684 29.810247 30.227703 29.525618 29.667933 29468681 0.002873 29468681 4682290
2015-01-07 22.154783 30.218216 30.237192 29.962049 30.094877 20248816 0.016598 20248816 24931106
2015-01-08 22.606922 30.834915 30.967743 30.569260 30.683111 49169522 0.037344 49169522 74100628
... ... ... ... ... ... ... ... ... ...
2023-04-21 39.779552 40.209999 40.299999 39.910000 40.090000 19227100 0.352740 19227100 -538012626
2023-04-24 39.482765 39.910000 40.200001 39.709999 40.189999 17633700 0.342647 -17633700 -555646326
2023-04-25 38.908978 39.330002 39.919998 39.279999 39.750000 24492400 0.323135 -24492400 -580138726
2023-04-26 38.216469 38.630001 39.189999 38.400002 39.160000 22401400 0.299586 -22401400 -602540126
2023-04-27 38.325291 38.740002 38.830002 38.310001 38.619999 22434000 0.303286 22434000 -580106126

2094 rows × 9 columns

Putting this into a function:

def calc_obv(df, close_col="Close", vol_col="Volume"):
    close = df[close_col]
    vol = df[vol_col]
    close_shift = close.shift(1)
    return (df
             .assign(vol=np.select(condlist=[close > close_shift,
                                             close == close_shift,
                                             close < close_shift],
                                   choicelist=[vol, 0, -vol]),
                     obv=lambda df_: df_.vol.fillna(0).cumsum()
                    )
             ["obv"]
    )

(pfe_df
    .assign(obv=calc_obv)
    .obv
    .plot()
)
<AxesSubplot:xlabel='Date'>

Accumulation distribution indicator

What is A/D and why is it needed?

A/D is an evolution of OBV that relates the price and the volume flow of a stock. OBV, by its formula, relies entirely on the closing value: the day’s volume is either fully added or fully subtracted, giving total bias either to the buyers or to the sellers. This is not realistic. A/D brings proportion into the picture. In other words, it introduces a multiplier that ranges from -1 to +1; realistically the value falls somewhere within this range, which gives a more realistic picture of the volume flow.

The formula for A/D is given by \[ A/D = A/D_{prev} + MFV \] where the money flow multiplier (MFM) and the money flow volume (MFV) are given by

\(MFM = \frac{(Close - Low) - (High - Close)}{High - Low}\) and \(MFV = MFM \times \text{period volume}\)

What does A/D indicate?

It is a cumulative indicator that uses volume and price to assess whether a stock is accumulating or distributing.

Calculation

(pfe_df
 .assign(mfm=((pfe_df.Close-pfe_df.Low) - (pfe_df.High-pfe_df.Close)) / (pfe_df.High-pfe_df.Low),
         mfv=lambda df_:df_.mfm * df_.Volume,
         cmfv = lambda df_:df_.mfv.cumsum()
        )
)
Adj Close Close High Low Open Volume cum_rets mfm mfv cmfv
Date
2015-01-02 21.793076 29.724857 30.151802 29.620493 29.667933 16371571 0.000000 -0.607142 -9.939876e+06 -9.939876e+06
2015-01-05 21.674822 29.563566 29.800758 29.421251 29.743834 24786391 -0.005426 -0.250001 -6.196629e+06 -1.613650e+07
2015-01-06 21.855684 29.810247 30.227703 29.525618 29.667933 29468681 0.002873 -0.189188 -5.575108e+06 -2.171161e+07
2015-01-07 22.154783 30.218216 30.237192 29.962049 30.094877 20248816 0.016598 0.862063 1.745575e+07 -4.255863e+06
2015-01-08 22.606922 30.834915 30.967743 30.569260 30.683111 49169522 0.037344 0.333333 1.638984e+07 1.213398e+07
... ... ... ... ... ... ... ... ... ... ...
2023-04-21 39.779552 40.209999 40.299999 39.910000 40.090000 19227100 0.352740 0.538460 1.035302e+07 6.969870e+08
2023-04-24 39.482765 39.910000 40.200001 39.709999 40.189999 17633700 0.342647 -0.183673 -3.238837e+06 6.937482e+08
2023-04-25 38.908978 39.330002 39.919998 39.279999 39.750000 24492400 0.323135 -0.843740 -2.066523e+07 6.730830e+08
2023-04-26 38.216469 38.630001 39.189999 38.400002 39.160000 22401400 0.299586 -0.417721 -9.357525e+06 6.637254e+08
2023-04-27 38.325291 38.740002 38.830002 38.310001 38.619999 22434000 0.303286 0.653846 1.466838e+07 6.783938e+08

2094 rows × 10 columns

Making it into a function:

def calc_ad(df, close_col="Close",low_col="Low",high_col="High",vol_col="Volume"):
    close = df[close_col]
    low = df[low_col]
    high = df[high_col]
    vol = df[vol_col]
    return (df
             .assign(mfm=((close-low) - (high-close))/ (high-low),
                     mfv=lambda df_:df_.mfm * vol,
                     cmfv=lambda df_:df_.mfv.cumsum()       
                    )  
             .cmfv
           )
(pfe_df
    .assign(ad=calc_ad)
    .ad
    .plot()
)
<AxesSubplot:xlabel='Date'>

Comparing OBV and AD

(pfe_df
    .assign(obv=calc_obv,
            ad=calc_ad
           )
    [["obv","ad"]]
    .iloc[-650:]
    .plot()

)
<AxesSubplot:xlabel='Date'>

RSI - Relative strength index
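A minimal sketch of my own of a 14-day RSI, using plain rolling means rather than Wilder’s smoothing:

def calc_rsi(df, close_col="Close", periods=14):
    # split the daily price changes into gains and losses
    delta = df[close_col].diff()
    gain = delta.clip(lower=0).rolling(periods).mean()
    loss = (-delta.clip(upper=0)).rolling(periods).mean()
    rs = gain / loss  # relative strength
    return 100 - 100 / (1 + rs)

(pfe_df
 .assign(rsi=calc_rsi)
 .rsi
 .plot()
)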
