Stock-Network-Portfolio-Selection.Rnw

%coding: utf-8
\documentclass[techreport]{tbsi-thesis}

%\usepackage{minted}
%\usepackage{float}   % add option [H] for floats
\usepackage{caption} % add \caption* option for table cations (as opposed to title)
\usepackage[utf8]{inputenc}
\usepackage{amsmath}

% Change bibiography page title to References
\renewcommand{\bibname}{References}

% Command for norm
\newcommand{\norm}[1]{\left\lVert #1 \right\rVert}


% Define the portfolio selection problem
\newtheorem{portfoliochoice}{Portfolio choice}
\newtheorem{communicability}{Communicability Betweenness Centrality}


\hypersetup{
  colorlinks=true,
  linkcolor=blue,
  filecolor=magenta,
  urlcolor=cyan
}


\title{Stock Network Investment \\ An Application to the Brazilian Stock Market}

\author{David Cecchini\footnote{TBSI - Environmental Sciences and New Energy Technology - 2019380040}, 黄欣欣\footnote{TBSI - 精准医学与公共健康 - 2019214661}, 政雪翎\footnote{TBSI - Environmental Sciences and New Energy Technology - 2019214641}}

\collegeshield{images/TBSI}

\date{June, 2020}

\submissiondate{June, 2020}

%\subjectline{Optimization and Simulation}
\keywordsEN{quantitative investments; portfolio selection; stock networks; Brazilian stocks}

%\abstractEN{%
%  this is an abstract.
%}


%% PDF meta-info:
\subjectline{Optimization and Simulation}


\begin{document}
\begin{CJK*}{UTF8}{zhsong}

<<setup, include=FALSE>>=
knitr::opts_chunk$set(echo = FALSE, eval = TRUE, fig.path = "images/", message = FALSE, warning = FALSE, error = FALSE, highlight = TRUE)
@

<<engine="python", include=FALSE>>=
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import pickle
import os
from tabulate import tabulate
import dcor
import networkx as nx
from pypfopt import discrete_allocation
from pypfopt.expected_returns import mean_historical_return
from pypfopt.efficient_frontier import EfficientFrontier
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

from header import df_distance_correlation, build_corr_nx, plt_corr_nx, hist_plot, generate_mis
from header import centrality_to_portfolio_weights, is_irreducible, grid_search_threshold
from header import cumulative_returns, portfolio_daily_roi, end_of_year_returns, avg_annual_returns
from header import portfolio_std, mis_portfolio_std, portfolio_sharpe_ratio
from header import daily_rolling_max, daily_rolling_drawdown, max_daily_rolling_drawdown, lifetime_max_drawdown
from header import returns_over_max_drawdown, portfolio_growth_risk, collect_results
from header import TRAIN_RANGE, TEST_RANGE

#silence warnings
import warnings
warnings.filterwarnings("ignore")

investment_capital = 10000
@


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Title page, abstract, declaration etc.:
%% -    the title page (is automatically omitted in the technical report mode).
\frontmatter{}


\chapter{Background}

\section{Brazil Stock Exchange and Over-the-Counter Market}

\emph{Brasil, Bolsa, Balcão - B3} (Brazil, Exchange, Counter)\footnote{\url{http://www.b3.com.br}} is the biggest Brazilian exchange among the top exchanges by market cap in the world, ranking number 18\footnote{\url{https://en.wikipedia.org/wiki/List_of_stock_exchanges}}, with BRL 4 billion in capitalization (approximately USD 660 billion, value that changes considerably due to fluctuations of dollar to Brazilian real conversion rates) and 330 listed companies.

\emph{B3} is a fusion of traditional exchanges in Brazil (Sao Paulo Stock Exchange, Rio Stock Exchange, Brazilian Mercantile and Futures Exchange - BM\&F) and \emph{CETIP} (Central of Custody and Financial Settlement of Securities) to form the unified Brazilian exchange.


\section{Behavioral Finance Hypothesis}

Behavioral economics theory studies the limit from rational and irrational decisions made by economic agents. It is known that due to the psychological differences of the agents, they may behave in an irrational way, overreacting or underreacting to market changes.

Based on the assumption that the traders’ irrational decisions can cause mispricing to financial assets, investors can design trading strategies that take advantage of the mispriced assets and invest according to the real (fair) price to guarantee a positive return on investments. This kind of investment strategy is called financial behavioral investment.

Portfolio selection is an important part of the investor’s decisions since there are many different assets to invest in, each one with different expected returns and different risks. Modern Portfolio Theory (MPT)\footnote{\url{ https://en.wikipedia.org/wiki/Modern_portfolio_theory}} uses a mathematical approach to select stocks based on the duality \emph{Risk-Return}. The MPT was introduced by the Nobel prize Harry Markowitz\cite{Markowitz1952}, where he introduces the concept of diversification that allows a portfolio to obtain similar or higher returns with less risk by adding assets to it.

MPT is a theory of how risk-averse investors construct portfolios based on a certain level of market risk to achieve optimal or maximized expected returns, emphasizing that risk is an intrinsic component of high returns. It assumes that investors are risk-averse and prefer a portfolio with less risk in the case of a certain return. Only when they expect more return will they be willing to take more risks and shows that investors can build a multi-asset portfolio that will get the maximum return at a given risk level.

In addition to rational investors, the market also includes irrational and high-risk investors, who may influence the market price, and they cannot have unlimited access to credit. Therefore, the hypothesis of the theory is not fully established, and the effective boundary theory and modern portfolio theory have certain limitations. When asset returns are normally distributed, they do not correctly represent realistic assumptions.

In this project, we implement a portfolio selection strategy based on a stock network\cite{Tse2010}. The strategy is similar to MPT in the sense that it assesses the portfolio risk-return and selects the stocks that minimize the risk, and by doing so we would obtain better returns.

\section{Trading premises}

In Brazil, securities are processed \emph{B3} and regulated by the Securities Commission of Brazil (CVM) that is independent but directly linked with the Brazilian Ministry of Finance. It regulates markets such as the stock exchange, financial intermediaries, and public companies.

Since March 30, 2017, the Brazilian stock market is unified at \emph{B3}. Before that, most of the companies were listed on the Sao Paulo Stock Exchange (\emph{Bovespa}). The transaction premise has not changed during the period of this strategy, and it can be expected that the transaction premise of Brazil's financial market will not change in the short term.


\chapter{The strategy}

\section{Description}

Stock prices have complex dynamics and their movements depend on many factors from economic, financial, and behavioral aspects of the market and its agents. Our strategy aims to identify correlations between stocks and select the best for a long term investment. In short, we will use a special kind of correlation metric that is suitable for time series and build a network by connecting the stocks based on their "loss-spreading factor" (how the stocks are correlated, and to how many other stocks they are correlated).

Financial time series forecasting is a very complex problem, and Pearson's correlation may not be appropriate to measure the dependencies between the stocks because it detects only linear relations, not to mention that it can have a value equal to zero when the series are dependent\cite{Szekely2007}. We will use the Distance Correlation metric\cite{Szekely2007, Szekely2009}, which is a correlation measure that captures both linear and non-linear relations in the data. From its definition, it also allows time series with different dimensions to be compared but we will study only stocks in the same period in this study. According to \cite{Szekely2007}, the distance covariance is defined as: Let $X, Y$ be two real-valued random variables (vectors) and $(X_1, Y_1), \ldots, (X_n, Y_n)$ be an n-size sample. Then, we first compute the pairwise distances for all $j,k \in \{1, \ldots, n\}$:

\begin{align}
a_{j,k} &= \norm{X_j - X_k}\\
b_{j,k} &= \norm{Y_j - Y_k}
\end{align}

Then, define the matrices:

\begin{align}
A_{j,k} = a_{j,k} - \bar{a}_{j,\cdot} - \bar{a}_{\cdot,k} + \bar{a}_{\cdot,\cdot}\\
B_{j,k} = b_{j,k} - \bar{b}_{j,\cdot} - \bar{b}_{\cdot,k} + \bar{b}_{\cdot,\cdot}
\end{align}

where $\bar{a}_{\cdot,k}$ are the $k$-th columns mean, $\bar{a}_{j,\cdot}$ are the $j$-th row mean and $\bar{a}_{\cdot,\cdot}$ is the total mean. Same notation for matrix $B$. Then, the distance covariance is defined by:

\begin{equation}
  dCov_n^2(X,Y) := \frac{1}{n^2} \sum_{j=1}^{n}\sum_{k=1}^{n}A_{j,k}B_{j,k}
\end{equation}

And the correlation, as usual, is defined by:

\begin{equation}
  dCor(X,Y) := \frac{dCov(X,Y)}{\sqrt{dCov(X,X) dCov{Y,Y}}}
\end{equation}

With the distance correlation between all assets computed, we can define our weighted stock network using the winner-take-all method \cite{Tse2010}. This method defines the edges weight matrix using a threshold value $\rho_c$. We want this hyperparameter to be "big enough" (we want to limit the number of connections between low correlated stocks) but at the same time not too big so that the graph remains irreducible (fully connected). The final correlation matrix of the network is given by:

\begin{align}
Cor_{ij} =
  \begin{cases}
    \rho_{D}(X_{i}, Y_{j}), & \rho_D \leq \rho_{c} \\
    0, & \text{otherwise}.
  \end{cases}
\end{align}

where $\rho_D(X_i, Y_j)$ is the distance correlation between time series $X_i$ and $Y_j$.

The next step is to create the stock network. For this, we want to represent the stocks with weights that are inversely proportional to the distance correlation. Thus, we use $1-Cor_{ij} \in [0, 1]$ as the similarity matrix that will generate the network weights.

Our strategy will use network theory to identify portfolios that have minimal systemic risk. Thus, we define our strategy as:


\begin{portfoliochoice}
Given a set of assets $\mathbf{S}$ containing $N$ stocks, we want to find weights $\mathbf{w} \in [0, 1]^N$ such that $w_i$ is bigger if the corresponding stock has low systemic risk and $w_i$ is smaller if the corresponding stock has high systemic risk.
\end{portfoliochoice}

Our systemic risk measurement is described next.

\section{Risk management}

Our strategy aims to identify portfolios with minimal risk, so that our investment may obtain higher returns without being exposed to higher risks. Instead of using the variance as the main indicator of portfolio risk, in our strategy, we use network theory to assess the portfolio risk taking into account the systemic risk (the spread of losses due to correlation between stocks).

We build our network, as described in the previous section, with the similarity matrix defined as $1-Cor_{ij}$. This defined the distance between the nodes in the network. Since we adopted the \emph{Winners-Take-All}\cite{Tse2010} method to control how our stock network is connected, we limited the node distance to a certain threshold of $\rho_c$. Then, we use the \emph{Communicability Betweenness Centrality}\cite{Estrada2009} of the network to measure the portfolio risk. This measure is defined for each node as a fraction between the walks passing through this node and all the possible paths in the network. The more connections the node has the higher its communicability betweenness score. Also, the higher the score, the more the node is capable of spreading its impacts. Formally, following \cite{Estrada2009}:

\begin{communicability}
Let $G = (V, E)$ be a graph with $n$ nodes and $m$ edges with adjacency matrix $A$. Define $G(r) = (V, E')$ the graph constructed by removing all edges that connect to the node $r$, keeping the node itself. Let $G_{prq}$ represent the number of paths between nodes $p$ and $q$ that passes through the node $r$ and $G_{pq}$ be all possible paths between $p$ and $q$. Then, the communicability betweenness centrality is defined by:

\begin{equation}
  \omega_r = \frac{1}{(n-1)(n-2)}\sum_p \sum_q \dfrac{G_{prq}}{G_{pq}}
\end{equation}
\end{communicability}


In our strategy, we want to allocate our capital inversely proportional to the risk of the stock.


\chapter{Strategy implementation}

\section{Data preprocessing}

\subsection{Acquisition of the data}

The stock market is regulated by CVM but is operated by the stock exchange \emph{B3}. The historical data is publicly available on the \emph{B3} website\footnote{Available at: \url{http://www.b3.com.br/en_us/market-data-and-indices/data-services/market-data/historical-data/equities/historical-quotes/}}.

\begin{figure}[H]
  \centering
  \includegraphics[width=0.6\linewidth, keepaspectratio]{images/b3_historical_quotes.png}
  \caption{B3 publicly available historical quotes}
  \label{fig:B3hist}
\end{figure}

We collected historical data from 2013 to May 2020. The data contain the trading Day, Open, Close, Low, and High prices and the Volume traded for each trading day.

\begin{table}[H]
  \centering
  \caption{B3 Historical data}
  <<engine="python", echo = FALSE, results="asis">>=
  # Get the data
  df = pd.read_csv(r"data/20130102_20200529_daily.csv", index_col=0).head().copy()
  # df.index = df.index.strftime("%Y-%m-%d")
  df["Volume"] = df.Volume.apply(lambda x: "{:,.0f} million".format(int(x/1000000)))
  print(tabulate(df, headers=['Day', 'Ticker', 'Open', 'Low', 'High', 'Close', 'Volume', 'Company Name'], tablefmt="latex", floatfmt="0,.2f", numalign="right"))
  @
  \caption*{The collected data for the prices time series contains six columns: Day represents the time date of the series, Open, Close, High, and Low are, respectively, the prices when the trading start, ends, the highest and lowest prices in the day and Volume corresponds to the traded amount in the day.}
\end{table}

\subsection{Pre-selection of stocks}

We collected the historical data for all 330 companies, which may have more than one listing (for example, \emph{Banco Itaú} have the preferred - ITUB3 and ordinary - ITUB4 stocks listed). So, we defined the following criteria to pre-select the stocks that would be added to our analysis:

\begin{enumerate}
  \item Select only stocks that have minimum liquidity. We want our strategy to be freely available to trade the stocks on the portfolio, without incurring in liquidity risk. In this direction, we defined a threshold of average BRL 5.000,00 volume. We exclude most of the stocks in this selection, with $124$ remaining.
  \item Select only one type of stock per company. This allows us to remove highly correlated stocks because they are from the same company. With exclude 5 additional stocks with these criteria: \emph{BBDC3}, \emph{CMIG3}, \emph{ITUB3}, \emph{LAME3}, and \emph{PETR3}.
  \item We remove stocks with too many missing values.
  \item We remove stocks that aren't connected (using $\rho_c = 0.4$) in the network.
\end{enumerate}

For the last part, we did a visual analysis of the networks generated by the stocks time series and the correlation threshold. A threshold of $\rho_c=0.15$ keeps most of the network connections alive, so we could not identify the differences between the stocks \ref{fig:HClose015}

\begin{figure}[H]
  \centering
  \includegraphics[width=0.6\linewidth, keepaspectratio]{images/H_close_15.png}
  \caption{Close prices network for $\rho_c=0.15$}
  \label{fig:HClose015}
\end{figure}

For $\rho_c=0.325$, the network still keeps too many connections, as seen in image \ref{fig:HClose0325}.

\begin{figure}[H]
  \centering
  \includegraphics[width=0.6\linewidth, keepaspectratio]{images/H_close_325.png}
  \caption{Close prices network for $\rho_c=0.325$}
  \label{fig:HClose0325}
\end{figure}

Finally, for $\rho=0.4$ we observe that we remove some of the connections and can observe different relations between the stocks. The network can be seen in image \ref{fig:HClose04}.

\begin{figure}[H]
  \centering
  \includegraphics[width=0.6\linewidth, keepaspectratio]{images/H_close_04.png}
  \caption{Close prices network for $\rho_c=0.4$}
  \label{fig:HClose04}
\end{figure}

After we applied the above criteria to our stocks, we ended up it a total of $40$ assets that will possibly be part of our portfolio.

\subsection{Data transformation}

As described above, we use the time series of the $40$ stocks to build a distance correlation matrix. We used the \emph{Python} package \emph{dcor}\footnote{\url{https://dcor.readthedocs.io/en/latest/index.html}}, and obtained the distance matrix between all stocks:

\begin{table}[H]
  \centering
  \caption{Distance correlation between 5 stocks}
  <<engine="python", echo = FALSE, results="asis">>=
  # Distance correlation
  df_train_dcor = pd.read_csv("data/close_prices_dcor.csv", index_col=0)
  print(tabulate(df_train_dcor.iloc[0:5, 0:5], headers=['ABCB4', 'BBAS3', 'BBDC4', 'BRAP4', 'BRML3'], tablefmt="latex", floatfmt="0,.2f", numalign="right"))
  @
  \caption*{The total distance matrix is of dimension $40\times40$.}
\end{table}

Then we build the stock network using the threshold of $\rho_c=0.4$. As seen in image \ref{fig:DegreeHist}, the stocks on this network have an average degree (number of connections) of $9$, and its distribution is skewed right.

\begin{figure}[H]
  \centering
  \includegraphics[width=0.6\linewidth, keepaspectratio]{images/H_close_degree_plot.png}
  \caption{Network degree distribution}
  \label{fig:DegreeHist}
\end{figure}

With the network built, we can start working on our strategy.

\section{Strategy design}

\subsection{Minimal Risk Portfolio}

Similar to the theory of portfolio optimization, we are interested in finding a portfolio with minimum risk. we will call this strategy the Minimal Risk Portfolio (MRP). To this strategy, we start by computing the intra-portfolio risk, so that we can make investment decisions that invest less capital in riskier stocks. To compute the portfolio risk, as mentioned before, we use the \emph{communicability betweenness centrality} measure. We can observe the portfolio risk in image \ref{fig:PortfolioRisk}.

\begin{figure}[H]
  \centering
  \includegraphics[width=0.6\linewidth, keepaspectratio]{images/intraportfolio_risk.png}
  \caption{Intra-portfolio risk}
  \label{fig:PortfolioRisk}
\end{figure}

We read an intra-portfolio risk plot like this: \emph{VALE3} (\emph{Companhia Vale do Rio Doce}) is $\dfrac{0.41}{0.07} = 5.86$ times riskier than \emph{GGBR4} (\emph{Gerdau}), \emph{BBDC4} (\emph{Banco Bradesco}) is $\dfrac{8.33}{1.65} = 5.05$ times riskier than \emph{RENT3} (\emph{Localiza}), ... , and \emph{ITSA4} (\emph{Itaú S.A.}) is $\dfrac{8.59}{0.16} = 53.69$ times riskier than \emph{EMBR3} (\emph{Embraer})!

With this strategy, stocks that are more connected to others (more central in the network) have the highest susception to impacts. Thus, we will invest the capital based on the inverse of the risk. We will also use a \emph{softmax} function to smoothen the distribution and avoid investing too big a share in the least risky stock. We used a \emph{temperature} value of $1.5$:

\begin{align}
  \text{w'}_{r} &= \dfrac{1}{\omega_{r} \sum_{r'}\omega_{r'}^{-1}}\\
  e_r &= e^{\frac{ln(\text{w'}_{r})}{\text{temp}}}\\
  \text{w}_{r} &= \dfrac{e_r}{\sum_{r'}e_{r'}}
\end{align}

And we obtain the following distribution for a USD 10,000.00 initial capital investment:

\begin{figure}[H]
  \centering
  \includegraphics[width=0.6\linewidth, keepaspectratio]{images/PortfolioAllocation.png}
  \caption{Portfolio allocation based on intra-portfolio risk}
  \label{fig:PortfolioAllocation}
\end{figure}

We observe that with this strategy, around 35\% of the initial capital will be invested in one stock. This might seem counterintuitive since the idea is to diversify the portfolio, but according to the risk analysis, this stock is the least prone to financial impacts based on the training data.

\subsection{Maximal Independent Set}

Another strategy based on network analysis is the Maximal Independent Set (MIS)\footnote{\url{https://en.wikipedia.org/wiki/Maximal_independent_set}}. This strategy selects the non-adjacent stocks that are the most representative in the network, in such a way that the network remains connected by the selected stocks (they form a dominating set\footnote{\url{https://en.wikipedia.org/wiki/Dominating_set}}).

Since the number of independent sets can be very large, instead of finding all the independent sets in order to find the biggest one (the MIS), we simulate $500$ randomly selected independent sets, and from this sample, we select the maximum one.

\subsection{Efficient frontier}

The efficient frontier finds the optimal portfolio based on the risk-return associated with the possible portfolios and compare it with a risk-free interest rate. For a rational investor, it would not be advised to invest in a risky portfolio that has expected return less than or equal to the risk-free rate. By comparing the expected return of the portfolios with their risk level, measured by the volatility of its stocks, the rational investor can choose the set of stocks that minimizes the risk (volatility) given a fixed expected return value. As seen in image \ref{fig:efficient_frontier}, there are many combinations of stocks that can form a portfolio, each with its own values of risk and return. The tangent line represents the expected return given by the risk-free rate.

\begin{figure}[H]
  \centering
  \includegraphics[width=0.6\linewidth, keepaspectratio]{images/efficient_frontier.jpg}
  \caption{Efficient Frontier}
  \label{fig:efficient_frontier}
\end{figure}

Portfolios that are below the efficient frontier and concentrated on the right side of the image are sub-optimal because they do not provide sufficient returns for the level of risk and have a higher risk for a certain rate of return. The optimal portfolio, then, is the one that is at the intersection of the tangent line and the set of all possible portfolios.

In our implementations, we used the \emph{Python} package \emph{pypfopt}\footnote{\url{https://github.com/robertmartin8/PyPortfolioOpt}} that has the implementation for the efficient frontier portfolio. We use this portfolio as a benchmark for comparing with our strategies.

\section{Backtesting}

In this section, we will execute our strategy on historical data. We divided the data into training (used for fitting the network) and validation (used for backtesting). The training data contains the time series from 2013-01-02 to 2016-12-29, and the validation set contains data from 2017-01-02 to 2020-05-29.

Based on the portfolio risk fitted on the training set, we assume that we make our investment on the last day of the training data, and compare our strategy with traditional ones. We will compare the results of the following strategies:

\begin{itemize}
  \item Minimal risk portfolio (MRP) (seen above)
  \item Maximal Independent Set (MIS)
  \item Efficient Frontier as proposed by Markowitz \cite{Markowitz1952}
\end{itemize}

We will also compare the returns of these strategies with the historical returns of two indexes from the Brazilian market:

\begin{itemize}
  \item \emph{Ibovespa}: The benchmark index for the Brazilian market, representing the biggest companies listed on \emph{B3} (currently contains $77$ companies);
  \item \emph{SMLL index}: Index containing the smaller companies (small cap) (currently contains $90$ companies)
\end{itemize}

We have continuous allocation shares for the strategies, and we will use a discrete allocation methodology contained in the \emph{Python} package \emph{pypfopt}\footnote{\url{https://github.com/robertmartin8/PyPortfolioOpt}}. The MRP obtain the following shares distribution:

\begin{table}[H]
  \centering
  \caption{MRP initial shares}
  <<engine="python", echo = FALSE, results="asis">>=
  # Distance correlation
  df_alloc = pd.read_csv("data/mrp_alloc.csv")
  print(tabulate(df_alloc, headers=['Stock', 'Shares'], tablefmt="latex", floatfmt="0,.2f", numalign="right", showindex=False))
  @
  \caption*{By allocating multiples of the shares, we obtained the distribution shown above for the MRP portfolio. The total invested capital was $8,927.95$.}
\end{table}


The MIS strategy will invest as below:

\begin{table}[H]
  \centering
  \caption{MIS initial shares}
  <<engine="python", echo = FALSE, results="asis">>=
  # Distance correlation
  df_mis_alloc = pd.read_csv("data/mis_alloc.csv")
  print(tabulate(df_mis_alloc, headers=['Stock', 'Shares'], tablefmt="latex", floatfmt="0,.2f", numalign="right", showindex=False))
  @
  \caption*{By allocating multiples of the shares, we obtained the distribution shown above for the MIS portfolio. The total invested capital was $9,347.39$.}
\end{table}


And the EF this one:


\begin{table}[H]
  \centering
  \caption{MIS initial shares}
  <<engine="python", echo = FALSE, results="asis">>=
  # Distance correlation
  df_ef_alloc = pd.read_csv("data/ef_alloc.csv")
  print(tabulate(df_ef_alloc, headers=['Stock', 'Shares'], tablefmt="latex", floatfmt="0,.2f", numalign="right", showindex=False))
  @
  \caption*{By allocating multiples of the shares, we obtained the distribution shown above for the EF portfolio. The total invested capital was $9,986.05$.}
\end{table}

By investing in these portfolios, we can measure the performance in the validation period. The return evolution can be seen in image \ref{fig:TestReturnCovid}.

\begin{figure}[H]
  \centering
  \includegraphics[width=0.6\linewidth, keepaspectratio]{images/test_performance_include_covid19.png}
  \caption{Returns on the validation set}
  \label{fig:TestReturnCovid}
\end{figure}

Pictured above are the daily returns for MIS (solid green curve), MRP (solid blue curve), and the Efficient Frontier (solid red curve) portfolios from January 2017 to May 2020. The color-coded dashed curves represent the 20-day rolling averages of the respective portfolios.

We can observe the following:

\begin{enumerate}
  \item Efficient frontier has a much lower performance in the period;
  \item MRP and MIS portfolios have similar dynamics;
  \item The results from MRP and MIS suggests that these approaches have good performs; and
  \item We want to remark that the Brazilian market was not too stable in recent years due to uncertainties in the political, economic, and social perspectives.
\end{enumerate}
Next, let's observe the annual returns for each portfolio and compare them with the market.


\begin{table}[H]
  \centering
  \caption{Comparison of the strategies and indexes returns on the validation period}
  \resizebox{\linewidth}{!}{%
  <<engine="python", echo = FALSE, results="asis">>=
  # Distance correlation
  returns_summary = pd.read_csv("data/returns_summary.csv", index_col=0)
  returns_summary.index.name = "Year"
  cols=["MRP", "MIS", "EF",	"Ibovespa",	"SMLL",	"MRP Rates",	"MIS Rates",	"EF Rates"]
  print(tabulate(returns_summary, headers=cols, tablefmt="latex", floatfmt="0,.2f", numalign="right"))
  @
  \label{tab:comparisonhist}
  }
  %\caption*{}
\end{table}


MRP and MIS substantially outperformed both the Ibovespa and SMLL indexes, as well as the Efficient Frontier. The higher returns, in theory, should be obtained with the trade-off of increasing the risk of the portfolio. But as we will see in the next section, this did not occur with our strategies.

\section{Visualizing drawdowns}

Illustrated in image \ref{fig:Drawdowns} is the daily rolling 252-day drawdown for MIS (green), MRP (blue), and the Efficient Frontier (salmon) along with the respective rolling maximum drawdowns (solid curves).

\begin{figure}[H]
  \centering
  \includegraphics[width=0.8\linewidth, keepaspectratio]{images/drawdown.png}
  \caption{Drawdown plot of the strategies}
  \label{fig:Drawdowns}
\end{figure}

From this image, we note:

\begin{enumerate}
  \item the MRP and MIS portfolios have significantly smaller drawdowns than the Efficient Frontier portfolio;
  \item All portfolios have roughly the same maximum drawdown (around 40-45\%) achieved in the \emph{COVID-19} crisis period; and
  \item MRP rolling maximum drawdowns are, on average, less pronounced than MIS. These results suggest the communicability betweenness centrality has predictive power as a measure of relative or intra-portfolio risk, and more generally, that network-based portfolio construction is a promising alternative to the more radiational approaches like MPT.
  \item If we do not consider the turbulent period from mid-2018 (many political events in Brazil - the arrest of former president Luis Inácio Lula da Silva, presidential elections), we can see that the MRP and MIS shows a much lower drawdown curve.
\end{enumerate}


Finally, we can compare the performance metrics for the three portfolios:


\begin{table}[H]
  \centering
  \caption{Performance metrics for the three strategies}
  <<engine="python", echo = FALSE, results="asis">>=
  # Distance correlation
  backtest_stats = pd.read_csv("data/backtest_stats.csv", index_col=0)
  backtest_stats.index.name = "Metric"
  print(tabulate(backtest_stats, headers=["MRP", "MIS", "EF"], tablefmt="latex", floatfmt="0,.2f", numalign="right"))
  @
  %\caption*{}
\end{table}


MRP and MIS outperformed the Efficient Frontier on every metric. Also, we note from table \ref{tab:comparisonhist} that the return rates from MRP and MIS are greatly superior to that of EF in the years of 2017, 2018, and 2019. In 2020 we have a market crash due to the \emph{COVID-19} pandemic, and this is reflected in all strategies. These results suggest that our strategy has the potential to be used in a real-world investment.


\chapter{Conclusion}

\section{Conclusion remarks}

We designed an algorithm to generate a minimum risk portfolio (MRP) asset weights using tools from network science. First, an asset-related statistic is established, and then an appropriate centrality measure is used to extract the asset weight. As an intermediate step, we interpret the centrality score as a measure of relative risk because it captures asset volatility and their impact on the other assets in the network.

In addition, we designed a second strategy by allocating the capital by Maximal Independent Set (MIS). This strategy finds the subset of stocks that guarantee all the stocks in the network are connected. They are the most representative stocks.

Our strategies were compared with the Modern Portfolio Theory portfolio given by the Efficient Frontier (EF) method.

The portfolios were being assessed by cumulative return, rate of return, volatility, maximum withdrawal, risk-adjusted return, and risk-adjusted-performance. In all performance indicators, Hedgecra ȅ algorithm is significantly better than the Efficient frontier of the portfolio and market.

\section{Future work}

Our model estimates parameters of the network based on the historical data of the training set. The use of the network betweenness centrality measure proved to be effective for minimizing the portfolio risk. Unfortunately, these dependency relations between the assets are not constant in time (the series are not stationary)\cite{Kenett2012, Hommes2002} and our model fails to adapt dynamically for different periods, especially for crisis-non crisis periods when the assets' correlations can vary significantly.

To extend our model, we can make use of advanced random processes techniques such as Bayesian sampler (for example, the \emph{No-U-Turn Sampler}\cite{hoffman2011nouturn} or Sequential Monte Carlo \cite{Doucet2009}) to model our strategies parameters. These techniques can be extended to estimate time-varying parameters that could further improve the performance of the model in different periods.

Another approach would be to implement Copula\footnote{\url{https://en.wikipedia.org/wiki/Copula_(probability_theory)}} in our network, so we could model the dependency between the stock as a non-linear function that changes over time. Some new researches show results in this direction, for example, \cite{Chatzis2012, Kenourgios2011, Oh2018}.

Finally, because the distance correlation can be applied to time series of different lengths, it would be a good choice for online applications where each stock series may have different lengths of their historical data, as well as more recently listed companies would also be ready to be added to the analysis. We want to extend our strategy to dynamically re-evaluate the portfolio periodically by updating the network when new information is available and possible change the portfolio weights and composition through time.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Bibliography:
%%
\cleardoublepage
\phantomsection
\addcontentsline{toc}{chapter}{References}
\nocite{*}
\bibliographystyle{plainnat}
\bibliography{bibliography}


\clearpage\end{CJK*}
\end{document}