Create PostPairsTrading.md #11

Open
wants to merge 69 commits into
base: 2-basic-study-of-pair-trading-using-linear-regression

Kokechacho
Contributor

Description:

This pull request aims to create a Markdown file for a post on pairs trading, aggregating all the content previously developed. The post will serve as a comprehensive guide to pairs trading, covering various aspects such as the concept, rationale, implementation, and practical examples.

Changes Proposed:

  1. Create a new Markdown file named "PostPairsTrading.md".

  2. Structure the Markdown file with appropriate headings and sections, including:

     - Introduction to Pairs Trading
     - Rationale and Concept
     - Methodology and Implementation
     - Practical Examples
     - Conclusion

  3. Incorporate content from previous materials, including explanations, code snippets, and visualizations, ensuring clarity and coherence.

  4. Review and proofread the content to ensure accuracy, consistency, and conciseness.

  5. Include references or citations for any external sources or materials used in the post.

Purpose:

The purpose of this pull request is to compile and organize the existing materials on pairs trading into a cohesive and informative Markdown file. By creating a structured and well-documented post, we aim to provide readers with a comprehensive resource for understanding and implementing pairs trading strategies effectively.

Contributor

@chraberturas left a comment

Good work 👍
Just try to explain more about the q code (even briefly).
One more thing: please put all your images/GIFs in a separate folder.

Member

@neutropolis left a comment

I've just reviewed the first half of the article. I like the style, it's easy to follow, but I think that we need to refactor several snippets of code and reorder some sections to make the article flow better.

@nipsn left a comment

Overall ok, but there were some small spelling mistakes.

The most important thing is that if you decide to use a set nomenclature for something, e.g. "Pairs Trading", keep it consistent. It can't be "pairs trading" one time, then "Pair trading" the next, etc.

Same goes for "pykx", but in this case, it should always be "PyKX". I have noticed your GIFs have this mixed up as well, so please fix them.

If this is going to our blog, all code mentions outside code blocks should be surrounded by backticks ("`"). I'm not sure I have pointed to all of them, so please go through the post line by line and check for this.

Also, I'm not sure how the blog handles markdown quotes (lines beginning with > ). At the moment we can keep those, but we may need to change it later on before merging to the main branch on our blog.

Also, shouldn't it be "kdb+/q" or something similar? I have never seen "Q/kdb+" in the wild. Please make this change.

I have most likely missed some things, so I may need to give it a quick review later on.


Be sure to stay tuned for more posts and updates on this blog to deepen your knowledge even further.

Special thanks to [...] for [...]

This seems to be unfinished...

Member

Complete the "special thanks", don't forget Javier.


For now, it's crucial to clarify **our spread formulation and understand what it represents**. With this knowledge, we can identify instances where one asset is overpriced while the other is underpriced.

One might argue that our calculations are heavily influenced by past data, and that we rely too much on historical changes that **may not accurately reflect the present reality**. This is indeed a **valid concern**. To address this issue, we can utilize **the Kalman Filter**, a mathematical method for filtering noise and predicting states in a dynamic system. But we'll delve into the Kalman Filter in our upcoming posts as previously mentioned.
Member

This is the second reference to a future Kalman filter post in this section. I think there's some redundancy with it.

Contributor Author

This is true, but I'm still figuring out how to put it, because the next post will explain two different things regarding pairs trading:

  1. Signal windows
  2. Kalman filter

Member

@neutropolis left a comment

A few minor changes after the new review.

Also, we had the discussion about making the linear regression section more accessible, with visual content.

![Prices](https://github.com/hablapps/pairstrading/blob/5-Post/resources/Prices%20gif.gif?raw=true)


The graphs illustrate the concept of cointegration between two indexes. The top two graphs show the prices of SP500 (left) and NASDAQ100 (right) over the same time period. We can observe that the price movements of these two indices follow similar patterns, suggesting some level of cointegration.
Member

Could we (very briefly) clarify it in the text?


The graphs illustrate the concept of cointegration between two indexes. The top two graphs show the prices of SP500 (left) and NASDAQ100 (right) over the same time period. We can observe that the price movements of these two indices follow similar patterns, suggesting some level of cointegration.

The bottom graph displays the prices of both indices together, providing a clearer comparison. The blue line represents SP500, and NASDAQ100 is represented by the red line. The close alignment of their price movements indicates that they are cointegrated to some extent. This means that, despite short-term deviations, the indices tend to move together as time goes on, maintaining a stable relationship.
Member

Not sure if I already mentioned this in a previous review. What do we consider short-term here? The graphic we are showing covers just 15 seconds or so.

Contributor

We've added an image that we think clarifies everything here.

@@ -0,0 +1,247 @@
# A Match Made in Trading: Step-by-Step Pairs Trading Guide
Member

Are you ok with this title?


Using this approach, we will end up with something like this:

![SpreadsD](resources/spreads.gif)
Member

This isn't loading properly in GitHub.

Member

I still can't see this, so can't review.

```q
spread: priceY[.streamPair.i][`bid] - ((priceX[.streamPair.i][`bid] * beta_lr)+alpha_lr);
```

> 💡 You may notice that we retrieve bid price data from our price stream using an index (`.streamPair.i`). This occurs because we simulate the arrival of these records dynamically, based on a delta time, and thus read from our real-time simulated table, utilizing our updated index `.streamPair.i` with each new record.
Member

I think I understand what is going on here, but it's hard to follow.

Contributor

refactored


Now that we have selected a pair of cointegrated indices and understand how to calculate their relationships, let's see how we can create a real-time pair trading scenario.

> ⚠️ An important note is that this post will include a real-time simulation. In other words, if we wanted to develop a 100% real-time product, we would need to make slight adjustments to the code.
Member

Does this post include a real-time simulation? I think we aren't discussing that further on, I mean, we aren't showing data here. Maybe you just wanted to refer to the repo.

Contributor

I think it's clearer now.

Member

Could we move this warning further on? Later, you mention again that this is a simulation ("In our case, we are simulating real time; we do not have a 100% real-time product.") and I feel it's redundant. Could we integrate both references in the same paragraph?


After all this, you should be able to understand **our spread formulation and what it represents**. Basically, we can identify instances where one asset is overpriced while the other is underpriced.

One might argue that our calculations are heavily influenced by past data and that we rely too much on historical changes that **may not accurately reflect the present reality**. This is indeed a **valid concern**. To address this issue, we could implement **a rolling window approach** where the linear regression is continuously updated.
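
A rolling-window refit of the regression could look roughly like the Python sketch below. It is not the code from this PR (which is written in q): the class name `RollingRegression`, the window size, and the sample prices are all assumptions made for illustration. Each new price pair pushes the oldest one out of the window, and alpha and beta are recomputed from what remains.

```python
from collections import deque


class RollingRegression:
    """Hypothetical helper: refits y = alpha + beta * x on the last `window` points."""

    def __init__(self, window):
        # deque(maxlen=...) silently drops the oldest point once the window is full
        self.xs = deque(maxlen=window)
        self.ys = deque(maxlen=window)

    def update(self, x, y):
        """Add one observation and return (alpha, beta), or None if a fit is not possible yet."""
        self.xs.append(x)
        self.ys.append(y)
        n = len(self.xs)
        if n < 2:
            return None
        mean_x = sum(self.xs) / n
        mean_y = sum(self.ys) / n
        var_x = sum((a - mean_x) ** 2 for a in self.xs)
        if var_x == 0:
            return None  # all x values identical: slope is undefined
        cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(self.xs, self.ys))
        beta = cov_xy / var_x
        alpha = mean_y - beta * mean_x
        return alpha, beta


# Made-up prices: the relationship changes from y = 1 + 2x to y = 3x
model = RollingRegression(window=3)
for x, y in [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]:
    first_fit = model.update(x, y)
for x, y in [(4.0, 12.0), (5.0, 15.0), (6.0, 18.0)]:
    second_fit = model.update(x, y)
```

Once the window contains only points from the new regime, the fitted coefficients follow the new relationship, which is the behaviour the paragraph above is after. The window length trades responsiveness against noise and would need tuning on real data.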
Member

I think that we start here with future work, I'd move it to the conclusions.

Contributor

refactored


```q
resY: priceY[.streamPair.i];
s: resY[`bid] - alpha_lr+resX[`bid] * beta_lr;
enlist `dateTime`spread`mean!
("p"$(resX[`dateTime]);"f"$(s);"f"$(0));
```
Member

So mean is always 0? Do we need parentheses?

I think it's ok to have a simplification but I don't get the objective of this function, I mean, it just returns a dictionary but nobody is collecting the result into a table or sending it to the dashboard. Maybe I'm getting it wrong.

Contributor

Yes, when I simplified it I missed the part where we publish. I've just added it :)


A simple approach to window signals is to set these windows as twice the historical standard deviation of the spreads. Therefore, if either of these limits is reached, we should sell the overvalued index and buy the undervalued one, and then unwind our position when the spread returns to 0. Let's clarify this with a specific example:

![SpreadsD](resources/window_signals.gif)
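
The rule described above (open a position when the spread reaches twice the historical standard deviation, unwind when it returns to 0) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the q code from the PR: the function name `make_signals`, the signal labels, and all spread values are made up.

```python
import statistics


def make_signals(spreads, hist_std, k=2.0):
    """Hypothetical signal generator for a spread series.

    Opens a trade when the spread reaches +/- k * hist_std and
    unwinds it once the spread comes back to 0.
    """
    upper, lower = k * hist_std, -k * hist_std
    position = 0  # 0 = flat, -1 = short Y / long X, +1 = long Y / short X
    signals = []
    for s in spreads:
        if position == 0:
            if s >= upper:
                # spread too high: Y is overvalued relative to X
                position = -1
                signals.append("sell Y / buy X")
            elif s <= lower:
                # spread too low: X is overvalued relative to Y
                position = 1
                signals.append("buy Y / sell X")
            else:
                signals.append("hold")
        elif (position == -1 and s <= 0) or (position == 1 and s >= 0):
            # spread has returned to its mean of 0: close the trade
            position = 0
            signals.append("unwind")
        else:
            signals.append("hold")
    return signals


# Made-up numbers, for illustration only
historical_spreads = [-1.0, 1.0, -1.0, 1.0]
hist_std = statistics.pstdev(historical_spreads)
live_spreads = [0.5, 2.1, 1.0, -0.1, -2.5, -1.0, 0.0]
signals = make_signals(live_spreads, hist_std)
```

Keeping the position as explicit state avoids re-opening a trade on every tick while the spread stays outside the window.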
Member

This is very nice! 👍

Just one minor thing, what are the values in the X axis?

Contributor

The x-axis should represent time, but we struggled to plot in real time because it froze when we tried to edit the x-axis labels.



KDB+/Q stands out as **a powerful tool in finance**, renowned for its ability to handle vast volumes of real-time data amidst the relentless dynamics of the market. In this article, we embark on an insightful exploration of Pairs Trading and its implementation in Q, offering a comprehensive guide to one of the most popular strategies in the trading world.

Our objective is to **provide an easy-to-understand explanation of some of the intricacies of pair trading**, bridging the gap between theory and practice.
Member

This is a derived objective, but not the main one. With the new perspective, the main objective is:

  • Implementing Pairs Trading strategy in q
    • Identify cointegrated indexes
    • Implement model to calculate spreads
  • Get it working in real-time
    • Integrate with tick architecture


![Architecture](resources/general-architecture.png)

As you can see, there are a few new components added to the picture. The details will be discussed as we go on, but I'm sure you can already recognize some familiar faces.
Member

I think we should assume that this is their very first contact with kdb (so they shouldn't recognise elements from the diagram). Here I would clarify that this article is aimed at quants in order to motivate them to adopt kdb+/q, due to its incredible performance and concision. But also, we'll supply a brief introduction to Pairs Trading so other developers can benefit from it as well.


1. They have a similar trend, meaning the difference between both assets maintains a constant mean, and this difference fluctuates around that same mean.

2. This inherent relationship persists in the long run, meaning that our series is not dependent on time.
Member

@neutropolis, Jun 3, 2024

I think that we should add a new section that supplies a very brief intro to the components of the classical Tick Architecture. I would end it by outlining the sections that follow: first, we'll identify cointegrated pairs; then, we'll produce the Pairs Trading model to calculate spreads; finally, we'll exploit that model from a real-time component and visualise its outcomes.


## A pair in the hand is worth two in the bush

### ADF testing
Member

Now, this section becomes "Identifying cointegrated indexes" or something like that...


**P-values** help us decide whether to reject the null hypothesis. If the p-value is low, it indicates that we can reject the hypothesis that the time series is non-stationary, suggesting that our assets are cointegrated. The lower the p-value, the greater the confidence in rejecting the null hypothesis. It is very common to use a threshold of 0.05 on the p-value to reject hypotheses.

For the sake of simplicity, we will be using [PyKX](https://code.kx.com/pykx/2.4/index.html). This is necessary as we require importing our ADF test function and plotting a heatmap of our results. Developing these functionalities directly in Q would be time-consuming and prone to errors. Although an implementation of the ADF test in KDB+/Q would be more efficient and faster, the effort required would outweigh the benefits. Therefore, we rely on PyKX to streamline the process by leveraging relevant libraries from the Python ecosystem.
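
The p-value rule described above can be illustrated with a small Python sketch. It assumes the test results are already available as a mapping from symbol pairs to p-values. The function name `cointegrated_pairs`, the `DAX` symbol, and every p-value below are hypothetical and are not results from the post.

```python
def cointegrated_pairs(pvalues, threshold=0.05):
    """Return (sym_a, sym_b, p) for pairs whose p-value is below the threshold.

    A low p-value lets us reject the null hypothesis of non-stationarity,
    so these are the candidate cointegrated pairs. Results are sorted so
    that the strongest candidates come first.
    """
    hits = [
        (a, b, p)
        for (a, b), p in pvalues.items()
        if a != b and p < threshold
    ]
    return sorted(hits, key=lambda item: item[2])


# Made-up p-values, for illustration only
pvalues = {
    ("SP500", "NASDAQ100"): 0.01,
    ("SP500", "DAX"): 0.30,
    ("NASDAQ100", "DAX"): 0.04,
}
selected = cointegrated_pairs(pvalues)
```

With the conventional 0.05 threshold, only the pairs below it are kept as candidates; a stricter threshold would simply be passed as the second argument.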
Member

... outweigh the benefits for this particular case where performance isn't critical.

Member

Why don't we import pykx right here as we did before?


> 💡 This strategy possesses financial characteristics: our **profitability remains unaffected by the broader market trends**, as our focus lies solely on the disparity between the two assets. It's about relative movements rather than absolute ones; we're indifferent to whether prices are rising or falling. This quality defines it as a **neutral market strategy**.

## Spreading spreads
Member

Ok, now let's go for the Pair Trading model that calculates spreads. Could we find a nicer name for this section?

```q
alpha_lr:alphaF[t[`SP500]`close;t[`NASDAQ100]`close]
```

This precisely meets one of our objectives: getting **a comprehensive method for representing relative changes between both assets**. As we can deduce, our mean is now 0 because our assets are normalized, cointegrated and on the same scale. Therefore, ideally, the differential between their prices should be 0. Consequently, when our spread is below 0, we infer that asset X is overpriced, whereas if it's above 0, then asset Y is overpriced.
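
To make the spread formulation concrete, here is a minimal Python sketch of the same idea: fit `y = alpha + beta * x` by ordinary least squares on historical prices, then take the spread as the residual `y - (alpha + beta * x)`. The helper names `fit_line` and `spread` and the price series are assumptions for illustration; the PR's own implementation is the q code quoted in this thread.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    beta = cov_xy / var_x
    alpha = mean_y - beta * mean_x
    return alpha, beta


def spread(price_x, price_y, alpha, beta):
    """Residual of Y against the fitted line: positive means Y looks overpriced."""
    return price_y - (alpha + beta * price_x)


# Made-up historical prices, for illustration only
hist_x = [100.0, 101.0, 102.0, 103.0, 104.0]
hist_y = [201.0, 202.5, 205.5, 206.5, 209.0]
alpha, beta = fit_line(hist_x, hist_y)

# Residuals of a least-squares fit with an intercept average out to 0,
# which is why the spread series fluctuates around a mean of 0
residuals = [spread(x, y, alpha, beta) for x, y in zip(hist_x, hist_y)]
```

A positive spread means Y trades above the level the fitted line predicts from X (Y overpriced), and a negative spread means the opposite, matching the interpretation in the paragraph above.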
Member

Again, we should rely on the diagram here. We can emphasise the HDB, the line that reads from the HDB and the RPT (but not emphasising dashboards).

```q
.z.ts: {.u.pub[`spreads;update priceY - alpha_lr+priceX * beta_lr from prices]}
\t 100
```
Member

I miss the pub method from the RPT component itself, where you accumulate on the prices table. Which component is responsible for cleaning the published spreads?


And there we have it! **A perfectly plotted spread series in real-time**, ready to be utilized for further analysis and exploitation.

## What's left to start making a profit?
Member

I'm not sure if we should create another section for this, maybe we could place this at the end of the previous one.


> 💡 Signal windows play a pivotal role in implementing Pairs Trading strategies. They serve as indicators for determining when to execute buy and sell actions, acting as arbitrary thresholds that guide our algorithm's decision-making process. These windows are derived from the variance of our data, representing a static variance assumption due to our consideration of a time-independent cointegrated series.

## Conclusion and Future Work
Member

The content is fine, but requires some reordering. We should explicitly address the objectives that we stated in the introduction. Future work is fine.


We will be covering the following aspects using the KDB tick architecture as a base:

![Architecture](resources/general-architecture.png)
Member

Very important: we need to add a reference to the original picture from DefConQ


```q
pyhm:.pykx.import[`seaborn]`:heatmap
pyhm[pvalues;`xticklabels pykw syms;`yticklabels pykw syms;`cmap pykw `RdYlGn_r]
```
Member

I'd add a very short sentence describing what is going on here. I'm unable to decipher the RdYlGn_r.

In our case we are going to use closing prices for our ADF test, so we have to index (`@`) by column **close** from each pair in our table. Additionally, we take (`#`) the last **tr** days of data for both indexes, and finally apply our **fcoint** function to each (`.'`) pair of data lists.

```q
matrix:fcoint .' 0f^@\:[;`close](@/:[t]')syms cross syms
```
Member

Would it make sense to just select sym and close in rs? (Take a look at my proposed text #13 to understand why)

We simply need to declare an `sp` function that calculates the spread given the prices, but this is something we already know how to do.

```q
sp:{y - alpha_lr + beta_lr * x};
```
Member

This should be defined in the previous section, since the model is an input for this step. I prefer hiding alphas and betas in this section. What do you think?



> 💡 This strategy possesses financial characteristics: our **profitability remains unaffected by the broader market trends**, as our focus lies solely on the disparity between the two assets. It's about relative movements rather than absolute ones; we're indifferent to whether prices are rising or falling. This quality defines it as a **neutral market strategy**.

## Determining how to calculate the spreads
Member

I want to discuss two aspects from this section with you. Let me know when you find the time! Just as a reminder for myself:

  • HDB -> RPT link
  • price_y and price_x source (rs?)

4 participants