Includes several changes to improve flow #13

Open: wants to merge 45 commits into base `5-Post`

neutropolis
Member

I've been adapting some parts of the text. I couldn't cover all the aspects, so I'll add a few comments on your original PR as well.

@@ -1,22 +1,15 @@
# A Match Made in Trading: Step-by-Step Pairs Trading Guide

Contributor

What do you think of this title? **Implementing Pairs Trading in Kdb+ with a Tick Architecture**
I know it's not very fancy, but at least it's descriptive :)

Member Author

It's fine, but perhaps too obvious. I think we have to come up with something more attractive.

@@ -29,14 +22,25 @@ The concept we're referring to is **cointegration** (although there are other me
> 💡 Not to be confused with correlation: cointegration is a statistical property of two time series, indicating a long-term relationship between them despite short-term fluctuations. Cointegrated series move together over time, sharing a common stochastic drift. On the other hand, correlation measures the strength and direction of the linear relationship between two variables at a specific point in time. While correlation captures the degree of association between variables, cointegration reflects a deeper, underlying relationship that persists over time.

Hence, we're interested in **cointegrated assets**, which are assets that exhibit the following characteristics:

Contributor

Is this section about Pair Trading or Cointegration? Let me explain. I think that if we want to introduce Pair Trading as a general strategy, it might be enough to mention that we are looking for two indices that have a relationship, and then explain this in more detail in the chapter on Cointegration. So in this section, I would focus on explaining that the idea is to spot when these related indices start to diverge, in order to arbitrage.

Member Author

This is a very interesting thing. I thought they were kind of synonyms, but I completely agree that we should keep things as simple as possible here and possibly skip cointegration until the proper section.

As can be seen, we assume that the HDB process is listening on a given `port` on the local machine. We just `hopen` a connection with it and get the `hdb` handle.

Contributor

Here we lose the consistency of explaining the code first and then showing it, because here we show it first and explain it afterwards. I'm not sure if it's important, just mentioning it :)

Member Author

I never asked for such a constraint. Showing several large paragraphs and then the code at the end is one thing (and could be fine in certain situations). But here, the idea is to explain one fragment of code in each paragraph, or at least explain it little by little. It's OK to introduce it, show the code and then go deeper into certain details right afterwards, while remaining in the very same paragraph.

The body of the function might seem pretty familiar to the SQL practitioner. In fact, we are exploiting **qSQL syntax** here, which leverages a syntax similar to SQL but optimised for kdb+. It is also worth noting that `.z.d` represents the current date, so we are interested in the `n` days back from today.

Contributor

... interested in the `tr` days back from today

Member Author

I think `n` fits better here: after my requested change it's not a range, it's a number of days. I'll adapt the text accordingly.


In our case, we are going to use closing prices to feed the ADF test, so we have to index (`@`) by column **close** from each pair in our table. Additionally, we take (`#`) the last **tr** days of data for both indexes, and finally apply our **fcoint** function to each (`.'`) pair of data lists.

Contributor

Remove the (`#`) explanation, because we don't use it (this bug is also present in the original post :) )

Member Author

This is one of the fragments that I wanted to change, but I'd like to discuss it with you first. There's a comment about this in the original PR.

```q
upd:{.u.pub[`spread;([]time:1#y`time;spread:sp . y`bid)]};
```
This function essentially takes the current prices of SP500 and NASDAQ100 as input, calculates the spread, and sends it to its subscribers. In this sense, the dashboard should subscribe to the RPT in much the same way as the RPT subscribed to the TP.

Contributor

I don't think this section is bad, but it greatly simplifies what actually happens. As I said, I don't mind, but I thought that wasn't the objective, since the quant might have to work here and would have no idea what they are doing.

Member Author

Let me review it so we can add more details in the right direction.

Member Author

I've added a few things. I think the new version collects all your original ideas, but in a (hopefully) simpler and easier-to-read way. Is anything missing? We can discuss it.

## Determining how to calculate the spreads

-After this initial market assesment, we can move on to coding the sctual Pair Trading model that calculates the spreads. The first approach we may try could simply be to subtract them and observe if the difference deviates significantly from zero, considering their scale difference.
+After this initial market assesment, we can move on to coding the actual Pair Trading model that calculates the spreads. The first approach we may try could simply be to subtract them and observe if the difference deviates significantly from zero, considering their scale difference.

Contributor

I think we should very briefly explain what it means to be overpriced or underpriced since we don't mention it in the entire section, yet it is the reason (I know it's also not present in the original post :) )

Member Author

Feel free to make whatever changes you like here. I think you'll explain it better than I would, but I can try if you want me to.

```q
upd:{.u.pub[`spread;([]time:1#y`time;spread:sp . y`bid)]};
```
This function essentially takes the current prices of SP500 and NASDAQ100 as input, calculates the spread, and sends it to its subscribers. In this sense, the dashboard should subscribe to the RPT in much the same way as the RPT subscribed to the TP.

Contributor

I would also explain differently, or simply remove, how the dashboard connects to the RPT. It's not the same, in the sense that we can take advantage of the UI and don't need to subscribe using `.u.sub`.

Member Author

Sure, we can say that Dashboards handles the subscription behind the scenes. I'll come back to this next week.

```q
t:`sym xgroup hdb(query_prices;4*365;syms)
```
To communicate with the process, we pass a list to the `hdb` handle with the first element being the function and the subsequent elements being the arguments (last 4 years & involved indexes). Once we get our data we finally `xgroup` by index.

Contributor

We're missing a step that closes the connection with `hclose`.

Member Author

I think we could get rid of it, but if you find it relevant, I can add a brief paragraph at the end.
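
For reference, a minimal sketch of what the extra step could look like. It only reuses the `hdb` handle, `query_prices` and `syms` names from the quoted fragment; the port number, the `prices` table name and the body of `query_prices` are assumptions made for illustration, not the post's actual code.

```q
/ Sketch only: port 5012, the prices table and the body of query_prices are assumptions
syms:`SP500`NASDAQ100
query_prices:{[n;s] select from prices where date>.z.d-n,sym in s}
hdb:hopen 5012                               / open a handle to the HDB on localhost
t:`sym xgroup hdb(query_prices;4*365;syms)   / synchronous request: the result is now local
hclose hdb                                   / release the handle once it is no longer needed
```

Since the request is synchronous, `t` is fully populated before `hclose` runs, so closing the handle right after the query should be safe.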


Real-time components can register their interest in a particular table and in a subset of symbols. As a result of the previous steps, we know we are interested in the quotes for SP500 and NASDAQ100:

Contributor

We should briefly explain why those two hardcoded symbols appear (`` `SP500`NASDAQ100 ``), shouldn't we? I mean, perhaps the reader would expect something more automated, right?

Member Author

Yes, this is one of the problems that I feel while connecting pieces. They seem like independent things while DFA and RPT are completely tied. What we do is weird and difficult to justify, but I'm not sure if we could change it easily. Let me give it a try.

@nipsn left a comment

Lots of comments, but all very minor. Overall, it will look perfect to me once Christian's comments are addressed.

I think my next review will only be grammatical.

PostPairsTrading.md Outdated Show resolved Hide resolved

We'll proceed methodically, ensuring each question leads to a comprehensive answer. To start, we'll contextualize our current situation by addressing key questions such as **"What do we know about the market and how can we benefit from it?"**. This will lay the foundation for constructing a real-time simulated environment on Pairs Trading that will exemplify everything we have explained thus far.
We will use the [_Tick Architecture_](https://github.com/KxSystems/kdb-tick), the typical setup found in kdb+ systems, to introduce, contextualize and guide each step. As we acknowledge that the typical reader is a quant interested in learning the virtues of q, a concise introduction to this architecture will be provided. We have also aimed to briefly introduce pairs trading so that other developers can follow and benefit from this text. Please feel free to skip the following section if you are familiar with it.

Is it KDB, kdb or Kdb? On the same note, is it Q or q? Very minor but this should be consistent

Member Author

I always use kdb+ and q, I think this is the most standard way of referring to them, taking into account Whitney's convention, but you're right, I'll adapt to the original style from the text: KDB+ and Q.


1. They have a similar trend, meaning the difference between both assets maintains a constant mean, and this difference fluctuates around that same mean.
The tick architecture is designed to handle high-frequency trading data efficiently. At its heart is the tickerplant (TP), a crucial component responsible for receiving and timestamping incoming data, then broadcasting it to other components such as the real-time database (RDB) and historical database (HDB). The TP ensures that data is distributed in a timely manner, allowing for real-time analytics and decision-making. The RDB stores recent data for quick access, while the HDB archives older data for long-term storage and analysis. Additionally, the feedhandler plays a vital role by interfacing with external data sources, making sure that the TP receives accurate and up-to-date information. This architecture guarantees seamless data flow and rapid access to both real-time and historical data, making it ideal for high-frequency trading applications.

Maybe we should explain (in a single line, perhaps) what it is at a higher level and how it fits our "project" before going deep into its components.

Member Author

I think I just copied your intro to the tick architecture, but I find it ok if you want to extend it. We don't talk much about the performance and the general benefits, perhaps it would be nice to briefly discuss it right here.


One valid concern is that our calculations might be heavily influenced by past data and rely too much on historical changes that may not accurately reflect the present reality. To address this, we could implement a rolling window approach where the linear regression is continuously updated, ensuring our model remains responsive to changes in the underlying data over time. Additionally, using the Kalman Filter to dynamically fit the alpha and beta of the linear regression can effectively filter noise and predict states in a dynamic system, allowing for real-time adjustments and providing a more accurate reflection of current market conditions. We will delve deeper into the topic of window signals as well, exploring more advanced techniques and their applications in real-time pair trading. This will further enhance our model's responsiveness and accuracy, providing a robust framework for effective trading strategies.
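
The rolling-window idea mentioned above can be sketched as follows. This is only an illustration under stated assumptions: `ols`, `win`, `rollfit`, `px`, `py` and the window length are hypothetical names and values that do not come from the post, the toy prices are made up, and the Kalman filter variant is not shown.

```q
/ Sketch only: all names and sample values below are hypothetical
ols:{[x;y] b:cov[x;y]%var x; (avg[y]-b*avg x;b)}   / least squares of y on x: (alpha;beta)
win:{[n;x] x[(til n)+/:til 1+count[x]-n]}          / all sliding windows of length n (needs n<=count x)
rollfit:{[n;x;y] ols'[win[n;x];win[n;y]]}          / refit alpha and beta on every window

px:10 11 12 13 14 15f                / toy closing prices of one index
py:21 23 25 27 29 31f                / toy closing prices of the other (py=1+2*px)
ab:last rollfit[4;px;py]             / most recent (alpha;beta), fitted on the last 4 points
spread:last[py]-ab[0]+ab[1]*last px  / residual of the latest observation
```

Because the toy series are exactly linear, the fitted beta is 2 and the latest spread is zero. With real prices, the coefficients would drift from window to window, which is what keeps the model responsive to recent data.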
Namely, the performance of these technologies is superb, both in terms of speed and memory footprint. In fact, as stated by KX in one of the articles linked above, kdb+ supports a throughput of 35K ticks per second for a single node. If we scale the solution to several nodes, we should easily accommodate thousands of pairs in the system simultaneously. We leave scaling this solution up as future work.

Contributor

"35K ticks per second ...": have you tested this? Or where did this number come from?

Member Author

I think it's clearly stated, isn't it? I can supply the link again if you find it necessary...

```q
matrix:fcoint .' 0f^@\:[;`close](@/:[t]')syms cross syms
cof:@[;1]co[<]::
```

Contributor

I think this is a good moment to introduce or comment on tacit programming. I propose:

As you can see, this function definition doesn't make explicit reference to its arguments. This is achieved thanks to how KDB+/q allows us to compose functions. This style of programming, where function arguments are not explicitly named, is known as tacit programming or point-free style. In KDB+/q, tacit programming is particularly powerful and concise, allowing us to create expressive and efficient code. It leverages the language's capabilities for function composition and implicit argument passing, resulting in elegant solutions like the one we've just seen.
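
A small, self-contained illustration of the same idea, kept separate from the post's own code. The names `item1` and `secondSmallest` are hypothetical; the sketch only relies on projection and on the Compose operator.

```q
/ Sketch only: item1 and secondSmallest are hypothetical names
item1:@[;1]                      / projection of Index At: take the item at index 1
secondSmallest:('[item1;asc])    / composition: sort ascending, then apply item1
secondSmallest 5 3 9 1           / neither definition names its argument
```

The last line sorts the list to `1 3 5 9` and picks the item at index 1, that is, `3`.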

Member Author

I like it. I think it would fit nicely as a quoted hint, what do you think? Also, I'd supply a link as a reference.

3 participants