Includes several changes to improve flow #13

neutropolis · 2024-06-07T11:49:06Z

I've been adapting some parts of the text. I couldn't cover all the aspects, so I'll add a few comments on your original PR as well.

chraberturas · 2024-06-07T13:24:22Z

PostPairsTrading.md

@@ -1,22 +1,15 @@
 # A Match Made in Trading: Step-by-Step Pairs Trading Guide


What do you think of this title? **Implementing Pairs Trading in Kdb+ with a Tick Architecture**
I know it's not very fancy, but at least it's descriptive :)

It's fine but perhaps too obvious. I think we have to come up with something more attractive.

chraberturas · 2024-06-07T13:24:31Z

PostPairsTrading.md

@@ -29,14 +22,25 @@ The concept we're referring to is **cointegration** (although there are other me
 > 💡 Which should not be confused with correlation; cointegration is a statistical property of two-time series, indicating a long-term relationship between them despite short-term fluctuations. Cointegrated series move together over time, sharing a common stochastic drift. On the other hand, correlation measures the strength and direction of the linear relationship between two variables at a specific point in time. While correlation captures the degree of association between variables, cointegration reflects a deeper, underlying relationship that persists over time.

 Hence, we're interested in **cointegrated assets**, which are assets that exhibit the following characteristics:


Is this section about pairs trading or cointegration? Let me explain. If we want to introduce pairs trading as a general strategy, it might be enough to mention that we are looking for two indices that have a relationship; the chapter on cointegration can then explain this in more detail. So in this section, I would focus on explaining that the idea is to detect when these related indices start to diverge, so that we can arbitrage the divergence.

This is a very interesting point. I thought they were more or less synonyms, but I completely agree that we should keep things as simple as possible here and possibly skip cointegration until the proper section.

chraberturas · 2024-06-07T13:24:34Z

PostPairsTrading.md

 ```
+As can be seen, we assume that the HDB process is listening at a given `port` in the local machine. We just `hopen` a connection with it and get the `hdb` handle.


Here we lose the consistency we had of explaining the code first and then showing it: in this case we show it and then explain it. I'm not sure if it's important, just mentioning it :)

I never asked for such a constraint. It is one thing to write several large paragraphs and only show the code at the end (which could be fine in certain situations). Here, however, the idea is to explain one fragment of code in each paragraph, or at least to explain it little by little. It's OK to just introduce the code, show it, and right after that go deeper into certain details, while remaining in the very same paragraph.

chraberturas · 2024-06-07T13:24:37Z

PostPairsTrading.md

 ```
+The body of the function might seem pretty familiar to the SQL practicioner. In fact, we are exploiting **qSQL syntax** here, which leverages a syntax similar to SQL but optimised for kdb+. It might also interesting to say that `.z.d` represents the current date, so we are interested on the `n` days back from today.


... interested in the `tr` days back from today

I think `n` fits better here: after my requested change it is a number of days, not a range. I'll adapt the text accordingly.
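
For reference while we settle the wording, here is a minimal sketch of a query that goes `n` days back from `.z.d`. The in-memory `prices` table and the body of `query_prices` are assumptions made for illustration only; the real function lives in the HDB and may differ.

```q
/ hypothetical in-memory table standing in for the HDB prices table
prices:([]date:.z.d-til 10;sym:10#`SP500`NASDAQ100;close:100+til 10)

/ n: number of days back from today; syms: list of index symbols
/ assumed body, not necessarily the one used in the article
query_prices:{[n;syms] select from prices where date>=.z.d-n,sym in syms}

/ rows dated from 3 days ago up to today, for both indexes
query_prices[3;`SP500`NASDAQ100]
```

Since `n` is subtracted from a date, it reads naturally as a number of days rather than as a range.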

chraberturas · 2024-06-07T13:24:40Z

PostPairsTrading.md


+In our case, we are going to use closing prices to feed the ADF test, so we have to index (`@`) by column **close** from each pair in our table. Additionally, we take (`#`) the last **tr** days of data for both indexes, and finally apply our **fcoint** function to each (`.'`) pair of data lists.


Remove the (`#`) explanation, because we don't use it (this bug is also present in the original post :) )

This is one of the fragments that I wanted to change, but I would like to discuss it with you first. There's a comment about this in the original PR.

chraberturas · 2024-06-07T13:25:01Z

PostPairsTrading.md

 ```q
 upd:{.u.pub[`spread;([]time:1#y`time;spread:sp . y`bid)]};
 ```
+This function essentially takes the current prices of SP500 and NASDAQ100 as input, calculates the spread, and sends it to its subscribers. In this sense, the dashboard should subscribe to the RPT in a similar fashion as the RPT subscribed to the TP.


I don't think this section is bad, but it greatly simplifies what actually happens. Again, I don't mind, but I thought that wasn't the objective, since the quant might have to work here without knowing what is going on.

Let me review it so we can add more details in the right direction.

I've added a few things. I think the new version collects all your original ideas in a (hopefully) simpler and easier-to-read way. Do you miss anything? We can discuss it.

chraberturas · 2024-06-07T13:31:27Z

PostPairsTrading.md

 ## Determining how to calculate the spreads

-After this initial market assesment, we can move on to coding the sctual Pair Trading model that calculates the spreads. The first approach we may try could simply be to subtract them and observe if the difference deviates significantly from zero, considering their scale difference.
+After this initial market assesment, we can move on to coding the actual Pair Trading model that calculates the spreads. The first approach we may try could simply be to subtract them and observe if the difference deviates significantly from zero, considering their scale difference.


I think we should very briefly explain what it means for an asset to be overpriced or underpriced, since we don't mention it anywhere in the section, yet it is the reason behind the strategy (I know it's also not present in the original post :) )

Feel free to make whatever change you see fit here. I think you'll explain it better than I would, but I can try if you want me to.
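
To make the overpriced/underpriced idea concrete, a rough sketch could look like the one below. Everything in it is an assumption for illustration: the prices are made up, the standardisation step is just one simple way to account for the scale difference, and the threshold of 1 is arbitrary. It is not the model used in the post.

```q
/ made-up closing prices for two hypothetical assets (not real market data)
a:100 101 103 102 104f
b:100 100.5 101 101.5 102f

/ naive spread: plain difference between the two series
spread:a-b

/ standardise the spread so that its scale does not matter
z:(spread-avg spread)%dev spread

/ 1 where a looks overpriced relative to b, -1 where it looks underpriced, 0 otherwise
/ the threshold of 1 is an arbitrary choice for this sketch
signal:(z>1)-z< -1
```

When the standardised spread is well above its mean, `a` is expensive relative to `b`, and the pairs trader would expect the gap to close; the opposite holds when it is well below.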

chraberturas · 2024-06-07T13:58:57Z

PostPairsTrading.md

 ```q
 upd:{.u.pub[`spread;([]time:1#y`time;spread:sp . y`bid)]};
 ```
+This function essentially takes the current prices of SP500 and NASDAQ100 as input, calculates the spread, and sends it to its subscribers. In this sense, the dashboard should subscribe to the RPT in a similar fashion as the RPT subscribed to the TP.


I would also explain how the dashboard connects to the RPT in a different way, or just remove that part. It is not the same mechanism: we can take advantage of the UI, so we don't need to subscribe using `.u.sub`.

Sure, we can say that Dashboards handles the subscription behind the scenes. I will come back to this next week.

chraberturas · 2024-06-07T14:03:54Z

PostPairsTrading.md

+```q
+t:`sym xgroup hdb(query_prices;4*365;syms)
+```
+To communicate with the process, we pass a list to the `hdb` handle with the first element being the function and the subsequent elements being the arguments (last 4 years & involved indexes). Once we get our data we finally `xgroup` by index.



I think closing the connection with `hclose` is missing here.

I think we could get rid of it, but if you find it relevant, I can add a brief paragraph at the end.

chraberturas · 2024-06-07T14:07:05Z

PostPairsTrading.md


+Real-time components can manifest their interest for a particular table and for a subset of symbols. As a result from previous steps, we know we are interested on the quotes for SP500 and NASDAQ100:


We should briefly explain why those two hardcoded symbols (`` `SP500`NASDAQ100 ``) appear, shouldn't we? I mean, perhaps the reader expects something more automated, right?

Yes, this is one of the problems I notice when connecting the pieces. They look like independent things, while DFA and RPT are completely tied. What we do is weird and difficult to justify, but I'm not sure we can change it easily. Let me give it a try.

nipsn

Lots of comments, but all very minor. Overall it will be perfect for me once Christian's comments are addressed.

I think my next review will only be grammatical

nipsn · 2024-06-10T06:21:37Z

PostPairsTrading.md


-We'll proceed methodically, ensuring each question leads to a comprehensive answer. To start, we'll contextualize our current situation by addressing key questions such as **"What do we know about the market and how can we benefit from it?"**. This will lay the foundation for constructing a real-time simulated environment on Pairs Trading that will exemplify everything we have explained thus far.
+We will use the [_Tick Architecture_](https://github.com/KxSystems/kdb-tick), the typical setup found in kdb+ systems, to introduce, contextualize and guide each step. As we acknowledge that the typical reader is a quant interested in learning the virtues of q, a concise introduction to this architecture will be provided. We have also aimed to briefly introduce pairs trading so that other developers can follow and benefit from this text. Please, feel free to skip the following section if you are familiar with it.


Is it KDB, kdb or Kdb? On the same note, is it Q or q? Very minor but this should be consistent

I always use kdb+ and q; I think this is the most standard way of referring to them, following Whitney's convention. But you're right about consistency, so I'll adapt to the original style of the text: KDB+ and Q.

nipsn · 2024-06-10T06:29:45Z

PostPairsTrading.md


-1. They have a similar trend, meaning the difference between both assets maintains a constant mean, and this difference fluctuates around that same mean.
+The tick architecture is designed to handle high-frequency trading data efficiently. At its heart is the tickerplant (TP), a crucial component responsible for receiving and timestamping incoming data, then broadcasting it to other components such as the real-time database (RDB) and historical database (HDB). The TP ensures that data is distributed in a timely manner, allowing for real-time analytics and decision-making. The RDB stores recent data for quick access, while the HDB archives older data for long-term storage and analysis. Additionally, the feedhandler plays a vital role by interfacing with external data sources, making sure that the TP receives accurate and up-to-date information. This architecture guarantees seamless data flow and rapid access to both real-time and historical data, making it ideal for high-frequency trading applications.


Maybe we should explain (in a single line perhaps) what it is at a higher level and how it fits our "project" before going deep into its components

I think I just copied your intro to the tick architecture, but I find it ok if you want to extend it. We don't talk much about the performance and the general benefits, perhaps it would be nice to briefly discuss it right here.

chraberturas · 2024-06-28T08:07:23Z

PostPairsTrading.md


-One valid concern is that our calculations might be heavily influenced by past data and rely too much on historical changes that may not accurately reflect the present reality. To address this, we could implement a rolling window approach where the linear regression is continuously updated, ensuring our model remains responsive to changes in the underlying data over time. Additionally, using the Kalman Filter to dynamically fit the alpha and beta of the linear regression can effectively filter noise and predict states in a dynamic system, allowing for real-time adjustments and providing a more accurate reflection of current market conditions. We will delve deeper into the topic of window signals as well, exploring more advanced techniques and their applications in real-time pair trading. This will further enhance our model's responsiveness and accuracy, providing a robust framework for effective trading strategies.
+Namely, the performance of these technologies is superb, both in terms of speed and memory footprint. In fact, as stated by KX in one of the articles linked above, kdb+ supports a throughput of 35K ticks per second for a single node. If we scale the solution to several nodes, we should easily accommodate thousands of pairs in the system simultaneously. We leave scaling this solution up as future work.


"35K ticks per second ...": have you tested this? If not, where did this number come from?

I think it's clearly stated, isn't it? I can supply the link again if you find it necessary...

chraberturas · 2024-06-28T08:12:07Z

PostPairsTrading.md

 ```q
-matrix:fcoint .' 0f^@\:[;`close](@/:[t]')syms cross syms
+cof:@[;1]co[<]::


I think this is a good moment to introduce or comment on tacit programming. I propose:

As you can see, this function definition doesn't make explicit reference to its arguments. This is achieved thanks to how KDB+/q allows us to compose functions. This style of programming, where function arguments are not explicitly named, is known as tacit programming or point-free style. In KDB+/q, tacit programming is particularly powerful and concise, allowing us to create expressive and efficient code. It leverages the language's capabilities for function composition and implicit argument passing, resulting in elegant solutions like the one we've just seen.

I like it. I think it would fit nicely as a quoted hint, what do you think? Also, I'd supply a link as a reference.
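
If we go for the quoted hint, a tiny side-by-side might help the reader. This is only a sketch with made-up names (`doubleExplicit`, `double`, `negAbs`); it is unrelated to the `cof` definition in the diff and just contrasts the two styles.

```q
/ explicit style: the argument x is referenced in the body
doubleExplicit:{2*x}

/ tacit style via projection: no argument is named
double:(2*)

/ tacit style via composition: abs is applied first, then neg
negAbs:('[neg;abs])

double 1 2 3      / 2 4 6
negAbs 3 -4       / -3 -4
```

Both tacit definitions behave like ordinary unary functions, so they can be passed to iterators or composed further.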

Includes several changes to improve flow

29ef30b

neutropolis requested review from nipsn and chraberturas June 7, 2024 11:49

neutropolis mentioned this pull request Jun 7, 2024

Create PostPairsTrading.md #11

Open

chraberturas requested changes Jun 7, 2024

nipsn suggested changes Jun 10, 2024

neutropolis and others added 8 commits June 10, 2024 09:02

Address Christian's suggestions

2fe16e9

Update PostPairsTrading.md

4eae809

Co-authored-by: Oscar Nydza <[email protected]>

Update PostPairsTrading.md

df6fd68

Co-authored-by: Oscar Nydza <[email protected]>

Update PostPairsTrading.md

24c40fc

Co-authored-by: Oscar Nydza <[email protected]>

Update PostPairsTrading.md

8eee997

Co-authored-by: Oscar Nydza <[email protected]>

Update PostPairsTrading.md

5f5d91a

Co-authored-by: Oscar Nydza <[email protected]>

Update PostPairsTrading.md

27f719c

Co-authored-by: Oscar Nydza <[email protected]>

Update PostPairsTrading.md

4eedcee

Co-authored-by: Oscar Nydza <[email protected]>

neutropolis commented Jun 10, 2024

neutropolis and others added 15 commits June 10, 2024 11:14

Update PostPairsTrading.md

575786a

Minor change to Tick introduction and new illustrations

a8c23af

ML model refactor and some minor changes

43db9d6

minor tweaks

0ab2ce1

My revision (til step 1, included)

6de9863

Minor changes

d50c65a

unique syms

f5bc8a2

Minor changes on last sections

1fddb78

Merge branch 'neutropolis/5-Post' of github.com:hablapps/pairstrading…

8d3f0a2

… into neutropolis/5-Post

refactored matrix/pvalues code

348fed4

Merge branch 'neutropolis/5-Post' of https://github.com/hablapps/pair…

2ca7900

…strading into neutropolis/5-Post

very picky minor change

e2f1c59

My last changes on this revision

2991690

pvalues code and explaination refactored and title

b50cef7

done some tasks

2f5d63b

nipsn and others added 14 commits June 26, 2024 14:09

new diagrams

75c4273

spreads calculation refactored

6ef1087

replaced linear regression image

9e9554b

ms diagram

fc803d1

Updates introduction and first section

610a24f

Update PostPairsTrading.md

7015745

Fixes link

244a805

Updates real-time section

d836d31

adf test refactored into cointegration test

15825f9

Minor changes after re-reading article

fb70c2e

Updates conclusions

9939ce5

Supplies several references

58ae84f

adressed coint task and math latex flavour

22e5039

Merge branch 'neutropolis/5-Post' of https://github.com/hablapps/pair…

84e6565

…strading into neutropolis/5-Post

chraberturas reviewed Jun 28, 2024

neutropolis and others added 7 commits June 28, 2024 12:13

(Hopefully) my final changes

cd274ce

Merge branch 'neutropolis/5-Post' of github.com:hablapps/pairstrading…

512dfab

… into neutropolis/5-Post

tacit programming comment

eb12368

added Jesus in acknowledgements

ca34a55

Update PostPairsTrading.md

7552ca5

Minor changes (Javier's feedback)

9a88eab

adressed Javier tasks

758de2b
