
Replacing Tokio's TcpStream with async-std TcpStream #475

Merged
merged 13 commits into main from connection-issues-investigation on Jan 30, 2024

Conversation

@b-yap (Contributor) commented Jan 18, 2024

Somehow in K8s, Tokio's TcpStream gets stuck at "write ready" and never becomes "read ready".
Even after connecting with Rust's std TcpStream and then converting it to Tokio's, the stream never becomes "read ready".

That's when I decided to drop Tokio's TcpStream entirely and switch to the async-std library.
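For illustration (a hypothetical sketch, not code from this PR), readiness on Tokio's TcpStream can be probed with its `ready(Interest)` API; the trace logs mentioned in the commit list below may have looked roughly like this:

```rust
use tokio::io::Interest;
use tokio::net::TcpStream;

// Debugging sketch: wait for readiness and log what the OS reports.
// In the K8s deployments described above, writable was reported but
// readable reportedly never became true.
async fn probe_readiness(stream: &TcpStream) -> std::io::Result<()> {
    let ready = stream.ready(Interest::READABLE | Interest::WRITABLE).await?;
    tracing::trace!(
        readable = ready.is_readable(),
        writable = ready.is_writable(),
        "tcp stream readiness"
    );
    Ok(())
}
```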

In terms of code changes, these are the differences you will see:

|  | Tokio | Async-Std | Rust Std |
| --- | --- | --- | --- |
| Splitting the stream | Can split the stream into read/write halves, with the write half owned by a struct (`Connector`) | Provides an example, but `Connector` now owns the `TcpStream`, so I found no need to split | Can only try to clone (`try_clone()`) |
| Async | Everything is async | Async | Doesn't need async |
| Timeouts | Need to add timeout logic | Will potentially need it | Can set it in the stream |
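To make the timeout row concrete, here is a minimal sketch (my own illustration, with an assumed 30-second duration, not code from this PR):

```rust
use std::time::Duration;

// Rust std: the timeout is a property of the stream itself.
fn connect_std(addr: &str) -> std::io::Result<std::net::TcpStream> {
    let stream = std::net::TcpStream::connect(addr)?;
    stream.set_read_timeout(Some(Duration::from_secs(30)))?; // built in
    Ok(stream)
}

// async-std (and similarly Tokio): the stream has no timeout setting, so
// each operation is wrapped in a timeout future instead.
async fn read_with_timeout(
    stream: &mut async_std::net::TcpStream,
    buf: &mut [u8],
) -> std::io::Result<usize> {
    use async_std::io::ReadExt;
    async_std::io::timeout(Duration::from_secs(30), stream.read(buf)).await
}
```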

This also implies removing some tokio features that were added primarily for TCP reasons: `net` and `io-util`.


How to begin the review:

  1. clients/stellar-relay-lib/src/connection/connector/connector.rs
    • As mentioned above, the whole TcpStream is now owned by the Connector (see the sketch after this list):

      ```diff
      pub struct Connector {
          ...
      -   write_stream_overlay: OwnedWriteHalf,
      +   tcp_stream: TcpStream,
      }
      ```

    • renamed `fn new(...)` to `fn start(...)`, which also starts connecting to the Stellar Node
  2. clients/stellar-relay-lib/src/overlay.rs
    • StellarOverlayConnection's `fn connect(...)` will directly use Connector's `fn start(...)`. No more `fn create_stream(...)` calls.
    • renamed `fn disconnect()` to `fn stop()` to match the other structs (Agent, Connector).
  3. clients/stellar-relay-lib/src/connection/helper.rs
    • removal of `fn create_stream(...)`
  4. clients/stellar-relay-lib/src/connection/connector/message_reader.rs
    • every `r_stream: &mut tcp::OwnedReadHalf` is replaced with `mut stream: TcpStream`, removing async in these methods.
    • in `fn read_message_from_stellar(...)`:
      • instead of "peeking" first, it now actually "reads" the first 4 bytes, to avoid a double operation. The previous logic would 1. peek first and 2. then read; doing only 1 operation saves time (see the sketch after this list).
      • to pass the stream around, it has to be cloned; hence the call to `try_clone()`.
      • a reference to the Connector is passed around instead, which eliminates cloning.
  5. clients/vault/src/oracle/testing_utils.rs
    • created a helper function, `fn specific_stellar_relay_config(...)`, to connect to one specific config. There are only 3 choices, based on the index in its list:
      • testnet has: sdftest1, sdftest2, sdftest3
      • mainnet has: Iowa, Frankfurt, Singapore
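Below is a rough sketch tying items 1 and 4 together. It is my illustration of the shape described above, not the PR's exact code: everything beyond `Connector`, `tcp_stream`, `start`, and `read_message_from_stellar` is an assumed name, and the 4-byte prefix is treated as a plain big-endian length for simplicity.

```rust
use async_std::io::ReadExt;
use async_std::net::TcpStream;

pub struct Connector {
    // ... other fields elided ...
    tcp_stream: TcpStream, // owns the whole stream; no read/write halves
}

impl Connector {
    /// Replaces `fn new(...)`: builds the Connector and opens the TCP
    /// connection to the Stellar Node in one step.
    pub async fn start(address: &str) -> std::io::Result<Self> {
        let tcp_stream = TcpStream::connect(address).await?;
        Ok(Connector { tcp_stream })
    }

    /// async-std's TcpStream is cheaply cloneable (std's equivalent is
    /// `try_clone()`), so a reading loop can hold its own handle while
    /// the Connector keeps the original.
    pub fn stream_clone(&self) -> TcpStream {
        self.tcp_stream.clone()
    }
}

/// Reads the first 4 bytes directly instead of peeking and then reading,
/// so every message costs one operation fewer.
async fn read_message_from_stellar(mut stream: TcpStream) -> std::io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    stream.read_exact(&mut len_buf).await?;
    let len = u32::from_be_bytes(len_buf) as usize;
    let mut body = vec![0u8; len];
    stream.read_exact(&mut body).await?;
    Ok(body)
}
```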

@b-yap b-yap requested a review from a team January 18, 2024 07:44
@TorstenStueber (Member)

@b-yap as this could be a bug in Tokio's TCP implementation, would it make sense to report a bug ticket to Tokio?

@gianfra-t (Contributor) left a comment

Very nice that you were able to solve this issue @b-yap!! Super strange that the stream gets stuck and is unable to read only in Kubernetes.

I left some comments, generally about using the non-async version of the TcpStream and whether it may block the thread in a meaningful way, particularly when reading.

@bogdanS98 (Contributor) left a comment

Everything looks fine, except I'm not really sure whether synchronous execution is going to achieve the same performance as async. Is there any particular reason why `std::net::TcpStream` was used instead of `async_std::net::TcpStream`?

@b-yap (Contributor, Author) commented Jan 22, 2024

@bogdanS98 @gianfra-t
After Gian's comment about using `spawn_blocking(...)`, I am also finding that std is NOT the best solution.
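(For context, a rough sketch of what the `spawn_blocking` approach implies; hypothetical code, not from this PR. With std's blocking TcpStream inside a Tokio runtime, every read has to hop onto a blocking thread, and the stream has to be moved into and back out of the closure each time.)

```rust
use std::io::Read;
use std::net::TcpStream;

// Hypothetical: a blocking std read driven from async code via Tokio's
// spawn_blocking. The stream is moved into the closure and handed back.
async fn read_exact_blocking(
    stream: TcpStream,
    n: usize,
) -> std::io::Result<(TcpStream, Vec<u8>)> {
    tokio::task::spawn_blocking(move || {
        let mut stream = stream;
        let mut buf = vec![0u8; n];
        stream.read_exact(&mut buf)?;
        Ok((stream, buf))
    })
    .await
    .expect("blocking task panicked")
}
```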

I just checked async_std's repo and the last update was about 9 months ago?
I'm running through their issues and found:

b-yap and others added 13 commits January 26, 2024 13:12

- adding trace logs for the stream's readiness to read data
- async-std
- works, but still in progress
- working test without extra signals
- remove comments and re-add connector drop trait
- cleanup
- fix the failing test about current slot
- fix the failing test about current slot, by connecting to specifically different nodes
- update config files
- use a different account for testing
- fix rustfmt

Co-authored-by: Gianfranco <[email protected]>
@b-yap b-yap force-pushed the connection-issues-investigation branch from f88ef24 to c52db4f on January 26, 2024 05:13
@b-yap b-yap changed the title from "Replacing Tokio's TcpStream with Rust's std lib TcpStream" to "Replacing Tokio's TcpStream with async-std TcpStream" on Jan 26, 2024
@b-yap b-yap requested review from a team, bogdanS98 and gianfra-t January 26, 2024 06:40
@gianfra-t (Contributor) left a comment

All looks good to me @b-yap! Amazing that you found out how to solve this weird problem.

```diff
 let shutdown_sender = ShutdownSender::new();

 // We use a random secret key to avoid conflicts with other tests.
 let agent = start_oracle_agent(
-    get_test_stellar_relay_config(true),
+    specific_stellar_relay_config(true, 0),
```
Contributor:

Just curious, why do we always want to get the first node choice here, and then on the other test always the second?

@b-yap (Contributor, Author):

ah, sometimes an "already connected peer" error happens. I tried connecting to different nodes just to see if it works 🤞. The next best thing would be to connect with different Stellar accounts; currently we only have 1 test account to play with.

@ebma (Member) left a comment

Looks good to me too, great job 🎉 I think these changes even simplify the connectivity logic we had, because we no longer have to pass the two halves of the stream around, which is even better.

Did you already test with Zoltan whether this implementation works in Kubernetes @b-yap?

@bogdanS98 (Contributor) left a comment

Great! Looks good to me 👍🏼

@b-yap b-yap merged commit 56b9601 into main Jan 30, 2024
2 checks passed
@b-yap b-yap deleted the connection-issues-investigation branch January 30, 2024 05:00
@b-yap (Contributor, Author) commented Jan 30, 2024

Couldn't have done it without you guys, @bogdanS98 @gianfra-t.

@ebma yes, I did.
