# Replacing Tokio's `TcpStream` with async-std `TcpStream` (#475)

Conversation
@b-yap as this could be a bug in Tokio's TCP implementation, would it make sense to report a bug ticket to Tokio?
Very nice that you were able to solve this issue @b-yap!! Super strange that the stream gets stuck and is unable to read only in Kubernetes.
I just left some comments, mostly about the use of the non-async version of the `TcpStream` and whether it may block the thread in a meaningful way, particularly when reading.
(Resolved review comments on clients/stellar-relay-lib/src/connection/connector/message_reader.rs, now outdated.)
Everything looks fine, except I'm not really sure whether synchronous execution will achieve the same performance as async. Is there any particular reason why `std::net::TcpStream` was used instead of `async_std::net::TcpStream`?
@bogdanS98 @gianfra-t I just checked: adding trace logs for the stream's readiness to read data.
* async-std
* works, but still in progress
* working test without extra signals
* remove comments and re-add connector drop trait
* cleanup
* fix the failing test about current slot
* fix the failing test about current slot, by connecting to specifically different nodes
* update config files
* use a different account for testing
* fix rustfmt

Co-authored-by: Gianfranco <[email protected]>
(force-pushed from `f88ef24` to `c52db4f`)
(title changed from "Replacing Tokio's TcpStream with Rust's std lib TcpStream" to "Replacing Tokio's TcpStream with async-std TcpStream")
All looks good to me @b-yap! Amazing that you found out how to solve this weird problem.
```diff
 let shutdown_sender = ShutdownSender::new();

 // We use a random secret key to avoid conflicts with other tests.
 let agent = start_oracle_agent(
-    get_test_stellar_relay_config(true),
+    specific_stellar_relay_config(true, 0),
```
Just curious, why do we always want to get the first node choice here, and then always the second on the other test?
ah, sometimes an `already connected peer` error happens. I tried connecting to different nodes just to see if it works 🤞. The next best thing would be to actually connect with different Stellar accounts; currently we only have 1 test account to play with.
Looks good to me too, great job 🎉 I think these changes even simplify the connectivity logic we had, because we don't have to pass the two halves of the stream around, which is even better.
Did you already test with Zoltan whether this implementation works in Kubernetes @b-yap?
Great! Looks good to me 👍🏼
Couldn't have done it without you guys, @bogdanS98 @gianfra-t. @ebma yes I did.
Somehow in K8s, Tokio's `TcpStream` is stuck at "write ready" and is never "read ready".
Even after connecting using Rust's std `TcpStream` and then converting it to Tokio's, the stream never changes to "read ready".
That's when I decided to use the `async-std` library.
In terms of code changes, these are the differences you will see:

* `Connector` now owns the `TcpStream`, so I found no need to split it.
* Rust std `TcpStream`: can only try to clone; doesn't need `async`; can set it in the stream.
* This also implies removing some tokio features that were added primarily for TCP reasons: `net` and `io-util`.
How to begin the review:
* `TcpStream` is owned by the `Connector`: `fn new(...)` becomes `fn start(...)`, where it also starts connecting to the Stellar Node. `StellarOverlayConnection`'s `fn connect(...)` will directly use `Connector`'s `fn start(...)`. No more `fn create_stream(...)` calls.
* `fn disconnect()` is renamed to `fn stop()`, to make it similar to other structs (`Agent`, `Connector`).
* In `fn create_stream(...)`, `r_stream: &mut tcp::OwnedReadHalf` is replaced with `mut stream: TcpStream`, removing `async` in these methods.
* `fn read_message_from_stellar(...)`: the `Connector` is being passed around instead of the stream. This eliminates the cloning (previously, to pass the stream around it had to be cloned, hence the call to `try_clone()`).
* `fn specific_stellar_relay_config(...)` to specifically connect to a certain config. There are only 3 choices, based on the index of its list: