Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create BackupWriter and BackupReader traits for async unit backup #443

Closed
wants to merge 1 commit into from

Conversation

joschisan
Copy link
Contributor

@joschisan joschisan commented May 1, 2024

This pr is reproposing the original design in #399 before it was switched to using AsyncRead and AsyncWrite. The initial design proposed here is currently implemented in our fork and allows a very clean integration with our async database abstraction over RocksDB. Please find the integration into Fedimint here https://github.com/fedimint/fedimint/blob/master/fedimint-server/src/atomic_broadcast/backup.rs

The current API using AsyncWrite is quite low level and therefore quite unpractical for us, however I did not notice at the time that the api design in the original pr was changed otherwise I would have brought it up at the time. My apologies for the unnecessary back and forth this caused you; I should have caught this at the time.

I am reopening the issue now as we intend to switch from our fork back to upstream.

Please note that reading the backup was never really the issue and I am proposing it with two traits here for symmetry which makes it a little cleaner imo. We could therefore also stick to AsyncRead instead of defining a new trait or just take a byte vector / boxed slice directly since we are reading it all into a vector at once anyway.

I am aware that there were concerns around introducing new traits for this, however, after implementing both sides integrating via these traits it seems to me that this interface is the cleanest on both sides so it seems worth it.

Copy link

github-actions bot commented May 1, 2024

Please make sure the following happened

  • Appropriate tests created
  • Infrastructure updated accordingly
  • Updated existing documentation
  • New documentation created
  • Version bumped if breaking changes

@timorleph
Copy link
Contributor

We would strongly prefer not to change the API again, and our reasons for preferring the already existing traits mentioned in #399 still hold. You should be able to implement the Async* traits for your Backup* structs with no major problems, although maybe parts of them will look a tad awkward. The awkwardness is somewhat intentional, since the backup is supposed to be more akin to a transaction log from a DB, which should be lower level than a DB.

@dpc
Copy link

dpc commented May 10, 2024

You should be able to implement the Async* traits for your Backup* structs with no major problems, although maybe parts of them will look a tad awkward

"tad awkward" is an understatement.

It's such a complication for no good reason. AsyncWrite/Read are just a ill suited (too low level, overly generic, forcing completely pointless complexity on the implementation) abstraction used here for the wrong reasons.

The items here are not going to be 16MBs (will they?), to justify needing streaming API that AsyncWrite/Read. The alepbhft is already (and most probably forever) only ever going to call write_all + flush, because most probably fundamentally it needs to read the whole thing in memory to validate it, doesn't have a need to deal with streaming of very large sets of data, so it will always have whole thing contiguous in memory and will just write everything all at once. So what could have been a single async fn save_item(&[u8]) -> io::Result<()> turns into implementing way more complex https://docs.rs/futures-io/latest/futures_io/trait.AsyncWrite.html unloved nightmare because theoretically alephbft code could change in the future and start calling multiple writes (it won't, but the interface here is ill suited and more generic). The whole thing will probably break in the future, because the whole async io traits are not really stable and figured out yet (that's why they are in 3rd party crate, no `std), AFAIR.

To sum up "tad awkward":

  • extra dependency
  • extra gnarly (and possibly buggy) code
  • extra performance hit
  • extra breaking change when the IO-traits dust settles

Anyway @joschisan it is definitely possible to implement impl AsyncWrite for SomeOurStructDoingTheJob that does a whole gathering (and copying, wasting the performance) in poll_write dance, and write to db on flush&close. We either need to find some helper, or write that raw-polling-async code ourselves. Not much point debating here. @maan2003 You know your async ecosystem. Do you know of a good existing wrapper like that?

@maan2003
Copy link
Contributor

this poll dance for async write doesn't look trivial.
I don't think it is possible to convert async fn save(&self, &mut [u8]) into async write, because caller can just poll_write with a different buffer next time.

I don't know of any such wrappers.

@dpc
Copy link

dpc commented May 10, 2024

this poll dance for async write doesn't look trivial.
I don't think it is possible to convert async fn save(&self, &mut [u8]) into async write, because caller can just poll_write with a different buffer next time.

The implementation needs to copy bytes into its own buffer that gets cleared on flush/close, no way around it. If that's what you mean.

@maan2003
Copy link
Contributor

oh right, you can just make all async stuff happen on flush

@kostekIV
Copy link
Contributor

oh right, you can just make all async stuff happen on flush

or on write making flush a noop – this is relatively common.

Anyway, we are closing this.

@kostekIV kostekIV closed this May 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants