Async I/O #399
Worth noting that in the async case I'd also like |
@sgrif maybe something like the "async iterator" in JS? Which would return an |
That's a possibility as well, but it's untrue to how I'll actually have access to the data, which is in batches. If we're doing true async IO I also will not have the length ahead of time, so it'd have to be an actual iterator not Vec<Future>, at which point why not just do a true stream? |
Please note that futures-rs also has a Stream trait! |
@sgrif off-topic, but curious as to the reasons for moving off of libpq. Could you also post a link to the podcast? |
@macobo - I'm guessing it's The Bike Shed. |
Sorry that I didn't reply -- I had an unexpected baby. Yes, that is the correct link to the podcast. |
Well there is tokio-postgres now which you can use for your async stuff. |
Just want to clarify, will async make the 1.0 milestone? |
No. |
So... I've fully implemented a PoC for this twice now. I'm going to close this for the time being. This is not a statement that async I/O will never happen, but it is a statement that it is not currently on the roadmap, and I do not consider this to be actionable at this time.

Neither tokio nor futures have stable APIs at the moment (tokio in particular is supposedly going to have a major overhaul at some point). At absolute minimum, we need to wait for things to settle down there. However, there are problems that need to be solved in those APIs as well. I found that in order to get to an API that I was happy with, I had to resort to some pretty nasty hacks which had a lot of gotchas and which I'm not comfortable shipping. It seemed that ownership hits you way harder here as well. It's virtually impossible to have anything that isn't

With the state of things as they are today, we would also need to re-implement connection pooling to make this work. This is on top of what would already be an enormous amount of work. This is not just "make it async". When we switch to tokio, we can no longer use the client libraries provided to us by the database creators. We need to learn the wire protocol that database uses, and implement it at the TCP level.

Finally, I hear a lot of people saying that the lack of async support is the big blocker for them using Diesel, but I'm not buying it. PostgreSQL is the only backend likely to get an async driver any time soon (I do not know the MySQL wire protocol, and it is mostly undocumented).

Anyway, with all that said, I'm going to close this issue. Open issues in this repo should represent something that is actionable, and that you could reasonably open a pull request to fix. I do not consider this to be actionable with the state of async in Rust today. I will keep my eye on that situation, and if that changes, I will open a new issue for this. But you should not expect async Diesel in the near future. |
I understand your frustrations with the current API, and understand you don't want to build async support just yet, only to re-do it later. All of that makes perfect sense, and this is your project so absolutely do what you feel is best. I just want to point out that your rationale for "async isn't a big deal" is full of selection bias (i.e. inaccurate), and you're doing yourself a disservice by believing those stats.
You're using the stats for a particular crate as evidence. However, this is inaccurate: people who view async as a blocker are not using sync Rust instead. They are simply not using Rust. At least for me, I do not use Rust for any web-based applications because, while hyper is pretty good, there is no great ecosystem around async in Rust yet. |
I want to re-iterate: I am not trying to say that "async isn't a big deal". I am trying to say it is not currently a priority, and it is not feasible for us to attempt to accomplish it at this point in time.
While you're probably right, and it may not have been a great stat to bring up, I'm not sure I agree with "if I can't use Diesel I'm just not going to use Rust" being the norm in this case. I'm also not trying to use the stats for
This is what those people would be using. But again, I'm probably calling too much attention to what was by far the smallest part of my reasoning for closing this (which I should also mention was discussed with the rest of the core team). This is mostly about it not being something we can do right now, which is why I wanted to close this issue. If circumstances change and async-diesel becomes feasible in the future, we'll open a new issue. But I want to avoid keeping issues open which cannot reasonably be closed with a pull request (this would of course be a very large pull request) |
Just a heads up on Pleingres, “An asynchronous, Tokio-based, 100% Rust postgres driver”, as mentioned in the Pijul blog. |
I don't know if this exists yet, but perhaps documentation (e.g. an example on the website) about how to integrate Diesel into an async project without blocking the whole event loop would help. If this example does connection pooling, then this seems like a reasonable solution until the async ecosystem in Rust stabilizes some. I'm using Diesel on a site that I hope will have high-volume database operations (mostly simple CRUD, but relatively high volume, as in potentially hundreds or even thousands of more-or-less concurrent operations). I'll be figuring out how to do pooling properly, but I honestly wouldn't go through this effort if I wasn't dead-set on using Rust and another option had this out of the box (e.g. Go is "good enough"). And honestly, an async interface into a thread pool of synchronous connections is probably better for a high-traffic site anyway, which is probably the bulk of the "async or bust" crowd. That's not to say that async in Diesel isn't useful, just that we can probably solve the problem for most people with documentation. |
This method is considered a workaround. Just use:

```rust
use futures::prelude::*;
use futures_cpupool::CpuPool;
use diesel;

pub type Conn = diesel::pg::PgConnection;

// Note: this takes `&self`, so it is meant to live in an `impl` block of a
// type that owns (or can clone a handle to) diesel's connection pool.
pub fn exec_async<T, E, F, R>(&self, f: F) -> impl Future<Item = T, Error = E>
where
    T: Send + 'static,
    E: From<::diesel::result::Error> + Send + 'static,
    F: FnOnce(&Conn) -> R + Send + 'static,
    R: IntoFuture<Item = T, Error = E> + Send + 'static,
    <R as IntoFuture>::Future: Send,
{
    // A process-wide thread pool for the blocking database work.
    lazy_static! {
        static ref THREAD_POOL: CpuPool = CpuPool::new_num_cpus();
    }

    let pool = /* clone a ref of diesel's connection pool */;
    THREAD_POOL.spawn_fn(move || {
        pool
            .get()
            .map_err(|err| E::from(err))
            .map(|conn| f(&conn))
            .into_future()
            .and_then(|f| f)
    })
}
```

Then you can:

```rust
exec_async(|conn| {
    diesel::insert_into(??).values(??).execute(conn)
}).and_then(...)
```
not saying it's "wrong" |
@ivanovaleksey I can't say it's correct, as I didn't test it (e.g. I'm not sure about the trait bounds). In theory it should work. However,
So it works, but I'm not a fan of that approach. |
@ivanovaleksey Pretty much what @chpio said. This code simply masks the blocking operations behind a thread pool. This doesn't help in the high-load use case, where a large number of incoming DB queries will cause unacceptable delays for each of them. |
See also the actix-diesel example https://github.com/actix/examples/blob/f8e3570bd16bcf816070af20f210ce1b4f2fca69/diesel/src/main.rs#L64-L70. Perhaps offering a canonical pool implementation might be useful; that way diesel could have an async/futures API without using async drivers (yet). This sync-driver / async-API approach is how Slick (Scala) works https://stackoverflow.com/a/29597868/1240268 and might suffice to appease the "people who view async as a blocker" (at least some of them). Aside:
IME it's desirable to have a fixed pool size (with potentially multiple pools); that way all connections are made on startup... and generally your db server won't allow an unlimited number of connections anyway. |
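As an illustration of the fixed-size pool idea above, here is a minimal sketch using diesel's built-in r2d2 support (behind the `r2d2` feature flag). The pool size of 16 and the shape of `build_pool` are assumptions made for this example, not a recommendation from the thread:

```rust
use diesel::pg::PgConnection;
use diesel::r2d2::{ConnectionManager, Pool};

fn build_pool(database_url: &str) -> Pool<ConnectionManager<PgConnection>> {
    let manager = ConnectionManager::<PgConnection>::new(database_url);
    Pool::builder()
        // Fixed size: keep min idle equal to max so all connections are
        // established up front and the pool never grows beyond this.
        .max_size(16)
        .min_idle(Some(16))
        .build(manager)
        .expect("failed to build connection pool")
}
```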
This article by the SQLAlchemy author is a very good read on the topic (while it's about Python, many of the facts listed there still hold in the Rust world): |
I just want to understand something and hope someone can clarify. Interesting discussion so far. Based on #399 (comment), I understand that various async database libraries are able to provide async features, but with the drawbacks described. It's not implemented in Diesel yet because those approaches are unsound due to limitations tracked in rust-lang/rust#62290. Following #399 (comment), what, in general, are the specific cases where a noticeable difference in performance can be observed? I'd like to understand the very specific use cases @weiznich mentioned. It'll help me and others reading this correctly weigh up all the useful information already here. |
I expect that you can observe a measurable difference in performance if you: |
There is also the very common use case where the application is using an async framework like |
This is true of using Diesel within any async context; until an io uring non-blocking solution is viable, the only reasonable way of using Diesel is with |
@Sytten Can you provide some numbers for that claim? Otherwise this is just guessing. |
Actix Web uses a thread-per-core single-threaded tokio runtime, which disables work-stealing, so when you block the runtime it's worse than in a standard multi-threaded tokio runtime. The thing is though, this isn't a common use case, it's an easily avoidable anti-pattern. Diesel should only be used in conjunction with |
I think @Bajix summed it up well. I use https://github.com/smol-rs/blocking personally, but I know actix-diesel is a thing and |
I think @weiznich's point is whether making Diesel async from the ground up will make it any more performant than running queries on a blocking thread pool. With a crate like tokio-diesel, integration in an async environment is just as seamless. |
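For context, here is a minimal sketch of the pattern the last few comments describe: running synchronous Diesel work on a blocking thread pool from async code. It uses tokio's `spawn_blocking` directly rather than one of the helper crates mentioned above, assumes diesel 2.x-style `&mut` connections, and the connection URL and query are placeholders:

```rust
use diesel::prelude::*;
use diesel::{sql_query, PgConnection, QueryResult};
use tokio::task;

// Run a blocking diesel operation without stalling the async runtime's
// worker threads, by moving it onto tokio's dedicated blocking thread pool.
async fn ping_database(database_url: String) -> QueryResult<usize> {
    task::spawn_blocking(move || {
        // Everything inside this closure is ordinary synchronous diesel code.
        // In a real application the connection would come from a pool.
        let mut conn = PgConnection::establish(&database_url)
            .expect("failed to connect to the database");
        sql_query("SELECT 1").execute(&mut conn)
    })
    .await
    .expect("blocking task panicked")
}
```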
Just speculation but I'd imagine that any async solution predicated on internally using a blocking thread pool would be at best marginally faster than doing this externally as is now done. Async connection management can definitely be improved, but that could be an external crate. I think a better question is what things would make an endeavor to go full async worthwhile, and that's where the possibility of using io uring looks of particular interest. Were Diesel to be full async on top of io uring, then this could facilitate thread-per-core architecture with syscall level async I/O, and that would be considerably faster, but AFAIK more work needs to be done on the io uring side. |
@Sytten First of all, it's your choice to live in an async world, not my choice. Again: async is not something that is really needed for handling many requests, as long as you don't plan to handle larger amounts of traffic than, for example, crates.io. That said: Diesel supports this use case already by using the corresponding
@ibraheemdev Thanks for clarifying that, that's exactly my point. |
Would you be open to a PR integrating tokio/async-std-diesel into diesel under a feature flag? |
I'm all for async APIs, but I don't think integrating "fake" (e.g. not using real async IO) async helpers in |
@ibraheemdev I do not see any reason why this needs to be in diesel itself. Besides the issues mentioned by @lnicola, there are these problems: |
For these reasons I would prefer to keep it outside of diesel. Yes, this puts the work of maintaining that code on other people, but it allows figuring out a good API by experimentation. What I am willing to provide: any basic building blocks in diesel that are required to build such a crate, and I will happily answer diesel-related questions that come up during development. |
Not sure why the need for personal attacks. Rereading my comments, I specifically didn't ask for it to be implemented by anybody. I just pointed out a common use case for an ORM in an environment that supports async. Many popular projects like SQLAlchemy in Python made the decision to support both interfaces once the language started to support async officially. If that's not a priority for you, let the people who need it implement it and move on with your life. |
I’ll take this opportunity to plug bb8-diesel again, which provides an async connection pool for diesel; (it also wraps the connection types, such that all blocking connections are correctly flagged to the async runtime). I’ve used it before for multiple projects and found it just works. Unless you want/ need the underlying DB connections to be async, I see no reason why this would need to be supported within diesel 👍 |
@Sytten That's because of this statement:
In the context of this thread and your previous postings here, I interpret this statement as a request to implement it because you need it for your use case. If that's not what you wanted to say, then I misread that statement.
Python and Rust are quite different, especially in terms of performance characteristics. So something that is required in Python to achieve good performance is probably not required in Rust. That's the reason why I'm asking for numbers.
That's something I've suggested over and over again in this thread, but for some reason people just keep coming back here and continue to complain that it is not implemented yet. Most of them seem to be unwilling to put the required work in; they just want to take it for free. |
I think one of the more constructive things that can be done here is to implement a "proper" benchmark of both approaches. To me, it looks like the async version is indeed faster, but both are performing quite admirably, especially in comparison to Django, Rails, Spring or whatever people coming from other languages will be used to. In the end, there are a lot of options. To be honest, I don't even like ORMs that much 😅 so I don't have any bone to pick here, but I don't think it's such a tragedy if |
I think it'll be useful for everybody wanting async to read this comment. I arrived here searching for a native async Diesel implementation, not a third-party one. I was confused about why it did not exist yet, so I read every comment. I now agree that Diesel should not be async right now. I want async, but if the benefit is not significant then there's no point; I think that's how every software developer looks at a problem. No one has shown any evidence that sync Diesel is significantly worse than any async alternative right now. This tells me that async at this time may be overrated for the use cases people have now. If you use any of the async libraries mentioned, you've already solved your problem. I think everybody likes Diesel because it is a sound library. If a sound library doesn't implement a popular feature, there must be a good reason. The reason is that async solutions today are either unsound or too complex; if that isn't the case, then please make a PR. The closest thing to a real solution is rust-lang/rust#62290 getting solved or somebody hacking on https://github.com/tokio-rs/io-uring. I like sound solutions. I'd prefer sound async instead of async with gotchas. If you absolutely wanted the best async possible today, I don't think you'd be using Diesel right now. If you want a convenient ORM that's good enough for a broad range of use cases today, including some async, use Diesel. |
I totally understand that our own benchmarks are not perfect by any means. That's the reason why I've asked for different implementations, but nobody seems to bother. There cannot be any concurrency in those benchmarks, as you cannot execute more than one statement per database connection at the same time. That's another reason why I believe that async database connections are not that important for performance. Instead, use an async connection pool such as bb8.
I don't believe those are that relevant either, as both benchmarks use different queries (note: many updates for diesel vs. one update for tokio-postgres), have different handling for prepared statements (the tokio-postgres variant caches the prepared statement for all queries, while diesel only caches statements that are generally safe to cache; update statements, for example, are not cached), and both benchmarks use different actix configurations. (I do not have any in-depth actix knowledge, so some of this could be due to different requirements, but some of those options seem to be relevant, like the timeout thing.)
That's basically the point I've made above quite often already 😉. Most people don't need to use async/await at all, as they won't see the traffic where this really is necessary. |
(Here I assume my previous attempt on the topic to be true.)

```rust
// ...
diesel::async_transaction!(conn, |closure| { body }).await;
// ...
```

I expect this to expand into:

```rust
{ async {
    let mut conn: &mut Connection = conn; // moving in a reference, ensuring the type
    conn.transaction().await?.perform(|ref mut closure| { body }).await // note the `ref mut` binding mode; it guarantees that the user cannot "leak" our connection
}}
```

Is it both an ergonomic API and a safe API? |
@tema3210 I'm not sure I understand how this solves the problem. The underlying problem with the transaction API is that we cannot really restrict the closure to the right lifetime in terms of the returned future. As far as I can see, the closure remains in place in your proposed solution, so that does not really solve the problem. |
I've published a first preview of diesel-async. It provides a drop-in replacement for diesel's |
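For orientation, a rough sketch of what code using such a drop-in async connection might look like. The names here (`AsyncPgConnection`, the async `RunQueryDsl`, and the made-up `users` table) reflect the published diesel-async crate rather than the exact preview announced above, so details may differ:

```rust
use diesel::prelude::*;
use diesel::QueryResult;
use diesel_async::{AsyncConnection, AsyncPgConnection, RunQueryDsl};

// Hypothetical schema for illustration only.
diesel::table! {
    users (id) {
        id -> Int4,
        name -> Text,
    }
}

async fn load_user_names(database_url: &str) -> QueryResult<Vec<String>> {
    // Same query-builder API as sync diesel, but establishing the connection
    // and running the query are awaited instead of blocking.
    let mut conn = AsyncPgConnection::establish(database_url)
        .await
        .expect("failed to connect");
    users::table.select(users::name).load(&mut conn).await
}
```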
Love the progress being made in diesel-async @weiznich! I wrote up deadpool-diesel-async on top of your work for pooling those sweet sweet async diesel connections. Heads up, I haven't written much async rust library code so it could use a healthy dose of scrutiny |
Food for thought/analysis/experiment: https://github.com/danielhenrymantilla/higher-order-closure.rs |
With futures-rs and tokio the Rust ecosystem recently got a whole new chapter in the async IO story. It would be pretty great for diesel to work with this as well.
From an end-user standpoint, a straightforward API change would be to 'just' use an async `PgConnection` instead of the regular `PgConnection` (DB-specific for more efficient implementations?). All traits generic over `Connection` (e.g. `LoadDsl`) would get an associated type to determine if they are returning e.g. a `QueryResult<T>` or a `Future<T>`.
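To make the idea concrete, here is a minimal, self-contained sketch of the associated-type approach described above. The trait and type names (`ExecuteDsl`, `SyncConn`, `AsyncConn`) are invented for illustration and are not Diesel's actual API:

```rust
use std::future::Future;
use std::pin::Pin;

// Stand-in for diesel's QueryResult<T>.
type QueryResult<T> = Result<T, ()>;

// Each connection type picks, via an associated type, what "running a query"
// returns: the result directly (sync) or a future resolving to it (async).
trait ExecuteDsl {
    type Return;
    fn execute(&mut self, sql: &str) -> Self::Return;
}

struct SyncConn;
impl ExecuteDsl for SyncConn {
    type Return = QueryResult<usize>;
    fn execute(&mut self, _sql: &str) -> Self::Return {
        Ok(1) // pretend one row was affected
    }
}

struct AsyncConn;
impl ExecuteDsl for AsyncConn {
    type Return = Pin<Box<dyn Future<Output = QueryResult<usize>> + Send>>;
    fn execute(&mut self, _sql: &str) -> Self::Return {
        Box::pin(async {
            let result: QueryResult<usize> = Ok(1); // pretend one row was affected
            result
        })
    }
}
```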