-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bulk Transition Feature Request #414
Comments
Hey @skatenerd, thanks for your suggestion, this is something that I've thought about in the past, I know @lawrencejones has also, but neither of us have found the time to dig further. Also as you rightly pointed out there are a few things to consider, but it is definitely achievable and would be hugely valuable to many users (GoCardless included haha). Another potential complication are that recent changes have meant that MySQL users create transitions in a slightly bespoke way, meaning this would likely have to have be implemented in two different ways. Callbacks will also need some careful consideration. In short, it's definitely realistic, the primary blocker is unfortunately finding the time. That said I will aim to dedicate some time to give this more thought soon (even if that means a hacky POC) though I don't want to make any promises 😬 . |
One note with regards to callbacks, I was thinking maybe we could just expose a lower-level API here. It's kind of more tricky, as introduces new questions (like, what about when individual transitions in the bulk job are illegal), so it might make sense to build in more restrictions as to what it would do. I don't have free time to help with this these days but I'm asking my employer if they might let me do this on company time. |
Hey @skatenerd, thanks for opening this issue. I agree entirely that this is complex. I've considered this before, and some of the roadblocks would be: Postgres conflicts, ie. unique index violationsWe use Postgres constraints as a form of concurrency control right now, so that no transition could race another and try taking the same resource into two different states. For us to get the benefits of bulk transitioning, we'd need to handle those conflicts via the Postgres command that attempts the transition, so via an Any alternative would mean our bulk transition API would fail even for a single error, which might not be what we want. TransitionsState transitions are the reason I decided not to look at bulk transitioning, at least for our use case. As transitions are pure Ruby, they can do all sorts of things that we can't push into the database layer in a bulked fashion. The majority of the time spent transitioning our objects is spent in transitions, so unless we can somehow batch the transition steps, the speedup we get from bulk transitioning is limited. Of course, this might be different for you. There may also be an interesting middleground where we support bulk transitioning for state machines which define transition hooks in a form that can be pushed into a SQL dialect, and processed in bulk inside the database. That's an interesting proposition for sure. Lock lifetimesIf we're bulk transitioning, we'll hold locks for at least as long as the bulk update takes to process. This might hurt for OLTP systems that require locks are short-lived, say a system serving an HTTP API where people expect API requests to translate into statesman transitions. Longer transitions could mean we hurt API performance. Fwiw, we've compromised in this area already by wrapping our high throughput transition code in small transactions, held for about 3-5s. It gives us a performance boost that we're happy to exchange for the odd request taking ~3s longer, if it happens to collide with the work. But any bulk API would have to make this compromise clear. These were my thoughts on the issue. As with all performance chasing, it would be good to get an idea of where you're spending your time right now. Is it in the Do you have any of this data on hand? |
As an addendum, the constraints we've created via Statesman transitions are I imagine any work on bulk transitioning would benefit from an awareness of this. |
Thanks for all of the thoughtful responses.
What does this mean exactly? Are you referring to the before/after hooks? I don't have data on latency right now, but if I take 1000 models and arbitrarily transition them ( As for long lock lifetimes, this is just speculation, but an optimistic approach may be the way to go. IOW, having an approach that works happily if no concurrent transitions happen on the selected models, but which fails or lets the user try again in the case of contention. I really don't know what semantics would make the most users happy here, though. Perhaps a practical thing to do would be to build some bulk-transitioning functionality into my own app, and if it works well for my team's needs, then I'll present it here and see what people think of the semantics/API/etc. This is going to be really tough to do in a database-agnostic way. |
I am wondering if it makes sense to have some sort of "bulk transition" or "bulk trigger" functionality, where I can supply a big list of models (or IDs), and an event, and get the behavior that would come with
mymodels.each { |model| model.transition(from: :foo, to: :bar) }
.I think this is a use case many people probably have, and one which has pretty slow performance. I am hoping there's a way to transition many records, using fewer queries. I understand that this might not be possible, with concurrency concerns, etc. I am also totally open to having a significantly restricted API for the "bulk" version of a given task.
Anyway, please let me know if this seems realistic, or if there's a recommended workaround that I might use to do something like this in my own app!
I guess this is going to be tough since Rails 6 is the first version to offer bulk-inserts.
I've been thinking about what an "optimistic" approach to concurrency would look like here. Basically, look up all of the sort-keys of the existing most-recent transition records. Then begin a single transaction where the new transitions are inserted, and the relevant
most_recent
flags are updated. This way, if someone moves the DB into an unexpected state under my nose, my whole transaction will fail due to uniqueness constraint on the sort-keys.Alternatively, I could actively
lock
/select for update
those most-recent transition records, which would keep anyone from executing any harmful transitions under my nose.The text was updated successfully, but these errors were encountered: