Only use requires_new when needed #1108

Open
wants to merge 1 commit into master
Conversation

nikolai-b

SAVEPOINTs are costly:
https://www.cybertec-postgresql.com/en/subtransactions-and-performance-in-postgresql/
They were added in fc2ba56 to stop the events with the wrong version being committed
but if no version is specified then they are not needed.

I've not added any specs as I wanted to know your thoughts on this.

@mostlyobvious
Member

mostlyobvious commented Sep 9, 2021

I must say that's intriguing — I didn't find a drawback in this solution last time I was briefly thinking about it. But I think such a change would require much more verification and test cases to prove we're not letting in any data corruption.

For sure:

  • the wrapping transaction in app code using RES might or might not be there, that's something outside our control
  • there are inserts into two tables event_store_events and event_store_events_in_streams that must both either succeed or fail
  • WrongExpectedEventVersion with a passed expected_version is one reason to fail; another may come from a different broken database constraint (event id uniqueness?) — what happens in such a case?

Did you have a chance to measure the performance impact on RES with sample payloads/queries in your application? We're only adding one savepoint here.

RES also supports MySQL. Do you perhaps know if the situation there is any different from Postgres?

@mostlyobvious
Member

@paneq Any shower thoughts? 🙃

@paneq
Member

paneq commented Sep 9, 2021

The included benchmark is not at all convincing to me right now. The conclusion in the article says:

Even if we take into account that the transactions in test 2 are one longer, this is still a performance regression of 60% compared with test 1.

But the author does not include a 3rd test case where transactions are 50% longer (90 groups of statements instead of 60) but without savepoints (or with a different, unimportant statement) to prove that the savepoints are the problem and not the length itself. I think the author expected 50% more operations to take 50% more time. But the test is devised in a way that there is contention on read. I am not sure that scales linearly.

So first, I would like to see it reproduced. It does not sound hard but I don't have the time.

The PostgreSQL JDBC driver even has a connection parameter “autosave” that you can set to “always” to automatically set a savepoint before each statement and rollback in case of failure.

Which is also not our case. We don't use it after every active record statement. We use it per API call. Clients usually call event_store.publish once or a few times in a transaction.

Also, the original test uses prepared statements, which Rails apps and the event store don't. Every Active Record call from a Rails app makes a networking call to PG. I suspect that if there is any performance effect of using savepoints in PG, it's negligible compared to making one more SQL query.

@nikolai-b
Author

Thanks, yes, that article's benchmark isn't as useful as I first thought; apologies. @paneq how about here, where they make the exact same critique of the benchmark as you and add a SELECT instead.

Our production was almost grinding to a halt; we found that the wait_event SubtransSLRU peaked at such times. We re-worked our code to not use SAVEPOINT and the problem went away. Since then I've been keeping an eye on how many SAVEPOINTs we issue, noticed that the number has been increasing, and tracked it down to this gem. I do not believe it is currently a performance issue for us so I'm unable to provide benchmarks of any use, but I think it could be an issue in the future. As you said, a publish causes only one SAVEPOINT, whereas in our previous problem there were over 64 per transaction. There is nothing to stop someone trying to publish 65 events in one transaction, which would hit this limit.
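For context on where the 64 limit comes from: Postgres caches at most 64 subtransaction IDs per backend (PGPROC_MAX_CACHED_SUBXIDS); once a transaction exceeds that, visibility checks fall back to the pg_subtrans SLRU, which is the SubtransSLRU wait event mentioned above. A rough sketch of the overflowing pattern from the cybertec article (table name hypothetical):

```sql
BEGIN;
SAVEPOINT s1;
INSERT INTO accounts VALUES (1);  -- each writing subtransaction gets its own xid
SAVEPOINT s2;
INSERT INTO accounts VALUES (2);
-- ... more than 64 savepoints in the same transaction ...
-- the per-backend subxid cache overflows, and other sessions must
-- consult pg_subtrans (the SubtransSLRU wait event) for tuple visibility
COMMIT;
```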

  • there are inserts into two tables event_store_events and event_store_events_in_streams that must both either succeed or fail

but these are in a transaction already, so that will happen, or am I missing something? The existing test shows this: if you remove the subtransaction (i.e. the requires_new) and these test lines, it still passes.
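At the SQL level this is indeed the picture: the two inserts already succeed or fail together with the enclosing transaction, even without a savepoint (a sketch; column lists elided):

```sql
BEGIN;
INSERT INTO event_store_events (...) VALUES (...);
INSERT INTO event_store_events_in_streams (...) VALUES (...);
-- if either INSERT errors, the whole transaction is aborted and
-- ROLLBACK undoes both; COMMIT makes both visible atomically
COMMIT;
```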

The only reasons I can think to use subtransactions are:

  1. when ActiveRecord::Rollback is raised so that the outer transaction is committed but the inner one fails
  2. to avoid the inner transaction poisoning the outer one, e.g.
ActiveRecord::Base.transaction do
  user = build_user
  begin
    ActiveRecord::Base.transaction(requires_new: true) do
      # issue some invalid SQL
    end
  rescue ActiveRecord::StatementInvalid
  end
  user.save!
end

Here the user will always be saved.
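At the SQL level, the requires_new example corresponds to roughly this exchange (a sketch; the users table is hypothetical):

```sql
BEGIN;
SAVEPOINT active_record_1;
-- the invalid statement errors, aborting only the subtransaction
ROLLBACK TO SAVEPOINT active_record_1;
-- the outer transaction is usable again
INSERT INTO users (name) VALUES ('example');
COMMIT;
```

Without the savepoint, the failed statement would leave the whole transaction in an aborted state and the final INSERT would be rejected with "current transaction is aborted".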

Are you able to explain why you are using the subtransaction in the first place?

  • WrongExpectedEventVersion with a passed expected_version is one reason to fail; another may come from a different broken database constraint (event id uniqueness?) — what happens in such a case?

Yes, this is something I was very unsure about. Looking at the code, I could only see RES refusing to add an event when the version mismatched, but I wasn't sure about the other cases. To be able to answer this, I'd need to understand why you are using a subtransaction (as above).

Thank you for taking the time to consider this.

@paneq
Member

paneq commented Sep 19, 2021

Just FYI, I won't have the time soon to commit to this discussion.

@swistak35
Contributor

I will try to elaborate on that later, but just for the sake of leaving some information: we did encounter a situation in which a lot of savepoints was a serious performance issue (thousands of savepoints de facto brought our application down). That was on MySQL 5.7 (curiously, we did not have that issue on MySQL 5.6).

@nikolai-b
Author

If you need more convincing about SAVEPOINT being a bottleneck, GitLab have a new post about why.

@mostlyobvious
Member

Just wanted to say I'm revisiting this PR now. I've re-read linked articles and the comments you've all left in this thread.

Summary of all mentioned problems with SAVEPOINT:

  • the cybertec post describes >64 subtransactions on Postgres (>64 SAVEPOINTs without RELEASE, nested in each other) and their effect on SubtransSLRU
  • gitlab post adds the issue with single SAVEPOINT within a long-running transaction for Postgres replicas
  • @swistak35 shared a story about MySQL 5.7 SAVEPOINT issues that weren't an issue on MySQL 5.6 — I cannot tell whether this involved nested SAVEPOINTs (without RELEASEs)
  • @nikolai-b shared a story on Postgres and overflowing SubtransSLRU with >64 nested SAVEPOINTs

Correct me if I'm wrong or if I understood the issues differently than you did.

I also want to respond to one statement about RES's use of SAVEPOINT before digging deeper into how we can perhaps avoid using SAVEPOINT at all:

As you said, a publish causes only one SAVEPOINT, whereas in our previous problem there were over 64 per transaction. There is nothing to stop someone trying to publish 65 events in one transaction, which would hit this limit.

It is not possible to create nested SAVEPOINTs with event_store.append. As soon as event records are inserted into the respective tables, there is a RELEASE. The event_store.publish is an event_store.append + pubsub. Pub-sub triggers only after the append, so any event handler appending events will start a new SAVEPOINT + RELEASE without being nested in the previous one. In other words, calling event_store.append/publish 65 times would not overflow SubtransSLRU — these SAVEPOINTs are not nested within each other.

# executable sample: https://github.com/RailsEventStore/rails_event_store/commit/8dbabc0358a82488c454e3eec807a7d8ed5c9646

BEGIN
SAVEPOINT active_record_1
RELEASE SAVEPOINT active_record_1
SAVEPOINT active_record_1
RELEASE SAVEPOINT active_record_1
SAVEPOINT active_record_1
RELEASE SAVEPOINT active_record_1
SAVEPOINT active_record_1
RELEASE SAVEPOINT active_record_1
SAVEPOINT active_record_1
RELEASE SAVEPOINT active_record_1
SAVEPOINT active_record_1
...
COMMIT

@mostlyobvious
Member

Are you able to explain why you are using the subtransaction in the first place?

I don't think there's a good reason to use a subtransaction in RES as it is currently used. We definitely want to perform the two INSERT statements in one transaction, so we wrap them in one.

Given event_store.append/publish may be called in application code that is already wrapped in a transaction, we should be good to use that one from the application code. And you've already pointed that out, @nikolai-b.

These three examples perform the same in terms of RES expectations:

require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "activerecord"
  gem "pg"
end

require "active_record"

ActiveRecord::Base.establish_connection(ENV.fetch("DATABASE_URL"))
ActiveRecord::Schema.define do
  create_table :transactions, force: true do |t|
    t.string :label
    t.integer :amount
  end

  create_table :events, force: true do |t|
    t.string :name
    t.string :event_id
  end

  add_index :events, :event_id, unique: true
end

Event = Class.new(ActiveRecord::Base)
Transaction = Class.new(ActiveRecord::Base)

ActiveRecord::Base.logger =
  Logger
    .new(STDOUT)
    .tap do |l|
      l.formatter = proc { |severity, datetime, progname, msg| "#{msg}\n" }
    end

raise unless Transaction.count == 0
raise unless Event.count == 0
#  Transaction Count (0.1ms)  SELECT COUNT(*) FROM "transactions"
#  Event Count (0.1ms)  SELECT COUNT(*) FROM "events"

begin
  ActiveRecord::Base.transaction do
    Transaction.create(label: "kaka", amount: 100)

    event_id = SecureRandom.uuid
    Event.create(name: "deposited", event_id: event_id)
    Event.create(name: "deposited", event_id: event_id)
  end
rescue ActiveRecord::RecordNotUnique
end
raise unless Transaction.count == 0
raise unless Event.count == 0
#  TRANSACTION (0.0ms)  BEGIN
#  Transaction Create (0.2ms)  INSERT INTO "transactions" ("label", "amount") VALUES ($1, $2) RETURNING "id"  [["label", "kaka"], ["amount", 100]]
#  Event Create (0.1ms)  INSERT INTO "events" ("name", "event_id") VALUES ($1, $2) RETURNING "id"  [["name", "deposited"], ["event_id", "628db1c3-a802-47bb-bcde-b68898fc8d1f"]]
#  Event Create (0.2ms)  INSERT INTO "events" ("name", "event_id") VALUES ($1, $2) RETURNING "id"  [["name", "deposited"], ["event_id", "628db1c3-a802-47bb-bcde-b68898fc8d1f"]]
#  TRANSACTION (0.0ms)  ROLLBACK
#  Transaction Count (0.1ms)  SELECT COUNT(*) FROM "transactions"
#  Event Count (0.1ms)  SELECT COUNT(*) FROM "events"
  
begin
  ActiveRecord::Base.transaction do
    Transaction.create(label: "kaka", amount: 100)

    ActiveRecord::Base.transaction do
      event_id = SecureRandom.uuid
      Event.create(name: "deposited", event_id: event_id)
      Event.create(name: "deposited", event_id: event_id)
    end
  end
rescue ActiveRecord::RecordNotUnique
end
raise unless Transaction.count == 0
raise unless Event.count == 0
#  TRANSACTION (0.0ms)  BEGIN
#  Transaction Create (0.1ms)  INSERT INTO "transactions" ("label", "amount") VALUES ($1, $2) RETURNING "id"  [["label", "kaka"], ["amount", 100]]
#  Event Create (0.1ms)  INSERT INTO "events" ("name", "event_id") VALUES ($1, $2) RETURNING "id"  [["name", "deposited"], ["event_id", "fcc5c949-c40e-4101-b0ba-60ab49fea0fd"]]
#  Event Create (0.1ms)  INSERT INTO "events" ("name", "event_id") VALUES ($1, $2) RETURNING "id"  [["name", "deposited"], ["event_id", "fcc5c949-c40e-4101-b0ba-60ab49fea0fd"]]
#  TRANSACTION (0.0ms)  ROLLBACK
#  Transaction Count (0.0ms)  SELECT COUNT(*) FROM "transactions"
#  Event Count (0.0ms)  SELECT COUNT(*) FROM "events"
  
begin
  ActiveRecord::Base.transaction do
    Transaction.create(label: "kaka", amount: 100)

    ActiveRecord::Base.transaction(requires_new: true) do
      event_id = SecureRandom.uuid
      Event.create(name: "deposited", event_id: event_id)
      Event.create(name: "deposited", event_id: event_id)
    end
  end
rescue ActiveRecord::RecordNotUnique
end
raise unless Transaction.count == 0
raise unless Event.count == 0
#  TRANSACTION (0.0ms)  BEGIN
#  Transaction Create (0.1ms)  INSERT INTO "transactions" ("label", "amount") VALUES ($1, $2) RETURNING "id"  [["label", "kaka"], ["amount", 100]]
#  TRANSACTION (0.0ms)  SAVEPOINT active_record_1
#  Event Create (0.1ms)  INSERT INTO "events" ("name", "event_id") VALUES ($1, $2) RETURNING "id"  [["name", "deposited"], ["event_id", "e7edef32-6dba-4935-bc69-ccb4d28aa68e"]]
#  Event Create (0.1ms)  INSERT INTO "events" ("name", "event_id") VALUES ($1, $2) RETURNING "id"  [["name", "deposited"], ["event_id", "e7edef32-6dba-4935-bc69-ccb4d28aa68e"]]
#  TRANSACTION (0.0ms)  ROLLBACK TO SAVEPOINT active_record_1
#  TRANSACTION (0.0ms)  ROLLBACK
#  Transaction Count (0.0ms)  SELECT COUNT(*) FROM "transactions"
#  Event Count (0.0ms)  SELECT COUNT(*) FROM "events"

The only reason I'd personally have preferred requires_new: true in the past was the wording used in the Rails/ActiveRecord documentation and the exclusive use of raise ActiveRecord::Rollback in its examples. It tricked me into thinking it described something else.

This ActiveRecord::Rollback exception is a special one that gets very particular handling in transaction blocks. It is what finally makes the three examples listed above behave differently.
However, we're not raising ActiveRecord::Rollback in RES, so dropping requires_new: true should not be an issue.
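To make that special handling concrete, here is a toy model of the control flow (deliberately not ActiveRecord's real implementation; the Rollback class and the returned symbols are made up for illustration). It shows how a transaction block swallows the Rollback exception instead of propagating it:

```ruby
# Toy stand-in for ActiveRecord::Rollback and the transaction block's
# special treatment of it. Models control flow only, no database involved.
class Rollback < StandardError; end

def transaction(requires_new: false)
  yield
  :commit
rescue Rollback
  # Rollback never escapes the block that rescues it. With a savepoint
  # (requires_new: true) there is something of the block's own to undo;
  # without one, a nested block quietly ignores it.
  requires_new ? :rollback_to_savepoint : :noop
end

transaction { :ok }                                 # => :commit
transaction { raise Rollback }                      # => :noop
transaction(requires_new: true) { raise Rollback }  # => :rollback_to_savepoint
```

The key point for this PR: because RES never raises ActiveRecord::Rollback itself, none of the code paths that depend on the savepoint's special handling are exercised.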

This was pointed out previously by @nikolai-b too:

The only reasons I can think to use subtransactions are:
when ActiveRecord::Rollback is raised so that the outer transaction is committed but the inner one fails

@mostlyobvious self-assigned this Feb 2, 2024
@mostlyobvious
Member

I'll rebase this PR next and review existing tests around transactions and expectations described earlier.

In the meantime I've compared use-vs-no-use of SAVEPOINT on Postgres 15.5 and the difference is not going to be noticeable:

      Warming up --------------------------------------
            with_savepoint     2.938M i/100ms
         without_savepoint     2.943M i/100ms
      Calculating -------------------------------------
            with_savepoint     29.249M (± 0.6%) i/s -    146.915M in   5.023031s
         without_savepoint     29.366M (± 0.7%) i/s -    147.173M in   5.012007s

It's still worth not doing something that isn't necessary. And there are cases where even a single SAVEPOINT is a problem, as described in the GitLab article.

@nikolai-b
Author

Thank you so much for the time you've invested in this issue, it is very impressive.

The issue we experienced with SAVEPOINT in PostgreSQL is not that it always makes pg slower, but that it can cause issues when combined with other things: replicas, long-running transactions, lots of savepoints. Sorry to link to yet another blog post, but basically SAVEPOINT can bite. I tried modifying your sample code to

perform_100_appends_in_outer_transaction =
  lambda do
    ActiveRecord::Base.transaction do
      id = RailsEventStoreActiveRecord::Event.limit(1).lock("FOR UPDATE").ids[0]
      100.times { event_store.append(mk_event.call) }
      RailsEventStoreActiveRecord::Event.where(id: id).update_all(id: id)
    end
  end

but to cause the performance issue I think it needs to do the update inside the SAVEPOINT. I've also tried using multiple threads to simulate many clients, but still no slowdown.

I don't think in RES the existing SAVEPOINT does much harm (as you release it quickly) but I'd argue that it isn't needed and for that reason should be removed.

mostlyobvious added a commit that referenced this pull request Feb 6, 2024
mostlyobvious added a commit that referenced this pull request Feb 6, 2024