Add --transaction-isolation flag #1441

Open
wants to merge 5 commits into
base: master

timvaillancourt
Collaborator

@timvaillancourt timvaillancourt commented Aug 14, 2024

A Pull Request should be associated with an Issue.

Related issue: #1262

Further notes in https://github.com/github/gh-ost/blob/master/.github/CONTRIBUTING.md
Thank you! We are open to PRs, but please understand if for technical reasons we are unable to accept each and any PR

Description

This PR resolves #1262 by adding a --transaction-isolation flag that accepts either REPEATABLE-READ (the default, and what GitHub tests) or READ-COMMITTED.

In case this PR introduced Go code changes:

  • contributed code is using same conventions as original code
  • script/cibuild returns with no formatting errors, build errors or unit test errors.

Signed-off-by: Tim Vaillancourt <[email protected]>
@arthurschreiber
Member

arthurschreiber commented Oct 19, 2024

Do we really need this flag? I'm absolutely sure that MySQL replication uses READ_COMMITTED when applying RBR changes from the binlog. Since gh-ost does not support statement-based replication, there's no point in using REPEATABLE_READ for the changelog applier, and we should be able to use READ_COMMITTED always.

For the table copy part, I don't see how READ_COMMITTED would have any negative side effects either. Right now, REPEATABLE_READ might copy an "old" version of the row data, but the changelog applier will fix that up afterwards.

With READ_COMMITTED, we'll always read the "latest" version of the data, so in theory there could be fewer changes for the changelog applier to apply, and I don't see any downsides to this. 🤔

@timvaillancourt
Collaborator Author

timvaillancourt commented Oct 19, 2024

@arthurschreiber I pondered using READ_COMMITTED 100% of the time in an earlier thread, when setting the transaction isolation was introduced, but there were some concerns around using a new isolation level. Letting users who were on READ_COMMITTED before the enforcement was introduced keep using it seemed like the easiest way forward.

But for the most part I agree: REPEATABLE_READ isolation is usually not required and can actually introduce stale results, as you mention. There is one spot where a snapshot read might have a benefit, however: the calculation of the min/max chunk ranges. Snapshot isolation guarantees that both the min and the max query operate on the same data, which sounds like a good thing, but I'm not 100% sure.
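
To make the min/max point concrete, here is a toy in-memory model, not MySQL and not gh-ost code. All names (`toyTable`, `rangeWithSnapshot`, `rangePerStatement`) are hypothetical, and the "concurrent write" is simply a callback run between the two reads. It assumes a non-empty table.

```go
package main

import "fmt"

// toyTable stands in for the committed primary keys of a table.
type toyTable struct{ keys []int }

// view returns a copy of the currently committed keys.
func (t *toyTable) view() []int { return append([]int(nil), t.keys...) }

// minKey and maxKey assume a non-empty slice.
func minKey(keys []int) int {
	m := keys[0]
	for _, k := range keys[1:] {
		if k < m {
			m = k
		}
	}
	return m
}

func maxKey(keys []int) int {
	m := keys[0]
	for _, k := range keys[1:] {
		if k > m {
			m = k
		}
	}
	return m
}

// rangeWithSnapshot models REPEATABLE READ: both reads use the view
// established by the first read, so the concurrent write is invisible.
func rangeWithSnapshot(t *toyTable, concurrentWrite func()) (lo, hi int) {
	snapshot := t.view()
	lo = minKey(snapshot)
	concurrentWrite() // another session commits between the two queries
	hi = maxKey(snapshot)
	return lo, hi
}

// rangePerStatement models READ COMMITTED: each read sees whatever is
// committed at the moment it runs.
func rangePerStatement(t *toyTable, concurrentWrite func()) (lo, hi int) {
	lo = minKey(t.view())
	concurrentWrite()
	hi = maxKey(t.view())
	return lo, hi
}

func main() {
	a := &toyTable{keys: []int{10, 20, 30}}
	lo, hi := rangeWithSnapshot(a, func() { a.keys = append(a.keys, 5, 40) })
	fmt.Println("snapshot:", lo, hi)

	b := &toyTable{keys: []int{10, 20, 30}}
	lo, hi = rangePerStatement(b, func() { b.keys = append(b.keys, 5, 40) })
	fmt.Println("per-statement:", lo, hi)
}
```

In this run the snapshot range is [10, 30], which matches the table as it was before the write, while the per-statement range is [10, 40], which matches no single state of the table. Whether that mismatch matters in practice is exactly the open question here.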

Successfully merging this pull request may close these issues.

Regression in master: SET transaction_isolation = 'repeatable_read';
2 participants