add per-ref locking to gracefully handle push races #44
Conversation
Hey @massi-ang! Wanted to check if you had a chance to skim these changes or had any opinions on the proposal. Please let me know if there's any additional context or detail I can provide. Goes without saying, but we really appreciate you taking the time to maintain the library!
Thanks for submitting this PR and for the effort put into creating it.
The original design idea was to keep things as simple as possible and to have as few failure scenarios as possible. Introducing a lock mechanism adds an additional failure scenario in case any of the locks fails to be deleted. As with the multiple-bundle scenario, this should be detected and you should provide an easy way to fix it.
I like your clever implementation, but I wonder if using conditional writes wouldn't be simpler. By using a conditional write with If-None-Match, you can have each client write a LOCK object under the /remote_ref/ prefix, but only one would succeed. It only requires updating the acquire_lock and release_lock functions. It might also have the advantage of reducing the failure scenarios, since only one additional object is written instead of the at least two in your implementation, hence one less object that might fail to be written or deleted.
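For illustration only, the two functions could look roughly like this. The signatures, the LOCK key name, and the bucket/prefix arguments are assumptions for the sketch, not the PR's actual code, and the conditional write requires a recent boto3/S3 with If-None-Match support:

```python
import botocore.exceptions

def acquire_lock(s3, bucket: str, ref_prefix: str) -> bool:
    """Try to create a single LOCK object under the ref prefix.

    The conditional write (If-None-Match: *) lets only one concurrent client
    succeed; the others get a 412 PreconditionFailed (or a 409 conflict when
    two conditional PUTs collide).
    """
    try:
        s3.put_object(Bucket=bucket, Key=f"{ref_prefix}/LOCK", Body=b"", IfNoneMatch="*")
        return True
    except botocore.exceptions.ClientError as e:
        if e.response["Error"]["Code"] in ("PreconditionFailed", "ConditionalRequestConflict"):
            return False  # someone else holds the lock
        raise

def release_lock(s3, bucket: str, ref_prefix: str) -> None:
    """Remove the LOCK object so other clients can push."""
    s3.delete_object(Bucket=bucket, Key=f"{ref_prefix}/LOCK")
```

The appeal is that S3 itself arbitrates the race, so there is no separate lock-ordering object to write or clean up.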
git_remote_s3/remote.py
Outdated
# Otherwise, we proceed with pushing the new bundle
current_contents = self.get_bundles_for_ref(remote_ref)
if len(current_contents) > 1:
    return f'error {remote_ref} "multiple bundles exists on server. Run git-s3 doctor to fix."?\n'
The error message could also mention upgrading git-remote-s3 to the latest version, since this should be the only case in which we get duplicate bundles on the remote. Correct?
Yep, exactly. Good point.
git_remote_s3/remote.py
Outdated
    self.release_lock(remote_ref, lock_key)
except Exception:
    logger.info(f"failed to release lock {lock_key} for {remote_ref}")
# Clean up temp bundle that might've been written but never got pushed
Remove this comment. This line removes the local temp bundle even if it did get pushed.
git_remote_s3/remote.py
Outdated
if lock_key:
    try:
        self.release_lock(remote_ref, lock_key)
    except Exception:
Be more specific with the exception you are catching. We want to be sure the lock has been deleted; otherwise any further git push (by any client) will fail, since the lock that failed to be deleted will always be the first one lexicographically. Can you propose a way to recover from such a situation, or at least to notify the user in the event this happens?
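For example, the catch could be narrowed and the message escalated so the user knows the lock is stuck. This is only a sketch reusing the names from the diff above; it assumes release_lock surfaces botocore's ClientError:

```python
import botocore.exceptions

# lock_key, remote_ref, self.release_lock and logger mirror the diff above;
# the surrounding code is illustrative, not the PR's actual implementation.
if lock_key:
    try:
        self.release_lock(remote_ref, lock_key)
    except botocore.exceptions.ClientError as e:
        # A lock that was never deleted will block every future push to this ref,
        # so log at error level and tell the user how it affects them.
        logger.error(
            f"failed to release lock {lock_key} for {remote_ref} "
            f"({e.response['Error']['Code']}); further pushes to this ref may "
            "fail until the lock is removed"
        )
```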
> Can you propose a way to recover from such a situation, or at least to notify the user in the event this happens?
This is an excellent call out. I was thinking of a couple options here, but, again, I really want to respect the simplicity of the existing design:
1. Add lock TTL + takeover
   - on a 412 (lock exists), read the lock's LastModified
   - if the lock is older than the TTL, try a conditional delete and immediately reacquire the lock
   - if the delete races and fails, back off and retry

2a. Add retries and notify the user on repeated failures
   - if a client fails to acquire the lock after X retries or Y seconds, return a clear error containing the culprit lock key, its age, etc., and tell the user to run doctor

2b. Add lock cleanup to git-remote-s3 doctor
   - have the command list locks older than the TTL, sorted by age
   - add a --force-clear-locks flag to remove stale locks, or an interactive option
   - use a conditional delete to avoid racing with an active lock
I'm slightly biased towards 2a + 2b as this still manages to keep the client implementation fairly simple.
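To make 1 + 2a a bit more concrete, here is a rough sketch of what the acquire path could look like. The function name, key layout, TTL, retry budget, and the use of IfMatch on delete_object (an S3 conditional delete) are all placeholder assumptions, not the actual implementation:

```python
import time
import botocore.exceptions

LOCK_TTL_SECONDS = 300  # illustrative TTL, not from the PR
MAX_ATTEMPTS = 5        # illustrative retry budget

def acquire_lock_with_takeover(s3, bucket: str, prefix: str, remote_ref: str) -> bool:
    """Conditional-write acquire with stale-lock takeover and bounded retries."""
    key = f"{prefix}/{remote_ref}/LOCK"  # assumed key layout
    for _ in range(MAX_ATTEMPTS):
        try:
            s3.put_object(Bucket=bucket, Key=key, Body=b"", IfNoneMatch="*")
            return True
        except botocore.exceptions.ClientError as e:
            if e.response["Error"]["Code"] not in ("PreconditionFailed", "ConditionalRequestConflict"):
                raise
        # The lock already exists: check its age and take it over only if it looks stale.
        try:
            head = s3.head_object(Bucket=bucket, Key=key)
        except botocore.exceptions.ClientError:
            continue  # lock vanished between the PUT and the HEAD; just retry
        age = time.time() - head["LastModified"].timestamp()
        if age > LOCK_TTL_SECONDS:
            try:
                # Conditional delete so we only remove the exact lock object we inspected.
                s3.delete_object(Bucket=bucket, Key=key, IfMatch=head["ETag"])
            except botocore.exceptions.ClientError:
                pass  # lost the takeover race; fall through and retry
        time.sleep(1)  # simple backoff between attempts
    return False  # caller reports the stuck lock key and its age, and suggests running doctor
```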
I'd love to get your opinions here or any other ideas you might have!
Thank you for taking the time to give a thoughtful review @massi-ang! I really wanted to minimize the amount of complexity introduced with the locking design, especially since you've done a great job of keeping this client simple and transparent. I think using S3 conditional writes is a great idea. You're definitely right: doing so significantly simplifies the locking design and reduces failure modes! I hadn't even considered leveraging built-in S3 features. I agree that the design introduces a failure mode when the lock isn't deleted; I think this could also be framed as trading the multi-bundle failure mode for the leftover-lock failure mode.
Hey @massi-ang, wanted to double-check if you've had a chance to look at the changes I made to address your comments! Please let me know if you've got any other ideas or comments!
git-remote-s3 doctor CLI