SQLite experimental write hint allows smarter locking #301
Replies: 3 comments
-
Actually, the Fossil SCM (based on SQLite) does that all the time. All significant operations start a transaction without any hint/knowledge of whether it will eventually be read-only or read-write. Similarly, fossil's commands often COMMIT even when there was no writing because it's simpler to do so than to ask if it there's anything worth committing before doing so. So most commands in fossil end up doing something like:
But MAGIC will sometimes set the transaction into a rollback state so that it won't really commit. (Fossil predates SAVEPOINT and internally uses what amounts to a single reference-counted transaction hidden behind a pseudo-nested-transaction C API.) |
Beta Was this translation helpful? Give feedback.
-
BTW: that has now been dubbed The Hashimoto Protocol1 and i've just gotten the OK to enable it in our wasm builds. Footnotes
|
Beta Was this translation helpful? Give feedback.
-
|
I implemented a web demo to compare blocking VFS implementations with and without using the experimental write hint. The demo uses browser APIs currently supported only on Chromium-based browsers (e.g. Chrome, Edge). The demo is online here. Code and a description are here. Some observations with it are here. This is really an exploration of the experimental SQLite feature. Although it was built with wa-sqlite, that was just a convenience. So maybe general discussion should take place either in that GitHub repo or in the SQLite forum. Discussion specific to wa-sqlite can stay here. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
TL;DR SQLite can now send an experimental PRAGMA to notify a VFS that a write transaction follows. This allows a VFS to improve its locking implementation in interesting ways.
SQLite rollback journal locking model
The SQLite standard locking model (WAL mode uses a different model) has four pager-visible states:
In general, the SHARED state is for reading and the EXCLUSIVE state is for writing. The RESERVED state is for a transaction that intends to write but does not yet prevent other readers.
What is the purpose of the RESERVED state?
In order to maximize concurrency, SQLite minimizes the time spent in the EXCLUSIVE state, when only one connection can access the database. Typically all transaction changes to database pages are cached in memory and then at commit the connection enters the EXCLUSIVE state and writes those pages to the VFS.
The issue with deferring the EXCLUSIVE state until the last possible moment, is if more than one connection wants to write, we don't want them to do all their work and then have all but one have to throw that work away. SQLite uses the RESERVED state to identify and arbitrate writers as early as possible so they can avoid doing a lot of wasted work.
How does deadlock happen with SQLite?
Deadlock happens when more than one connection in the SHARED state wants to enter the RESERVED state, i.e. with concurrent write transactions. One connection will reach the RESERVED state, but will be unable to enter the EXCLUSIVE state until all other connections exit the shared state. The other connections will be unable to enter the RESERVED state because it is already occupied.
Applications using SQLite must handle this deadlock situation, which is signaled by SQLITE_BUSY, by issuing ROLLBACK if within an explicit transaction and retrying the entire transaction.
Why should I use BEGIN IMMEDIATE for multi-statement write transactions?
Most applications should use BEGIN IMMEDIATE for multi-statement transactions that might modify the database, as this tells SQLite to enter the RESERVED state as soon as possible, before any database contents are read or written. Otherwise, SQLite won't know to request RESERVED until encountering a write statement.
Just for completeness, using BEGIN DEFERRED (same as just plain BEGIN) for a write transaction can theoretically provide more concurrent performance in some unusual cases. For example, if the application decides mid-transaction whether to include a write statement or not, then executing it concurrently most of the time could be worth a higher chance of having to abandon work the rest of the time. However, it is difficult to come up with a practical scenario where that would be true.
How does the new experimental write hint help?
When a transaction starts with BEGIN IMMEDIATE/EXCLUSIVE or is an auto-commit write statement, SQLite knows that it will eventually be requesting the RESERVED state (assuming nothing exceptional happens first). Until now, however, the VFS didn't know, and it turns out that the VFS can do some interesting things to take advantage of that information. Here are some possibilities.
Single-lock read concurrency
A VFS doesn't have to implement the full SQLite locking model as long as the atomicity, consistency, and isolation guarantees (the ACI in ACID) are maintained. For example, a VFS can restrict access to a single connection total in the SHARED, RESERVED, and EXCLUSIVE states - i.e. only one connection at a time can access the database. This can be implemented with a single platform exclusive mode lock acquired/released when moving from/to the UNLOCKED state.
OPFSCoopSyncVFS is an example of a VFS that does this - it supports multiple connections but no concurrent transactions. It does this for two reasons: (1) currently only Chromium browsers support multiple open OPFS access handles on a file, and (2) mating a synchronous SQLite WebAssembly build with an asynchronous locking API (with this hack) is so slow that we only want to do it once per transaction.
With the new write hint, however, this VFS could be modified for Chromium to still use a single lock but in shared mode for read transactions and exclusive mode for write transactions. This would require the use of BEGIN IMMEDIATE/EXCLUSIVE for multi-statement write transaction, so a write statement within a BEGIN DEFERRED transaction would now be an error. That is a change to the SQLite semantics, but for many applications it would be a winning trade-off for some read concurrency.
Reduced deadlock
Deadlock is something to be avoided, if possible. In the fully implemented SQLite locking model, deadlock is easy to resolve with rollback and retry, but this generally results in a loss of efficiency and fairness. Connections typically choose to wait a small amount of time before retrying, but tuning the wait period can depend on the platform speed and load.
With the new write hint, if all potentially contending connections use BEGIN IMMEDIATE/EXCLUSIVE, connections that will eventually request the RESERVED state can be exclusively admitted to the SHARED state - i.e. all but one of those connections is blocked until the winning connection finishes its transaction. Being blocked at the SHARED state is much better than being deadlocked at the RESERVED state because no retry is needed so both efficiency and fairness are optimal.
It appears that with some extra state and logic a VFS that uses the write hint could still interoperate with a legacy VFS that doesn't (or a VFS that sometimes uses BEGIN DEFERRED for a write transaction). A mixed client environment contention with only cooperating clients would be deadlock-free while involving one or more non-cooperating clients could still produce deadlock as before. See the discussion here and a proof of concept implementation for Linux here.
What happens next?
The write hint signal has been implemented as an experimental internally-generated PRAGMA that will be in SQLite 3.52, whenever it is released. It requires a compile-time flag to be active. Since SQLite is often embedded into an application, as is required for web apps, the feature can be used as soon as someone wants to incorporate it.
It will be up to custom VFS writers to try it out and make a case for it to become a permanent and supported part of SQLite in a couple years or so.
Beta Was this translation helpful? Give feedback.
All reactions