
Compression target via rocksdb #3

Merged
ming1 merged 12 commits into main from compress-rocksdb on Jul 9, 2025
Conversation

@ming1 (Collaborator) commented Jul 8, 2025

Add a RocksDB-based compression target.

ming1 added 3 commits July 8, 2025 08:42
This patch introduces a new `compress` target for rublk, which uses
the RocksDB key-value store as a backing device to provide a
compressed userspace block device.

The implementation includes several key features and optimizations:

- **RocksDB Backend:** Leverages RocksDB for storing block data, with
  the key as the block address (u64) and the value as the 512-byte
  block content.

- **Configurable Compression:** The compression algorithm can be
  selected via a command-line argument, with LZ4 as the default.

- **Optimized Reads/Writes:** I/O is optimized by using batch
  operations (`multi_get` for reads and `WriteBatch` for writes) to
  reduce overhead.

- **Configuration:** The device size and RocksDB data directory are
  configurable via command-line arguments. Device configuration is
  persisted in a JSON file for recovery.

- **Cache Settings:** The device is configured with the
  `UBLK_ATTR_VOLATILE_CACHE` attribute, and RocksDB is tuned with
  a block cache and bloom filters to improve read performance.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
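
The commit message above describes keys as u64 block addresses but does not say how they are serialized. A minimal sketch of one plausible encoding follows. Big-endian byte order is an assumption made here so that bytewise key order matches numeric order, and `encode_key` and `decode_key` are hypothetical names, not functions from the patch.

```rust
/// Encode a block address as a fixed-width key.
/// Assumption: big-endian, so that a bytewise comparator sorts keys
/// in the same order as the numeric block addresses.
pub fn encode_key(addr: u64) -> [u8; 8] {
    addr.to_be_bytes()
}

/// Decode a key back into a block address.
/// Returns `None` if the key is not exactly 8 bytes long.
pub fn decode_key(key: &[u8]) -> Option<u64> {
    if key.len() != 8 {
        return None;
    }
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(key);
    Some(u64::from_be_bytes(bytes))
}
```

With a little-endian encoding, address 256 would sort before address 255 under a bytewise comparator, which would make range operations over consecutive blocks awkward.
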
Offload the blocking `db.flush()` operation to a background thread
to prevent it from stalling the main ublk I/O handler.

This is achieved by:
- Spawning a dedicated thread to perform the RocksDB flush.
- Using an `eventfd` to notify the main thread when the flush
  operation is complete.
- The main I/O handler now polls the `eventfd` and completes the
  queued flush commands only after the background flush has
  finished.

This change makes the flush operation asynchronous from the perspective
of the ublk I/O loop, improving responsiveness. Also, `WriteOptions`
are now used to disable WAL and sync for normal writes, as flushes
are handled explicitly.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
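
The offload pattern described above can be sketched roughly as follows. This is not the patch's code: a standard-library channel stands in for the `eventfd` notification so the example stays self-contained, the `flush` closure stands in for the blocking `db.flush()` call, and `FlushWorker` is a hypothetical name.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Handle to a background flush worker (hypothetical type).
pub struct FlushWorker {
    jobs: Sender<u16>,
    done: Receiver<(u16, Result<(), String>)>,
}

impl FlushWorker {
    /// Spawn the worker thread. `flush` stands in for the blocking
    /// database flush; it runs off the I/O handler thread.
    pub fn spawn<F>(flush: F) -> Self
    where
        F: Fn() -> Result<(), String> + Send + 'static,
    {
        let (job_tx, job_rx) = channel::<u16>();
        let (done_tx, done_rx) = channel();
        thread::spawn(move || {
            // The loop ends once every job sender has been dropped.
            for tag in job_rx {
                let res = flush();
                // Stop if nobody is listening for completions any more.
                if done_tx.send((tag, res)).is_err() {
                    break;
                }
            }
        });
        FlushWorker { jobs: job_tx, done: done_rx }
    }

    /// Queue a flush for the command identified by `tag`.
    /// Returns false if the worker thread has exited.
    pub fn submit(&self, tag: u16) -> bool {
        self.jobs.send(tag).is_ok()
    }

    /// Block until the next flush completes. A real handler would
    /// instead poll a notification fd from its event loop.
    pub fn wait(&self) -> Option<(u16, Result<(), String>)> {
        self.done.recv().ok()
    }
}
```

The I/O handler only calls `submit` on the hot path, so a slow flush delays the completion of that one command rather than the whole loop.
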
Remove the extra 'ublk_compress' subdirectory and store the RocksDB
database files directly in the directory provided by the --dir
command-line argument.

This simplifies the directory structure and makes the behavior more
intuitive for users.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
@ming1 ming1 force-pushed the compress-rocksdb branch 3 times, most recently from 548875f to 7b37638 Compare July 8, 2025 09:31
Implement support for the `UBLK_IO_OP_DISCARD` command in the compress
target.

This is achieved by:
- Advertising discard support to the ublk core by setting the
  `UBLK_PARAM_TYPE_DISCARD` parameter type and configuring the
  discard granularity and limits.
- Using the optimized `db.delete_range_cf()` RocksDB method to delete
  all keys within the specified sector range in a single, efficient
  operation. This avoids the overhead of iterating over a potentially
  very large range and creating a large `WriteBatch`.
- Opening the RocksDB instance with an explicit column family ("default")
  to get the required handle for the `delete_range_cf` operation.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
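
To illustrate the discard mapping, the sketch below turns a sector range into a half-open key range `[begin, end)`, which is the shape a range delete expects. It assumes one key per 512-byte sector (as at this point in the series) and big-endian keys. A `BTreeMap` stands in for the RocksDB column family, and `discard_key_range` and `delete_range` are hypothetical helpers.

```rust
use std::collections::BTreeMap;

/// Compute the half-open key range [begin, end) covering a discard.
/// Returns `None` if the sector range overflows u64.
/// Assumption: keys are big-endian sector numbers.
pub fn discard_key_range(start_sector: u64, nr_sectors: u64) -> Option<([u8; 8], [u8; 8])> {
    let end = start_sector.checked_add(nr_sectors)?;
    Some((start_sector.to_be_bytes(), end.to_be_bytes()))
}

/// Stand-in for a range delete on the store: drop every key in
/// [begin, end) without visiting the keys one by one.
pub fn delete_range(store: &mut BTreeMap<[u8; 8], Vec<u8>>, begin: [u8; 8], end: [u8; 8]) {
    // `tail` takes all keys >= begin, `keep` then takes all keys >= end.
    let mut tail = store.split_off(&begin);
    let mut keep = tail.split_off(&end);
    // Whatever is left in `tail` is the discarded range; drop it.
    store.append(&mut keep);
}
```

The key point is that the caller supplies only two boundary keys, regardless of how many sectors the discard covers.
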
@ming1 ming1 force-pushed the compress-rocksdb branch from 7b37638 to 2c8321c Compare July 8, 2025 11:37
@ming1 ming1 changed the title Compress rocksdb Compression target via rocksdb Jul 8, 2025
@ming1 ming1 force-pushed the compress-rocksdb branch 3 times, most recently from 09da1b8 to f276799 Compare July 9, 2025 03:38
ming1 added 3 commits July 9, 2025 04:00
Add a suite of integration tests for the `compress` target to
validate its functionality, including basic I/O, recovery, and
use as a standard block device.

The following tests have been added:
- `test_ublk_add_del_compress`: Verifies basic device creation,
  I/O operations (read/write), and deletion.
- `test_ublk_compress_recover`: Tests the device recovery mechanism
  after a simulated failure.
- `test_ublk_format_mount_compress`: Validates the device by
  formatting it with an ext4 filesystem, mounting it, and performing
  file I/O.

To support this, a reusable test harness `__test_ublk_add_del_compress`
has been created, and the test device size has been set to 8G.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
To improve performance and avoid blocking the main ublk I/O loop,
this patch offloads blocking RocksDB read operations to a dedicated
worker thread.

This is achieved by:
- Spawning a new thread to handle all `db.multi_get()` calls.
- Using a channel to send read jobs from the main I/O handler to
  the worker thread. The buffer address is safely passed by casting
  it to a u64.
- Using a second channel for the worker to send completion data
  (or errors) back to the main thread.
- Integrating with the `ublk` event loop by using an `eventfd` to
  notify the main thread when a read completion is ready. The
  `io_handler` now polls this `eventfd` and processes the
  completion queue.

This change makes read operations asynchronous from the perspective
of the ublk I/O loop, significantly improving responsiveness for
concurrent workloads.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
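
A rough sketch of the job/completion channels described above, showing why the buffer address travels as a `u64`: raw pointers are not `Send`, while an integer is. Everything here is illustrative. A `HashMap` stands in for the database lookup, the completion channel stands in for the `eventfd`, zero-filling a missing key is an assumption, and `ReadJob`, `ReadDone`, and `spawn_reader` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Arc;
use std::thread;

/// A read request handed to the worker thread.
pub struct ReadJob {
    pub tag: u16,
    pub key: u64,
    /// Destination buffer address, carried as an integer so the job is `Send`.
    pub buf_addr: u64,
    pub len: usize,
}

/// Completion sent back to the I/O handler.
pub struct ReadDone {
    pub tag: u16,
    pub result: Result<usize, String>,
}

pub fn spawn_reader(store: Arc<HashMap<u64, Vec<u8>>>) -> (Sender<ReadJob>, Receiver<ReadDone>) {
    let (job_tx, job_rx) = channel::<ReadJob>();
    let (done_tx, done_rx) = channel::<ReadDone>();
    thread::spawn(move || {
        for job in job_rx {
            // SAFETY: the submitter must guarantee that `buf_addr` points to
            // `len` writable bytes that stay alive and are not accessed by
            // anyone else until the completion for this tag is received.
            let buf: &mut [u8] =
                unsafe { std::slice::from_raw_parts_mut(job.buf_addr as *mut u8, job.len) };
            let result = match store.get(&job.key) {
                Some(v) if v.len() <= buf.len() => {
                    buf[..v.len()].copy_from_slice(v);
                    Ok(v.len())
                }
                Some(_) => Err("value larger than buffer".to_string()),
                // Assumption: blocks that were never written read back as zeros.
                None => {
                    buf.fill(0);
                    Ok(buf.len())
                }
            };
            if done_tx.send(ReadDone { tag: job.tag, result }).is_err() {
                break;
            }
        }
    });
    (job_tx, done_rx)
}
```

The cast itself does not make the access safe. Safety rests on the submitter keeping the buffer valid and untouched until the matching completion arrives.
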
Refactor the flush command handling to align with the asynchronous
offloading pattern used by the read command.

This change introduces a dedicated job/completion channel and a
background thread specifically for flush operations. This approach
removes the more complex shared `pending_tags` mutex and simplifies
the `io_handler` logic.

Now, both `READ` and `FLUSH` commands are handled symmetrically,
each with its own worker thread and eventfd for notification, making
the asynchronous flow more consistent and easier to maintain.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
@ming1 ming1 force-pushed the compress-rocksdb branch 2 times, most recently from 9f94908 to 47a09cd Compare July 9, 2025 05:55
ming1 added 5 commits July 9, 2025 13:06
Ensure that the logical and physical block sizes for a `compress`
target device are immutable after first creation.

This is achieved by:
- Adding `logical_block_size` and `physical_block_size` fields to
  the `ublk_compress.json` configuration file.
- On first run, the block sizes are determined from command-line
  arguments or defaults and saved to the JSON file.
- On subsequent runs for an existing device, these values are read
  from the JSON file, and any block size arguments from the command
  line are ignored.

This guarantees that the device geometry remains consistent across
restarts, preventing potential data corruption or filesystem errors.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
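
The precedence rule described above (saved values win, command-line values apply only on first creation) can be sketched as a small pure function. JSON parsing is left out. `BlockSizes`, `DEFAULT_SIZES`, and `resolve_block_sizes` are hypothetical names, and the default values are placeholders rather than the target's real defaults.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BlockSizes {
    pub logical: u32,
    pub physical: u32,
}

/// Placeholder defaults for illustration only.
pub const DEFAULT_SIZES: BlockSizes = BlockSizes { logical: 512, physical: 4096 };

/// Decide which block sizes to use.
/// `saved` is what was read from the config file, if it exists;
/// `cli` is what was given on the command line, if anything.
/// Returns the sizes plus a flag saying whether they must be persisted.
pub fn resolve_block_sizes(saved: Option<BlockSizes>, cli: Option<BlockSizes>) -> (BlockSizes, bool) {
    match saved {
        // Existing device: the stored geometry wins and CLI values are ignored.
        Some(s) => (s, false),
        // First creation: take CLI values or defaults, and persist them.
        None => (cli.unwrap_or(DEFAULT_SIZES), true),
    }
}
```
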
Change the storage format for the compress target to improve
performance and better align with typical I/O patterns. Instead of
storing a fixed 512-byte sector per key, each key now stores one
full logical block.

The key is still the offset, in 512-byte sectors, of the start of the
logical block.

This change modifies the read, write, and discard handlers to
calculate I/O operations in terms of logical blocks rather than
sectors, reducing the number of required database operations for
any given I/O request.

The RocksDB block size has also been adjusted to be a multiple of
the logical block size for better cache alignment.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
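
As an illustration of the new keying scheme, the sketch below lists the keys touched by one request: one key per logical block, each key being the sector offset of that block's start. It assumes requests are aligned to the logical block size and rejects anything else. `block_keys` is a hypothetical helper, not a function from the patch.

```rust
/// log2 of the 512-byte sector size.
pub const SECTOR_SHIFT: u32 = 9;

/// Return the keys (sector offsets of logical block starts) covered by a
/// request, or `None` if the request is empty, misaligned, or overflows.
pub fn block_keys(start_sector: u64, nr_sectors: u32, logical_bs: u32) -> Option<Vec<u64>> {
    if logical_bs < 512 || !logical_bs.is_power_of_two() {
        return None;
    }
    // Number of 512-byte sectors in one logical block.
    let spb = (logical_bs >> SECTOR_SHIFT) as u64;
    let nr = nr_sectors as u64;
    // Reject ranges that would run past the end of the u64 sector space.
    start_sector.checked_add(nr)?;
    if nr == 0 || start_sector % spb != 0 || nr % spb != 0 {
        return None;
    }
    Some((0..nr / spb).map(|i| start_sector + i * spb).collect())
}
```

For a 12 KiB request at sector 16 with 4096-byte logical blocks, this yields three keys (16, 24, 32) where per-sector keying would have needed 24.
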
Wire up the ublk read-only flag with the RocksDB backend to ensure
consistent behavior.

This is achieved by:
- Checking the read-only flag from the device parameters when adding
  a new `compress` target.
- If the device is read-only, the RocksDB database is now opened in
  secondary (read-only) mode using `DB::open_cf_as_secondary`.
- The I/O handler now rejects any `WRITE` or `DISCARD` commands with
  an `EACCES` error if the device is in read-only mode, providing a
  fast-path failure without attempting to modify the database.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
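
The fast-path rejection described above amounts to a check made before any database work. A minimal sketch, assuming the handler reports failures as negative errno values: `IoOp` and `check_read_only` are hypothetical names, and `EACCES` is defined locally with its Linux value instead of being pulled from a crate.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IoOp {
    Read,
    Write,
    Flush,
    Discard,
}

/// Linux errno value for "permission denied".
pub const EACCES: i32 = 13;

/// Reject operations that would modify a read-only device.
/// Returns `Err(-EACCES)` for WRITE and DISCARD when `read_only` is set,
/// and `Ok(())` for everything else.
pub fn check_read_only(op: IoOp, read_only: bool) -> Result<(), i32> {
    match op {
        IoOp::Write | IoOp::Discard if read_only => Err(-EACCES),
        _ => Ok(()),
    }
}
```
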
 cargo:warning=/usr/include/c++/13/cstdint:38:10: fatal error: bits/c++config.h: No such file or directory
 cargo:warning=   38 | #include <bits/c++config.h>
 cargo:warning=      |          ^~~~~~~~~~~~~~~~~~

Also install rustc 1.85.0; otherwise some packages may fail to build.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
The test CI takes too long and is not reliable, so remove it.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
@ming1 ming1 force-pushed the compress-rocksdb branch from 47a09cd to de590be Compare July 9, 2025 13:07
@ming1 ming1 merged commit 77270c4 into main Jul 9, 2025
8 checks passed