feat(bcf): Reduce allocations by using CStr8#478
Draft
killercup wants to merge 4 commits intorust-bio:masterfrom
Draft
feat(bcf): Reduce allocations by using CStr8#478killercup wants to merge 4 commits intorust-bio:masterfrom
killercup wants to merge 4 commits intorust-bio:masterfrom
Conversation
Previously, every time a field is written a `CString` was allocated as the methods only took `&[u8]` slices. Now, this work is put on the caller -- who probably either has a static list of tags or can pre-alloc and keep references to them around.
When writing strings, use a simple guess to alloc a vec buffer of the right size.
Pull Request Test Coverage Report for Build 16223472312Details
💛 - Coveralls |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
As of today, every time a field is written, a
CStringis allocated as the methods only take&[u8]slices, which have no guaranteed\0termination.Solution
This PR puts the work on the caller -- who probably either has a static list of tags or can pre-allocate and keep references to them around. It changes many of the method signatures to take a
&CStr8, which is a type that ensures the data is both UTF-8 and null-terminated.I don't have isolated measurements, but alongside some other allocation fixes, this improves the runtime performance of the tool I'm working with by 1.41x when writing large-ish (~500MB) BCF files.
Alternatives considered
Take a
CStrThis is a type from the std lib, so there is no external dependency being introduced. Since the tags are also used in error messages, this would introduce UTF-8 checks, albeit in the cold path.
Introduce a custom type.
The cstr8 crate seems to do what we need here but is not very popular and also at version 0.1.x. Making a new custom type would put us in charge of it, with the ability to optimize further.
Do nothing. This costs a bunch of performance.
Downsides of this approach and future actions
Please let me know what you think of this. I'm using this branch in the project I'm working on and found no issues, but I also don't use all of rust-htslib. I'll keep this branch up to date.