implement preds.py as a Rust wheel with PyO3 & Maturin#3181
Open
keithamus wants to merge 2 commits intospeced:mainfrom
Open
implement preds.py as a Rust wheel with PyO3 & Maturin#3181keithamus wants to merge 2 commits intospeced:mainfrom
keithamus wants to merge 2 commits intospeced:mainfrom
Conversation
992d6ba to
43e9887
Compare
This change is a proof of concept for how possible it would be to take individual python modules and re-implement them in Rust, in a "Ship of Theseus" style Rust rewrite. This takes some of the simplest parts of the codebase - a list of predicate free functions - and turns them into Rust checks. These functions become quite a bit more trivial in Rust as Rust implements many of these functions on `char`, so we simply delegate to those methods. These functions where chosen not because they're a slow path, but more because they are straightforward functions that return booleans and show demonstrate a good proof of concept. They certainly don't make things _slower_, however. The next steps would be to take some of the bigger functions in preds.py that do full string comparison (such as isXMLishTagname), and port those. This was avoided for now as these might be slightly more controversial, as we might want to include dependencies for fast string matching. I'm not very well versed in Python, this was mostly cobbled together using https://medium.com/@MatthieuL49/a-mixed-rust-python-project-24491e2af424 and https://colliery.io/blog/rust-python-pattern/ as guides. Some completely non empirical evaluation of timings, I ran the tests with/without Rust bindings: ```sh $ time BIKESHED_USE_RUST=0 ./bikeshed.py test --no-update Running tests |████████████████████| 502/502 [100%] in 3:04.5 (2.72/s) ✔ All tests passed. ________________________________________________________ Executed in 184.82 secs fish external usr time 176.82 secs 717.00 micros 176.82 secs sys time 1.99 secs 0.00 micros 1.99 secs $ time BIKESHED_USE_RUST=1 ./bikeshed.py test --no-update Running tests [R] |████████████████████| 502/502 [100%] in 3:04.4 (2.72/s) ✔ All tests passed. ________________________________________________________ Executed in 184.72 secs fish external usr time 177.16 secs 481.00 micros 177.16 secs sys time 1.96 secs 172.00 micros 1.96 secs ``` As expected both take essentially the same time. This may prove 1 of 2 things: - I'm an idiot and haven't done this right. - The FFI boundary between Python/Rust isn't costing us much (at least for simple checks like these).
43e9887 to
7ae646b
Compare
Collaborator
|
Oooh, hell yeah. I've been meaning to do this exact thing for a while. Note that I've got a huge change to the custom parser living in a branch, tho the edits to this file in particular are minimal. Don't go converting parser.py, tho. 😁 Another good test case will be the serializer, I think. Self-contained, lots of string building that I think is slow in Python, and a noticable chunk of time in large documents. I won't be able to dig into this more immediately, as my vacation starts today, but I'm very excited to find in when I get back. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This change is a proof of concept for how possible it would be to take
individual python modules and re-implement them in Rust, in a "Ship of
Theseus" style Rust rewrite.
This takes some of the simplest parts of the codebase - a list of
predicate free functions - and turns them into Rust checks. These
functions become quite a bit more trivial in Rust as Rust implements
many of these functions on
char, so we simply delegate to thosemethods.
These functions where chosen not because they're a slow path, but more
because they are straightforward functions that return booleans and show
demonstrate a good proof of concept. They certainly don't make things
slower, however.
The next steps would be to take some of the bigger functions in preds.py
that do full string comparison (such as isXMLishTagname), and port those.
This was avoided for now as these might be slightly more controversial,
as we might want to include dependencies for fast string matching.
I'm not very well versed in Python, this was mostly cobbled together using
https://medium.com/@MatthieuL49/a-mixed-rust-python-project-24491e2af424
and https://colliery.io/blog/rust-python-pattern/ as guides.
Some completely non empirical evaluation of timings, I ran the tests
with/without Rust bindings:
As expected both take essentially the same time. This may prove 1 of 2
things:
for simple checks like these).