-
Notifications
You must be signed in to change notification settings - Fork 336
crc: crc32_iscsi_by16_10 opt for short data #380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@pablodelara Hi, sorry to bother. This pull request has been created for some time. I was wondering if there is anything we could do to improve this work? |
pablodelara
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two minor comments, thanks!
| je .exact_16_left | ||
| jl .less_than_16_left | ||
|
|
||
| vmovdqu xmm7, [arg2] ; load the plaintext |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment saying there are a between 17 and 32 bytes at this stage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Done
| add arg2,2 | ||
|
|
||
| .less_than_2: | ||
| test arg3,1 ; check if 1 byte remaining |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be shorter to do "or arg3,arg3"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, but the test instr in other labels before will not change the result of arg3. If arg3 is 0b10, after processed in less_than_4, arg3 keeps the value of 0b10. Then or arg3, arg3 will lead to a incorrect not-taken branch. So we had to check only the least significant bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, thanks!
c286843 to
0231370
Compare
- Add fastpath for short data - Change code align from 16 to 64 - Code reordering when dealing final 16 bytes after folding Signed-off-by: Maodi Ma <mamaodi@hygon.cn>
|
This PR is merged now, thanks! |
This pull request contains 3 modifications:
Part 1: Do the same thing as part of https://github.com/intel/isa-l/pull/375, adding fastpath for short data. The threshold is changed to 8 bytes based on tested performance.
Part 2: Change the entrance alignment of
crc32_iscsi_by16_10from 16 to 64. This improves bandwidth about 5%-17% on our Intel(R) Xeon(R) Platinum 8480+ platform for short data under 16 bytes. But it does no impact on AMD or Hygon platform.Part 3: Adjust the ordering of some instructions in
128_done. This brings ~5% improvement on Hygon platform for scenarios that highly depend on it, like 17-31 bytes. This does no impact on Intel or AMD platform.This commit has no affect on long data processing.
Performance data are listed below (iteration/s):
Only 15 bytes scenario on Hygon platform got a negtive affect. Overall, the performance got improved.
Besides, you can see there is improvement for data longer than 32 bytes only on Intel platform. That is 64 bytes alignment working.