crc: crc32_iscsi_by16_10 opt for short data #380

MaodiMa · 2025-12-30T03:45:36Z

This pull request contains 3 modifications:

Part 1: Add a fast path for short data, following part of https://github.com/intel/isa-l/pull/375. The threshold is changed to 8 bytes based on measured performance.

Part 2: Change the entry alignment of crc32_iscsi_by16_10 from 16 to 64 bytes. This improves bandwidth by about 5%-17% on our Intel(R) Xeon(R) Platinum 8480+ platform for short data under 16 bytes, but has no impact on the AMD or Hygon platforms.

Part 3: Reorder some instructions in 128_done. This brings a ~5% improvement on the Hygon platform for sizes that depend heavily on this path, such as 17-31 bytes. It has no impact on the Intel or AMD platforms.

This commit has no effect on long data processing.
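For readers who do not read the assembly, the short-data fast path from Part 1 can be modelled in C as a ladder of bit tests on the length. This is only a sketch under assumptions: the names `crc32c_byte`, `crc32c_bytes` and `crc32c_short` are hypothetical, the bitwise update stands in for the x86 `crc32` instruction, and every length under 16 is routed through the ladder because the thread does not show how the 8-byte threshold selects between paths. Seed handling and any final inversion are left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C (Castagnoli) update for one byte, reflected polynomial
 * 0x82F63B78. Stands in for the hardware crc32 instruction. */
uint32_t crc32c_byte(uint32_t crc, uint8_t b)
{
    crc ^= b;
    for (int i = 0; i < 8; i++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/* Plain byte-at-a-time reference over n bytes. */
uint32_t crc32c_bytes(uint32_t crc, const uint8_t *p, size_t n)
{
    while (n--)
        crc = crc32c_byte(crc, *p++);
    return crc;
}

/* Hypothetical fast path for len < 16: each set bit of len selects one
 * step of 8, 4, 2 or 1 bytes. In assembly each step could be a single
 * crc32 instruction of the matching operand width; here the byte loop
 * models it. len itself is never modified, only tested. */
uint32_t crc32c_short(uint32_t crc, const uint8_t *p, size_t len)
{
    if (len & 8) { crc = crc32c_bytes(crc, p, 8); p += 8; }
    if (len & 4) { crc = crc32c_bytes(crc, p, 4); p += 4; }
    if (len & 2) { crc = crc32c_bytes(crc, p, 2); p += 2; }
    if (len & 1) { crc = crc32c_bytes(crc, p, 1); }
    return crc;
}
```

The ladder needs at most four branches for any length from 0 to 15 and no loop, which is the kind of saving a short-data path is after.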

Performance data are listed below (iterations/s):

	Hygon			Intel(R) Xeon(R) Platinum 8480+			AMD Ryzen 7 9700X (Zen5)
byte(s)	baseline	opt	speedup	baseline	opt	speedup	baseline	opt	speedup
1	138.78	156.11	12.49%	158.66	176.37	11.16%	549.02	613.71	11.78%
2	138.77	166.52	20.00%	159.21	187.70	17.90%	549.45	682.67	24.25%
3	131.47	166.54	26.67%	166.99	187.85	12.49%	577.76	681.54	17.96%
4	138.77	166.49	19.98%	159.93	187.82	17.44%	547.13	680.96	24.46%
5	138.77	166.51	19.99%	159.85	187.78	17.48%	546.41	681.42	24.71%
6	138.80	178.42	28.55%	161.03	201.47	25.12%	546.91	692.49	26.62%
7	138.78	156.11	12.49%	161.28	202.22	25.38%	547.06	692.48	26.58%
8	138.77	146.93	5.88%	161.71	168.60	4.26%	546.87	576.47	5.41%
9	138.77	146.93	5.88%	161.78	170.56	5.42%	546.67	577.29	5.60%
10	138.77	156.14	12.51%	161.96	177.43	9.55%	547.19	613.67	12.15%
11	138.76	146.94	5.89%	161.65	177.20	9.62%	547.05	613.85	12.21%
12	138.77	156.13	12.50%	161.15	177.44	10.10%	546.54	613.98	12.34%
13	138.78	146.94	5.88%	161.03	177.67	10.33%	546.33	614.06	12.40%
14	138.77	146.95	5.89%	161.30	188.50	16.86%	546.09	670.60	22.80%
15	138.74	124.90	-9.98%	161.29	188.91	17.12%	546.62	668.72	22.34%
16	146.93	192.16	30.79%	167.09	216.45	29.54%	554.33	692.55	24.93%
17	113.53	124.88	10.00%	158.29	177.64	12.22%	426.22	469.70	10.20%
18	113.54	124.88	9.99%	157.90	178.06	12.77%	425.06	470.97	10.80%
19	113.54	124.90	10.00%	158.30	178.32	12.65%	426.31	472.10	10.74%
20	113.54	124.90	10.00%	158.33	178.22	12.56%	427.78	470.23	9.92%
21	113.54	124.90	10.00%	158.37	178.70	12.84%	425.41	468.79	10.20%
22	113.53	124.89	10.01%	158.40	178.21	12.50%	425.39	469.31	10.32%
23	113.54	124.90	10.00%	158.38	178.25	12.55%	427.54	471.33	10.24%
24	113.53	124.91	10.02%	158.31	177.99	12.43%	425.99	465.73	9.33%
25	113.53	124.88	10.00%	158.55	177.91	12.21%	425.58	470.15	10.47%
26	113.53	124.90	10.02%	158.14	178.21	12.69%	427.32	468.44	9.62%
27	113.54	124.89	10.00%	157.84	178.75	13.25%	426.95	468.28	9.68%
28	113.55	124.89	9.99%	157.80	178.12	12.88%	425.31	468.46	10.15%
29	113.53	124.89	10.00%	157.82	178.38	13.02%	425.65	470.07	10.43%
30	113.55	124.90	10.00%	157.81	177.99	12.79%	424.60	469.40	10.55%
31	113.54	124.91	10.02%	157.80	177.58	12.53%	426.31	469.33	10.09%
32	131.47	143.09	8.84%	152.36	187.85	23.29%	507.03	554.00	9.26%
33	99.91	99.92	0.01%	152.96	180.36	17.92%	329.83	330.85	0.31%
34	99.91	99.92	0.01%	152.88	180.02	17.75%	328.65	331.69	0.92%

1024	37.67	37.71	0.12%	78.34	78.33	-0.01%	86.60	86.59	-0.02%
2048	19.68	19.67	-0.06%	40.90	40.90	0.00%	43.30	43.29	-0.02%
3072	13.27	13.26	-0.08%	28.70	28.70	0.00%	28.71	28.86	0.52%
4096	10.02	10.02	-0.01%	21.79	21.79	0.02%	21.56	21.65	0.39%
5120	8.04	8.04	0.00%	17.57	17.57	0.00%	17.26	17.32	0.33%
6144	6.72	6.72	0.07%	14.71	14.71	0.01%	14.39	14.43	0.26%
7168	5.77	5.77	-0.01%	12.65	12.66	0.04%	12.34	12.37	0.23%
8192	5.06	5.06	0.01%	11.10	11.11	0.06%	10.80	10.82	0.20%

Only the 15-byte case on the Hygon platform regresses (-9.98%). Overall, performance improves.
In addition, data longer than 32 bytes improves only on the Intel platform. That is the effect of the 64-byte alignment.
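The speedup columns appear to be the relative change in iterations per second, that is (opt / baseline - 1) x 100. A minimal sketch of that arithmetic follows; the function name `speedup_percent` is hypothetical and the formula is inferred from the table rather than stated in the thread.

```c
/* Relative change of the optimized rate over the baseline, in percent.
 * Positive means the optimized code completes more iterations per second. */
double speedup_percent(double baseline, double opt)
{
    return (opt / baseline - 1.0) * 100.0;
}
```

For the Hygon 1-byte row, `speedup_percent(138.78, 156.11)` comes to roughly 12.49, and for the 15-byte row `speedup_percent(138.74, 124.90)` comes to roughly -9.98, matching the table. Some rows can differ in the last digit, presumably because the listed rates are themselves rounded.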

MaodiMa · 2026-01-23T06:34:46Z

@pablodelara Hi, sorry to bother you. This pull request has been open for some time. Is there anything we could do to improve this work?

pablodelara

Two minor comments, thanks!

pablodelara · 2026-01-26T23:06:11Z

crc/crc32_iscsi_by16_10.asm

-	je	.exact_16_left
-	jl	.less_than_16_left
-
 	vmovdqu	xmm7, [arg2]		; load the plaintext


Could you add a comment saying there are between 17 and 32 bytes at this stage?

pablodelara · 2026-01-26T23:08:10Z

crc/crc32_iscsi_by16_10.asm

+	add	arg2,2
+
+.less_than_2:
+	test	arg3,1			; check if 1 byte remaining


Might be shorter to do "or arg3,arg3"

Sorry, but the test instructions under the earlier labels do not change the value of arg3. If arg3 is 0b10, it still holds 0b10 after being processed in less_than_4. Then or arg3, arg3 would lead to an incorrectly not-taken branch. So we have to check only the least significant bit.

Fair enough, thanks!
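The point of this exchange can be shown with a small C model of the last two ladder steps. It is a sketch under assumptions: `tail_bytes_consumed` is a hypothetical name, and the model assumes that a not-taken branch at `.less_than_2` means the one-byte step runs. Because the length is only tested and never decremented, a nonzero check cannot tell "one byte left" apart from "the two-byte step already ran".

```c
#include <stddef.h>

/* Counts how many bytes the final two ladder steps would consume for a
 * remaining length below 4. len is never decremented, mirroring the
 * assembly where test leaves arg3 unchanged. */
size_t tail_bytes_consumed(size_t len, int use_nonzero_check)
{
    size_t consumed = 0;

    /* Models .less_than_4: a two-byte step followed by add arg2,2. */
    if (len & 2)
        consumed += 2;

    /* Models .less_than_2 with either candidate check. */
    int one_more = use_nonzero_check
                       ? (len != 0)        /* models or arg3,arg3 */
                       : ((len & 1) != 0); /* models test arg3,1 */
    if (one_more)
        consumed += 1;

    return consumed;
}
```

With a remaining length of 2 (0b10), the bit test consumes 2 bytes while the nonzero check consumes 3, which is one byte past the end of the data.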

- Add fastpath for short data
- Change code align from 16 to 64
- Code reordering when dealing final 16 bytes after folding

Signed-off-by: Maodi Ma <mamaodi@hygon.cn>

pablodelara · 2026-01-29T15:55:13Z

This PR is merged now, thanks!

pablodelara reviewed Jan 26, 2026

MaodiMa force-pushed the crc32c_avx512 branch from c286843 to 0231370 Compare January 28, 2026 03:46

pablodelara closed this Jan 29, 2026
