Optimization of mat_add_f32 and mat_sub_f32: leveraging LOB to merge … by AlbertHuang-CPU · Pull Request #291 · ARM-software/CMSIS-DSP

AlbertHuang-CPU · 2026-01-09T08:20:34Z

Writing the C code in this style lets the compiler generate assembly that uses LOB (Low Overhead Branch) instructions, which benefits speed, code size, and readability.
The changes have been validated with the Benchmarks and Tests under the CMSIS-DSP/Testing/ directory.
All tests passed, and the optimization results look good.
For example, running the Tests with Arm Clang 6.23 in an FPGA (MPS3) environment gives:
optimized: total cycles = 72826; Program Size: Code=66424 RO-data=244968 RW-data=28 ZI-data=2098048
original: total cycles = 73043; Program Size: Code=66456 RO-data=244968 RW-data=28 ZI-data=2098048
That is 217 fewer cycles (about 0.3%) and 32 fewer bytes of code.
The same optimization could similarly be applied to f16, q15, mat_cmplx_mult, mat_vec_mult, etc.
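
The diff itself is not reproduced in this conversation, so the following is only a sketch of the general idea and not the code from the PR. It assumes the change replaces an unrolled main loop plus a separate tail loop with one simple counted loop, which a compiler targeting Armv8.1-M may lower to a low-overhead loop. The function names `mat_add_split_tail` and `mat_add_single_loop` are hypothetical, and whether LOB instructions are actually emitted depends on the toolchain, flags, and target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" shape: 4x unrolled main loop plus a separate tail loop. */
void mat_add_split_tail(const float *a, const float *b, float *dst, uint32_t n)
{
    assert(a != NULL && b != NULL && dst != NULL);

    uint32_t blk = n >> 2U;          /* number of full groups of 4 elements */
    while (blk > 0U) {
        dst[0] = a[0] + b[0];
        dst[1] = a[1] + b[1];
        dst[2] = a[2] + b[2];
        dst[3] = a[3] + b[3];
        a += 4;
        b += 4;
        dst += 4;
        blk--;
    }

    blk = n & 3U;                    /* remaining 0..3 elements (the tail) */
    while (blk > 0U) {
        *dst++ = *a++ + *b++;
        blk--;
    }
}

/* Hypothetical "after" shape: one counted loop with no separate tail.
 * Assumption: a compiler targeting Armv8.1-M may turn this into a
 * low-overhead loop; this is not guaranteed for every toolchain. */
void mat_add_single_loop(const float *a, const float *b, float *dst, uint32_t n)
{
    assert(a != NULL && b != NULL && dst != NULL);

    for (uint32_t i = 0U; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```

Both functions compute the same element-wise sum; the second simply leaves loop structuring and tail handling to the compiler. Inspecting the generated assembly for the actual target is the only way to confirm which form is better in a given case.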

…the tail process

christophe0606 · 2026-01-15T10:33:06Z

@AlbertHuang-CPU This optimization has not yet been applied to all CMSIS-DSP functions because there are still many cases where the compiler generates worse code.
But since it works in this case, I am merging the PR.

Optimization of mat_add_f32 and mat_sub_f32: leveraging LOB to merge …

c172284

…the tail process

christophe0606 merged commit 7bfa537 into ARM-software:main Jan 15, 2026
11 checks passed

christophe0606 merged 1 commit into ARM-software:main from AlbertHuang-CPU:main