`CallGraphNode!` optimizations #104

wheeheee · 2026-02-09T14:31:46Z

Was playing around with this stuff locally while writing the previous PR, but it's probably better split off from that.

Rewrote _ispow24 to remove a loop that doesn't optimize out
Check N > 0 for more safety
Made apparent the mutual exclusivity of branches (N a power of 2 / 3 / (1 or prime)) in CallGraphNode!
Use the Primes.Factorization object
Slightly reduce allocations, using searchsortedlast with isqrt(N) and exploit an inequality
Some purely cosmetic formatting

These are mostly changes to the implementation. The only algorithmic change is in cumprod(Ns), which can be changed to cumprod(reverse(Ns)) together with other minor changes to yield the same heuristic used previously. Is there a reason for the current order, or is it just preference? It feels slightly more natural to implement the way it is in this PR.

Using Cthulhu to descend into optimized and typed CallGraph{ComplexF64} shows that the PR makes the function much smaller. As a rough proxy, inlining cost decreases from ~2000 to ~800, although obviously CallGraphNode! still won't be inlined.

codecov · 2026-02-09T14:35:16Z

Codecov Report

❌ Patch coverage is 97.56098% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 95.99%. Comparing base (af2da8e) to head (5985765).
⚠️ Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
src/callgraph.jl	96.15%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #104      +/-   ##
==========================================
- Coverage   96.22%   95.99%   -0.24%     
==========================================
  Files           4        4              
  Lines         424      424              
==========================================
- Hits          408      407       -1     
- Misses         16       17       +1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

dannys4 · 2026-02-09T15:02:44Z

I assume you were referring to this line?

cumprod(Ns[end:-1:1])[end:-1:1]

the idea, if I recall correctly, was to knock out big factors asap so that I minimized the number of strides in lower recursions (i.e., if you wait to do larger radices until the end, you'll have to take a lot of strides). I have absolutely no idea if that actually benchmarked better bc that obv makes the lower recursions take larger strides. If this were a performance critical package, I would assume that one needs to optimize according to a heuristic that accounts for the page size.... perhaps another day.

dannys4 · 2026-02-09T15:15:45Z

src/callgraph.jl

+        N1_idx = searchsortedlast(N_cp, N_isqrt)
+        N1 = N_cp[N1_idx]       # N1 <= N_isqrt <= N_fsqrt
+        if N1_idx != lastindex(N_cp) && (abs(N_cp[N1_idx+1] - N_fsqrt) < (N_fsqrt - N1))
+            N1 = N_cp[N1_idx+1] # can be >= N_fsqrt


I try not to be too stingy on the codecov stuff, but I'm a little worried about introducing new logic here and then not adding a test for it. My understanding is that you're basically looking at the integer square root and seeing if N_cp[i] or N_cp[i+1] is closer to the floating-point sqrt. So worst-case, the output wouldn't be wrong, just lower performance runtime (since N1 will divide N nmw). I think it's fine, in that case, to not have a test. I just wanted some self-documentation because I've completely forgotten how this code works over the past 5 years.

I couldn't immediately think of some numbers to exercise that code path, and I think it is probably quite hard (~~maybe impossible, but I'm lazy to put pen to paper for this~~ [2,2,3] would do it I think) to hit that case, so I just put it there to be extra safe.

Also N1_idx != lastindex(N_cp) is in fact impossible to hit by the very nature of the problem, but it's probably good to put it there to help the compiler elide a bounds check and exception.

ah, but [2,2,3] would hit the first case...

dannys4 · 2026-02-09T15:16:10Z

btw thanks for the PRs in recent days, very exciting and the benchmarks look great!

wheeheee · 2026-02-10T02:50:54Z

I assume you were referring to this line?
cumprod(Ns[end:-1:1])[end:-1:1]
the idea, if I recall correctly, was to knock out big factors asap so that I minimized the number of strides in lower recursions (i.e., if you wait to do larger radices until the end, you'll have to take a lot of strides). I have absolutely no idea if that actually benchmarked better bc that obv makes the lower recursions take larger strides. If this were a performance critical package, I would assume that one needs to optimize according to a heuristic that accounts for the page size.... perhaps another day.

Yeah. Would have been better to let this cook for a bit longer and figure out a better strategy, but I don't immediately see anything too jarring in the benchmarks so I guess it's ok to let things go. (had a wild idea about randomizing the order of elements in Ns...but that's non-deterministic...)

dannys4 · 2026-02-10T02:55:06Z

figure out a better strategy

I think most of the FFTA use-case is for small arrays---I'll leave the better strategies to johnson and frigo, since it's really difficult to design cache-oblivious algorithms for mixed radix FFTs afaik.

randomizing the order of elements in Ns

lord have mercy.

wheeheee · 2026-02-10T03:00:48Z

FFTW's GPL license is really a pity, would have been wonderful if parts of it could have been translated directly into Julia 🥲

wheeheee added 3 commits February 9, 2026 21:58

popcnt, tz, minor stuff / formatting

3035d19

searchsortedlast?

a44b491

formatting

5985765

dannys4 reviewed Feb 9, 2026

View reviewed changes

dannys4 approved these changes Feb 9, 2026

View reviewed changes

dannys4 added this pull request to the merge queue Feb 9, 2026

Merged via the queue into JuliaMath:main with commit 54e2152 Feb 9, 2026
8 checks passed

wheeheee deleted the small_optim branch February 10, 2026 02:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`CallGraphNode!` optimizations #104

`CallGraphNode!` optimizations #104

Uh oh!

wheeheee commented Feb 9, 2026

Uh oh!

codecov bot commented Feb 9, 2026 •

edited

Loading

Uh oh!

dannys4 commented Feb 9, 2026

Uh oh!

dannys4 Feb 9, 2026

Uh oh!

wheeheee Feb 10, 2026 •

edited

Loading

Uh oh!

wheeheee Feb 10, 2026

Uh oh!

dannys4 commented Feb 9, 2026

Uh oh!

Uh oh!

wheeheee commented Feb 10, 2026

Uh oh!

dannys4 commented Feb 10, 2026 •

edited

Loading

Uh oh!

wheeheee commented Feb 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

CallGraphNode! optimizations #104

CallGraphNode! optimizations #104

Uh oh!

Conversation

wheeheee commented Feb 9, 2026

Uh oh!

codecov bot commented Feb 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

dannys4 commented Feb 9, 2026

Uh oh!

dannys4 Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

wheeheee Feb 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wheeheee Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

dannys4 commented Feb 9, 2026

Uh oh!

Uh oh!

wheeheee commented Feb 10, 2026

Uh oh!

dannys4 commented Feb 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wheeheee commented Feb 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

`CallGraphNode!` optimizations #104

`CallGraphNode!` optimizations #104

codecov bot commented Feb 9, 2026 •

edited

Loading

wheeheee Feb 10, 2026 •

edited

Loading

dannys4 commented Feb 10, 2026 •

edited

Loading