Improvements/optimize #85

Draft
BrunoSanchez wants to merge 5 commits into master from improvements/optimize

@BrunoSanchez

Ran a performance analysis of subtract() and added FFT caching.
AI tools were used to drive the analysis.

- Add comprehensive profiling toolkit with context managers, decorators, and utilities
- Include TimingMetrics and SubtractMetrics data classes for tracking performance
- Provide MetricsCollector for aggregating metrics across multiple runs
- Add benchmark_function() and compare_implementations() for A/B testing
- Support both simple timing and detailed cProfile analysis

This module enables systematic measurement of optimization improvements
and identification of performance bottlenecks in ProperImage operations.
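
A minimal sketch of how a timing context manager and a collector of this kind could fit together. `TimingMetrics` and `MetricsCollector` are names taken from the list above, but the fields, the `timed()` method, and the toy workload are assumptions made for illustration, not the module's actual API.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TimingMetrics:
    """Hypothetical container: wall-clock samples for one labelled operation."""

    label: str
    samples: List[float] = field(default_factory=list)

    @property
    def mean(self) -> float:
        # Mean of the recorded samples, or 0.0 if nothing was recorded yet.
        return sum(self.samples) / len(self.samples) if self.samples else 0.0


class MetricsCollector:
    """Hypothetical aggregator: keeps one TimingMetrics per label across runs."""

    def __init__(self) -> None:
        self.metrics: Dict[str, TimingMetrics] = {}

    @contextmanager
    def timed(self, label: str):
        # Time the enclosed block and record it even if the block raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.metrics.setdefault(label, TimingMetrics(label)).samples.append(elapsed)


collector = MetricsCollector()
for _ in range(3):
    with collector.timed("toy_workload"):  # stand-in for a subtract() call
        sum(i * i for i in range(10_000))

m = collector.metrics["toy_workload"]
print(f"{m.label}: {len(m.samples)} runs, mean {m.mean:.6f}s")
```

Recording in a `finally` clause keeps the timing even when the profiled call fails, which is useful when a configuration under test raises partway through.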

- Add profile_subtract.py demonstrating 4 profiling modes (basic, detailed, config, benchmark)
- Add PROFILING.md with comprehensive usage guide and examples
- Include profiling for different subtract() configurations
- Document context managers, decorators, and benchmarking utilities

Examples show how to profile subtract() operations and compare
different optimization strategies.
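
Comparing strategies comes down to timing each candidate repeatedly on the same input and reporting a robust statistic. The sketch below illustrates that idea only: `benchmark_sketch` and `compare_sketch` are hypothetical helpers with assumed signatures, not the `benchmark_function()` and `compare_implementations()` added in this PR, and the two toy functions stand in for alternative subtract() code paths.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark_sketch(func: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time (seconds) of func over `repeats` calls."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        times.append(time.perf_counter() - start)
    return statistics.median(times)


def compare_sketch(
    impls: Dict[str, Callable[[], object]], repeats: int = 5
) -> Dict[str, float]:
    """Benchmark each named implementation and return name -> median time."""
    return {name: benchmark_sketch(func, repeats) for name, func in impls.items()}


def loop_sum() -> int:
    # Toy "before" implementation: explicit Python loop.
    total = 0
    for i in range(50_000):
        total += i
    return total


def builtin_sum() -> int:
    # Toy "after" implementation: same result via the built-in.
    return sum(range(50_000))


results = compare_sketch({"loop": loop_sum, "builtin": builtin_sum})
for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds * 1e3:.3f} ms")
```

The median is used here because it is less sensitive than the mean to a single slow run caused by unrelated system activity.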

- Establish performance baselines for subtract() operation
- Profile all configuration combinations (beta, shift, iterative)
- Identify key bottlenecks: PSF measurement (60%), FFT ops (21%), beta optimization (24%)
- Document 12x slowdown when shift parameter is enabled
- Provide optimization roadmap with prioritized recommendations

Test setup: 256x256 images, 30 simulated sources, 15x15 stamp size
Baseline timings:
- beta_only: 0.2802s
- beta_shift: 3.4094s (12x slower than beta_only)
- iterative_beta: 0.2404s
- iterative_full: 3.3870s

- Precompute D_hat_n_ifft and D_hat_r_ifft outside optimization loops
- Replace repeated _ifftwn() calls with cached results scaled by 1/norm
- Apply optimization to all code paths: beta-only, shift-only, iterative

Performance improvements:
- beta_only: 22% faster (0.280s → 0.219s)
- iterative_beta: 17% faster (0.240s → 0.200s)
- beta_shift: 47% faster (3.409s → 1.801s)
- iterative_full: 49% faster (3.387s → 1.719s)

Shift configurations see the biggest gains (~2x faster) because they run more
iterations and recompute FFTs more often. The change adds no new dependencies,
is fully backward compatible, and is mathematically equivalent.
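
The equivalence rests on the linearity of the inverse FFT: dividing by a scalar before or after the transform gives the same array. A minimal sketch under stated assumptions: `np.fft.ifft2` stands in for `_ifftwn()`, the normalisation is assumed to be a scalar that depends on the trial beta, and `norm_of()` is a hypothetical placeholder rather than the expression used in subtract().

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a Fourier-space term such as D_hat_n (random data for illustration).
D_hat_n = np.fft.fft2(rng.normal(size=(64, 64)))
betas = [0.8, 1.0, 1.2]  # trial values an optimizer might visit


def norm_of(beta: float) -> float:
    """Hypothetical scalar normalisation that changes with the trial beta."""
    return float(np.sqrt(1.0 + beta**2))


# Before: one inverse FFT per trial value inside the loop.
uncached = [np.fft.ifft2(D_hat_n / norm_of(b)) for b in betas]

# After: one inverse FFT in total, then a cheap scalar division per trial value.
D_hat_n_ifft = np.fft.ifft2(D_hat_n)
cached = [D_hat_n_ifft / norm_of(b) for b in betas]

# Both routes agree to floating-point precision.
assert all(np.allclose(u, c) for u, c in zip(uncached, cached))
```

The identity holds only when the normalisation is constant across frequencies. If it were a per-frequency array, the division would not commute with the inverse transform and the cached form would not apply.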

- Add comprehensive analysis of FFT caching performance gains
- Document 17-49% speedup across all configurations
- Explain optimization strategy and code changes
- Identify remaining bottlenecks (shift still 8x slower than beta-only)
- Recommend next optimization steps

This quick win provides significant performance improvement with minimal
code changes and no new dependencies.
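
The quoted percentages can be reproduced from the timings reported in this PR. The short check below uses only those numbers; the dictionary and variable names are chosen for illustration. Note that "N% faster" here means a reduction in run time, which is not the same as the speedup factor.

```python
# Timings in seconds, as reported above (before and after FFT caching).
baseline = {"beta_only": 0.280, "iterative_beta": 0.240,
            "beta_shift": 3.409, "iterative_full": 3.387}
after_caching = {"beta_only": 0.219, "iterative_beta": 0.200,
                 "beta_shift": 1.801, "iterative_full": 1.719}

reduction_pct = {}
speedup_factor = {}
for name, before in baseline.items():
    after = after_caching[name]
    reduction_pct[name] = 100 * (before - after) / before  # time saved, in percent
    speedup_factor[name] = before / after                  # how many times faster
    print(f"{name}: {reduction_pct[name]:.0f}% less time, "
          f"{speedup_factor[name]:.2f}x speedup")

# Remaining gap between the shift and beta-only configurations after caching.
shift_vs_beta = after_caching["beta_shift"] / after_caching["beta_only"]
print(f"beta_shift is still {shift_vs_beta:.1f}x slower than beta_only")
```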