- Add comprehensive profiling toolkit with context managers, decorators, and utilities
- Include TimingMetrics and SubtractMetrics data classes for tracking performance
- Provide MetricsCollector for aggregating metrics across multiple runs
- Add benchmark_function() and compare_implementations() for A/B testing
- Support both simple timing and detailed cProfile analysis

This module enables systematic measurement of optimization improvements and identification of performance bottlenecks in ProperImage operations.
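The toolkit's structure (a timing context manager, a decorator built on top of it, and a collector that aggregates repeated runs) can be sketched with the standard library alone. This is a minimal sketch, not the PR's code: the names `TimingMetrics`, `MetricsCollector`, `benchmark_function()` and `compare_implementations()` mirror the commit message, but their fields and signatures here are assumptions, and `timed()` and `profiled()` are hypothetical helpers.

```python
import statistics
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from functools import wraps


@dataclass
class TimingMetrics:
    """One timed run (hypothetical fields, not the PR's class)."""
    name: str
    elapsed: float  # wall-clock seconds


@dataclass
class MetricsCollector:
    """Aggregates elapsed times across runs, grouped by name (hypothetical API)."""
    runs: dict = field(default_factory=dict)

    def add(self, metric: TimingMetrics) -> None:
        self.runs.setdefault(metric.name, []).append(metric.elapsed)

    def summary(self, name: str) -> dict:
        times = self.runs[name]
        return {
            "n": len(times),
            "mean": statistics.mean(times),
            "min": min(times),
            "max": max(times),
        }


@contextmanager
def timed(name, collector):
    """Time the enclosed block and record it, even if the block raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        collector.add(TimingMetrics(name, time.perf_counter() - start))


def profiled(name, collector):
    """Decorator that times every call of the wrapped function."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            with timed(name, collector):
                return func(*args, **kwargs)
        return wrapper
    return decorator


def benchmark_function(func, *args, n_runs=5, **kwargs):
    """Call func n_runs times and return summary statistics of the timings."""
    collector = MetricsCollector()
    for _ in range(n_runs):
        with timed(func.__name__, collector):
            func(*args, **kwargs)
    return collector.summary(func.__name__)


def compare_implementations(funcs, *args, n_runs=5, **kwargs):
    """A/B test: mean run time per implementation, keyed by function name."""
    return {
        f.__name__: benchmark_function(f, *args, n_runs=n_runs, **kwargs)["mean"]
        for f in funcs
    }
```

For example, `compare_implementations([impl_a, impl_b], data)` (with hypothetical `impl_a`, `impl_b`, `data`) returns the mean seconds per implementation. `time.perf_counter()` is used instead of `time.time()` because it is monotonic and high-resolution, which matters for sub-second timings such as the ~0.2 s subtract() baselines.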
- Add profile_subtract.py demonstrating 4 profiling modes (basic, detailed, config, benchmark)
- Add PROFILING.md with comprehensive usage guide and examples
- Include profiling for different subtract() configurations
- Document context managers, decorators, and benchmarking utilities

Examples show how to profile subtract() operations and compare different optimization strategies.
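A "detailed" profiling mode can be approximated with the standard library's `cProfile` and `pstats` modules. This sketch profiles a hypothetical `workload()` that performs repeated FFT round trips as a stand-in for subtract(); `profile_detailed()` is likewise a hypothetical helper, not the interface of profile_subtract.py.

```python
import cProfile
import io
import pstats

import numpy as np


def workload(n=128, iterations=20):
    """Hypothetical stand-in for subtract(): repeated 2-D FFT round trips."""
    rng = np.random.default_rng(0)
    image = rng.normal(size=(n, n))
    total = 0.0
    for _ in range(iterations):
        total += np.fft.ifft2(np.fft.fft2(image)).real.sum()
    return total


def profile_detailed(func, *args, top=10, sort_key="cumulative", **kwargs):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sorting by cumulative time surfaces the call subtrees that dominate runtime
    stats.strip_dirs().sort_stats(sort_key).print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_detailed(workload)
    print(report)
```

Sorting by cumulative time is what makes a breakdown like "PSF measurement vs. FFT ops" readable, since each entry includes the time spent in everything it calls.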
- Establish performance baselines for subtract() operation
- Profile all configuration combinations (beta, shift, iterative)
- Identify key bottlenecks: PSF measurement (60%), FFT ops (21%), beta optimization (24%)
- Document 12x slowdown when shift parameter is enabled
- Provide optimization roadmap with prioritized recommendations

Test setup: 256x256 images, 30 simulated sources, 15x15 stamp size

Baseline timings:
- beta_only: 0.2802s
- beta_shift: 3.4094s (12x slower)
- iterative_beta: 0.2404s
- iterative_full: 3.3870s
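The 12x figure is the ratio of the beta_shift and beta_only baseline timings. The same arithmetic on the iterative pair shows the shift penalty carries over to iterative mode (about 14x). The dictionary and the `slowdown()` helper are illustrative only; the numbers are copied from the baseline list.

```python
# Baseline subtract() timings in seconds, copied from the commit message
baseline = {
    "beta_only": 0.2802,
    "beta_shift": 3.4094,
    "iterative_beta": 0.2404,
    "iterative_full": 3.3870,
}


def slowdown(timings, config, reference):
    """How many times slower `config` is than `reference`."""
    return timings[config] / timings[reference]


print(f"{slowdown(baseline, 'beta_shift', 'beta_only'):.1f}x")  # 12.2x
print(f"{slowdown(baseline, 'iterative_full', 'iterative_beta'):.1f}x")  # 14.1x
```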
- Precompute D_hat_n_ifft and D_hat_r_ifft outside optimization loops
- Replace repeated _ifftwn() calls with cached results scaled by 1/norm
- Apply optimization to all code paths: beta-only, shift-only, iterative

Performance improvements:
- beta_only: 22% faster (0.280s → 0.219s)
- iterative_beta: 17% faster (0.240s → 0.200s)
- beta_shift: 47% faster (3.409s → 1.801s)
- iterative_full: 49% faster (3.387s → 1.719s)

Shift configurations see the biggest gains (~2x faster) because they run more iterations and therefore recomputed the inverse FFTs more often. No new dependencies, fully backward compatible, mathematically equivalent.
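The caching relies on the linearity of the inverse FFT: for a scalar `norm`, `ifftn(D_hat / norm)` equals `ifftn(D_hat) / norm`, so the transform can be computed once and only rescaled inside the loop. The minimal NumPy sketch below assumes `norm` is a scalar that depends on beta (here, hypothetically, `sqrt(1 + beta**2)`); the random arrays and the two cost functions are stand-ins, not ProperImage's actual cost. If `norm` varied per frequency, dividing after the inverse transform would not be equivalent.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical stand-ins for the Fourier-space terms D_hat_n and D_hat_r
D_hat_n = np.fft.fftn(rng.normal(size=(64, 64)))
D_hat_r = np.fft.fftn(rng.normal(size=(64, 64)))


def cost_uncached(beta, norm):
    """Original pattern: two inverse FFTs on every optimizer call."""
    diff = np.fft.ifftn(D_hat_n / norm) - beta * np.fft.ifftn(D_hat_r / norm)
    return np.sum(np.abs(diff) ** 2)


# Cached pattern: inverse FFTs computed once, outside the optimization loop
D_hat_n_ifft = np.fft.ifftn(D_hat_n)
D_hat_r_ifft = np.fft.ifftn(D_hat_r)


def cost_cached(beta, norm):
    """Same cost: valid because ifftn(x / c) == ifftn(x) / c for scalar c."""
    diff = D_hat_n_ifft / norm - beta * D_hat_r_ifft / norm
    return np.sum(np.abs(diff) ** 2)


if __name__ == "__main__":
    for beta in np.linspace(0.5, 1.5, 5):
        norm = np.sqrt(1.0 + beta ** 2)  # hypothetical beta-dependent scalar
        print(beta, cost_uncached(beta, norm), cost_cached(beta, norm))
```

Each optimizer call now costs two array divisions instead of two full inverse FFTs, which is why the savings grow with the number of iterations.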
- Add comprehensive analysis of FFT caching performance gains
- Document 17-49% speedup across all configurations
- Explain optimization strategy and code changes
- Identify remaining bottlenecks (shift still 8x slower than beta-only)
- Recommend next optimization steps

This quick win provides a significant performance improvement with minimal code changes and no new dependencies.
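The 17-49% range and the remaining 8x gap follow directly from the before/after timings in the FFT caching commit. The dictionaries and the `percent_faster()` helper are illustrative; the numbers are copied from that commit message.

```python
# subtract() timings in seconds, before and after FFT caching
before = {"beta_only": 0.280, "iterative_beta": 0.240,
          "beta_shift": 3.409, "iterative_full": 3.387}
after = {"beta_only": 0.219, "iterative_beta": 0.200,
         "beta_shift": 1.801, "iterative_full": 1.719}


def percent_faster(old, new):
    """Relative reduction in run time, in percent."""
    return 100.0 * (old - new) / old


for name in before:
    print(f"{name}: {percent_faster(before[name], after[name]):.0f}% faster")

# Remaining gap between shift and beta-only after caching
print(f"{after['beta_shift'] / after['beta_only']:.1f}x")  # 8.2x
```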
Performed a performance analysis of subtract() and added FFT caching.
Used AI tools to drive the analysis.