-
Notifications
You must be signed in to change notification settings - Fork 6
Open
Labels
enhancementNew feature or requestNew feature or requestgood first issueGood for newcomersGood for newcomersperformancerust
Description
Problem
Currently, Durak only has Python-based benchmarks (benchmarks/benchmark_rust_vs_python.py). We lack native Rust benchmarks using the industry-standard criterion framework.
This creates several issues:
- No micro-level performance tracking for individual Rust functions
- Cannot detect performance regressions in CI (blocked: [CI/CD] Add Automated Performance Regression Detection #106)
- Cannot validate optimization claims from [Enhancement] Add Cargo Release Profile Optimizations for 10-20% Performance Gain #111 (Cargo release optimizations)
- No baseline metrics for vowel harmony ([Enhancement] Add Vowel Harmony Support to Rust Suffix Stripping #108), morphotactics, etc.
Proposal
Add criterion-based benchmarks for core Rust functions in benches/:
Setup
# Cargo.toml
[dev-dependencies]
criterion = { version = "0.5", features = ["html_reports"] }
[[bench]]
name = "core_benchmarks"
harness = falseBenchmark Suite Structure
benches/
├── core_benchmarks.rs # Main benchmark suite
└── README.md # How to run & interpret results
Functions to Benchmark
Priority 1 (Hot Path):
fast_normalize()- Turkish normalizationtokenize_with_offsets()- Regex tokenizationstrip_suffixes()- Naive suffix strippingstrip_suffixes_validated()- Validated stripping with harmony
Priority 2 (Algorithmic):
vowel_harmony::check_vowel_harmony()- Harmony validationmorphotactics::validate_sequence()- Morphotactic constraintsroot_validator::is_valid_root()- Root validity checks
Priority 3 (Resource Access):
get_lemma_dict()- Dictionary loading (should be ~0 overhead)lookup_lemma()- HashMap lookup
Example Benchmark
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use durak_core::{fast_normalize, tokenize_with_offsets};
fn bench_normalization(c: &mut Criterion) {
let text = "İSTANBUL'dan geliyorum! 🇹🇷";
c.bench_function("fast_normalize", |b| {
b.iter(|| fast_normalize(black_box(text)))
});
}
fn bench_tokenization(c: &mut Criterion) {
let text = "Merhaba dünya! Bu bir test. https://example.com";
c.bench_function("tokenize_with_offsets", |b| {
b.iter(|| tokenize_with_offsets(black_box(text)))
});
}
criterion_group!(benches, bench_normalization, bench_tokenization);
criterion_main!(benches);Usage
# Run benchmarks
cargo bench
# View HTML reports
open target/criterion/report/index.html
# Compare against baseline
cargo bench --baseline main
git checkout feature-branch
cargo bench --baseline main # Shows % changeBenefits
- Quantifiable performance data: "10% faster" instead of "feels faster"
- Regression detection: CI can fail if perf drops >5%
- Optimization validation: Prove [Enhancement] Add Cargo Release Profile Optimizations for 10-20% Performance Gain #111's 10-20% claim with data
- Community confidence: Production-grade tooling signals quality
Implementation Plan
- Add
criteriontoCargo.tomldev-dependencies - Create
benches/core_benchmarks.rswith Priority 1 functions - Run baseline benchmarks on
mainbranch - Document in
benches/README.md - (Future) Integrate with [CI/CD] Add Automated Performance Regression Detection #106 for CI tracking
References
- Criterion docs: https://bheisler.github.io/criterion.rs/book/
- Related: [CI/CD] Add Automated Performance Regression Detection #106 (CI performance tracking)
- Related: [Enhancement] Add Cargo Release Profile Optimizations for 10-20% Performance Gain #111 (Cargo optimizations to validate)
- Similar: https://github.com/BurntSushi/ripgrep/tree/master/benchmarks
Labels
enhancement, performance, rust, good first issue
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or requestgood first issueGood for newcomersGood for newcomersperformancerust