Skip to content

[Enhancement] Add Criterion Rust Benchmarks for Performance Tracking #113

@ada-cinar

Description

@ada-cinar

Problem

Currently, Durak only has Python-based benchmarks (benchmarks/benchmark_rust_vs_python.py). We lack native Rust benchmarks using the industry-standard criterion framework.

This creates several issues:

Proposal

Add criterion-based benchmarks for core Rust functions in benches/:

Setup

# Cargo.toml
[dev-dependencies]
criterion = { version = "0.5", features = ["html_reports"] }

[[bench]]
name = "core_benchmarks"
harness = false

Benchmark Suite Structure

benches/
├── core_benchmarks.rs    # Main benchmark suite
└── README.md             # How to run & interpret results

Functions to Benchmark

Priority 1 (Hot Path):

  • fast_normalize() - Turkish normalization
  • tokenize_with_offsets() - Regex tokenization
  • strip_suffixes() - Naive suffix stripping
  • strip_suffixes_validated() - Validated stripping with harmony

Priority 2 (Algorithmic):

  • vowel_harmony::check_vowel_harmony() - Harmony validation
  • morphotactics::validate_sequence() - Morphotactic constraints
  • root_validator::is_valid_root() - Root validity checks

Priority 3 (Resource Access):

  • get_lemma_dict() - Dictionary loading (should be ~0 overhead)
  • lookup_lemma() - HashMap lookup

Example Benchmark

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use durak_core::{fast_normalize, tokenize_with_offsets};

fn bench_normalization(c: &mut Criterion) {
    let text = "İSTANBUL'dan geliyorum! 🇹🇷";
    
    c.bench_function("fast_normalize", |b| {
        b.iter(|| fast_normalize(black_box(text)))
    });
}

fn bench_tokenization(c: &mut Criterion) {
    let text = "Merhaba dünya! Bu bir test. https://example.com";
    
    c.bench_function("tokenize_with_offsets", |b| {
        b.iter(|| tokenize_with_offsets(black_box(text)))
    });
}

criterion_group!(benches, bench_normalization, bench_tokenization);
criterion_main!(benches);

Usage

# Run benchmarks
cargo bench

# View HTML reports
open target/criterion/report/index.html

# Compare against baseline
cargo bench --baseline main
git checkout feature-branch
cargo bench --baseline main  # Shows % change

Benefits

Implementation Plan

  1. Add criterion to Cargo.toml dev-dependencies
  2. Create benches/core_benchmarks.rs with Priority 1 functions
  3. Run baseline benchmarks on main branch
  4. Document in benches/README.md
  5. (Future) Integrate with [CI/CD] Add Automated Performance Regression Detection #106 for CI tracking

References

Labels

enhancement, performance, rust, good first issue

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions