
aruneshvv (single thread) #234

Open
aruneshvv wants to merge 3 commits into tempestphp:main from aruneshvv:single-thread

Conversation

@aruneshvv

Summary

  • 🚂 Single-thread submission
  • Optimized single-core CSV parser that reads the input with fread in 8 MB buffers
  • Key optimizations: integer date keys (YYYYMMDD) for faster hash lookups, zero-copy leftover handling, 2x loop unrolling, inlined offset math
  • No multi-process logic: a pure single-thread solution
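
A minimal sketch of the buffered read loop described above, not the code in this PR. It assumes each line looks like `<url>,<YYYY-MM-DD>T<time>`; the function names `countVisits` and `recordLine` are hypothetical. For brevity it carries the partial last line over by plain string concatenation rather than the zero-copy handling mentioned in the summary, and it leaves out loop unrolling.

```php
<?php
declare(strict_types=1);

/** Count one line of the assumed form "<url>,<YYYY-MM-DD>T<time>". */
function recordLine(array &$counts, string $line): void
{
    $comma = strrpos($line, ',');
    if ($comma === false) {
        return; // blank or malformed line: skip it
    }
    $path = substr($line, 0, $comma);
    // "2026-01-24" becomes the integer key 20260124
    $dateKey = (int) str_replace('-', '', substr($line, $comma + 1, 10));
    $counts[$path][$dateKey] = ($counts[$path][$dateKey] ?? 0) + 1;
}

/** Read $file in chunks of $bufferSize bytes and count visits per path and day. */
function countVisits(string $file, int $bufferSize = 8 * 1024 * 1024): array
{
    $handle = fopen($file, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open {$file}");
    }

    $counts = [];
    $leftover = ''; // partial line carried over from the previous chunk

    while (($chunk = fread($handle, $bufferSize)) !== false && $chunk !== '') {
        $chunk = $leftover . $chunk;
        $lastNewline = strrpos($chunk, "\n");
        if ($lastNewline === false) {
            $leftover = $chunk; // no complete line yet, keep reading
            continue;
        }
        $leftover = substr($chunk, $lastNewline + 1);

        $pos = 0;
        while ($pos < $lastNewline) {
            $end = strpos($chunk, "\n", $pos); // always found: $pos < $lastNewline
            recordLine($counts, substr($chunk, $pos, $end - $pos));
            $pos = $end + 1;
        }
    }

    if ($leftover !== '') {
        recordLine($counts, $leftover); // final line without a trailing newline
    }
    fclose($handle);

    return $counts;
}
```

Calling `countVisits($path, 7)` with a deliberately tiny buffer is a quick way to exercise the leftover path, since almost every line then straddles a chunk boundary.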

Test plan

  • php tempest data:validate passes
  • Deterministic output (byte-identical to multi-thread solution)
  • Total visit count matches input line count
  • Dates sorted ascending within each path
  • First-appearance key order preserved
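
Two of the checks above (total count and date ordering) can be expressed as a small helper. This is a sketch only: `checkResult` is a hypothetical name, and it assumes the result is shaped as path => date => count with `YYYY-MM-DD` string dates, which the PR does not spell out.

```php
<?php
declare(strict_types=1);

/**
 * @param array<string, array<string, int>> $result path => date => count (assumed shape)
 * @return string[] human-readable problems; empty when both checks pass
 */
function checkResult(array $result, int $inputLineCount): array
{
    $problems = [];
    $total = 0;

    foreach ($result as $path => $dates) {
        $total += array_sum($dates);

        // ISO dates sort chronologically when compared as strings
        $keys = array_map('strval', array_keys($dates));
        $sorted = $keys;
        sort($sorted, SORT_STRING);
        if ($keys !== $sorted) {
            $problems[] = "dates not ascending for {$path}";
        }
    }

    if ($total !== $inputLineCount) {
        $problems[] = "counted {$total} visits, expected {$inputLineCount}";
    }

    return $problems;
}
```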

/bench

aruneshvv and others added 2 commits March 4, 2026 10:20
Multi-process architecture with 8 workers using pcntl_fork, each
parsing newline-aligned file chunks via fread 8MB buffers. Key
optimizations: integer date keys (YYYYMMDD) for 57% faster hash
lookups during merge, zero-copy leftover handling across buffers,
2x loop unrolling, reference-based merge, igbinary when available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

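
A sketch of how newline-aligned chunks for the workers in the commit above could be computed, under the assumption that each worker receives a `[start, end)` byte range. `chunkBoundaries` is a hypothetical helper, not taken from the commit, and the `pcntl_fork` part is left out so the example runs without the pcntl extension.

```php
<?php
declare(strict_types=1);

/**
 * Split $file into at most $workers byte ranges, each ending just after a newline.
 * @return array<int, array{0: int, 1: int}> list of [start, end) offsets
 */
function chunkBoundaries(string $file, int $workers): array
{
    clearstatcache(true, $file);
    $size = filesize($file);
    $handle = fopen($file, 'rb');
    if ($size === false || $handle === false) {
        throw new RuntimeException("Cannot read {$file}");
    }

    $ranges = [];
    $start = 0;
    for ($i = 1; $i <= $workers && $start < $size; $i++) {
        if ($i === $workers) {
            $end = $size; // last worker takes whatever remains
        } else {
            // Jump near the even split point, then read to the end of that line
            $target = max($start, intdiv($size * $i, $workers));
            fseek($handle, $target);
            fgets($handle);
            $end = ftell($handle);
        }
        $ranges[] = [$start, $end];
        $start = $end;
    }
    fclose($handle);

    return $ranges;
}
```

Each worker would then seek to its `start` and stop reading at `end`, so no line is split between two processes.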
Single-core solution with fread 8MB buffers, integer date keys
(YYYYMMDD) for faster hash lookups, zero-copy leftover handling,
2x loop unrolling, and inlined offset math. No pcntl_fork or
multi-process logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@brendt
Member

brendt commented Mar 4, 2026

Benchmarking complete! Mean execution time: 33.11506546938s

brendt removed the verified label Mar 4, 2026

Replace nested hash table ($data[$path][$date]++) with flat integer
array ($counts[slugBase + dateId]++) for O(1) packed array access.

Key optimizations:
- Pre-computed slug->ID and date->ID mappings from Visit::all()
- 8-char date keys (YY-MM-DD) for faster hash lookups
- Comma search with fixed 52-char jump (skip domain+timestamp)
- Eliminates per-path inner hash table overhead

Benchmarked 2x faster parsing (3.4s vs 7.5s on 10M rows).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
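
The flat-array layout from the commit above can be sketched as follows. The slug and date lists are passed in as plain arrays here because the return shape of `Visit::all()` is not shown in this PR; `buildIndex`, `record`, and `expand` are hypothetical names. Output paths follow the order of the slug list, which is a simplification compared with the first-appearance ordering in the test plan.

```php
<?php
declare(strict_types=1);

/**
 * Pre-compute slug => base offset and date => id, plus a zero-filled counter array.
 * @return array{0: array<string, int>, 1: array<string, int>, 2: int[]}
 */
function buildIndex(array $slugs, array $dates): array
{
    $dateCount = count($dates);
    $slugBase = [];
    foreach (array_values($slugs) as $i => $slug) {
        $slugBase[$slug] = $i * $dateCount; // each slug owns a run of $dateCount slots
    }
    $dateId = array_flip(array_values($dates));
    $counts = array_fill(0, count($slugs) * $dateCount, 0);

    return [$slugBase, $dateId, $counts];
}

/** One increment per visit: two lookups and a single flat array write. */
function record(array &$counts, array $slugBase, array $dateId, string $slug, string $date): void
{
    $counts[$slugBase[$slug] + $dateId[$date]]++;
}

/** Convert the flat counters back into slug => date => count, dropping zeros. */
function expand(array $counts, array $slugBase, array $dateId): array
{
    $result = [];
    foreach ($slugBase as $slug => $base) {
        foreach ($dateId as $date => $id) {
            $n = $counts[$base + $id];
            if ($n > 0) {
                $result[$slug][$date] = $n;
            }
        }
    }

    return $result;
}
```

Because every slot is allocated up front with consecutive integer keys, the counter array stays a packed list and no inner per-path array is ever created.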
@aruneshvv
Author

/bench

@brendt
Member

brendt commented Mar 4, 2026

Benchmarking complete! Mean execution time: 17.8358065843s

@brendt
Member

brendt commented Mar 4, 2026

Mean time? More like meme time. You're cooking. 🍳
🏆 leaderboard.csv

@brendt
Member

brendt commented Mar 4, 2026

You just made the CPU do less cardio. 🫀
🏆 leaderboard-single-thread.csv
