
aruneshvv#233

Open
aruneshvv wants to merge 3 commits into tempestphp:main from aruneshvv:main

Conversation

@aruneshvv

Summary

  • Multi-process parallel CSV parser using pcntl_fork with 8 workers
  • Key optimizations: integer date keys (YYYYMMDD) for faster hash lookups, zero-copy leftover handling, 2x loop unrolling, reference-based merge, igbinary serialization when available
  • Validated and deterministic output across runs
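
The integer date key optimization mentioned above can be sketched as follows (a minimal illustration, not the PR's actual code; the function name is hypothetical):

```php
<?php

// Sketch: turn "2026-03-04" into the integer 20260304. Integer keys
// hash faster than string keys in PHP arrays, which is the basis of
// the "integer date keys (YYYYMMDD)" optimization described above.
function dateToIntKey(string $date): int
{
    // "YYYY-MM-DD" -> "YYYYMMDD" -> int; assumes well-formed input.
    return (int) (substr($date, 0, 4) . substr($date, 5, 2) . substr($date, 8, 2));
}
```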

Test plan

  • php tempest data:validate passes
  • Deterministic output (3 runs produce byte-identical JSON)
  • Total visit count matches input line count (10M in = 10M out)
  • 268 unique paths correctly aggregated
  • Dates sorted ascending within each path
  • First-appearance key order preserved across parallel merge

/bench

Multi-process architecture with 8 workers using pcntl_fork, each
parsing newline-aligned file chunks via fread with 8 MB buffers. Key
optimizations: integer date keys (YYYYMMDD) for 57% faster hash
lookups during merge, zero-copy leftover handling across buffers,
2x loop unrolling, reference-based merge, and igbinary serialization
when available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
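
The fork/merge shape described above can be sketched roughly like this (a simplified illustration under assumed names; `parseChunk` is a hypothetical helper, and the real PR adds buffered reads, leftover handling, and igbinary on top of this skeleton):

```php
<?php

// Sketch: split the file into newline-aligned byte ranges, fork one
// worker per range, let each parse its chunk into per-key counts and
// write them to a temp file, then merge everything in the parent.

const WORKERS = 8;

$size = filesize($csvPath);
$step = intdiv($size, WORKERS);
$fh = fopen($csvPath, 'rb');

// Align each chunk boundary to the next newline so no row is split.
$bounds = [0];
for ($i = 1; $i < WORKERS; $i++) {
    fseek($fh, $i * $step);
    fgets($fh);                 // advance to the end of the current line
    $bounds[] = ftell($fh);
}
$bounds[] = $size;
fclose($fh);

$tmpFiles = [];
for ($i = 0; $i < WORKERS; $i++) {
    $tmpFiles[$i] = tempnam(sys_get_temp_dir(), 'agg');
    if (pcntl_fork() === 0) {   // child: parse [start, end) and exit
        $counts = parseChunk($csvPath, $bounds[$i], $bounds[$i + 1]); // assumed helper
        file_put_contents($tmpFiles[$i], serialize($counts));
        exit(0);
    }
}

// Parent: wait for all children, then merge their partial counts.
for ($i = 0; $i < WORKERS; $i++) {
    pcntl_wait($status);
}
$merged = [];
foreach ($tmpFiles as $file) {
    foreach (unserialize(file_get_contents($file)) as $key => $n) {
        $merged[$key] = ($merged[$key] ?? 0) + $n;
    }
    unlink($file);
}
```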
@brendt
Member

brendt commented Mar 4, 2026

Benchmarking complete! Mean execution time: 7.88241270524s

@brendt brendt removed the verified label Mar 4, 2026
aruneshvv and others added 2 commits March 4, 2026 16:06
Replace nested hash tables with flat integer array for O(1) packed
array access in each worker. Key changes:

- Pre-computed slug->ID and date->ID mappings from Visit::all()
- 8-char date keys (YY-MM-DD) for faster hash lookups
- Comma search with fixed 52-char jump
- Element-wise array addition merge (replaces nested hash merge)
- ~30% faster parsing per worker + simpler merge phase

Benchmarked: 1.4-2.0s on 10M rows (vs 2.5s before).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
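The element-wise merge over flat integer arrays that this commit describes can be sketched as (illustrative, not the commit's actual code):

```php
<?php

// Sketch: with pre-computed slug->ID and date->ID mappings, each
// worker indexes a flat array by $slugId * $dateCount + $dateId.
// Merging partial results is then plain element-wise addition
// instead of walking nested hash tables.
function mergeFlat(array $total, array $partial): array
{
    foreach ($partial as $idx => $count) {
        $total[$idx] += $count;   // packed array: O(1) index access
    }
    return $total;
}
```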
Children now serialize only non-zero count entries (~60K) instead of
full flat array (880K entries), reducing temp file size ~14x and
speeding up serialization, deserialization, and merge phases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
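Dropping zero-count entries before serialization, as this commit describes, can be sketched as (illustrative; the function name is hypothetical):

```php
<?php

// Sketch: keep only non-zero counts before serializing, shrinking a
// mostly-zero flat array (~880K slots) down to the ~60K entries that
// carry data, so the temp files and the merge loop stay small.
function sparse(array $flat): array
{
    return array_filter($flat, fn (int $n): bool => $n !== 0);
}

// Merging a sparse partial back in reuses the element-wise addition:
// foreach ($sparse as $idx => $n) { $total[$idx] = ($total[$idx] ?? 0) + $n; }
```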
@aruneshvv
Author

/bench

@brendt
Member

brendt commented Mar 4, 2026

Benchmarking complete! Mean execution time: 4.5404760714s

@brendt
Member

brendt commented Mar 4, 2026

Milliseconds were harmed in the making of this improvement. ⏱️
🏆 leaderboard.csv
