@matthew6s

Closes #2797

Summary

Adds an async engine using aiohttp as a drop-in replacement for the synchronous requests-futures ThreadPoolExecutor approach.
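
The core idea, many requests in flight on a single event loop with a cap on concurrency, can be sketched as below. This is a minimal illustration and not the code in async_engine.py: `check_sites` and its `fetch` parameter are hypothetical names, and the fetch coroutine is passed in so the pattern stays independent of the HTTP client.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Iterable


async def check_sites(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[Any]],  # hypothetical: any coroutine taking a URL
    max_workers: int = 100,
) -> Dict[str, Any]:
    """Run fetch(url) for every URL with at most max_workers in flight."""
    semaphore = asyncio.Semaphore(max_workers)

    async def bounded(url: str):
        # The semaphore caps how many fetches run concurrently.
        async with semaphore:
            try:
                return url, await fetch(url)
            except Exception as exc:
                # Store the failure as the value so one bad site does not abort the run.
                return url, exc

    pairs = await asyncio.gather(*(bounded(url) for url in urls))
    return dict(pairs)
```

With aiohttp, `fetch` would typically wrap `session.get(url)` on a shared `aiohttp.ClientSession` and return something like the response status. The default of 100 is chosen here only to match the `--workers` default described in this PR.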

Benchmark Results (real-world, 478 sites)

| Engine | Timeout | Time | Results Found |
| --- | --- | --- | --- |
| Sync (current) | 15s | 39.0s | 13 |
| Async (new) | 15s | 21.4s | 14 |
| Sync (current) | 60s | 65.8s | 13 |
| Async (new) | 60s | 63.6s | 14 |

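At a 15s timeout the async engine is ~1.8x faster (39.0s vs 21.4s); at a 60s timeout the two engines take nearly the same time (65.8s vs 63.6s). In both runs the async engine found one additional result that the sync engine missed (likely due to more efficient connection handling).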
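
The speedup implied by the reported timings can be checked directly. The snippet below is only arithmetic on the numbers in the benchmark table; the `speedup` helper is a hypothetical name used for illustration.

```python
# Wall-clock timings in seconds, copied from the benchmark table:
# timeout -> (sync time, async time)
timings = {15: (39.0, 21.4), 60: (65.8, 63.6)}


def speedup(sync_s: float, async_s: float) -> float:
    """Ratio of sync time to async time; values above 1 mean async is faster."""
    return sync_s / async_s


speedups = {timeout: round(speedup(*pair), 2) for timeout, pair in timings.items()}
print(speedups)  # {15: 1.82, 60: 1.03}
```

The gain is concentrated at the shorter timeout; at 60s the measured difference is about 3%.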

Changes

| File | Change |
| --- | --- |
| sherlock_project/async_engine.py | New: async engine module |
| sherlock_project/sherlock.py | Import + CLI flags + call routing |
| pyproject.toml | Add aiohttp dependency |

New CLI Flags

  • --workers N, -w N — max concurrent requests (default: 100)
  • --sync — use legacy synchronous engine
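
As a rough sketch of how these two flags could be declared, assuming an argparse-based parser (the `build_parser` function and help strings are hypothetical and not taken from the diff):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser containing only the two flags added by this PR."""
    parser = argparse.ArgumentParser(prog="sherlock")
    parser.add_argument(
        "--workers", "-w",
        type=int,
        default=100,
        metavar="N",
        help="max concurrent requests (default: 100)",
    )
    parser.add_argument(
        "--sync",
        action="store_true",
        help="use legacy synchronous engine",
    )
    return parser


args = build_parser().parse_args(["-w", "50"])
# Async is the default; --sync opts back into the legacy engine.
engine = "sync" if args.sync else "async"
```

Here `engine` evaluates to "async" because `--sync` was not passed, which matches the default behavior described under Backwards Compatibility.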

Backwards Compatibility

  • Default behavior switches to async
  • --sync preserves the existing behavior exactly
  • Return value is identical (same dict structure, same QueryResult objects)
  • All existing CLI flags work unchanged

New Dependency

  • aiohttp ^3.9.0

Development

Successfully merging this pull request may close these issues.

feat: async engine with aiohttp for 3-5x performance improvement
