__ __ _
\ \/ _\ ___ ___ _ _| |_
\ \ \ / __/ _ \| | | | __|
/\_/ /\ \ (_| (_) | |_| | |_
\___/\__/\___\___/ \__,_|\__| @CyInnove
Fast, scope-aware, headless crawling framework to extract Dynamic JS files.
```bash
# Install JSCOUT
go install github.com/cyinnove/jscout/cmd/jscout@latest

# Start crawling
jscout -u https://example.com -max-depth 1 -o -
```

| Feature | Description |
|---|---|
| 🕸️ Headless Browser | Chrome/Chromium powered crawling with chromedp |
| 🎯 Scoped BFS | Host suffix allow-list for targeted crawling |
| ⚡ Dynamic JS Extraction | Captures both static and dynamic JavaScript files |
| 📥 Flexible Input | URL, file list, or stdin with auto-normalization |
| 📊 Multiple Formats | txt, jsonl, csv output (unique txt by default) |
| 🔄 Concurrency Control | Configurable parallel crawling for speed |
| 🎛️ Smart Filtering | Optional JS-in-scope filtering |
| ⚙️ Customizable | User-Agent, timeouts, waits, and more |
| 🐳 Docker Ready | Pre-built image with Chromium included |
```bash
go install github.com/cyinnove/jscout/cmd/jscout@latest
```

Requirements: Go 1.22+ and Chrome/Chromium

```bash
git clone https://github.com/cyinnove/jscout
cd jscout
go build -o jscout ./cmd/jscout
```

Binary will be at `./jscout` (Linux/macOS) or `jscout.exe` (Windows).

```bash
go get github.com/cyinnove/jscout@latest
```

Example Usage:
```go
package main

import (
	"fmt"

	"github.com/cyinnove/jscout/lib"
)

func main() {
	// Start from the library defaults and set the seed URLs to crawl.
	opts := lib.DefaultOptions()
	opts.Seeds = []string{"https://example.com"}

	// Crawl returns the discovered JavaScript records.
	recs, err := lib.Crawl(opts)
	if err != nil {
		panic(err)
	}
	fmt.Printf("found %d JS files\n", len(recs))
}
```

Build locally:
```bash
docker build -t cyinnove/jscout:latest .
```

Run:

```bash
docker run --rm -it \
  --network host \
  cyinnove/jscout:latest -u https://example.com -max-depth 1 -o -
```

📝 Notes:
- Image includes `chromium`; Chrome sandbox is disabled via `JSCOUT_NO_SANDBOX=1` for container compatibility
- Use `-o -` to write results to stdout
- To read local files, mount volumes: `-v "$PWD:/data"` and `-l /data/seeds.txt`
```bash
jscout -u https://news.ycombinator.com -max-depth 0 -format txt -o -

# See all available flags
jscout --help
```

Depth + scope file + concurrency:

```bash
jscout -l seeds.txt \
  --scope-file scope.txt \
  --max-depth 2 --max-pages 500 \
  --concurrency 6 \
  -format jsonl -o results.jsonl
```

Stdin seeds:

```bash
cat domains.txt | jscout --stdin --scheme https -o -
```

Include third-party JS:

```bash
jscout -u https://example.com --js-in-scope=false -o -
```

Custom User-Agent and Chrome path:
```bash
jscout -u https://target.tld \
  --user-agent "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118 Safari/537.36" \
  --chrome-path /usr/bin/chromium-browser
```

| Flag | Description | Default |
|---|---|---|
| `-u` | Single seed (URL or host) | - |
| `-l` | File with seeds (one per line) | - |
| `--stdin` | Read seeds from STDIN | - |
| `--scheme` | Default scheme for host-only seeds | `https` |
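
The `--scheme` default for host-only seeds can be pictured with a small sketch. This is not JSCOUT's code: `normalizeSeed` is a hypothetical helper written for illustration, and the exact normalization rules JSCOUT applies are an assumption here.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeSeed is a hypothetical helper, not part of JSCOUT's API.
// It prepends defaultScheme to host-only seeds and leaves full URLs untouched.
func normalizeSeed(seed, defaultScheme string) (string, error) {
	seed = strings.TrimSpace(seed)
	if seed == "" {
		return "", fmt.Errorf("empty seed")
	}
	// A seed without "://" is treated as a bare host (assumption).
	if !strings.Contains(seed, "://") {
		seed = defaultScheme + "://" + seed
	}
	u, err := url.Parse(seed)
	if err != nil {
		return "", err
	}
	if u.Host == "" {
		return "", fmt.Errorf("seed %q has no host", seed)
	}
	return u.String(), nil
}

func main() {
	seeds := []string{"example.com", "http://example.org/app", "  sub.example.net  "}
	for _, s := range seeds {
		n, err := normalizeSeed(s, "https")
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Println(n)
	}
}
```

Seeds that already carry a scheme pass through unchanged, so a mixed list of hosts and URLs can share one input file.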

| Flag | Description | Default |
|---|---|---|
| `--scope` | Comma-separated allowed host suffixes | Seed hosts |
| `--scope-file` | File with allowed suffixes (one per line) | - |
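
To make "allowed host suffixes" concrete, here is a minimal sketch of a suffix allow-list check. `inScope` is a hypothetical function, and matching on label boundaries (so `notexample.com` does not match `example.com`) is an assumption; JSCOUT's actual matching rules may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// inScope is a hypothetical helper, not JSCOUT's implementation.
// It reports whether host equals an allowed suffix or is a subdomain of one.
func inScope(host string, suffixes []string) bool {
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	for _, s := range suffixes {
		s = strings.ToLower(strings.TrimPrefix(strings.TrimSpace(s), "."))
		if s == "" {
			continue
		}
		// Match the exact host, or a subdomain on a label boundary.
		if host == s || strings.HasSuffix(host, "."+s) {
			return true
		}
	}
	return false
}

func main() {
	scope := []string{"example.com", "cdn.example.net"}
	for _, h := range []string{"example.com", "static.example.com", "notexample.com", "example.net"} {
		fmt.Printf("%s in scope: %v\n", h, inScope(h, scope))
	}
}
```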

| Flag | Description | Default |
|---|---|---|
| `--max-depth` | Crawl depth from seeds | 1 |
| `--max-pages` | Limit pages (0 = unlimited) | 100 |
| `--concurrency` | Concurrent pages | 4 |
| `--wait` | Seconds after load for dynamic JS | 3 |
| `--page-timeout` | Per-page timeout in seconds | 30 |
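
How a depth limit and a page cap interact in a breadth-first crawl can be sketched over an in-memory link graph. This is an illustration only: `crawlOrder` and `item` are hypothetical names, the walk is sequential rather than concurrent, and treating depth 0 as "seeds only" is an assumption about `--max-depth`.

```go
package main

import "fmt"

// item pairs a URL with its distance from the seeds.
type item struct {
	url   string
	depth int
}

// crawlOrder is a hypothetical sketch, not JSCOUT's crawler.
// It visits URLs breadth-first, does not follow links beyond maxDepth,
// and stops after maxPages visits (0 means unlimited).
func crawlOrder(links map[string][]string, seeds []string, maxDepth, maxPages int) []string {
	seen := make(map[string]bool)
	var queue []item
	for _, s := range seeds {
		if !seen[s] {
			seen[s] = true
			queue = append(queue, item{s, 0})
		}
	}
	var visited []string
	for len(queue) > 0 {
		if maxPages > 0 && len(visited) >= maxPages {
			break
		}
		cur := queue[0]
		queue = queue[1:]
		visited = append(visited, cur.url)
		if cur.depth >= maxDepth {
			continue // at the depth limit: visit the page but do not follow its links
		}
		for _, next := range links[cur.url] {
			if !seen[next] {
				seen[next] = true
				queue = append(queue, item{next, cur.depth + 1})
			}
		}
	}
	return visited
}

func main() {
	links := map[string][]string{
		"/":  {"/a", "/b"},
		"/a": {"/c"},
		"/b": {"/c", "/"},
	}
	fmt.Println(crawlOrder(links, []string{"/"}, 1, 100)) // [/ /a /b]
	fmt.Println(crawlOrder(links, []string{"/"}, 2, 0))   // [/ /a /b /c]
	fmt.Println(crawlOrder(links, []string{"/"}, 2, 2))   // [/ /a]
}
```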

| Flag | Description | Default |
|---|---|---|
| `--headless` | Run headless | true |
| `--chrome-path` | Explicit Chrome/Chromium path | Auto-detect |
| `--user-agent` | Custom UA string | Default Chrome |

| Flag | Description | Default |
|---|---|---|
| `-o` | Output path or `-` for stdout | - |
| `--format` | Output format: `txt`, `jsonl`, or `csv` | txt |
| `--unique` | De-duplicate JS URLs in txt mode | true |
| `--js-in-scope` | Only output JS whose host matches scope | true |
| `--no-banner` | Disable the startup ASCII banner | false |
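
The effect of `--unique` on a list of JS URLs can be sketched as a first-seen de-duplication. `uniqueURLs` is a hypothetical helper; whether JSCOUT preserves discovery order or compares URLs byte-for-byte is an assumption.

```go
package main

import "fmt"

// uniqueURLs is a hypothetical helper, not JSCOUT's implementation.
// It drops repeated URLs while keeping the order of first appearance.
func uniqueURLs(urls []string) []string {
	seen := make(map[string]struct{}, len(urls))
	out := make([]string, 0, len(urls))
	for _, u := range urls {
		if _, ok := seen[u]; ok {
			continue
		}
		seen[u] = struct{}{}
		out = append(out, u)
	}
	return out
}

func main() {
	found := []string{
		"https://example.com/app.js",
		"https://example.com/vendor.js",
		"https://example.com/app.js",
	}
	for _, u := range uniqueURLs(found) {
		fmt.Println(u)
	}
}
```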
Logging uses `github.com/cyinnove/logify`. To adjust verbosity in code, set `logify.MaxLevel` early in `main`. A `--log-level` flag can be added on request.
- Linux: JSCOUT verifies Chrome/Chromium availability. If Chrome is not found and the session is interactive, it prompts for a path; otherwise it exits with an error and install hints.
- Docker: `JSCOUT_NO_SANDBOX=1` is set by default so that Chromium works as root. Unset it by overriding the environment if you run as a user that can use the sandbox.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
Made with ❤️ by @CyInnove