Playwright-powered crawler that visits a site from a start URL, accepts common cookie banners, forces lazy-loaded images to load, optionally stabilizes sticky headers, and saves full-page screenshots. Output is mirrored to a local folder by host and path.
- Node.js 18+ recommended
- npm (comes with Node)
- Install dependencies:
    npm install
- Install the Playwright browser binaries (Chromium):
    npx playwright install chromium
Basic run (headless, default options):
    node crawl-shoot.js https://example.com
Screenshots are written under shots/<host>/<path>/.../*.png by default.
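The default layout can be pictured as a function from page URL to file path. The sketch below is illustrative only: shotPathFor is a hypothetical helper, not a function taken from crawl-shoot.js. It assumes that the site root and paths ending in a slash are saved as index.png, and that unsafe characters in path segments are replaced with underscores.

```javascript
// Hypothetical helper (not part of crawl-shoot.js): maps a page URL to a
// screenshot path under the output root.
function shotPathFor(pageUrl, outDir = "shots") {
  const { hostname, pathname } = new URL(pageUrl);

  // Split the path into segments, drop empty ones, and replace characters
  // that are awkward in file names (assumption: underscores are acceptable).
  const segments = pathname
    .split("/")
    .filter(Boolean)
    .map((segment) => segment.replace(/[^a-zA-Z0-9._-]/g, "_"));

  // Assumption: the site root and "directory" URLs are stored as index.png.
  const isDirectory = pathname.endsWith("/") || segments.length === 0;
  const fileName = isDirectory ? "index" : segments.pop();

  // Joined with "/" for readability; a real script would likely use path.join.
  return [outDir, hostname, ...segments, `${fileName}.png`].join("/");
}

console.log(shotPathFor("https://example.com/products/item-123"));
```

Under these assumptions, https://example.com/products/ maps to shots/example.com/products/index.png.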
Pass flags as --name=value after the start URL:
- --limit (default: 20): Max pages to capture.
- --out (default: shots): Output directory root.
- --delay (default: 400): Milliseconds to wait between page visits.
- --ignoreQuery (default: true): Strip querystrings for de-duplication.
- --width (default: 1366): Viewport width in pixels.
- --forceStickyTop (default: true): Temporarily pin a likely header to the top for the shot.
- --stickySelector (default: empty): Comma-separated CSS selectors to explicitly target a header (overrides auto-detection), e.g. "header, .site-header".
- --stickyWaitMs (default: 400): Extra settle time in milliseconds after forcing the sticky header.
- --subtreeOnly (default: true): Only crawl within the start URL path subtree and origin.
- --normalizeGallery (default: true): Normalize common horizontal galleries so they don't add blank space.
- --headed (default: false): Run Chromium in headed mode (useful for debugging or sites that block headless).
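For readers who want to see how --name=value flags like these can be turned into typed options, here is a minimal sketch. The defaults are copied from the list above; parseArgs and DEFAULTS are hypothetical names, and the real script may parse its arguments differently.

```javascript
// Defaults as documented in the option list.
const DEFAULTS = {
  limit: 20,
  out: "shots",
  delay: 400,
  ignoreQuery: true,
  width: 1366,
  forceStickyTop: true,
  stickySelector: "",
  stickyWaitMs: 400,
  subtreeOnly: true,
  normalizeGallery: true,
  headed: false,
};

// Hypothetical parser: argv is everything after the script name,
// i.e. the start URL followed by --name=value flags.
function parseArgs(argv) {
  const [startUrl, ...flags] = argv;
  const options = { ...DEFAULTS };

  for (const flag of flags) {
    const match = /^--([^=]+)=(.*)$/.exec(flag);
    if (!match) continue; // skip anything not shaped like --name=value

    const [, name, raw] = match;
    if (!Object.hasOwn(DEFAULTS, name)) continue; // skip unknown flags

    // Coerce the raw string to the type of the documented default.
    const fallback = DEFAULTS[name];
    if (typeof fallback === "number") {
      const parsed = Number(raw);
      options[name] = Number.isFinite(parsed) ? parsed : fallback;
    } else if (typeof fallback === "boolean") {
      options[name] = raw === "true";
    } else {
      options[name] = raw;
    }
  }
  return { startUrl, options };
}

console.log(parseArgs(["https://example.com", "--limit=100", "--headed=true"]));
```

In a script this would typically be called as parseArgs(process.argv.slice(2)). Deriving each type from the default keeps the flag table and the parser in one place.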
- Crawl up to 100 pages, save to shots, 1440px width:
    node crawl-shoot.js https://example.com --limit=100 --width=1440
- Force a specific header selector and wait longer for sticky stabilization:
    node crawl-shoot.js https://example.com \
      --forceStickyTop=true \
      --stickySelector="header, .site-header" \
      --stickyWaitMs=800
- Run with UI (headed) and write to a custom folder:
    node crawl-shoot.js https://example.com --headed=true --out=shots/example
- Files are mirrored by host and path under the output directory. Example:
    shots/example.com/index.png
    shots/example.com/products/index.png
    shots/example.com/products/item-123.png
- If a site shows elements only after interaction, try --headed=true to observe behavior and adjust flags.
- If cookie banners block scrolling, they are auto-accepted when possible; re-run with --headed=true if a site uses a custom consent management platform (CMP).
- Keep --ignoreQuery=true to reduce duplicates when querystrings don’t change content.
- Set --subtreeOnly=false to crawl the entire origin instead of only the start path subtree.
- If Chromium is missing, run: npx playwright install chromium
- Some pages with heavy client rendering may need more time; increase --delay and/or --stickyWaitMs.
- If the detected header is wrong, supply an explicit --stickySelector.
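To make the effect of --ignoreQuery and --subtreeOnly more concrete, the following sketch shows one way a crawler could decide whether a discovered link is a duplicate and whether it is in scope. dedupeKey and inScope are hypothetical helpers written for illustration; they are assumptions about the behavior these flags describe, not code from crawl-shoot.js.

```javascript
// Hypothetical helper: canonical key used to detect already-visited pages.
function dedupeKey(rawUrl, ignoreQuery = true) {
  const url = new URL(rawUrl);
  url.hash = ""; // fragments do not change the page that is fetched
  if (ignoreQuery) url.search = ""; // mirrors --ignoreQuery=true
  return url.href;
}

// Hypothetical helper: is a discovered link inside the crawl scope?
function inScope(candidateUrl, startUrl, subtreeOnly = true) {
  const candidate = new URL(candidateUrl);
  const start = new URL(startUrl);

  // Assumption: the crawl never leaves the start origin.
  if (candidate.origin !== start.origin) return false;
  if (!subtreeOnly) return true; // mirrors --subtreeOnly=false

  // Treat the start path as a directory prefix so /docs does not match /docs-old.
  const base = start.pathname.endsWith("/")
    ? start.pathname
    : `${start.pathname}/`;
  return (
    candidate.pathname === start.pathname ||
    candidate.pathname.startsWith(base)
  );
}

// Example: two links that differ only by query string share one key.
const seen = new Set();
for (const link of ["https://example.com/docs/a?ref=1", "https://example.com/docs/a?ref=2"]) {
  if (inScope(link, "https://example.com/docs")) seen.add(dedupeKey(link));
}
console.log(seen.size);
```

With --ignoreQuery=false the two links above would be kept as separate pages, which is the case to prefer when the querystring selects different content.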