Vizioz/Site-Shooter

Site-Shooter

Playwright-powered crawler that visits a site from a start URL, accepts common cookie banners, forces lazy-loaded images to load, optionally stabilizes sticky headers, and saves full-page screenshots. Output is mirrored to a local folder by host and path.

Prerequisites

  • Node.js 18+ recommended
  • npm (comes with Node)

Setup

  1. Install dependencies:

    npm install
  2. Install the Playwright browser binaries (Chromium):

    npx playwright install chromium

Usage

Basic run (headless, default options):

node crawl-shoot.js https://example.com

Screenshots are written under shots/<host>/<path>/.../*.png by default.

Options

Pass flags as --name=value after the start URL:

  • --limit (default: 20): Max pages to capture.
  • --out (default: shots): Output directory root.
  • --delay (default: 400): ms to wait between page visits.
  • --ignoreQuery (default: true): Strip querystrings for de-duplication.
  • --width (default: 1366): Viewport width in pixels.
  • --forceStickyTop (default: true): Temporarily pin a likely header to the top for the shot.
  • --stickySelector (default: empty): Comma-separated CSS selectors to explicitly target a header (overrides auto-detection), e.g. "header, .site-header".
  • --stickyWaitMs (default: 400): Extra settle time after forcing sticky header.
  • --subtreeOnly (default: true): Only crawl within the start URL path subtree and origin.
  • --normalizeGallery (default: true): Normalize common horizontal galleries so they don't add blank space.
  • --headed (default: false): Run Chromium in headed mode (useful for debugging or sites that block headless).
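
The `--name=value` convention can be parsed in a few lines. The following is a minimal sketch, not the actual code in crawl-shoot.js: the `parseFlags` function and the `DEFAULTS` object are hypothetical names, with default values copied from the list above, and ignoring unknown flags is an assumption of this sketch.

```javascript
// Hypothetical defaults, copied from the option list above.
const DEFAULTS = {
  limit: 20,
  out: "shots",
  delay: 400,
  ignoreQuery: true,
  width: 1366,
  forceStickyTop: true,
  stickySelector: "",
  stickyWaitMs: 400,
  subtreeOnly: true,
  normalizeGallery: true,
  headed: false,
};

// Parse arguments such as ["--limit=100", "--headed=true"] into an options
// object. Each value is coerced to the type of its default.
function parseFlags(argv, defaults = DEFAULTS) {
  const opts = { ...defaults };
  for (const arg of argv) {
    const match = /^--([^=]+)=(.*)$/.exec(arg);
    if (!match) continue; // not a --name=value flag
    const [, name, raw] = match;
    if (!Object.hasOwn(defaults, name)) continue; // unknown flag: ignored in this sketch
    const kind = typeof defaults[name];
    if (kind === "number") {
      const n = Number(raw);
      if (Number.isFinite(n)) opts[name] = n; // keep the default on bad input
    } else if (kind === "boolean") {
      opts[name] = raw === "true";
    } else {
      opts[name] = raw;
    }
  }
  return opts;
}

// Example: everything after the start URL is treated as flags.
const [, ...flags] = ["https://example.com", "--limit=100", "--width=1440"];
console.log(parseFlags(flags));
```

Coercing by the type of the default keeps numeric options such as `--limit` from ending up as strings, and treating only the literal `true` as true means `--headed=false` behaves as expected.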

Examples

  • Crawl up to 100 pages, save to shots, 1440px width:

    node crawl-shoot.js https://example.com --limit=100 --width=1440
  • Force a specific header selector and wait longer for sticky stabilization:

    node crawl-shoot.js https://example.com \
      --forceStickyTop=true \
      --stickySelector="header, .site-header" \
      --stickyWaitMs=800
  • Run with UI (headed) and write to a custom folder:

    node crawl-shoot.js https://example.com --headed=true --out=shots/example

Output structure

  • Files are mirrored by host and path under the output directory. Example:
    • shots/example.com/index.png
    • shots/example.com/products/index.png
    • shots/example.com/products/item-123.png
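
The mapping from URL to file path shown above can be sketched as follows. This is an illustration under assumptions, not the tool's actual implementation: `shotPathFor` is a hypothetical helper, and the rules that a trailing slash maps to `index.png` and that unusual characters are replaced with `_` are inferred from the examples rather than confirmed by the source.

```javascript
// Hypothetical helper: map a page URL to a screenshot path under outRoot.
function shotPathFor(pageUrl, outRoot = "shots") {
  const url = new URL(pageUrl);
  // Split the path into segments, dropping empty ones from leading or doubled slashes.
  const segments = url.pathname.split("/").filter(Boolean);
  // A directory-style URL (trailing slash, or the site root) becomes ".../index".
  if (segments.length === 0 || url.pathname.endsWith("/")) {
    segments.push("index");
  }
  // Assumption: replace characters that are awkward in file names.
  const safe = segments.map((s) => s.replace(/[^A-Za-z0-9._-]/g, "_"));
  // Forward slashes are used for simplicity; Node accepts them on Windows too.
  return [outRoot, url.hostname, ...safe].join("/") + ".png";
}

console.log(shotPathFor("https://example.com/"));
console.log(shotPathFor("https://example.com/products/"));
console.log(shotPathFor("https://example.com/products/item-123"));
```

With these assumptions the three calls reproduce the three example paths listed above.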

Tips

  • If a site shows elements only after interaction, try --headed=true to observe behavior and adjust flags.
  • Cookie banners that block scrolling are auto-accepted when possible; if a site uses a custom consent management platform (CMP), re-run with --headed=true.
  • Keep --ignoreQuery=true to reduce duplicates when querystrings don’t change content.
  • Set --subtreeOnly=false to crawl the entire origin instead of only the start path subtree.
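
To make the effect of --ignoreQuery and --subtreeOnly concrete, here is a minimal sketch of how such checks are commonly written. It assumes nothing about the internals of crawl-shoot.js: `dedupeKey` and `inSubtree` are hypothetical helpers, and dropping the URL fragment is an extra assumption of this sketch.

```javascript
// Hypothetical helper: canonical key used to detect already-visited pages.
function dedupeKey(pageUrl, ignoreQuery = true) {
  const url = new URL(pageUrl);
  url.hash = ""; // assumption: fragments do not identify separate pages
  if (ignoreQuery) url.search = ""; // "?page=2" and "?page=3" collapse to one key
  return url.href;
}

// Hypothetical helper: is the candidate on the same origin and inside the
// path subtree of the start URL?
function inSubtree(candidateUrl, startUrl) {
  const candidate = new URL(candidateUrl);
  const start = new URL(startUrl);
  if (candidate.origin !== start.origin) return false;
  // Compare on directory boundaries so "/docs" covers "/docs/a" but not "/docs-old".
  const base = start.pathname.endsWith("/") ? start.pathname : start.pathname + "/";
  return (candidate.pathname + "/").startsWith(base);
}

const seen = new Set();
for (const link of [
  "https://example.com/docs/a?ref=nav",
  "https://example.com/docs/a?ref=footer",
  "https://example.com/docs-old/b",
  "https://other.example.org/docs/a",
]) {
  const key = dedupeKey(link);
  if (inSubtree(link, "https://example.com/docs") && !seen.has(key)) {
    seen.add(key);
    console.log("would visit:", key);
  }
}
```

In this sketch only the first link is visited: the second is a query-string duplicate, the third falls outside the start path, and the fourth is on a different origin.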

Troubleshooting

  • If Chromium is missing, run: npx playwright install chromium.
  • Some pages with heavy client rendering may need more time; increase --delay and/or --stickyWaitMs.
  • If the detected header is wrong, supply an explicit --stickySelector.
