Draft
Changes from 2 commits
26 changes: 26 additions & 0 deletions .github/actions/hydrate-localize/action.yml
@@ -0,0 +1,26 @@
name: 'Hydrate Localize'
description: 'Crawl a site with Chrome to trigger Localize phrase detection'
inputs:
  target_base_url:
    description: 'Base URL of the docs site to crawl'
    required: true
  sitemap_path:
    description: 'Path to the sitemap file relative to the base URL'
    required: false
    default: 'sitemap.xml'
runs:
  using: "composite"
  steps:
    - name: Install dependencies
      run: cd ${{ github.action_path }} && npm ci
      shell: bash

    - name: Set up headless Chrome
      run: cd ${{ github.action_path }} && npx playwright install --with-deps chromium
      shell: bash

    - name: Run crawler script
      run: node ${{ github.action_path }}/index.js ${{ inputs.target_base_url }} ${{ inputs.sitemap_path }}
      shell: bash
      env:
        CRAWLEE_MEMORY_MBYTES: 8000
31 changes: 31 additions & 0 deletions .github/actions/hydrate-localize/index.js
@@ -0,0 +1,31 @@
import { PlaywrightCrawler, Sitemap } from 'crawlee';

const targetHost = process.argv[2];
const sitemapPath = process.argv[3] || 'sitemap.xml';

const crawler = new PlaywrightCrawler({
  async requestHandler({ request, page, enqueueLinks }) {
    const requestHost = 'https://global.localizecdn.com';
    const requestMethod = 'POST';
    if (request.loadedUrl.indexOf('api-catalog') >= 0) {
Member

@copilot api-catalog would be a variable that takes multiple path prefixes to apply this logic to

Contributor Author

Done in feb4090. Added a skip_path_prefixes input (comma-separated, defaults to api-catalog) to action.yml. In index.js it is parsed into an array and checked with pathname.startsWith(`/${prefix}`); leading slashes in user-supplied prefixes are stripped to avoid double-slash matches.
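
For readers without access to that commit, here is a minimal sketch of the parsing and matching described in the reply. It is not the code from feb4090: the helper names `parseSkipPrefixes` and `shouldSkipLocalizeWait` and the `docs.example.com` URLs are hypothetical, and only the behaviour stated above (comma-separated input, `api-catalog` default, leading slashes stripped, `startsWith` check on the pathname) is taken from the thread.

```javascript
// Hypothetical helpers; names are illustrative, not taken from commit feb4090.

// Parse a comma-separated input such as "api-catalog, /reference" into
// prefixes with surrounding whitespace and leading slashes removed.
function parseSkipPrefixes(raw = 'api-catalog') {
  return raw
    .split(',')
    .map((prefix) => prefix.trim().replace(/^\/+/, ''))
    .filter((prefix) => prefix.length > 0);
}

// Return true when the URL's pathname starts with any configured prefix.
function shouldSkipLocalizeWait(url, prefixes) {
  const { pathname } = new URL(url);
  return prefixes.some((prefix) => pathname.startsWith(`/${prefix}`));
}

const prefixes = parseSkipPrefixes('api-catalog, /reference');
console.log(prefixes); // [ 'api-catalog', 'reference' ]
console.log(shouldSkipLocalizeWait('https://docs.example.com/api-catalog/users', prefixes)); // true
console.log(shouldSkipLocalizeWait('https://docs.example.com/guides/api-catalog', prefixes)); // false
```

Unlike the original `indexOf('api-catalog')` check, a pathname prefix match ignores occurrences deeper in the path. A plain `startsWith` would still match a sibling path such as `/api-catalog-v2`; whether that matters depends on the site's URL layout.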

      // Localize is currently disabled on API catalog pages, so we
      // can't wait for a request on those pages
      console.log(`Not waiting for Localize on ${request.loadedUrl}`);
    } else {
      await page.waitForRequest(
        req => new URL(req.url()).origin === requestHost && req.method() === requestMethod
      );
      console.log(`Done waiting for ${requestMethod} request to ${requestHost} on ${request.loadedUrl}`);
    }
  }
});

const { urls } = await Sitemap.load(`${targetHost}/${sitemapPath}`);

await crawler.addRequests(urls.map((url) => {
  const { pathname, search, hash } = new URL(url);
  return `${targetHost}${pathname}${search}${hash}`;
}));

// Run the crawler
await crawler.run();
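
The mapping passed to `crawler.addRequests` rewrites every sitemap URL onto `targetHost`, so a sitemap that lists production URLs can still drive a crawl of a preview or staging deployment. The sketch below isolates that step. `rebaseUrl` and the example hosts are hypothetical, and the trailing-slash trim is an added assumption that the script in this diff does not make.

```javascript
// Hypothetical helper; mirrors the mapping passed to crawler.addRequests.
function rebaseUrl(url, targetHost) {
  // Assumption, not in the diff: tolerate a trailing slash on targetHost
  // so the result never contains a double slash after the host.
  const base = targetHost.replace(/\/+$/, '');
  const { pathname, search, hash } = new URL(url);
  return `${base}${pathname}${search}${hash}`;
}

console.log(rebaseUrl(
  'https://docs.example.com/guides/intro?lang=en#setup',
  'https://staging.example.com/'
));
// https://staging.example.com/guides/intro?lang=en#setup
```

Without the trim, a `target_base_url` ending in `/` would yield `https://staging.example.com//guides/intro`, which some servers treat as a different path.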