Claude/fix aggregator 12 sources gd8 hs#27
// Tracks clicks sent FROM AI-Pulse TO external sites
function addUTMParams(url, category = 'general') {
  // Use Freedium mirror for Medium articles to bypass paywall
  if (url.includes('medium.com') || url.includes('towardsdatascience.com')) {
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High
Copilot Autofix
AI about 1 month ago
In general, instead of testing url.includes('medium.com'), the code should parse the URL and inspect its hostname (and possibly pathname) to decide whether it is actually on medium.com or towardsdatascience.com. This avoids matching attackers’ URLs that merely contain those strings in their query/path or as part of a different domain name.
The best targeted fix here is: in addUTMParams, parse the input URL via the standard URL constructor. Extract hostname, normalize it to lowercase, and check whether it is exactly medium.com or towardsdatascience.com or a subdomain of them (if desired). Only when that host check passes should we rewrite the URL to https://freedium.cloud/<original URL>. If parsing fails (invalid URL), we should fall back to the original url and just append UTM parameters. This preserves existing functionality (rewriting real Medium/TDS links and adding UTM parameters to all links) but closes the substring issue.
Concretely: modify lines 34–41 in src/aggregator.js. Introduce a small try { ... } catch block that uses new URL(url) to obtain hostname, compares it against medium.com and towardsdatascience.com with equality or endsWith('.medium.com') / endsWith('.towardsdatascience.com'). No new imports are required, since URL is a built-in global in modern Node.js. The rest of the function (building and appending utmParams) stays unchanged.
@@ -33,8 +33,19 @@
 // Tracks clicks sent FROM AI-Pulse TO external sites
 function addUTMParams(url, category = 'general') {
   // Use Freedium mirror for Medium articles to bypass paywall
-  if (url.includes('medium.com') || url.includes('towardsdatascience.com')) {
-    url = `https://freedium.cloud/${url}`;
+  try {
+    const parsed = new URL(url);
+    const host = parsed.hostname.toLowerCase();
+    const isMedium =
+      host === 'medium.com' || host.endsWith('.medium.com');
+    const isTowardsDataScience =
+      host === 'towardsdatascience.com' || host.endsWith('.towardsdatascience.com');
+
+    if (isMedium || isTowardsDataScience) {
+      url = `https://freedium.cloud/${url}`;
+    }
+  } catch (e) {
+    // If URL parsing fails, skip Freedium rewrite and just append UTM params
   }

   const utmParams = `utm_source=ai-pulse&utm_medium=reader&utm_campaign=article&utm_content=${category}`;
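The suffix-matching approach described in this suggestion can be exercised as a standalone function. This is a minimal sketch rather than the repository's actual code: the helper name `isMediumHost` and the way the UTM string is joined with `?` or `&` are assumptions, since the diff does not show the rest of `addUTMParams`.

```javascript
// Sketch only: isMediumHost and the '?'/'&' joining logic are assumptions;
// the diff does not show how the real function appends utmParams.
function isMediumHost(host) {
  return ['medium.com', 'towardsdatascience.com'].some(
    (domain) => host === domain || host.endsWith(`.${domain}`)
  );
}

function addUTMParams(url, category = 'general') {
  try {
    // Compare the parsed hostname, never the raw URL string.
    const host = new URL(url).hostname.toLowerCase();
    if (isMediumHost(host)) {
      url = `https://freedium.cloud/${url}`;
    }
  } catch (e) {
    // Invalid URL: skip the Freedium rewrite and only append UTM params.
  }
  const utmParams = `utm_source=ai-pulse&utm_medium=reader&utm_campaign=article&utm_content=${category}`;
  const separator = url.includes('?') ? '&' : '?'; // assumed joining rule
  return `${url}${separator}${utmParams}`;
}

console.log(addUTMParams('https://medium.com/@user/post'));
console.log(addUTMParams('https://evil-medium.com/post'));
console.log(addUTMParams('not a url'));
```

Only the first call is routed through Freedium. The second has a hostname that merely ends in `-medium.com`, and the third fails to parse, so both keep their original form and just receive the UTM parameters.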
// Tracks clicks sent FROM AI-Pulse TO external sites
function addUTMParams(url, category = 'general') {
  // Use Freedium mirror for Medium articles to bypass paywall
  if (url.includes('medium.com') || url.includes('towardsdatascience.com')) {
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High
Copilot Autofix
AI about 1 month ago
In general, to fix incomplete URL substring sanitization, you should parse the URL using a proper URL parser, extract the hostname, normalize it, and compare that hostname (or a well‑defined pattern, such as exact matches or specific subdomains) against an allowlist. Avoid using includes on the raw URL string.
For this specific code, the intent is to detect Medium and Towards Data Science articles and route them via Freedium. The safest approach is:
- Parse the URL with the global URL constructor (built into Node.js),
- Extract hostname,
- Check hostname === 'medium.com' or hostname === 'towardsdatascience.com' (and optionally known subdomains),
- Only then rewrite to https://freedium.cloud/${url}.
We should wrap the parsing in a try/catch so that malformed URLs do not crash the function; on failure, just skip the Freedium rewrite and proceed with appending UTM parameters. This preserves existing behavior for valid Medium/TDS URLs while preventing arbitrary hosts that merely contain those substrings from being rewritten.
Concretely, in src/aggregator.js inside addUTMParams, replace the if (url.includes(...)) block with logic that:
- Uses new URL(url) to parse,
- Compares parsed.hostname against a small allowlist such as ['medium.com', 'www.medium.com', 'towardsdatascience.com', 'www.towardsdatascience.com'],
- Only rewrites when the hostname matches.
No external dependencies are required; URL is part of the standard library in modern Node.js.
@@ -33,8 +33,21 @@
 // Tracks clicks sent FROM AI-Pulse TO external sites
 function addUTMParams(url, category = 'general') {
   // Use Freedium mirror for Medium articles to bypass paywall
-  if (url.includes('medium.com') || url.includes('towardsdatascience.com')) {
-    url = `https://freedium.cloud/${url}`;
+  try {
+    const parsedUrl = new URL(url);
+    const hostname = parsedUrl.hostname.toLowerCase();
+    const freediumHosts = new Set([
+      'medium.com',
+      'www.medium.com',
+      'towardsdatascience.com',
+      'www.towardsdatascience.com'
+    ]);
+
+    if (freediumHosts.has(hostname)) {
+      url = `https://freedium.cloud/${url}`;
+    }
+  } catch (e) {
+    // If the URL is invalid, skip Freedium rewriting and just add UTM parameters below.
   }

   const utmParams = `utm_source=ai-pulse&utm_medium=reader&utm_campaign=article&utm_content=${category}`;
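The exact-match allowlist in this second suggestion is stricter than the suffix check in the first one: any subdomain not listed in the set is left alone. The sketch below compares the two under that reading. The sample URLs and the helper names `exactMatch`, `suffixMatch`, and `compareChecks` are illustrative assumptions, not code from the repository.

```javascript
// Sketch comparing the two host checks proposed in the autofix suggestions.
// Sample hostnames and helper names are illustrative, not from the repository.
const freediumHosts = new Set([
  'medium.com',
  'www.medium.com',
  'towardsdatascience.com',
  'www.towardsdatascience.com',
]);

// Exact allowlist: only the listed hostnames qualify.
const exactMatch = (host) => freediumHosts.has(host);

// Suffix match: the bare domain or any subdomain of it qualifies.
const suffixMatch = (host) =>
  ['medium.com', 'towardsdatascience.com'].some(
    (d) => host === d || host.endsWith(`.${d}`)
  );

function compareChecks(rawUrl) {
  const host = new URL(rawUrl).hostname.toLowerCase();
  return { host, exact: exactMatch(host), suffix: suffixMatch(host) };
}

for (const u of [
  'https://medium.com/p/abc',
  'https://someuser.medium.com/p/abc',
  'https://medium.com.evil.example/p/abc',
]) {
  console.log(compareChecks(u));
}
```

Both checks reject the look-alike host `medium.com.evil.example`. They differ only on the unlisted subdomain, so the choice depends on whether subdomain-hosted articles should also be routed through the mirror.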
…alert #27)
Replace unsafe string matching with proper URL parsing:
- Parse URL hostname before checking for medium.com
- Prevents bypasses like http://evil-medium.com or http://evil.net/medium.com
- Add freedium.app as fallback mirror for Medium articles
Fixes: CWE-20 (Incomplete URL substring sanitization)
https://claude.ai/code/session_0138bAjho1fWwiRZju3nJFJ3
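The two bypass URLs named in the commit message can be checked directly. The following is a small sketch contrasting the flagged substring test with a hostname test. The helper names `oldCheck` and `newCheck` are hypothetical, and the fallback to freedium.app is not modelled because the commit message does not say how it is implemented.

```javascript
// oldCheck mirrors the flagged condition; newCheck is a hypothetical
// hostname-based replacement (both names are illustrative only).
function oldCheck(url) {
  return url.includes('medium.com') || url.includes('towardsdatascience.com');
}

function newCheck(url) {
  let host;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch (e) {
    return false; // unparseable input is never treated as a Medium link
  }
  return ['medium.com', 'towardsdatascience.com'].some(
    (d) => host === d || host.endsWith(`.${d}`)
  );
}

const samples = [
  'https://medium.com/some-article',
  'http://evil-medium.com',
  'http://evil.net/medium.com',
];
for (const url of samples) {
  console.log(url, { old: oldCheck(url), new: newCheck(url) });
}
```

The substring test accepts all three URLs, while the hostname test accepts only the genuine medium.com link.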
No description provided.