Claude/fix aggregator 12 sources gd8 hs#27

Merged
EthanThePhoenix38 merged 2 commits into main from claude/fix-aggregator-12-sources-Gd8HS
Jan 15, 2026

Conversation

@EthanThePhoenix38
Member

No description provided.

Copilot AI review requested due to automatic review settings January 15, 2026 14:19
@github-actions github-actions bot added documentation Improvements or additions to documentation backend labels Jan 15, 2026
@EthanThePhoenix38 EthanThePhoenix38 merged commit 3f3c5e6 into main Jan 15, 2026
6 checks passed
@EthanThePhoenix38 EthanThePhoenix38 deleted the claude/fix-aggregator-12-sources-Gd8HS branch January 15, 2026 14:19
// Tracks clicks sent FROM AI-Pulse TO external sites
function addUTMParams(url, category = 'general') {
// Use Freedium mirror for Medium articles to bypass paywall
if (url.includes('medium.com') || url.includes('towardsdatascience.com')) {

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

'medium.com' can be anywhere in the URL, and arbitrary hosts may come before or after it.

Copilot Autofix

AI about 1 month ago

In general, instead of testing url.includes('medium.com'), the code should parse the URL and inspect its hostname (and possibly pathname) to decide whether it is actually on medium.com or towardsdatascience.com. This avoids matching attackers’ URLs that merely contain those strings in their query/path or as part of a different domain name.

The best targeted fix here is: in addUTMParams, parse the input URL via the standard URL constructor. Extract hostname, normalize it to lowercase, and check whether it is exactly medium.com or towardsdatascience.com or a subdomain of them (if desired). Only when that host check passes should we rewrite the URL to https://freedium.cloud/<original URL>. If parsing fails (invalid URL), we should fall back to the original url and just append UTM parameters. This preserves existing functionality (rewriting real Medium/TDS links and adding UTM parameters to all links) but closes the substring issue.

Concretely: modify lines 34–41 in src/aggregator.js. Introduce a small try { ... } catch block that uses new URL(url) to obtain hostname, compares it against medium.com and towardsdatascience.com with equality or endsWith('.medium.com') / endsWith('.towardsdatascience.com'). No new imports are required, since URL is a built-in global in modern Node.js. The rest of the function (building and appending utmParams) stays unchanged.
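
To make the difference concrete, here is a minimal sketch that runs both checks against a few sample URLs. The helper names `matchesBySubstring` and `matchesByHostname` are hypothetical and exist only for this illustration; the spoofed URLs are the ones named in the follow-up commit message, plus one assumed query-string case.

```javascript
// Hypothetical helpers for illustration; only the includes() test is taken
// from the original addUTMParams code.
function matchesBySubstring(url) {
  return url.includes('medium.com') || url.includes('towardsdatascience.com');
}

function matchesByHostname(url) {
  try {
    // Parse first, then compare the hostname only.
    const host = new URL(url).hostname.toLowerCase();
    return (
      host === 'medium.com' || host.endsWith('.medium.com') ||
      host === 'towardsdatascience.com' || host.endsWith('.towardsdatascience.com')
    );
  } catch (e) {
    // Unparseable input is treated as "not a Medium/TDS link".
    return false;
  }
}

const samples = [
  'https://medium.com/some-article',                  // genuine host
  'http://evil-medium.com/',                          // look-alike domain
  'http://evil.net/medium.com',                       // string appears in the path
  'https://example.org/?next=towardsdatascience.com', // string appears in the query
];

for (const sample of samples) {
  console.log(sample, {
    substring: matchesBySubstring(sample),
    hostname: matchesByHostname(sample),
  });
}
```

The substring check accepts all four samples, while the hostname check accepts only the first.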

Suggested changeset 1
src/aggregator.js

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/aggregator.js b/src/aggregator.js
--- a/src/aggregator.js
+++ b/src/aggregator.js
@@ -33,8 +33,19 @@
 // Tracks clicks sent FROM AI-Pulse TO external sites
 function addUTMParams(url, category = 'general') {
   // Use Freedium mirror for Medium articles to bypass paywall
-  if (url.includes('medium.com') || url.includes('towardsdatascience.com')) {
-    url = `https://freedium.cloud/${url}`;
+  try {
+    const parsed = new URL(url);
+    const host = parsed.hostname.toLowerCase();
+    const isMedium =
+      host === 'medium.com' || host.endsWith('.medium.com');
+    const isTowardsDataScience =
+      host === 'towardsdatascience.com' || host.endsWith('.towardsdatascience.com');
+
+    if (isMedium || isTowardsDataScience) {
+      url = `https://freedium.cloud/${url}`;
+    }
+  } catch (e) {
+    // If URL parsing fails, skip Freedium rewrite and just append UTM params
   }
 
   const utmParams = `utm_source=ai-pulse&utm_medium=reader&utm_campaign=article&utm_content=${category}`;
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
// Tracks clicks sent FROM AI-Pulse TO external sites
function addUTMParams(url, category = 'general') {
// Use Freedium mirror for Medium articles to bypass paywall
if (url.includes('medium.com') || url.includes('towardsdatascience.com')) {

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

'towardsdatascience.com' can be anywhere in the URL, and arbitrary hosts may come before or after it.

Copilot Autofix

AI about 1 month ago

In general, to fix incomplete URL substring sanitization, you should parse the URL using a proper URL parser, extract the hostname, normalize it, and compare that hostname (or a well‑defined pattern, such as exact matches or specific subdomains) against an allowlist. Avoid using includes on the raw URL string.

For this specific code, the intent is to detect Medium and Towards Data Science articles and route them via Freedium. The safest approach is:

  1. Parse the URL with the global URL constructor (built into Node.js),
  2. Extract hostname,
  3. Check hostname === 'medium.com' or hostname === 'towardsdatascience.com' (and optionally known subdomains),
  4. Only then rewrite to https://freedium.cloud/${url}.

We should wrap the parsing in a try/catch so that malformed URLs do not crash the function; on failure, just skip the Freedium rewrite and proceed with appending UTM parameters. This preserves existing behavior for valid Medium/TDS URLs while preventing arbitrary hosts that merely contain those substrings from being rewritten.

Concretely, in src/aggregator.js inside addUTMParams, replace the if (url.includes(...)) block with logic that:

  • Uses new URL(url) to parse,
  • Compares parsed.hostname against a small allowlist such as ['medium.com', 'www.medium.com', 'towardsdatascience.com', 'www.towardsdatascience.com'],
  • Only rewrites when the hostname matches.

No external dependencies are required; URL is part of the standard library in modern Node.js.
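
As a standalone sketch of the allowlist variant described above, the check can be isolated in a small predicate. The names `FREEDIUM_HOSTS` and `shouldUseFreedium` are assumptions made for this example and do not appear in src/aggregator.js.

```javascript
// Hypothetical names; the host list mirrors the allowlist suggested above.
const FREEDIUM_HOSTS = new Set([
  'medium.com',
  'www.medium.com',
  'towardsdatascience.com',
  'www.towardsdatascience.com',
]);

function shouldUseFreedium(url) {
  try {
    // Exact hostname match against the allowlist; no substring logic.
    return FREEDIUM_HOSTS.has(new URL(url).hostname.toLowerCase());
  } catch (e) {
    // Malformed URLs never qualify for the Freedium rewrite.
    return false;
  }
}

console.log(shouldUseFreedium('https://www.medium.com/some-article')); // true
console.log(shouldUseFreedium('https://someuser.medium.com/a-post'));  // false
console.log(shouldUseFreedium('http://evil.net/medium.com'));          // false
```

One trade-off worth noting: an exact allowlist does not match other subdomains such as `someuser.medium.com`, so those links would skip the rewrite unless they are added to the set or a suffix check with `endsWith('.medium.com')` is used instead.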

Suggested changeset 1
src/aggregator.js

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/aggregator.js b/src/aggregator.js
--- a/src/aggregator.js
+++ b/src/aggregator.js
@@ -33,8 +33,21 @@
 // Tracks clicks sent FROM AI-Pulse TO external sites
 function addUTMParams(url, category = 'general') {
   // Use Freedium mirror for Medium articles to bypass paywall
-  if (url.includes('medium.com') || url.includes('towardsdatascience.com')) {
-    url = `https://freedium.cloud/${url}`;
+  try {
+    const parsedUrl = new URL(url);
+    const hostname = parsedUrl.hostname.toLowerCase();
+    const freediumHosts = new Set([
+      'medium.com',
+      'www.medium.com',
+      'towardsdatascience.com',
+      'www.towardsdatascience.com'
+    ]);
+
+    if (freediumHosts.has(hostname)) {
+      url = `https://freedium.cloud/${url}`;
+    }
+  } catch (e) {
+    // If the URL is invalid, skip Freedium rewriting and just add UTM parameters below.
   }
 
   const utmParams = `utm_source=ai-pulse&utm_medium=reader&utm_campaign=article&utm_content=${category}`;
EOF
EthanThePhoenix38 pushed a commit that referenced this pull request Feb 18, 2026
…alert #27)

Replace unsafe string matching with proper URL parsing:
- Parse URL hostname before checking for medium.com
- Prevents bypasses like http://evil-medium.com or http://evil.net/medium.com
- Add freedium.app as fallback mirror for Medium articles

Fixes: CWE-20 (Incomplete URL substring sanitization)

https://claude.ai/code/session_0138bAjho1fWwiRZju3nJFJ3
EthanThePhoenix38 added a commit that referenced this pull request Feb 18, 2026
#127)

…alert #27)

Replace unsafe string matching with proper URL parsing:
- Parse URL hostname before checking for medium.com
- Prevents bypasses like http://evil-medium.com or
http://evil.net/medium.com
- Add freedium.app as fallback mirror for Medium articles

Fixes: CWE-20 (Incomplete URL substring sanitization)

https://claude.ai/code/session_0138bAjho1fWwiRZju3nJFJ3