Potential fix for code scanning alert no. 16: Incomplete URL substring sanitization#28
Conversation
…g sanitization Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This pull request addresses a security code scanning alert (#16) related to incomplete URL substring sanitization in the addUTMParams function. The fix replaces unsafe string matching with proper URL parsing to prevent malicious URLs from being incorrectly rewritten to the Freedium mirror.
Changes:
- Added import for Node.js built-in
URLclass for proper URL parsing - Replaced substring-based URL checking with hostname-based validation using an explicit allowlist
- Wrapped the Freedium rewrite logic in a try-catch block to handle malformed URLs gracefully
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const mediumHosts = [ | ||
| 'medium.com', | ||
| 'www.medium.com', | ||
| 'towardsdatascience.com', | ||
| 'www.towardsdatascience.com' | ||
| ]; |
There was a problem hiding this comment.
The mediumHosts array is defined inside the function on every call. Consider moving this to a constant at the module level (similar to FEED_CATEGORIES) to avoid unnecessary array creation on each invocation, improving performance and maintainability.
| 'www.towardsdatascience.com' | ||
| ]; | ||
|
|
||
| if (mediumHosts.includes(hostname)) { |
There was a problem hiding this comment.
The allowlist only includes exact domains but doesn't handle Medium's custom subdomain pattern. Medium allows users to publish on custom subdomains (e.g., username.medium.com or publication.medium.com). Consider checking if the hostname ends with .medium.com to catch all legitimate Medium subdomains, while still preventing path-based attacks.
| if (mediumHosts.includes(hostname)) { | |
| if (mediumHosts.includes(hostname) || hostname.endsWith('.medium.com')) { |
| // Rewrite only genuine Medium/TDS URLs to Freedium | ||
| url = `https://freedium.cloud/${url}`; | ||
| } | ||
| } catch (e) { |
There was a problem hiding this comment.
The error handling silently swallows all exceptions without logging. This makes debugging difficult if URL parsing fails. Consider logging the error (at minimum during development) or documenting why silent failure is acceptable here. A malformed URL error could indicate a data quality issue that should be investigated.
| } catch (e) { | |
| } catch (e) { | |
| if (process.env.NODE_ENV !== 'production') { | |
| console.error('addUTMParams: Failed to parse URL for UTM handling:', url, e); | |
| } |
| @@ -3,6 +3,7 @@ const Parser = require('rss-parser'); | |||
| const axios = require('axios'); | |||
There was a problem hiding this comment.
Unused variable axios.
| const axios = require('axios'); |
| @@ -3,6 +3,7 @@ const Parser = require('rss-parser'); | |||
| const axios = require('axios'); | |||
| const { Octokit } = require('@octokit/rest'); | |||
There was a problem hiding this comment.
Unused variable Octokit.
| const { Octokit } = require('@octokit/rest'); |
Potential fix for https://github.com/ThePhoenixAgency/AI-Pulse/security/code-scanning/16
In general, substring matching on the full URL string should be replaced by checks on the parsed URL’s hostname, combined with an explicit allowlist and precise matching (e.g., exact host or specific known mirrors). For this particular function, we should parse the URL, inspect the
hostname, and only apply the Freedium rewrite if the hostname is exactlymedium.comortowardsdatascience.com(or known subdomains likewww.medium.comif desired). This prevents URLs wheremedium.comonly appears in the path, query, or as part of another domain from being rewritten.The best minimal fix inside
src/aggregator.jsis:urlparsing (or use the WHATWG URL class) without changing existing imports.addUTMParams, parse the incomingurlinto a URL object, safely extracthostname, and use strict equality against a small allowlist of Medium-related domains.Concretely:
const { URL } = require('url');at the top ofsrc/aggregator.jsafter existing requires (anywhere among them).addUTMParamsso that:new URL(url), catching errors; on parse failure, skips the Freedium logic and just appends UTM params.ALLOWED_MEDIUM_HOSTSarray like['medium.com', 'www.medium.com', 'towardsdatascience.com', 'www.towardsdatascience.com'].parsed.hostnameis in that list, rewritesurltohttps://freedium.cloud/+ the original URL string (or a normalized version).This keeps all existing functionality (Freedium rewrite + UTM parameters) but removes the incomplete substring sanitization.
Suggested fixes powered by Copilot Autofix. Review carefully before merging.