Potential fix for code scanning alert no. 11: Incomplete multi-character sanitization#12
Conversation
…ter sanitization Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f29c13b85f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| "rss-parser": "^3.13.0", | ||
| "sanitize-html": "^2.17.0" |
There was a problem hiding this comment.
Update lockfile for new sanitize-html dependency
This commit adds sanitize-html to package.json but does not update package-lock.json. In any environment that installs dependencies via the lockfile (e.g., npm ci in CI/CD or deployments), sanitize-html will not be installed and the new require('sanitize-html') in src/aggregator.js will throw MODULE_NOT_FOUND at runtime. Please regenerate the lockfile so it includes the new dependency.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
This PR addresses a code scanning alert regarding incomplete multi-character sanitization by replacing regex-based HTML tag removal with a dedicated HTML sanitization library. The fix targets a security vulnerability where the regex pattern /<[^>]*>/g could be bypassed by malformed HTML.
Key changes:
- Added
sanitize-htmllibrary as a new dependency - Implemented a
sanitizeTexthelper function that usessanitize-htmlwith strict configuration - Replaced regex-based HTML stripping in
article.contentSnippetandarticle.titleprocessing
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| package.json | Added sanitize-html version 2.17.0 as a new dependency |
| src/aggregator.js | Added import for sanitize-html, implemented sanitizeText helper function, and replaced regex-based tag removal with sanitizeText calls |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "rss-parser": "^3.13.0", | ||
| "sanitize-html": "^2.17.0" |
There was a problem hiding this comment.
The project already includes dompurify and isomorphic-dompurify as dependencies. Adding sanitize-html creates redundancy with three different HTML sanitization libraries. Consider using one of the existing libraries (particularly isomorphic-dompurify which works in both browser and Node.js environments) instead of introducing a third sanitization library. This would reduce bundle size, dependency complexity, and potential security surface.
| "rss-parser": "^3.13.0", | |
| "sanitize-html": "^2.17.0" | |
| "rss-parser": "^3.13.0" |
| const Parser = require('rss-parser'); | ||
| const axios = require('axios'); | ||
| const { Octokit } = require('@octokit/rest'); | ||
| const sanitizeHtml = require('sanitize-html'); |
There was a problem hiding this comment.
Consider using the existing isomorphic-dompurify library instead of adding sanitize-html. The project already has DOMPurify available, which can strip HTML tags effectively using DOMPurify.sanitize(input, {ALLOWED_TAGS: []}). This would maintain consistency with the sanitization approach used in the HTML files (portfolio.html, reader.html, stats.html) and reduce the number of dependencies.
Potential fix for https://github.com/ThePhoenixAgency/AI-Pulse/security/code-scanning/11
In general, the fix is to stop using an ad‑hoc regex to “remove HTML tags” and instead use a well-tested, HTML‑aware sanitization/parsing approach. For this code, the best low-impact change is to replace the regex-based
.replace(/<[^>]*>/g, '')with a call to a dedicated HTML-sanitizing/stripping library that reliably removes or escapes unsafe content, including<script>and other dangerous constructs.Concretely, in
src/aggregator.js:sanitize-html.sanitizeText, that:sanitizeHtmlwith a configuration that strips all tags and attributes, returning only safe text.article.contentSnippet.article.title, replacing the inlined regex with the helper before slicing to 200 chars.This keeps the existing functionality (getting a plain-text-ish title/summary) but implements it in a robust way that avoids incomplete multi-character sanitization.
You will need:
src/aggregator.jsforsanitize-html.sanitizeTextfunction defined near the other helpers.sanitizeText.Suggested fixes powered by Copilot Autofix. Review carefully before merging.