Potential fix for code scanning alert no. 12: Incomplete multi-character sanitization by EthanThePhoenix38 · Pull Request #11 · ThePhoenixAgency/AI-Pulse

EthanThePhoenix38 · 2026-01-08T19:55:59Z

Potential fix for https://github.com/ThePhoenixAgency/AI-Pulse/security/code-scanning/12

In general, the safest fix is to stop doing ad-hoc HTML sanitization with hand-written regexes and instead either: (a) rely on proper HTML escaping at the sink, or (b) use a well-tested sanitization/escaping library when you need HTML-safe output. Here, the analyzer specifically worries about <script surviving the regex; we can mitigate this within the shown snippet by (1) using a more robust sanitization approach for article.title and rawSummary, or (2) explicitly neutralizing any < / > characters that remain after the current regex.
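
To make the analyzer's concern concrete, here is a small sketch of how the single-pass tag-stripping regex behaves on an unterminated tag. The name stripTags is a hypothetical wrapper around the replace(/<[^>]*>/g, '') call, and the sample strings are invented for illustration.

```javascript
// Hypothetical wrapper around the regex used in the snippet under review.
function stripTags(input) {
  return input.replace(/<[^>]*>/g, '');
}

// Complete tags are removed as intended.
const complete = stripTags('<b>Release</b> notes');

// An opening "<script" with no closing ">" never matches the pattern,
// so it passes through unchanged.
const unterminated = stripTags('Release notes <script src=//example.invalid/x.js');

console.log(complete);
console.log(unterminated.includes('<script'));
```

If such a string is later concatenated into markup, the next > in the surrounding HTML can complete the tag, which is why the description proposes neutralizing whatever < and > characters remain.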

Given the constraints (only editing this file, minimal behavior change, and no knowledge of how titles are rendered later), the safest local change is to ensure the resulting title and summary cannot introduce HTML elements at all. We can do this by HTML-escaping <, >, and & in the text after stripping tags, which preserves the visible text but makes it safe for insertion into HTML. Concretely:

- Factor out a small helper htmlEscape that replaces &, <, and > with their HTML entities. This uses only standard JS, no extra imports.
- Use this helper in sanitizeArticle:
  - For rawSummary: keep the existing replace(/<[^>]*>/g, '') and then htmlEscape the result before passing to smartTruncate.
  - For title: apply the same replace and then htmlEscape, then slice(0, 200).

This ensures that even if something like <script or stray < characters remain or are introduced by edge cases, they will be rendered as text and not interpreted as HTML.

All changes are within src/aggregator.js, near sanitizeArticle. We add the helper above sanitizeArticle and update the lines building rawSummary and title accordingly.
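
The flow described above (strip tags, escape, then truncate) can be sketched roughly as follows. htmlEscape mirrors the helper the description proposes; cleanTitle is a hypothetical name used only for this illustration, and the inputs are made up.

```javascript
function htmlEscape(input) {
  if (!input) {
    return '';
  }
  // "&" is replaced first; otherwise the "&" in "&lt;" would be escaped again.
  return input
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: the same steps the description lists for the title.
function cleanTitle(title) {
  const stripped = (title || '').replace(/<[^>]*>/g, '');
  return htmlEscape(stripped).slice(0, 200) || 'Untitled';
}

const escaped = cleanTitle('<i>AT&T</i> ships <script');
const truncated = cleanTitle('A'.repeat(198) + '&');
const fallback = cleanTitle(undefined);

console.log(escaped);
console.log(truncated.slice(-2));
console.log(fallback);
```

Because slice(0, 200) runs after escaping, a title near the limit can end in a cut-off entity such as &a; whether that matters depends on how the title is rendered later.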

Suggested fixes powered by Copilot Autofix. Review carefully before merging.

…ter sanitization Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

src/aggregator.js

EthanThePhoenix38 · 2026-01-08T20:33:21Z

Everything needs to be fixed.

EthanThePhoenix38 · 2026-01-08T20:34:19Z

@copilot Fix everything

Copilot · 2026-01-08T20:34:27Z

@EthanThePhoenix38 I've opened a new pull request, #17, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot

Pull request overview

This PR attempts to fix a code scanning alert (#12) about incomplete multi-character sanitization by adding HTML escaping to article titles and summaries in the RSS feed aggregation pipeline. The change introduces a new htmlEscape function that converts &, <, and > to their HTML entity equivalents (&amp;, &lt;, &gt;) and applies this escaping at the data layer.

Key changes:

- Added htmlEscape helper function to convert special HTML characters to entities
- Applied HTML escaping to article titles and content snippets after tag removal
- Modified the sanitization flow to escape HTML entities at the data storage layer

Copilot · 2026-01-08T20:36:50Z

src/aggregator.js

+/**
+ * HTML-escape a string so it is safe to insert into HTML contexts.
+ * Converts &, <, and > to their corresponding entities.
+ * @param {string} input
+ * @returns {string}
+ */
+function htmlEscape(input) {
+  if (!input) {
+    return '';
+  }
+  return input
+    .replace(/&/g, '&amp;')
+    .replace(/</g, '&lt;')
+    .replace(/>/g, '&gt;');
+}
+
 // Sanitize and process articles
 function sanitizeArticle(article, sourceName, tags, category) {
+  const rawSummary = htmlEscape(
+    article.contentSnippet?.replace(/<[^>]*>/g, '') || ''
+  );
+
+  return {
+    title: htmlEscape(article.title?.replace(/<[^>]*>/g, '') || '').slice(0, 200) || 'Untitled',


HTML-escaping the title and summary at the data layer causes issues in multiple contexts where the data is used:

- Markdown generation (lines 199-201): HTML entities like &lt;, &gt;, and &amp; will appear as literal text in the generated README markdown, making content harder to read.
- LinkedIn posts (line 249): If re-enabled, the LinkedIn API would receive HTML entities in the post text, which would be displayed to users as &lt; instead of <.
- Reader HTML (reader.html:469): Uses textContent to insert the title, which means HTML entities would be double-escaped and shown to users literally.
- Stats HTML (stats.html:436): Uses DOMPurify which expects raw text, not pre-escaped text, leading to visible entities.

The original regex-based sanitization approach (removing HTML tags) was more appropriate for this use case. If HTML injection is a concern, it should be addressed at the presentation layer (where it's already done with DOMPurify and textContent), not at the data storage layer. Consider reverting the htmlEscape calls and relying on the existing output-side protections.
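
One way to picture this recommendation is to keep stored text unescaped and escape only where HTML is actually built. The sketch below is an assumption about how that split could look, not the project's code: sanitizeText is a name that appears in the suggested change but its body here is guessed, and toHtmlItem and toPostText are hypothetical sinks.

```javascript
// Data layer (assumed body): strip tags only, keep the text unescaped.
function sanitizeText(input) {
  return (input || '').replace(/<[^>]*>/g, '');
}

function htmlEscape(input) {
  return (input || '')
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical HTML sink: escaping happens here, at the point of insertion.
function toHtmlItem(title) {
  return `<li>${htmlEscape(title)}</li>`;
}

// Hypothetical plain-text sink (for example, a social post body).
function toPostText(title) {
  return title;
}

const stored = sanitizeText('<b>R&D</b> update: 3 < 5');

console.log(toHtmlItem(stored));             // escaped once, for HTML only
console.log(toPostText(stored));             // readable plain text
console.log(toPostText(htmlEscape(stored))); // pre-escaped data leaks entities
```

The last line shows the failure mode the review describes: once escaped text is stored, every non-HTML consumer sees the entities verbatim.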

Suggested change

-/**
- * HTML-escape a string so it is safe to insert into HTML contexts.
- * Converts &, <, and > to their corresponding entities.
- * @param {string} input
- * @returns {string}
- */
-function htmlEscape(input) {
-  if (!input) {
-    return '';
-  }
-  return input
-    .replace(/&/g, '&amp;')
-    .replace(/</g, '&lt;')
-    .replace(/>/g, '&gt;');
-}
-
-// Sanitize and process articles
-function sanitizeArticle(article, sourceName, tags, category) {
-  const rawSummary = htmlEscape(
-    article.contentSnippet?.replace(/<[^>]*>/g, '') || ''
-  );
-
-  return {
-    title: htmlEscape(article.title?.replace(/<[^>]*>/g, '') || '').slice(0, 200) || 'Untitled',
+// Note: HTML escaping is intentionally handled at the presentation layer
+// (e.g., via textContent and DOMPurify). At the data layer we only strip
+// HTML tags (see sanitizeText) and keep the underlying text unescaped.
+// Sanitize and process articles
+function sanitizeArticle(article, sourceName, tags, category) {

Potential fix for code scanning alert no. 12: Incomplete multi-charac…

bb179be

…ter sanitization Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

github-advanced-security bot found potential problems Jan 8, 2026

EthanThePhoenix38 marked this pull request as ready for review January 8, 2026 20:33

Copilot AI review requested due to automatic review settings January 8, 2026 20:33

Copilot started reviewing on behalf of EthanThePhoenix38 January 8, 2026 20:33

Copilot AI mentioned this pull request Jan 8, 2026

[WIP] Fix incomplete multi-character sanitization issue #17

Merged

8 tasks

Merge branch 'main' into alert-autofix-12

a19303c

github-actions bot added the backend label Jan 8, 2026

EthanThePhoenix38 merged commit aef78d4 into main Jan 8, 2026
5 checks passed

EthanThePhoenix38 deleted the alert-autofix-12 branch January 8, 2026 20:36

Copilot AI reviewed Jan 8, 2026

EthanThePhoenix38 merged 2 commits into main from alert-autofix-12

3 participants
