🔒 Security Fix: SQL Injection Vulnerabilities in Database Clients#52
🔒 Security Fix: SQL Injection Vulnerabilities in Database Clients#52ivosetyadi wants to merge 1 commit intobagofwords1:mainfrom
Conversation
Fixed SQL injection vulnerabilities in three database client implementations where user-controlled values were being interpolated directly into SQL statements without proper escaping or parameterization: 1. aws_redshift_client.py: Schema names in SET search_path statement - Now uses psycopg2.sql.Identifier for safe identifier quoting 2. duckdb_client.py: Multiple injection points including: - View/table names in CREATE VIEW, DESCRIBE, and SELECT statements - Added _quote_identifier() helper method for proper DuckDB identifier quoting 3. postgresql_client.py: Schema names in SET search_path statement - Now uses psycopg2.sql.Identifier with proper list handling These vulnerabilities could have allowed attackers to execute arbitrary SQL commands if they could control database/schema/table names. All identifiers are now properly quoted to prevent injection attacks. Security Impact: HIGH - Prevents potential data breach and unauthorized database access 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
@claude review |
There was a problem hiding this comment.
Pull request overview
This pull request addresses critical SQL injection vulnerabilities in three database client implementations by implementing proper identifier quoting and escaping mechanisms to prevent malicious SQL injection through schema, table, and view names.
Key Changes:
- Added psycopg2.sql.Identifier-based quoting for PostgreSQL and AWS Redshift schema names in SET search_path statements
- Implemented custom
_quote_identifier()method in DuckDB client to safely escape table and view names - Applied identifier quoting across 7 locations in DuckDB client where table/view names are used in SQL statements
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
backend/app/data_sources/clients/postgresql_client.py |
Replaced string concatenation with psycopg2.sql.Identifier for search_path to prevent schema name SQL injection |
backend/app/data_sources/clients/duckdb_client.py |
Added _quote_identifier() helper method and applied it to all table/view name usages in CREATE VIEW, DESCRIBE, and SELECT statements |
backend/app/data_sources/clients/aws_redshift_client.py |
Replaced string concatenation with psycopg2.sql.Identifier for search_path with proper transaction commit |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Use parameterized query with identifier quoting to prevent SQL injection | ||
| from psycopg2 import sql | ||
| cursor.execute( | ||
| sql.SQL("SET search_path TO {}").format(sql.Identifier(self.schema)) | ||
| ) |
There was a problem hiding this comment.
The SQL injection fix for search_path should be covered by tests. Consider adding test cases that verify schema names containing special characters or SQL injection attempts are properly escaped using psycopg2.sql.Identifier and cannot be exploited.
| search_path_sql = psql.SQL("SET search_path TO {}").format( | ||
| psql.SQL(", ").join(identifiers) | ||
| ) | ||
| cursor.execute(search_path_sql) |
There was a problem hiding this comment.
The SET search_path command is executed on the raw psycopg2 connection without an explicit commit. In psycopg2, when not in autocommit mode, SET commands need to be committed to take effect. Consider adding raw_conn.commit() after line 65 to ensure the search_path setting persists for subsequent queries, similar to the approach used in the AWS Redshift client.
| cursor.execute(search_path_sql) | |
| cursor.execute(search_path_sql) | |
| # Commit so that the SET search_path takes effect when autocommit is off | |
| raw_conn.commit() |
| def _quote_identifier(self, identifier: str) -> str: | ||
| """Quote an identifier (table/view/column name) to prevent SQL injection. | ||
|
|
||
| In DuckDB, identifiers are quoted with double quotes, and any double quotes | ||
| within the identifier are escaped by doubling them. | ||
| """ | ||
| if identifier is None: | ||
| raise ValueError("Identifier cannot be None") | ||
| # Escape double quotes by doubling them | ||
| escaped = str(identifier).replace('"', '""') | ||
| return f'"{escaped}"' |
There was a problem hiding this comment.
The new _quote_identifier method and its security implications should be covered by tests. Consider adding test cases that verify identifiers containing special characters (like double quotes, semicolons, or SQL keywords) are properly escaped and cannot be used for SQL injection. For example, test with table names like 'test"; DROP TABLE users; --' to ensure the quoting mechanism prevents injection attacks.
| # Use proper identifier quoting to prevent SQL injection | ||
| # PostgreSQL requires identifiers to be quoted individually | ||
| from psycopg2 import sql as psql | ||
| try: | ||
| conn.execute(text(f"SET search_path TO {search_path}")) | ||
| # Get the raw psycopg2 connection from SQLAlchemy | ||
| raw_conn = conn.connection.driver_connection | ||
| with raw_conn.cursor() as cursor: | ||
| # Build a list of quoted identifiers | ||
| identifiers = [psql.Identifier(schema) for schema in self._schemas] | ||
| # Join them with commas | ||
| search_path_sql = psql.SQL("SET search_path TO {}").format( | ||
| psql.SQL(", ").join(identifiers) | ||
| ) | ||
| cursor.execute(search_path_sql) |
There was a problem hiding this comment.
The SQL injection fix for search_path should be covered by tests. Consider adding test cases that verify schema names containing special characters or SQL injection attempts (e.g., schemas like 'public; DROP TABLE users; --') are properly escaped using psycopg2.sql.Identifier and cannot be exploited.
🔒 Security Fix: SQL Injection Vulnerabilities in Database Clients
Summary
Fixed critical SQL injection vulnerabilities in three database client implementations where user-controlled values were being interpolated directly into SQL statements without proper escaping or parameterization.
Vulnerabilities Fixed
1. AWS Redshift Client (
aws_redshift_client.py)SET search_pathstatement were concatenated without escapingpsycopg2.sql.Identifierfor safe identifier quoting2. DuckDB Client (
duckdb_client.py)CREATE VIEWstatementsDESCRIBEstatementsSELECTstatements_quote_identifier()helper method that properly escapes identifiers using DuckDB double-quote syntax3. PostgreSQL Client (
postgresql_client.py)SET search_pathconcatenated without escapingpsycopg2.sql.Identifierwith proper list handling for multiple schemasSecurity Impact
Severity: HIGH
These vulnerabilities could have allowed attackers to:
Attack vector required control over database/schema/table names, which could occur through:
Testing
Changes
Diff Summary
Checklist