Skip to content

Add tests for external table functionality with Parquet files#23653

Open
flypiggyyoyoyo wants to merge 11 commits intomatrixorigin:mainfrom
flypiggyyoyoyo:issue-#12062
Open

Add tests for external table functionality with Parquet files#23653
flypiggyyoyoyo wants to merge 11 commits intomatrixorigin:mainfrom
flypiggyyoyoyo:issue-#12062

Conversation

@flypiggyyoyoyo
Copy link
Collaborator

@flypiggyyoyoyo flypiggyyoyoyo commented Feb 3, 2026

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #12062

What this PR does / why we need it:

Add tests for external table functionality with Parquet files.


PR Type

Tests


Description

  • Add comprehensive end-to-end tests for Parquet external table functionality

  • Test basic data types, compression formats, and Parquet version compatibility

  • Validate NULL handling, decimal types, binary types, and edge cases

  • Verify query operations (WHERE, GROUP BY, JOIN, subqueries) on external tables

  • Ensure external tables are read-only (INSERT/UPDATE/DELETE operations fail)


Diagram Walkthrough

flowchart LR
  A["Test Suite"] --> B["Basic Types & Iris Dataset"]
  A --> C["Compression Formats"]
  A --> D["Parquet Versions"]
  A --> E["NULL & Decimal Handling"]
  A --> F["Binary Types & Edge Cases"]
  A --> G["Query Operations"]
  A --> H["Read-Only Validation"]
  B --> I["SQL Test File"]
  C --> I
  D --> I
  E --> I
  F --> I
  G --> I
  H --> I
  I --> J["Result File"]
Loading

File Walkthrough

Relevant files
Tests
external_table_parquet.sql
Comprehensive Parquet external table test suite                   

test/distributed/cases/table/external_table_parquet.sql

  • Create comprehensive test suite with 12 test sections covering Parquet
    external tables
  • Test basic types (INT, BIGINT, FLOAT, DOUBLE, VARCHAR, BOOL, DATE,
    TIME)
  • Validate multiple compression formats (NONE, SNAPPY, GZIP, LZ4, ZSTD,
    BROTLI)
  • Test Parquet v1 (legacy) and v2 (modern) format compatibility
  • Verify NULL handling, DECIMAL types, BINARY types, and empty files
  • Test query features (WHERE, GROUP BY, HAVING, ORDER BY, JOIN,
    subqueries)
  • Validate read-only constraints (INSERT, UPDATE, DELETE should fail)
+341/-0 
external_table_parquet.result
Expected test results for Parquet external tables               

test/distributed/cases/table/external_table_parquet.result

  • Provide expected output results for all test cases in the SQL test
    file
  • Include result sets for basic type queries, compression format tests,
    and version compatibility
  • Show NULL handling results, decimal precision outputs, and binary data
    hex representations
  • Demonstrate query operation results (filtering, aggregation, joins,
    subqueries)
  • Validate error messages for INSERT/UPDATE/DELETE operations on
    external tables
  • Include metadata information (column types, precision, scale) in
    result output
+341/-0 

@qodo-code-review
Copy link

qodo-code-review bot commented Feb 3, 2026

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@matrix-meow matrix-meow added the size/L Denotes a PR that changes [500,999] lines label Feb 3, 2026
@mergify mergify bot added the kind/test-ci label Feb 3, 2026
@qodo-code-review
Copy link

qodo-code-review bot commented Feb 3, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Remove redundant table drop statements

Remove the redundant DROP TABLE statements at the end of the test script. The
final DROP DATABASE ext_parquet_db command is sufficient for cleanup.

test/distributed/cases/table/external_table_parquet.sql [317-341]

 -- ============================================================================
 -- Cleanup
 -- ============================================================================
 
-drop table if exists ext_supported_types;
-drop table if exists ext_iris;
-drop table if exists ext_none;
-drop table if exists ext_snappy;
-drop table if exists ext_gzip;
-drop table if exists ext_lz4;
-drop table if exists ext_zstd;
-drop table if exists ext_brotli;
-drop table if exists ext_v1;
-drop table if exists ext_v2;
-drop table if exists ext_nullable;
-drop table if exists ext_decimal;
-drop table if exists ext_binary;
-drop table if exists ext_empty;
-drop table if exists variety_info;
-drop table if exists ext_with_stats;
-drop table if exists ext_no_stats;
-drop table if exists ext_wide;
-drop table if exists ext_wildcard_v;
-drop table if exists ext_readonly;
-drop database ext_parquet_db;
+drop database if exists ext_parquet_db;
  • Apply / Chat
Suggestion importance[1-10]: 5

__

Why: This is a valid suggestion that improves code conciseness and maintainability by removing redundant DROP TABLE statements, as the final DROP DATABASE handles the cleanup.

Low
Possible issue
Terminate header with delimiter

Add the row-delimiter marker (𝄀) to the header line for the ext_empty table
query to ensure consistent formatting across all result headers.

test/distributed/cases/table/external_table_parquet.result [188]

-➤ id[-5,64,0]  ¦  name[12,-1,0]
+➤ id[-5,64,0]  ¦  name[12,-1,0]  𝄀
  • Apply / Chat
Suggestion importance[1-10]: 4

__

Why: This suggestion correctly identifies an inconsistency in the test result file where a header line is missing its trailing delimiter, improving the file's consistency.

Low
  • Update

@mergify mergify bot added the queued label Feb 6, 2026
@mergify
Copy link
Contributor

mergify bot commented Feb 6, 2026

Merge Queue Status

🚫 The pull request has left the queue (rule: main) at 0544fab

This pull request spent 12 minutes 20 seconds in the queue, with no time running CI.

Reason

The pull request #23653 has been manually updated

Hint

If you want to requeue this pull request, you can post a @mergifyio requeue comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/test-ci Review effort 2/5 size/L Denotes a PR that changes [500,999] lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants