[Feature](paimon) Refactor Paimon system tables to use native table execution path#60556

Open
suxiaogang223 wants to merge 15 commits into apache:master from suxiaogang223:refact-sys-table

Conversation

@suxiaogang223
Contributor

@suxiaogang223 suxiaogang223 commented Feb 5, 2026

What problem does this PR solve?

Summary

  • Refactor Paimon system tables ($snapshots, $files, $schemas, $partitions, etc.) to use the native table execution path (PaimonScanNode) instead of the TVF path (MetadataScanNode / paimon_meta())
  • Introduce a SysTable type hierarchy (NativeSysTable / TvfSysTable) and a centralized SysTableResolver to cleanly separate native vs TVF execution paths
  • Remove paimon_meta TVF, PaimonSysTableJniScanner, and PaimonSysTableJniReader — all Paimon system table queries now go through the unified PaimonScanNode + PaimonJniScanner path

Motivation

Previously, Paimon system tables were queried via a Table-Valued Function (paimon_meta()), which:

  • Required a separate Java scanner (PaimonSysTableJniScanner) and C++ reader (PaimonSysTableJniReader) dedicated to system tables
  • Went through MetadataScanNode instead of PaimonScanNode, missing optimizations available in the native path (predicate pushdown, projection pushdown, etc.)
  • Created a divergent code path from regular Paimon table queries
  • Made SysTable tightly coupled with the TVF execution model, making it hard to add new system table types

Architecture After Refactoring

SysTable Type Hierarchy

SysTable (base)
├── NativeSysTable (abstract)  — uses FileQueryScanNode (e.g., PaimonScanNode)
│   └── PaimonSysTable         — all Paimon system tables (snapshots, files, binlog, ...)
│                                dynamically loaded from Paimon SDK SystemTableLoader
└── TvfSysTable (abstract)     — uses MetadataScanNode via TVF
    ├── IcebergSysTable         — all Iceberg metadata tables (snapshots, history, manifests, ...)
    │                            dynamically loaded from Iceberg MetadataTableType
    └── PartitionsSysTable      — Hive partitions (singleton, uses partition_values TVF)

Each table type declares its supported system tables via Map<String, SysTable>:

  • PaimonExternalTable.getSupportedSysTables() → PaimonSysTable.SUPPORTED_SYS_TABLES
  • IcebergExternalTable.getSupportedSysTables() → IcebergSysTable.SUPPORTED_SYS_TABLES
  • HMSExternalTable.getSupportedSysTables() → varies by DLAType (HIVE/ICEBERG)
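
As a rough illustration of the hierarchy above, the following minimal sketch models the base type, the two abstract branches, and a static registry map. All class names carry a `Sketch` suffix because they are hypothetical stand-ins, not the classes added by this PR. The registry is hard-coded with the four system table names mentioned in the summary, whereas the PR describes loading them from the Paimon SDK.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins for the classes named in this PR.
// The "Sketch" suffix marks them as illustrative, not the real Doris types.
abstract class SysTableSketch {
    private final String name; // system table name, e.g. "snapshots"

    SysTableSketch(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    // Suffix as written in SQL, e.g. "$snapshots".
    String getSuffix() {
        return "$" + name;
    }

    // true: planned as a file scan; false: planned through a TVF.
    abstract boolean useNativeTablePath();
}

abstract class NativeSysTableSketch extends SysTableSketch {
    NativeSysTableSketch(String name) {
        super(name);
    }

    @Override
    boolean useNativeTablePath() {
        return true;
    }
}

abstract class TvfSysTableSketch extends SysTableSketch {
    TvfSysTableSketch(String name) {
        super(name);
    }

    @Override
    boolean useNativeTablePath() {
        return false;
    }
}

final class PaimonSysTableSketch extends NativeSysTableSketch {
    // Hard-coded here (assumption); the PR loads the real registry from the Paimon SDK.
    static final Map<String, SysTableSketch> SUPPORTED_SYS_TABLES =
            register("snapshots", "files", "schemas", "partitions");

    private PaimonSysTableSketch(String name) {
        super(name);
    }

    private static Map<String, SysTableSketch> register(String... names) {
        Map<String, SysTableSketch> map = new LinkedHashMap<>();
        for (String n : names) {
            map.put(n, new PaimonSysTableSketch(n));
        }
        return Collections.unmodifiableMap(map);
    }
}

final class HivePartitionsSysTableSketch extends TvfSysTableSketch {
    HivePartitionsSysTableSketch() {
        super("partitions");
    }
}
```

Keeping the native/TVF decision in the type rather than in a flag means callers can branch on `useNativeTablePath()` without knowing which catalog the table belongs to.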

Query Execution Flow

User: SELECT * FROM table$snapshots

BindRelation.handleMetaTable()
  └→ SysTableResolver.resolveForPlan(table, ctl, db, "table$snapshots")
      └→ table.findSysTable("table$snapshots")  — O(1) map lookup
          └→ getSupportedSysTables().get("snapshots")

      ┌─ NativeSysTable (Paimon) ─────────────────────────────────────┐
      │  PaimonSysTable.createSysExternalTable(sourceTable)           │
      │  → new PaimonSysExternalTable(sourceTable, "snapshots")       │
      │  → return LogicalFileScan(sysExternalTable)                   │
      │                                                               │
      │  Execution: PaimonScanNode                                    │
      │    PaimonSource.resolvePaimonTable()                          │
      │      → PaimonSysExternalTable.getSysPaimonTable()             │
      │      → catalog.getPaimonTable(nameMapping, "main", "snapshots")│
      │    getSplits():                                               │
      │      DataSplit   → native reader (ORC/Parquet) or JNI        │
      │      non-DataSplit → JNI reader (PaimonJniScanner)            │
      └───────────────────────────────────────────────────────────────┘

      ┌─ TvfSysTable (Iceberg/Hive) ─────────────────────────────────┐
      │  TvfSysTable.createFunction(ctl, db, "table$snapshots")      │
      │  → return LogicalTVFRelation(tvf)                             │
      │                                                               │
      │  Execution: MetadataScanNode (unchanged)                      │
      └───────────────────────────────────────────────────────────────┘
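
The lookup-and-dispatch step in the flow above can be sketched as follows. This is a simplified model under stated assumptions, not the PR's `SysTableResolver`: the class `SysTableLookupSketch`, its `register`/`resolve` methods, and the choice to split the name at the last `$` are all hypothetical. The plan node names are returned as plain strings purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of "split the name, look up the suffix, pick a path".
final class SysTableLookupSketch {
    enum Path { NATIVE, TVF }

    // Result of resolving "base$suffix".
    static final class Resolved {
        final String baseTable;
        final String sysTable;
        final Path path;

        Resolved(String baseTable, String sysTable, Path path) {
            this.baseTable = baseTable;
            this.sysTable = sysTable;
            this.path = path;
        }
    }

    // Supported system tables for one table type, keyed by name without "$".
    private final Map<String, Path> supported = new HashMap<>();

    SysTableLookupSketch register(String sysTableName, Path path) {
        supported.put(sysTableName, path);
        return this;
    }

    // Assumption: the system table suffix follows the LAST '$' in the name.
    Optional<Resolved> resolve(String nameWithSuffix) {
        int idx = nameWithSuffix.lastIndexOf('$');
        if (idx <= 0 || idx == nameWithSuffix.length() - 1) {
            return Optional.empty(); // no suffix, or empty base/suffix
        }
        String base = nameWithSuffix.substring(0, idx);
        String suffix = nameWithSuffix.substring(idx + 1);
        Path path = supported.get(suffix); // single hash lookup
        if (path == null) {
            return Optional.empty(); // not a known system table
        }
        return Optional.of(new Resolved(base, suffix, path));
    }

    // Names mirror the plan nodes in the flow above; strings only, for illustration.
    static String planNodeFor(Resolved r) {
        return r.path == Path.NATIVE ? "LogicalFileScan" : "LogicalTVFRelation";
    }
}
```

In this sketch an unknown suffix yields an empty result, so a name such as `my$table` simply falls through to ordinary table resolution.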

DESCRIBE Flow

User: DESCRIBE table$snapshots

DescribeCommand.doRun()
  └→ SysTableResolver.resolveForDescribe(table, ctl, db, "table$snapshots")
      ┌─ NativeSysTable → sysExternalTable.getFullSchema()
      │    columns derived from Paimon system table rowType
      └─ TvfSysTable → tvfRef (TableValuedFunctionRefInfo)
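
One way to picture the two DESCRIBE outcomes is a small result type that holds either a column list (native path) or a TVF reference (TVF path). The sketch below is hypothetical: `DescribeResultSketch` and its methods are not taken from the PR, the TVF reference is reduced to a string, and the column names used in any example are placeholders.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical either-style result: exactly one of the two fields is set.
final class DescribeResultSketch {
    private final List<String> columns; // set on the native path
    private final String tvfRef;        // set on the TVF path

    private DescribeResultSketch(List<String> columns, String tvfRef) {
        this.columns = columns;
        this.tvfRef = tvfRef;
    }

    // Native path: the schema is already known from the system table itself.
    static DescribeResultSketch ofSchema(List<String> columns) {
        return new DescribeResultSketch(
                Collections.unmodifiableList(new ArrayList<>(columns)), null);
    }

    // TVF path: the caller still has to ask the function for its columns.
    static DescribeResultSketch ofTvf(String tvfRef) {
        return new DescribeResultSketch(null, tvfRef);
    }

    boolean isNative() {
        return columns != null;
    }

    List<String> columns() {
        if (!isNative()) {
            throw new IllegalStateException("TVF result has no direct schema");
        }
        return columns;
    }

    String tvfRef() {
        if (isNative()) {
            throw new IllegalStateException("native result has no TVF reference");
        }
        return tvfRef;
    }
}
```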

Key New Classes

| Class | Purpose |
| --- | --- |
| SysTable | Base class: system table name, suffix, matching logic |
| NativeSysTable | Abstract: useNativeTablePath()=true, factory method createSysExternalTable() |
| TvfSysTable | Abstract: useNativeTablePath()=false, factory methods createFunction()/createFunctionRef() |
| PaimonSysTable | Concrete: Paimon system table registry, loaded from SystemTableLoader.SYSTEM_TABLES |
| SysTableResolver | Central resolver with resolveForPlan() / resolveForDescribe() / validateForQuery() |
| PaimonSysExternalTable | ExternalTable wrapper: lazy-loads Paimon system table, derives schema from rowType |

Key Modified Classes

| Class | Change |
| --- | --- |
| TableIf | getSupportedSysTables() returns Map<String, SysTable> (was List); added findSysTable() for O(1) lookup |
| BindRelation | handleMetaTable() uses SysTableResolver; native path → LogicalFileScan, TVF path → LogicalTVFRelation |
| DescribeCommand | Uses SysTableResolver.resolveForDescribe(); native path returns column schema directly |
| PaimonScanNode | getSplits() separates DataSplit vs non-DataSplit; non-DataSplit always uses JNI reader |
| PaimonSplit | Generalized to wrap any Paimon Split (not just DataSplit); added getDataSplit() |
| PaimonSource | resolvePaimonTable() handles both PaimonExternalTable and PaimonSysExternalTable |
| PhysicalPlanTranslator | Uses TableType.PAIMON_EXTERNAL_TABLE check instead of instanceof PaimonExternalTable |
| RelationUtil | getDbAndTable() uses SysTableResolver.validateForQuery() |
| IcebergScanNode | Handles non-BaseTable metadata tables to avoid ClassCastException |
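
The split handling described for PaimonScanNode (DataSplit may use a native reader, anything else goes through JNI) can be sketched roughly as below. The types are hypothetical stand-ins for Paimon's split classes, and the `nativeReadable` flag is an assumption: the PR text does not state how a DataSplit is judged eligible for the native ORC/Parquet reader.

```java
// Hypothetical stand-ins for Paimon's Split / DataSplit types.
interface SplitSketch {
}

final class DataSplitSketch implements SplitSketch {
    // Assumption: some eligibility check decides native vs JNI for data splits.
    final boolean nativeReadable;

    DataSplitSketch(boolean nativeReadable) {
        this.nativeReadable = nativeReadable;
    }
}

// Any split that is not a data split, e.g. one produced by a metadata-style system table.
final class OtherSplitSketch implements SplitSketch {
}

final class ReaderChooserSketch {
    enum Reader { NATIVE, JNI }

    private ReaderChooserSketch() {
    }

    static Reader choose(SplitSketch split) {
        if (split instanceof DataSplitSketch) {
            // Data splits may be read natively when eligible, otherwise via JNI.
            return ((DataSplitSketch) split).nativeReadable ? Reader.NATIVE : Reader.JNI;
        }
        // Non-data splits always take the JNI reader.
        return Reader.JNI;
    }
}
```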

Removed Classes

| Class | Replacement |
| --- | --- |
| PaimonTableValuedFunction | PaimonSysExternalTable + PaimonScanNode |
| PaimonMeta (Nereids TVF) | LogicalFileScan with PaimonSysExternalTable |
| PaimonSysTableJniScanner (Java) | Unified PaimonJniScanner |
| PaimonSysTableJniReader (C++) | Unified PaimonJniReader |
| SupportedSysTables | Per-type static maps (PaimonSysTable.SUPPORTED_SYS_TABLES, etc.) |

Test Plan

  • Existing regression tests updated: paimon_system_table.groovy, test_paimon_system_table_auth.groovy
  • All paimon_meta() TVF calls replaced with direct table$systemTable syntax
  • DESCRIBE table$snapshots returns correct schema via native path
  • System table queries ($snapshots, $files, $schemas, $partitions) return correct results
  • Auth tests verify permission checks still work without paimon_meta TVF
  • New test test_table_name_with_dollar.groovy verifies tables with $ in their name still work correctly
  • Iceberg and Hive partition system tables (TVF path) remain unaffected

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223
Contributor Author

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 79.29% (1792/2260) |
| Line Coverage | 64.76% (31835/49158) |
| Region Coverage | 65.44% (15889/24280) |
| Branch Coverage | 55.98% (8440/15078) |

@doris-robot

TPC-H: Total hot run time: 31965 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 54aaefe6af079ed05465117410818d13ccde2c4c, data reload: false

------ Round 1 ----------------------------------
q1	17692	4594	4353	4353
q2	2068	354	231	231
q3	10461	1339	725	725
q4	10418	912	313	313
q5	9231	2234	1941	1941
q6	216	180	148	148
q7	894	786	585	585
q8	9267	1438	1133	1133
q9	5532	4839	4884	4839
q10	6916	1937	1587	1587
q11	500	285	262	262
q12	384	377	227	227
q13	17806	4235	3283	3283
q14	233	234	217	217
q15	890	815	806	806
q16	678	671	616	616
q17	648	813	524	524
q18	7227	6964	7424	6964
q19	1454	1122	670	670
q20	439	390	260	260
q21	3002	2337	1986	1986
q22	389	357	295	295
Total cold run time: 106345 ms
Total hot run time: 31965 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4673	4629	4757	4629
q2	304	348	259	259
q3	2285	2816	2371	2371
q4	1461	1999	1427	1427
q5	4454	4827	4616	4616
q6	221	176	138	138
q7	1944	1908	1794	1794
q8	2575	2433	2439	2433
q9	7878	7585	7374	7374
q10	2855	3072	2586	2586
q11	541	453	431	431
q12	627	723	574	574
q13	3563	4002	3252	3252
q14	272	304	260	260
q15	824	785	775	775
q16	642	687	649	649
q17	1094	1313	1331	1313
q18	7337	7302	7409	7302
q19	883	799	810	799
q20	1961	2037	1898	1898
q21	4583	4228	4150	4150
q22	571	550	493	493
Total cold run time: 51548 ms
Total hot run time: 49523 ms

@doris-robot

ClickBench: Total hot run time: 28.33 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 54aaefe6af079ed05465117410818d13ccde2c4c, data reload: false

query1	0.06	0.05	0.04
query2	0.14	0.07	0.07
query3	0.33	0.08	0.08
query4	1.60	0.10	0.10
query5	0.26	0.25	0.24
query6	1.14	0.66	0.64
query7	0.04	0.03	0.03
query8	0.08	0.06	0.06
query9	0.60	0.50	0.52
query10	0.55	0.56	0.54
query11	0.27	0.13	0.13
query12	0.26	0.14	0.14
query13	0.64	0.61	0.61
query14	0.99	1.00	0.97
query15	0.92	0.81	0.83
query16	0.39	0.38	0.39
query17	1.03	1.03	1.06
query18	0.25	0.23	0.22
query19	1.97	1.89	1.84
query20	0.02	0.02	0.01
query21	15.39	0.34	0.29
query22	4.94	0.12	0.11
query23	15.35	0.44	0.27
query24	2.33	0.58	0.39
query25	0.11	0.11	0.11
query26	0.18	0.18	0.18
query27	0.11	0.10	0.10
query28	3.56	1.15	0.98
query29	12.53	4.03	3.34
query30	0.35	0.12	0.13
query31	2.80	0.67	0.44
query32	3.24	0.62	0.50
query33	3.04	3.02	3.03
query34	15.86	5.15	4.54
query35	4.59	4.50	4.49
query36	0.61	0.49	0.50
query37	0.30	0.09	0.08
query38	0.27	0.06	0.06
query39	0.08	0.05	0.05
query40	0.22	0.17	0.16
query41	0.14	0.07	0.07
query42	0.08	0.04	0.05
query43	0.06	0.06	0.06
Total cold run time: 97.68 s
Total hot run time: 28.33 s

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 3.48% (7/201) 🎉
Increment coverage report
Complete coverage report

@suxiaogang223
Contributor Author

run buildall

@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 30551 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 824e2c0cf41af8ed5c3dc951a0b41554fe276480, data reload: false

------ Round 1 ----------------------------------
q1	17630	4426	4279	4279
q2	2043	357	257	257
q3	10142	1276	737	737
q4	10202	767	306	306
q5	7512	2172	1905	1905
q6	197	181	154	154
q7	872	728	606	606
q8	9264	1369	1132	1132
q9	4729	4662	4632	4632
q10	6840	1909	1559	1559
q11	513	306	277	277
q12	330	375	219	219
q13	17797	4013	3230	3230
q14	235	231	232	231
q15	893	821	788	788
q16	675	686	621	621
q17	704	794	531	531
q18	6576	5717	5907	5717
q19	1289	1052	670	670
q20	621	536	394	394
q21	2786	2013	2032	2013
q22	391	332	293	293
Total cold run time: 102241 ms
Total hot run time: 30551 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4728	4517	4525	4517
q2	260	352	255	255
q3	2437	2918	2370	2370
q4	1456	1869	1408	1408
q5	4497	4562	4504	4504
q6	241	188	136	136
q7	1999	1917	1783	1783
q8	2534	2409	2320	2320
q9	7426	7467	7470	7467
q10	2801	2944	2566	2566
q11	546	466	438	438
q12	669	733	634	634
q13	3858	4006	3282	3282
q14	279	284	256	256
q15	814	776	771	771
q16	644	697	642	642
q17	1074	1294	1356	1294
q18	7630	7174	7338	7174
q19	825	838	793	793
q20	1938	2041	1851	1851
q21	4505	4233	4089	4089
q22	572	542	510	510
Total cold run time: 51733 ms
Total hot run time: 49060 ms

@doris-robot

ClickBench: Total hot run time: 28.72 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 824e2c0cf41af8ed5c3dc951a0b41554fe276480, data reload: false

query1	0.06	0.05	0.05
query2	0.10	0.04	0.04
query3	0.26	0.08	0.09
query4	1.61	0.11	0.12
query5	0.28	0.25	0.25
query6	1.15	0.67	0.68
query7	0.03	0.02	0.02
query8	0.04	0.04	0.04
query9	0.56	0.50	0.49
query10	0.54	0.54	0.53
query11	0.14	0.09	0.09
query12	0.14	0.11	0.10
query13	0.63	0.62	0.61
query14	1.05	1.06	1.05
query15	0.88	0.86	0.87
query16	0.38	0.39	0.39
query17	1.11	1.13	1.12
query18	0.23	0.21	0.21
query19	2.05	2.00	2.00
query20	0.02	0.01	0.01
query21	15.39	0.27	0.15
query22	5.17	0.06	0.05
query23	15.81	0.29	0.11
query24	1.47	0.65	0.90
query25	0.07	0.09	0.06
query26	0.14	0.14	0.14
query27	0.09	0.05	0.06
query28	4.68	1.14	0.98
query29	12.63	3.99	3.22
query30	0.28	0.13	0.12
query31	2.87	0.64	0.41
query32	3.24	0.60	0.49
query33	3.28	3.23	3.26
query34	16.13	5.33	4.68
query35	4.82	4.79	4.78
query36	0.65	0.49	0.50
query37	0.11	0.07	0.07
query38	0.08	0.04	0.04
query39	0.05	0.03	0.03
query40	0.20	0.17	0.16
query41	0.09	0.04	0.03
query42	0.04	0.03	0.03
query43	0.05	0.03	0.04
Total cold run time: 98.6 s
Total hot run time: 28.72 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 10.34% (30/290) 🎉
Increment coverage report
Complete coverage report

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/7) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 52.64% (19419/36891) |
| Line Coverage | 36.13% (180623/499941) |
| Region Coverage | 32.46% (139982/431236) |
| Branch Coverage | 33.49% (60644/181081) |

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 71.43% (5/7) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 73.31% (26505/36153) |
| Line Coverage | 56.33% (280910/498718) |
| Region Coverage | 54.00% (235226/435621) |
| Branch Coverage | 55.67% (101206/181789) |

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 4.14% (12/290) 🎉
Increment coverage report
Complete coverage report

Development

Successfully merging this pull request may close these issues.

[Feature] Support Native Read for Paimon System Tables (binlog, audit_log, ro)

3 participants