TiNnNnnn commented Nov 24, 2025

Videx For Postgresql

Videx-for-pg is being developed as a PostgreSQL plugin...

[done] run with videx-statistic-server; supports fetching all statistics except pg_statistic_ext
[wip] support end-to-end (e2e) cardinality injection during optimization


How It Works

  1. Where is the statistical information stored?

    Postgres stores statistical information in the following system tables:

    • pg_class
    • pg_statistic
    • pg_statistic_ext
  2. How does the optimizer obtain statistics?

    PostgreSQL fetches relation statistics through three core functions:

    • get_relation_stats: accesses pg_statistic_ext and pg_statistic
    • get_index_stats: accesses pg_statistic_ext and pg_statistic
    • relation_estimate_size: accesses pg_class

videx-for-postgres fetches statistics from the system tables and uploads them to videx-statistic-server. It also installs hooks so that the statistics retrieval functions prefer statistics from videx-statistic-server.
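
The lookup priority described above can be modeled in a few lines of Python. This is only a conceptual sketch: the real extension does this in C inside PostgreSQL, and every name here (`make_stats_lookup`, `fetch_remote`, `fetch_catalog`) is hypothetical. It also assumes that "prioritize" means falling back to the local system tables when the server has no entry, which the description does not state explicitly.

```python
from typing import Callable, Optional

# Hypothetical stand-in for one statistics record (e.g. one pg_statistic row).
Stats = dict
Fetcher = Callable[[str, str], Optional[Stats]]


def make_stats_lookup(fetch_remote: Fetcher, fetch_catalog: Fetcher) -> Fetcher:
    """Build a lookup that asks the statistics server first, then the catalog."""

    def lookup(relation: str, column: str) -> Optional[Stats]:
        remote = fetch_remote(relation, column)
        if remote is not None:
            return remote  # the server has an entry, so it wins
        return fetch_catalog(relation, column)  # assumed fallback path

    return lookup


# In-memory stand-ins for videx-statistic-server and the system tables.
server_stats = {("orders", "id"): {"n_distinct": 1000}}
catalog_stats = {
    ("orders", "id"): {"n_distinct": 10},
    ("orders", "note"): {"n_distinct": 3},
}

lookup = make_stats_lookup(
    lambda rel, col: server_stats.get((rel, col)),
    lambda rel, col: catalog_stats.get((rel, col)),
)
```

With these stand-ins, `lookup("orders", "id")` returns the server's entry, while `lookup("orders", "note")` falls through to the catalog entry.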

Quick Start With Videx-Statistic-Server

step1: install PostgreSQL from source

  1. Fetch the PostgreSQL source code from https://www.postgresql.org/ftp/source/v17.5/
  2. Build from source (NOTE: replace {target_dir} and {data_dir} with your local paths)
cd postgresql-17.5
./configure --prefix={target_dir} --enable-debug

make && make install

cd {target_dir}/bin
./initdb -U postgres -D {data_dir}
  3. Set environment variables
export PATH={target_dir}/bin:$PATH
export LD_LIBRARY_PATH={target_dir}/lib:$LD_LIBRARY_PATH

step2: compile videx

  1. Copy the pg/videx folder into the contrib folder (the plugin directory) of your PostgreSQL source tree
cp -rf videx/src/pg/videx postgresql-17.5/contrib/
  2. Go into postgresql-17.5/contrib/videx and build:
make && make install

step3: Configure and start postgresql server

Edit postgresql.conf in {data_dir} and set shared_preload_libraries to videx, so that PostgreSQL loads videx.so at startup:

shared_preload_libraries = 'videx' # (change requires restart)
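
Before starting the server, it can be handy to confirm that the setting is really active and not commented out. The following is a rough sketch of such a check; `preload_libraries` is a hypothetical helper, not part of videx, and its comment handling is deliberately naive (it assumes no `#` appears inside the quoted value).

```python
import re


def preload_libraries(conf_text: str) -> list[str]:
    """Return the libraries from the last active shared_preload_libraries line."""
    libs: list[str] = []
    for line in conf_text.splitlines():
        active = line.split("#", 1)[0].strip()  # naive: drop trailing comments
        m = re.match(r"shared_preload_libraries\s*=?\s*'([^']*)'", active)
        if m:
            # A later occurrence overrides an earlier one.
            libs = [name.strip() for name in m.group(1).split(",") if name.strip()]
    return libs


sample = (
    "#shared_preload_libraries = ''\n"
    "shared_preload_libraries = 'videx' # (change requires restart)\n"
)
libs = preload_libraries(sample)
```

In practice the text would come from reading {data_dir}/postgresql.conf, and the check is simply whether `"videx"` appears in the returned list.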

we assume no password is set:

{target_dir}/bin/postgres -D {data_dir} -p 55555

step4: register videx in pg

Connect to PostgreSQL with psql (this connects to the postgres database by default; additional databases can be created from there):

{target_dir}/bin/psql -U postgres -p 55555

Alternatively, install postgresql-client so that psql is available from any directory:

sudo apt install -y postgresql-client  # for Ubuntu

step5: use videx

  1. create source database and create extension
-- connect to database postgres and create database:test
create database test;
-- switch to database:test
\c test
-- register extension videx on database:test
create extension videx;

Verify that the registration was successful:

test=# SELECT * FROM pg_extension WHERE extname = 'videx';
oid   | extname | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition 
--------+---------+----------+--------------+----------------+------------+-----------+--------------
368789 | videx   |       10 |         2200 | t              | 1.0        |           | 
(1 row)
  2. start the videx-statistic-server
python3 src/sub_platforms/sql_opt/videx/scripts/start_videx_server.py
  3. collect and import videx metadata
python3 src/sub_platforms/sql_opt/videx/scripts/videx_build_env_pg.py \
--target 127.0.0.1:55555:test:postgres:passwd \
--videx 127.0.0.1:55555:videx_test:postgres:passwd
  4. connect to the videx_test database and run EXPLAIN
psql -U postgres -p 55555 -d videx_test
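
The --target and --videx arguments in the metadata import step pack five values into one colon-separated string. Judging only from the example values, the order appears to be host:port:database:user:password; that order is an assumption, and the `ConnSpec` and `parse_conn_spec` names below are hypothetical, not the script's own parser. A minimal sketch of splitting such a string:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnSpec:
    host: str
    port: int
    database: str
    user: str
    password: str


def parse_conn_spec(spec: str) -> ConnSpec:
    """Split 'host:port:database:user:password' (assumed field order)."""
    # maxsplit=4 keeps any further colons inside the password field.
    parts = spec.split(":", 4)
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(parts)}")
    host, port, database, user, password = parts
    return ConnSpec(host, int(port), database, user, password)


target = parse_conn_spec("127.0.0.1:55555:test:postgres:passwd")
videx = parse_conn_spec("127.0.0.1:55555:videx_test:postgres:passwd")
```

Here `target` and `videx` point at the same server and differ only in the database name, which matches the quick start: both the source database and the videx database live in one PostgreSQL instance.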

kr11 (Collaborator) commented Nov 25, 2025

Cool, a great PR! I will try and review this PR this week, thanks @TiNnNnnn

kr11 assigned kr11 and unassigned kr11 Nov 25, 2025
kr11 requested review from Copilot and kr11 November 25, 2025 02:17
kr11 added the enhancement label Nov 25, 2025
Copilot AI left a comment

Pull request overview

This PR adds PostgreSQL support to the Videx virtual index system, enabling it to work as a PostgreSQL plugin/extension alongside the existing MySQL implementation. The implementation includes both Python-side metadata management and a C/C++ PostgreSQL extension for table access methods and statistics hooks.

Key Changes:

  • Adds a PostgreSQL extension (src/pg/videx/) implementing custom table access methods and statistics copying functionality
  • Refactors Python metadata classes to use base abstractions supporting both MySQL and PostgreSQL
  • Implements PostgreSQL-specific metadata fetching and statistics management in Python

Reviewed changes

Copilot reviewed 27 out of 30 changed files in this pull request and generated 59 comments.

| File | Description |
|------|-------------|
| src/pg/videx/* | New PostgreSQL extension implementation including table access methods, statistics hooks, and JSON communication utilities |
| src/sub_platforms/sql_opt/pg_meta.py | PostgreSQL-specific metadata classes (PGTable, PGColumn, PGIndex, PGStatistic) |
| src/sub_platforms/sql_opt/meta_base.py | New base classes for database-agnostic table/column/index abstractions |
| src/sub_platforms/sql_opt/meta.py | Refactored MySQL metadata classes to inherit from base classes |
| src/sub_platforms/sql_opt/videx/videx_pg_metadata.py | PostgreSQL metadata fetching and construction logic |
| src/sub_platforms/sql_opt/videx/videx_mysql_utils.py | Refactored connection configs with PostgreSQL support added |
| src/sub_platforms/sql_opt/videx/videx_utils.py | Added schema/table serialization utilities for PostgreSQL |
| src/sub_platforms/sql_opt/videx/videx_service.py | Added PostgreSQL environment creation function |
| src/sub_platforms/sql_opt/videx/scripts/videx_build_env.py | Extended build script with PostgreSQL support |
| src/sub_platforms/sql_opt/env/rds_env.py | New OpenPGEnv and DirectConnectPGEnv classes for PostgreSQL connections |
| src/sub_platforms/sql_opt/databases/pg/* | PostgreSQL command execution and metadata retrieval implementation |
| src/sub_platforms/sql_opt/column_statastics/* | Refactored statistics classes with base abstractions |
Comments suppressed due to low confidence (1)

src/sub_platforms/sql_opt/videx/scripts/videx_build_env.py:287

    logging.info(get_usage_message(args, videx_ip, videx_port, videx_db, videx_user, videx_pwd, videx_server_ip_port))

kr11 (Collaborator) commented Dec 29, 2025

Hi @TiNnNnnn, thanks a lot for the contribution — adding PostgreSQL support is a big step.

I have a few questions / suggestions before this becomes easy to reproduce and deeply-reviewed:

  1. Add a short "how it works" doc

    • Could you add doc/installation_pg.md (or a similar filename) with a brief architecture / workflow explanation for:
    • how videx_analyze works in PG (what stats are copied/collected, and where they are persisted: directly into the PG-VIDEX schema/tables, or sent to / stored in videx-statistic-server?),
    • whether videx_build_env plays any role in the PG flow: when collecting/filling statistics or running EXPLAIN, do we need videx_build_env or not?
  2. Reproducible build & dev environment

    • For MySQL there is a developer build environment (e.g. build/Dockerfile.build_env) which makes install/compile/debug much easier.
    • Could we have a similar PG developer setup (Dockerfile, docker-compose, or detailed install steps documented in doc/installation_pg.md) so reviewers can run the quick-start reliably?
    • In the PR description you mentioned: compile videx, register videx in PG, and use videx — is that the complete workflow? For newcomers to PG, it is still missing:
      1. which PG version(s) are supported, and how to install PG on Debian/Ubuntu (from source compile or apt-get install?);
      2. how to import/load test data;
      3. the current steps don’t mention videx-statistic-server at all — does PG-VIDEX not need the server, or is it optional (and if optional, what functionality changes when it’s enabled)?
  3. Keep MySQL and PG code minimally coupled

    • 3.1 I noticed many dataclasses in meta.py (table, column, etc.) changed their base class from BaseModel, PydanticDataClassJsonMixin to BaseTableId, BaseColumn, BaseTable, ..., but BaseXXX looks mostly like an empty implementation. Is this refactor necessary? If the motivation is to support definitions in videx_metadata, maybe we can keep the original MySQL definitions and use Union types where needed. This would make it easier to confirm that adding PG support does not impact MySQL/MariaDB behavior. For example:
# Your version
class VidexTableStats(VidexTableStatsBase):
     table_meta: Optional[BaseTable] = None

# Another version
class VidexTableStats(VidexTableStatsBase):
     table_meta: Optional[Table|PgTable] = None
    • 3.2 PG startup commands are currently added into videx_build_env via an if/else branch, but I’d suggest creating a new entry script instead (e.g. videx_build_env_pg.py) to keep the flows isolated. Some duplication is acceptable if it improves clarity.
  4. Please re-check obvious issues from Copilot/lint
    • There are some clear issues flagged (e.g. returning NotImplementedError instead of raising it, wrong pandas import, unused imports, doc typos). Could you please address the obvious ones before we iterate further?
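
To illustrate the first flagged issue, here is a small sketch of why returning NotImplementedError is a bug while raising it is not. The function names are hypothetical and are not taken from the PR.

```python
def fetch_stats_returning():
    # Bug: this hands the caller an exception *instance* as a normal value,
    # so execution continues as if a valid result had been produced.
    return NotImplementedError("statistics not supported yet")


def fetch_stats_raising():
    # Correct: control flow stops here and the caller must handle the error.
    raise NotImplementedError("statistics not supported yet")


result = fetch_stats_returning()
returned_an_exception_object = isinstance(result, NotImplementedError)

try:
    fetch_stats_raising()
    raised = False
except NotImplementedError:
    raised = True
```

In the returning variant the failure goes unnoticed until something later tries to use `result` as real data, which is usually far from where the problem originated.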

Thanks again — happy to help test the PG quick-start once the environment/doc is done.

kr11 added the important label Feb 8, 2026