Add comprehensive migration documentation section #3341
dataroaring wants to merge 3 commits into master
Create consolidated migration guides covering:
- Overview page with migration path comparison table
- PostgreSQL to Doris (JDBC Catalog, Flink CDC, Export/Import)
- MySQL to Doris (Flink CDC, JDBC Catalog, DataX)
- Elasticsearch to Doris (ES Catalog, inverted index migration)
- Other OLAP systems (ClickHouse, Greenplum, Hive/Iceberg/Hudi)

Each guide includes data type mappings, step-by-step instructions, and troubleshooting for common issues. Chinese translations included.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The Elasticsearch nested type should map to the Doris VARIANT type for more flexible schema handling. Added links to VARIANT documentation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Pull request overview
This pull request adds comprehensive migration documentation to help users migrate data from various databases and data systems to Apache Doris. The documentation covers four major migration sources (PostgreSQL, MySQL, Elasticsearch, and other OLAP systems) with an overview page that provides a comparison table of migration paths and methods.
Changes:
- Adds a new "Migration" section to the documentation sidebar, positioned between "Getting Started" and "Guides"
- Creates 5 new English documentation files with detailed migration guides
- Provides complete Chinese translations for all migration guides
- Includes practical examples, data type mappings, and best practices for each migration source
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| sidebars.ts | Adds Migration category to sidebar with 5 migration guide entries |
| docs/migration/overview.md | Overview page with migration path comparison table and method selection guide |
| docs/migration/postgresql-to-doris.md | Complete PostgreSQL migration guide with JDBC Catalog, Flink CDC, and export-import options |
| docs/migration/mysql-to-doris.md | MySQL migration guide emphasizing Flink CDC with full database sync capabilities |
| docs/migration/elasticsearch-to-doris.md | Elasticsearch migration guide covering ES Catalog, Logstash, and custom scripts |
| docs/migration/other-olap-to-doris.md | Migration guides for ClickHouse, Greenplum, Hive, Iceberg, Hudi, and Spark/Flink connectors |
| i18n/zh-CN/.../migration/*.md | Chinese translations of all 5 migration guides |

    `★ Insight ─────────────────────────────────────`
    1. **DUPLICATE KEY model** is best for log data where append-only writes are common
    2. **Inverted indexes** enable full-text search similar to Elasticsearch
    3. **Dynamic partitioning** automatically manages time-based data lifecycle
    `─────────────────────────────────────────────────`

The "Insight" section present in the English version (lines 147-151) is missing from the Chinese translation. This creates an inconsistency between the English and Chinese documentation. Either add a corresponding section to the Chinese version or remove it from the English version for consistency. If keeping it, consider using standard Docusaurus admonition syntax for better rendering.
Suggested change:

    - `★ Insight ─────────────────────────────────────`
    - 1. **DUPLICATE KEY model** is best for log data where append-only writes are common
    - 2. **Inverted indexes** enable full-text search similar to Elasticsearch
    - 3. **Dynamic partitioning** automatically manages time-based data lifecycle
    - `─────────────────────────────────────────────────`


    | Source System | Recommended Method | Notes |
    |---------------|-------------------|-------|
    | ClickHouse | JDBC Catalog + SQL Convertor | Schema and SQL syntax conversion needed |

The term "Convertor" should be spelled "Converter" for correct English spelling. "Converter" is the standard spelling for a tool that converts something from one form to another.
Suggested change:

    - | ClickHouse | JDBC Catalog + SQL Convertor | Schema and SQL syntax conversion needed |
    + | ClickHouse | JDBC Catalog + SQL Converter | Schema and SQL syntax conversion needed |


    `★ Insight ─────────────────────────────────────`
    1. **DUPLICATE KEY model** is best for log data where append-only writes are common
    2. **Inverted indexes** enable full-text search similar to Elasticsearch
    3. **Dynamic partitioning** automatically manages time-based data lifecycle
    `─────────────────────────────────────────────────`

This "Insight" section uses an unusual formatting style with inline code blocks (backticks) rather than a proper markdown callout or info box. This formatting is inconsistent with standard Docusaurus documentation practices and may not render as intended. Consider using a standard markdown blockquote (>) or a Docusaurus admonition (:::info, :::tip, etc.) instead for better visual consistency and rendering.
Suggested change:

    - `★ Insight ─────────────────────────────────────`
    - 1. **DUPLICATE KEY model** is best for log data where append-only writes are common
    - 2. **Inverted indexes** enable full-text search similar to Elasticsearch
    - 3. **Dynamic partitioning** automatically manages time-based data lifecycle
    - `─────────────────────────────────────────────────`
    + :::info Insight
    + 1. **DUPLICATE KEY model** is best for log data where append-only writes are common
    + 2. **Inverted indexes** enable full-text search similar to Elasticsearch
    + 3. **Dynamic partitioning** automatically manages time-based data lifecycle
    + :::

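The three points in the Insight block can be combined in one table definition. The following is a minimal sketch, not taken from the PR: the `app_logs` table, its columns, the bucket count, and the 30-day retention window are all hypothetical. The script only writes the DDL to a file; applying it through a MySQL-compatible client against a Doris FE is left out.

```shell
#!/bin/sh
# Sketch only: writes an illustrative Doris DDL to a file, does not execute it.
set -eu

DDL_FILE="${TMPDIR:-/tmp}/app_logs_ddl.sql"

cat > "$DDL_FILE" <<'SQL'
-- Hypothetical log table; names and sizes are illustrative.
CREATE TABLE IF NOT EXISTS app_logs (
    ts       DATETIME    NOT NULL,
    level    VARCHAR(16),
    message  STRING,
    -- Inverted index for full-text search on the message column
    INDEX idx_message (message) USING INVERTED PROPERTIES ("parser" = "english")
)
DUPLICATE KEY(ts)                      -- append-only log rows, no aggregation
PARTITION BY RANGE(ts) ()              -- partitions are created dynamically
DISTRIBUTED BY HASH(ts) BUCKETS 8
PROPERTIES (
    "dynamic_partition.enable"    = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start"     = "-30",  -- drop partitions older than 30 days
    "dynamic_partition.end"       = "3",    -- pre-create 3 days ahead
    "dynamic_partition.prefix"    = "p",
    "dynamic_partition.buckets"   = "8",
    "replication_num"             = "1"     -- single-node test setting
);

-- Full-text query that can use the inverted index
SELECT ts, level, message
FROM app_logs
WHERE message MATCH_ANY 'timeout error'
LIMIT 10;
SQL

echo "Wrote $DDL_FILE"
```
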

    ```sql
    -- Use Flink Doris Connector's database sync feature
    CREATE DATABASE IF NOT EXISTS sync_db;

    -- FlinkCDC whole database sync configuration
    -- See Flink Doris Connector documentation for complete setup

The PostgreSQL full database sync section (Step 3) provides only a placeholder comment directing users to see the Flink Doris Connector documentation, while the MySQL migration guide provides a complete working example with the CdcTools command. For consistency and completeness, consider adding a similar concrete example here showing the postgres-sync-database command, similar to the mysql-sync-database example in mysql-to-doris.md (lines 132-149).
Suggested change:

    - ```sql
    - -- Use Flink Doris Connector's database sync feature
    - CREATE DATABASE IF NOT EXISTS sync_db;
    - -- FlinkCDC whole database sync configuration
    - -- See Flink Doris Connector documentation for complete setup
    + ```bash
    + # Use CdcTools to perform a full PostgreSQL database sync into Doris
    + # This is similar in spirit to the mysql-sync-database command
    + cdc-tools postgres-sync-database \
    +   --pg-host pg-host \
    +   --pg-port 5432 \
    +   --pg-username pg_user \
    +   --pg-password pg_password \
    +   --pg-database source_db \
    +   --include-tables "public.orders,public.customers" \
    +   --doris-fe-nodes doris-fe:8030 \
    +   --doris-username doris_user \
    +   --doris-password doris_password \
    +   --doris-database target_db \
    +   --sink-label-prefix pg_full_sync

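For comparison, the Flink Doris Connector documentation runs whole-database sync through `flink run` with the `org.apache.doris.flink.tools.cdc.CdcTools` main class, passing source settings as `--postgres-conf key=value` and sink settings as `--sink-conf key=value`. The sketch below follows that shape as a dry run that prints the command instead of submitting it. The Flink install path, connector jar name, hosts, credentials, replication slot name, and table pattern are placeholders, not values from the PR.

```shell
#!/bin/sh
# Dry-run sketch: assembles and prints the flink command instead of submitting it.
set -eu

FLINK_HOME="${FLINK_HOME:-/opt/flink}"                            # assumed install path
CONNECTOR_JAR="${CONNECTOR_JAR:-lib/flink-doris-connector.jar}"   # placeholder jar name

# Build the argument list; every host, user, and password below is a placeholder.
set -- "$FLINK_HOME/bin/flink" run \
    -Dexecution.checkpointing.interval=10s \
    -c org.apache.doris.flink.tools.cdc.CdcTools \
    "$CONNECTOR_JAR" \
    postgres-sync-database \
    --database target_db \
    --postgres-conf hostname=pg-host \
    --postgres-conf port=5432 \
    --postgres-conf username=pg_user \
    --postgres-conf password=pg_password \
    --postgres-conf database-name=source_db \
    --postgres-conf schema-name=public \
    --postgres-conf slot.name=doris_sync_slot \
    --postgres-conf decoding.plugin.name=pgoutput \
    --including-tables "orders|customers" \
    --sink-conf fenodes=doris-fe:8030 \
    --sink-conf username=doris_user \
    --sink-conf password=doris_password \
    --sink-conf jdbc-url=jdbc:mysql://doris-fe:9030 \
    --sink-conf sink.label-prefix=pg_full_sync \
    --table-conf replication_num=1

SYNC_CMD="$*"
printf '%s\n' "$SYNC_CMD"

# To submit the job for real, run "$@" instead of printing it.
```

Printing first makes it easy to review the resolved paths and options before a job is submitted to a cluster.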
Change JSON type mappings to VARIANT across all migration docs:
- PostgreSQL: json/jsonb → VARIANT
- MySQL: JSON → VARIANT
- Elasticsearch: object, flattened → VARIANT

The VARIANT type provides more flexible schema support for semi-structured data migration. Added links to VARIANT documentation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
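To illustrate what the json/jsonb → VARIANT mapping looks like on the Doris side, here is a minimal sketch. It assumes a Doris version that supports VARIANT (2.1 or later); the `events` table and the `payload` field names are hypothetical. As a dry run, it only writes the statements to a file.

```shell
#!/bin/sh
# Sketch only: writes an illustrative VARIANT table definition and query to a file.
set -eu

VARIANT_DDL="${TMPDIR:-/tmp}/events_variant.sql"

cat > "$VARIANT_DDL" <<'SQL'
-- Hypothetical target table for a source column of type json/jsonb/JSON.
CREATE TABLE IF NOT EXISTS events (
    id      BIGINT NOT NULL,
    payload VARIANT            -- semi-structured document, schema inferred on write
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");   -- single-node test setting

-- Sub-fields are addressed by path and cast to a concrete type when queried.
SELECT id, CAST(payload['user']['name'] AS TEXT) AS user_name
FROM events;
SQL

echo "Wrote $VARIANT_DDL"
```
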
🤖 Generated with Claude Code