
Add comprehensive migration documentation section #3341

Open
dataroaring wants to merge 3 commits into master from feature/migration-documentation-consolidation

@dataroaring
Contributor

Summary

  • Create consolidated migration documentation section with guides for migrating from PostgreSQL, MySQL, Elasticsearch, and other OLAP systems to Apache Doris
  • Add migration overview page with comparison table of migration paths and methods
  • Include Chinese translations for all migration guides

Test plan

  • Verify all internal documentation links resolve correctly
  • Check sidebar navigation displays Migration section after Getting Started
  • Review English documentation renders properly
  • Review Chinese documentation renders properly

🤖 Generated with Claude Code

Create consolidated migration guides covering:
- Overview page with migration path comparison table
- PostgreSQL to Doris (JDBC Catalog, Flink CDC, Export/Import)
- MySQL to Doris (Flink CDC, JDBC Catalog, DataX)
- Elasticsearch to Doris (ES Catalog, inverted index migration)
- Other OLAP systems (ClickHouse, Greenplum, Hive/Iceberg/Hudi)

Each guide includes data type mappings, step-by-step instructions,
and troubleshooting for common issues. Chinese translations included.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 5, 2026 23:40
Elasticsearch nested type should map to Doris VARIANT type for better
flexible schema handling. Added links to VARIANT documentation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

This pull request adds comprehensive migration documentation to help users migrate data from various databases and data systems to Apache Doris. The documentation covers four major migration sources (PostgreSQL, MySQL, Elasticsearch, and other OLAP systems) with an overview page that provides a comparison table of migration paths and methods.

Changes:

  • Adds a new "Migration" section to the documentation sidebar, positioned between "Getting Started" and "Guides"
  • Creates 5 new English documentation files with detailed migration guides
  • Provides complete Chinese translations for all migration guides
  • Includes practical examples, data type mappings, and best practices for each migration source

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

| File | Description |
|------|-------------|
| sidebars.ts | Adds Migration category to sidebar with 5 migration guide entries |
| docs/migration/overview.md | Overview page with migration path comparison table and method selection guide |
| docs/migration/postgresql-to-doris.md | Complete PostgreSQL migration guide with JDBC Catalog, Flink CDC, and export-import options |
| docs/migration/mysql-to-doris.md | MySQL migration guide emphasizing Flink CDC with full database sync capabilities |
| docs/migration/elasticsearch-to-doris.md | Elasticsearch migration guide covering ES Catalog, Logstash, and custom scripts |
| docs/migration/other-olap-to-doris.md | Migration guides for ClickHouse, Greenplum, Hive, Iceberg, Hudi, and Spark/Flink connectors |
| i18n/zh-CN/.../migration/*.md | Chinese translations of all 5 migration guides |


Comment on lines +147 to +152
`★ Insight ─────────────────────────────────────`
1. **DUPLICATE KEY model** is best for log data where append-only writes are common
2. **Inverted indexes** enable full-text search similar to Elasticsearch
3. **Dynamic partitioning** automatically manages time-based data lifecycle
`─────────────────────────────────────────────────`

Copilot AI Feb 5, 2026

The "Insight" section present in the English version (lines 147-151) is missing from the Chinese translation. This creates an inconsistency between the English and Chinese documentation. Either add a corresponding section to the Chinese version or remove it from the English version for consistency. If keeping it, consider using standard Docusaurus admonition syntax for better rendering.

Suggested change
`★ Insight ─────────────────────────────────────`
1. **DUPLICATE KEY model** is best for log data where append-only writes are common
2. **Inverted indexes** enable full-text search similar to Elasticsearch
3. **Dynamic partitioning** automatically manages time-based data lifecycle
`─────────────────────────────────────────────────`

Copilot uses AI. Check for mistakes.

| Source System | Recommended Method | Notes |
|---------------|-------------------|-------|
| ClickHouse | JDBC Catalog + SQL Convertor | Schema and SQL syntax conversion needed |

Copilot AI Feb 5, 2026

The term "Convertor" should be spelled "Converter" for correct English spelling. "Converter" is the standard spelling for a tool that converts something from one form to another.

Suggested change
| ClickHouse | JDBC Catalog + SQL Convertor | Schema and SQL syntax conversion needed |
| ClickHouse | JDBC Catalog + SQL Converter | Schema and SQL syntax conversion needed |

Comment on lines +147 to +151
`★ Insight ─────────────────────────────────────`
1. **DUPLICATE KEY model** is best for log data where append-only writes are common
2. **Inverted indexes** enable full-text search similar to Elasticsearch
3. **Dynamic partitioning** automatically manages time-based data lifecycle
`─────────────────────────────────────────────────`

Copilot AI Feb 5, 2026

This "Insight" section uses an unusual formatting style with inline code blocks (backticks) rather than a proper markdown callout or info box. This formatting is inconsistent with standard Docusaurus documentation practices and may not render as intended. Consider using a standard markdown blockquote (>) or a Docusaurus admonition (:::info, :::tip, etc.) instead for better visual consistency and rendering.

Suggested change
`★ Insight ─────────────────────────────────────`
1. **DUPLICATE KEY model** is best for log data where append-only writes are common
2. **Inverted indexes** enable full-text search similar to Elasticsearch
3. **Dynamic partitioning** automatically manages time-based data lifecycle
`─────────────────────────────────────────────────`
:::info Insight
1. **DUPLICATE KEY model** is best for log data where append-only writes are common
2. **Inverted indexes** enable full-text search similar to Elasticsearch
3. **Dynamic partitioning** automatically manages time-based data lifecycle
:::

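For readers unfamiliar with the three features the Insight block names, the following is a minimal sketch of how they might combine in a single Doris log table. The table name `app_logs`, its columns, the bucket count, the 30-day retention window, and the single-replica setting are illustrative assumptions and do not come from the PR.

```sql
-- Hypothetical log table; all names and sizes are assumptions for illustration.
CREATE TABLE IF NOT EXISTS app_logs (
    `ts`      DATETIME NOT NULL,
    `level`   VARCHAR(16),
    `message` STRING,
    -- Inverted index with a tokenizer, so `message` supports full-text matching
    INDEX idx_message (`message`) USING INVERTED PROPERTIES ("parser" = "english")
)
-- DUPLICATE KEY keeps every row as written, which suits append-only logs
DUPLICATE KEY(`ts`)
-- Partitions are left empty here and created by dynamic partitioning
PARTITION BY RANGE(`ts`) ()
DISTRIBUTED BY HASH(`ts`) BUCKETS 10
PROPERTIES (
    "dynamic_partition.enable"    = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start"     = "-30",  -- drop partitions older than 30 days
    "dynamic_partition.end"       = "3",    -- pre-create 3 days of partitions
    "dynamic_partition.prefix"    = "p",
    "dynamic_partition.buckets"   = "10",
    "replication_num"             = "1"     -- single replica, for a test cluster only
);

-- Full-text lookup that can use the inverted index
SELECT `ts`, `level`, `message`
FROM app_logs
WHERE `message` MATCH_ANY 'timeout error';
```

`MATCH_ANY` returns rows containing any of the listed terms; `MATCH_ALL` would require all of them.
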
Comment on lines +274 to +279
```sql
-- Use Flink Doris Connector's database sync feature
CREATE DATABASE IF NOT EXISTS sync_db;

-- FlinkCDC whole database sync configuration
-- See Flink Doris Connector documentation for complete setup
```

Copilot AI Feb 5, 2026


The PostgreSQL full database sync section (Step 3) provides only a placeholder comment directing users to see the Flink Doris Connector documentation, while the MySQL migration guide provides a complete working example with the CdcTools command. For consistency and completeness, consider adding a similar concrete example here showing the postgres-sync-database command, similar to the mysql-sync-database example in mysql-to-doris.md (lines 132-149).

Suggested change
```sql
-- Use Flink Doris Connector's database sync feature
CREATE DATABASE IF NOT EXISTS sync_db;
-- FlinkCDC whole database sync configuration
-- See Flink Doris Connector documentation for complete setup
```
```bash
# Use CdcTools to perform a full PostgreSQL database sync into Doris
# This is similar in spirit to the mysql-sync-database command
cdc-tools postgres-sync-database \
--pg-host pg-host \
--pg-port 5432 \
--pg-username pg_user \
--pg-password pg_password \
--pg-database source_db \
--include-tables "public.orders,public.customers" \
--doris-fe-nodes doris-fe:8030 \
--doris-username doris_user \
--doris-password doris_password \
--doris-database target_db \
--sink-label-prefix pg_full_sync
```

Change JSON type mappings to VARIANT across all migration docs:
- PostgreSQL: json/jsonb → VARIANT
- MySQL: JSON → VARIANT
- Elasticsearch: object, flattened → VARIANT

VARIANT type provides better flexible schema support for semi-structured
data migration. Added links to VARIANT documentation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
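
To illustrate what the JSON-to-VARIANT mapping in this commit could look like on the Doris side, here is a minimal sketch. The table `events`, its columns, and the JSON keys `user`, `name`, and `status` are hypothetical and are not taken from the migration guides. It assumes a Doris version that supports VARIANT (2.1 or later).

```sql
-- Hypothetical target table for a source column of type json/jsonb (PostgreSQL)
-- or JSON (MySQL); names are assumptions for illustration only.
CREATE TABLE IF NOT EXISTS events (
    `id`         BIGINT NOT NULL,
    `created_at` DATETIME,
    `payload`    VARIANT            -- semi-structured data, schema not fixed up front
)
DUPLICATE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES ("replication_num" = "1");  -- single replica, for a test cluster only

-- Sub-fields are read with bracket subscripts and cast to a concrete type
SELECT
    `id`,
    CAST(`payload`['user']['name'] AS STRING) AS user_name
FROM events
WHERE CAST(`payload`['status'] AS STRING) = 'active';
```

The explicit `CAST` matters because a subscript on a VARIANT column yields another VARIANT value rather than a typed scalar.
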