From 4520d870bba907f271c9176df602f2affd11ec62 Mon Sep 17 00:00:00 2001 From: Lev Gorodetskiy Date: Sun, 18 May 2025 00:29:52 -0300 Subject: [PATCH 01/17] outdated migrations page --- docs/1.getting-started/5.database.md | 4 +++ docs/12.faq.md | 43 ++++++++-------------------- 2 files changed, 16 insertions(+), 31 deletions(-) diff --git a/docs/1.getting-started/5.database.md b/docs/1.getting-started/5.database.md index a15992ebe..87964877f 100644 --- a/docs/1.getting-started/5.database.md +++ b/docs/1.getting-started/5.database.md @@ -128,6 +128,10 @@ For more information visit the official TimescaleDB documentation: ## Migrations +::banner{type="warning"} +Using migrations is generally not recommended. See [F.A.Q.](../12.faq.md#how-to-perform-database-migrations) for details. +:: + ::banner{type="note"} The database migrations feature is optional and is disabled by default. To enable it, you need to install `aerich`, which is available in the `[migrations]` optional dependencies group and set the `DIPDUP_MIGRATIONS` environment variable. :: diff --git a/docs/12.faq.md b/docs/12.faq.md index 7f2b34afc..94a06ea4e 100644 --- a/docs/12.faq.md +++ b/docs/12.faq.md @@ -56,7 +56,9 @@ Instead, save raw data in handlers and process it later with hooks when all cond ### How to perform database migrations? -At the moment DipDup does not have a built-in migration system. Framework architecture implies that schema changes are rare and usually require reindexing. However, you can perform migrations yourself using third-party tools or write your own scripts and keep them in `sql` project directory. +Using migrations is not recommended. DipDup architecture is designed to be simple and predictable. It uses a single database schema for all indexes, and any changes to the schema require a full reindex to ensure data consistency. Consider using SQL scripts instead as described [below](#how-to-modify-schema-manually). 
+
+If you want to proceed with migration tools, DipDup provides integration with [aerich](https://github.com/tortoise/aerich). See the [Migrations](./1.getting-started/5.database.md#migrations) section for details.
 
 You may want to disable the schema hash check in config. Alternatively, call the `schema approve` command after every schema change.
 
@@ -66,35 +68,6 @@ advanced:
     schema_modified: ignore
 ```
 
-Now, let's prepare a migration script. To determine the changes you need to make, you can compare the SQL schema dump before and after modifying the models. Say you need to add a new field to one of the models.
-
-```diff
-class Event(Model):
-    ...
-+    timestamp = fields.DatetimeField()
-```
-
-```shell
-dipdup schema export > old-schema.sql
-# [make some changes]
-dipdup schema export > new-schema.sql
-diff old-schema.sql new-schema.sql
-```
-
-And here's SQL for the new column:
-
-```diff
-+ "timestamp" TIMESTAMP NOT NULL,
-```
-
-Now prepare the idempotent migration script and put it in the `sql/on_restart` directory.
-
-```sql [sql/on_restart/00-add-timestamp.sql]
-ALTER TABLE "event" ADD COLUMN IF NOT EXISTS "timestamp" TIMESTAMP NOT NULL;
-
-SELECT dipdup_approve('public');
-```
-
 ### I get `schema_modified` error, but I didn't change anything
 
 DipDup compares the current schema hash with the one stored in the database. If they don't match, it throws an error. If models were not modified, most likely the reason is incorrect model definitions. e.g. if you define a timestamp field like this…
 
@@ -150,7 +123,7 @@ WARNING dipdup.database Decimal context precision has been updated: 28 ->
 
 ### How to modify schema manually?
 
-Drop an idempotent SQL script into `sql/on_reindex/` directory. For example, here's how to create a Timescale hypertable:
+Drop an SQL script into the `sql/on_reindex/` directory. It will be executed after the Tortoise schema initialization.
For example, here's how to create a Timescale hypertable:
 
 ```sql [sql/on_reindex/00_prepare_db.sql]
 CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
@@ -160,6 +133,14 @@ ALTER TABLE swap ADD PRIMARY KEY (id, timestamp);
 SELECT create_hypertable('swap', 'timestamp', chunk_time_interval => 7776000);
 ```
 
+If you want to modify the existing schema and know what you're doing, put an idempotent script (i.e. one that can be executed multiple times without changing the result) in the `sql/on_restart/` directory and call the `dipdup_approve` function inside.
+
+```sql [sql/on_restart/00_alter_timestamp.sql]
+ALTER TABLE "event" ADD COLUMN IF NOT EXISTS "timestamp" TIMESTAMP NOT NULL;
+
+SELECT dipdup_approve('public');
+```
+
 ## Package
 
 ### What is the symlink in the project root for?

From 9f62cd2819f794ba23b6fc72aa5fb30172a4085d Mon Sep 17 00:00:00 2001
From: Lev Gorodetskiy
Date: Sun, 18 May 2025 00:58:22 -0300
Subject: [PATCH 02/17] outdated requirements, basic faq

---
 docs/12.faq.md | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/docs/12.faq.md b/docs/12.faq.md
index 94a06ea4e..e07dd2893 100644
--- a/docs/12.faq.md
+++ b/docs/12.faq.md
@@ -12,11 +12,25 @@ This page contains answers to the most frequently asked questions about DipDup.
 
 ## General
 
+### What is DipDup?
+
+DipDup is a Python framework for building indexing applications. It allows you to index data from various blockchains and other sources, process it, and store it in a database. DipDup is designed to be fast, efficient, and easy to use.
+
+### Why DipDup?
+
+- **Declarative configuration**: DipDup separates business logic from indexing mechanics through a YAML-based configuration file, making it easy to understand and modify your indexing rules without diving into code.
+- **Type-safe development**: Generated typeclasses provide type hints for smart contract data, enabling IDE autocompletion and catching errors before runtime.
+- **Multi-chain support**: DipDup supports multiple blockchains including EVM-compatible chains, Starknet, Substrate and Tezos, allowing you to build cross-chain applications with a unified API. +- **Hassle-free deployment**: Deploy DipDup on any machine with Python and PostgreSQL/SQLite. Docker integration simplifies cloud, on-premises, or local deployments. +- **Developer experience**: Rich CLI tooling, comprehensive documentation, and a helpful community make development smooth and efficient. +- **Monitoring and observability**: Built-in Prometheus metrics, Sentry integration, crash reports and detailed help messages make it easy to track application health and diagnose issues. +- **Free and open source**: DipDup is licensed under MIT license, giving you the freedom to use and modify it for any purpose. + ### What hardware do I need to run DipDup? DipDup can run on any amd64/arm64 machine starting from 1 CPU core and 256M of RAM. Aim for a good single-threaded and disk I/O performance. -Actual RAM requirements depend on multiple factors: the number and complexity of indexes, the size of internal queues and caches, and the usage of `CachedModel`. For the average project, 1GB is usually enough. If you're running DipDup on some ultra-low-end instance and getting OOMs, try the `DIPDUP_LOW_MEMORY=1` environment variable. +Actual RAM requirements can grow significantly depending on the number and complexity of indexes, the size of internal queues and caches, and `CachedModel` usage. ## Indexing From f38ace912cd5370a09f9cf74fe7699b1d7e1fecd Mon Sep 17 00:00:00 2001 From: Lev Gorodetskiy Date: Sun, 18 May 2025 01:24:18 -0300 Subject: [PATCH 03/17] more outdated faq --- docs/12.faq.md | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/docs/12.faq.md b/docs/12.faq.md index e07dd2893..2f47910d2 100644 --- a/docs/12.faq.md +++ b/docs/12.faq.md @@ -163,17 +163,20 @@ DipDup project must be a valid discoverable Python package. 
Python searches for
 
 ## Maintenance
 
-### pipx, Poetry, PDM... What's the difference?
+### What's the difference between uv/pip/Poetry/others?
 
-For historical reasons, Python package management is a mess. There are multiple tools and approaches to manage Python dependencies. Here's a brief comparison:
+**tl;dr**: Just use `uv` for everything.
 
-- **pip** is a general-purpose package manager. It's simple and robust, but you need to manage venvs and lock files manually.
-- **pipx** is meant for applications. It installs packages into separate environments in `~/.local/pipx/venvs` and makes their CLI commands available from any path. pipx is a recommended way to install DipDup CLI.
-- **Poetry** and **PDM** are full-fledged project management tools. They handle venvs, lock files, dependency resolution, publishing, etc.
+For historical reasons, Python package management is a mess. There are multiple tools and approaches to manage Python dependencies. **pip** is a general-purpose package manager. It's simple and robust, but only covers basic functionality. For a full-fledged project, you need a tool that handles virtual environments, lock files, dependency resolution, publishing, etc. Some of the most popular tools are uv, Poetry, PDM, and Hatch.
 
-Using PDM/Poetry is not required to run DipDup, but strongly recommended. Choosing one over the other is a matter of personal preference. _As of writing_, Poetry is [faster](https://lincolnloop.github.io/python-package-manager-shootout/), more popular, and has a nicer CLI, while PDM is more PEP-compatible and allows dependency overrides.
+Starting with version 8.3, DipDup uses **uv** as the default package manager for both the CLI installer and project management. This tool is extremely fast and reliable, and it replaces the bulk of the functionality provided by other tools.
 
-You can choose the preferred tool (or none) when initializing a project with `dipdup new` command.
If you change your mind later, modify the `replay.yaml` file and run `dipdup init --force`. +Poetry and PDM integration in DipDup is deprecated and will be removed in future releases. To perform a migration, run the following commands: + +```shell [Terminal] +sed -i 's/\( package_manager: \).*/\1uv/' configs/replay.yaml +dipdup init --force +``` ## Miscellaneous From 1dab0fc685f3d25c9a226841f0e5771e830739ba Mon Sep 17 00:00:00 2001 From: Lev Gorodetskiy Date: Sun, 18 May 2025 01:27:14 -0300 Subject: [PATCH 04/17] .default.env cleanup --- docs/10.supported-networks/0.overview.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/docs/10.supported-networks/0.overview.md b/docs/10.supported-networks/0.overview.md index 07915b5ed..77b1143a0 100644 --- a/docs/10.supported-networks/0.overview.md +++ b/docs/10.supported-networks/0.overview.md @@ -37,7 +37,11 @@ datasources: ws_url: ${NODE_WS_URL:-wss://eth-mainnet.g.alchemy.com/v2}/${NODE_API_KEY:-''} ``` -To configure datasources for other networks, you need to change URLs and API keys. You can do it in the config file directly, but it's better to use environment variables. Check the `deploy/.env.default` file in your project directory; it contains all the variables used in config. +To configure datasources for other networks, you need to change URLs and API keys. You can do it in the config file directly, but it's better to use environment variables. Run the following command to create a `.env` file with all the necessary variables: + +```shell [Terminal] +dipdup config env -o deploy/.env +``` [evm.subsquid](../3.datasources/1.evm_subsquid.md) - Subsquid Network is the main source of historical data for EVM-compatible networks. It's free and available for many networks. 
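+The generated file lists every variable referenced in the config together with its default value. The exact contents depend on your project; for the datasource config above it would look something like this:
+
+```shell [deploy/.env]
+NODE_API_KEY=''
+NODE_URL=https://eth-mainnet.g.alchemy.com/v2
+NODE_WS_URL=wss://eth-mainnet.g.alchemy.com/v2
+```
+
+Edit the values, but never commit this file to version control if it contains secrets.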
From 6b53e75e48c4b1adaf604d4f54021e934c6e2f59 Mon Sep 17 00:00:00 2001 From: Lev Gorodetskiy Date: Sun, 18 May 2025 01:29:27 -0300 Subject: [PATCH 05/17] env default cleanup --- docs/1.getting-started/4.package.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/1.getting-started/4.package.md b/docs/1.getting-started/4.package.md index 1c6bf5ac8..9afea6b47 100644 --- a/docs/1.getting-started/4.package.md +++ b/docs/1.getting-started/4.package.md @@ -15,7 +15,7 @@ The structure of the resulting package is the following: | ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | | :file_folder: `abi` | Contract ABIs used to generate typeclasses | | :file_folder: `configs` | Environment-specific configs to merge with the root one | -| :file_folder: `deploy` | Dockerfiles, compose files, and default env variables for each environment | +| :file_folder: `deploy` | Dockerfiles and Compose manifests | | :file_folder: `graphql` | Custom GraphQL queries to expose with Hasura engine | | :file_folder: `handlers` | User-defined callbacks to process contract data | | :file_folder: `hasura` | Arbitrary Hasura metadata to apply during configuration | @@ -94,7 +94,6 @@ The `deploy` directory contains: - `Dockerfile`, a recipe to build a Docker image with your project. Usually, you won't need to modify it. See comments inside for details. - Compose files to run your project locally or in the cloud. -- Default env variables for each environment. See [Environment variables](../1.getting-started/3.config.md#environment-variables) for details. 
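+To illustrate, a minimal Compose manifest in `deploy/` could look like the sketch below. Service names, image versions, and options here are illustrative; see the generated files in your project for the real ones.
+
+```yaml [deploy/compose.yaml]
+services:
+  dipdup:
+    build: ..
+    env_file: .env
+    depends_on:
+      - db
+  db:
+    image: postgres:16
+    environment:
+      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
+```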
## Nested packages From 102056d9774487a6323646cc15474d24ece62c21 Mon Sep 17 00:00:00 2001 From: Lev Gorodetskiy Date: Sun, 18 May 2025 01:43:37 -0300 Subject: [PATCH 06/17] glossary --- docs/15.glossary.md | 104 +++++++++++++++++++++++++++++++++++++++----- 1 file changed, 94 insertions(+), 10 deletions(-) diff --git a/docs/15.glossary.md b/docs/15.glossary.md index 2ff016fb6..1a31da24a 100644 --- a/docs/15.glossary.md +++ b/docs/15.glossary.md @@ -1,6 +1,6 @@ --- title: Glossary -description: "Our sponsors, contributors and other acknowledgments" +description: "Glossary of terms used in DipDup documentation" nested: Resources --- @@ -30,11 +30,17 @@ A configuration file which defines a project's structure, settings, environment In DipDup, an object passed as a first argument to all callbacks. Provides access to the current state of the indexer and various methods to interact with it. +### contract alias + +A user-defined name for a contract in the DipDup config, used to reference contracts throughout the project. + ### datasource +A connector to an external API or node providing blockchain data. Examples: `tezos.tzkt`, `evm.subsquid`, `starknet.node`, `substrate.subscan`. Datasources are defined in the config and used by indexes to fetch data. + ### DipDup -An open source framework for building smart contract indexes for the Tezos network. +An open source Python framework for building smart contract indexers for Tezos, EVM, Starknet, Substrate, and other blockchains. ### Docker @@ -48,19 +54,29 @@ A tool for defining and managing multi-container Docker applications, using a YA Variables used to define the environment in which a program runs, providing a way to configure settings, paths, and other system-specific information. Could be set with `export KEY=VALUE` in the terminal or defined in `.env` file. +### EVM + +Ethereum Virtual Machine. The computation engine for Ethereum and compatible blockchains. 
DipDup supports EVM-compatible networks for event and transaction indexing. + +### event + +A log emitted by a smart contract, typically used to signal that something of interest has happened. In DipDup, events are indexed via event indexes. + ### GraphQL A query language and runtime for APIs that enables clients to request only the data they need, offering more flexibility and efficiency compared to traditional REST APIs. ### handler +A Python function that processes blockchain data matching a pattern or filter. Handlers implement business logic and are defined in the config and Python code. + ### Hasura An open-source engine that connects to databases and microservices, providing real-time GraphQL APIs for faster and efficient data access. ### head -The latest block on the blockchain. In DipDup terminology, this term applies to [Datasources](1.getting-started/4.package.md). +The latest block on the blockchain. In DipDup terminology, this term applies to [Datasources](1.getting-started/7.datasources.md). When "index level" is used, it refers to the latest block that has been fully processed and committed to the database. ### hook @@ -68,9 +84,13 @@ A user-defined function that is executed at specific points in the lifecycle of ### index +In DipDup, a configuration entry that defines what data to query from the blockchain, how to filter it, and which handlers to call. Each index operates independently and can target different contracts, events, or operations. + +Not to be confused with the database index, which is a data structure that improves the speed of data retrieval operations on a database table. + ### indexer -A program that reads data from a blockchain and stores it in a database for quick and easy querying. +A program that reads data from a blockchain and stores it in a database for quick and easy querying. DipDup is an indexer framework. ### job @@ -78,9 +98,7 @@ A scheduled task that runs at specific intervals or times. 
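+Jobs are defined in the `jobs` section of the config and run an existing hook on a schedule. A minimal sketch (the hook name and schedule are illustrative):
+
+```yaml [dipdup.yaml]
+jobs:
+  daily_stats:
+    hook: calculate_stats
+    crontab: "0 0 * * *"
+```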
### JSONSchema
 
-A a vocabulary that allows for the definition of the structure and validation of JSON data.
-
-DipDup uses JSONSchema to validate the configuration file and generate types for the project.
+A vocabulary that allows for the definition of the structure and validation of JSON data. DipDup uses JSONSchema to validate the configuration file and generate types for the project.
 
 ### level
 
 In DipDup, [block number](#block-number).
 
@@ -90,6 +108,10 @@ In DipDup, [block number](#block-number).
 
 A Python class representing a database table, defined using the Tortoise ORM library.
 
+### ORM
+
+Object-Relational Mapping. A technique for interacting with a database using Python classes and objects instead of SQL queries. DipDup uses Tortoise ORM.
+
 ### package
 
 A directory containing all the files needed to run a DipDup project. DipDup projects must be a valid Python package. See the [Package](1.getting-started/4.package.md) page.
 
@@ -104,7 +126,7 @@ An open-source monitoring and alerting toolkit designed for reliability and scalability.
 
 ### RPC API
 
-RPC stands for Remote Procedure Call. A protocol used to communicate with Tezos nodes and interact with the blockchain. DipDup receives minimal amount of data from RPC API due to slow performance relatively to TzKT and other APIs.
+RPC stands for Remote Procedure Call. A protocol used to communicate with blockchain nodes and interact with the blockchain. DipDup receives a minimal amount of data via RPC due to its slow performance relative to other sources.
 
 ### schema
 
@@ -118,21 +140,45 @@ A toolkit for developing smart contract indexing applications.
 
 ### Sentry
 
 A real-time error tracking and monitoring platform that helps developers identify, diagnose, and fix issues in applications, improving overall software quality and performance.
 
+### Starknet
+
+A ZK-rollup network on Ethereum. DipDup supports Starknet for event and transaction indexing.
+
+### Substrate
+
+A blockchain framework for building custom blockchains, used by Polkadot, Kusama, and others.
DipDup supports Substrate-based networks for event indexing. + ### sync level +The highest block level that has been fully processed and committed to the database by the indexer. + +### template + +**Config**: A reusable index definition with placeholders for values. Templates allow you to define common patterns and reuse them across multiple indexes, reducing duplication and improving maintainability. + +**CLI**: A project scaffold or example provided by DipDup to help users start new projects quickly. See `dipdup new -t