Welcome to our feature store workshop. This workshop was originally scheduled to be presented at the Data Day on October 21, 2025, in Monterrey, Mexico. In this session, we will guide you through the process of building a feature store for a multi-series forecasting project using Mage and Feast.
- Mage for data ingestion, transformation, and orchestration.
- Feast for feature definitions, lineage, training/serving parity, and materialization.
We'll work with the Iowa Liquor Sales open dataset and produce training tables + online features you can query from a model.
In this session you will learn how to:

- Understand entities, data sources, feature views, and feature services in Feast.
- Build reusable ETL/ELT pipelines in Mage that write to DuckDB/Parquet.
- Generate point-in-time correct training datasets and materialize them in an online store (a toy illustration of point-in-time correctness follows this list).
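To make "point-in-time correct" concrete, here is a toy illustration using pandas (not part of the workshop code; all names and values are made up): for each (entity, timestamp) training row we only join feature values observed at or before that timestamp, so no information leaks from the future.

```python
import pandas as pd

# Toy feature log: weekly sales per store, observed over time.
features = pd.DataFrame({
    "store_id": [1, 1, 1],
    "event_timestamp": pd.to_datetime(["2024-01-01", "2024-01-08", "2024-01-15"]),
    "weekly_sales": [100.0, 120.0, 90.0],
})

# Training rows: entity keys plus the "as of" timestamp of each label.
entity_df = pd.DataFrame({
    "store_id": [1, 1],
    "event_timestamp": pd.to_datetime(["2024-01-10", "2024-01-20"]),
})

# merge_asof picks, per row, the latest feature value at or before the
# label timestamp -- the same semantics Feast enforces for training data.
training = pd.merge_asof(
    entity_df.sort_values("event_timestamp"),
    features.sort_values("event_timestamp"),
    on="event_timestamp",
    by="store_id",
)
print(training)  # 2024-01-10 -> 120.0 (from 01-08), 2024-01-20 -> 90.0 (from 01-15)
```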
Tools we'll use:

- Python 3.11 or 3.12
- uv for dependency management
- Mage-AI (pipelines & blocks)
- Feast (≥ 0.50)
- DuckDB (offline store & local dev tables)
- SQLite (online store)
- Parquet/Arrow for feature artifacts
All CLI commands below assume uv is installed. You can find the installation instructions here, but uv can also be installed with pip (see below).
```
time_series_feature_store/
├── data/ # Local data & caches (dev only; usually git-ignored)
│ ├── features/ # Materialized features (e.g., Parquet/Arrow) for inspection
│ ├── iowa_liquor.duckdb # Data we pulled from Iowa SODA API
│ ├── online_store.db # Feast online store (SQLite, dev)
│ ├── registry.db # Feast registry
│ └── info.txt
│
├── data_movement_and_transformation/ # Ingestion + transformation code (ETL/ELT pipelines)
│ ├── custom/ # Mage blocks
│ ├── data_exporters/ # Mage blocks, writers: DuckDB/Parquet/Feast sources
│ ├── data_loaders/ # Mage blocks, readers: APIs/DBs
│ ├── sql/ # Parameterized SQL used by loaders/transformers
│ ├── transformers/ # Mage blocks, feature engineering steps (clean/aggregate/encode)
│ ├── pipelines/ # Mage pipelines metadata
│ ├── io_config.yaml # I/O profiles (local/dev/prod endpoints, creds handles)
│ ├── metadata.yaml # Orchestrator metadata (pipeline defs, block config)
│ ├── main.py # CLI entrypoints (run pipelines, bootstrap data)
│ ├── pyproject.toml # Runtime & build deps for this package
│ ├── README.md # How to run/extend ingestion & transforms
│ └── uv.lock # Locked dependency graph (pin exact versions)
│
├── feature_store/ # Feast repository (specs for entities, sources, features)
│ ├── __init__.py
│ ├── data_sources/ # Feast DataSource objects (DuckDB/Parquet/SQL, etc.)
│ ├── entities.py # Feast Entity definitions (keys, join semantics)
│ ├── feature_views/ # Feast FeatureView specs (schemas, TTLs, tags)
│ ├── feature_services/ # Feast FeatureService groupings for training/online
│ ├── pyproject.toml # Deps for the FS package
│ └── uv.lock # Locked deps for reproducible FS runs
│
├── ml_dev_and_experimentation/ # Notebooks & experiments using the feature store
│ ├── data/ # Small sample slices for notebooks (ok to git-ignore)
│ ├── notebooks/ # Exploratory analysis, model prototyping
│ ├── feature_store.yaml # Same as the one in feature_store/
│ ├── pyproject.toml # Deps for experimentation environment
│ └── uv.lock # Locked deps for experiments
│
├── utils/ # Reusable utilities shared across subprojects
│ ├── utils/ # Python package (e.g., logging, timing, FS helpers)
│ ├── pyproject.toml # Deps for the utils package (if packaged separately)
│ └── tsfs_utils.egg-info # Build metadata (GENERATED; not needed in VCS)
│
├── metadata.yaml # Top-level project metadata (e.g., repo settings/docs)
├── LICENSE # License for this repository
└── README.md # Project overview, quickstart, and links to sub-READMEs
```
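For orientation, here is a minimal, hypothetical sketch of what entities.py and a spec in feature_views/ might contain. The names (store, weekly_sales_view) and the Parquet path are illustrative, not the repo's actual definitions:

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32

# entities.py -- the join key used to look features up.
store = Entity(name="store", join_keys=["store_id"])

# feature_views/ -- where the data lives, which columns are features,
# and how long a value stays valid (ttl).
weekly_sales_source = FileSource(
    path="../data/features/weekly_sales.parquet",  # illustrative path
    timestamp_field="event_timestamp",
)

weekly_sales_view = FeatureView(
    name="weekly_sales_view",
    entities=[store],
    ttl=timedelta(days=14),
    schema=[Field(name="weekly_sales", dtype=Float32)],
    source=weekly_sales_source,
)
```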
- Python 3.11+ and uv (`pip install uv`)
- Other dependencies will be pulled by uv.
In each of the environment folders (data_movement_and_transformation, feature_store, and ml_dev_and_experimentation), run:

```bash
uv sync
```

This will pull and sync the dependencies for each "service".
We store this data locally to save time, allowing us to avoid pulling everything from the SODA API each time we need it. Think of it as a low-latency source for this information. This approach is intended for learning purposes. To become more familiar with the pipeline that retrieves this data, you can refer to this tutorial.
To pull the data from the Iowa Liquor API, run:

```bash
cd data_movement_and_transformation
uv run mage run . socrata_iowa_liquor_pipeline
```

This will pull raw records from the Iowa SODA API (a bounded sample for the workshop) and store them in data/iowa_liquor.duckdb. The pipeline also creates additional features not present in the original Iowa Liquor data. A simplified sketch of this kind of loader follows.
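For a feel of what the loader does, here is a simplified, hypothetical sketch of paging through the Socrata API with requests and landing the rows in DuckDB. The real Mage blocks live in data_loaders/; the resource id shown is the commonly published Iowa Liquor Sales endpoint, but verify it against the repo's loader:

```python
import duckdb
import pandas as pd
import requests

# Hypothetical endpoint -- check the repo's loader blocks for the exact
# resource id and query parameters it actually uses.
URL = "https://data.iowa.gov/resource/m3tr-qhgy.json"


def fetch_page(offset: int, limit: int = 50_000) -> pd.DataFrame:
    """Fetch one page of records using Socrata's $limit/$offset paging."""
    resp = requests.get(URL, params={"$limit": limit, "$offset": offset}, timeout=120)
    resp.raise_for_status()
    return pd.DataFrame(resp.json())


con = duckdb.connect("data/iowa_liquor.duckdb")
offset = 0
while True:
    page = fetch_page(offset)
    if page.empty:
        break
    con.register("page", page)
    # First page creates an empty table with the right columns; later pages
    # append. This assumes a stable schema across pages -- the real loader
    # handles schema drift more carefully.
    con.execute("CREATE TABLE IF NOT EXISTS raw_sales AS SELECT * FROM page WHERE 1 = 0")
    con.execute("INSERT INTO raw_sales BY NAME SELECT * FROM page")
    con.unregister("page")
    offset += len(page)
con.close()
```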
You need the complete dataset (from 2012 to the current month) to obtain all liquor types and stores; otherwise, at ML-modeling time you will encounter an error due to missing dummy columns. This is because several of the dummies (for stores and liquor types) recorded in our feature store only appear once the full history has been ingested.

You will need to run the socrata_iowa_liquor_pipeline pipeline several times, around four runs, to capture information from 2012 to 2025. Each run incrementally adds data that is not yet present to your local DuckDB, following the pattern sketched below.
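That incremental behavior boils down to inserting only rows whose key is not already present. A minimal sketch of the pattern in DuckDB, assuming a hypothetical staging table and using the Iowa dataset's invoice/line identifier as the key (verify the exact column the repo uses):

```python
import duckdb

con = duckdb.connect("data/iowa_liquor.duckdb")

# `staged_sales` is a hypothetical staging table of freshly pulled records;
# only rows whose key is absent from raw_sales get inserted.
con.execute("""
    INSERT INTO raw_sales
    SELECT s.*
    FROM staged_sales AS s
    WHERE NOT EXISTS (
        SELECT 1 FROM raw_sales AS r
        WHERE r.invoice_line_no = s.invoice_line_no
    )
""")
con.close()
```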
Please note that if you don't have enough resources, the pipeline run could crash. I apologize for this :(
At this point, you should be thinking "That's a lot of technical debt you have there." And you're absolutely right. Technical debt is unavoidable, and in this case, needed so I can deliver this workshop to you on time 😅.
```bash
uv run mage run . holidays_us        # Get holidays data from Nager API
uv run mage run . long_weekends_us   # Get long weekends data from Nager API

# Now, with all the raw data we need, we tell Mage to create, move, and save the data of our features
uv run mage run . fill_feature_store

# Some extra data (total sales)
uv run mage run . total_sales_weekly_python
uv run mage run . total_sales_monthly_python
```
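If you are curious how the blocks behind these pipelines look, Mage exporters follow a simple decorator convention. A hypothetical exporter that writes an engineered feature table to Parquet (the real blocks live in data_exporters/; the function name and path are illustrative):

```python
import pandas as pd

# Mage injects this decorator at runtime; the guard is the standard block template.
if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_weekly_sales(df: pd.DataFrame, *args, **kwargs) -> None:
    """Write features where a Feast FileSource can pick them up."""
    # Illustrative destination -- check the repo's exporters for the real path.
    df.to_parquet('data/features/weekly_sales.parquet', index=False)
```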
Also, I recommend you explore Mage's UI. For this you can use:

```bash
uv run mage start
```

Next, register the feature definitions with Feast:

```bash
cd ../feature_store
uv run feast apply
```

This compiles and writes the registry to data/registry.db. With the compiled feature store, we can now view the metadata of our features and interact with the feature store.
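You can also inspect the compiled registry programmatically with the Feast SDK. A minimal sketch, run from inside feature_store/:

```python
from feast import FeatureStore

# Reads feature_store.yaml from the current directory.
store = FeatureStore(repo_path=".")

# List the registered feature views and their feature columns.
for fv in store.list_feature_views():
    print(fv.name, "->", [field.name for field in fv.features])

# List the feature services that group views for training/serving.
for svc in store.list_feature_services():
    print("feature service:", svc.name)
```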
I also encourage you to explore the feature store's user interface:

```bash
uv run feast ui --port 8889
```

This will start a local server, and you will be able to explore our feature store at http://localhost:8889/. There you can discover the available features and check their metadata: description, owners, type, relations, and so on.
```bash
cd ../ml_dev_and_experimentation

# We will use Jupyter Lab to ensure we are using our uv ML environment
uv run jupyter lab
```

Here you can open ml_dev_and_experimentation/notebooks/skforecast_forecast.ipynb and explore how we can use Feast to get training data from our feature store. The core retrieval pattern is sketched below.
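That pattern looks roughly like this (the feature reference and entity values are illustrative; the notebook uses the repo's actual feature views and keys):

```python
import pandas as pd
from feast import FeatureStore

# ml_dev_and_experimentation ships its own feature_store.yaml, so point
# repo_path at wherever that file lives relative to your notebook.
store = FeatureStore(repo_path=".")

# Entity rows: which stores we want features for, and "as of" when.
entity_df = pd.DataFrame({
    "store_id": [2190, 2191],
    "event_timestamp": pd.to_datetime(["2024-06-03", "2024-06-03"]),
})

# Feast performs the point-in-time-correct join against the offline store.
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["weekly_sales_view:weekly_sales"],  # illustrative reference
).to_df()

print(training_df.head())
```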
```bash
# Run a Mage pipeline
cd data_movement_and_transformation
uv run mage run . <pipeline_name>

# Feast lifecycle
cd ../feature_store
uv run feast apply
uv run feast ui   # Browse registry/objects locally
uv run feast materialize 2024-01-01T00:00:00 2024-12-31T23:59:59
```
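Once a window has been materialized, online reads are served from the SQLite store. A minimal sketch, with an illustrative feature reference and entity key:

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # run from inside feature_store/

# Fetch the latest materialized values for one entity.
online = store.get_online_features(
    features=["weekly_sales_view:weekly_sales"],  # illustrative reference
    entity_rows=[{"store_id": 2190}],
).to_dict()

print(online)
```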
- DuckDB locked or file busy: stop active Python processes, remove iowa_liquor.duckdb.wal if present, and re-run the step.
- Feast registry drift: delete data/registry.db (dev only), then run `uv run feast apply`.
- Materialize finds no rows: check your DataSource time column and field_mapping, and verify the date range and time zone.
This project is licensed under the terms of the repository’s MIT License.
As always, there is no need to cite, yet it would mean a lot if you did 😃. Feel free to use this code and project structure in your personal or work projects!
- The system was tested only on Linux (Ubuntu 24.04) and macOS; it may therefore be unstable on Windows.
- Feature stores are time-series systems by design, but they are not traditionally used for forecasting, so this use case could be controversial.
- Mage: data pipelines and orchestration.
- Feast: feature store, training/serving consistency.
- Iowa Department of Commerce (open data source).
