
Building a feature store for time series forecasting with Mage & Feast

Welcome to our feature store workshop. This workshop was originally scheduled to be presented at the Data Day on October 21, 2025, in Monterrey, Mexico. In this session, we will guide you through the process of building a feature store for a multi-series forecasting project using Mage and Feast.

  • Mage for data ingestion, transformation, and orchestration.

  • Feast for feature definitions, lineage, training/serving parity, and materialization.

We'll work with the Iowa Liquor Sales open dataset and produce training tables + online features you can query from a model.

Our feature store diagram

What you’ll learn

  • How entities, data sources, feature views, and feature services fit together in Feast (see the sketch after this list).

  • How to build reusable ETL/ELT pipelines in Mage that write to DuckDB/Parquet.

  • How to generate point-in-time correct training datasets and materialize them in an online store.
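New to Feast? Here is a minimal, generic sketch of how those pieces fit together. The entity, source, and feature names are illustrative, not this repo's actual definitions (those live in feature_store/):

from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32

# An entity is the join key that identifies each series (illustrative name)
store = Entity(name="store", join_keys=["store_id"])

# A data source points Feast at offline data (illustrative Parquet path)
sales_source = FileSource(
    path="data/features/weekly_sales.parquet",
    timestamp_field="event_timestamp",
)

# A feature view declares the schema and freshness (TTL) of a feature group
weekly_sales = FeatureView(
    name="weekly_sales",
    entities=[store],
    ttl=timedelta(days=365),
    schema=[Field(name="total_sales", dtype=Float32)],
    source=sales_source,
)

A feature service would then bundle one or more feature views so that training and serving request the exact same feature set.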

Tech stack

  • Python 3.11 & 3.12, uv for dependency management

  • Mage-AI (pipelines & blocks)

  • Feast (≥ 0.50)

  • DuckDB (offline store & local dev tables)

  • SQLite (online store)

  • Parquet/Arrow for feature artifacts

All CLI commands below assume uv is installed. You can find the installation instructions here, but uv can also be installed with pip (see below).

Repository structure:

time_series_feature_store/
├── data/                               # Local data & caches (dev only; usually git-ignored)
│   ├── features/                       # Materialized features (e.g., Parquet/Arrow) for inspection
│   ├── iowa_liquor.duckdb              # Data we pulled from Iowa SODA API
│   ├── online_store.db                 # Feast online store (SQLite, dev)
│   ├── registry.db                     # Feast registry 
│   └── info.txt                        
│
├── data_movement_and_transformation/   # Ingestion + transformation code (ETL/ELT pipelines)
│   ├── custom/                         # Mage blocks
│   ├── data_exporters/                 # Mage blocks, writers: DuckDB/Parquet/Feast sources
│   ├── data_loaders/                   # Mage blocks, readers: APIs/DBs
│   ├── sql/                            # Parameterized SQL used by loaders/transformers
│   ├── transformers/                   # Mage blocks, feature engineering steps (clean/aggregate/encode)
│   ├── pipelines/                      # Mage pipelines metadata
│   ├── io_config.yaml                  # I/O profiles (local/dev/prod endpoints, creds handles)
│   ├── metadata.yaml                   # Orchestrator metadata (pipeline defs, block config)
│   ├── main.py                         # CLI entrypoints (run pipelines, bootstrap data)
│   ├── pyproject.toml                  # Runtime & build deps for this package
│   ├── README.md                       # How to run/extend ingestion & transforms
│   └── uv.lock                         # Locked dependency graph (pin exact versions)
│
├── feature_store/                      # Feast repository (specs for entities, sources, features)
│   ├── __init__.py
│   ├── data_sources/                   # Feast DataSource objects (DuckDB/Parquet/SQL, etc.)
│   ├── entities.py                     # Feast Entity definitions (keys, join semantics)
│   ├── feature_views/                  # Feast FeatureView specs (schemas, TTLs, tags)
│   ├── feature_services/               # Feast FeatureService groupings for training/online
│   ├── pyproject.toml                  # Deps for the FS package
│   └── uv.lock                         # Locked deps for reproducible FS runs
│
├── ml_dev_and_experimentation/         # Notebooks & experiments using the feature store
│   ├── data/                           # Small sample slices for notebooks (ok to git-ignore)
│   ├── notebooks/                      # Exploratory analysis, model prototyping
│   ├── feature_store.yaml              # Same as the one in feature_store/
│   ├── pyproject.toml                  # Deps for experimentation environment
│   └── uv.lock                         # Locked deps for experiments
│
├── utils/                              # Reusable utilities shared across subprojects
│   ├── utils/                          # Python package (e.g., logging, timing, FS helpers)
│   ├── pyproject.toml                  # Deps for the utils package (if packaged separately)
│   └── tsfs_utils.egg-info             # Build metadata (GENERATED; not needed in VCS)
│
├── metadata.yaml                       # Top-level project metadata (e.g., repo settings/docs)
├── LICENSE                             # License for this repository
└── README.md                           # Project overview, quickstart, and links to sub-READMEs

Prerequisites

  • Python 3.11+ and uv

    pip install uv

  • Other dependencies will be pulled by uv.

Quickstart

1) Install dependencies

In each of the environment folders (data_movement_and_transformation, feature_store, and ml_dev_and_experimentation), run:

uv sync

This will pull and sync the dependencies for each "service".

2) Save liquor data locally for fast iteration (pull Iowa SODA and stage to DuckDB)

We store this data locally to save time, allowing us to avoid pulling everything from the SODA API each time we need it. Think of it as a low-latency source for this information. This approach is intended for learning purposes. To become more familiar with the pipeline that retrieves this data, you can refer to this tutorial.

To pull the data from the Iowa Liquor API, run:

cd data_movement_and_transformation
uv run mage run . socrata_iowa_liquor_pipeline

This will pull raw records from the Iowa SODA API (bounded sample for workshop) and store them into data/iowa_liquor.duckdb. This pipeline also creates additional features not present in the original Iowa Liquor data.
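To sanity-check the pull, you can open the DuckDB file directly from Python. A minimal sketch; the table and column names in the second query are assumptions, so use SHOW TABLES to see what the pipeline actually wrote:

import duckdb

# open read-only so we don't collide with a running pipeline
con = duckdb.connect("data/iowa_liquor.duckdb", read_only=True)

# list the tables the pipeline created
print(con.execute("SHOW TABLES").fetchdf())

# check date coverage (table/column names are assumptions; adjust to yours)
print(con.execute(
    "SELECT MIN(date) AS first_day, MAX(date) AS last_day, COUNT(*) AS n "
    "FROM iowa_liquor_sales"
).fetchdf())

con.close()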

You need the complete dataset (from 2012 to the current month) to obtain all liquor types and stores; otherwise, at ML-modeling time you will encounter an error due to missing dummy columns. This is because the dummies (for stores and liquor types) recorded in our feature store only cover the stores and liquor types present in the ingested data.

To cover 2012 to 2025, you will need to run the socrata_iowa_liquor_pipeline pipeline several times, around four in total. Each run incrementally appends data that is not yet present in your local DuckDB.

Please note that the pipeline run could crash if your machine doesn't have enough resources. I apologize for this :(

At this point, you should be thinking "That's a lot of technical debt you have there." And you're absolutely right. Technical debt is unavoidable, and in this case, needed so I can deliver this workshop to you on time 😅.

3) Create the features and fill the store so we can use them

uv run mage run . holidays_us       # Get holidays data from Nager API
uv run mage run . long_weekends_us  # Get long weekends data from Nager API

# Now, with all the raw data we need, we tell Mage to create, move, and save the data of our features
uv run mage run . fill_feature_store

# Some extra data (total sales)
uv run mage run . total_sales_weekly_python
uv run mage run . total_sales_monthly_python
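If you want to peek at what these pipelines wrote, the feature artifacts land under data/features/. A quick way to inspect them, assuming they are Parquet files:

from pathlib import Path

import pandas as pd

# print the shape and a preview of every Parquet artifact found
for path in sorted(Path("data/features").glob("*.parquet")):
    df = pd.read_parquet(path)
    print(path.name, df.shape)
    print(df.head(3))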

I also recommend exploring Mage's UI. To do so, run:

uv run mage start

4) Apply Feast repo (entities, data sources, feature views)

cd ../feature_store
uv run feast apply

This compiles and writes the registry to data/registry.db. With the compiled feature store, we can now view the metadata of our features and also interact with the feature store.
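You can also query the compiled registry from Python. For example, from inside feature_store/:

from feast import FeatureStore

fs = FeatureStore(repo_path=".")  # reads feature_store.yaml in this folder

# list what `feast apply` registered
print([e.name for e in fs.list_entities()])
print([fv.name for fv in fs.list_feature_views()])
print([svc.name for svc in fs.list_feature_services()])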

I also encourage you to explore the feature store user interface. You can do it as follows:

uv run feast ui --port 8889

This will start a local server, and you will be able to explore our feature store at http://localhost:8889/. There you can discover the available features and check their metadata, such as descriptions, owners, types, and relations.

5) Get training data and build a forecaster with it

cd ../ml_dev_and_experimentation

# We will use Jupyter Lab to ensure we are using our uv ML environment
uv run jupyter lab

Here, you can go to ml_dev_and_experimentation/notebooks/skforecast_forecast.ipynb and explore how we can use Feast to get some training data from our feature store.
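At its core, the notebook relies on Feast's point-in-time join. The call looks roughly like this; the entity key and feature reference below are placeholders, so check the notebook for the real names:

import pandas as pd
from feast import FeatureStore

fs = FeatureStore(repo_path="../feature_store")

# entity dataframe: which series, and as-of which timestamps
entity_df = pd.DataFrame({
    "store_id": [2633, 2633],  # placeholder entity key
    "event_timestamp": pd.to_datetime(["2024-01-07", "2024-01-14"]),
})

# Feast joins each feature as it was known at each timestamp (no leakage)
training_df = fs.get_historical_features(
    entity_df=entity_df,
    features=["weekly_sales:total_sales"],  # placeholder feature reference
).to_df()
print(training_df.head())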

Common commands

# Run a Mage pipeline
cd data_movement_and_transformation
uv run mage run . <pipeline_name>

# Feast lifecycle
cd ../feature_store
uv run feast apply
uv run feast ui      # Browse registry/objects locally
uv run feast materialize 2024-01-01T00:00:00 2024-12-31T23:59:59
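Once a range has been materialized, the freshest values can be read back from the online store (SQLite in this setup). A sketch with placeholder names, run from feature_store/:

from feast import FeatureStore

fs = FeatureStore(repo_path=".")

# fetch the latest materialized values for one entity (placeholder names)
online = fs.get_online_features(
    features=["weekly_sales:total_sales"],
    entity_rows=[{"store_id": 2633}],
).to_dict()
print(online)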

Troubleshooting

  • DuckDB locked or file busy

    • Stop active Python processes; remove iowa_liquor.duckdb.wal if present; re-run the step.
  • Feast registry drift

    • Delete data/registry.db (dev only), then uv run feast apply.
  • Materialize finds no rows

    • Check your DataSource time column and field_mapping; verify the date range and time zone (see the sketch below).
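For that last point, the fields to check live on the data source definition. A sketch of what a correct one looks like; the path and column names are assumptions:

from feast import FileSource

sales_source = FileSource(
    path="data/features/weekly_sales.parquet",  # must point at the real file
    timestamp_field="event_timestamp",          # must name a real timestamp column
    # field_mapping renames raw columns ({"raw_name": "feature_name"}) so they
    # match the FeatureView schema
    field_mapping={"sales_usd": "total_sales"},
)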

License

This project is licensed under the terms of the repository’s MIT License.

Citation

As always, there is no need to cite, yet it would mean a lot if you did 😃. Feel free to use this code and project structure in your personal or work projects!

Limitations

  • The system was tested only on Linux (Ubuntu 24.04) and macOS; it may therefore be unstable on Windows.

  • Feature stores are time series systems by default, yet they are not traditionally used for forecasting, so this use case could be controversial.

Acknowledgements

  • Mage: data pipelines and orchestration.

  • Feast: feature store, training/serving consistency.

  • Iowa Department of Commerce (open data source).
