| title | excerpt | products | keywords | ||||
|---|---|---|---|---|---|---|---|
About continuous aggregates |
A TimescaleDB continuous aggregate combines your data into analytic summaries and is refreshed in the background when new data is added. Learn how continuous aggregates work and how to use them |
|
|
import CaggsFunctionSupport from "versionContent/_partials/_caggs-function-support.mdx"; import CaggsIntro from "versionContent/_partials/_caggs-intro.mdx"; import CaggsTypes from "versionContent/_partials/_caggs-types.mdx"; import CreateHypertablePolicyNote from "versionContent/_partials/_create-hypertable-columnstore-policy-note.mdx";
You can create a continuous aggregate on top of another continuous aggregate. This allows you to summarize data at different granularity. For example, you might have a raw hypertable that contains second-by-second data. Create a continuous aggregate on the hypertable to calculate hourly data. To calculate daily data, create a continuous aggregate on top of your hourly continuous aggregate.
For more information, see the documentation about continuous aggregates on continuous aggregates.
Continuous aggregates support the following JOIN features:
| Feature | TimescaleDB < 2.10.x | TimescaleDB <= 2.15.x | TimescaleDB >= 2.16.x |
|---|---|---|---|
| INNER JOIN | ❌ | ✅ | ✅ |
| LEFT JOIN | ❌ | ❌ | ✅ |
| LATERAL JOIN | ❌ | ❌ | ✅ |
| Joins between ONE hypertable and ONE standard $PG table | ❌ | ✅ | ✅ |
| Joins between ONE hypertable and MANY standard $PG tables | ❌ | ❌ | ✅ |
Join conditions must be equality conditions, and there can only be ONE JOIN condition |
❌ | ✅ | ✅ |
| Any join conditions | ❌ | ❌ | ✅ |
JOINS in TimescaleDB must meet the following conditions:
- Only the changes to the hypertable are tracked, and they are updated in the continuous aggregate when it is refreshed. Changes to standard $PG table are not tracked.
- You can use an
INNER,LEFT, andLATERALjoins; no other join type is supported. - Joins on the materialized hypertable of a continuous aggregate are not supported.
- Hierarchical continuous aggregates can be created on top of a continuous
aggregate with a
JOINclause, but cannot themselves have aJOINclause.
Given the following schema:
CREATE TABLE locations (
id TEXT PRIMARY KEY,
name TEXT
);
CREATE TABLE devices (
id SERIAL PRIMARY KEY,
location_id TEXT,
name TEXT
);
CREATE TABLE conditions (
"time" TIMESTAMPTZ,
device_id INTEGER,
temperature FLOAT8
) WITH (
tsdb.hypertable
);See the following JOIN examples on continuous aggregates:
-
INNER JOINon a single equality condition, using theONclause:CREATE MATERIALIZED VIEW conditions_by_day WITH (timescaledb.continuous) AS SELECT time_bucket('1 day', time) AS bucket, devices.name, MIN(temperature), MAX(temperature) FROM conditions JOIN devices ON devices.id = conditions.device_id GROUP BY bucket, devices.name WITH NO DATA;
-
INNER JOINon a single equality condition, using theONclause, with a further condition added in theWHEREclause:CREATE MATERIALIZED VIEW conditions_by_day WITH (timescaledb.continuous) AS SELECT time_bucket('1 day', time) AS bucket, devices.name, MIN(temperature), MAX(temperature) FROM conditions JOIN devices ON devices.id = conditions.device_id WHERE devices.location_id = 'location123' GROUP BY bucket, devices.name WITH NO DATA;
-
INNER JOINon a single equality condition specified inWHEREclause:CREATE MATERIALIZED VIEW conditions_by_day WITH (timescaledb.continuous) AS SELECT time_bucket('1 day', time) AS bucket, devices.name, MIN(temperature), MAX(temperature) FROM conditions, devices WHERE devices.id = conditions.device_id GROUP BY bucket, devices.name WITH NO DATA;
-
INNER JOINon multiple equality conditions:CREATE MATERIALIZED VIEW conditions_by_day WITH (timescaledb.continuous) AS SELECT time_bucket('1 day', time) AS bucket, devices.name, MIN(temperature), MAX(temperature) FROM conditions JOIN devices ON devices.id = conditions.device_id AND devices.location_id = 'location123' GROUP BY bucket, devices.name WITH NO DATA;
TimescaleDB v2.16.x and higher.
-
INNER JOINwith a single equality condition specified inWHEREclause can be combined with further conditions in theWHEREclause:CREATE MATERIALIZED VIEW conditions_by_day WITH (timescaledb.continuous) AS SELECT time_bucket('1 day', time) AS bucket, devices.name, MIN(temperature), MAX(temperature) FROM conditions, devices WHERE devices.id = conditions.device_id AND devices.location_id = 'location123' GROUP BY bucket, devices.name WITH NO DATA;
TimescaleDB v2.16.x and higher.
-
INNER JOINbetween a hypertable and multiple $PG tables:CREATE MATERIALIZED VIEW conditions_by_day WITH (timescaledb.continuous) AS SELECT time_bucket('1 day', time) AS bucket, devices.name AS device, locations.name AS location, MIN(temperature), MAX(temperature) FROM conditions JOIN devices ON devices.id = conditions.device_id JOIN locations ON locations.id = devices.location_id GROUP BY bucket, devices.name, locations.name WITH NO DATA;
TimescaleDB v2.16.x and higher.
-
LEFT JOINbetween a hypertable and a $PG table:CREATE MATERIALIZED VIEW conditions_by_day WITH (timescaledb.continuous) AS SELECT time_bucket('1 day', time) AS bucket, devices.name, MIN(temperature), MAX(temperature) FROM conditions LEFT JOIN devices ON devices.id = conditions.device_id GROUP BY bucket, devices.name WITH NO DATA;
TimescaleDB v2.16.x and higher.
-
LATERAL JOINbetween a hypertable and a subquery:CREATE MATERIALIZED VIEW conditions_by_day WITH (timescaledb.continuous) AS SELECT time_bucket('1 day', time) AS bucket, devices.name, MIN(temperature), MAX(temperature) FROM conditions, LATERAL (SELECT * FROM devices WHERE devices.id = conditions.device_id) AS devices GROUP BY bucket, devices.name WITH NO DATA;
TimescaleDB v2.16.x and higher.
In $TIMESCALE_DB v2.7 and later, continuous aggregates support all $PG
aggregate functions. This includes both parallelizable aggregates, such as SUM
and AVG, and non-parallelizable aggregates, such as RANK.
In $TIMESCALE_DB v2.10.0 and later, the FROM clause supports JOINS, with
some restrictions. For more information, see the JOIN support section.
In older versions of $TIMESCALE_DB, continuous aggregates only support aggregate functions that can be parallelized by $PG. You can work around this by aggregating the other parts of your query in the continuous aggregate, then using the window function to query the aggregate.
If you want the old behavior in later versions of $TIMESCALE_DB, set the
timescaledb.finalized parameter to false when you create your continuous
aggregate.
Continuous aggregates consist of:
- Materialization hypertable to store the aggregated data in
- Materialization engine to aggregate data from the raw, underlying, table to the materialization hypertable
- Invalidation engine to determine when data needs to be re-materialized, due to changes in the data
- Query engine to access the aggregated data
Continuous aggregates take raw data from the original hypertable, aggregate it, and store the aggregated data in a materialization hypertable. When you query the continuous aggregate view, the aggregated data is returned to you as needed.
Using the same temperature example, the materialization table looks like this:
| day | location | chunk | avg temperature |
|---|---|---|---|
| 2021/01/01 | New York | 1 | 73 |
| 2021/01/01 | Stockholm | 1 | 70 |
| 2021/01/02 | New York | 2 | |
| 2021/01/02 | Stockholm | 2 | 69 |
The materialization table is stored as a $TIMESCALE_DB hypertable, to take
advantage of the scaling and query optimizations that hypertables offer.
Materialization tables contain a column for each group-by clause in the query,
and an aggregate column for each aggregate in the query.
For more information, see materialization hypertables.
The materialization engine performs two transactions. The first transaction blocks all INSERTs, UPDATEs, and DELETEs, determines the time range to materialize, and updates the invalidation threshold. The second transaction unblocks other transactions, and materializes the aggregates. The first transaction is very quick, and most of the work happens during the second transaction, to ensure that the work does not interfere with other operations.
Any change to the data in a hypertable could potentially invalidate some materialized rows. The invalidation engine checks to ensure that the system does not become swamped with invalidations.
Fortunately, time-series data means that nearly all INSERTs and UPDATEs have a recent timestamp, so the invalidation engine does not materialize all the data, but to a set point in time called the materialization threshold. This threshold is set so that the vast majority of INSERTs contain more recent timestamps. These data points have never been materialized by the continuous aggregate, so there is no additional work needed to notify the continuous aggregate that they have been added. When the materializer next runs, it is responsible for determining how much new data can be materialized without invalidating the continuous aggregate. It then materializes the more recent data and moves the materialization threshold forward in time. This ensures that the threshold lags behind the point-in-time where data changes are common, and that most INSERTs do not require any extra writes.
When data older than the invalidation threshold is changed, each transaction logs the minimum and maximum timestamps of the rows it modified. The continuous aggregate then identifies which complete time buckets are affected based on this per-transaction tracking. The range of buckets that are recalculated depends on transaction boundaries:
- If you modify rows in the 10:00 bucket and rows in the 15:00 bucket within a single transaction, all buckets from 10:00 to 15:00 (including intermediate buckets 11:00, 12:00, 13:00, and 14:00) are recalculated during refresh.
- If you modify rows in the 10:00 bucket in one transaction and rows in the 15:00 bucket in a separate transaction, only the 10:00 and 15:00 buckets are recalculated. The intermediate buckets (11:00, 12:00, 13:00, 14:00) are not affected.
This logging does cause some write load. However, the threshold lags behind the area of data that is currently changing, so the writes are small and rare.