@bendk bendk commented Feb 5, 2026

Created the rust_component_metrics dataset, which contains data for shared Rust components that ship on Desktop, iOS, and/or Android. See https://github.com/mozilla/application-services/ for examples of these components.

Added SQL generators to create derived datasets that aggregate Glean metrics. The aggregate tables will be used to create dashboards for teams that own the Rust components. There's already some code in the application-services repo to generate these dashboards, but its queries run slowly and often time out. Currently supported metrics are counters, distributions, labeled distributions, and events.

SQL generators were used to make it easy for teams to add metrics for their components in the future. We can tell them to update sql_generators/rust_component_metrics/__init__.py and open a PR.

I could almost have just used the GLAM ETL tables, but we need some extra capabilities:
* firefox-ios support (mozilla/glam#1830)
* Aggregation by submission date (mozilla/glam#1073)

@bendk bendk requested a review from a team as a code owner February 5, 2026 21:55
CROSS JOIN UNNEST(metrics.{{ metric.table }}.{{ category }}_{{ metric.name }}.values) as values
-- This generates multiple rows based on the `value` field. This is needed to make the `APPROX_QUANTILES`
-- weigh `value.key` correctly.
CROSS JOIN UNNEST(GENERATE_ARRAY(1, `values`.value))
Contributor Author

Is this the best way to get percentiles for a distribution metric? I tried to find a UDF for this, but couldn't.
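
For readers following the thread, here is a minimal Python sketch of what the row-expansion trick does. It assumes each entry in `values` pairs a bucket key with a sample count, and it uses a nearest-rank percentile as a stand-in for `APPROX_QUANTILES`, which is approximate in BigQuery. The function names are hypothetical and not part of the PR. The second function reaches the same result from cumulative counts without materializing one row per sample, which may be worth comparing against the `GENERATE_ARRAY` approach.

```python
def nearest_rank(total, p):
    """1-based nearest-rank position of the p-th percentile among `total` samples."""
    if total <= 0:
        raise ValueError("histogram has no samples")
    return max(1, -(-total * p // 100))  # ceiling of total * p / 100


def percentile_by_expansion(histogram, p):
    """Repeat each bucket key `count` times, then pick the nearest-rank row.

    This mirrors CROSS JOIN UNNEST(GENERATE_ARRAY(1, count)): every sample
    becomes its own row, so frequent buckets weigh more.
    """
    rows = []
    for key, count in sorted(histogram.items()):
        rows.extend([key] * count)
    return rows[nearest_rank(len(rows), p) - 1]


def percentile_by_cumulative_counts(histogram, p):
    """Same percentile, computed from running totals instead of expanded rows."""
    rank = nearest_rank(sum(histogram.values()), p)
    running = 0
    for key, count in sorted(histogram.items()):
        running += count
        if running >= rank:
            return key
    raise ValueError("histogram has no samples")


# Hypothetical histogram: bucket key -> number of samples in that bucket.
histogram = {1: 8, 100: 1, 1000: 1}
print(percentile_by_expansion(histogram, 50))
print(percentile_by_cumulative_counts(histogram, 50))
```

Without the expansion, a quantile over the three bucket keys alone would treat 1, 100, and 1000 as equally likely, even though eight of the ten samples fall in the first bucket.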

@bendk bendk force-pushed the push-ltmxmvkrqvsq branch 2 times, most recently from 4c0d58d to 45b1c7c Compare February 9, 2026 15:57
bendk commented Feb 9, 2026

Fixed the YAML formatting and the partition date field name. Hopefully this makes CI green.

metrics=[
LabeledDistribution("ingest_download_time", DistributionType.timing),
LabeledDistribution("ingest_time", DistributionType.timing),
LabeledDistribution("ingest_query_time", DistributionType.timing),
Collaborator

The dry-run CI task is failing because the suggest_ingest_query_time field doesn't actually exist in the source table.

Collaborator

This is the full dryrun error for the query:

sql/*****************************/rust_component_derived_moz_fx_data_shared_prod_2707a00cf2b29f0afd4e378935d3378c99f6599e_761c7b67/ingest_query_time_v1/query.sql ERROR
 [{'code': 400, 'errors': [{'message': 'Field name suggest_ingest_query_time does not exist in STRUCT<network_http3_complete_load ARRAY<STRUCT<key STRING, value STRUCT<bucket_count INT64, count INT64, histogram_type STRING, ...>>>, network_http3_first_sent_to_last_received ARRAY<STRUCT<key STRING, value STRUCT<bucket_count INT64, count INT64, histogram_type STRING, ...>>>, network_http3_open_to_first_received ARRAY<STRUCT<key STRING, value STRUCT<bucket_count INT64, count INT64, histogram_type STRING, ...>>>, ...>; Did you mean suggest_ingest_time? at [21:50]', 'domain': 'global', 'reason': 'invalidQuery', 'location': 'q', 'locationType': 'parameter'}], 'response': {'headers': {'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000', 'content-encoding': 'gzip', 'content-type': 'application/json; charset=UTF-8', 'date': 'Mon, 09 Feb 2026 18:29:08 GMT', 'server': 'ESF', 'transfer-encoding': 'chunked', 'vary': 'Origin, X-Origin, Referer', 'x-content-type-options': 'nosniff', 'x-frame-options': 'SAMEORIGIN', 'x-xss-protection': '0'}}, 'message': 'Field name suggest_ingest_query_time does not exist in STRUCT<network_http3_complete_load ARRAY<STRUCT<key STRING, value STRUCT<bucket_count INT64, count INT64, histogram_type STRING, ...>>>, network_http3_first_sent_to_last_received ARRAY<STRUCT<key STRING, value STRUCT<bucket_count INT64, count INT64, histogram_type STRING, ...>>>, network_http3_open_to_first_received ARRAY<STRUCT<key STRING, value STRUCT<bucket_count INT64, count INT64, histogram_type STRING, ...>>>, ...>; Did you mean suggest_ingest_time? at [21:50]'}]

Contributor Author

Sorry about that typo. I changed it to query_time, which should fix it.
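
The failure in this thread follows from how the template builds the column reference: `{{ category }}_{{ metric.name }}`. A small sketch of a pre-flight check for that naming rule is below. The helper names and the `SOURCE_COLUMNS` snapshot are hypothetical. In particular, the column set is an assumption pieced together from this thread, not the real table schema, and the PR itself relies on the dry-run CI task rather than a check like this.

```python
def generated_columns(category, metric_names):
    """Column names the template would reference: <category>_<metric name>."""
    return [f"{category}_{name}" for name in metric_names]


def missing_columns(category, metric_names, source_columns):
    """Generated column names that are absent from the source table."""
    return [
        column
        for column in generated_columns(category, metric_names)
        if column not in source_columns
    ]


# Assumed snapshot of the source columns, for illustration only.
SOURCE_COLUMNS = {
    "suggest_ingest_download_time",
    "suggest_ingest_time",
    "suggest_query_time",
}

# Metric names as written before the fix, then after it.
before = ["ingest_download_time", "ingest_time", "ingest_query_time"]
after = ["ingest_download_time", "ingest_time", "query_time"]

print(missing_columns("suggest", before, SOURCE_COLUMNS))
print(missing_columns("suggest", after, SOURCE_COLUMNS))
```

Under these assumptions the first call reports `suggest_ingest_query_time` as missing and the second reports nothing, which matches the dry-run error and the fix described above.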

@scholtzan scholtzan enabled auto-merge February 9, 2026 20:19
@scholtzan scholtzan added this pull request to the merge queue Feb 9, 2026
github-merge-queue bot pushed a commit that referenced this pull request Feb 9, 2026
* Rust component metrics

Created the `rust_component_metrics` dataset, which contains data for
shared Rust components that ship on Desktop, iOS, and/or Android.  See
https://github.com/mozilla/application-services/ for examples of these
components.

Added SQL generators to create derived datasets that aggregate Glean
metrics. The aggregate tables will be used to create dashboards for
teams that own the Rust components. There's already some code in the
application-services repo to generate these dashboards, however the
queries are running slowly and often time out.  Currently supported
metrics are counters, distributions, labeled distributions and events.

SQL generators were used in order to make it easy for teams to add
metrics for their components in the future.  We can tell them to update
`sql_generators/rust_component_metrics/__init__.py` and open a PR.

I could almost have just used the GLAM ETL tables, but we need some extra capabilities:
* firefox-ios support (mozilla/glam#1830)
* Aggregation by submission date (mozilla/glam#1073)

* Apply suggestion from @scholtzan

---------

Co-authored-by: Anna Scholtz <anna@scholtzan.net>
Merged via the queue into mozilla:main with commit 8d9e74f Feb 9, 2026
22 checks passed