Skip to content

Conversation

@brendanwells-moz
Copy link

Description

The PR adds datasets for a forecasting table and a view. Officially, these are temporary datasets pending a conversation between the PDS and AE teams about naming conventions for forecasting output tables. However, we are naming them generically in the hope that we have guessed the final naming convention correctly and these won't have to be migrated.

There is no ticket for this PR, but it relates to the discussion today in the PDS x AE meeting, documented here: https://docs.google.com/document/d/1DL_Nr1e-7YC05F2ssdplcH2cmsjmtzzFWyPKbS2rgqs/edit?tab=t.0#heading=h.4itj0ra86mg6 . The original project proposal is found here: https://docs.google.com/document/d/15RNNhlcE9oj3GLlCWns6DOUWHKjZpnaEbz7OiAQZi_E/edit?tab=t.0 .

Reviewer, please follow this checklist

@brendanwells-moz brendanwells-moz requested a review from a team as a code owner November 25, 2025 21:49
@dataops-ci-bot

This comment has been minimized.

@dataops-ci-bot

This comment has been minimized.

@@ -0,0 +1,16 @@
friendly_name: Forecasts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after syncing w/ @brendanwells-moz, i recommended going ahead and creating a dataset for forecasts vs a temp one. my feeling is that we will end up w/ a dataset for forecasts shared across products + forecasts_restricted (or similar) - if that doesn't pan out we can re-name this dataset, but this seemed preferable to data science. other perspectives are welcome :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is something we should discuss via a proposal to make sure we have all context around forecasting and plans for forecasting at least in the near future.

Renaming is a bit more complicated in that it would require creating a new dataset, moving all the resources into it and then deleting the old ones. For this reason, I think we should consider this a temporary space until we have a more formal decision through a proposal.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough! You didn't ask for it explicitly, but out of an abundance of caution I'll prefix both new datasets with tmp_.

…workgroup subgroup (defined in cloudops-infra). This allows the Outerbounds default perimeter service account to write data to tables in this dataset.
@chelseatroy
Copy link
Contributor

Okay @brendanwells-moz, I have added a commit to this PR to specify the workgroup subgroup that should receive edit permissions for the forecasts_derived dataset.

Copy link
Contributor

@chelseatroy chelseatroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment. This looks good to me; I added the lines that grant permissions to the requisite service account.

Nota Bene: I'm deferring to the team that uses and updates forecasting tables to decide how to organize those tables.

@dataops-ci-bot

This comment has been minimized.

- role: roles/bigquery.dataViewer
members:
- workgroup:mozilla-confidential
syndication: {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
syndication: {}

dataset_base_acl: view
user_facing: true
labels: {}
default_table_workgroup_access:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think default_table_workgroup_access should be included here. workgroup_access should already define all permissions that will be inherited by all queries / datasets / tables defined within this namespace unless explicitly overwritten in their own metadata files.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was auto-generated by the bqetl command, FYI.

I'll hold off on changing anything pending the check below.

members:
- workgroup:dataops-managed/external-outerbounds-task-default

syndication: {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
syndication: {}

dataset_base_acl: derived
user_facing: false
labels: {}
default_table_workgroup_access:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scholtzan if I remember correctly we should not be specifying default_table_workgroup_access, this is something the metadata generation / update command does?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove default_table_workgroup_access here:

DEFAULT_WORKGROUP_ACCESS = [
dict(role="roles/bigquery.dataViewer", members=["workgroup:mozilla-confidential"])
]
DEFAULT_TABLE_WORKGROUP_ACCESS = DEFAULT_WORKGROUP_ACCESS

Copy link
Contributor

@kik-kik kik-kik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few notes, we also need to make sure the tool is given read access to forecasts_derived dataset.

Also, I still think we need to have a wider agreement on how we want forecasting to fit into the bigger picture.

@chelseatroy
Copy link
Contributor

Made the changes @kik-kik pointed out, with the exception of the default_table_workgroup_access one (I know we're still waiting for confirmation on removing that).

@dataops-ci-bot

This comment has been minimized.

@brendanwells-moz
Copy link
Author

brendanwells-moz commented Nov 26, 2025

I apologize that forecasts didn't get properly renamed to tmp_forecasts, and instead git deleted the original and created the new one. I got a little lost in yaml linter "fun" and didn't properly update my commit.

@dataops-ci-bot
Copy link

Integration report for "Remove forecast, which I thought already happened"

sql.diff

Click to expand!
Only in /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod: tmp_forecasts
Only in /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod: tmp_forecasts_derived
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/tmp_forecasts/dataset_metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/tmp_forecasts/dataset_metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/tmp_forecasts/dataset_metadata.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/tmp_forecasts/dataset_metadata.yaml	2025-11-26 22:10:54.000000000 +0000
@@ -0,0 +1,16 @@
+friendly_name: Temporary Forecasts
+description: |-
+  Temporary dataset pending a conversation on naming conventions. Use at your own risk.
+  Views on forecasts based on Firefox data and external models. Forecast views may also live in other datasets related to the products they support.
+dataset_base_acl: view
+user_facing: true
+labels: {}
+default_table_workgroup_access:
+- role: roles/bigquery.dataViewer
+  members:
+  - workgroup:mozilla-confidential
+default_table_expiration_ms: null
+workgroup_access:
+- role: roles/bigquery.dataViewer
+  members:
+  - workgroup:mozilla-confidential
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/tmp_forecasts_derived/dataset_metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/tmp_forecasts_derived/dataset_metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/tmp_forecasts_derived/dataset_metadata.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/tmp_forecasts_derived/dataset_metadata.yaml	2025-11-26 22:10:54.000000000 +0000
@@ -0,0 +1,21 @@
+friendly_name: Temporary Forecasts Derived
+description: |-
+  Temporary dataset pending a conversation on naming conventions. Use at your own risk.
+  Forecasts based on Firefox data and external models, plus related secondary tables. Writable by the Outerbounds default perimeter.
+  This dataset combines forecasts across products; forecasts may also live in other datasets related to the products they support.
+dataset_base_acl: derived
+user_facing: false
+labels: {}
+default_table_workgroup_access:
+- role: roles/bigquery.dataViewer
+  members:
+  - workgroup:mozilla-confidential
+default_table_expiration_ms: null
+workgroup_access:
+- role: roles/bigquery.dataViewer
+  members:
+  - workgroup:mozilla-confidential
+  - workgroup:dataops-managed/external-outerbounds-task-default
+- role: roles/bigquery.dataEditor
+  members:
+  - workgroup:dataops-managed/external-outerbounds-task-default

Link to full diff

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not mean we necessarily have to call this tmp_ it was more a note for ourselves to keep this in mind when using it.

dataset_base_acl: view
user_facing: true
labels: {}
default_table_workgroup_access:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove default_table_workgroup_access here:

DEFAULT_WORKGROUP_ACCESS = [
dict(role="roles/bigquery.dataViewer", members=["workgroup:mozilla-confidential"])
]
DEFAULT_TABLE_WORKGROUP_ACCESS = DEFAULT_WORKGROUP_ACCESS

Copy link
Contributor

@kik-kik kik-kik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r+wc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants