@zaneselvans commented Aug 31, 2025

Overview

I created this PR to do some learning/experimentation while also doing something productive:

  • Get more familiar with marimo notebooks, since we're using them in our metrics and they generally seem like a cool new tool for sharing data visualizations and analyses.
  • Experiment with Polars dataframes to understand whether Polars might be a good tool for the performance-oriented work that we are looking at doing in PUDL.
  • Define a Python environment for this repository, so that the notebooks stored here can potentially be run independently, since the data can now be accessed directly from S3 rather than needing to be read from Kaggle.
  • Get more familiar with using pixi to manage conda environments, as we look to migrate away from conda-lock in the main PUDL repo.
  • Experiment with deploying marimo notebooks via GitHub Pages using WASM, to see if that might be a reasonable way to share lightweight PUDL examples.
  • Provide an environment for reviewing the SEC 10-K data and docs in this PR since I used Polars, and it's not part of the pudl-dev environment.

Launching the notebook

Using uv, marimo and the inline script metadata

  • Install uv if you haven't already.
  • Install marimo as a standalone tool using uv:
uv tool install marimo
  • Launch the SEC 10-K notebook:
uv tool run marimo edit --sandbox marimo/sec10k-data-review.py

Using pixi and the conda environment defined in this PR

pixi run marimo edit marimo/sec10k-data-review.py

Issues / Questions

  • Overall, it was easy to get the marimo notebook working and pulling data from either a local PUDL_OUTPUT or S3. Thanks to the notebook's dependency tracking, once the data had been pulled (about 30 seconds of download), the remote version worked just as well as the local one, which was great.
  • I tried Altair for plotting but found it frustrating, so I put that off for later and stuck with Matplotlib for now.
  • The notebook is successfully being deployed on GitHub Pages, but it seems to be getting a very stale environment: it has Polars 1.24, while the version specified in the script metadata and pixi environment is 1.32.
  • Maybe as a result of that stale environment, pl.read_parquet() is failing in the deployed environment.
  • Maybe this is an issue with Pyodide? I'm not sure.
  • I also have not figured out how to disable the default GitHub Pages deployment action, which is now happening alongside the deployment that is specified on this branch.
  • The most recent version of marimo is not yet available on conda-forge for macOS, so it is specified as a PyPI dependency for the moment. This is due to a dependency called loro that still needs to be updated on conda-forge.

@zaneselvans marked this pull request as ready for review September 1, 2025 21:18
@zaneselvans assigned krivard and zaneselvans and unassigned krivard Sep 1, 2025
@zaneselvans requested a review from krivard September 1, 2025 21:18