This repo contains the code to reproduce the prototype from our paper "Safe, Untrusted, Proof-Carrying AI Agents: towards the agentic lakehouse", presented at S2AI@IEEE Big Data 2025. In particular, we leverage Bauplan as a programmable lakehouse (together with its MCP server) to showcase how an LLM-based agent can autonomously repair a data pipeline in a cloud lakehouse, safely and under human supervision.
If you're curious about the final result before diving into the code, you can check out this short video for a walkthrough of the prototype.
To use Bauplan, you need a free API key from the website. Once you have your key, follow the instructions to create a local configuration file.
We use uv to manage the Python environment. Run
uv sync
to create the environment and install the dependencies for this project.
Create a .env file inside of the src folder by copying the local.env file and filling in the relevant API keys depending on the inference provider of choice.
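Before launching anything, it can help to confirm that the keys for your chosen provider are actually filled in. The sketch below is not part of the repo: it parses simple KEY=VALUE lines and reports which required keys are absent or empty. The key names used here (ANTHROPIC_API_KEY, OPENAI_API_KEY) are illustrative assumptions; use the names that appear in local.env.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required: list[str]) -> list[str]:
    """Return the required keys that are absent or have an empty value."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    # Hypothetical contents; the real key names are listed in local.env.
    sample = 'ANTHROPIC_API_KEY="sk-example"\nOPENAI_API_KEY=\n'
    print(missing_keys(sample, ["ANTHROPIC_API_KEY"]))
    print(missing_keys(sample, ["OPENAI_API_KEY"]))
```

To check a real file, pass the contents of the .env file in the src folder as `text`.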
Get the MCP server from GitHub and follow the instructions to set it up. Start the MCP server with:
uv run python main.py --transport streamable-http --profile claudeagent
NOTE: we use claude as the model for this example and claudeagent as the corresponding Bauplan profile. Make sure to change the relevant startup parameters in launch_agent.py (and possibly model_utils.py) if you wish to run with OpenAI or TogetherAI as the inference provider.
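Since both the agent and the tests need the MCP server to be up, a quick reachability check can save a confusing failure later. The following is a minimal sketch, not part of the repo, and the host and port are assumptions: use whatever address the server prints on startup. A successful TCP connection only shows that something is listening there, not that it is the Bauplan MCP server.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumed address: replace with the host and port your MCP server reports.
    host, port = "127.0.0.1", 8000
    state = "reachable" if is_port_open(host, port) else "not reachable"
    print(f"MCP server at {host}:{port} is {state}")
```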
When you launch the main agentic loop, the script first runs a faulty pipeline (the code is in bpln_pipeline) to create a failed job, and then runs the agent, prompting it to repair recent failed jobs in general (the exact request to the agent is in queries.py). Since we have just run a pipeline, we know that the request can be fulfilled. You can also open the Bauplan dashboard to verify that the run was indeed attempted and failed. Launch the loop with:
cd src
uv run python launch_agent.py
We provide scenario-based tests to make sure the agent is calling the expected MCP-provided tools for certain predefined queries. To run the tests, make sure the MCP server is running (see above) and then run:
cd src
uv run pytest -vvv
NOTE: tests are currently only set up for the claudeagent profile.
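To illustrate what a scenario-based test of this kind checks, here is a minimal sketch. It is not the repo's test code: the tool names and the hard-coded call trace are hypothetical, and a real test would collect the trace from an actual agent run against the MCP server.

```python
def missing_tools(expected: set[str], called: list[str]) -> set[str]:
    """Return the expected tool names that never appear in the call trace."""
    return expected - set(called)


def test_repair_scenario_calls_expected_tools():
    # Hypothetical trace: in a real test this would come from the agent run.
    called = ["list_jobs", "get_job_logs", "create_branch", "run_project"]
    # Hypothetical expectation for a "repair recent failed jobs" query.
    expected = {"list_jobs", "get_job_logs"}
    assert missing_tools(expected, called) == set()
```

Checking for a subset rather than an exact sequence leaves the agent free to call extra tools or reorder steps, which tends to make such tests less brittle with LLM-driven behavior.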
The code is released "as is" under the MIT License. See the LICENSE file for details.