    Repositories list

    • BARGAIN

      Public
      Low-Cost LLM-Powered Data Processing with Theoretical Guarantees
      Python
      63500 Updated Feb 4, 2026
    • docetl

      Public
      A system for agentic LLM-powered data processing and ETL
      Python
      3773.6k288 Updated Feb 2, 2026
    • Python
      0200 Updated Jan 1, 2026
    • TWIX

      Public
      TWIX is an open-source data extraction tool that reconstructs structured data from documents at scale, accurately and at low cost, by inferring the shared under…
      Python
      1721242 Updated Nov 26, 2025
    • Welcoming contributions from practitioners building AI/data systems - share your real-world problems, document where current tools fail, and help improve the be…
      Python
      21100 Updated Sep 4, 2025
    • Examples of docetl pipelines
      Python
      1200 Updated Apr 22, 2025
    • Parse PDFs using computer vision, layout analysis, and other state-of-the-art document intelligence techniques. WebApp implemented in Flask/Jinja2 with infer an…
      JavaScript
      2900 Updated Jul 26, 2024
    • Introduction to Flordb with PyTorch and TensorFlow
      Jupyter Notebook
      0000 Updated Apr 9, 2024