Suggestion: add WFGY as an open source tool for LLM debugging and long horizon evaluation #586

@onestardao

Hi, and thanks for maintaining this curated list of AI tools. It is very helpful for people who build on top of existing projects.

I wanted to ask whether WFGY would fit here as a tool for debugging and stress testing large language models.

Project: WFGY
Repo: https://github.com/onestardao/WFGY

Very short description:

WFGY 1.0 is a PDF that defines a self-repair loop on top of any base LLM and reports benchmark results.

WFGY 2.0 adds a tension-style metric and a 16-item failure map that many engineers use as a checklist when their RAG or agent systems break in strange ways.

WFGY 3.0 is a TXT-based “Singularity Demo” that acts as a long-horizon tension stress test. Users upload the TXT to an LLM, trigger the run menu, and observe where the model’s internal story breaks under pressure.

The repo is MIT-licensed and currently has more than one thousand stars. It is used mainly as a practical toolbox rather than a research toy, which is why I thought it might fit this tools list.

If you feel this is in scope, I would be happy to send a small PR adding one short entry. If you prefer to keep the list focused on other categories of tools, that is completely fine as well.

Thanks for taking a look.
