Suggestion: add WFGY as an open-source tool for LLM debugging and long-horizon evaluation

Hi, and thanks for maintaining this curated list of AI tools. It is very helpful for people who build on top of existing projects.

I wanted to ask whether WFGY would fit here as a tool for debugging and stress-testing large language models.

Project: WFGY
Repo: [https://github.com/onestardao/WFGY](https://github.com/onestardao/WFGY)

Very short description:

WFGY 1.0 is a PDF that defines a self-repair loop on top of any base LLM and reports benchmark results.

WFGY 2.0 adds a tension-style metric and a 16-item failure map that many engineers use as a checklist when their RAG or agent systems break in strange ways.

WFGY 3.0 is a TXT-based “Singularity Demo” that acts as a long-horizon tension stress test. Users upload the TXT to an LLM, trigger the run menu, and observe where the model’s internal story breaks under pressure.

The repo is MIT-licensed and currently has more than one thousand stars. It is used mainly as a practical toolbox rather than a research toy, which is why I thought it might fit this tools list.

If you feel this is in scope, I would be happy to send a small PR adding one short entry. If you prefer to keep the list focused on other categories of tools, that is completely fine as well.

Thanks for taking a look.
