We should add a feature to automatically generate question and ground truth pairs for documents within a collection.
Proposed Generation Methods:
- Simple Questions: Generate questions by feeding document chunks directly to an LLM.
- Complex Questions: Use the
GraphIndex to create in-depth questions that require synthesizing information from different parts of a document or even across multiple documents.
Requirements:
- Store the generated question and ground truth pairs in the database.
- Add configuration options to the collection settings page to enable/disable this feature and set up a scheduled job for generation.