🐼 Panda_Dive

ι’†εŸŸζ·±εΊ¦ζœη΄’ε·₯ε…· - Deep Domain Research Tool

A powerful multi-agent deep research tool built with LangGraph and LangChain. Panda_Dive orchestrates multiple researcher agents to comprehensively explore any domain, synthesize findings, and generate detailed reports with retrieval quality safeguards.

✨ Features

πŸ€– Multi-Agent Research System

  • Supervisory Agent: Intelligently delegates research tasks to multiple specialized researcher agents
  • Concurrent Execution: Run up to 20 research tasks in parallel for maximum efficiency
  • Dynamic Task Delegation: The supervisor adapts based on research progress and findings

🧠 Flexible LLM Support

Panda_Dive supports multiple LLM providers out of the box:

  • OpenAI (GPT-4, GPT-3.5)
  • Anthropic (Claude 3.5, Claude 3)
  • DeepSeek (DeepSeek V3)
  • Google (VertexAI, GenAI)
  • Groq (Llama, Mixtral)
  • AWS Bedrock

Configure different models for different research stages:

  • Research queries
  • Information compression
  • Summarization
  • Final report generation

πŸ“Š Smart Token Management

  • Automatic Truncation: Intelligently handles token limit errors
  • Retry Logic: Robust retry mechanism for failed tool calls
  • Context Optimization: Compresses research findings to stay within limits
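The truncation-and-retry behavior above can be sketched as follows. This is a minimal illustration, not the actual Panda_Dive implementation: `call_model` and `TokenLimitError` are hypothetical stand-ins.

```python
# Minimal sketch of retry-with-truncation for token-limit errors.
# `TokenLimitError` and the `call_model` callable are hypothetical
# stand-ins, not the actual Panda_Dive API.

class TokenLimitError(Exception):
    """Raised when a model call exceeds the context window."""

def call_with_truncation(call_model, messages, max_retries=3):
    """Retry a model call, shrinking the history on token-limit errors."""
    for _ in range(max_retries):
        try:
            return call_model(messages)
        except TokenLimitError:
            # Keep the system prompt plus the most recent half of the
            # remaining history, then retry with the smaller context.
            keep = max(1, (len(messages) - 1) // 2)
            messages = [messages[0]] + messages[-keep:]
    raise RuntimeError("Model call failed after truncation retries")
```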

πŸ”§ Extensibility

  • MCP Integration: Extend tools via Model Context Protocol
  • LangSmith Tracing: Full observability and debugging support
  • Multiple Search APIs: Tavily, DuckDuckGo, Exa, ArXiv (DuckDuckGo is now the default - privacy-friendly and no API key required)

🎯 Retrieval Quality Loop

  • Query Rewriting: Expand queries to improve recall (supports both Tavily and DuckDuckGo)
  • Relevance Scoring: Score each result on a 0.0-1.0 scale
  • Reranking: Prioritize higher-quality sources before synthesis
  • Robust Error Handling: Graceful handling of connection issues for DuckDuckGo searches
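The scoring-and-reranking step can be pictured as a threshold filter followed by a sort. This is an illustrative sketch only; the result dict shape is an assumption, though the defaults mirror the `relevance_threshold` and `rerank_top_k` configuration options described later in this README.

```python
# Sketch of the rerank step in the retrieval quality loop: given search
# results with LLM-assigned relevance scores (0.0-1.0), drop anything
# below the threshold and keep the top-k. The dict shape is illustrative.

def rerank(results, relevance_threshold=0.7, top_k=10):
    """Filter results by relevance score, then return the top-k by score."""
    kept = [r for r in results if r["score"] >= relevance_threshold]
    return sorted(kept, key=lambda r: r["score"], reverse=True)[:top_k]
```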

πŸ†• Recent Updates

  • Added a polished local frontend demo view for Panda_Dive research workflows
  • Added a complete sample context research report for quick output reference
  • Updated README with visual showcase and direct links to example assets

πŸ–ΌοΈ Showcase

Frontend Effect Demo

Panda_Dive frontend demo

Panda_Dive Research Output Example

Preview topic: Systematic Investigation of Context in LLM-based Agent Systems

The sample report demonstrates:

  • Conceptual overview and context taxonomy
  • Design patterns (dispatcher, state channels, event sourcing)
  • Multi-agent context lifecycle, trade-offs, and failure modes
  • Open challenges and research directions for 2025-2026

πŸ—οΈ Architecture

Panda_Dive uses a sophisticated multi-agent graph architecture with three hierarchical layers: Main Graph (entry point), Supervisor Subgraph (orchestration), and Researcher Subgraph (execution).

Main Graph

Entry point handling user interaction, research brief generation, and final report synthesis:

```mermaid
graph TD
    START([START]) --> CLARIFY[clarify_with_user]

    CLARIFY --"need_clarification=True"--> USER["πŸ”„ Return to User<br/>with question"]
    USER -->|User response| CLARIFY

    CLARIFY --"need_clarification=False"--> BRIEF[write_research_brief]
    BRIEF --> SUPERVISOR["🧩 research_supervisor<br/>Subgraph Entry"]

    SUPERVISOR -->|All research<br/>completed| REPORT[final_report_generation]
    REPORT --> END([END])

    style START fill:#e1f5ff,stroke:#333,stroke-width:2px
    style END fill:#e1f5ff,stroke:#333,stroke-width:2px
    style CLARIFY fill:#fff3cd,stroke:#333
    style BRIEF fill:#d4edda,stroke:#333
    style SUPERVISOR fill:#f8dce0,stroke:#333,stroke-width:3px
    style REPORT fill:#cce5ff,stroke:#333
    style USER fill:#fff3cd,stroke:#666,stroke-dasharray: 5 5
```

Supervisor Subgraph

Orchestrates parallel research by dynamically spawning researcher subgraphs:

```mermaid
graph TB
    subgraph SUPERVISOR["🧩 Supervisor Subgraph"]
        START_S([START]) --> S[supervisor<br/>Lead Researcher]

        S --> ST{supervisor_tools<br/>Tool Router}

        %% Tool executions
        ST -->|think_tool| THINK["πŸ’­ Strategic Reflection"]
        THINK --> S

        ST -->|ConductResearch| SPAWN["πŸš€ Dynamic Subgraph Spawning"]

        %% Dynamic spawning detail
        subgraph DYNAMIC["πŸ”„ Dynamic Concurrency Control"]
            SPAWN --> CHECK{"Within<br/>max_concurrent<br/>limit?"}
            CHECK -->|Yes| RESEARCHER["🧩 researcher_subgraph<br/>(Instance N)"]
            CHECK -->|No| OVERFLOW["⚠️ Overflow:<br/>Skip with error"]
            RESEARCHER -->|async gather| COLLECT["πŸ“Š Collect Results"]
            OVERFLOW --> COLLECT
        end

        COLLECT --> UPDATE["πŸ“ Update State:<br/>β€’ notes<br/>β€’ raw_notes"]
        UPDATE --> S

        ST -->|ResearchComplete| DONE_S[Done]

        %% Loop conditions
        ST -.->|Iterations <<br/>max_researcher<br/>_iterations| S
    end

    style START_S fill:#e1f5ff
    style DONE_S fill:#d4edda
    style S fill:#f8dce0,stroke:#333,stroke-width:3px
    style ST fill:#fff3cd,stroke:#333
    style SPAWN fill:#d4edda,stroke:#333,stroke-width:2px
    style DYNAMIC fill:#f0f8ff,stroke:#666,stroke-dasharray: 3 3
    style RESEARCHER fill:#cce5ff,stroke:#333
```
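The diagram's concurrency gate can be sketched with `asyncio`: tasks within the limit are spawned and gathered, and overflow tasks are skipped with an error note. This is a simplified illustration under stated assumptions; `run_researcher` stands in for the researcher subgraph and is not the actual Panda_Dive API.

```python
import asyncio

# Sketch of the supervisor's concurrency gate: tasks within the
# max_concurrent limit are spawned and gathered concurrently; the
# overflow is skipped with an error note, mirroring the diagram above.
# `run_researcher` is a hypothetical stand-in for the researcher subgraph.

async def delegate(tasks, run_researcher, max_concurrent=4):
    """Run at most max_concurrent research tasks; skip the rest with errors."""
    within = tasks[:max_concurrent]
    overflow = tasks[max_concurrent:]
    results = await asyncio.gather(*(run_researcher(t) for t in within))
    errors = [f"Skipped (over max_concurrent): {t}" for t in overflow]
    return list(results) + errors
```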

Researcher Subgraph

Executes individual research tasks with the 6-step retrieval quality loop:

```mermaid
graph TB
    subgraph RESEARCHER["🧩 Researcher Subgraph"]
        START_R([START]) --> R[researcher<br/>Research Assistant]

        R --> RT{researcher_tools<br/>Tool Router}

        %% Tool executions
        RT -->|think_tool| THINK_R["πŸ’­ Strategic<br/>Reflection"]
        THINK_R --> R

        RT -->|Search Tool| RQL["🎯 Retrieval Quality Loop"]

        %% Retrieval Quality Loop detail
        subgraph RQL_DETAIL["πŸ”„ Query β†’ Results β†’ Score β†’ Rerank"]
            RQL --> REWRITE["1️⃣ Query Rewriting<br/>Generate N variants"]
            REWRITE --> SEARCH["2️⃣ Search Execution<br/>tavily/duckduckgo"]
            SEARCH --> PARSE["3️⃣ Result Parsing<br/>β†’ Structured dicts"]
            PARSE --> SCORE["4️⃣ Relevance Scoring<br/>LLM: 0.0-1.0"]
            SCORE --> RERANK["5️⃣ Reranking<br/>+ Source weight"]
            RERANK --> FORMAT["6️⃣ Format Results<br/>For researcher"]

            %% State tracking
            STATE["πŸ“Š State Tracking:<br/>β€’ rewritten_queries<br/>β€’ relevance_scores<br/>β€’ reranked_results<br/>β€’ quality_notes"]
        end

        FORMAT --> UPDATE_R["πŸ“ Update State"]
        UPDATE_R --> R

        RT -->|MCP Tools| MCP["πŸ”§ MCP Tools<br/>(Dynamic Loading)"]
        MCP --> R

        RT -->|ResearchComplete| COMPRESS[compress_research]

        COMPRESS --> DONE_R[Done]

        %% Loop conditions
        RT -.->|tool_calls <<br/>max_react<br/>_tool_calls| R
    end

    style START_R fill:#e1f5ff
    style DONE_R fill:#d4edda
    style R fill:#cce5ff,stroke:#333,stroke-width:3px
    style RT fill:#fff3cd,stroke:#333
    style RQL fill:#f8dce0,stroke:#333,stroke-width:2px
    style RQL_DETAIL fill:#fff5f5,stroke:#666,stroke-dasharray: 3 3
    style RESEARCHER fill:#cce5ff,stroke:#333
    style STATE fill:#f0f8ff,stroke:#999
```

Architecture Highlights

| Layer | Components | Key Features |
|---|---|---|
| Main Graph | `clarify_with_user`, `write_research_brief`, `research_supervisor`, `final_report_generation` | User interaction, clarification loop, brief generation, report synthesis |
| Supervisor Subgraph | `supervisor`, `supervisor_tools`, dynamic spawning | Parallel research orchestration, concurrency control (`max_concurrent_research_units`), async subgraph spawning |
| Researcher Subgraph | `researcher`, `researcher_tools`, retrieval quality loop, `compress_research` | Individual research execution, 6-step retrieval quality (rewrite β†’ search β†’ parse β†’ score β†’ rerank β†’ format), MCP integration |

Data Flow

```
User Query
  β†’ Main Graph (Clarification β†’ Brief)
  β†’ Supervisor Subgraph (Parallel delegation)
    β†’ Researcher Subgraph Instance 1 (Quality Loop)
    β†’ Researcher Subgraph Instance 2 (Quality Loop)
    β†’ Researcher Subgraph Instance N (Quality Loop)
  β†’ Main Graph (Synthesis β†’ Report)
  β†’ User
```

Each researcher subgraph executes the full retrieval quality loop: Query Rewriting β†’ Search Execution β†’ Result Parsing β†’ Relevance Scoring β†’ Reranking β†’ Result Formatting, with all metrics tracked in state for observability.
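Chained together, the six steps look roughly like the sketch below. Every function here (`rewrite`, `search`, `score`) is a stub stand-in for illustration, not the Panda_Dive implementation.

```python
# End-to-end sketch of the 6-step retrieval quality loop with pluggable
# stage functions. `rewrite`, `search`, and `score` are hypothetical
# stand-ins, not the actual Panda_Dive code.

def quality_loop(query, search, rewrite, score, top_k=10):
    """Run rewrite β†’ search β†’ parse β†’ score β†’ rerank β†’ format for one query."""
    variants = rewrite(query)                              # 1. query rewriting
    raw = [r for q in variants for r in search(q)]         # 2-3. search + parse
    scored = [{**r, "score": score(r)} for r in raw]       # 4. relevance scoring
    ranked = sorted(scored, key=lambda r: r["score"],      # 5. reranking
                    reverse=True)[:top_k]
    return "\n".join(f"[{r['score']:.2f}] {r['title']}"    # 6. formatting
                     for r in ranked)
```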


πŸ“¦ Installation

Prerequisites

  • Python 3.11 or higher
  • API keys for your chosen LLM provider(s)
  • (Optional) Tavily API key if using Tavily search (DuckDuckGo requires no API key)

Install from source

```bash
# Clone the repository
git clone https://github.com/123yongming/Panda_Dive.git
cd Panda_Dive
```

Linux/macOS

```bash
# Create virtual environment with uv
uv venv
source .venv/bin/activate

# Install dependencies
uv sync
```

Windows

```powershell
# Create virtual environment
python -m venv venv
.\venv\Scripts\Activate

# Install uv and dependencies
pip install uv
uv pip install -r pyproject.toml
```

Alternative: Using pip directly

```bash
# Create virtual environment
python -m venv .venv

# Activate (Linux/macOS: source .venv/bin/activate, Windows: .venv\Scripts\activate)
source .venv/bin/activate

# Install in editable mode
pip install -e .
```

Configuration

Copy the example environment file and configure your API keys:

```bash
# Linux/macOS
cp .env.example .env

# Windows
copy .env.example .env
```

Edit .env with your credentials:

```
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
GOOGLE_API_KEY=your_google_key
TAVILY_API_KEY=your_tavily_key
LANGSMITH_API_KEY=your_langsmith_key
LANGSMITH_PROJECT=panda_dive
```
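Before a run, it can help to verify that the keys your chosen providers need are actually set. A small stdlib-only check (an illustrative helper, not part of Panda_Dive):

```python
import os

# Illustrative helper: report which required environment variables are
# unset. The variable names mirror the .env template above.

def check_env(required=("OPENAI_API_KEY",)):
    """Return the names of required environment variables that are unset."""
    return [k for k in required if not os.environ.get(k)]
```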

πŸš€ Quick Start

Basic Usage

```python
from Panda_Dive import Configuration, deep_researcher
from langchain_core.messages import HumanMessage

# Configure the researcher (DuckDuckGo is default - no API key needed!)
config = Configuration(
    max_researcher_iterations=6,
    max_concurrent_research_units=4,
    allow_clarification=True,
    model="openai:gpt-4o"
)

# Start research
topic = "What are the latest developments in quantum computing?"

result = deep_researcher.invoke(
    {"messages": [HumanMessage(content=topic)]},
    config=config.to_runnable_config()
)

print(result["messages"][-1].content)
```

Running with LangSmith

You can also run Panda_Dive as a LangGraph development server:

```bash
uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.11 langgraph dev --allow-blocking --host 0.0.0.0 --port 2026
```

This will start the development server on http://localhost:2026 with in-memory storage, allowing you to interact with the deep researcher through the LangSmith UI.


βš™οΈ Configuration

Key Options

| Parameter | Type | Default | Description |
|---|---|---|---|
| `search_api` | `str` | `"duckduckgo"` | Search API to use: `duckduckgo` (default), `tavily`, `exa`, `arxiv`, or `none` |
| `max_researcher_iterations` | `int` | `6` | Maximum iterations per researcher (1-10) |
| `max_react_tool_calls` | `int` | `6` | Maximum tool calls per ReAct loop (1-30) |
| `max_concurrent_research_units` | `int` | `4` | Parallel research tasks (1-20) |
| `allow_clarification` | `bool` | `True` | Ask clarifying questions before research |
| `model` | `str` | `"openai:gpt-4o"` | Default model for research |
| `query_variants` | `int` | `3` | Number of query variants for retrieval quality |
| `relevance_threshold` | `float` | `0.7` | Minimum relevance score threshold |
| `rerank_top_k` | `int` | `10` | Number of documents kept after reranking |
| `rerank_weight_source` | `str` | `"auto"` | Source weighting strategy for reranking |

πŸ” How It Works

Research Process

  1. Clarification (Optional)

    • Asks clarifying questions to understand research scope
    • User can confirm or modify the research brief
  2. Research Brief Generation

    • Creates a structured brief based on the topic
    • Identifies key areas to investigate
  3. Supervised Research

    • Supervisor delegates specific research tasks
    • Multiple researcher agents work in parallel
    • Each researcher explores their assigned subtopic
  4. Research Synthesis

    • Compresses individual findings to fit context
    • Synthesizes cross-cutting insights
  5. Final Report

    • Generates comprehensive, well-structured report
    • Includes citations and sources

πŸ§ͺ Evaluation

Panda_Dive includes a comprehensive evaluation framework using LangSmith to benchmark the deep research system against the "Deep Research Bench" dataset.

Environment Variables

Before running evaluations, ensure these environment variables are set:

| Variable | Required | Description |
|---|---|---|
| `LANGSMITH_API_KEY` | Yes | LangSmith API key for evaluation tracking |
| `OPENAI_API_KEY` | No* | OpenAI API key (if using OpenAI models) |
| `ANTHROPIC_API_KEY` | No* | Anthropic API key (if using Claude models) |
| `DEEPSEEK_API_KEY` | No* | DeepSeek API key (if using DeepSeek models) |

*Required only if using the respective provider's models.

Smoke Test (Quick Validation)

Run a quick smoke test on 2 examples to validate the setup:

```bash
# Basic smoke test (2 examples, default settings)
python tests/run_evaluate.py --smoke --dataset-name "deep_research_bench"
```

Supervisor Parallelism Evaluation

This evaluation measures both intended parallelism (tool-call count) and observed parallelism (span overlap) for the supervisor.

```bash
# Create the dataset (one-time setup)
python tests/create_supervisor_parallelism_dataset.py \
  --dataset-name "Panda_Dive: Supervisor Parallelism" \
  --source tests/prompt/supervisor_parallelism.jsonl

# Run the evaluation
python tests/run_evaluate.py \
  --dataset-name "Panda_Dive: Supervisor Parallelism" \
  --max-concurrency 1 \
  --experiment-prefix "supervisor-parallel"
```

Metrics produced:

  • tool_call_count_match: Whether actual tool calls match the reference count
  • parallel_overlap_ms: Total overlap time (ms) across trace spans
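Observed parallelism can be computed from trace spans as the total pairwise overlap of their time intervals. The sketch below shows one plausible way to do this, assuming each span is a `(start_ms, end_ms)` pair; the actual evaluator in `tests/run_evaluate.py` may compute it differently.

```python
# Sketch of computing observed parallelism from trace spans, where each
# span is a (start_ms, end_ms) pair. Sums pairwise interval overlaps;
# an assumption about the metric, not the evaluator's exact code.

def parallel_overlap_ms(spans):
    """Total pairwise overlap (ms) across a list of (start, end) spans."""
    total = 0
    for i in range(len(spans)):
        for j in range(i + 1, len(spans)):
            s1, e1 = spans[i]
            s2, e2 = spans[j]
            # Overlap of two intervals is min(ends) - max(starts), if positive.
            total += max(0, min(e1, e2) - max(s1, s2))
    return total
```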

Full Evaluation

Run a full evaluation on the entire dataset (⚠️ Warning: Expensive!):

```bash
# Full evaluation (all dataset examples)
python tests/run_evaluate.py --full
```

Configuration Options

| Flag | Default | Description |
|---|---|---|
| `--smoke` | - | Run smoke test (2 examples) |
| `--full` | - | Run full evaluation (all examples) |
| `--dataset-name` | `"Deep Research Bench"` | Dataset name in LangSmith |
| `--max-examples` | 2 (smoke) / all (full) | Maximum examples to evaluate |
| `--experiment-prefix` | Auto-generated | Prefix for experiment name |
| `--max-concurrency` | 2 | Maximum concurrent evaluations (max: 5) |
| `--timeout-seconds` | 1800 | Per-example timeout (seconds) |
| `--model` | From env/config | Model to use for evaluation |

Cost Warning

⚠️ Full evaluation runs can be expensive! A full run on the "Deep Research Bench" dataset can cost $50-200+ depending on the model used. Always:

  1. Run a smoke test first to validate setup
  2. Monitor LangSmith during the run
  3. Start with lower concurrency to control costs

Exporting Results

After evaluation, export results to JSONL format:

```bash
# Export results using experiment project name
python tests/extract_langsmith_data.py \
  --project-name "deep-research-eval-smoke-20250204-120000" \
  --model-name "gpt-4o" \
  --output-dir tests/expt_results/

# Force overwrite if file exists
python tests/extract_langsmith_data.py \
  --project-name "your-experiment-name" \
  --model-name "claude-3-5-sonnet" \
  --force
```

Export Options

| Flag | Required | Default | Description |
|---|---|---|---|
| `--project-name` | Yes | - | LangSmith project name containing the experiment runs |
| `--model-name` | Yes | - | Model name (used for the output filename) |
| `--dataset-name` | No | `"Deep Research Bench"` | Dataset name for validation |
| `--output-dir` | No | `tests/expt_results/` | Output directory for the JSONL file |
| `--force` | No | `false` | Overwrite the output file if it already exists |

πŸ§ͺ Development

Running Tests

```bash
# Run all tests
python -m pytest

# Run with verbose output
python -m pytest -v

# Run with coverage
python -m pytest --cov=Panda_Dive

# Run specific test
python -m pytest src/test_api.py::test_function_name
```

Linting and Formatting

```bash
# Check code style
ruff check .

# Auto-fix issues
ruff check --fix .

# Type checking
mypy src/Panda_Dive/
```

Code Style Guidelines

  • Python 3.10+ type hints (e.g., list[str], not List[str])
  • Google-style docstrings
  • Async/await patterns for all graph nodes
  • Proper error handling and logging

See AGENTS.md for detailed development guidelines.


πŸ“‚ Project Structure

```
Panda_Dive/
β”œβ”€β”€ docs/
β”‚   └── retrieval-quality-loop.md # Retrieval quality loop report
β”œβ”€β”€ src/
β”‚   └── Panda_Dive/
β”‚       β”œβ”€β”€ __init__.py           # Package exports
β”‚       β”œβ”€β”€ deepresearcher.py     # Main graph orchestration
β”‚       β”œβ”€β”€ configuration.py      # Pydantic configuration models
β”‚       β”œβ”€β”€ state.py              # TypedDict state definitions
β”‚       β”œβ”€β”€ prompts.py            # System prompts for LLMs
β”‚       └── utils.py              # Tool wrappers and helpers
β”œβ”€β”€ pyproject.toml                # Project configuration
β”œβ”€β”€ .env.example                  # Environment variables template
β”œβ”€β”€ AGENTS.md                     # Agent development guidelines
└── README.md                     # This file
```

🀝 Contributing

We welcome contributions! Here's how to get started:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes following our code style guidelines
  4. Run tests and linting (pytest and ruff)
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

Development Workflow

  • Follow PEP 8 and our ruff configuration
  • Add tests for new features
  • Update documentation as needed
  • Ensure type hints are complete

πŸ“„ License

This project is licensed under the MIT License.


πŸ™ Acknowledgments

Built with:

  • LangGraph
  • LangChain

Made with ❀️ by PonyPan
