Panda_Dive - Deep Domain Research Tool
A powerful multi-agent deep research tool built with LangGraph and LangChain. Panda_Dive orchestrates multiple researcher agents to comprehensively explore any domain, synthesize findings, and generate detailed reports with retrieval quality safeguards.
- Features
- Recent Updates
- Showcase
- Architecture
- Installation
- Quick Start
- Configuration
- How It Works
- Documentation
- Evaluation
- Development
- Project Structure
- Contributing
- License
- Support
- Supervisory Agent: Intelligently delegates research tasks to multiple specialized researcher agents
- Concurrent Execution: Run up to 20 research tasks in parallel for maximum efficiency
- Dynamic Task Delegation: The supervisor adapts based on research progress and findings
Panda_Dive supports multiple LLM providers out of the box:
- OpenAI (GPT-4, GPT-3.5)
- Anthropic (Claude 3.5, Claude 3)
- DeepSeek (DeepSeek V3)
- Google (VertexAI, GenAI)
- Groq (Llama, Mixtral)
- AWS Bedrock
Configure different models for different research stages:
- Research queries
- Information compression
- Summarization
- Final report generation
- Automatic Truncation: Intelligently handles token limit errors
- Retry Logic: Robust retry mechanism for failed tool calls
- Context Optimization: Compresses research findings to stay within limits
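The truncation safeguard can be pictured with a minimal sketch: keep the most recent findings that fit within a budget. This is illustrative only; the function and parameter names below are not Panda_Dive's actual API, and a real implementation would count model tokens rather than characters.

```python
# Hypothetical sketch of the token-limit safeguard (names are illustrative).
# Keeps the newest findings whose combined size fits a character budget.

def truncate_findings(findings: list[str], budget_chars: int = 2000) -> list[str]:
    """Keep the newest findings that fit the budget, in chronological order."""
    kept: list[str] = []
    used = 0
    for note in reversed(findings):  # walk newest-first
        if used + len(note) > budget_chars:
            break
        kept.append(note)
        used += len(note)
    return list(reversed(kept))  # restore chronological order

notes = ["old " * 300, "recent finding A", "recent finding B"]
print(truncate_findings(notes, budget_chars=100))
```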
- MCP Integration: Extend tools via Model Context Protocol
- LangSmith Tracing: Full observability and debugging support
- Multiple Search APIs: Tavily, DuckDuckGo, Exa, ArXiv (DuckDuckGo is now the default - privacy-friendly and no API key required)
- Query Rewriting: Expand queries to improve recall (supports both Tavily and DuckDuckGo)
- Relevance Scoring: Score each result on a 0.0-1.0 scale
- Reranking: Prioritize higher-quality sources before synthesis
- Robust Error Handling: Graceful handling of connection issues for DuckDuckGo searches
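The score-then-rerank step can be sketched as follows. This is illustrative, not Panda_Dive's actual code: in the real tool an LLM assigns the 0.0-1.0 relevance score, and the source-weight table here is a hypothetical stand-in for the `rerank_weight_source` strategy.

```python
# Illustrative sketch of relevance scoring + reranking (not Panda_Dive's API):
# drop results below the relevance threshold, then reorder the rest by score
# adjusted with a hypothetical per-source weight, keeping the top k.

SOURCE_WEIGHT = {"arxiv.org": 1.2, "example-blog.com": 0.9}  # hypothetical weights

def rerank(results: list[dict], threshold: float = 0.7, top_k: int = 10) -> list[dict]:
    kept = [r for r in results if r["score"] >= threshold]
    kept.sort(key=lambda r: r["score"] * SOURCE_WEIGHT.get(r["source"], 1.0),
              reverse=True)
    return kept[:top_k]

results = [
    {"url": "a", "source": "arxiv.org", "score": 0.8},
    {"url": "b", "source": "example-blog.com", "score": 0.9},
    {"url": "c", "source": "example-blog.com", "score": 0.4},  # below threshold
]
print([r["url"] for r in rerank(results)])  # → ['a', 'b']
```

Here `a` outranks `b` despite the lower raw score because the source weight boosts it (0.8 × 1.2 = 0.96 vs. 0.9 × 0.9 = 0.81).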
- Added a polished local frontend demo view for Panda_Dive research workflows
- Added a complete sample context research report for quick output reference
- Updated README with visual showcase and direct links to example assets
- Sample report: example/context research report.md
Preview topic: Systematic Investigation of Context in LLM-based Agent Systems
The sample report demonstrates:
- Conceptual overview and context taxonomy
- Design patterns (dispatcher, state channels, event sourcing)
- Multi-agent context lifecycle, trade-offs, and failure modes
- Open challenges and research directions for 2025-2026
Panda_Dive uses a sophisticated multi-agent graph architecture with three hierarchical layers: Main Graph (entry point), Supervisor Subgraph (orchestration), and Researcher Subgraph (execution).
Entry point handling user interaction, research brief generation, and final report synthesis:
graph TD
START([START]) --> CLARIFY[clarify_with_user]
CLARIFY --"need_clarification=True"--> USER["Return to User<br/>with question"]
USER -->|User response| CLARIFY
CLARIFY --"need_clarification=False"--> BRIEF[write_research_brief]
BRIEF --> SUPERVISOR["research_supervisor<br/>Subgraph Entry"]
SUPERVISOR -->|All research<br/>completed| REPORT[final_report_generation]
REPORT --> END([END])
style START fill:#e1f5ff,stroke:#333,stroke-width:2px
style END fill:#e1f5ff,stroke:#333,stroke-width:2px
style CLARIFY fill:#fff3cd,stroke:#333
style BRIEF fill:#d4edda,stroke:#333
style SUPERVISOR fill:#f8dce0,stroke:#333,stroke-width:3px
style REPORT fill:#cce5ff,stroke:#333
style USER fill:#fff3cd,stroke:#666,stroke-dasharray: 5 5
Orchestrates parallel research by dynamically spawning researcher subgraphs:
graph TB
subgraph SUPERVISOR["Supervisor Subgraph"]
START_S([START]) --> S[supervisor<br/>Lead Researcher]
S --> ST{supervisor_tools<br/>Tool Router}
%% Tool executions
ST -->|think_tool| THINK["Strategic Reflection"]
THINK --> S
ST -->|ConductResearch| SPAWN["Dynamic Subgraph Spawning"]
%% Dynamic spawning detail
subgraph DYNAMIC["Dynamic Concurrency Control"]
SPAWN --> CHECK{"Within<br/>max_concurrent<br/>limit?"}
CHECK -->|Yes| RESEARCHER["researcher_subgraph<br/>(Instance N)"]
CHECK -->|No| OVERFLOW["⚠️ Overflow:<br/>Skip with error"]
RESEARCHER -->|async gather| COLLECT["Collect Results"]
OVERFLOW --> COLLECT
end
COLLECT --> UPDATE["Update State:<br/>• notes<br/>• raw_notes"]
UPDATE --> S
ST -->|ResearchComplete| DONE_S[Done]
%% Loop conditions
ST -.->|Iterations <<br/>max_researcher<br/>_iterations| S
end
style START_S fill:#e1f5ff
style DONE_S fill:#d4edda
style S fill:#f8dce0,stroke:#333,stroke-width:3px
style ST fill:#fff3cd,stroke:#333
style SPAWN fill:#d4edda,stroke:#333,stroke-width:2px
style DYNAMIC fill:#f0f8ff,stroke:#666,stroke-dasharray: 3 3
style RESEARCHER fill:#cce5ff,stroke:#333
Executes individual research tasks with the 6-step retrieval quality loop:
graph TB
subgraph RESEARCHER["Researcher Subgraph"]
START_R([START]) --> R[researcher<br/>Research Assistant]
R --> RT{researcher_tools<br/>Tool Router}
%% Tool executions
RT -->|think_tool| THINK_R["Strategic<br/>Reflection"]
THINK_R --> R
RT -->|Search Tool| RQL["Retrieval Quality Loop"]
%% Retrieval Quality Loop detail
subgraph RQL_DETAIL["Query → Results → Score → Rerank"]
RQL --> REWRITE["1. Query Rewriting<br/>Generate N variants"]
REWRITE --> SEARCH["2. Search Execution<br/>tavily/duckduckgo"]
SEARCH --> PARSE["3. Result Parsing<br/>→ Structured dicts"]
PARSE --> SCORE["4. Relevance Scoring<br/>LLM: 0.0-1.0"]
SCORE --> RERANK["5. Reranking<br/>+ Source weight"]
RERANK --> FORMAT["6. Format Results<br/>For researcher"]
%% State tracking
STATE["State Tracking:<br/>• rewritten_queries<br/>• relevance_scores<br/>• reranked_results<br/>• quality_notes"]
end
FORMAT --> UPDATE_R["Update State"]
UPDATE_R --> R
RT -->|MCP Tools| MCP["MCP Tools<br/>(Dynamic Loading)"]
MCP --> R
RT -->|ResearchComplete| COMPRESS[compress_research]
COMPRESS --> DONE_R[Done]
%% Loop conditions
RT -.->|tool_calls <<br/>max_react<br/>_tool_calls| R
end
style START_R fill:#e1f5ff
style DONE_R fill:#d4edda
style R fill:#cce5ff,stroke:#333,stroke-width:3px
style RT fill:#fff3cd,stroke:#333
style RQL fill:#f8dce0,stroke:#333,stroke-width:2px
style RQL_DETAIL fill:#fff5f5,stroke:#666,stroke-dasharray: 3 3
style RESEARCHER fill:#cce5ff,stroke:#333
style STATE fill:#f0f8ff,stroke:#999
| Layer | Components | Key Features |
|---|---|---|
| Main Graph | clarify_with_user, write_research_brief, research_supervisor, final_report_generation | User interaction, clarification loop, brief generation, report synthesis |
| Supervisor Subgraph | supervisor, supervisor_tools, Dynamic Spawning | Parallel research orchestration, concurrency control (max_concurrent_research_units), async subgraph spawning |
| Researcher Subgraph | researcher, researcher_tools, Retrieval Quality Loop, compress_research | Individual research execution, 6-step retrieval quality loop (rewrite → search → parse → score → rerank → format), MCP integration |
User Query
  → Main Graph (Clarification → Brief)
  → Supervisor Subgraph (Parallel delegation)
      → Researcher Subgraph Instance 1 (Quality Loop)
      → Researcher Subgraph Instance 2 (Quality Loop)
      → Researcher Subgraph Instance N (Quality Loop)
  → Main Graph (Synthesis → Report)
  → User
Each researcher subgraph executes the full retrieval quality loop: Query Rewriting → Search Execution → Result Parsing → Relevance Scoring → Reranking → Result Formatting, with all metrics tracked in state for observability.
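The supervisor's parallel delegation can be sketched with plain asyncio. This is illustrative only: Panda_Dive's real supervisor spawns LangGraph subgraphs, and the semaphore here is a stand-in for the `max_concurrent_research_units` cap.

```python
# Minimal sketch of concurrency-capped parallel research (illustrative, not
# Panda_Dive's actual code): tasks run via asyncio.gather, with a semaphore
# limiting how many researchers are active at once.
import asyncio

MAX_CONCURRENT = 4  # stand-in for max_concurrent_research_units

async def run_researcher(topic: str, sem: asyncio.Semaphore) -> str:
    async with sem:             # at most MAX_CONCURRENT researchers at once
        await asyncio.sleep(0)  # placeholder for the real research work
        return f"notes on {topic}"

async def supervise(topics: list[str]) -> list[str]:
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    # gather preserves input order, mirroring how results map back to tasks
    return await asyncio.gather(*(run_researcher(t, sem) for t in topics))

notes = asyncio.run(supervise(["qubits", "error correction", "hardware"]))
print(notes)
```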
- Python 3.11 or higher
- API keys for your chosen LLM provider(s)
- (Optional) Tavily API key if using Tavily search (DuckDuckGo requires no API key)
# Clone the repository
git clone https://github.com/123yongming/Panda_Dive.git
cd Panda_Dive

# Create virtual environment with uv
uv venv
source .venv/bin/activate
# Install dependencies
uv sync

# Create virtual environment
python -m venv venv
.\venv\Scripts\Activate
# Install uv and dependencies
pip install uv
uv pip install -r pyproject.toml

# Create virtual environment
python -m venv .venv
# Activate (Linux/macOS: source .venv/bin/activate, Windows: .venv\Scripts\activate)
source .venv/bin/activate
# Install in editable mode
pip install -e .

Copy the example environment file and configure your API keys:
# Linux/macOS
cp .env.example .env
# Windows
copy .env.example .env

Edit .env with your credentials:
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
GOOGLE_API_KEY=your_google_key
TAVILY_API_KEY=your_tavily_key
LANGSMITH_API_KEY=your_langsmith_key
LANGSMITH_PROJECT=panda_dive

from Panda_Dive import Configuration, deep_researcher
from langchain_core.messages import HumanMessage
# Configure the researcher (DuckDuckGo is default - no API key needed!)
config = Configuration(
max_researcher_iterations=6,
max_concurrent_research_units=4,
allow_clarification=True,
model="openai:gpt-4o"
)
# Start research
topic = "What are the latest developments in quantum computing?"
result = deep_researcher.invoke(
{"messages": [HumanMessage(content=topic)]},
config=config.to_runnable_config()
)
print(result["messages"][-1].content)

You can also run Panda_Dive as a LangGraph development server:
uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.11 langgraph dev --allow-blocking --host 0.0.0.0 --port 2026

This will start the development server on http://localhost:2026 with in-memory storage, allowing you to interact with the deep researcher through the LangSmith UI.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `search_api` | str | `"duckduckgo"` | Search API to use: duckduckgo (default), tavily, exa, arxiv, or none |
| `max_researcher_iterations` | int | 6 | Maximum iterations per researcher (1-10) |
| `max_react_tool_calls` | int | 6 | Maximum tool calls per ReAct loop (1-30) |
| `max_concurrent_research_units` | int | 4 | Parallel research tasks (1-20) |
| `allow_clarification` | bool | True | Ask clarifying questions before research |
| `model` | str | `"openai:gpt-4o"` | Default model for research |
| `query_variants` | int | 3 | Number of query variants for retrieval quality |
| `relevance_threshold` | float | 0.7 | Minimum relevance score threshold |
| `rerank_top_k` | int | 10 | Number of documents kept after reranking |
| `rerank_weight_source` | str | `"auto"` | Source weighting strategy for reranking |
1. Clarification (Optional)
   - Asks clarifying questions to understand research scope
   - User can confirm or modify the research brief
2. Research Brief Generation
   - Creates a structured brief based on the topic
   - Identifies key areas to investigate
3. Supervised Research
   - Supervisor delegates specific research tasks
   - Multiple researcher agents work in parallel
   - Each researcher explores their assigned subtopic
4. Research Synthesis
   - Compresses individual findings to fit context
   - Synthesizes cross-cutting insights
5. Final Report
   - Generates a comprehensive, well-structured report
   - Includes citations and sources
Panda_Dive includes a comprehensive evaluation framework using LangSmith to benchmark the deep research system against the "Deep Research Bench" dataset.
Before running evaluations, ensure these environment variables are set:
| Variable | Required | Description |
|---|---|---|
| `LANGSMITH_API_KEY` | Yes | LangSmith API key for evaluation tracking |
| `OPENAI_API_KEY` | No* | OpenAI API key (if using OpenAI models) |
| `ANTHROPIC_API_KEY` | No* | Anthropic API key (if using Claude models) |
| `DEEPSEEK_API_KEY` | No* | DeepSeek API key (if using DeepSeek models) |
*Required only if using the respective provider's models.
Run a quick smoke test on 2 examples to validate the setup:
# Basic smoke test (2 examples, default settings)
python tests/run_evaluate.py --smoke --dataset-name "deep_research_bench"
This evaluation measures both intended parallelism (tool-call count) and observed parallelism (span overlap) for the supervisor.
# Create the dataset (one-time setup)
python tests/create_supervisor_parallelism_dataset.py \
--dataset-name "Panda_Dive: Supervisor Parallelism" \
--source tests/prompt/supervisor_parallelism.jsonl
# Run the evaluation
python tests/run_evaluate.py \
--dataset-name "Panda_Dive: Supervisor Parallelism" \
--max-concurrency 1 \
--experiment-prefix "supervisor-parallel"

Metrics produced:
- tool_call_count_match: Whether actual tool calls match the reference count
- parallel_overlap_ms: Total overlap time (ms) across trace spans
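The observed-parallelism metric can be sketched as a sweep over span boundaries. This is illustrative only: the real evaluator reads span timings from LangSmith traces, and the function name here is hypothetical.

```python
# Sketch of computing parallel overlap from (start, end) spans in milliseconds:
# sum the time during which two or more spans are active simultaneously.
# Illustrative only -- not the actual evaluator code.

def parallel_overlap_ms(spans: list[tuple[int, int]]) -> int:
    # Turn each span into +1 (start) / -1 (end) events, sorted by time;
    # at equal times, ends sort before starts, so touching spans don't overlap.
    events = sorted((t, d) for s, e in spans for t, d in ((s, 1), (e, -1)))
    active = 0
    overlap = 0
    prev = None
    for t, d in events:
        if prev is not None and active >= 2:
            overlap += t - prev  # interval [prev, t) had >= 2 active spans
        active += d
        prev = t
    return overlap

# Two researcher spans overlap from t=100 to t=250; a third runs alone.
print(parallel_overlap_ms([(0, 250), (100, 300), (400, 500)]))  # → 150
```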
Run a full evaluation on the entire dataset:
# Full evaluation (all dataset examples)
python tests/run_evaluate.py --full
| Flag | Default | Description |
|---|---|---|
| `--smoke` | - | Run smoke test (2 examples) |
| `--full` | - | Run full evaluation (all examples) |
| `--dataset-name` | "Deep Research Bench" | Dataset name in LangSmith |
| `--max-examples` | 2 (smoke) / all (full) | Maximum examples to evaluate |
| `--experiment-prefix` | Auto-generated | Prefix for experiment name |
| `--max-concurrency` | 2 | Maximum concurrent evaluations (max: 5) |
| `--timeout-seconds` | 1800 | Per-example timeout (seconds) |
| `--model` | From env/config | Model to use for evaluation |
- Run a smoke test first to validate setup
- Monitor LangSmith during the run
- Start with lower concurrency to control costs
After evaluation, export results to JSONL format:
# Export results using experiment project name
python tests/extract_langsmith_data.py \
--project-name "deep-research-eval-smoke-20250204-120000" \
--model-name "gpt-4o" \
--output-dir tests/expt_results/
# Force overwrite if file exists
python tests/extract_langsmith_data.py \
--project-name "your-experiment-name" \
--model-name "claude-3-5-sonnet" \
--force

| Flag | Required | Default | Description |
|---|---|---|---|
| `--project-name` | Yes | - | LangSmith project name containing the experiment runs |
| `--model-name` | Yes | - | Model name (used for output filename) |
| `--dataset-name` | No | "Deep Research Bench" | Dataset name for validation |
| `--output-dir` | No | `tests/expt_results/` | Output directory for JSONL file |
| `--force` | No | false | Overwrite existing file if it exists |
# Run all tests
python -m pytest
# Run with verbose output
python -m pytest -v
# Run with coverage
python -m pytest --cov=Panda_Dive
# Run specific test
python -m pytest src/test_api.py::test_function_name

# Check code style
ruff check .
# Auto-fix issues
ruff check --fix .
# Type checking
mypy src/Panda_Dive/

- Python 3.10+ type hints (e.g., `list[str]`, not `List[str]`)
- Google-style docstrings
- Async/await patterns for all graph nodes
- Proper error handling and logging
See AGENTS.md for detailed development guidelines.
Panda_Dive/
├── docs/
│   └── retrieval-quality-loop.md    # Retrieval quality loop report
├── src/
│   └── Panda_Dive/
│       ├── __init__.py              # Package exports
│       ├── deepresearcher.py        # Main graph orchestration
│       ├── configuration.py         # Pydantic configuration models
│       ├── state.py                 # TypedDict state definitions
│       ├── prompts.py               # System prompts for LLMs
│       └── utils.py                 # Tool wrappers and helpers
├── pyproject.toml                   # Project configuration
├── .env.example                     # Environment variables template
├── AGENTS.md                        # Agent development guidelines
└── README.md                        # This file
We welcome contributions! Here's how to get started:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes following our code style guidelines
4. Run tests and linting (`pytest` and `ruff`)
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request
- Follow PEP 8 and our ruff configuration
- Add tests for new features
- Update documentation as needed
- Ensure type hints are complete
This project is licensed under the MIT License.
Built with:
- LangGraph - Graph-based orchestration
- LangChain - LLM application framework
- Pydantic - Data validation
- Read AGENTS.md for development guidelines
- Report issues on GitHub Issues
- Explore more projects by PonyPan
Made with ❤️ by PonyPan
