When using `scrape` or `batch_scrape`, choose the right format:
- **JSON format (recommended for most cases):** Use when you need specific data from a page. Define a schema based on what you need to extract. This keeps responses small and avoids context window overflow.
- **Markdown format (use sparingly):** Only when you genuinely need the full page content, such as reading an entire article for summarization or analyzing page structure.
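The size difference is the whole point: a schema-targeted JSON result is typically a small fraction of a full-page markdown dump. A rough illustration with fabricated payloads (illustration only, not real API output):

```python
import json

# Fabricated stand-ins for the two response shapes (illustration only).
markdown_result = "# Product page\n" + "Navigation, footers, related items, reviews... " * 200
json_result = {"name": "Widget", "price": 9.99}

# The schema-targeted result is orders of magnitude smaller.
assert len(json.dumps(json_result)) < len(markdown_result) / 10
```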
## Available Tools
Scrape content from a single URL with advanced options.

**Not recommended for:**

- Extracting content from multiple pages (use batch_scrape for known URLs, or map + batch_scrape to discover URLs first, or crawl for full page content)
- When you're unsure which page contains the information (use search)
- When you need structured data (use extract)
**Common mistakes:**
- Using scrape for a list of URLs (use batch_scrape instead).
- Using markdown format by default (use JSON format to extract only what you need).
**Choosing the right format:**
- **JSON format (preferred):** For most use cases, use JSON format with a schema to extract only the specific data needed. This keeps responses focused and prevents context window overflow.
- **Markdown format:** Only when the task genuinely requires full page content (e.g., summarizing an entire article, analyzing page structure).
**Prompt Example:**
> "Get the product details from https://example.com/product."
**Usage Example (JSON format - preferred):**
```json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com/product",
    "formats": [{
      "type": "json",
      "prompt": "Extract the product information",
      "schema": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "price": { "type": "number" },
          "description": { "type": "string" }
        },
        "required": ["name", "price"]
      }
    }]
  }
}
```
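Since the extraction result mirrors the schema, a client can sanity-check it before use. A minimal sketch (plain Python, not part of the Firecrawl API; it only checks `required` keys, not types):

```python
def missing_required(schema: dict, data: dict) -> list:
    """Return required schema fields absent from an extraction result."""
    return [field for field in schema.get("required", []) if field not in data]

product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "description": {"type": "string"},
    },
    "required": ["name", "price"],
}

# A result missing "price" would be flagged:
# missing_required(product_schema, {"name": "Widget"}) -> ["price"]
```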
**Usage Example (markdown format - when full content needed):**
```json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com/article",
    "formats": ["markdown"],
    "onlyMainContent": true
  }
}
```
**Usage Example (branding format - extract brand identity):**
```json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com",
    "formats": ["branding"]
  }
}
```
**Branding format:** Extracts comprehensive brand identity (colors, fonts, typography, spacing, logo, UI components) for design analysis or style replication.
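One way to put a branding result to work is to turn its color palette into CSS custom properties. A sketch under an assumed response shape (the `colors` field names here are illustrative, not a documented contract):

```python
# Assumed shape of a 'branding' result; real field names may differ.
branding = {
    "colors": {"primary": "#ff6b35", "background": "#ffffff"},
    "fonts": ["Inter"],
}

# Emit each color as a CSS custom property inside a :root block.
css_vars = ":root {\n" + "\n".join(
    f"  --{name}: {value};" for name, value in branding["colors"].items()
) + "\n}"
```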
**Returns:**

JSON structured data, markdown, branding profile, or other formats as specified.
When using a self-hosted instance, the extraction will use your configured LLM.
### 9. Agent Tool (`firecrawl_agent`)
Autonomous web research agent. This is a separate AI agent layer that independently browses the internet, searches for information, navigates through pages, and extracts structured data based on your query.

**How it works:**

The agent performs web searches, follows links, reads pages, and gathers data autonomously. This runs **asynchronously** - it returns a job ID immediately, and you poll `firecrawl_agent_status` to check when complete and retrieve results.

**Async workflow:**

1. Call `firecrawl_agent` with your prompt/schema → returns job ID
2. Do other work while the agent researches (can take minutes for complex queries)
3. Poll `firecrawl_agent_status` with the job ID to check progress
4. When status is "completed", the response includes the extracted data

**Best for:**

- Complex research tasks where you don't know the exact URLs
- Multi-source data gathering
- Finding information scattered across the web
- Tasks where you can do other work while waiting for results

**Not recommended for:**

- Simple single-page scraping where you know the URL (use scrape with JSON format - faster and cheaper)

**Arguments:**

- `prompt`: Natural language description of the data you want (required, max 10,000 characters)
- `urls`: Optional array of URLs to focus the agent on specific pages
- `schema`: Optional JSON schema for structured output

**Prompt Example:**

> "Find the founders of Firecrawl and their backgrounds"

**Usage Example (start agent, then poll for results):**
```json
{
  "name": "firecrawl_agent",
  "arguments": {
    "prompt": "Find the top 5 AI startups founded in 2024 and their funding amounts",
    "schema": {
      "type": "object",
      "properties": {
        "startups": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": { "type": "string" },
              "funding": { "type": "string" },
              "founded": { "type": "string" }
            }
          }
        }
      }
    }
  }
}
```
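When the job finishes, the extracted data follows the schema above. A sketch of consuming a completed status response (the envelope shape and sample values are assumptions for illustration, not documented output):

```python
# Assumed envelope of a completed firecrawl_agent_status response.
status_response = {
    "status": "completed",
    "data": {
        "startups": [
            {"name": "Acme AI", "funding": "$10M", "founded": "2024"},
        ]
    },
}

names = []
if status_response["status"] == "completed":
    # Field names match the schema passed to firecrawl_agent.
    names = [s["name"] for s in status_response["data"]["startups"]]
```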
Then poll with `firecrawl_agent_status` using the returned job ID.
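The start-then-poll loop can be wrapped in a small helper. A generic sketch (the `check` callable stands in for a `firecrawl_agent_status` call; the interval and attempt limits are arbitrary choices, not API requirements):

```python
import time

def poll_until_done(check, interval_s=10, max_attempts=60):
    """Call `check` until it reports completion, failure, or too many attempts."""
    for _ in range(max_attempts):
        result = check()
        if result["status"] == "completed":
            return result.get("data")
        if result["status"] == "failed":
            raise RuntimeError("agent job failed")
        time.sleep(interval_s)
    raise TimeoutError("agent job did not finish in time")
```

In practice, `check` would issue the `firecrawl_agent_status` call with the job ID returned by `firecrawl_agent`.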
**Usage Example (with URLs - agent focuses on specific pages):**
```json
{
  "name": "firecrawl_agent",
  "arguments": {
    "urls": ["https://example.com/features", "https://example.com/pricing"],
    "prompt": "Compare the features and pricing information from these pages"
  }
}
```
**Returns:**

- Job ID for status checking. Use `firecrawl_agent_status` to poll for results.

### 10. Check Agent Status (`firecrawl_agent_status`)

Check the status of an agent job and retrieve results when complete. Use this to poll for results after starting an agent.

**Polling pattern:** Agent research can take minutes for complex queries. Poll this endpoint periodically (e.g., every 10-30 seconds) until status is "completed" or "failed".
```json
{
  "name": "firecrawl_agent_status",
  "arguments": {
    "id": "550e8400-e29b-41d4-a716-446655440000"
  }
}
```
**Possible statuses:**
- `processing`: Agent is still researching - check back later
- `completed`: Research finished - response includes the extracted data
- `failed`: An error occurred
**Common mistakes:** Using scrape for a list of URLs (use batch_scrape instead). If batch_scrape doesn't work, fall back to calling scrape once per URL.
**Other Features:** Use 'branding' format to extract brand identity (colors, fonts, typography, spacing, UI components) for design analysis or style replication.
**IMPORTANT - Choosing the right format:**
- **Use JSON format (default):** For most use cases, use the JSON format with a schema to extract only the specific data needed. This keeps responses small and focused. Analyze the user's query to determine what fields to extract.
- **Use markdown format (rare):** Only when the task genuinely requires the full page content, such as: reading an entire article for summarization, analyzing the full structure of a page, or when the user needs to see all the content. This is uncommon.
**Usage Example (JSON format - preferred):**
\`\`\`json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com/product",
    "formats": [{
      "type": "json",
      "prompt": "Extract the product information",
      "schema": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "price": { "type": "number" },
          "description": { "type": "string" }
        },
        "required": ["name", "price"]
      }
    }]
  }
}
\`\`\`
**Usage Example (markdown format - when full content needed):**
\`\`\`json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com/article",
    "formats": ["markdown"],
    "onlyMainContent": true
  }
}
\`\`\`
**Usage Example (branding format - extract brand identity):**
\`\`\`json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com",
    "formats": ["branding"]
  }
}
\`\`\`
**Branding format:** Extracts comprehensive brand identity (colors, fonts, typography, spacing, logo, UI components) for design analysis or style replication.
**Performance:** Add maxAge parameter for 500% faster scrapes using cached data.
**Returns:** JSON structured data, markdown, branding profile, or other formats as specified.
${
  SAFE_MODE
    ? '**Safe Mode:** Read-only content extraction. Interactive actions (click, write, executeJavascript) are disabled for security.'
server.addTool({
  name: 'firecrawl_agent',
  description: `
Autonomous web research agent. This is a separate AI agent layer that independently browses the internet, searches for information, navigates through pages, and extracts structured data based on your query. You describe what you need, and the agent figures out where to find it.
**How it works:** The agent performs web searches, follows links, reads pages, and gathers data autonomously. This runs **asynchronously** - it returns a job ID immediately, and you poll \`firecrawl_agent_status\` to check when complete and retrieve results.

**Async workflow:**
1. Call \`firecrawl_agent\` with your prompt/schema → returns job ID
2. Do other work while the agent researches (can take minutes for complex queries)
3. Poll \`firecrawl_agent_status\` with the job ID to check progress
4. When status is "completed", the response includes the extracted data

**Best for:** Complex research tasks where you don't know the exact URLs; multi-source data gathering; finding information scattered across the web; tasks where you can do other work while waiting.
**Not recommended for:** Simple single-page scraping where you know the URL (use scrape with JSON format instead - faster and cheaper).

**Arguments:**
- prompt: Natural language description of the data you want (required, max 10,000 characters)
- urls: Optional array of URLs to focus the agent on specific pages
- schema: Optional JSON schema for structured output

**Prompt Example:** "Find the founders of Firecrawl and their backgrounds"
**Usage Example (start agent, then poll for results):**
\`\`\`json
{
  "name": "firecrawl_agent",
  "arguments": {
    "prompt": "Find the top 5 AI startups founded in 2024 and their funding amounts",
    "schema": {
      "type": "object",
      "properties": {
        "startups": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": { "type": "string" },
              "funding": { "type": "string" },
              "founded": { "type": "string" }
            }
          }
        }
      }
    }
  }
}
\`\`\`
Then poll with \`firecrawl_agent_status\` using the returned job ID.
**Usage Example (with URLs - agent focuses on specific pages):**
\`\`\`json
{
  "name": "firecrawl_agent",
  "arguments": {
    "urls": ["https://example.com/features", "https://example.com/pricing"],
    "prompt": "Compare the features and pricing information from these pages"
  }
}
\`\`\`
**Returns:** Job ID for status checking. Use \`firecrawl_agent_status\` to poll for results.
`,
parameters: z.object({
prompt: z.string().min(1).max(10000),
server.addTool({
name: 'firecrawl_agent_status',
description: `
Check the status of an agent job and retrieve results when complete. Use this to poll for results after starting an agent with \`firecrawl_agent\`.
**Polling pattern:** Agent research can take minutes for complex queries. Poll this endpoint periodically (e.g., every 10-30 seconds) until status is "completed" or "failed".
**Usage Example:**
\`\`\`json
{
  "name": "firecrawl_agent_status",
  "arguments": {
    "id": "550e8400-e29b-41d4-a716-446655440000"
  }
}
\`\`\`
**Possible statuses:**
- processing: Agent is still researching - check back later
- completed: Research finished - response includes the extracted data
- failed: An error occurred
748
769
749
770
**Returns:** Status, progress, and results (if completed) of the agent job.