Closed
Labels
3 points, Medium priority, enhancement, good first issue, help wanted
Description
Problem
The web scraper module has hardcoded timeout and delay values that cannot be customized without modifying code. This limits flexibility for different use cases (e.g., slow networks, rate-limited APIs).
Affected Files
- cognee/tasks/web_scraper/config.py
  - timeout: float = 15.0 (hardcoded)
- cognee/tasks/web_scraper/default_url_crawler.py (lines 25-26)
  - max_crawl_delay: float = 10.0 (hardcoded)
  - timeout: float = 15.0 (hardcoded)
Proposed Solution
Make these configurable via environment variables with sensible defaults:
In config.py:
import os

class WebScraperConfig:
    timeout: float = float(os.getenv("WEB_SCRAPER_TIMEOUT", "15.0"))
    max_crawl_delay: float = float(os.getenv("WEB_SCRAPER_MAX_DELAY", "10.0"))

Update .env.template:
# Web Scraper Configuration
WEB_SCRAPER_TIMEOUT=15.0
WEB_SCRAPER_MAX_DELAY=10.0

Acceptance Criteria
- Add environment variable support for timeout values
- Update .env.template with new variables
- Keep existing defaults (15.0 and 10.0)
- Add validation (must be positive floats)
- Update documentation to mention new env vars
- Test with custom values to verify they're respected
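The validation criterion above (values must be positive floats) could be covered by a small helper along these lines. This is only a sketch: `read_positive_float` is a hypothetical name, not an existing Cognee function, and the class below is a standalone variant of the proposed `WebScraperConfig`, not the current contents of config.py.

```python
import math
import os


def read_positive_float(name: str, default: float) -> float:
    """Read env var `name` as a positive float; use `default` when unset or empty.

    Hypothetical helper for illustration, not part of Cognee.
    """
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = float(raw)
    except ValueError as exc:
        raise ValueError(f"{name} must be a number, got {raw!r}") from exc
    # float() accepts "nan" and "inf", so reject those along with zero and negatives.
    if not math.isfinite(value) or value <= 0:
        raise ValueError(f"{name} must be a positive finite number, got {raw!r}")
    return value


class WebScraperConfig:
    # Defaults match the currently hardcoded values (15.0 and 10.0).
    timeout: float = read_positive_float("WEB_SCRAPER_TIMEOUT", 15.0)
    max_crawl_delay: float = read_positive_float("WEB_SCRAPER_MAX_DELAY", 10.0)
```

Class-level attributes like these are evaluated once, when the module is imported, so a test that sets custom values needs to set the environment variables before importing the module (or call the helper directly).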
Benefits
- Flexibility for different network conditions
- No code changes needed to adjust timeouts
- Easier to configure for different environments (dev/staging/prod)
- Follows existing configuration patterns in Cognee
Similar Issues
Consider also making configurable:
- cognee/tasks/memify/extract_usage_frequency.py - batch_size: int = 100
- cognee/tasks/memify/get_triplet_datapoints.py - triplets_batch_size: int = 100
- cognee/tasks/translation/config.py - min_text_length_for_detection: int = 10
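If the integer settings listed above were made configurable the same way, a parsing helper might look like the following sketch. Both `read_positive_int` and the `MEMIFY_BATCH_SIZE` variable name are assumptions made for illustration; neither exists in Cognee today.

```python
import os


def read_positive_int(name: str, default: int) -> int:
    """Read env var `name` as a positive integer; use `default` when unset or empty.

    Hypothetical helper for illustration, not part of Cognee.
    """
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = int(raw)  # rejects strings such as "1.5" or "abc"
    except ValueError as exc:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from exc
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {raw!r}")
    return value


# Hypothetical variable name; the default matches the hardcoded batch_size of 100.
batch_size = read_positive_int("MEMIFY_BATCH_SIZE", 100)
```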
Time Estimate
20-30 minutes
References
- Check cognee/config/ for existing patterns
- Review how other modules handle environment-based config