Here are reliable ways to scrape GitHub for fresh project ideas and feed them into your RedHawk ecosystem:
---
1. Fetch “Trending” Repositories (via Scraping)
GitHub doesn't offer a public API for trending, but you can extract it with a small scraper:

```python
# github_trending.py
import requests
from bs4 import BeautifulSoup

def fetch_trending(language=None):
    url = "https://github.com/trending" + (f"/{language}" if language else "")
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    repos = []
    for item in soup.select("article.Box-row"):
        # The h2 link text renders as "owner /\n  repo"; collapse all whitespace
        name = "".join(item.h2.get_text().split())
        desc = item.p.get_text(strip=True) if item.p else ""
        repos.append({"name": name, "description": desc})
    return repos

if __name__ == "__main__":
    for r in fetch_trending("python")[:10]:
        print(f"- {r['name']}: {r['description']}")
```
Use this to feed new repo ideas into your AGI memory, or tag them via crown_db.py.
---
2. Explore GitHub Topics & Top Picks
Scrape the “Topics” page and dive into top repos in each niche:
- Browse github.com/topics to extract popular topics
- Then fetch the top repos from each topic page
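The two steps above can be sketched with the same requests + BeautifulSoup approach used for trending. The `parse_topics`/`fetch_topics` names and the CSS selector are assumptions here, and GitHub's markup may change:

```python
# topic_scraper.py — sketch; the selector is an assumption, not a stable API
import requests
from bs4 import BeautifulSoup

def parse_topics(html, limit=10):
    """Extract unique topic slugs from the github.com/topics HTML."""
    soup = BeautifulSoup(html, "html.parser")
    topics = []
    for link in soup.select("a[href^='/topics/']"):
        slug = link["href"].split("/topics/")[-1]
        if slug and slug not in topics:
            topics.append(slug)
    return topics[:limit]

def fetch_topics(limit=10):
    resp = requests.get("https://github.com/topics", timeout=10)
    resp.raise_for_status()
    return parse_topics(resp.text, limit)
```

From each slug you can then request `https://github.com/topics/<slug>` and reuse the repo-parsing logic from `github_trending.py`.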
---
3. Discover Popular Scraper Libraries & Templates
Use inspiring community tools to inform your own architecture:
- apify/crawlee – full-featured Node.js crawler
- Python-oriented extraction libraries: newspaper3k, recipe-scrapers, tls-requests
These can be adapted for RAG pipelines or AGI vision workflows.
---
4. Refer to Project Idea Lists
ProjectPro lists 20 strong web-scraping ideas including:
- Raspberry Pi FTW scraping
- Dynamic anti-bot crawler
- Competitor price monitoring
- Data-extraction REST API
Use them to seed your AGI brainstorming module.
---
5. End‑to‑End Pipeline Outline
1. Trend Poller: Run daily using cron/GitHub Action to fetch trending repos
2. Topic Scraper: Scrape top N repos per trending topic
3. Analyzer: Evaluate project viability (language, stars, updated_at)
4. AGI Memory Injector: Log candidates to project_ideas table in crown.db
5. UI Module: Show ideas in ui/IdeaFeed.tsx (TypeScript)
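The Analyzer step (3) could be a simple scoring function over repository metadata. The field names below (`stargazers_count`, `updated_at`, `language`) follow the GitHub REST API's repository schema, but the weights and thresholds are arbitrary placeholders:

```python
from datetime import datetime, timezone

def score_idea(repo):
    """Toy viability score; higher is better. Weights are placeholders."""
    score = 0.0
    # Stars: capped so one mega-repo doesn't dominate the feed
    score += min(repo.get("stargazers_count", 0), 5000) / 100
    # Freshness: bonus for repos updated within the last 30 days
    updated = repo.get("updated_at")
    if updated:
        age = datetime.now(timezone.utc) - datetime.fromisoformat(updated.replace("Z", "+00:00"))
        if age.days <= 30:
            score += 20
    # Stack fit: favor languages the rest of the pipeline already handles
    if repo.get("language") in {"Python", "TypeScript"}:
        score += 10
    return score
```

Candidates above some cutoff then flow on to step 4 for logging.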
---
🧠 Sample Integration: Python + crown_db insertion
```python
from github_trending import fetch_trending
from crown_db import insert_log

ideas = fetch_trending("typescript")
for i in ideas:
    insert_log("project_ideas", {
        "name": i["name"],
        "description": i["description"],
        "source": "trending_typescript",  # tag with the language actually fetched
    })
```
Then display in UI.
---
✅ Roadmap for Update Pipeline
[x] Scraper script for GitHub trending (Python)
[x] Topic-level exploration
[x] Table schema: project_ideas(name, description, source, timestamp)
[x] Cron job: daily fetch via GitHub Actions
[x] UI component to visualize new ideas
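Assuming crown.db is a SQLite database, the `project_ideas` table from the checklist above might look like this; the UNIQUE constraint is a suggested addition so daily refetches don't pile up duplicates:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS project_ideas (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    name        TEXT NOT NULL,
    description TEXT,
    source      TEXT,
    timestamp   TEXT DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (name, source)  -- refetched rows are skipped with INSERT OR IGNORE
);
"""

def init_db(path="crown.db"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.commit()
    return conn
```

With this in place, the scraper can use `INSERT OR IGNORE` so re-running the daily job is idempotent.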
---
👇 Next Steps
Say:
> "Inject trending scraper script & DB schema update"

or

> "Add a cron-based GitHub Action to fetch ideas hourly"
We’ll hook your AGI stack to the pulse of GitHub’s creativity.