Releases: JuaanReis/pepeScraper
speed?
pepeScraper - Sports car
Okay, I know the title is kind of confusing. This version has more search options and absurd speed (I could say it runs in the blink of an eye, but the human average is... oh fuck you, you don't even care about that information).
Flags
This release adds the following flags:
--title, -t <w>: Applies the keyword to the thread title.
It works like the search box on Google or YouTube, or other video sites (now, which videos are up to you).
--proxy, -p <w>: Applies the proxy to the requests.
This is used to mask requests (I don't know why anyone would activate this... what are you trying to hide?)
--log <w>: Saves a very detailed log of your search to the file path passed to the flag.
I did this to spy on you, but someone told me not to leave it on automatic mode (my readme file says it doesn't save anything, so I won't contradict that).
--all-boards, -ab: Shows a detailed table of the boards available on 4chan.
The car board is really cool. (There's no point in pretending you don't go to the +18 boards.)
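For anyone wondering how flags like these fit together, here is a minimal sketch using Python's standard argparse module. Only the flag names come from the list above; the parser itself, the metavar names, and the help strings are assumptions for illustration, not pepeScraper's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the flag names are taken from the release notes.
    parser = argparse.ArgumentParser(prog="pepeScraper")
    parser.add_argument("--title", "-t", metavar="KEYWORD",
                        help="keyword to match against thread titles")
    parser.add_argument("--proxy", "-p", metavar="URL",
                        help="proxy to route requests through")
    parser.add_argument("--log", metavar="PATH",
                        help="file path for the detailed search log")
    parser.add_argument("--all-boards", "-ab", action="store_true",
                        help="show a table of the available boards")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["-t", "miata", "--log", "search.log"])
print(args.title, args.log, args.all_boards)  # miata search.log False
```

Flags that take a value default to None when omitted, so the program can tell "no proxy" apart from an empty string.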
Configuration file
- config.py: This file has various extra settings that can be modified, from the output color to the program's final speed, so if you're a developer or just a daredevil, feel free to look for whatever you want there. (Location: project root.)
- config_net.py: This file controls client-side request behavior, so you can either make the program faster or render it unusable; only modify it with the help of ChatGPT. (Location: ./src/network/config_net.py)
Speed
In controlled tests (by no one) on an optimized version of pepeScraper, the program reached 215 req/s; try to do better and fail miserably.
- Maximum speed: 215 req/s
- Total: 151 requests in 0.70 seconds
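The two numbers above are the same measurement: dividing the request count by the elapsed time gives the rate. A tiny sketch of that arithmetic (the function name is made up for illustration):

```python
def requests_per_second(request_count: int, elapsed_seconds: float) -> float:
    """Average throughput over one run."""
    return request_count / elapsed_seconds


rate = requests_per_second(151, 0.70)
print(f"{rate:.1f} req/s")  # 215.7 req/s, reported above as 215
```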
first version
pepeScraper - a new scraper for 4chan (faster and more reliable)
This project aims to help users find specific content on 4chan quickly and easily. If you'd like (not that I'm begging), please give the project a star; maybe someone important will notice it and I'll finally find a worthwhile project.
Search speed
pepeScraper is fast because it uses HTTP/2 through the httpx library, threads from ThreadPoolExecutor (part of Python's standard concurrent.futures module), and extensive optimization in every part of the code.
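A rough sketch of that combination, not the project's actual code: the standard library's ThreadPoolExecutor fans requests out across threads, and an httpx client with HTTP/2 enabled does the fetching. The function names, the worker count, and the timeout are assumptions. HTTP/2 support in httpx needs the optional extra (pip install "httpx[http2]"), so the import is kept inside the factory and the pool logic can be tried without it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def fetch_all(urls: Iterable[str], fetch: Callable[[str], str],
              max_workers: int = 8) -> List[str]:
    """Run fetch(url) across a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


def make_httpx_fetcher() -> Callable[[str], str]:
    """Build a fetch function backed by one shared HTTP/2-enabled client."""
    # Imported lazily so fetch_all works even when httpx is not installed.
    import httpx

    # One client is shared so connections can be reused between requests.
    client = httpx.Client(http2=True, timeout=10.0)

    def fetch(url: str) -> str:
        response = client.get(url)
        response.raise_for_status()
        return response.text

    return fetch


if __name__ == "__main__":
    # Offline demo with a stand-in fetcher instead of real network calls.
    print(fetch_all(["a", "b", "c"], lambda url: url.upper(), max_workers=2))
```

For real requests, pass make_httpx_fetcher() as the fetch argument. Keep max_workers modest on a weak machine.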
Post search
pepeScraper uses the 4chan API for these searches; perhaps in future releases it will have a real HTML scraper (I don't know whether that would be slower).
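As an illustration of an API-based title search, the sketch below filters a board catalog by keyword. It assumes the public read-only catalog endpoint and its JSON layout (a list of pages, each holding a "threads" list whose entries carry "no" and an optional "sub" subject). The function name and the sample data are invented, and this is not necessarily how pepeScraper does it.

```python
from typing import Any, Dict, List

# Assumed endpoint of the public read-only API; fill in a board name such as "o".
CATALOG_URL = "https://a.4cdn.org/{board}/catalog.json"


def find_threads_by_title(catalog: List[Dict[str, Any]], keyword: str) -> List[int]:
    """Return thread numbers whose subject contains keyword (case-insensitive)."""
    needle = keyword.lower()
    matches = []
    for page in catalog:
        for thread in page.get("threads", []):
            # "sub" is missing when a thread was posted without a subject.
            subject = thread.get("sub", "")
            if needle in subject.lower():
                matches.append(thread["no"])
    return matches


# Made-up sample in the assumed catalog shape, so the example runs offline.
sample_catalog = [
    {"page": 1, "threads": [{"no": 101, "sub": "Miata general"},
                            {"no": 102, "com": "no subject here"}]},
    {"page": 2, "threads": [{"no": 203, "sub": "Project car thread"}]},
]
print(find_threads_by_title(sample_catalog, "car"))  # [203]
```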
Limits
In all my tests, the fastest time was 1.52 seconds for a thread, with an average of 132 requests/second, perhaps reaching the limit of Python or the API (maybe my code isn't that good). Don't run too many searches (the API might return a 429 error), and be careful what you use the program for (criminal content, for example). If your laptop is weak, don't use high thread counts (my laptop crashed).
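One common way to cope with a 429 is to wait and retry with a growing delay. The sketch below shows that idea in isolation; it is an assumption about how a caller might handle rate limiting, not something pepeScraper is documented to do. The function name, the attempt count, and the delays are all hypothetical.

```python
import time
from typing import Callable, Tuple


def get_with_backoff(fetch: Callable[[], Tuple[int, str]],
                     max_attempts: int = 4,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch() until it stops returning HTTP 429, doubling the wait each time.

    fetch returns (status_code, body). Any status other than 429 is handed
    back to the caller as-is; other error codes are not handled here.
    """
    for attempt in range(max_attempts):
        status, body = fetch()
        if status != 429:
            return body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")


if __name__ == "__main__":
    # Offline demo: two rate-limited responses, then success. No real waiting.
    canned = iter([(429, ""), (429, ""), (200, "ok")])
    waits = []
    print(get_with_backoff(lambda: next(canned), sleep=waits.append), waits)
```

Passing the sleep function in as a parameter keeps the helper easy to try out without actually pausing.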