gallery-dl support in page parsers #1733
|
Looks nice, but hydrus internals are already cryptic enough, so it would be nice if you could add a lot more inline comments about why you did things. |
|
I think there are these points worth noting
For my use case, the only issue with x.com is that scraping of the tweet pages is blocked; raw file URLs seem unaffected (so far, I have yet to encounter any rate limits or bans).
|
OK, another issue I found is that gallery-dl uses But it doesn't seem to be used in a lot of places, so it could be amenable to a PR over there for
|
The lengths people will go to avoid using hydownloader... 😔 You probably shouldn't read the configuration from ~ though, but from inside the Hydrus folder instead, as is done for mpv. The current way makes Hydrus no longer fully portable by moving the folder (and people might not want their general gallery-dl settings affecting Hydrus anyway). |
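A minimal sketch of the portability point above, assuming the config file sits directly inside the application folder. The function name `find_local_config` and the `install_dir` parameter are hypothetical and are not taken from the PR or from hydrus.

```python
from pathlib import Path
from typing import Optional


def find_local_config(install_dir: Path, filename: str = "gallery-dl.conf") -> Optional[Path]:
    """Return the config file inside the application folder, or None.

    Deliberately never falls back to the user's home directory, so moving
    the folder moves the configuration with it and a user's general
    gallery-dl settings are not picked up by accident.
    """
    candidate = install_dir / filename
    return candidate if candidate.is_file() else None
```

Returning `None` instead of searching `~` leaves the decision about defaults to the caller.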
|
Yeah, you're right, thank you. I updated the config loading like For the time being, I don't think this should be merged until I can figure out how to terminate long-running gallery-dl jobs; that may need a PR and waiting for a release on their end.
But this is the "simpler, less powerful solution" in the hydownloader README I found for myself ... 😃
That being said, I just want a backup solution so my subscriptions aren't borked. The hydownloader docs are really verbose and ward off users by saying "you need time and investment to learn this whole thing", when all I wanted was not a better hydrus downloader overall, but a hydrus downloader that wasn't broken for the sites I want. I shouldn't have to jump through all those hoops to gain back a single feature that is already supposed to be there and is critical to the whole app.
Assuming gallery-dl can play well enough with hydrus in the future, I can now fix the page parsing issues myself within hydrus. This is a useful power granted to all hydrus users that was not there before.
It's the difference between going from 0 -> 1 and 1 -> 10: the 10 is rewarding given the time and patience spent, but I'm happy with the 1. Just a difference in philosophy between users of both programs is all!
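On the open question of terminating long-running jobs mentioned above, one possible workaround is sketched here. It assumes the job can be run as a child process rather than inside the hydrus process, which is not what the PR describes. The helper name `run_with_deadline` is hypothetical.

```python
import subprocess
from typing import List, Optional


def run_with_deadline(cmd: List[str], timeout_s: float) -> Optional[int]:
    """Run cmd; return its exit code, or None if it was stopped at the deadline."""
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.terminate()  # ask the child to stop first
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()  # force it if the request is ignored
            proc.wait()
        return None
```

A child process can always be stopped from outside, at the cost of process startup overhead and passing results back through pipes or files.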
|
OK, another issue I found: the gallery-dl config is shared globally across all jobs, which may cause unexpected behavior if the user updates the config later.
Plus I want custom per-parser config (e.g. for a parser test URL fetch, grab only a small range of posts), but this doesn't seem possible right now.
So this requires a change in gallery-dl.
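To illustrate the isolation being asked for, here is a small sketch in which each job gets its own snapshot of a shared config with per-parser overrides merged in. This is not gallery-dl's API: `GLOBAL_CONFIG`, `deep_merge`, and `job_config` are hypothetical names, and the option keys are only illustrative.

```python
import copy
from typing import Any, Dict

# Stand-in for a process-wide config shared by every job (hypothetical).
GLOBAL_CONFIG: Dict[str, Any] = {"extractor": {"sleep-request": 1.0}}


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict: base with override applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def job_config(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Snapshot the shared config for one job and apply that job's overrides."""
    return deep_merge(GLOBAL_CONFIG, overrides)
```

Because each job holds a deep copy, later edits to the shared config do not change a job that is already running, and a parser test can narrow its fetch without touching other jobs.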
|
I should also mention that gallery-dl's behavior changes depending on the user config, which will cause issues with parser reproducibility. So my idea to solve this is:
|
|
Neat! |
|
I do strongly support some way to add fully Python downloaders to hydrus without having to run an entire other API-connected app. This would open the door to yt-dlp, gallery-dl, advanced BeautifulSoup parsers, advanced authentication (e.g. cookies, JS challenges) and so much more.

This change is rather large, but it restores fetching of single x.com pages.
For page parsers, it adds a new "downloader type" field.
Users have the option to select "gallery-dl" or "hydrus".
For the gallery-dl option, it takes gallery-dl's JSON output and sends it to the parser:
So users can use hydrus' own parser to extract the needed details.
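As a rough picture of what a parser would receive, the sketch below pulls file URLs and metadata out of a JSON dump. The message shape (a list of `[type, ...]` entries, with type 3 carrying a file URL and its metadata) is an assumption based on gallery-dl's `--dump-json` output and should be checked against the installed version. The sample data and the `file_entries` helper are made up for illustration.

```python
import json
from typing import Any, Dict, List, Tuple

# Assumed message type code for "file URL" entries; verify before relying on it.
MSG_URL = 3

# Made-up sample in the assumed shape, not real site output.
SAMPLE = """
[
  [2, {"category": "example", "id": 1}],
  [3, "https://example.org/a.jpg", {"id": 1, "extension": "jpg"}],
  [3, "https://example.org/b.png", {"id": 1, "extension": "png"}]
]
"""


def file_entries(dump: str) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (url, metadata) pairs for every file message in the dump."""
    entries = []
    for message in json.loads(dump):
        if message and message[0] == MSG_URL:
            entries.append((message[1], message[2]))
    return entries
```

Since the result is plain JSON, a JSON-based content parser could pick out tags, URLs, and timestamps from the metadata dictionaries.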
gallery-dl is added as a dependency to requirements.txt, so it will be installed alongside hydrus.
The hydrus instance of gallery-dl uses a gallery-dl.conf loaded at startup, now configured as local to hydrus like mpv.conf. For x.com, with settings stored in the browser, this will load the cookies:
{ "cookies": ["chromium", "", "", "", ""] }
Networking jobs for files/galleries/subscriptions were updated to account for gallery-dl parsers. The current code does not ship any gallery-dl parsers; for now they should be made by users.
I believe it is a start, but for things like archiving whole x.com timelines, rate limiting is still an issue, so there needs to be a way to send a post range as the "next gallery URL", but somehow passed as options to the gallery-dl job.
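One way to picture the "post range as next page" idea is a generator of consecutive range strings, each of which could be handed to a separate job. The `"low-high"` format mirrors the style accepted by gallery-dl's `--range` option; how such a value would actually reach a job inside hydrus is left open, and `post_ranges` is a hypothetical helper.

```python
from typing import Iterator


def post_ranges(start: int, stop: int, step: int) -> Iterator[str]:
    """Yield inclusive ranges such as "1-50", "51-100" covering start..stop."""
    if step < 1:
        raise ValueError("step must be at least 1")
    low = start
    while low <= stop:
        high = min(low + step - 1, stop)
        yield f"{low}-{high}"
        low = high + 1
```

For example, `list(post_ranges(1, 120, 50))` gives `["1-50", "51-100", "101-120"]`, so a timeline could be fetched in small batches with pauses in between.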