
@kamomehettapi
Contributor

@kamomehettapi kamomehettapi commented May 20, 2025

This change is rather large, but it restores fetching of single x.com pages.

For page parsers, this adds a new "downloader type" field.

Users have the option to select "gallery-dl" or "hydrus".

With the gallery-dl option, hydrus takes gallery-dl's JSON output and sends it to the parser:

Screenshot 2025-05-20 at 11 51 01 AM

So users can use hydrus' own parser to extract the needed details
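To give a rough idea of what a parser would receive, here is a minimal sketch that assumes the JSON has the shape printed by `gallery-dl --dump-json`: an array of messages whose first element is a numeric type code. The type codes, the `extract_file_urls` helper, and the sample data are illustrative assumptions, not taken from this PR, which may hand the parser a differently shaped document.

```python
import json

# Message type codes as they appear in gallery-dl's --dump-json output
# (assumed here: 2 = directory metadata, 3 = file URL, 6 = queued child URL).
DIRECTORY, URL, QUEUE = 2, 3, 6

# Hypothetical sample output; real output carries many more metadata keys.
SAMPLE_DUMP = """
[
  [2, {"category": "twitter", "tweet_id": 1}],
  [3, "https://example.com/media/1.jpg", {"filename": "1", "extension": "jpg"}]
]
"""


def extract_file_urls(dump_json_text):
    """Return (url, metadata) pairs for every file entry in the dump."""
    messages = json.loads(dump_json_text)
    results = []
    for message in messages:
        # Only URL messages carry a downloadable file; skip everything else.
        if message[0] == URL:
            results.append((message[1], message[2]))
    return results


for url, metadata in extract_file_urls(SAMPLE_DUMP):
    print(url, metadata["extension"])
```

A hydrus parser would do the equivalent extraction with its own JSON formulae; the point is only that each file URL arrives paired with its metadata.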

gallery-dl is added as a dependency in requirements.txt, so it will be installed alongside hydrus.

The hydrus instance of gallery-dl loads its gallery-dl.conf at startup. This was originally the system gallery-dl.conf; it is now configured as local to hydrus, like mpv.conf. For x.com logins stored in the browser, this will load the cookies:

{
    "cookies": ["chromium", "", "", "", ""]
}

Networking jobs for files/galleries/subscriptions were updated to account for gallery-dl parsers. The current code does not ship any gallery-dl parsers; for now, users have to make them.

I believe it is a start, but for things like archiving whole x.com timelines, rate limiting is still an issue. There needs to be a way to send a post range as the "next gallery URL", but somehow passed as options to the gallery-dl job.

@bbappserver
Contributor

Looks nice, but hydrus internals are already cryptic enough, so it would be nice if you could add a lot more inline comments about why you did things.

@kamomehettapi
Contributor Author

I think these points are worth noting:

  • No handling for gallery-dl exceptions yet; all are treated the same
  • Raw file URL downloads do not go through gallery-dl, only post/gallery pages. The rest seems very complicated to integrate with the hydrus bandwidth limiter
  • Cookies are not shared between hydrus and the gallery-dl instance
  • gallery-dl.conf is set up external to hydrus (shared with the system gallery-dl config); may want to move it into hydrus settings?

For my use case, the only issue with x.com is that scraping of the tweet pages is blocked; raw file URLs seem unaffected (for right now; I have yet to encounter any rate limits/bans).

@kamomehettapi
Contributor Author

OK, another issue I found: gallery-dl uses time.sleep() to wait out rate limits, which does not work well with cancelling and hangs the UI when shutting down.

But it doesn't seem to be used in a lot of places, so it could be amenable to a PR over there for data_job.cancel() support?
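One common way to make such a wait cancellable is to block on a `threading.Event` with a timeout instead of calling `time.sleep()`. The sketch below only illustrates that idea: the `CancellableSleeper` class is a hypothetical name and is not part of gallery-dl or hydrus.

```python
import threading


class CancellableSleeper:
    """Sleep that another thread can interrupt (hypothetical helper)."""

    def __init__(self):
        self._cancelled = threading.Event()

    def cancel(self):
        # Wakes up any thread currently blocked in sleep().
        self._cancelled.set()

    def sleep(self, seconds):
        """Return True if the full wait elapsed, False if it was cancelled."""
        # Event.wait() returns True as soon as the flag is set,
        # and False when the timeout expires without it being set.
        return not self._cancelled.wait(timeout=seconds)


sleeper = CancellableSleeper()
# Simulate a shutdown request arriving shortly after the wait begins.
threading.Timer(0.05, sleeper.cancel).start()
finished = sleeper.sleep(60)
print("finished full wait:", finished)
```

With this pattern a rate-limit wait of several minutes returns almost immediately once `cancel()` is called, so a shutdown would not have to wait for the sleep to finish.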

@thatfuckingbird
Collaborator

The lengths people will go to avoid using hydownloader... 😔

You probably shouldn't read the configuration from ~ though, but from inside the Hydrus folder instead, as is done for mpv. The current way makes Hydrus no longer fully portable by moving the folder (and people might not want their general gallery-dl settings affecting Hydrus anyway).

@kamomehettapi
Contributor Author

Yeah, you're right, thank you. I updated the config loading to work like mpv.conf does.

For the time being I don't think this should be merged until I can figure out how to terminate long-running gallery-dl jobs; that may need a PR and a wait for a release on their end.
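As a sketch of what folder-local loading could look like: read the file only from a directory inside the hydrus install and never fall back to the home directory. The function name, the file location, and the choice to return an empty config when the file is missing are all assumptions for illustration, not what the PR actually does.

```python
import json
from pathlib import Path


def load_local_gallery_dl_config(hydrus_dir):
    """Load gallery-dl.conf from the given hydrus folder only.

    Hypothetical helper: deliberately never looks in ~ or other
    system-wide locations, so moving the folder moves the config too.
    """
    conf_path = Path(hydrus_dir) / "gallery-dl.conf"
    if not conf_path.is_file():
        # No local config: start from empty settings instead of system ones.
        return {}
    with conf_path.open(encoding="utf-8") as f:
        return json.load(f)
```

Because the lookup is relative to the folder passed in, the install stays portable, which is the concern raised above.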

> The lengths people will go to avoid using hydownloader... 😔

But this is the "simpler, less powerful solution" from the hydownloader README that I found for myself ... 😃

That being said, I just want a backup solution so my subscriptions aren't borked. The hydownloader docs are really verbose and ward off users by saying "you need time and investment to learn this whole thing", when all I wanted was not a better hydrus downloader overall, but a hydrus downloader that wasn't broken for the sites I want. I shouldn't have to jump through all those hoops to gain back a single feature that is already supposed to be there and is critical to the whole app.

Assuming gallery-dl can play well enough with hydrus in the future, I can now fix the page parsing issues myself within hydrus. This is a useful power granted to all hydrus users that was not there before.

It's the difference between going from 0 -> 1 and 1 -> 10: the 10 is rewarding for the time and patience spent, but I'm happy with the 1. Just a difference in philosophy between users of both programs is all!

@kamomehettapi
Contributor Author

Screenshot 2025-05-21 at 2 35 50 PM

Added gallery-dl status inside the network window.

@kamomehettapi
Contributor Author

OK, another issue I found:

The gallery-dl config is shared globally across all jobs, which may cause unexpected behavior if the user updates the config later.

Plus I want to have custom per-parser config (e.g. for the parser test URL fetch, grab only a small range of posts), but this does not seem possible right now.

So this requires a change in gallery-dl.

@kamomehettapi
Contributor Author

I should also mention that gallery-dl's behavior changes depending on the user config, which will cause issues with parser reproducibility.

So the idea to solve this is:

  • Store a JSON fragment of gallery-dl config inside the parser object
  • Copy the global config into the gallery-dl job and merge in the parser object's config fragment
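The copy-then-merge step could be sketched as a recursive dictionary merge, as below. The function names and sample values are hypothetical, and how the merged result would be handed to a gallery-dl job is left open, since that is the part said to need changes in gallery-dl.

```python
import copy


def merge_config(global_config, parser_fragment):
    """Return a new config: a copy of the global one with the fragment applied."""
    merged = copy.deepcopy(global_config)  # never mutate the shared config
    _merge_into(merged, parser_fragment)
    return merged


def _merge_into(target, fragment):
    for key, value in fragment.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            # Both sides are objects: merge key by key instead of replacing.
            _merge_into(target[key], value)
        else:
            # Scalars, lists, and new keys: the parser's value wins.
            target[key] = copy.deepcopy(value)


# Illustrative values only.
global_config = {"extractor": {"twitter": {"cookies": ["chromium"], "retweets": False}}}
parser_fragment = {"extractor": {"twitter": {"image-range": "1-5"}}}

job_config = merge_config(global_config, parser_fragment)
print(job_config)
```

Since each job gets its own copy, later edits to the global config would not affect jobs that are already running, and the parser's fragment travels with the parser.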

@luckydonald

Neat!
For now I'm using HydrusYTDLProxy which allows the client to connect to YouTube-dl(p).

@machineonamission

I do strongly support some way to add pure-Python downloaders to hydrus without having to run an entire other API-connected app. This would open the door to yt-dlp, gallery-dl, advanced BeautifulSoup parsers, advanced authentication (e.g. cookies, JS challenges) and so much more.

