
[Enhancement] In need of better cache-to-game-selection options in config #168

@SineSwiper


There is not much automation-friendly control over how scrapers select which games to update. IMO, this is critical because many of the scrapers only allow limited access, so updating a large collection can take days.

Looking at the current options, includeFrom/excludeFrom allow for scripted hooks from other sources (such as cache reports), but there aren't many other controls built into the config or CLI for, say, a daily cronjob or some other regular automation.

The only option that is useful here is onlyMissing, and the docs don't explain what counts as "missing" for this flag/config. Missing cache data for a specific game/scraper combo? Missing any cache data for a specific game? Or does it poll any game that doesn't have 100% of its data set? Does it record the last time it polled a specific game/scraper combo and not poll it again? Each of these questions could turn into a useful config option.

The main problem I'd like to solve is not wasting a scraper's time on a platform/game/scraper combo I've already tried. If I tried it yesterday and scraped data, it shouldn't try that game again today.

Based on this, I think expanding onlyMissing from a boolean to a multi-value option could fill these gaps:

  • scrapercache: Skip games that have cached data for that specific platform/game/scraper combo.
  • gamecache: Skip games that have cached data for that specific platform/game combo, as long as it has cached data from ANY scraper.

Another option could refine onlyMissing by specifying which fields must be cached for a game to count as a hit and be skipped. For example:

requiredCacheHitFields="title,description,tags,screenshot,releasedate"
requiredCacheHitFields="title,description,tags"
requiredCacheHitFields="none"

That last one is important because it skips games that Skyscraper has already tried with the specific platform/game/scraper combo, even if the earlier attempt found nothing:

onlyMissing="scrapercache"
requiredCacheHitFields="none"

The other values would skip a game only if every listed field is filled in. In the second example, that means title, description, and tags.
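
To make the intended behavior concrete, here is a rough sketch of the skip decision in Python. This is not Skyscraper code: the cache shape (a dict of scraper name to cached fields, where an empty dict records a previous attempt that found nothing) and the function names parse_required_fields and should_skip are made up for illustration. It also assumes that in gamecache mode the required fields can be satisfied by any mix of scrapers, which is one possible reading of the proposal.

```python
from typing import Dict, List

# Hypothetical cache shape for ONE platform/game: {scraper_name: {field: value}}.
# An empty dict means "this scraper was tried before but returned nothing".
GameCache = Dict[str, Dict[str, str]]


def parse_required_fields(raw: str) -> List[str]:
    """Parse a proposed requiredCacheHitFields value; "none" means no fields are required."""
    raw = raw.strip()
    if not raw or raw.lower() == "none":
        return []
    return [name.strip() for name in raw.split(",") if name.strip()]


def should_skip(cache: GameCache, scraper: str, only_missing: str,
                required_fields: List[str]) -> bool:
    """Return True if the game should NOT be scraped again with this scraper."""
    if only_missing == "scrapercache":
        # Only look at what this specific scraper has cached for the game.
        entries = [cache[scraper]] if scraper in cache else []
    elif only_missing == "gamecache":
        # Look at cached data from any scraper for the game.
        entries = list(cache.values())
    else:
        raise ValueError(f"unknown onlyMissing mode: {only_missing!r}")

    if not entries:
        return False  # never tried, so scrape it

    # Every required field must be non-empty in at least one considered entry.
    # With no required fields, a previous attempt alone is enough to skip.
    return all(any(entry.get(field) for entry in entries)
               for field in required_fields)


if __name__ == "__main__":
    cache = {
        "screenscraper": {"title": "Foo", "description": "A game."},
        "thegamesdb": {},  # tried before, found nothing
    }
    fields = parse_required_fields("title,description,tags")
    print(should_skip(cache, "thegamesdb", "scrapercache", []))
    print(should_skip(cache, "screenscraper", "scrapercache", fields))
```

With onlyMissing="scrapercache" and requiredCacheHitFields="none", the thegamesdb entry above is skipped even though it holds no data, which is the "don't retry a previous miss" case. This only works if missed attempts are recorded in the cache in the first place.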
