
Metrics: Cross-reference with other datasets (popularity indicator) #2564

@lukepayyapilli

Description


Context

Related to #2336 (Metrics for downloads) and #2563 (tracking start date). Raw view counts lack context: is 1,000 views good or bad? Comparing a dataset against similar datasets would give its numbers meaningful context.

Problem

  • Absolute view counts don't indicate relative popularity
  • A new dataset with 500 views in 1 month may be more "popular" than an old dataset with 2,000 views over 5 years (500 views/month vs. roughly 33 views/month)
  • Users have no way to gauge how a dataset compares to others in its category

Proposed Approach

Popularity Indicator

  • Calculate a "popularity score" relative to datasets of similar age or category
  • Display a subtle indicator (e.g., "Popular", "Highly viewed", or percentile ranking)
  • Show the indicator only for datasets that meet a threshold, so that not every dataset gets a label
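A minimal sketch of the threshold-and-label step, assuming the popularity score is already expressed as a percentile rank (0-100) among comparable datasets. The function name `popularity_label` and both cut-offs are hypothetical; only the label wording comes from the list above.

```python
from typing import Optional

# Hypothetical cut-offs. The issue only suggests "top 25%" as an example threshold,
# so 75.0 marks that boundary and 90.0 is an illustrative second tier.
HIGHLY_VIEWED_PERCENTILE = 90.0
POPULAR_PERCENTILE = 75.0


def popularity_label(percentile: float) -> Optional[str]:
    """Map a percentile rank (0-100) among comparable datasets to a badge label.

    Returns None below the threshold, so most datasets show no badge at all.
    """
    if not 0.0 <= percentile <= 100.0:
        raise ValueError(f"percentile must be in [0, 100], got {percentile}")
    if percentile >= HIGHLY_VIEWED_PERCENTILE:
        return "Highly viewed"
    if percentile >= POPULAR_PERCENTILE:
        return "Popular"
    return None
```

Returning `None` rather than a "Not popular" label keeps the badge absent for most datasets, which fits the goal of not labeling everything.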

Calculation Options

  1. Age-normalized views: Views per month since publication
  2. Percentile ranking: "More views than X% of datasets published in the same year"
  3. Category comparison: Compare within resource type (database, software, challenge)
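A rough Python sketch of the three calculation options, assuming a simplified `Dataset` record with a publication date, a resource type, and a total view count. `Dataset`, `views_per_month`, `percentile_rank`, and `rank_within_cohorts` are illustrative names, not existing project code; the 30.4375-day average month and the one-month minimum age are also assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Hashable, List, Tuple

DAYS_PER_MONTH = 30.4375  # average Gregorian month length in days


@dataclass(frozen=True)
class Dataset:
    # Hypothetical minimal record; the real project model will have other fields.
    slug: str
    resource_type: str  # e.g. "database", "software", "challenge"
    published: date
    views: int


def views_per_month(ds: Dataset, today: date) -> float:
    """Option 1: age-normalized views. Ages under one month count as one month,
    so a dataset published yesterday is not ranked on a tiny denominator."""
    months = max((today - ds.published).days / DAYS_PER_MONTH, 1.0)
    return ds.views / months


def percentile_rank(value: float, cohort: List[float]) -> float:
    """Percentage (0-100) of cohort members with a strictly lower value."""
    if not cohort:
        return 0.0
    below = sum(1 for other in cohort if other < value)
    return 100.0 * below / len(cohort)


def rank_within_cohorts(datasets: List[Dataset], today: date, by: str = "year") -> Dict[str, float]:
    """Options 2 and 3: percentile of views per month within a cohort.

    by="year" groups by publication year; by="type" groups by resource type.
    """
    if by not in ("year", "type"):
        raise ValueError(f"unknown grouping: {by!r}")
    cohorts: Dict[Hashable, List[Tuple[str, float]]] = {}
    for ds in datasets:
        key = ds.published.year if by == "year" else ds.resource_type
        cohorts.setdefault(key, []).append((ds.slug, views_per_month(ds, today)))
    ranks: Dict[str, float] = {}
    for members in cohorts.values():
        rates = [rate for _, rate in members]
        for slug, rate in members:
            ranks[slug] = percentile_rank(rate, rates)
    return ranks
```

With the issue's example (500 views in one month vs. 2,000 over five years, both databases), grouping by resource type ranks the newer dataset above the older one. Each cohort here includes the dataset itself, so the top member of a small cohort cannot reach 100; very small cohorts (a single dataset in a publication year, say) would probably need a minimum size or a fallback to a wider comparison.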

Implementation Details

  • Add a method to calculate relative popularity
  • Cache the calculation to avoid expensive queries on every page load
  • Update periodically (e.g., daily) via a management command or Celery task
  • Display only for datasets above a threshold (e.g., top 25%)
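One possible shape for the caching step, sketched in plain Python with an in-memory store so it runs standalone. `PopularityCache` is hypothetical; a real deployment would more likely have the management command or Celery task write results to the application's cache backend or a database table, and have the page read them back.

```python
import time
from typing import Callable, Dict, Optional

REFRESH_SECONDS = 24 * 60 * 60  # "daily", per the example in the issue


class PopularityCache:
    """In-process stand-in for a shared cache (hypothetical; not a project API).

    compute_ranks returns {dataset slug: percentile}. In production it would be
    invoked by the periodic job, never on page load.
    """

    def __init__(
        self,
        compute_ranks: Callable[[], Dict[str, float]],
        threshold: float = 75.0,  # percentile >= 75 roughly means "top 25%"
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._compute = compute_ranks
        self._threshold = threshold
        self._clock = clock
        self._ranks: Dict[str, float] = {}
        self._computed_at: Optional[float] = None

    def refresh(self) -> None:
        """Recompute every rank in one batch (what the daily job would call)."""
        self._ranks = dict(self._compute())
        self._computed_at = self._clock()

    def is_stale(self) -> bool:
        """True if never computed or older than REFRESH_SECONDS."""
        if self._computed_at is None:
            return True
        return self._clock() - self._computed_at >= REFRESH_SECONDS

    def percentile_for_display(self, slug: str) -> Optional[float]:
        """Cheap per-request lookup: None unless the dataset clears the threshold."""
        rank = self._ranks.get(slug)
        if rank is None or rank < self._threshold:
            return None
        return rank
```

The injectable `clock` exists only so the refresh interval can be tested without waiting a day.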

Alternatives Considered

  1. Show raw percentile: "87th percentile"; precise, but may not resonate with users
  2. Star rating visual: 1-5 stars based on popularity; familiar, but may imply a quality judgment
  3. No indicator, just a comparison table: show how this dataset ranks among similar ones; more information, but a more complex UI
  4. Trending indicator: show whether views are increasing; a different metric, but one that could complement the popularity indicator
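If the trending alternative were pursued, one simple rule compares the most recent window of views with the window just before it. This assumes per-day view counts are stored, which the issue does not state; `is_trending`, the 30-day window, the 1.5x growth factor, and the 20-view minimum are all illustrative.

```python
from datetime import date, timedelta
from typing import Dict


def is_trending(
    daily_views: Dict[date, int],
    today: date,
    window_days: int = 30,
    min_growth: float = 1.5,
    min_recent_views: int = 20,
) -> bool:
    """Hypothetical rule: views in the last window_days are at least min_growth
    times the views in the window_days before that.

    Assumes per-day view counts are stored; today's partial day is excluded.
    """
    recent_start = today - timedelta(days=window_days)
    prior_start = recent_start - timedelta(days=window_days)
    recent = sum(n for day, n in daily_views.items() if recent_start <= day < today)
    prior = sum(n for day, n in daily_views.items() if prior_start <= day < recent_start)
    if recent < min_recent_views:
        return False  # too little traffic for a trend to mean anything
    if prior == 0:
        return True  # traffic appeared from nothing (e.g. a new dataset)
    return recent / prior >= min_growth
```

Returning `True` when the prior window is empty means every new dataset with enough early traffic would count as trending; whether that is desirable overlaps with the age-normalization question.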

Design Considerations

  • Must not imply quality or endorsement, only popularity
  • Should account for the tracking start date issue (Metrics: Display view count tracking start date #2563)
  • Consider whether to compare across all datasets or within categories
  • Badge/label should be subtle and not dominate the page
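To account for the tracking start date (#2563), the age used for normalization could start at whichever is later: the publication date or the date view tracking began. A sketch, with `tracked_months` and `tracked_views_per_month` as hypothetical helpers and `tracking_start` standing in for the actual tracking start date, which is not given here.

```python
from datetime import date
from typing import Optional

DAYS_PER_MONTH = 30.4375  # average Gregorian month length in days


def tracked_months(published: date, today: date, tracking_start: Optional[date]) -> float:
    """Months during which views could actually have been recorded.

    A dataset published before view tracking began only accumulated counted
    views from tracking_start onward, so its age is measured from that date.
    Ages under one month count as one month.
    """
    start = published if tracking_start is None else max(published, tracking_start)
    return max((today - start).days / DAYS_PER_MONTH, 1.0)


def tracked_views_per_month(views: int, published: date, today: date,
                            tracking_start: Optional[date]) -> float:
    """Age-normalized views that do not penalize datasets older than tracking."""
    return views / tracked_months(published, today, tracking_start)
```

Without this adjustment, a dataset published years before tracking began would have its counted views spread over months in which no views could be recorded, pushing it artificially down the ranking.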

Acceptance Criteria

  • Popularity is calculated relative to comparable datasets
  • Indicator is displayed for datasets meeting threshold
  • Calculation accounts for dataset age and tracking limitations
  • It is clear to users that the indicator reflects views, not quality
  • Performance is acceptable (caching implemented if needed)

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
