v2: Remove Custom field and enhance Extensions for custom elements #251

Merged
mmcdole merged 3 commits into v2 from v2-custom-elements
May 26, 2025

@mmcdole (Owner) commented May 26, 2025

Summary

This PR removes the Item.Custom field entirely and enhances the existing Extensions field to handle all custom and non-standard elements with full structural support including attributes and nested elements. Custom element support has been added to both RSS and Atom parsers.

Changes

  • Remove Custom field from gofeed.Feed and gofeed.Item structs
  • Remove Custom field from rss.Item struct
  • Update RSS parser to add non-namespaced elements to Extensions under "_custom" namespace
  • Update Atom parser to add non-namespaced elements to Extensions under "_custom" namespace
  • Add support for custom elements at both channel/feed and item/entry level in both formats
  • Add convenient helper methods for accessing extensions
  • Fix RSS parser to properly handle RDF structural elements
  • Add comprehensive test coverage for both RSS and Atom custom elements
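The routing described in the list above can be sketched with the standard library alone. This is a minimal illustration, not the parser code from this PR: the `Extension`, `rawElement`, and `rawItem` types, the `collectCustom` function, and the sample XML (including the `unit` attribute) are all hypothetical. Only the "_custom" key comes from the PR itself.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Extension is a simplified, hypothetical stand-in for the library's extension type.
type Extension struct {
	Name  string
	Value string
	Attrs map[string]string
}

// rawElement captures any child element that has no dedicated struct field.
type rawElement struct {
	XMLName xml.Name
	Attrs   []xml.Attr `xml:",any,attr"`
	Value   string     `xml:",chardata"`
}

// rawItem models an RSS <item>: two standard fields plus everything else.
type rawItem struct {
	Title   string       `xml:"title"`
	Link    string       `xml:"link"`
	Unknown []rawElement `xml:",any"`
}

// customKey mirrors the "_custom" key named in this PR.
const customKey = "_custom"

// collectCustom returns namespace key -> element name -> occurrences,
// filing every non-namespaced leftover element under customKey.
func collectCustom(itemXML string) (map[string]map[string][]Extension, error) {
	var item rawItem
	if err := xml.Unmarshal([]byte(itemXML), &item); err != nil {
		return nil, err
	}
	exts := map[string]map[string][]Extension{}
	for _, el := range item.Unknown {
		if el.XMLName.Space != "" {
			continue // namespaced elements are out of scope for this sketch
		}
		attrs := map[string]string{}
		for _, a := range el.Attrs {
			attrs[a.Name.Local] = a.Value
		}
		if exts[customKey] == nil {
			exts[customKey] = map[string][]Extension{}
		}
		name := el.XMLName.Local
		exts[customKey][name] = append(exts[customKey][name],
			Extension{Name: name, Value: el.Value, Attrs: attrs})
	}
	return exts, nil
}

// sampleItem is made-up data for illustration only.
const sampleItem = `<item xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Example CTF</title>
  <weight unit="points">54.67</weight>
  <format_text>Jeopardy</format_text>
  <dc:creator>someone</dc:creator>
</item>`

func main() {
	exts, err := collectCustom(sampleItem)
	if err != nil {
		panic(err)
	}
	w := exts[customKey]["weight"][0]
	fmt.Println(w.Value, w.Attrs["unit"])
}
```

Storing a slice per element name is what lets repeated custom elements with the same name coexist, which a flat map[string]string cannot do.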

Breaking Changes

  • Item.Custom field removed completely
  • Feed.Custom field removed completely (if it existed)

Migration Guide

// Old way
value := item.Custom["myField"]

// New way - simple migration
value := item.GetCustomValue("myField")

// New way - with full access to attributes
var attrs map[string]string
exts := item.GetExtension("_custom", "myField")
if len(exts) > 0 {
    value = exts[0].Value // text content of the first match
    attrs = exts[0].Attrs // attributes of the first match
}
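For readers wondering what the helpers used in the migration guide might look like, here is a rough sketch built on local stand-in types. The method names match those listed in this PR, but the `Item` and `Extension` types and the method bodies are assumptions for illustration, not the merged implementation.

```go
package main

import "fmt"

// Extension is a simplified, hypothetical stand-in for the library's extension type.
type Extension struct {
	Name  string
	Value string
	Attrs map[string]string
}

// Item is a hypothetical item holding namespace -> element -> occurrences.
type Item struct {
	Extensions map[string]map[string][]Extension
}

// GetExtension returns all extensions matching namespace and element.
// Indexing a nil map is safe in Go, so only the receiver needs a nil check.
func (i *Item) GetExtension(namespace, element string) []Extension {
	if i == nil {
		return nil
	}
	return i.Extensions[namespace][element]
}

// GetExtensionValue returns the text value of the first match, or "" if none.
func (i *Item) GetExtensionValue(namespace, element string) string {
	exts := i.GetExtension(namespace, element)
	if len(exts) == 0 {
		return ""
	}
	return exts[0].Value
}

// GetCustomValue is a shortcut for non-namespaced elements under "_custom".
func (i *Item) GetCustomValue(element string) string {
	return i.GetExtensionValue("_custom", element)
}

func main() {
	item := &Item{Extensions: map[string]map[string][]Extension{
		"_custom": {"myField": {{Name: "myField", Value: "hello", Attrs: map[string]string{"lang": "en"}}}},
	}}
	fmt.Println(item.GetCustomValue("myField"))
	fmt.Println(item.GetCustomValue("missing") == "")
}
```

In this sketch a missing element yields an empty string, the same result as looking up a missing key in the old map[string]string, so callers that tested for an empty value would need no changes.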

Benefits

  • Custom elements now support attributes (fixes the limitation of map[string]string)
  • Custom elements can have nested structure
  • Unified approach - all non-standard elements go through Extensions in both RSS and Atom
  • Better namespace safety with "_custom" prefix to avoid conflicts
  • Consistent behavior across all feed formats
  • Better type safety with the Extension struct
  • Backward compatibility through helper methods
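To make the "nested structure" benefit concrete, the sketch below walks a tree of extensions by child name. It assumes an `Extension` type with a `Children` map keyed by element name; the `childValue` helper and the `location` sample data are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// Extension is a simplified, hypothetical stand-in with nested children.
type Extension struct {
	Name     string
	Value    string
	Attrs    map[string]string
	Children map[string][]Extension
}

// childValue descends through the first child matching each name in path
// and returns the text value found at the end, if any.
func childValue(e Extension, path ...string) (string, bool) {
	cur := e
	for _, name := range path {
		kids := cur.Children[name]
		if len(kids) == 0 {
			return "", false
		}
		cur = kids[0]
	}
	return cur.Value, true
}

// sampleLocation models made-up XML such as:
//   <location country="US"><city>Boston</city><venue><name>Hall A</name></venue></location>
var sampleLocation = Extension{
	Name:  "location",
	Attrs: map[string]string{"country": "US"},
	Children: map[string][]Extension{
		"city": {{Name: "city", Value: "Boston"}},
		"venue": {{
			Name: "venue",
			Children: map[string][]Extension{
				"name": {{Name: "name", Value: "Hall A"}},
			},
		}},
	},
}

func main() {
	if v, ok := childValue(sampleLocation, "venue", "name"); ok {
		fmt.Println(v)
	}
	fmt.Println(sampleLocation.Attrs["country"])
}
```

A flat map[string]string could only have stored one string for `location`, losing both the attribute and the nested elements.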

Test Coverage

Added comprehensive tests for:

  • RSS custom elements with attributes
  • RSS custom elements alongside namespaced extensions
  • RSS custom elements with CDATA content
  • RSS multiple custom elements with the same name
  • RSS feed-level custom elements
  • Atom feed-level custom elements
  • Atom entry-level custom elements
  • Atom multiple custom elements with same name
  • Helper method functionality

Issues Fixed

Example Usage

// Parse a feed with custom elements (works for both RSS and Atom)
feed, _ := parser.ParseURL("https://ctftime.org/event/list/upcoming/rss/")

// Access custom elements in RSS
for _, item := range feed.Items {
    weight := item.GetCustomValue("weight")           // "54.67"
    format := item.GetCustomValue("format_text")      // "Jeopardy" 
    startDate := item.GetCustomValue("start_date")    // "20250530T120000"
}

// Access custom elements in Atom
atomFeed, _ := parser.Parse(atomReader)
for _, entry := range atomFeed.Entries {
    priority := entry.GetCustomValue("priority")      // "1"
    customId := entry.GetCustomValue("customId")      // "entry-123"
    
    // Get with attributes
    priorityExt := entry.GetExtension("_custom", "priority")
    if len(priorityExt) > 0 {
        level := priorityExt[0].Attrs["level"]        // "high"
    }
}

BREAKING CHANGE: Remove Item.Custom and Feed.Custom fields entirely

- Remove Custom field from gofeed.Feed and gofeed.Item structs
- Remove Custom field from rss.Item struct
- Update RSS parser to add non-namespaced elements to Extensions under "rss" namespace
- Add support for custom elements at both channel and item level
- Add helper methods for easier access to extensions:
  - GetExtension(namespace, element) - returns all matching extensions
  - GetExtensionValue(namespace, element) - returns text value of first match
  - GetCustomValue(element) - convenience method for RSS custom elements
- Fix RSS parser to skip RDF "items" structural element
- Add comprehensive test coverage for custom element scenarios
- Update existing tests to reflect removal of Custom field

Migration: Replace item.Custom["key"] with item.GetCustomValue("key")

This change allows parsing of custom RSS elements with full attribute
support and nested structure, addressing the limitations of the previous
map[string]string approach.

Fixes #246
Fixes #205 (custom elements now support nested structure and attributes)
Fixes #82 (custom tags like <weight> are now accessible via GetCustomValue)

mmcdole added 2 commits May 26, 2025 01:04

- Change namespace key from 'rss' to '_custom' to avoid conflicts
- '_custom' clearly indicates non-namespaced custom elements
- Add custom element support to Atom parser for both feed and entry levels
- Update all tests and test data to use '_custom' namespace
- Add GetCustomValue helper method for Feed type
- Add comprehensive Atom custom element test coverage

The '_custom' namespace is less likely to conflict with actual XML
namespaces and clearly indicates its special purpose for non-namespaced
elements across all feed formats.
@mmcdole merged commit 493eda3 into v2 May 26, 2025
1 check passed

@mmcdole (Owner, Author) commented May 26, 2025

@infogulch let me know if this change seems fine to you as well.

@infogulch (Contributor)

Looks ok to me. Does this save pagination information like what's described at https://stackoverflow.com/questions/1301392/pagination-in-feeds-like-atom-and-rss ?

@mmcdole (Owner, Author) commented May 26, 2025

Does this save pagination information

I think this is still fine, if I understand the StackOverflow link correctly.

Looking at our Atom parser, we already capture link elements with their rel attributes in the Links field. So pagination links like:

  <link rel="next" href="http://example.org/index.atom?page=2"/>
  <link rel="previous" href="http://example.org/index.atom?page=1"/>
  <link rel="first" href="http://example.org/index.atom"/>
  <link rel="last" href="http://example.org/index.atom?page=10"/>

These get parsed into the atom.Feed.Links array, where each Link struct has the Href and Rel fields you'd need to follow pagination.

So you should be able to access pagination info like:

atomFeed, _ := parser.Parse(reader)
for _, link := range atomFeed.Links {
    if link.Rel == "next" {
        nextPageURL := link.Href
        // fetch next page
    }
}

This PR shouldn't change that: it only touches custom elements, while pagination links are handled by the standard Link parsing.
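A slightly fuller sketch of following pagination might look like the code below. It uses a local `Link` type and an in-memory `fakeFetch` function in place of real fetching and parsing; those names, `walkPages`, and the page limit are all hypothetical and only illustrate the loop.

```go
package main

import "fmt"

// Link is a simplified, hypothetical stand-in for the Atom link type.
type Link struct {
	Href string
	Rel  string
}

// nextPage returns the href of the first rel="next" link, if present.
func nextPage(links []Link) (string, bool) {
	for _, l := range links {
		if l.Rel == "next" && l.Href != "" {
			return l.Href, true
		}
	}
	return "", false
}

// fetchLinks is a hypothetical stand-in for fetching and parsing one page.
type fetchLinks func(url string) ([]Link, error)

// walkPages follows rel="next" links from start, visiting at most maxPages
// pages and stopping early if a URL repeats.
func walkPages(start string, fetch fetchLinks, maxPages int) ([]string, error) {
	var visited []string
	seen := map[string]bool{}
	url := start
	for len(visited) < maxPages && !seen[url] {
		seen[url] = true
		visited = append(visited, url)
		links, err := fetch(url)
		if err != nil {
			return visited, err
		}
		next, ok := nextPage(links)
		if !ok {
			break
		}
		url = next
	}
	return visited, nil
}

// samplePages is made-up data: three pages chained by rel="next".
var samplePages = map[string][]Link{
	"http://example.org/index.atom":        {{Href: "http://example.org/index.atom?page=2", Rel: "next"}},
	"http://example.org/index.atom?page=2": {{Href: "http://example.org/index.atom", Rel: "previous"}, {Href: "http://example.org/index.atom?page=3", Rel: "next"}},
	"http://example.org/index.atom?page=3": {{Href: "http://example.org/index.atom?page=2", Rel: "previous"}},
}

// fakeFetch looks pages up in samplePages instead of using the network.
func fakeFetch(url string) ([]Link, error) {
	links, ok := samplePages[url]
	if !ok {
		return nil, fmt.Errorf("unknown page %q", url)
	}
	return links, nil
}

func main() {
	pages, err := walkPages("http://example.org/index.atom", fakeFetch, 10)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(pages), pages[len(pages)-1])
}
```

The page limit and the seen-URL check are defensive choices so that a feed whose "next" links form a cycle cannot keep the loop running indefinitely.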
