feat: Add structured HTML output with document tree and inner content #77

Open
dawsmith06 wants to merge 1 commit into ueberdosis:main from dawsmith06:main

@dawsmith06

Description

This PR adds structured HTML output capabilities to the DOMSerializer, allowing consumers to get both the rendered HTML and a structured representation of the document tree. This is particularly useful for applications that need to process or analyze the document structure programmatically.

New Features

processStructured(array $value): array

Returns a structured array representation of the document with each node containing:

  • type: The node type (e.g., 'table', 'paragraph', 'heading')
  • html: The complete rendered HTML including outer tags
  • innerHtml: The inner content HTML without outer tags (for nodes with content, including nodes that contain only text)
  • children: Array of child nodes with the same structure

structuredHtml($node): array

Internal method that processes individual nodes recursively to build the structured tree.
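
To illustrate the recursion described above, here is a minimal standalone sketch. It is not the code in this PR: `buildStructuredNode` and `TAG_MAP` are hypothetical names, the tag mapping is a toy stand-in for the real schema, and the choice to leave text nodes out of `children` is an assumption inferred from the example output, where the leaf `tableHeader` has no `children` key.

```php
<?php
// Hypothetical sketch, NOT the PR's implementation: a toy tag map stands in
// for the real schema-driven rendering.
const TAG_MAP = [
    'table' => 'table',
    'tableRow' => 'tr',
    'tableHeader' => 'th',
    'paragraph' => 'p',
];

function buildStructuredNode(array $node): array
{
    // Text nodes have no wrapper tag; their HTML is the escaped text.
    if ($node['type'] === 'text') {
        return [
            'type' => 'text',
            'html' => htmlspecialchars($node['text'] ?? '', ENT_QUOTES),
        ];
    }

    $tag = TAG_MAP[$node['type']] ?? 'div';
    $innerHtml = '';
    $children = [];

    foreach ($node['content'] ?? [] as $child) {
        $built = buildStructuredNode($child);
        $innerHtml .= $built['html'];
        // Assumption: only non-text nodes are reported as children.
        if ($child['type'] !== 'text') {
            $children[] = $built;
        }
    }

    $result = [
        'type' => $node['type'],
        'html' => "<{$tag}>{$innerHtml}</{$tag}>",
        'innerHtml' => $innerHtml,
    ];
    // Omit the key for leaf nodes, matching the shape of the example output.
    if ($children !== []) {
        $result['children'] = $children;
    }

    return $result;
}

$row = [
    'type' => 'tableRow',
    'content' => [
        [
            'type' => 'tableHeader',
            'content' => [['type' => 'text', 'text' => 'Header']],
        ],
    ],
];

$structured = buildStructuredNode($row);
```

Each call renders its children first, concatenates their `html` into its own `innerHtml`, and only then wraps the result in the outer tag, so `html` and `innerHtml` come out of a single pass over the tree.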

Use Cases

This feature enables:

  • Analytics: Track document structure and content patterns
  • Custom Rendering: Frontend frameworks can use the structured data for custom components
  • Content Transformation: Easier migration or processing of document content
  • Accessibility Tools: Better understanding of document hierarchy
  • Testing: More granular testing of document structure
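
As one possible take on the analytics use case, the sketch below counts how often each node type occurs in a structured tree. `countNodeTypes` is a hypothetical helper, not part of this PR, and the input is a hand-written array shaped like the output described above rather than real serializer output.

```php
<?php
// Hypothetical helper: tallies node types across the whole tree.
function countNodeTypes(array $nodes, array $counts = []): array
{
    foreach ($nodes as $node) {
        $type = $node['type'];
        $counts[$type] = ($counts[$type] ?? 0) + 1;
        // Leaf nodes may have no 'children' key, so default to an empty list.
        $counts = countNodeTypes($node['children'] ?? [], $counts);
    }

    return $counts;
}

// Hand-written sample in the shape returned by processStructured().
$structured = [
    ['type' => 'heading', 'html' => '<h1>Title</h1>', 'innerHtml' => 'Title'],
    ['type' => 'paragraph', 'html' => '<p>One</p>', 'innerHtml' => 'One'],
    [
        'type' => 'table',
        'html' => '<table><tr><th>Header</th></tr></table>',
        'innerHtml' => '<tr><th>Header</th></tr>',
        'children' => [
            [
                'type' => 'tableRow',
                'html' => '<tr><th>Header</th></tr>',
                'innerHtml' => '<th>Header</th>',
                'children' => [
                    ['type' => 'tableHeader', 'html' => '<th>Header</th>', 'innerHtml' => 'Header'],
                ],
            ],
        ],
    ],
    ['type' => 'paragraph', 'html' => '<p>Two</p>', 'innerHtml' => 'Two'],
];

$counts = countNodeTypes($structured);
```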

Example Output

$serializer = new DOMSerializer($schema);
$structured = $serializer->processStructured($document);

// Returns:
[
  [
    'type' => 'table',
    'html' => '<table><tr><th>Header</th></tr></table>',
    'innerHtml' => '<tr><th>Header</th></tr>',
    'children' => [
      [
        'type' => 'tableRow',
        'html' => '<tr><th>Header</th></tr>',
        'innerHtml' => '<th>Header</th>',
        'children' => [
          [
            'type' => 'tableHeader',
            'html' => '<th>Header</th>',
            'innerHtml' => 'Header'
          ]
        ]
      ]
    ]
  ]
]
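
The example output above suggests two properties a test could check: a node's `html` contains its `innerHtml`, and for nodes with `children`, `innerHtml` equals the children's `html` concatenated in order. Both hold for the example, but whether they hold for every node type is an assumption. `findInconsistencies` is a hypothetical helper written for illustration, not part of this PR.

```php
<?php
// Hypothetical helper: returns a list of human-readable problems, or an
// empty array when the tree satisfies both assumed properties.
function findInconsistencies(array $nodes, string $path = ''): array
{
    $problems = [];

    foreach ($nodes as $i => $node) {
        $here = "{$path}/{$node['type']}[{$i}]";
        $children = $node['children'] ?? [];
        $inner = $node['innerHtml'] ?? null;

        // Assumed property 1: innerHtml is the children's html joined in order.
        if ($children !== []) {
            $joined = implode('', array_column($children, 'html'));
            if ($inner !== $joined) {
                $problems[] = "{$here}: innerHtml differs from concatenated child html";
            }
        }

        // Assumed property 2: the full html contains the inner html.
        if (is_string($inner) && $inner !== '' && strpos($node['html'], $inner) === false) {
            $problems[] = "{$here}: html does not contain innerHtml";
        }

        $problems = array_merge($problems, findInconsistencies($children, $here));
    }

    return $problems;
}

// The table from the example output, written out by hand.
$structured = [
    [
        'type' => 'table',
        'html' => '<table><tr><th>Header</th></tr></table>',
        'innerHtml' => '<tr><th>Header</th></tr>',
        'children' => [
            [
                'type' => 'tableRow',
                'html' => '<tr><th>Header</th></tr>',
                'innerHtml' => '<th>Header</th>',
                'children' => [
                    ['type' => 'tableHeader', 'html' => '<th>Header</th>', 'innerHtml' => 'Header'],
                ],
            ],
        ],
    ],
];

$problems = findInconsistencies($structured);
```

Each reported problem is prefixed with a path such as `/table[0]/tableRow[0]`, which makes a failing node easy to locate in a larger document.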

Breaking Changes

None. This is a purely additive feature that doesn't affect existing functionality.

Testing

The feature has been tested with a variety of document structures, including tables, lists, and nested content. The structured output accurately preserves the document hierarchy and provides both complete and inner HTML representations. Unit tests have also been added to ensure reliability.

2 participants