feat: Add hybrid node/web dom parsing (NR-477) #24

Merged: lentil32 merged 10 commits into ian/nr-531 from ian/nr-477 on Jun 20, 2025

Conversation

@lentil32 (Collaborator)

  1. Use jsdom to generate the MDR ground truth (its output is closer to the browser's DOMParser than that of linkedom or node-html-parser).
  2. Use linkedom on the server and the browser's native DOM in the browser.
  • jsdom can also be injected when needed.

linkedom Advantages:

  • Full DOM API compatibility (TreeWalker, NodeFilter, createHTMLDocument)
  • Seamless browser/Node.js code sharing
  • Supports all DOM properties your code uses (parentElement, previousElementSibling, nodeType
    constants)
  • Better for complex DOM traversal operations

node-html-parser Disadvantages:

  • Missing critical APIs (TreeWalker, createHTMLDocument)
  • Limited DOM traversal methods
  • Would require significant code refactoring
  • Less browser-compatible API surface
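
The server/browser split with optional injection described above could be wired up roughly as follows. This is only a sketch under stated assumptions: `DocumentLike`, `HtmlParser`, `ParserOptions`, `resolveParser`, and `hasNativeDom` are hypothetical names for illustration, not the actual API of this PR, and the parsers are stubbed so the block runs without jsdom or linkedom installed.

```typescript
// Hypothetical structural type standing in for a parsed document.
interface DocumentLike {
  querySelector(selector: string): unknown;
}

// A parser turns an HTML string into a document-like object.
type HtmlParser = (html: string) => DocumentLike;

// Hypothetical options bag: which parsers are available to choose from.
interface ParserOptions {
  // Explicitly injected parser, e.g. a jsdom-backed one for ground-truth runs.
  injected?: HtmlParser;
  // Browser-native parser, supplied only when a DOM is available.
  browser?: HtmlParser;
  // Server-side fallback, e.g. a linkedom-backed one.
  server: HtmlParser;
}

// Resolution order: injected parser first, then browser, then server.
function resolveParser(options: ParserOptions): HtmlParser {
  return options.injected ?? options.browser ?? options.server;
}

// Detects a browser-like environment without throwing under Node.js.
function hasNativeDom(): boolean {
  return "DOMParser" in globalThis;
}

// Stub parsers that only report which backend was picked.
const fake = (label: string): HtmlParser => () => ({
  querySelector: () => label,
});

const serverOnly = resolveParser({ server: fake("linkedom") });
const withBrowser = resolveParser({
  browser: fake("browser"),
  server: fake("linkedom"),
});
const withInjected = resolveParser({
  injected: fake("jsdom"),
  browser: fake("browser"),
  server: fake("linkedom"),
});
```

In real code the `browser` entry would be populated only when `hasNativeDom()` returns true. Keeping the choice in a single resolver means the traversal code itself never needs to know which DOM implementation it is running against.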

@lentil32 lentil32 requested a review from maestrojeong June 18, 2025 03:12
@vercel (bot) commented Jun 18, 2025

Deployment status: next-eval-irdb ✅ Ready (updated Jun 18, 2025 5:20am UTC)

@lentil32 lentil32 changed the title from "feat: Add hybrid node/web dom parsing" to "feat: Add hybrid node/web dom parsing (NR-477)" Jun 18, 2025
@lentil32 (Collaborator, Author)

@maestrojeong Please review this one first, then take a look at nr-531!

@lentil32 (Collaborator, Author)

This file changed because jsdom is now used instead of node-html-parser.

Comment on lines +79 to 85
const slimmedResult = pipe(
  content,
  p.parseHtml,
  p.slimDocument,
  (result) => result.slimmedHtml,
);

@lentil32 (Collaborator, Author) commented Jun 18, 2025

This is what the API looks like now.
You can call parseHtml and slimDocument directly,
or compose them in FP style like this.
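
To make the two call styles concrete, here is a minimal, self-contained sketch. Everything in it is an assumption for illustration: the `parseHtml` and `slimDocument` bodies are toy stand-ins (the real ones live in @wordbricks/next-eval and are not shown in this PR page), and `pipe` is a hand-rolled value-first helper, since the snippet above does not show where `pipe` is imported from.

```typescript
// Hypothetical shapes; the real package types are not shown in the PR.
interface ParsedDoc {
  html: string;
}
interface SlimResult {
  slimmedHtml: string;
}

// Toy stand-in: the real parseHtml would build a DOM document.
const parseHtml = (content: string): ParsedDoc => ({ html: content });

// Toy "slimming": drop <script> blocks, collapse whitespace, trim the ends.
const slimDocument = (doc: ParsedDoc): SlimResult => ({
  slimmedHtml: doc.html
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    .replace(/\s+/g, " ")
    .trim(),
});

// Value-first pipe: feeds the value through each function, left to right.
function pipe<A, B>(a: A, ab: (a: A) => B): B;
function pipe<A, B, C>(a: A, ab: (a: A) => B, bc: (b: B) => C): C;
function pipe<A, B, C, D>(
  a: A,
  ab: (a: A) => B,
  bc: (b: B) => C,
  cd: (c: C) => D,
): D;
function pipe(value: unknown, ...fns: Array<(x: any) => unknown>): unknown {
  return fns.reduce((acc, fn) => fn(acc), value);
}

const content = "  <p>Hello</p>\n<script>track()</script>  ";

// FP style, mirroring the reviewed snippet.
const slimmedResult = pipe(
  content,
  parseHtml,
  slimDocument,
  (result) => result.slimmedHtml,
);

// Direct style: the same functions called by hand.
const direct = slimDocument(parseHtml(content)).slimmedHtml;
```

Both styles produce the same string. The pipe form reads top to bottom in execution order, which tends to help once more steps are added.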

@lentil32 lentil32 merged commit ddebda1 into ian/nr-531 Jun 20, 2025
2 checks passed
@lentil32 lentil32 deleted the ian/nr-477 branch June 20, 2025 07:14
lentil32 added a commit that referenced this pull request Jun 20, 2025
* refactor: Move footer to sole file

* refactor: extract LLM tab into modular component architecture

- Create shared atoms for state management (processedData, randomNumber, htmlId, feedbackSent, activeExtractTab)
- Extract LLM logic into dedicated useLlm hook
- Create reusable useFeedback hook for feedback functionality
- Refactor LlmTab component to use atoms instead of props
- Update MdrTab to use atoms for consistency
- Create LlmInteractionSection wrapper component
- Eliminate prop drilling throughout component hierarchy
- Maintain feature parity while improving code organization

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: reset MDR state when changing files

MDR results were persisting when switching between files. Now MDR state properly resets when:
- Selecting a new file
- Processing a file
- Loading sample data
- Fetching from URL

Also simplified the code by removing the unnecessary resetMdr function and using direct atom setters instead.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* chore: update docs

* chore: remove unused files

* feat: setup syncpack

* CI: Update release.yml

* chore: version up

* feat: Add hybrid node/web dom parsing (NR-477) (#24)

* feat: Add hybrid node/web dom parsing

* feat: edit vercel timeout function

* feat: add vercel timeout

* feat: add slack webhook url as env

* jsdom ground truth

* refactor: Make node types as constant

* feat: Update @wordbricks/next-eval API to FP friendly

* chore: remove redundant file

* chore: update package version

---------

Co-authored-by: maestroJeong <legend4020@snu.ac.kr>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: maestroJeong <legend4020@snu.ac.kr>