Open
Description
Firefox and its forks use the <HR> tag to insert a separator line between bookmarks, and this prevents the parser from working correctly. As far as I can tell, Chrome and its forks do not emit such a tag.
A possible solution is to strip every <HR> tag before parsing, e.g. by adding the following function to parser.py:
import re

def __remove_hr_tags(html_lines):
    # Compile the regex pattern for matching <HR> tags (case-insensitive)
    hr_pattern = re.compile(r'<hr[^>]*>', re.IGNORECASE)
    # Process each line
    cleaned_lines = []
    for line in html_lines:
        # Remove <HR> tags from the line
        cleaned_line = hr_pattern.sub('', line)
        cleaned_lines.append(cleaned_line)
    return cleaned_lines

And then, in the parse() function:
def parse(netscape_bookmarks_file: NetscapeBookmarksFile):
    """
    Responsible to start parsing, getting metadata information
    and start the folder recursion
    :param netscape_bookmarks_file: a NetscapeBookMarkFile
    :return: the NetscapeBookMarkFile, but parsed
    """
    line_num = 0
    file = netscape_bookmarks_file
    lines = netscape_bookmarks_file.html.splitlines()
    # Remove the <HR> tag
    lines = __remove_hr_tags(lines)
    # rest of the code...
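To check the idea in isolation, here is a minimal standalone sketch that applies the same regex to a small sample. The `SAMPLE_HTML` excerpt is a hypothetical stand-in for a Firefox-style export, and `remove_hr_tags` is an illustrative rename of the helper above, not something that exists in the library.

```python
import re

# Hypothetical excerpt in the style of a Firefox export (assumption:
# the separator appears as a bare <HR> between two <DT> entries).
SAMPLE_HTML = """<DL><p>
    <DT><A HREF="https://example.com/">Example</A>
    <HR>
    <DT><A HREF="https://example.org/">Another</A>
</DL><p>"""

# Same pattern as proposed above, compiled once at module level.
HR_PATTERN = re.compile(r'<hr[^>]*>', re.IGNORECASE)


def remove_hr_tags(html_lines):
    """Return a new list of lines with every <HR> tag stripped."""
    return [HR_PATTERN.sub('', line) for line in html_lines]


original = SAMPLE_HTML.splitlines()
cleaned = remove_hr_tags(original)

for line in cleaned:
    print(repr(line))
```

Note that a line holding only `<HR>` becomes a whitespace-only line instead of being dropped, so the line count is unchanged. Whether the rest of the parser tolerates such blank lines is an assumption I have not verified.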