PDF Parser Error #4

@nsamarin

Traceback (most recent call last):
  File "/Users/nsamarin/Projects/ccpa-compliance/scripts/scraper/main.py", line 159, in <module>
    scrape_policies(**kwargs)
  File "/Users/nsamarin/Projects/ccpa-compliance/scripts/scraper/main.py", line 132, in scrape_policies
    future.result()
  File "/opt/anaconda3/envs/ccpa/lib/python3.9/concurrent/futures/_base.py", line 433, in result
    return self.__get_result()
  File "/opt/anaconda3/envs/ccpa/lib/python3.9/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/opt/anaconda3/envs/ccpa/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/anaconda3/envs/ccpa/lib/python3.9/site-packages/polipy/polipy.py", line 292, in download_policy
    policy.extract(extractors=extractors)
  File "/opt/anaconda3/envs/ccpa/lib/python3.9/site-packages/polipy/polipy.py", line 112, in extract
    content = extract(extractor, **vargs)
  File "/opt/anaconda3/envs/ccpa/lib/python3.9/site-packages/polipy/extractors.py", line 11, in extract
    content = extract_text(**kwargs)
  File "/opt/anaconda3/envs/ccpa/lib/python3.9/site-packages/polipy/extractors.py", line 18, in extract_text
    content = extract_pdf(static_source)
  File "/opt/anaconda3/envs/ccpa/lib/python3.9/site-packages/polipy/extractors.py", line 28, in extract_pdf
    text = parse_pdf(f)
  File "/opt/anaconda3/envs/ccpa/lib/python3.9/site-packages/pdfminer/high_level.py", line 114, in extract_text
    for page in PDFPage.get_pages(
  File "/opt/anaconda3/envs/ccpa/lib/python3.9/site-packages/pdfminer/pdfpage.py", line 128, in get_pages
    doc = PDFDocument(parser, password=password, caching=caching)
  File "/opt/anaconda3/envs/ccpa/lib/python3.9/site-packages/pdfminer/pdfdocument.py", line 596, in __init__
    raise PDFSyntaxError('No /Root object! - Is this really a PDF?')
pdfminer.pdfparser.PDFSyntaxError: No /Root object! - Is this really a PDF?
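
The report does not say what the downloaded file actually contained. One possible cause (an assumption, not confirmed above) is that the URL returned something other than a PDF, such as an HTML error page, so pdfminer found no `/Root` object. A minimal sketch of a guard that checks for the `%PDF-` header and catches `PDFSyntaxError`, so one bad document does not abort the whole scrape. The helper names `looks_like_pdf` and `extract_pdf_text_or_none` are hypothetical and not part of polipy:

```python
import io
from typing import Optional

# Every PDF file starts with this marker (possibly after a few junk bytes).
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes, window: int = 1024) -> bool:
    """Return True if the PDF header marker appears near the start of data."""
    return PDF_MAGIC in data[:window]


def extract_pdf_text_or_none(data: bytes) -> Optional[str]:
    """Hypothetical helper: return extracted text, or None if data is not a parseable PDF."""
    if not looks_like_pdf(data):
        return None
    # Imported lazily so the header check works even without pdfminer installed.
    from pdfminer.high_level import extract_text
    from pdfminer.pdfparser import PDFSyntaxError

    try:
        return extract_text(io.BytesIO(data))
    except PDFSyntaxError:
        # Header was present but the document structure is broken (e.g. truncated download).
        return None
```

A caller could log and skip the policy when the helper returns `None` rather than letting the exception propagate through `future.result()`.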
