Open
Labels
bug (Something isn't working), triage (Issue needs to be triaged/prioritized)
Description
Bug Description
Similar to the behavior previously fixed in #19302, the DocumentBlock class handles empty strings ("") in its optional fields (such as document_mimetype, url, and title) as if they were missing, converting them to None during initialization or validation.
This behavior is inconsistent and undermines data integrity: if a user explicitly provides an empty string, the library should preserve it rather than replace it with None. This matters in particular for serialization and downstream validation, where a str type is expected.
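To illustrate the kind of check that typically causes this, here is a minimal standalone sketch. It does not reproduce the actual DocumentBlock code: guess_mimetype, normalize_truthy, and normalize_explicit are hypothetical names used only for illustration, and the assumption is that the field falls back to a guessed value through a truthiness test instead of an explicit None comparison.

```python
from typing import Optional


def guess_mimetype() -> Optional[str]:
    """Hypothetical fallback: with nothing to inspect, there is nothing to guess."""
    return None


def normalize_truthy(value: Optional[str]) -> Optional[str]:
    # `or` falls through for every falsy value, so "" is replaced by the fallback.
    return value or guess_mimetype()


def normalize_explicit(value: Optional[str]) -> Optional[str]:
    # Only a genuinely missing value (None) triggers the fallback; "" is preserved.
    return guess_mimetype() if value is None else value


print(repr(normalize_truthy("")))      # None
print(repr(normalize_explicit("")))    # ''
print(repr(normalize_explicit(None)))  # None
```

If the real validator follows the first pattern, switching to an explicit `is None` comparison would likely preserve user-supplied empty strings while still filling in values that were never provided.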
Version
llama-index-core==0.14.12
Steps to Reproduce
from llama_index.core.llms import DocumentBlock
doc = DocumentBlock(
    data=b"",
    url="",
    title="",
    document_mimetype="",
)
doc.document_validation()
print(f"URL: {doc.url!r}")
print(f"Mimetype: {doc.document_mimetype!r}")
assert doc.url == "", f"Expected empty string, but got {doc.url!r}"
assert doc.document_mimetype == "", f"Expected empty string, but got {doc.document_mimetype!r}"
Relevant Logs/Tracebacks
URL: ''
Mimetype: None
Traceback (most recent call last):
line 16, in <module>
assert doc.document_mimetype == "", f"Expected empty string, but got {doc.document_mimetype!r}"
AssertionError: Expected empty string, but got None