Fix unexpected document end when importing drops with large XLIFF files#1049
wadimw wants to merge 2 commits into upstream-patched
Notes on the go: During direct isolated testing of the flow of … The funny part is that this solution (i.e. reverting to a CharSequence-backed document, but then forcibly changing the encoding) actually works: https://github.com/box/mojito/actions/runs/21634652420/job/62356567792
Possibly the same issue as #1021.
New findings: Seems like we can get rid of the …

Now, onto the actual issue. The way stream-based …

Note that this behaviour didn't surface earlier, because …

To confirm these findings, I first tried adding …

Using a debugger I discovered that the …

The idea then is to move the …

So, eventually I decided to split the integrity checks into two pipeline steps - one for document-level checks that read the whole file content from its stream while it's still allowed (i.e. before …

EDIT: IT WOOOOOORKS WOOHOOOOOOOO https://github.com/box/mojito/actions/runs/21718444824/job/62640989530?pr=1049
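A minimal, self-contained model of splitting the checks into a document-level step and a per-text-unit step, written in plain Java rather than against Okapi. `ModelRawDocument`, `DocumentLevelCheck`, `TextUnitLevelCheck` and `runPipeline` are hypothetical names, and the behaviour baked into `ModelRawDocument` (a fresh reader per call before filtering starts, one shared reader afterwards) is a simplifying assumption, not Okapi's actual implementation. The point it illustrates: the whole-file read happens while it is still safe, and the per-unit step only ever sees the unit it is handed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SplitChecksSketch {

    /** Hypothetical model: fresh readers before filtering, one shared reader once filtering starts. */
    static final class ModelRawDocument {
        private final String source;
        private Reader filterReader;

        ModelRawDocument(String source) {
            this.source = source;
        }

        Reader getReader() {
            return filterReader != null ? filterReader : new StringReader(source);
        }

        Reader startFiltering() {
            filterReader = new StringReader(source);
            return filterReader;
        }
    }

    static String readFully(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    /** Step 1 (hypothetical): document-level checks, run in the raw-document phase. */
    static final class DocumentLevelCheck {
        String content;

        void handleRawDocument(ModelRawDocument doc) {
            // Safe in this model: filtering has not started, so the reader is not shared with the parser.
            content = readFully(doc.getReader());
            if (content.isEmpty()) {
                throw new IllegalStateException("Document is empty");
            }
        }
    }

    /** Step 2 (hypothetical): per-text-unit checks that never touch the document reader. */
    static final class TextUnitLevelCheck {
        void handleTextUnit(String unit) {
            if (unit.isBlank()) {
                throw new IllegalStateException("Blank text unit");
            }
        }
    }

    /** Runs both steps around a line-based stand-in for the filter; returns the parsed units. */
    static List<String> runPipeline(String source) {
        ModelRawDocument doc = new ModelRawDocument(source);
        new DocumentLevelCheck().handleRawDocument(doc);

        TextUnitLevelCheck unitCheck = new TextUnitLevelCheck();
        List<String> units = new ArrayList<>();
        try (BufferedReader filter = new BufferedReader(doc.startFiltering())) {
            String line;
            while ((line = filter.readLine()) != null) {
                unitCheck.handleTextUnit(line); // only sees the unit, never doc.getReader()
                units.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return units;
    }
}
```

With this split, a document well past 8192 characters is parsed to the end, because nothing reads from the shared reader while the stand-in filter is consuming it.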
This PR fixes the following error:
which would occur when importing a drop containing XLIFF files larger than 8 KiB. On the UI side, this would appear as `Import Failed` on the `Project Requests` page, and would result in a partial import (translations for strings before the 8 KiB mark would be imported correctly).

Note that the logged error position [128,206] corresponded to exactly 8192 characters into the file. Additionally, this happens regardless of the selected DropExporter (i.e. this is not caused by the Box SDK failing to provide the document content).
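That the failure lands on exactly 8192 characters fits a parser pulling its input in fixed 8192-character chunks from a reader that another consumer drained in between. The self-contained model below uses plain `java.io`, not Okapi; `CHUNK`, `drain` and `charsSeenByParser` are hypothetical names, and the chunk size is an assumption picked to match the observed offset. With a shared reader, the model parser sees end-of-input right after its first chunk; with an independent reader, it sees the whole document.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class SharedReaderDemo {
    // Hypothetical chunk size for the model parser, chosen to match the observed 8192.
    static final int CHUNK = 8192;

    /** Reads everything left in the reader, like a check that wants "the whole document". */
    static void drain(Reader reader) throws IOException {
        char[] buf = new char[CHUNK];
        while (reader.read(buf) != -1) {
            // discard: only the side effect of advancing the reader matters here
        }
    }

    /**
     * Model parser: reads the document in CHUNK-sized pieces. After the first chunk,
     * a "checker" drains either the parser's own reader (shared) or a fresh one.
     * Returns how many characters the parser actually saw.
     */
    static int charsSeenByParser(String document, boolean checkerSharesReader) {
        try {
            Reader parserReader = new StringReader(document);
            char[] buf = new char[CHUNK];
            int seen = 0;
            boolean checkerRan = false;
            int n;
            while ((n = parserReader.read(buf, 0, CHUNK)) != -1) {
                seen += n;
                if (!checkerRan) {
                    drain(checkerSharesReader ? parserReader : new StringReader(document));
                    checkerRan = true;
                }
            }
            return seen;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String document = "x".repeat(20_000);
        System.out.println(charsSeenByParser(document, true));  // 8192: parser hits end-of-input early
        System.out.println(charsSeenByParser(document, false)); // 20000: full document parsed
    }
}
```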
The root cause was one step in the Okapi pipeline (`IntegrityCheckStep`) advancing the underlying document stream to the end while another step (`RawDocumentToFilterEventsStep` with `XLIFFFilter`) was still in the middle of parsing it. This issue was introduced in #731, which changed how the content of the whole document is retrieved, from `RawDocument#getCharSequence` to `RawDocument#getReader`. According to the findings described in #1049 (comment), it seems the reader/stream may only be accessed within `BasePipelineStep#handleRawDocument`, not later (in any filter event handlers).

The fix is tested through the new `DropServiceTest#forTranslationLargeXliffFile`.
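A regression test for this needs an XLIFF input comfortably larger than 8192 characters, with translations on both sides of that boundary. Below is a hypothetical fixture generator (`LargeXliffFixture`, `build` and `countTransUnits` are illustrative names, not taken from the PR's actual test) that emits a minimal XLIFF 1.2 document and checks it is well-formed with the JDK's built-in DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class LargeXliffFixture {

    /** Builds a minimal XLIFF 1.2 document with {@code unitCount} trans-units. */
    static String build(int unitCount) {
        StringBuilder sb = new StringBuilder()
                .append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
                .append("<xliff version=\"1.2\" xmlns=\"urn:oasis:names:tc:xliff:document:1.2\">\n")
                .append("  <file original=\"large.properties\" source-language=\"en\"")
                .append(" target-language=\"fr-FR\" datatype=\"x-undefined\">\n")
                .append("    <body>\n");
        for (int i = 0; i < unitCount; i++) {
            sb.append("      <trans-unit id=\"").append(i).append("\" resname=\"key_").append(i).append("\">\n")
                    .append("        <source>Source string ").append(i).append("</source>\n")
                    .append("        <target>Target string ").append(i).append("</target>\n")
                    .append("      </trans-unit>\n");
        }
        return sb.append("    </body>\n  </file>\n</xliff>\n").toString();
    }

    /** Parses the XML (which also checks well-formedness) and counts trans-unit elements. */
    static int countTransUnits(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("trans-unit").getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Fixture is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = build(200);
        System.out.println(xml.length() > 8192); // true: well past the 8 KiB boundary
        System.out.println(countTransUnits(xml)); // 200
    }
}
```

In a test built on a fixture like this, the useful assertion is that units located after the 8 KiB mark were imported, since the old failure mode was a partial import of only the units before it.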