Issue with CSV Parsing for Embedded Quotes Using EntityDataLoader #641

@puru-khedre

When using EntityDataLoader to import CSV files, the import fails if a quoted CSV value contains backslash-escaped double quotes (\"). The error returned was:

{'message':'IOException reading next record: java.io.IOException: (line 3) invalid char between encapsulated token and delimiter (line 3) invalid char between encapsulated token and delimiter','errorName':'Internal Server Error','error':500,'path':'/apps/tools/Entity/DataImport/load'}

Example CSV Data:

co.example.bi.fact.OrderItemFulfillmentFact
orderId,orderItemSeqId,externalId,orderName,orderTypeId,productStoreId,salesChannelEnumId,entryDate,orderDate,shippingCharges,productId,itemDescription,
FAO10117,101,5669763023132,"#1010101240",SALES_ORDER,STORE,POS_SALES_CHANNEL,1705232954323,1705232896000,,10016,"\"And\" Pride Tank in Grey Mix",

Current Behavior:

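The parser throws an IOException on the embedded quotes. CSVFormat.newFormat() sets no escape character, so the backslash in \"And\" is read as ordinary data, the quote after it closes the field, and the next character (A) is neither a delimiter nor a line ending. That is why the message reports an invalid char between the encapsulated token and the delimiter on line 3, the first data row.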

Proposed Solution:

To handle this, I found that setting an escape character with the withEscape method of CSVFormat lets the parser accept backslash-escaped quotes inside quoted fields:

char escapeChar = '\\' as char // backslash, typed explicitly so it matches withEscape(char)
CSVFormat format = CSVFormat.newFormat(edli.csvDelimiter)
        .withCommentMarker(edli.csvCommentStart)
        .withQuote(edli.csvQuoteChar)
        .withEscape(escapeChar) // Added escape character support
        .withSkipHeaderRecord(true) // TODO: remove this? does it even do anything?
        .withIgnoreEmptyLines(true)
        .withIgnoreSurroundingSpaces(true)

CSVParser parser = format.parse(reader)
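
To illustrate why an escape character changes the outcome, here is a minimal hand-rolled sketch in plain Java. It is not Commons CSV's lexer and not the EntityDataLoader code; the class name EscapeSketch and the method splitLine are hypothetical, and the logic is simplified to a single line with no comment or whitespace handling. It only shows the general idea: without an escape character the quote after the backslash closes the field, and with one it is kept as data.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a simplified field splitter, NOT the Commons CSV implementation.
final class EscapeSketch {

    /** Splits one CSV line. Pass escape = null to mimic "no escape character configured". */
    static List<String> splitLine(String line, char delimiter, char quote, Character escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean afterClosingQuote = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (escape != null && c == escape && i + 1 < line.length()) {
                    current.append(line.charAt(++i)); // escaped character is kept literally
                } else if (c == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        current.append(quote);        // doubled quote becomes one literal quote
                        i++;
                    } else {
                        inQuotes = false;             // closing quote of the field
                        afterClosingQuote = true;
                    }
                } else {
                    current.append(c);
                }
            } else if (c == delimiter) {
                fields.add(current.toString());
                current.setLength(0);
                afterClosingQuote = false;
            } else if (afterClosingQuote) {
                // Same situation the reported error describes
                throw new IllegalStateException(
                        "invalid char between encapsulated token and delimiter: '" + c + "'");
            } else if (c == quote && current.length() == 0) {
                inQuotes = true;                      // opening quote of the field
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Tail of the data row from the example CSV above
        String line = "10016,\"\\\"And\\\" Pride Tank in Grey Mix\",";

        // With a backslash escape character the row splits cleanly
        System.out.println(splitLine(line, ',', '"', '\\'));

        // Without one, the quote after the backslash ends the field early
        try {
            splitLine(line, ',', '"', null);
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

The sketch also accepts the doubled-quote form ("""And"" Pride Tank in Grey Mix") with no escape character, which may be a data-side alternative when the export format can be changed.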

If this approach looks good, I'd be happy to open a pull request for it.
