Annotation fails with String index out of range: 4  #73

@acxcv

Hi, I'm using Spotlight to annotate ~40k texts.

In around 3.5k instances, the annotation does not work as expected and Spotlight produces String index out of range: 4 instead of the annotation XML.

I can't find the reason why this happens. From what I can tell, the texts where Spotlight fails are similar in length and structure to those that work flawlessly.

I've tried removing all non-alphanumeric characters from sample texts that failed, but the error still persists.
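A minimal sketch of that kind of stripping, for anyone trying to reproduce this. It assumes Python and that whitespace is kept between words; the function name `strip_non_alphanumeric` is hypothetical and this is not necessarily the exact code used on the failing texts.

```python
import re


def strip_non_alphanumeric(text: str) -> str:
    """Drop every character that is not an ASCII letter, digit, or whitespace."""
    # Assumption: whitespace is preserved so that word boundaries survive.
    cleaned = re.sub(r"[^A-Za-z0-9\s]", "", text)
    # Collapse the runs of whitespace left behind by removed punctuation.
    return re.sub(r"\s+", " ", cleaned).strip()
```

Applied to a fragment such as `they'd be fired!`, this drops the apostrophe and the exclamation mark and leaves only letters and single spaces.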

This is the last shell output I get on the REST server before the curl command returns the error:

...]
492713 [Grizzly-2222(5)] INFO org.dbpedia.spotlight.filter.annotations.ConfidenceFilter - (c=0.45) filtered out by similarity score threshold (0.000<0.450): SurfaceForm[Black] -0.000-> DBpediaResource[Black(DBpedia:Colour)] - at position *7371* in - Text[... rres management told them that if they played the Black Angels Death Song again theyd be fired the V ...]                                
492713 [Grizzly-2222(5)] INFO org.dbpedia.spotlight.filter.annotations.ConfidenceFilter - (c=0.45) filtered out by similarity score threshold (0.000<0.450): SurfaceForm[Black] -0.000-> DBpediaResource[Black_Canadians(Wikidata:Q41710,DBpedia:EthnicGroup)] - at position *7371* in - Text[... rres management told them that if they played the Black Angels Death Song again theyd be fired the V ...]

Does anybody have an idea why this could happen? I can provide a text file containing the texts in question for reference.

I'm using Java 1.8.0, the dbpedia-spotlight-1.0.0 jar file, and the latest en core data release.

Thanks for your help!
