Assembling social movement organizations from Stanford tags #8

@erleholgersen

The Stanford NER tagger tags each token individually as ORGANIZATION or not, rather than returning whole SMO names. For example, Occupy Wall Street is returned as [('Occupy', 'ORGANIZATION'), ('Wall', 'ORGANIZATION'), ('Street', 'ORGANIZATION')].

To parse this into a single string, I've assumed that every run of consecutive ORGANIZATION tags belongs to the same SMO. Does this seem like a reasonably robust approach, or should we try to come up with something else?
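
For concreteness, here is a minimal sketch of that assumption, not necessarily the code in the repository. The function name `assemble_organizations` is hypothetical, and the 'O' tag on the non-organization tokens is assumed to be what the tagger emits for everything else.

```python
from itertools import groupby


def assemble_organizations(tagged_tokens, label="ORGANIZATION"):
    """Join each run of consecutive `label`-tagged tokens into one string."""
    organizations = []
    # groupby splits the sequence into runs that share the same key,
    # here "is this token tagged as an organization or not".
    for is_org, run in groupby(tagged_tokens, key=lambda pair: pair[1] == label):
        if is_org:
            organizations.append(" ".join(token for token, _ in run))
    return organizations


# Hand-written input in the (token, tag) shape shown above.
tagged = [
    ("Occupy", "ORGANIZATION"),
    ("Wall", "ORGANIZATION"),
    ("Street", "ORGANIZATION"),
    ("protesters", "O"),
    ("marched", "O"),
]
print(assemble_organizations(tagged))  # ['Occupy Wall Street']
```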

It seems to work as long as punctuation is tokenized separately (i.e. the SMOs in a list are separated by commas that are not tagged as organizations), but I probably haven't thought about all edge cases.
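
To illustrate the punctuation point, the sketch below merges runs with an explicit loop and compares two inputs. The tags are written by hand for illustration and are not real tagger output; `merge_runs` is a hypothetical helper and 'O' is assumed to be the non-entity tag. When nothing separates two adjacent SMOs, the consecutive-tag assumption merges them into one name.

```python
def merge_runs(tagged_tokens, label="ORGANIZATION"):
    """Collect consecutive `label` tokens; any other tag ends the current run."""
    merged, current = [], []
    for token, tag in tagged_tokens:
        if tag == label:
            current.append(token)
        elif current:
            merged.append(" ".join(current))
            current = []
    if current:  # flush a run that reaches the end of the input
        merged.append(" ".join(current))
    return merged


# Comma is its own token and is not tagged as an organization.
with_comma = [
    ("Greenpeace", "ORGANIZATION"),
    (",", "O"),
    ("Amnesty", "ORGANIZATION"),
    ("International", "ORGANIZATION"),
]

# No separating token between the two names.
no_separator = [
    ("Greenpeace", "ORGANIZATION"),
    ("Amnesty", "ORGANIZATION"),
    ("International", "ORGANIZATION"),
]

print(merge_runs(with_comma))    # ['Greenpeace', 'Amnesty International']
print(merge_runs(no_separator))  # ['Greenpeace Amnesty International']
```

The second case is one edge case worth checking against real text, e.g. headlines or lists written without commas.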
