
[dictionaries] General improvements to Norwegian dictionaries#706

Open
Bjoern-Rapp wants to merge 1 commit into openvenues:master from Bjoern-Rapp:dictionary-nb


@Bjoern-Rapp

Improved the Norwegian dictionaries for street names, street types, and post offices.
These are the most common "street types" and landforms used in Norwegian street names that I could find.

I could try to expand the concatenated_suffixes_inseparable.txt and concatenated_suffixes_separable.txt dictionaries too, but I don't quite understand when a word should go in one list versus the other.

@albarrentine
Contributor

albarrentine commented Aug 14, 2025

concatenated_suffixes_separable.txt is for suffixes like straße/strasse in German, which may appear concatenated to the string or separated from it by a space or hyphen. For instance, we'd like the following:

  • "Lange Strasse"
  • "Lange-Strasse"
  • "Langestrasse"
  • "Langestr."
  • "Lange Str."
  • "Lange-Str."

should all share an expansion in common.
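
To make the separable case concrete, here is a minimal sketch in Python. It is not libpostal's implementation: the `SEPARABLE_SUFFIXES` mapping and the `expand_separable` function are hypothetical, and the entry only mirrors the straße/strasse/str. example above. The idea is that the suffix may be glued onto the stem or preceded by a space or hyphen, and every layout normalizes to the same canonical expansion.

```python
import re

# Hypothetical separable-suffix entry: canonical form -> accepted spellings.
# Illustrates the idea behind concatenated_suffixes_separable.txt only.
SEPARABLE_SUFFIXES = {"strasse": ["straße", "strasse", "str."]}


def expand_separable(name: str) -> set[str]:
    """Return spaced and concatenated expansions of `name` with the suffix canonicalized."""
    lowered = name.lower()
    results = set()
    for canonical, variants in SEPARABLE_SUFFIXES.items():
        for variant in variants:
            # The suffix may be attached directly, or preceded by a space or hyphen.
            match = re.fullmatch(r"(.+?)[ -]?" + re.escape(variant), lowered)
            if match:
                stem = match.group(1)
                results.update({f"{stem} {canonical}", f"{stem}{canonical}"})
    return results


for raw in ["Lange Strasse", "Lange-Strasse", "Langestrasse",
            "Langestr.", "Lange Str.", "Lange-Str."]:
    print(raw, sorted(expand_separable(raw)))
```

All six inputs end up sharing the expansion "lange strasse", which is the property described above.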

concatenated_suffixes_inseparable.txt tends to be more for endings of place names like "burg", which may sometimes appear as "bg." but wouldn't be separated from the city name the way a street ending could.
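
For contrast, a minimal sketch of the inseparable case, again hypothetical and not libpostal's code. `INSEPARABLE_SUFFIXES` and `expand_inseparable` are illustrative names, and "Hambg." is an illustrative abbreviated spelling. The suffix is only recognized when it is attached directly to the stem. A space or hyphen before it is not treated as an accepted layout.

```python
# Hypothetical inseparable-suffix entry: canonical form -> accepted spellings.
INSEPARABLE_SUFFIXES = {"burg": ["burg", "bg."]}


def expand_inseparable(name):
    """Return the name with the suffix canonicalized, or None if no attached suffix is found."""
    lowered = name.lower()
    for canonical, variants in INSEPARABLE_SUFFIXES.items():
        for variant in variants:
            if lowered.endswith(variant):
                stem = lowered[: -len(variant)]
                # Unlike a separable suffix, the ending must be glued to the stem.
                if stem and not stem.endswith((" ", "-")):
                    return stem + canonical
    return None


print(expand_inseparable("Hamburg"), expand_inseparable("Hambg."), expand_inseparable("Ham burg"))
```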

Some entries had no abbreviation, but they still help the language classifier: with the 4-gram model it is easy to get wrong answers in languages like German, Dutch, Danish, Norwegian, and Swedish, which have many character sequences in common, so having a few more suffixes, etc. in those dictionaries tended to help even when they aren't abbreviations.
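
A small illustration of why character 4-grams struggle here. This is only a sketch: it computes a plain Jaccard overlap between 4-gram sets and is not how libpostal's classifier scores text. The helper names `char_ngrams` and `jaccard` are hypothetical, and the word pair (Norwegian "Storgata", Swedish "Storgatan") was chosen for illustration.

```python
def char_ngrams(text, n=4):
    """Set of character n-grams, with a space on each side to mark word boundaries."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Fraction of n-grams shared between two sets."""
    return len(a & b) / len(a | b)


no = char_ngrams("Storgata")   # Norwegian definite form
sv = char_ngrams("Storgatan")  # Swedish definite form
print(sorted(no & sv), jaccard(no, sv))
```

Six of the nine distinct 4-grams are shared, so only the word-final grams ("ata " versus "atan", "tan ") separate the two languages. That is the kind of ambiguity the extra dictionary entries are meant to offset.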

@Bjoern-Rapp
Author

The reason I am unsure is that Norwegian nouns take both an indefinite and a definite inflection, and the definiteness decides whether the street ending can be separated from the core.
So, as in the example above, "upper street" could be written as:

  • Øvre gate (indefinite)
  • Øvregata (definite)

but not as "Øvre gata".
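
One hedged reading of this rule is that the definite form ("gata") behaves like an inseparable suffix, while the indefinite form ("gate") can stand as its own word. The sketch below encodes only the three examples above. `DEFINITE_FORMS`, `INDEFINITE_FORMS`, and `street_layout` are hypothetical names, and the lists are not the PR's dictionary contents. It deliberately returns "unknown" for layouts the thread doesn't cover, such as an indefinite ending written joined.

```python
# Hypothetical lists built only from the examples in this thread.
DEFINITE_FORMS = {"gata"}    # definite: stays attached to the core, e.g. "Øvregata"
INDEFINITE_FORMS = {"gate"}  # indefinite: written as a separate word, e.g. "Øvre gate"


def street_layout(name: str) -> str:
    """Classify how the street ending is attached, following the definiteness rule."""
    words = name.lower().split()
    last = words[-1]
    if len(words) > 1 and last in INDEFINITE_FORMS:
        return "separate"  # "Øvre gate"
    if len(words) > 1 and last in DEFINITE_FORMS:
        return "invalid"   # "Øvre gata": the definite form cannot be split off
    if any(last.endswith(form) and last != form for form in DEFINITE_FORMS):
        return "joined"    # "Øvregata"
    return "unknown"       # not covered by the examples above


for raw in ["Øvre gate", "Øvregata", "Øvre gata"]:
    print(raw, street_layout(raw))
```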
