Hi! I only recently came across Trankit, even though it has been around for three years. I'm surprised I hadn't seen it before, given that I've worked quite a lot with classic NLP.
We are currently working on a corpus of a low-resource language, Faroese, and we have some hand-labeled data (about 1,200 sentences, ~15,000 tokens) in CoNLL-U format. XLM-RoBERTa does not cover Faroese, so the language is not listed as supported in your trainable pipeline, but there is another BERT-like model that was trained on Faroese data. I have several questions:
- Can we expect Trankit to perform better than other tools such as Stanza or spaCy when trained on a small dataset?
- Does the trainable pipeline support specifying a custom BERT-like model as the embedding model?
- How can I specify a language that is not natively supported?
- How were the models trained for ancient languages such as Ancient Greek, Old Russian, or Old French? Was the corresponding modern language (Greek, Russian, French) specified for them?
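For reference, here is roughly how I imagine the training setup would look, based on the customized-pipeline example in the Trankit documentation. The file paths are placeholders, and the `embedding` key for plugging in a custom model is my assumption; please correct me if there is another mechanism:

```python
# Hypothetical Trankit training configuration for an unsupported language.
# The 'customized' category follows the documented customized-pipeline setup;
# the 'embedding' entry (pointing at a non-XLM-R model) is an assumption.
training_config = {
    'category': 'customized',              # language not in the supported list
    'task': 'posdep',                      # POS tagging + dependency parsing
    'save_dir': './faroese_model',         # placeholder output directory
    'train_conllu_fpath': './fo_train.conllu',  # placeholder CoNLL-U paths
    'dev_conllu_fpath': './fo_dev.conllu',
    'embedding': 'some-faroese-bert',      # assumed key; is this supported?
}

# Training would then presumably be launched like this:
# import trankit
# trainer = trankit.TPipeline(training_config=training_config)
# trainer.train()
```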
I will be very grateful for your answers!