
Possibly low F1 when finetuning BERT base #44

@dpfried

Hi Mandar,

When I finetune BERT base, I get an OntoNotes dev F1 of 73.69. I was wondering if this is within the variance that you saw for BERT base, or could there be some problem with my setup?

I'm using the package versions from requirements.txt (except with MarkupSafe changed to 1.1.1, per #40, and psycopg2 changed to psycopg2-binary), and am training on a V100 32GB with these commands:

python train.py train_bert_base
python evaluate.py train_bert_base
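For reference, a minimal sketch of the dependency tweaks described above, assuming requirements.txt pins these packages with exact "pkg==version" entries (adjust the patterns to the actual file contents):

# Hypothetical setup helper: patch the two pins mentioned above, then install.
sed -i 's/^MarkupSafe==.*/MarkupSafe==1.1.1/' requirements.txt
sed -i 's/^psycopg2==/psycopg2-binary==/' requirements.txt
pip install -r requirements.txt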

When evaluating your finetuned BERT base model on dev (python evaluate.py bert_base), I get an F1 of 74.05. This is closer to the 74.3 dev F1 number from Table 4, but should it match exactly? I'm wondering if there could be some difference in my setup that affects evaluation slightly but gets magnified during training.

Thanks,
Daniel
