Scenario: I use my own RNN model (a very simple model) on my dataset with the (ORG, PER, LOC, O) tag scheme, and it gets an F1 score of about 0.775. But when I use NCRFpp to train a model on the same data, it gives me an F1 score of -1.
After debugging, I changed my data format according to the requirements (using the BIOES tag scheme), and then it gives me very poor results (F1 = 0.554).
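For reference, here is a minimal sketch of the kind of conversion described above, from prefix-less tags (ORG, PER, LOC, O) to BIOES. It assumes that every maximal run of identical non-O tags is one entity. Two adjacent entities of the same type cannot be told apart in the original scheme, so they get merged. The function name `io_to_bioes` is hypothetical, not part of NCRFpp.

```python
from typing import List


def io_to_bioes(tags: List[str]) -> List[str]:
    """Convert prefix-less tags (e.g. ORG, PER, LOC, O) to BIOES.

    Assumption: each maximal run of identical non-O tags is one entity,
    so two adjacent entities of the same type are merged into one.
    """
    out = []
    n = len(tags)
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append("O")
            continue
        starts = i == 0 or tags[i - 1] != tag   # run begins here
        ends = i == n - 1 or tags[i + 1] != tag  # run ends here
        if starts and ends:
            out.append("S-" + tag)  # single-token entity
        elif starts:
            out.append("B-" + tag)  # first token of a multi-token entity
        elif ends:
            out.append("E-" + tag)  # last token of a multi-token entity
        else:
            out.append("I-" + tag)  # inside a multi-token entity
    return out


print(io_to_bioes(["PER", "PER", "O", "LOC", "ORG", "ORG", "ORG"]))
# ['B-PER', 'E-PER', 'O', 'S-LOC', 'B-ORG', 'I-ORG', 'E-ORG']
```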
Q-1: How do you compute the F1 score? Do you convert the tags back to the original scheme, or just take a weighted average?
Q-2: If I want to use BERT embeddings, how can I use them?
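To make Q-1 concrete, here is a sketch of one common way NER F1 is computed: at the entity (span) level rather than per token. A predicted entity counts as correct only if its start, end, and type all match a gold entity exactly. This is an illustration of the technique under stated assumptions, not NCRFpp's actual scorer. The names `bioes_spans` and `span_f1` are hypothetical. Note that tags without a B-/I-/E-/S- prefix yield no entities at all here, so precision and recall become undefined. This sketch returns 0.0 in that case; a scorer could instead report a sentinel such as -1, which may (as an assumption) be related to the -1 seen with the original tag format. Also, if the 0.775 from the simple model was computed per token, it is not directly comparable to an entity-level score.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end, entity type), end inclusive


def bioes_spans(tags: List[str]) -> Set[Span]:
    """Extract entity spans from one BIOES-tagged sentence (strict)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags):
        prefix, _, typ = tag.partition("-")
        if prefix == "S":
            spans.add((i, i, typ))
            start, label = None, None
        elif prefix == "B":
            start, label = i, typ
        elif prefix == "I":
            if label != typ:  # broken span: discard it
                start, label = None, None
        elif prefix == "E":
            if start is not None and label == typ:
                spans.add((start, i, typ))
            start, label = None, None
        else:  # "O" or a tag without a recognised prefix
            start, label = None, None
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged entity-level F1 over all sentences."""
    gold_set, pred_set = set(), set()
    for k, (g, p) in enumerate(zip(gold, pred)):
        # Prefix each span with its sentence index so spans stay distinct.
        gold_set |= {(k,) + s for s in bioes_spans(g)}
        pred_set |= {(k,) + s for s in bioes_spans(p)}
    correct = len(gold_set & pred_set)
    if not gold_set or not pred_set or correct == 0:
        return 0.0  # precision or recall undefined, or nothing correct
    precision = correct / len(pred_set)
    recall = correct / len(gold_set)
    return 2 * precision * recall / (precision + recall)


gold = [["B-PER", "E-PER", "O", "S-LOC"]]
pred = [["B-PER", "E-PER", "O", "S-ORG"]]  # second entity has the wrong type
print(span_f1(gold, pred))  # 0.5
```

In this example a per-token score would count three of the four tokens as correct, while the entity-level score is 0.5. This shows why the two kinds of numbers can differ noticeably.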