Pretraining with FlexBert but can't export to HF #234

Hi @warner-benjamin and team,
Thank you for your fantastic work on ModernBERT and FlexBert!

I’ve trained a FlexBert model using your codebase and am now trying to convert it to Hugging Face Transformers format for easier sharing and downstream use. I used the `convert_to_hf.py` script referenced in some previous issues, together with the model definitions from [this YAML](https://github.com/AnswerDotAI/ModernBERT/blob/pretraining_documentation/yamls/modernbert/modernbert-base-pretrain.yaml).

The converted model loads without errors via `AutoModel` and `AutoTokenizer`, but when I run a forward pass, the outputs are all NaN:
```
import torch
from transformers import AutoModel, AutoTokenizer

# Directory written by convert_to_hf.py
model_path = "</path/to/model/from/convert_to_hf>"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModel.from_pretrained(model_path)
inputs = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)  # <- returns all NaN
```
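
To help narrow this down, here is a minimal sketch of a check that could be run on the converted model. It is not part of the ModernBERT codebase or of Transformers: `find_nonfinite_tensors` and `first_nonfinite_module` are hypothetical helper names, and the sketch assumes only plain PyTorch. The first helper lists weights and buffers that already contain NaN or Inf after loading (which would point at the conversion step), and the second uses forward hooks to report the first submodule whose output becomes non-finite (which would point at a config or numerical mismatch).

```python
from typing import List, Optional

import torch
from torch import nn


def find_nonfinite_tensors(model: nn.Module) -> List[str]:
    """Return names of floating-point parameters/buffers containing NaN or Inf."""
    bad = []
    named = list(model.named_parameters()) + list(model.named_buffers())
    for name, tensor in named:
        if tensor.is_floating_point() and not torch.isfinite(tensor).all():
            bad.append(name)
    return bad


def first_nonfinite_module(model: nn.Module, *args, **kwargs) -> Optional[str]:
    """Run one forward pass and return the name of the first submodule
    whose output tensor contains NaN or Inf, or None if there is none.

    Only submodules that return a single tensor are checked; tuple or
    dict outputs are skipped in this sketch.
    """
    hits: List[str] = []

    def make_hook(name: str):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                hits.append(name)
        return hook

    # Skip the root module (empty name); hooks fire in execution order,
    # with inner modules reporting before the modules that contain them.
    handles = [
        module.register_forward_hook(make_hook(name))
        for name, module in model.named_modules()
        if name
    ]
    try:
        with torch.no_grad():
            model(*args, **kwargs)
    finally:
        for handle in handles:
            handle.remove()
    return hits[0] if hits else None
```

With the snippet above, this would be called as `find_nonfinite_tensors(model)` and `first_nonfinite_module(model, **inputs)`. I can post the output of both if that would be useful.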

Do you have any advice or best practices for converting models to HF format, or tips on what might be causing this issue? Any guidance would be greatly appreciated!

Thanks in advance!
