Conversation
calpt left a comment
Thanks a lot for working on this! Already looks pretty good overall, I left some comments regarding the open issues that are hopefully helpful.
Once that's done, please also add this new model type to the docs as described in the contributing guide, thanks!
```python
head_types = [
    "classification",
    "multilabel_classification",
    "question_answering",
    "seq2seq_lm",
]
```
This defines the range of supported heads. Since I believe we'd only want to support sequence generation, you can remove everything except for `"seq2seq_lm"` here.
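As a minimal sketch of the suggested change (assuming, as the comment above says, that only sequence generation should be supported for this translation model), the list would be reduced to a single entry:

```python
# Sketch of the reduced head type registration: since M2M100/NLLB is a
# translation model, only the sequence-to-sequence LM head is kept.
head_types = [
    "seq2seq_lm",
]
```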
```python
ParallelAdapterInferenceTestMixin,
ParallelTrainingMixin,
```
In case we don't want to support Parallel composition (which is totally fine), please remove these two mixins to disable the tests. Otherwise, Parallel composition can be supported by adding the model type here:
adapters/src/adapters/composition.py, line 121 (commit f0ca962)
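Purely as an illustration of the idea behind that line (this is not the library's actual code; the variable name and existing entries are assumptions), Parallel support can be thought of as a per-composition-block whitelist of model type strings, so enabling it for a new model means adding its type string:

```python
# Hypothetical illustration: composition support as a whitelist of model
# type strings. Enabling Parallel for the new model means appending its
# model type to the list that gates the Parallel block.
parallel_supported_models = ["bert", "bart", "t5"]  # assumed existing entries
parallel_supported_models.append("m2m_100")  # register the new model type
```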
```python
from .test_adapter_heads import PredictionHeadModelTestMixin
# ...
class M2M100AdapterTestBase(AdapterTestBase):
```
Since this model doesn't support text classification tasks (and we test text classification training by default in the test runs), we'd have to override the `add_head()` and `dataset()` methods here. That would look roughly like this:
adapters/tests/test_bert_generation.py, lines 29 to 70 (commit f0ca962)
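A self-contained sketch of the override pattern described above (the real `AdapterTestBase` and head API are not reproduced here; the stub base class, the `add_seq2seq_lm_head` call, and the toy dataset are assumptions modeled on the referenced test file):

```python
class AdapterTestBase:
    """Stand-in for the library's real test base class."""
    pass


class M2M100AdapterTestBase(AdapterTestBase):
    def add_head(self, model, name, **kwargs):
        # Attach a generation head instead of the default classification
        # head, since M2M100 only supports seq2seq language modeling.
        model.add_seq2seq_lm_head(name)

    def dataset(self, tokenizer=None):
        # Return a tiny translation-style dataset; the real tests load an
        # actual dataset and tokenize source/target pairs here.
        return [{"input_ids": [0, 1, 2], "labels": [3, 4, 5]}]
```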
Thank you so much for the helpful tips! It seems I have one major issue left, which is connected to the issue raised here: Do you by any chance have any advice on how to address this? Thank you!
Hey, I tried to reproduce these issues, but got the following error message instead: Could it be that the code state pushed here is not the latest one?
I implemented AdapterHub support for the Facebook NLLB model and its underlying M2M100 architecture. I have carried out all the relevant tests, auto-formatting, and quality checks.
The code passes 124 tests, skipping 7 and failing 11. The 11 failing tests are all connected to Parallel composition blocks and flex heads, neither of which I implemented. As this is a machine translation model, it does not need classification heads on top of it, but I couldn't find out how to disable the irrelevant `head_types` in the `ADAPTER_MODEL_MAPPING` dictionary so that these tests are skipped. Any advice on this is greatly appreciated!
Key addition:
`M2M100AdapterModel` class with the relevant `WithAdapters` and `AdaptersMixin` classes implemented.