Couldn't find a way to mask the assistant with zeros in MultiTurnSFTDataset. For example, if I want to set loss_mask=0 for all of the assistant's responses except the last one, this can also be useful in long agent trajectories, where some intermediate steps may be incorrect and it would be desirable to explicitly disregard them during training.
One option is to add the trainable flag to the message and check it when creating the loss_mask.
{
"role": ...,
"content": ...,
"trainable":...,
}