This repository contains the code for the paper "Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process".
The code is based on the eric-mitchell/direct-preference-optimization repository.
```bash
pip install -r requirements.txt
bash commands/run_mistral_ift.sh
```
- Temporal Residual Connection:
  - `lambda_schedule`: The schedule mode of `lambda`. The default is `null`, which means the static mode; `linear` is also provided for the dynamic mode.
  - `min_lambda` & `max_lambda`: The minimum and maximum values of `lambda`. Both default to 0.2, which corresponds to the static mode. If `lambda_schedule` is set to `linear`, `min_lambda` and `max_lambda` control the start and end values of `lambda` during training.
  - `lambda_disturb`: The disturbance distribution of `lambda`. The default is `null`, which means no disturbance; `normal` is also provided.
  - `disturb_std`: The standard deviation of `lambda_disturb`. This hyperparameter only takes effect when `lambda_disturb` is not `null`.
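As a rough illustration of how these hyperparameters interact, here is a hedged sketch (the function name and signature are hypothetical, not from this repo) of computing `lambda` at a given training step:

```python
import random

def lambda_at_step(step, total_steps, lambda_schedule=None,
                   min_lambda=0.2, max_lambda=0.2,
                   lambda_disturb=None, disturb_std=0.1):
    # Hypothetical helper illustrating the hyperparameters above;
    # the repo's actual implementation may differ.
    if lambda_schedule == "linear":
        # Dynamic mode: interpolate linearly from min_lambda to max_lambda.
        frac = step / max(total_steps - 1, 1)
        lam = min_lambda + frac * (max_lambda - min_lambda)
    else:
        # Static mode (lambda_schedule = null): a constant lambda.
        lam = min_lambda
    if lambda_disturb == "normal":
        # Add Gaussian noise with the configured standard deviation.
        lam += random.gauss(0.0, disturb_std)
    return lam
```

For example, with `lambda_schedule="linear"`, `min_lambda=0.2`, and `max_lambda=0.8`, `lambda` ramps from 0.2 at the first step to 0.8 at the last.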
- Relation Propagation:
  - `gamma`: The decay factor of Relation Propagation. The default is 0.95.
  - `propagation_type`: The variable that Relation Propagation is applied to. The default is `loss`; `mask` and `logps` are also provided.
  - `propagation_side`: The side of Relation Propagation. The default is `left`; `right` is also provided.
  - `propagation_norm`: The normalization mode of Relation Propagation. The default is `L1`; `L2`, `softmax`, and `log` are also provided.
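To make the roles of `gamma`, `propagation_side`, and `propagation_norm` concrete, here is a minimal sketch of a gamma-discounted propagation over a per-token sequence (e.g. per-token losses). The function name and exact recurrence are assumptions for illustration; the paper's formulation may differ.

```python
import math

def propagate(values, gamma=0.95, propagation_side="left", propagation_norm="L1"):
    # Hypothetical sketch: each position accumulates a gamma-decayed
    # sum of the positions before it (from the chosen side), then the
    # result is normalized according to propagation_norm.
    if propagation_side == "right":
        values = values[::-1]
    out, acc = [], 0.0
    for v in values:
        acc = v + gamma * acc  # discounted running sum
        out.append(acc)
    if propagation_side == "right":
        out = out[::-1]
    if propagation_norm == "L1":
        z = sum(abs(x) for x in out) or 1.0
        out = [x / z for x in out]
    elif propagation_norm == "L2":
        z = math.sqrt(sum(x * x for x in out)) or 1.0
        out = [x / z for x in out]
    elif propagation_norm == "softmax":
        m = max(out)
        exps = [math.exp(x - m) for x in out]
        z = sum(exps)
        out = [e / z for e in exps]
    elif propagation_norm == "log":
        out = [math.log(abs(x) + 1e-8) for x in out]
    return out
```

With `gamma=0.5` and two equal values `[1.0, 1.0]`, the left-side discounted sums are `[1.0, 1.5]`, which L1-normalize to `[0.4, 0.6]`.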
If you find IFT useful in your research, please consider citing the following paper:
```bibtex
@article{hua2024intuitive,
  title={Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process},
  author={Hua, Ermo and Qi, Biqing and Zhang, Kaiyan and Yu, Yue and Ding, Ning and Lv, Xingtai and Tian, Kai and Zhou, Bowen},
  journal={arXiv preprint arXiv:2405.11870},
  year={2024}
}
```