This repository contains the code for reproducing the results and figures in the paper Controlling Reading Ease with Gaze-Guided Text Generation (EACL 2026).
Code:
- gaze_model.ipynb: Code for reproducing the gaze model training
- generate.py / generate_all.sh: Scripts for generating the texts
- analysis.ipynb: Code for reproducing the statistical analysis and figures
- modeling/: Modules for data preprocessing, model training, and text generation
Data:
- emtec/: Scripts for downloading and converting the EMTeC dataset (Bolliger et al., 2024)
- eyetracking/: Preprocessed gaze data from the eye-tracking study (corresponds to gaze/measures/cleaned.zip in the dataset repository)
- responses/: Response data from the eye-tracking study (comprehension questions and ratings; corresponds to responses/responses.csv in the dataset repository)
Outputs:
- models/: Trained gaze models
- stories/prompts.jsonl: Story prompts used for generating the texts
- stories/output-Llama-3B: Generated texts
- paper/figures/: Figures generated by the analysis notebook
- Install Python >= 3.12
- Install the dependencies: pip install -r requirements.txt
gaze_model.ipynb contains all the code necessary to train and evaluate the GPT-2-based gaze model used in the paper, as well as the linear regression baseline.
The trained models are also included in the models directory. Retraining the models is not necessary for generating texts in the next section.
Running generate_all.sh will reproduce the texts with the settings described in the paper. A GPU with about 80GB of memory is recommended.
To generate texts with other settings, use generate.py:
python generate.py \
--prompts stories/prompts.jsonl \
--language-model <hugging-face-model-name> \
--gaze-model <gaze-model-name> \
--gaze-weight <gaze-weight> \
--beam-size <beam-size> \
--gpu <device-id> \
> output.jsonl

- <hugging-face-model-name> can be any instruction-tuned language model on the Hugging Face Hub (or a local path)
- <gaze-model-name> can be trf (transformer) or lr (linear regression)
- <gaze-weight> can be any real number
- <beam-size> can be any positive integer
- <device-id> refers to the CUDA device IDs (comma-separated) for the GPUs to use (will be passed to CUDA_VISIBLE_DEVICES)
- --verbose can be added to print the beam with the highest total score at each generation step
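For intuition, one plausible reading of how the gaze weight enters the beam search is that each candidate's total score combines the language-model log-probability with a weighted gaze-model score. This is only a minimal sketch under that assumption; the function and variable names below are hypothetical and do not appear in the repository:

```python
# Hypothetical sketch: gaze-weighted scoring of beam candidates.
# Assumption (not the repository's actual code): the "total score" mentioned
# for --verbose is lm_log_prob + gaze_weight * gaze_score.

def combined_score(lm_log_prob, gaze_score, gaze_weight):
    """Combine language-model and gaze-model scores for one candidate."""
    return lm_log_prob + gaze_weight * gaze_score

def best_candidate(candidates, gaze_weight):
    """Return the candidate with the highest combined score.

    candidates: list of (text, lm_log_prob, gaze_score) tuples.
    """
    return max(candidates, key=lambda c: combined_score(c[1], c[2], gaze_weight))
```

Under this reading, a gaze weight of 0 reduces generation to ordinary beam search, while larger weights trade language-model likelihood for the gaze objective.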
analysis.ipynb contains all the code necessary to reproduce the statistical analyses and figures in the paper.
