Gaze-controlled text generation

This repository contains the code for reproducing the results and figures in the paper Controlling Reading Ease with Gaze-Guided Text Generation (EACL 2026).

Contents

Code:

  • gaze_model.ipynb: Code for reproducing the gaze model training
  • generate.py/generate_all.sh: Scripts for generating the texts
  • analysis.ipynb: Code for reproducing the statistical analysis and figures
  • modeling/: Modules for data preprocessing, model training, and text generation

Data:

  • emtec/: Scripts for downloading and converting the EMTeC dataset (Bolliger et al., 2024)
  • eyetracking/: Preprocessed gaze data from the eye-tracking study (corresponds to gaze/measures/cleaned.zip in the dataset repository)
  • responses/: Response data from the eye-tracking study (comprehension questions and ratings; corresponds to responses/responses.csv in the dataset repository)

Outputs:

  • models/: Trained gaze models
  • stories/prompts.jsonl: Story prompts used for generating the texts
  • stories/output-Llama-3B: Generated texts
  • paper/figures/: Figures generated by the analysis notebook

Reproducing results

Requirements and setup

  • Install Python >= 3.12
  • pip install -r requirements.txt

Gaze model training

gaze_model.ipynb contains all the code necessary to train and evaluate the GPT-2-based gaze model used in the paper, as well as the linear regression baseline.

The trained models are also included in the models directory. Retraining the models is not necessary for generating texts in the next section.

Text generation

Running generate_all.sh will reproduce the texts with the settings described in the paper. A GPU with about 80GB of memory is recommended.

To generate texts with other settings, use generate.py:

python generate.py \
    --prompts stories/prompts.jsonl \
    --language-model <hugging-face-model-name> \
    --gaze-model <gaze-model-name> \
    --gaze-weight <gaze-weight> \
    --beam-size <beam-size> \
    --gpu <device-id> \
    > output.jsonl
  • <hugging-face-model-name> can be any instruction-tuned language model on the Hugging Face Hub (or a local path)
  • <gaze-model-name> can be trf (transformer) or lr (linear regression)
  • <gaze-weight> can be any real number
  • <beam-size> can be any positive integer
  • <device-id> is the CUDA device ID (or a comma-separated list of IDs) for the GPU(s) to use; it is passed to CUDA_VISIBLE_DEVICES
  • --verbose can be added to print the beam with the highest total score at each generation step
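generate.py writes its results to standard output, redirected to output.jsonl above. Assuming the output follows the usual JSON Lines convention (one JSON object per line; the exact field names depend on the script), a minimal loader sketch in Python:

```python
import json

def load_jsonl(path):
    """Load a JSON Lines file: one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Each returned record is a plain dict, so the generated texts can be inspected or filtered with ordinary Python before running the analysis notebook.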

Analysis

analysis.ipynb contains all the code necessary to reproduce the statistical analyses and figures in the paper.

License

  • The code in this repository is licensed under MIT.
  • The dataset containing the generated texts and the eye-tracking and response data is available here and licensed under CC-BY-NC.
  • The EMTeC dataset is available here and licensed under CC-BY.
