Convert an image of a formula to LaTeX.
Under the project root directory, run

```bash
python training/run_experiment.py --max_epochs=3 --gpus='0,' --num_workers=2 --model_class=ResnetTransformer --data_class=Im2Latex100K --batch_size=16
```

Use the `wandb init` command to set up a new W&B project; you can then add `--wandb` to record experiments through the service provided by W&B:

```bash
python training/run_experiment.py --wandb --max_epochs=3 --gpus='0,' --num_workers=2 --model_class=ResnetTransformer --data_class=Im2Latex100K --batch_size=8
```

If you want to quickly test your model, you can add the `--overfit_batches` argument.

For more argument usage, refer to the PyTorch Lightning Trainer documentation.
Available model classes:

- CNNLSTM
- ResnetTransformer

Available data classes:

- Im2Latex100K
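The `--model_class` and `--data_class` flags take these names as plain strings. As a rough sketch of how such a script can map a string to a class (an assumption about the general pattern, not a confirmed detail of `run_experiment.py`), the name can be resolved with `importlib`:

```python
import importlib


def import_class(module_and_class: str):
    """Resolve a dotted path such as 'im2latex.models.ResnetTransformer' to a class object."""
    module_name, class_name = module_and_class.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Hypothetical package layout: models in im2latex.models, data modules in im2latex.data.
model_class = import_class("im2latex.models.ResnetTransformer")
data_class = import_class("im2latex.data.Im2Latex100K")
```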
Under the project root directory, run

```bash
python training/save_best_model.py --entity=zhengkai --project=im2latex --trained_data_class=Im2Latex100K
```

- `--entity`: your W&B user name
- `--project`: your W&B project
Under the project root directory, run `python im2latex/im2latex_inference.py <image_path>`, for example:

```bash
python im2latex/im2latex_inference.py im2latex/tests/support/im2latex_100k/7944775fc9.png
```

Under the project root directory, run
```bash
docker build -t im2latex/api-server -f api_server/Dockerfile .
```

If you want to rebuild the image, you can use the following command to remove the existing image:

```bash
docker rmi -f im2latex/api-server
```

Under the project root directory, run

```bash
docker run -p 60000:60000 -p 60001:60001 -it --rm --name im2latex-api im2latex/api-server
```

Then, we can use the model API through port 60000 and the Streamlit app through port 60001.
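As a quick smoke test of the running container, a small script like the following can send an image to the model API. The endpoint path (`/v1/predict`) and the JSON payload format are assumptions made for illustration; check them against the actual `api_server` code:

```python
import base64

import requests

# Assumed endpoint and payload format; adjust to match the routes exposed by api_server.
API_URL = "http://localhost:60000/v1/predict"
IMAGE_PATH = "im2latex/tests/support/im2latex_100k/7944775fc9.png"

with open(IMAGE_PATH, "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(API_URL, json={"image": image_b64}, timeout=30)
response.raise_for_status()
print(response.json())  # expected to contain the predicted LaTeX string
```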
If the container is already running, you can use the following command to remove the existing container:

```bash
docker rm -f im2latex-api
```

Under the project root directory, run
```bash
pytest -s ./im2latex/tests/test_im2latex_inference.py
```

Under the project root directory, run
```bash
pytest -s ./im2latex/evaluation/evaluate_im2latex_inference.py
```

Under the project root directory, run
```bash
pytest -s api_server/tests/test_app.py
```

You can try the Image to LaTeX App on Hugging Face Spaces. Note, however, that the model still performs poorly on images outside the training data set.
- Limited training data can cause exposure bias. We may be able to mitigate it through scheduled sampling (see the first sketch after this list).
- Different decoding algorithms can lead to different results. We might be able to use nucleus sampling instead of beam search to reduce the chance of repeated tokens (see the second sketch after this list).
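A minimal sketch of scheduled sampling, assuming an RNN-style decoder with a hypothetical `decoder.step(prev_tokens, memory)` interface (not the actual classes in this repository): at each training step, the decoder is fed the gold token with probability `teacher_forcing_ratio` and otherwise its own previous prediction, so it is gradually exposed to its own outputs.

```python
import torch


def scheduled_sampling_decode(decoder, memory, targets, start_token_id, teacher_forcing_ratio=0.5):
    """One training-time decode that mixes gold tokens with the model's own predictions.

    decoder.step(prev_tokens, memory) -> (batch, vocab_size) logits is a hypothetical interface.
    targets: (batch, seq_len) gold token ids.
    """
    batch_size, seq_len = targets.shape
    prev_tokens = torch.full((batch_size,), start_token_id, dtype=torch.long, device=targets.device)
    all_logits = []
    for t in range(seq_len):
        logits = decoder.step(prev_tokens, memory)  # (batch, vocab_size)
        all_logits.append(logits)
        # Feed the gold token with probability teacher_forcing_ratio, otherwise the model's prediction.
        use_gold = torch.rand(batch_size, device=targets.device) < teacher_forcing_ratio
        prev_tokens = torch.where(use_gold, targets[:, t], logits.argmax(dim=-1))
    return torch.stack(all_logits, dim=1)  # (batch, seq_len, vocab_size)
```

In practice, `teacher_forcing_ratio` is usually decayed over the course of training so that later epochs rely more on the model's own predictions.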
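And a minimal sketch of nucleus (top-p) sampling for a single decoding step: instead of always extending the highest-scoring hypotheses as beam search does, it samples from the smallest set of tokens whose cumulative probability reaches `p`. This is the standard technique in general form, not code taken from this repository.

```python
import torch


def nucleus_sample(logits, p=0.9):
    """Sample one next-token id per batch element from the top-p (nucleus) of the distribution.

    logits: (batch, vocab_size) unnormalized scores for the next token.
    """
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_ids = torch.sort(probs, descending=True, dim=-1)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Keep tokens whose preceding cumulative mass is below p; always keep the most likely token.
    keep = (cumulative - sorted_probs) < p
    keep[:, 0] = True
    kept_probs = sorted_probs * keep
    kept_probs = kept_probs / kept_probs.sum(dim=-1, keepdim=True)
    choice = torch.multinomial(kept_probs, num_samples=1)  # index into the sorted ordering
    return sorted_ids.gather(-1, choice).squeeze(-1)  # (batch,) sampled token ids
```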