A discord chatbot powered by gpt2 and flavored with dialogue from The Office.
This repo provides all the code necessary for you to train Microsoft's gpt2 text generation model to speak like any persona by providing it with dialogue as a .csv file. I have also included the files I use to run this bot on replit to communicate with the AI through discord. A quick note: the model's .bin file is too large to host here on Github, but it can be downloaded from HuggingFace.
Regarding the model itself, the model files I have included are the result of training the gpt2-medium model with dialogue from The Office found in this kaggle dataset. I removed actions or asides (typically enclosed inside of brackets) from the data, along with discarding lines with more than 300 characters (trimming out long confessionals). Afterwards, I was left with close to 12,000 lines of Michael Scott dialogue alone, with thousands more for the rest of the characters.
After I had a prepared dialogue .csv file, I loaded up a jupyter notebook I found on freecodecamp to train the model. After a few training sessions I modified the following parameters:
- Use the
gpt2-mediummodel as a base - Reduce the
per_gpu_train_batch_sizeto 2 - Raise the
save_stepsamount to 10,000 (recommended for larger models)
The notebook uploads its produced model to HuggingFace, where there is a built-in API to run the model. But I wanted a fancier and more intuitive way to interact with the AI, so I created a discord bot and am hosting it on replit, which is kept alive through the services of UptimeRobot.
- Kaggle account (downloading datasets)
- Python 3 or greater
- Pip
- Re via
pip install re
- Google Drive (minimum 4 GB empty space)
- Google Colab
- HuggingFace account
If you are merely curious to see how well this bot works, feel free to test it using the Inference API over on HuggingFace. Drop a like if it meets your expectations! However, if you are interested in doing your own training, I have a detailed tutorial below documenting how to train your very own customized chatbot.
First, data is required to train the model. I used a dataset I found on Kaggle, but many other sites have .csv files of dialogue from a variety of shows, movies, and games. Keep in mind that parse_data.py was specifically tailored to the dataset I was using; it may require a little editing to work for your specific dataset. Once you have a satisfactory dataset, upload your .csv file and the scottbot.ipynb notebook to Google Drive.
Before running the notebook, certain parameters have to be changed:
data = pd.read_csv("filename.csv")needs to match your dialogue filenameCHARACTER_NAMEneeds to be set to your intended character--global user.emailneeds to match your HuggingFace account email address--global user.nameneeds to match your HuggingFace usernameMY_MODEL_NAMEcan be whatever you wish to name your modelHUGGINGFACE_API_KEYwill be a key found under "Access Tokens" in the settings tab of your HuggingFace profile
Other parameters are fun to tweak:
n(the number of lines included in context; by default, 7)args.num_train_epochsfor longer trainingargs.per_gpu_train_batch_sizeto cope with limited RAMargs.seedfor replicabilitytokenizer/model(options beingmicrosoft/DialoGPT-small,microsoft/DialoGPT-medium, ormicrosoft/DialoGPT-large)
After having made the changes above, connect to a runtime and hit run all. Depending on the size of your dataset and the model you are using, training can take anywhere betweeen a couple of minutes to a few hours.
After your model is trained, it will be uploaded to your HuggingFace account. My model was detected as text generation by default (which is actually a finish-the-sentence model) and if that is what you want to experiment with, no changes are necessary. But because I wanted a question/response-based chatbot, I needed to add a README.md file with a tag denoting the model as conversational, as is visible in the model/README.md file in this repo. The HuggingFace website provides a GUI for its Inference API, which is more than enough for casual input/output testing. But to create a more personal experience, I chose to use the medium of a discord bot.
On the discord developer portal, as soon as you create a new bot, you will be shown a bot token. Be sure and copy it, as it is necessary to link the bot to your conversational model. On Replit, you can find a button to create a new NodeJS repl; I have included my replit files in the repl.it folder. index.js handles bot activity, while server.js handles keeping the bot online (the package files add the necessary javascript libraries). To make the repl run, you will need to access the Secrets tab and set two secrets:
DISCORD_TOKEN= the bot token you copied earlier.HF_TOKEN= the API key that you added to the end of thescottbot.ipynbfile
Once everything is configured, you should be able to hit the green "Run" button to launch your repl.
After your repl is launched, a webview should open with a cursory webpage declaring that your bot is alive. If you log in to UptimeRobot, you can add a new monitor by using the webview's URL and the http(s) monitor option to keep your repl (and thus your bot) up and running even after you close your repl tab.
Congratulations! You have successfully created an AI chatbot!
Inside of repl.it/index.js, the API requst I use is preloaded with parameters to change how the model responds to its text prompts. A full list of parameters can be found here, but here are the ones I chose to implement and the reasons for their incluson:
do_sample: forces randomized resultsuse_cache: prevents the bot from responding with identical answers in the same sessionno_repeat_ngram_size: prevent excessive word repetition (I set it to 5)top_k: limits the breadth of word choice (I set it to 100)max_time: returns afterintamount of time if the bot hasn't already respondedtemperature: while I did not include this one in my request code, this parameter has the most personality influence on output. Low temperature makes standard responses while high temperature contributes to wild responses. Standard temperature is usually between 0.7-0.9.
By changing the API request parameters, you can change the behavior of the model without retraining it.
I would love to see any cool modifications of this project! Feel free to fork this repo and create a pull request. If you are interested in seeing any future developments of this project, don't forget to star this repo. I have a few ideas for extending the functionality of the bot that I have yet to implement. :)
I have chosen the MIT License for this particular repo. Read up on the details here.
I could not have even begun the project without the aid of these talented people:
- Fabrizio Cominetti and his concise dataset
- Lynn Zheng and her AI expertise
- Beau Carnes and his JavaScript demos
- Rostyslav Neskorozhenyi, the origin of the specialized chatbot idea