Approach Google Summer of Code 2025 - gprMax

Project : AI Chatbot for support

This project aims to improve the user experience of gprMax, an open-source software for simulating electromagnetic wave propagation by modelling Ground Penetrating Radar (GPR) and electromagnetic wave propagation, through the development of an AI chatbot and assistant. The purpose of the AI chatbot is to answer and troubleshoot any questions that the gprMax users may have, while the AI assistant is capable of turning natural language into an input file in the required gprMax format. We leverage pretrained LLM models from OpenAI, the LangChain framework, RAG, and fine-tuning techniques, drawing on the existing gprMax documentation and years worth of discussion in the Google groups and the GitHub issue tracker for data. The ultimate goal is to deliver and deploy an AI chatbot and assistant to enhance the accessibility of gprMax, as well as save both the users and the development team valuable time by automating the troubleshooting process .

contributor : Vivek Sharma

Mentors: Iraklis Giannakis, Antonis Giannopoulos

What i have made :

Upload PDF

User selects and upload a DOCUMENT (PDF in this case)
The file is locally saved to document/pdfs

Extraction and Chunking of Test

PDFPlumberLoader extracts text from the PDF.
The text is split into overlapping chunks (1000 characters, 200 overlap) using RecursiveCharacterTextSplitter . It can be fine-tuned accordingly .

Indexing and Vectorizing

The extracted text chunks are converted into vector embeddings using OllamaEmbeddings.
These embeddings are stored in ChromaDB vector store .

Query Processing & Answer Generation

When user queries anything , the text is first converted to vector .
Similarity search finds the most relevant document chunks.
Here i have used open-source Large Language Models like

deepseek-r1:1.5b
llama3.2:3b
llama3.2:1b

SCREENSHOTS

IMPORTANT NOTE

The main aim of this repository it to show the scope and use case on how the open-source llms can be used as chatbot .
I have tried to implement that how Open-source LLM's and chroma DB vector can be used for storing embeddings .
The current code is not modular , the modular approach would be followed in actual codebase . similar to : https://github.com/eddieleejw/gprmax_chatbot
I have understood that what necessary changes needs to be made in actual codebase

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
chroma_db		chroma_db
documents		documents
src		src
static		static
.gitignore		.gitignore
DockerFile		DockerFile
FINE_TUNING.md		FINE_TUNING.md
LICENSE		LICENSE
README.md		README.md
app.py		app.py
app.txt		app.txt
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Approach Google Summer of Code 2025 - gprMax

What i have made :

SCREENSHOTS

IMPORTANT NOTE

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

uchiha-vivek/gpr_max-Chatbot

Folders and files

Latest commit

History

Repository files navigation

Approach Google Summer of Code 2025 - gprMax

What i have made :

SCREENSHOTS

IMPORTANT NOTE

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages