Data retrieval is a Python / Jupyter Notebook project that implements the following:
- Web scraping (Part1)
- Punctuation and stopword removal (Part2)
- Creating an inverted index for all remaining words (Part3)
- Boolean Retrieval search (Part4a | Part4b)
- Vector Space Model search (Part4b)
- Probabilistic Retrieval search (Part4b)
- Evaluation metrics (Part5):
  - Precision
  - Recall
  - F1
  - Mean Average Precision
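
For reference, the evaluation measures listed above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code from Part5: the function names (`precision_recall_f1`, `average_precision`, `mean_average_precision`) and the document IDs in the usage note are hypothetical, and each query is assumed to come with a ranked result list and a set of known relevant documents.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall and F1 for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def average_precision(ranked, relevant):
    """Average of precision@k over the ranks k where a relevant document appears."""
    relevant = set(relevant)
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    # Dividing by the number of relevant documents penalises relevant
    # documents that were never retrieved.
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of average_precision over (ranked, relevant) pairs, one per query."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)
```

For example, `precision_recall_f1(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])` finds two relevant documents among four retrieved, so precision is 0.5 and recall is 2/3.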
To run the project, execute all notebook cells in order. Make sure the following packages are installed before running it:
pip install nltk
pip install scikit-learn
pip install rank-bm25
pip install import-ipynb
If you are running the code for the first time, uncomment the following line, run it once, and comment it out again once the download has finished:
nltk.download('stopwords')
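
As an alternative to toggling the comment by hand, the download can be guarded so that it only happens when the stopword corpus is missing. The sketch below is a suggestion rather than part of the project: the helper name `ensure_stopwords` is hypothetical, and it relies on `nltk.data.find` raising `LookupError` when a resource is not installed.

```python
def ensure_stopwords():
    """Download the NLTK stopword corpus only if it is not already available."""
    # Imported inside the function so the helper can be defined in any cell,
    # even before nltk has been installed.
    import nltk

    try:
        nltk.data.find("corpora/stopwords")
    except LookupError:
        nltk.download("stopwords")
```

Calling `ensure_stopwords()` once near the top of the notebook, before the stopword removal step, should make the first run and later runs behave the same way.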