PetrosIbrah/Data-Retrieval

What is this project?

Data Retrieval is a Python / Jupyter Notebook project that implements the following information retrieval steps:

  • Web scraping (Part1)
  • Punctuation and stopword removal (Part2)
  • Creating an inverted index for all remaining words (Part3)
  • Boolean Retrieval search (Part4a | Part4b)
  • Vector Space Model search (Part4b)
  • Probabilistic Retrieval search (Part4b)
  • Evaluation metrics (Part5):
    • Precision
    • Recall
    • F1
    • Mean Average Precision
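
The preprocessing, indexing, and Boolean search steps listed above can be sketched with the standard library alone. This is a minimal illustration, not the repository's code: the stopword set, the sample documents, and the function names are all hypothetical stand-ins (the project itself uses the NLTK stopword list).

```python
import string
from collections import defaultdict

# Hypothetical stopword list and documents, for illustration only.
STOPWORDS = {"the", "a", "is", "of", "and", "in"}
DOCS = {
    1: "The cat sat in the garden.",
    2: "A dog chased the cat!",
    3: "The garden is full of flowers.",
}

def preprocess(text):
    """Lowercase the text, strip punctuation, and drop stopwords."""
    table = str.maketrans("", "", string.punctuation)
    tokens = text.lower().translate(table).split()
    return [t for t in tokens if t not in STOPWORDS]

def build_inverted_index(docs):
    """Map each remaining term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in preprocess(text):
            index[term].add(doc_id)
    return index

def boolean_and(index, *terms):
    """Documents that contain every one of the given terms."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()

def boolean_or(index, *terms):
    """Documents that contain at least one of the given terms."""
    return set().union(*(index.get(t, set()) for t in terms))

def boolean_not(index, term, all_ids):
    """Documents that do not contain the given term."""
    return set(all_ids) - index.get(term, set())

index = build_inverted_index(DOCS)
print(boolean_and(index, "cat", "garden"))
print(boolean_or(index, "dog", "flowers"))
print(boolean_not(index, "cat", DOCS))
```

Storing postings as sets keeps AND, OR, and NOT as plain set operations; a sorted-list representation would be the usual choice once the collection grows.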

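For the evaluation metrics, the standard definitions can be written as a short sketch. The function names and the sample ranking are assumptions made for illustration; the repository may compute these differently (for example with scikit-learn). Average precision here assumes the ranked list contains no duplicate document ids.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def average_precision(ranked, relevant):
    """Mean of precision@k taken at each rank k where a relevant doc appears."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """Average of average_precision over (ranked, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Hypothetical ranking: documents 1 and 3 are relevant, four were returned.
p, r, f1 = precision_recall_f1([1, 2, 3, 4], {1, 3})
ap = average_precision([1, 2, 3, 4], {1, 3})
print(p, r, round(f1, 4), round(ap, 4))
```

Precision, recall, and F1 ignore ordering, while average precision rewards rankings that place relevant documents early; that is why the ranked models (Vector Space, Probabilistic) are usually compared with Mean Average Precision.
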
Requirements

Run the notebook cells in order, from top to bottom. Before running the project, make sure the following packages are installed:

pip install nltk
pip install scikit-learn
pip install rank-bm25
pip install import-ipynb

If you are running the code for the first time, uncomment the following line, then comment it out again once the stopwords have been downloaded:

nltk.download('stopwords')
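
One possible alternative to toggling the comment by hand is to download the corpus only when it is missing. The sketch below is a suggestion rather than part of the repository; `ensure_stopwords` is a hypothetical helper name, while `nltk.data.find` and `nltk.download` are the NLTK calls it relies on.

```python
def ensure_stopwords():
    """Download the NLTK stopwords corpus only if it is not already present."""
    # Imported here so that defining the helper does not itself require NLTK.
    import nltk

    try:
        # Raises LookupError when the corpus cannot be found locally.
        nltk.data.find("corpora/stopwords")
    except LookupError:
        nltk.download("stopwords")
```

Calling `ensure_stopwords()` once near the top of the notebook would make repeated runs skip the download.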
