Natural Language Processing using Python Programming

Project Status: Active Development | License: MIT | Environment: Python

Introduction & Project Vision

Welcome to Natural Language Processing (NLP)!

This repository serves as a beginner-friendly, step-by-step guide to mastering Natural Language Processing (NLP) with Python. My approach focuses on practical learning, code implementation, and concept understanding, demonstrated through hands-on examples and real-world datasets.

Whether you're a student, a self-learner, or someone transitioning into data science, this repo provides a clear, structured path to understanding the fundamental concepts of NLP.

Focus Areas

  • NLTK & SpaCy Mastery: Deep-dive into NLP libraries like NLTK, SpaCy, and text preprocessing techniques.
  • Text Processing: Feature engineering, tokenization, stemming, lemmatization, and text vectorization.
  • Algorithm Implementation: Step-by-step implementation of classic NLP algorithms with detailed explanations.
  • Storytelling: Every analysis is accompanied by clear, educational markdown explanations and practical business applications.
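As a taste of the preprocessing material, here is a minimal sketch of tokenization, stemming, and lemmatization with NLTK. The sample sentence is illustrative, not taken from the course data:

    import nltk
    from nltk.tokenize import word_tokenize
    from nltk.stem import PorterStemmer, WordNetLemmatizer

    # One-time downloads of the required NLTK data packages
    nltk.download("punkt")
    nltk.download("wordnet")

    text = "The striped bats were hanging on their feet."

    tokens = word_tokenize(text)                                  # split into word tokens
    stems = [PorterStemmer().stem(t) for t in tokens]             # rule-based suffix stripping
    lemmas = [WordNetLemmatizer().lemmatize(t) for t in tokens]   # dictionary-based normalization

    print(tokens)
    print(stems)   # e.g. 'hanging' -> 'hang'
    print(lemmas)  # e.g. 'feet' -> 'foot' (noun lemmatization by default)

Note how the two disagree: stemming chops suffixes mechanically, while lemmatization maps each token to a dictionary form.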

Repository Structure

The project is organized as a sequential learning path via Jupyter Notebooks.

NLP/
│
├── README.md                                    <- This file
├── LICENSE.md                                   <- Project's MIT License
├── requirements.txt                             <- Python dependencies
├── app/                                         <- Flask deployment code
├── data/
│   ├── raw/                                     <- Original, unmodified datasets
│   └── processed/                               <- Cleaned and feature-engineered data
├── models/                                      <- Trained models and serialized objects
└── notebooks/                                   <- All course notebooks, organized by chapter
    ├── 01_chapter_introduction/                 <- NLP Pipeline, Environment Setup
    ├── 02_chapter_preprocessing/                <- Tokenization, Stemming, Lemmatization
    ├── 03_chapter_corpora/                      <- NLTK Corpora, Data Loading
    ├── 04_chapter_pos_syntax/                   <- POS Tagging, Dependency Parsing
    ├── 05_chapter_ner/                          <- Named Entity Recognition
    ├── 06_chapter_vectorization/                <- BoW, TF-IDF, Feature Engineering
    ├── 07_chapter_sentiment/                    <- Sentiment Analysis (VADER, ML)
    ├── 08_chapter_classification/               <- Text Classification, Model Evaluation
    ├── 09_chapter_embeddings/                   <- Word2Vec, GloVe, Embeddings
    ├── 10_chapter_advanced/                     <- Transformers, BERT, Fine-tuning
    └── 11_chapter_capstone/                     <- Capstone Projects, Deployment

Getting Started

To run the notebooks locally, follow these steps.

1. Prerequisites

  • Python: Version 3.8 or higher.
  • Git: For cloning the repository.

2. Setup Instructions

  1. Clone the repository:

    git clone https://github.com/prakash-ukhalkar/NLP.git
    cd NLP
  2. Create and activate a virtual environment (Recommended):

    # Using venv (standard Python)
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Launch Jupyter:

    jupyter notebook
    # OR
    jupyter lab
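Several notebooks also rely on NLTK data packages and a pretrained spaCy model that pip does not install. The set below is a typical one; the exact packages needed depend on the notebook:

    # Pretrained English pipeline for spaCy (POS tagging and NER chapters)
    python -m spacy download en_core_web_sm

    # Frequently used NLTK data (tokenizer models, stopwords, WordNet, VADER lexicon)
    python -c "import nltk; [nltk.download(p) for p in ('punkt', 'stopwords', 'wordnet', 'vader_lexicon')]"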

3. Running the Analysis

Start with the notebooks in 01_chapter_introduction and proceed sequentially through the numbered directories.


Notebooks: A Detailed Roadmap

  • 01 Introduction & Setup: NLP pipeline overview, NLTK/spaCy installation, environment setup, and a first NLP program.
  • 02 Text Preprocessing: Tokenization techniques, stopword removal, stemming vs. lemmatization, and text normalization.
  • 03 Working with Data: NLTK corpora exploration, loading real-world datasets with pandas, and exploratory data analysis.
  • 04 POS Tagging & Syntax: Part-of-speech tagging, dependency parsing with spaCy, syntax trees, and grammar analysis.
  • 05 Named Entity Recognition: NER fundamentals, pre-trained spaCy models, custom entity creation, and entity linking.
  • 06 Text Vectorization: Bag of Words (BoW) implementation, TF-IDF mathematical intuition, and scikit-learn vectorizers.
  • 07 Sentiment Analysis: Lexicon-based VADER (NLTK), machine learning for sentiment, and multi-class classification.
  • 08 Text Classification: Classification pipeline design, Naive Bayes and Logistic Regression, and model evaluation metrics.
  • 09 Word Embeddings: Word2Vec with Gensim, GloVe embeddings, cosine similarity, t-SNE visualization, and semantic search.
  • 10 Advanced Topics: Introduction to Transformers (BERT), fine-tuning with Hugging Face, and Flask deployment.
  • 11 Capstone Projects: Text summarization, chatbot development, final project showcase, and portfolio development.
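To give a flavor of the hands-on material, here is a minimal sketch of lexicon-based sentiment scoring with NLTK's VADER, the approach covered in chapter 07 (the sample sentences are illustrative):

    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon")  # one-time download of the VADER lexicon

    sia = SentimentIntensityAnalyzer()
    for sentence in [
        "I absolutely love this course!",
        "The setup instructions were confusing.",
    ]:
        scores = sia.polarity_scores(sentence)
        # 'compound' is a normalized score in [-1, 1]; 'neg', 'neu', and 'pos'
        # are the proportions of negative, neutral, and positive content
        print(sentence, "->", scores["compound"])

On the transformer side (chapter 10), a Hugging Face pipeline gives comparable results in two lines. This sketch assumes the library's default English sentiment model, which is downloaded on first use:

    from transformers import pipeline

    # The default model is a distilled BERT fine-tuned for sentiment analysis
    classifier = pipeline("sentiment-analysis")
    print(classifier("I absolutely love this course!"))  # [{'label': 'POSITIVE', 'score': ...}]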

Dependencies

The core libraries used are:

  • nltk
  • spacy
  • pandas
  • numpy
  • scikit-learn
  • matplotlib
  • seaborn
  • gensim
  • transformers
  • flask
  • jupyter
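After installation, a quick smoke test (not part of the repo, just a sanity check) confirms that everything imports cleanly:

    python -c "import nltk, spacy, pandas, numpy, sklearn, matplotlib, seaborn, gensim, transformers, flask; print('environment OK')"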

Contributions

Contributions are welcome! If you'd like to improve examples, add topics, or fix something, feel free to open a pull request.

Happy Learning!


Author

Natural Language Processing (NLP) is created and maintained by Prakash Ukhalkar.


Built with ❤️ for the Python community
