MkYacine/NLP-playground

NLP Playground 🧠

Welcome to my NLP Playground! This repository documents my self-taught journey in Natural Language Processing and Deep Learning. Here, I implement research papers, conduct curiosity-driven experiments, and deploy models following MLOps industry standards.
The resources available for these experiments are limited, so my main goal here is to learn.

Projects Overview

1. LLMs From Scratch ✅

A comprehensive implementation of modern language models from the ground up, guided by Sebastian Raschka's "Build a Large Language Model (From Scratch)". This project covers:

  • Text preprocessing pipelines
  • Attention mechanism implementation and visualization
  • Complete transformer architecture
  • GPT-2 model implementation
  • Pretraining phase development
  • Finetuning for:
    • Text classification
    • Instruction following
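
To illustrate the attention mechanism item above, here is a minimal sketch of causal scaled dot-product self-attention in plain Python. It is not the code from this repository: the function names (`softmax`, `causal_self_attention`) and the toy input are assumptions made for illustration, and a real implementation would use learned query, key, and value projections in a tensor library.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(queries, keys, values):
    """Scaled dot-product attention where position i attends only to positions <= i.

    Hypothetical helper for illustration; inputs are lists of equal-length vectors.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Causal mask: only keys up to and including position i are scored.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Future positions get exactly zero attention weight.
        weights += [0.0] * (len(keys) - i - 1)
        # Context vector: attention-weighted sum of the value vectors.
        out = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy input: three token embeddings of dimension 2, used directly as Q, K, and V.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, attn = causal_self_attention(x, x, x)
```

Each row of `attn` sums to 1, and every entry above the diagonal is zero, which is what prevents a GPT-style model from looking at future tokens during training.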

2. Financial NER with FiNER-ORD ✅

Implementation and experimentation with the Financial NER Open Research Dataset (FiNER-ORD), exploring various approaches to Named Entity Recognition in the financial domain.
Main notebook can be seen here

  • BERT finetuning on FiNER-ORD
  • Model deployment on AWS SageMaker using structured jobs (for reproducibility and practice purposes)
  • Closed-source LLM
  • Base GLiNER model
  • Document findings and compare performance
  • Finetuned GLiNER model 🚧
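
Comparing the approaches above needs a shared metric. The sketch below computes entity-level precision, recall, and F1 from tag sequences, assuming a BIO tagging scheme with `B-`/`I-` prefixes. The helper names (`extract_spans`, `span_f1`), the tag format, and the toy sequences are assumptions for illustration, not the evaluation code or data from the notebook.

```python
def extract_spans(tags):
    """Collect (label, start, end) entity spans from a BIO tag sequence; end is exclusive."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans

def span_f1(gold_tags, pred_tags):
    """Exact-match entity-level precision, recall, and F1 for one tagged sequence."""
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    tp = len(gold & pred)  # a prediction counts only if label and boundaries both match
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example (made up): the prediction gets ORG right, over-extends PER, and misses LOC.
gold = ["B-ORG", "I-ORG", "O", "B-PER", "O", "B-LOC"]
pred = ["B-ORG", "I-ORG", "O", "B-PER", "I-PER", "O"]
precision, recall, f1 = span_f1(gold, pred)
```

Scoring whole spans rather than individual tokens penalizes boundary errors, so a model that tags only part of an entity name gets no credit for it.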

3. RAG 🚧

Implementation and experimentation with retrieval-augmented generation (RAG), using the Hugging Face documentation as a knowledge base and trying different approaches at each stage of the pipeline.

  • Embedding and Retrieval: Test different strategies for chunking, embedding, retrieval, and reranking.
  • Generator: Test different generator models, prompts, and verification strategies.
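
The chunking and retrieval steps above can be sketched as follows. This is a toy under stated assumptions: bag-of-words cosine similarity stands in for a learned embedding model, the function names (`chunk_words`, `bow_vector`, `cosine`, `retrieve`) are hypothetical, and the example chunks are made-up sentences rather than text from the knowledge base.

```python
import math
from collections import Counter

def chunk_words(text, size=8, overlap=2):
    """Split text into word chunks of at most `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [
        " ".join(words[i:i + size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]

def bow_vector(text):
    """Bag-of-words term counts; a stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    q = bow_vector(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, bow_vector(c)), reverse=True)
    return ranked[:k]

# Toy knowledge base (made-up sentences).
chunks = [
    "the tokenizer splits text into tokens",
    "the trainer runs the optimization loop",
    "datasets are loaded from the hub",
]
top = retrieve("how does the tokenizer split text", chunks, k=1)
```

Swapping `bow_vector` for a sentence-embedding model and adding a reranking pass over the top results are the kinds of variations the first bullet describes; the overlap in `chunk_words` keeps sentences that straddle a chunk boundary retrievable from either side.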

More to come in the future

Legend

  • ✅ Completed
  • 🚧 In Progress
  • 📋 Planned

About

A playground for me to experiment with and implement the latest advances in NLP research
