ErolOZKAN-/Language-Modelling
Run each script with: python3 ___________
src/Runner_First.py -- Basic example with a basic dataset (data/train.txt)
A simple dataset with three sentences is used.
N-gram models (unigram, bigram, trigram, with add-one and Good-Turing smoothing) are calculated for these sentences.
The models are tested on sample unigram, bigram, and trigram word units.
New sentences are generated and their perplexity scores are calculated.
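As a rough sketch of what the basic example computes (the function names below are hypothetical, not the repo's actual API), a bigram model with add-one smoothing can be built like this:

```python
from collections import Counter

def train_bigram_addone(sentences):
    """Count unigrams and bigrams, with <s>/</s> sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob_addone(w1, w2, unigrams, bigrams):
    """Add-one (Laplace) smoothing: P(w2|w1) = (c(w1,w2) + 1) / (c(w1) + V)."""
    vocab_size = len(unigrams)
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size)

# A toy three-sentence dataset, standing in for data/train.txt.
sentences = ["the cat sat", "the dog sat", "a cat ran"]
uni, bi = train_bigram_addone(sentences)
p = bigram_prob_addone("the", "cat", uni, bi)
```

Adding one to every count ensures that bigrams never seen in training still receive nonzero probability, at the cost of taking some mass from observed bigrams.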
src/Runner_Second.py -- Real dataset
N-gram models are built from the Brown corpus.
The perplexity score on the Brown test set is calculated.
New sentences are generated and their perplexity scores are calculated.
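Perplexity here is presumably the standard exponentiated negative average log-probability per token; a minimal version, assuming per-token log probabilities are already available from a trained model:

```python
import math

def perplexity(log_probs):
    """Perplexity = exp(-1/N * sum of per-token log probabilities).
    Lower is better; a uniform model over V words scores exactly V."""
    return math.exp(-sum(log_probs) / len(log_probs))

# Sanity check: a model assigning 1/4 to each of 4 tokens has perplexity 4.
pp = perplexity([math.log(0.25)] * 4)
```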
src/Runner_Interpolation.py -- Real dataset - Interpolation
An interpolation model is implemented.
Different lambda values are tested on a validation dataset (brute-force search over manually selected values).
A new perplexity scoring function using the interpolated model is implemented.
Finally, the best lambda value is used on the Brown test set (perplexity).
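The brute-force lambda search described above could look like the following sketch. The two-weight unigram/bigram mix and the toy probability functions are assumptions for illustration; the repo's trigram interpolation would use more lambda components:

```python
import math

def interpolated_prob(w2, w1, lambdas, p_uni, p_bi):
    """Linear interpolation: P(w2|w1) = l1*P_uni(w2) + l2*P_bi(w2|w1)."""
    l1, l2 = lambdas
    return l1 * p_uni(w2) + l2 * p_bi(w2, w1)

def grid_search_lambdas(validation_bigrams, p_uni, p_bi):
    """Brute-force search over l1 in {0.0, 0.1, ..., 1.0}, keeping the
    pair (l1, 1-l1) with the lowest perplexity on the validation set."""
    best, best_pp = None, float("inf")
    n = len(validation_bigrams)
    for i in range(11):
        lambdas = (i * 0.1, 1 - i * 0.1)
        logp = sum(math.log(interpolated_prob(w2, w1, lambdas, p_uni, p_bi))
                   for w1, w2 in validation_bigrams)
        pp = math.exp(-logp / n)
        if pp < best_pp:
            best, best_pp = lambdas, pp
    return best, best_pp

# Toy stand-ins for trained models: the bigram model is always sharper here,
# so the search should push all weight onto it (l1 -> 0, l2 -> 1).
p_uni = lambda w: 0.1
p_bi = lambda w2, w1: 0.5
best, pp = grid_search_lambdas([("a", "b"), ("b", "c")], p_uni, p_bi)
```

Once the best lambdas are found on validation data, they are frozen and the same perplexity computation is run on the held-out test set.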
src/Runner_Discounting_Simple.py -- Basic example of discounting
The discounting method is implemented.
For β = 0.5, new bigram and trigram models are calculated.
New perplexity scoring using the discounted models is implemented (bigram and trigram).
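A sketch of absolute discounting with β = 0.5. Spreading the freed mass uniformly over unseen words is an assumption here; the repo may instead back off to a lower-order model:

```python
from collections import Counter

def discounted_probs(w1, unigrams, bigrams, vocab, beta=0.5):
    """Distribution P(. | w1) under absolute discounting: each seen bigram
    count loses beta, and the freed mass is spread uniformly over
    continuations of w1 that were never observed."""
    c1 = unigrams[w1]
    seen = {w2: c for (a, w2), c in bigrams.items() if a == w1}
    unseen = [w for w in vocab if w not in seen]
    freed = beta * len(seen)  # total probability mass released by discounting
    probs = {w2: (c - beta) / c1 for w2, c in seen.items()}
    for w2 in unseen:
        probs[w2] = freed / (c1 * len(unseen))
    return probs

# Tiny worked example: "the" occurs twice, followed once each by "cat"/"dog".
unigrams = Counter({"the": 2})
bigrams = Counter({("the", "cat"): 1, ("the", "dog"): 1})
vocab = {"cat", "dog", "fish", "sat"}
probs = discounted_probs("the", unigrams, bigrams, vocab, beta=0.5)
```

The result is still a proper distribution (the probabilities sum to 1), but unseen continuations such as "the fish" now score above zero.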
N-gram Language Model