Anonymous Author Matcher

Analyses latent stylistic cues in a piece of writing and identifies its author, provided the database already contains text written by them.

Posts and comments from 50 redditors are processed by dataset_maker.ipynb and written to ./data.
The reusable ETL pipeline for training is in redditors_comments_dataset.py.

A sequence encoder model is trained to decide whether two input sequences were written by the same author. The problem is set up as a binary classification task.
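
The binary classification setup can be illustrated with a minimal sketch of how labelled text pairs might be built. This is not the code in redditors_comments_dataset.py: the function name make_pairs, the texts_by_author mapping, and the negative-sampling scheme are all assumptions made for illustration.

```python
import itertools
import random


def make_pairs(texts_by_author, negatives_per_positive=1, seed=0):
    """Return shuffled (text_a, text_b, label) triples.

    label is 1 when both texts share an author and 0 otherwise.
    Hypothetical helper: the sampling scheme here is an assumption.
    """
    rng = random.Random(seed)
    # Only authors with at least one text can take part in a pair.
    authors = sorted(a for a, texts in texts_by_author.items() if texts)
    pairs = []
    for author in authors:
        others = [a for a in authors if a != author]
        # Positive examples: every unordered pair of texts by this author.
        for text_a, text_b in itertools.combinations(texts_by_author[author], 2):
            pairs.append((text_a, text_b, 1))
            # Negative examples: the same first text paired with a text
            # drawn at random from a different author.
            for _ in range(negatives_per_positive):
                if not others:
                    break
                other = rng.choice(others)
                pairs.append((text_a, rng.choice(texts_by_author[other]), 0))
    rng.shuffle(pairs)
    return pairs


if __name__ == "__main__":
    toy = {
        "user_1": ["first comment", "second comment", "third comment"],
        "user_2": ["another post", "yet another post"],
    }
    for text_a, text_b, label in make_pairs(toy):
        print(label, "|", text_a, "|", text_b)
```

Sampling one negative per positive keeps the two classes balanced, which is a common default for a binary classifier. The ratio is exposed as a parameter because the ratio used in this project is not stated here.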

data
log
models
README.md
a_gru_bin_classifier.py
anon_author_identifier_report.pdf
dataset_maker.ipynb
dbms.py
delta_method.py
function_words.py
redditors_comments_dataset.py
requirements.txt
