Analyses latent clues in writing style to identify the author of a text, provided the database already contains content written by them.
- Posts and comments from 50 redditors are processed as per `dataset_maker.ipynb` and written to `./data`.
- Reusable ETL pipeline for training is in `redditors_comments_dataset.py`.
- Train a sequence encoder model to decide whether two input sequences were written by the same author, framed as a binary classification task.
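
The same-author task needs labelled pairs of texts. The snippet below is a minimal sketch of one way to build them, not the code in `redditors_comments_dataset.py`: the function name `make_pairs`, its parameters, and the `texts_by_author` mapping are all hypothetical, and the negative-sampling scheme (one random different-author text per positive pair) is an assumption.

```python
import random
from itertools import combinations


def make_pairs(texts_by_author, negatives_per_positive=1, seed=0):
    """Build (text_a, text_b, label) triples for same-author classification.

    label is 1 when both texts share an author and 0 otherwise.
    Hypothetical helper: the input is assumed to map an author name
    to a list of that author's texts.
    """
    rng = random.Random(seed)  # local RNG keeps the sampling reproducible
    authors = sorted(texts_by_author)
    pairs = []
    for author in authors:
        # Candidate authors for negative examples: anyone else with text.
        others = [a for a in authors if a != author and texts_by_author[a]]
        # Positive examples: every unordered pair of texts by one author.
        for text_a, text_b in combinations(texts_by_author[author], 2):
            pairs.append((text_a, text_b, 1))
            if not others:
                continue
            # Negative examples: pair text_a with a text by someone else.
            for _ in range(negatives_per_positive):
                other = rng.choice(others)
                pairs.append((text_a, rng.choice(texts_by_author[other]), 0))
    rng.shuffle(pairs)
    return pairs
```

Each triple can then be tokenized and fed to the encoder as a two-sequence input with the label as the binary target. Sampling negatives per positive keeps the classes balanced; with 50 authors, enumerating all cross-author pairs would otherwise swamp the positives.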