Commit 80b4ddf

Update README.md
1 parent 746524d commit 80b4ddf

1 file changed: +15 -0 lines changed

README.md

Lines changed: 15 additions & 0 deletions
@@ -1,6 +1,19 @@
 # Advertisement Topic Modeling & Classification
 Creating topics from copy of advertisements, then attempting to predict said topics.
 
+## Why Ads?
+* Wanted to focus on a topic that was relevant to a domain I have experience in, but also be creative and different!
+* Most marketing data science projects involve performance data, but in lieu of that I thought analyzing ad copy seemed fun and unique.
+
+## Table of Contents
+1. [Summary](#summary)
+2. [Obtaining the Data](#obtaining-the-data)
+3. [Cleaning](#cleaning)
+4. [Exploratory Analysis](#exploratory-analysis)
+5. [Topic Modeling](#topic-modeling)
+6. [Predicting Topics](#predicting-topics)
+7. [Next Steps](#next-steps)
+
 ## Summary
 * Scraped approximately 146,000 advertisements from [welovead.com](welovead.com) and stored them in MongoDB
 * Initially planned to predict the industry of an ad, but through EDA discovered that there was a lot of overlap, so new topics would need to be created
* Initially planned to predict industry of an ad, but through EDA discovered that there was a lot of overlap so new topics would need to be created
@@ -64,3 +77,5 @@ Creating topics from copy of advertisements, then attempting to predict said top
 * Naturally, topic modeling is meaningless if said topics don't make any sense! So trying to account for this is absolutely essential
 * More text data is needed, so I plan to use LDA on descriptions, since they contain much more text to train on
 * Another option would be to gather more ad copy data, but since I scraped my initial source for everything it had, I would need to find another one, which might be impossible without paying for it
+* Experiment more with non-negative matrix factorization; I was unable to tune it automatically using randomized search
+* Look into biterm topic modeling and other methods that are better suited to processing shorter documents
