1 | 1 | # Advertisement Topic Modeling & Classification |
2 | 2 | Creating topics from copy of advertisements, then attempting to predict said topics. |
3 | 3 |
|
| 4 | +## Why Ads? |
| 5 | +* Wanted a topic relevant to a domain I have experience in, while still being creative and different! |
| 6 | +* Most marketing data science projects involve performance data, but in lieu of that I thought analyzing ad copy seemed fun and unique. |
| 7 | + |
| 8 | +## Table of Contents |
| 9 | +1. [Summary](#summary) |
| 10 | +2. [Obtaining the Data](#obtaining-the-data) |
| 11 | +3. [Cleaning](#cleaning) |
| 12 | +4. [Exploratory Analysis](#exploratory-analysis) |
| 13 | +5. [Topic Modeling](#topic-modeling) |
| 14 | +6. [Predicting Topics](#predicting-topics) |
| 15 | +7. [Next Steps](#next-steps) |
| 16 | + |
4 | 17 | ## Summary |
5 | 18 | * Scraped approximately 146,000 advertisements from [welovead.com](welovead.com) and stored them in MongoDB |
6 | 19 | * Initially planned to predict the industry of an ad, but EDA revealed a lot of overlap, so new topics would need to be created |
@@ -64,3 +77,5 @@ Creating topics from copy of advertisements, then attempting to predict said top |
64 | 77 | * Naturally, topic modeling is meaningless if the topics don't make sense, so accounting for this is essential |
65 | 78 | * More text data is needed, so I plan to run LDA on the ad descriptions, which contain much more text to train on |
66 | 79 | * Another option would be to gather more ad copy data, but since I scraped everything my initial source had, I would need to find another one, which might be impossible without paying for it |
| 80 | +* Experiment more with non-negative matrix factorization (NMF); I was unable to tune it automatically using randomized search |
| 81 | +* Look into biterm topic modeling and other methods better suited to shorter documents |
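
As a rough illustration of the biterm idea mentioned above: biterm topic models work on unordered word pairs that co-occur within a short document, rather than on per-document word counts. The sketch below covers only that first step, extracting and counting biterms, and does not perform topic inference. The function names and the sample ad copy are hypothetical and are not taken from this project.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair that co-occurs in one short document."""
    # Sort each pair so ("fresh", "coffee") and ("coffee", "fresh") are the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(docs):
    """Count biterms across a corpus of short, whitespace-tokenized documents."""
    counts = Counter()
    for doc in docs:
        counts.update(extract_biterms(doc.lower().split()))
    return counts


# Hypothetical ad copy, used only to show the shape of the output.
ads = ["fresh coffee daily", "fresh coffee beans"]
biterm_counts = count_biterms(ads)
print(biterm_counts.most_common(3))
```

Pooling pairs over the whole corpus is what makes this approach attractive for short ad copy: each individual ad contributes only a handful of words, but the corpus-level biterm counts are much denser.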