A Python script to scrape reviews from Tripadvisor, then perform sentiment analysis and a word frequency count.
This builds upon the CS50 Sentiments assignment.
https://captmomo.github.io/tripadvisor-singapore-zoo/
I made a Python script to scrape reviews from Tripadvisor and process the raw text, and another script to run a sentiment analysis and word frequency count. I used the CS50 Sentiments project as a starting point. Scraping was done with a combination of Beautiful Soup and Selenium, and the analysis was done using NLTK.
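The analysis step can be illustrated with a minimal sketch. It uses only the standard library rather than NLTK, and the stop-word list, the positive and negative word sets, the function names, and the sample reviews are all made-up assumptions for illustration, not the word lists or code from the actual scripts.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; NLTK ships a much longer one).
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "to", "of", "in", "it", "we", "were"}

# Hypothetical sentiment lexicons; a real project would load full word lists.
POSITIVE = {"great", "amazing", "good", "nice", "loved"}
NEGATIVE = {"bad", "crowded", "expensive"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(reviews):
    """Count non-stop-word tokens across an iterable of review strings."""
    counts = Counter()
    for review in reviews:
        counts.update(t for t in tokenize(review) if t not in STOP_WORDS)
    return counts


def sentiment_score(text):
    """Return (# positive tokens) - (# negative tokens) for one review."""
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


# Made-up sample reviews, for illustration only.
sample_reviews = [
    "The zoo was great and the animals were well kept.",
    "Great day at the zoo, we loved the tram.",
]

freqs = word_frequencies(sample_reviews)
scores = [sentiment_score(r) for r in sample_reviews]
print(freqs.most_common(3))
print(scores)
```

A score above zero is read as positive, below zero as negative, and zero as neutral. Swapping `tokenize` for an NLTK tokenizer and `STOP_WORDS` for NLTK's stop-word corpus would bring the sketch closer to the tools named above.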
You may notice that my results are quite different from the most talked about topics shown on the Tripadvisor page. I think this is because they are counting the frequency of n-grams, whereas I counted single words. After tokenizing the text, I singularized (is that a word?) the words and compared them to a list of animals I obtained from the Singapore Zoo Wikipedia page.
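The singularize-and-match step might look roughly like the sketch below. The `singularize` helper is a deliberately naive stand-in (the original does not say which singularizer was used), and the `ANIMALS` set is a small hypothetical subset rather than the full list taken from Wikipedia; the sample reviews are invented.

```python
import re
from collections import Counter

# Hypothetical subset of animal names; the real list came from Wikipedia.
ANIMALS = {"orangutan", "monkey", "lion", "fox", "python"}


def singularize(word):
    """Very rough English singularizer; real libraries handle far more cases."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # ponies -> pony
    if word.endswith(("ches", "shes", "xes", "sses")):
        return word[:-2]                # foxes -> fox
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]                # lions -> lion
    return word


def count_animals(reviews):
    """Count singularized tokens that appear in the ANIMALS set."""
    counts = Counter()
    for review in reviews:
        for token in re.findall(r"[a-z]+", review.lower()):
            singular = singularize(token)
            if singular in ANIMALS:
                counts[singular] += 1
    return counts


# Made-up sample reviews, for illustration only.
sample_reviews = [
    "The orangutans were amazing, one orangutan waved at us.",
    "We saw lions, foxes and monkeys.",
]

animal_counts = count_animals(sample_reviews)
print(animal_counts.most_common())
```

Because plural and singular forms are folded together before matching, "orangutans" and "orangutan" land in the same bucket, which is one reason the animal counts can differ from a raw word frequency count.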

| Word | Occurrences |
|---|---|
| zoo | 20561 |
| animals | 12605 |
| singapore | 6001 |
| see | 5729 |
| day | 5504 |
| well | 5305 |
| great | 4819 |
| get | 4554 |
| good | 4076 |
| time | 3998 |
| around | 3962 |
| one | 3916 |
| visit | 3899 |
| safari | 3711 |
| kids | 3350 |
| night | 3180 |
| place | 3174 |
| also | 3099 |
| breakfast | 2854 |
| really | 2773 |
| best | 2722 |
| shows | 2633 |
| would | 2539 |
| take | 2520 |
| go | 2513 |
| like | 2493 |
| tram | 2393 |
| show | 2383 |
| many | 2364 |
| animal | 2349 |
| experience | 2228 |
| food | 2212 |
| worth | 2089 |
| close | 1942 |
| water | 1901 |
| park | 1899 |
| orangutans | 1783 |
| zoos | 1781 |
| walk | 1777 |
| much | 1758 |
| must | 1723 |
| feeding | 1691 |
| enclosures | 1570 |
| amazing | 1540 |
| lot | 1538 |
| nice | 1532 |
| area | 1530 |
| children | 1488 |
| lots | 1485 |
| bus | 1483 |

| Animal | Occurrences |
|---|---|
| orangutan | 2160 |
| bear | 1523 |
| monkey | 1057 |
| lion | 674 |
| lemur | 271 |
| snake | 234 |
| kangaroo | 176 |
| baboon | 174 |
| hippo | 147 |
| zebra | 130 |
| penguin | 115 |
| cheetah | 112 |
| leopard | 100 |
| otter | 90 |
| komodo | 80 |
| sloth | 80 |
| dog | 76 |
| deer | 74 |
| goat | 66 |
| fox | 66 |
| parrot | 47 |
| lizard | 42 |
| python | 42 |
| tapir | 33 |
| tamarin | 32 |
| meerkat | 31 |
| flamingo | 27 |
| gibbon | 23 |
| rabbit | 19 |
| rat | 19 |
| cobra | 16 |
| mole | 14 |
| warthog | 12 |
| arapaima | 10 |
| iguana | 8 |
| panther | 8 |
| raccoon | 8 |
| pig | 7 |
| babirusa | 7 |
| hog | 7 |
| boa | 7 |
| saki | 5 |
| giraffe | 4 |
| falabella | 3 |
| rhinoceros | 3 |
| hippopotamus | 1 |
| terrapin | 1 |



