Scrapes catch reports from San Diego's Independence Sportfishing, which posts a report for each of its deep sea fishing trips, then scrapes weather data from Weather Underground's San Diego Airport location.
The collected data lets us ask interesting questions such as "What is typical of a good fishing trip in each season?" or "When are the peak months for sportfishing in San Diego?"
A rudimentary view of the project is this: we wish to correlate fish report information with weather. Thus we collect fish reports and weather data, store them in a MySQL database, then query the updated scraped data to run text analysis.
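To make the "correlate by date" idea concrete, here is a minimal sketch of joining reports to weather on the posting date. It uses an in-memory SQLite database as a stand-in for MySQL so it runs without a server. The table names `fish_reports` and `weather` and their columns are assumptions for illustration and may not match this repo's actual schema.

```python
import sqlite3

# In-memory SQLite stands in for the project's MySQL database (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical, simplified tables: one for fish reports, one for daily weather.
conn.execute("CREATE TABLE fish_reports (date_posted TEXT, headline TEXT)")
conn.execute("CREATE TABLE weather (date TEXT, avg_temp REAL, max_wind REAL)")

conn.executemany(
    "INSERT INTO fish_reports VALUES (?, ?)",
    [
        ("2020-12-17", "ON THE HUNT"),
        ("2020-12-18", "SCRATCHING AWAY"),
        ("2020-12-16", "GOOD ACTION"),
    ],
)
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?)",
    [
        ("2020-12-16", 56.38, 9.0),
        ("2020-12-17", 57.34, 10.0),
        ("2020-12-18", 58.04, 9.0),
    ],
)

# Pair each report with the weather recorded on the day it was posted.
rows = conn.execute(
    """
    SELECT r.date_posted, r.headline, w.avg_temp, w.max_wind
    FROM fish_reports AS r
    JOIN weather AS w ON w.date = r.date_posted
    ORDER BY r.date_posted
    """
).fetchall()

for row in rows:
    print(row)
```

An inner join like this drops any report whose date has no weather row; a `LEFT JOIN` would keep those reports with empty weather columns instead.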
There is a long way and a short way to use this repo. You can replicate what I did here by setting up your own MySQL database and populating it using my tools, or you can use my checkpointed data by reading raw_fishing_data.csv.
If you don't want to scrape the data again, I've dumped the data as of December 20th, 2020. It can be loaded like so:
import pandas as pd
pd.read_csv("raw_fishing_data.csv")
or simply opened in Excel or your spreadsheet tool of choice.
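As a rough illustration of the "peak months" question, the sketch below counts reports per calendar month using only the standard library. It assumes the CSV has a `date_posted` column formatted as `YYYY-MM-DD`, matching the schema shown in this README; the helper name `reports_per_month` is made up for this example, and the inline sample stands in for the real file.

```python
import csv
import io
from collections import Counter
from datetime import date


def reports_per_month(csv_file):
    """Count reports per calendar month (1-12) from an open CSV file."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Assumes date_posted is an ISO date string such as "2020-12-17".
        counts[date.fromisoformat(row["date_posted"]).month] += 1
    return counts


# Tiny stand-in for raw_fishing_data.csv so the example runs on its own.
sample = io.StringIO(
    "date_posted,headline\n"
    "2020-12-17,ON THE HUNT\n"
    "2020-12-18,SCRATCHING AWAY\n"
    "2020-12-16,GOOD ACTION\n"
)
monthly_counts = reports_per_month(sample)
print(monthly_counts.most_common())
```

To run it on the checkpointed data, pass a real file handle instead of the sample, for example `open("raw_fishing_data.csv", newline="")`. Note that the number of reports posted is only a proxy for how good the fishing was.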
The long way involves creating a MySQL database and waiting a couple of days for the scrapers to collect the data.
# TODO: Documentation of running the scrapers and populating the database
Once populated, the MySQL database can be queried into a pandas DataFrame like so:
from db_utils.db_queries import sql_all_reports_with_weather
from db_utils.mysql_db_connection import get_mysql_connection
db = get_mysql_connection()
fishing_data = sql_all_reports_with_weather(db)
fishing_data.head(3)
yielding a table with a schema like:
| | date_posted | headline | post_body | low_temp | avg_temp | high_temp | inches_precip | miles_visible | max_wind | sea_pressure |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-12-17 | ON THE HUNT | We were on the hunt for Yellowtail today but never able to connect. We... | 46.0 | 57.34 | 64.0 | 0.0 | 10.0 | 10.0 | 30.08 |
| 1 | 2020-12-18 | SCRATCHING AWAY | Searched for Yellowtail again today to find non biters. Good action on bass, ... | 52.0 | 58.04 | 66.0 | 0.0 | 10.0 | 9.0 | 30.23 |
| 2 | 2020-12-16 | GOOD ACTION | Another good day of fishing for the guys. It was slow... | 43.0 | 56.38 | 72.0 | 0.0 | 10.0 | 9.0 | 30.17 |
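
One simple form of text analysis on the `post_body` column is counting how many reports mention each species. The sketch below is only an illustration: the `SPECIES` keyword list and the `count_species_mentions` helper are hypothetical and not part of this repo, and the sample posts are excerpts of the rows shown above.

```python
import re
from collections import Counter

# Hypothetical keyword list; extend it with the species you care about.
SPECIES = ("yellowtail", "bass", "tuna")


def count_species_mentions(posts, species=SPECIES):
    """Count how many posts mention each species at least once."""
    counts = Counter()
    for post in posts:
        # Lowercase and split into alphabetic words for whole-word matching.
        words = set(re.findall(r"[a-z]+", post.lower()))
        for name in species:
            if name in words:
                counts[name] += 1
    return counts


# Excerpts of the report bodies from the table above.
sample_posts = [
    "We were on the hunt for Yellowtail today but never able to connect.",
    "Searched for Yellowtail again today to find non biters. Good action on bass",
    "Another good day of fishing for the guys. It was slow",
]
mentions = count_species_mentions(sample_posts)
print(mentions)
```

Because the function accepts any iterable of strings, it should also work on the queried data with something like `count_species_mentions(fishing_data["post_body"].dropna())`. Whole-word matching misses plurals and multi-word names, so treat the counts as a rough first pass.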

