
Analysis and data engineering for sportfishing reports in San Diego, based on Point Loma fleet data. Uses weather reports and temporal information to identify the best times to go fishing.


Mlawrence95/sportfishing-hauls-scraper


Sportfishing Hauls Scraper

Scrapes catch reports that San Diego's Independence Sportfishing posts from its deep-sea fishing trips, then scrapes weather data from Weather Underground's San Diego Airport station.

The collected data lets us ask questions such as "what is typical of a good fishing trip in each season?" or "when are the peak months for sportfishing in San Diego?"

(Figure: posts from Independence Sportfishing by month)

High Level Workflow

At a high level, the goal is to correlate fishing reports with the weather. We collect fishing reports and weather data, store them in a MySQL database, then query the combined data to run text analysis.

(Figure: project workflow)
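The core of this workflow is matching each report to the weather on the day it was posted. Below is a minimal sketch of that join under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the tables and columns (`reports`, `weather`, `date`) and the helpers `reports_with_weather` and `build_demo_db` are hypothetical, not the repo's actual layout. The demo values come from the sample output shown under "The long way".

```python
import sqlite3

# Hypothetical, simplified schema; the real repo uses MySQL and its own tables.
SCHEMA = """
CREATE TABLE reports (
    date_posted TEXT PRIMARY KEY,  -- ISO date, e.g. '2020-12-17'
    headline    TEXT,
    post_body   TEXT
);
CREATE TABLE weather (
    date      TEXT PRIMARY KEY,    -- same ISO format as reports.date_posted
    low_temp  REAL,
    avg_temp  REAL,
    high_temp REAL
);
"""

# One row per report, with that day's weather attached (NULL if none was scraped).
JOIN_QUERY = """
SELECT r.date_posted, r.headline, w.low_temp, w.avg_temp, w.high_temp
FROM reports AS r
LEFT JOIN weather AS w ON w.date = r.date_posted
ORDER BY r.date_posted
"""


def reports_with_weather(conn: sqlite3.Connection) -> list:
    """Return every report joined to the weather observed on its posting date."""
    return conn.execute(JOIN_QUERY).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """In-memory database holding two sample reports and their weather."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO reports VALUES (?, ?, ?)",
        [("2020-12-16", "GOOD ACTION", "..."), ("2020-12-17", "ON THE HUNT", "...")],
    )
    conn.executemany(
        "INSERT INTO weather VALUES (?, ?, ?, ?)",
        [("2020-12-16", 43.0, 56.38, 72.0), ("2020-12-17", 46.0, 57.34, 64.0)],
    )
    return conn


for row in reports_with_weather(build_demo_db()):
    print(row)
```

A `LEFT JOIN` keeps reports that have no matching weather row (their weather columns come back as `NULL`), so gaps in the weather scrape do not silently drop fishing reports.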

Interface

There are two ways to use this repo. The long way replicates what I did: set up your own MySQL database and populate it with my scrapers. The short way uses my checkpointed data by reading raw_fishing_data.csv.

The short way

If you don't want to scrape the data yourself, I've dumped it as of December 20th, 2020. It can be loaded with pandas:

```python
import pandas as pd

# Load the checkpointed scrape (data as of December 20th, 2020)
fishing_data = pd.read_csv("raw_fishing_data.csv")
```

You can also open it in Excel or your spreadsheet tool of choice.
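Once the data is loaded, a question like "when are the peak months?" can be approached by counting posts per calendar month. This is a minimal standard-library sketch. It assumes the CSV has a `date_posted` column in `YYYY-MM-DD` format, matching the database output shown under "The long way". The function `posts_per_month` is hypothetical, and the sample rows are illustrative.

```python
import csv
import io
from collections import Counter


def posts_per_month(csv_file) -> Counter:
    """Count posts per calendar month (1-12), pooling all years.

    Assumes a 'date_posted' column formatted as YYYY-MM-DD; the actual
    column names in raw_fishing_data.csv may differ.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        month = int(row["date_posted"][5:7])  # 'YYYY-MM-DD' -> MM
        counts[month] += 1
    return counts


# Illustrative in-memory sample; the last row is made up for the demo.
sample = io.StringIO(
    "date_posted,headline\n"
    "2020-12-16,GOOD ACTION\n"
    "2020-12-17,ON THE HUNT\n"
    "2020-07-04,HYPOTHETICAL ROW\n"
)
print(posts_per_month(sample).most_common())  # busiest months first
```

For the real file, pass `open("raw_fishing_data.csv", newline="")` in place of the `StringIO` sample. With pandas, `pd.to_datetime(df["date_posted"]).dt.month.value_counts()` gives the same counts.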

The long way

This involves creating a MySQL database and waiting a couple of days for the scrapers to collect the data.

# TODO: Documentation of running the scrapers and populating the database

Once populated, the MySQL database can be queried into a pandas DataFrame:

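```python
from db_utils.db_queries import sql_all_reports_with_weather
from db_utils.mysql_db_connection import get_mysql_connection

# Open a connection to the populated MySQL database
db = get_mysql_connection()

# Fetch all fishing reports together with their weather data as a DataFrame
fishing_data = sql_all_reports_with_weather(db)
fishing_data.head(3)  # preview the first three rows (shown in a notebook or REPL)
```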

This yields a table like:

|   | date_posted | headline | post_body | low_temp | avg_temp | high_temp | inches_precip | miles_visible | max_wind | sea_pressure |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-12-17 | ON THE HUNT | We were on the hunt for Yellowtail today but never able to connect. We... | 46.0 | 57.34 | 64.0 | 0.0 | 10.0 | 10.0 | 30.08 |
| 1 | 2020-12-18 | SCRATCHING AWAY | Searched for Yellowtail again today to find non biters. Good action on bass, ... | 52.0 | 58.04 | 66.0 | 0.0 | 10.0 | 9.0 | 30.23 |
| 2 | 2020-12-16 | GOOD ACTION | Another good day of fishing for the guys. It was slow... | 43.0 | 56.38 | 72.0 | 0.0 | 10.0 | 9.0 | 30.17 |
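To compare conditions across seasons, as in "what is typical of a good fishing trip in each season?", the weather columns can be grouped by season. This is a minimal sketch under stated assumptions. It uses meteorological seasons (December to February is winter, and so on). It expects rows shaped like the table above. `season_of` and `mean_by_season` are hypothetical helpers, not part of the repo. The sample rows reuse the `avg_temp` values from that table.

```python
from collections import defaultdict
from statistics import mean

# Meteorological seasons by month; an assumption, the repo may define seasons differently.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def season_of(date_posted) -> str:
    """Map a date to its season; works for 'YYYY-MM-DD' strings and date/datetime
    objects, whose str() starts with that format."""
    return SEASON_BY_MONTH[int(str(date_posted)[5:7])]


def mean_by_season(rows, column: str) -> dict:
    """Average a numeric column per season, skipping None values."""
    groups = defaultdict(list)
    for row in rows:
        value = row.get(column)
        if value is not None:
            groups[season_of(row["date_posted"])].append(float(value))
    return {season: mean(values) for season, values in groups.items()}


rows = [
    {"date_posted": "2020-12-17", "avg_temp": 57.34},
    {"date_posted": "2020-12-18", "avg_temp": 58.04},
    {"date_posted": "2020-12-16", "avg_temp": 56.38},
]
print(mean_by_season(rows, "avg_temp"))
```

With the DataFrame returned by `sql_all_reports_with_weather`, `fishing_data.to_dict("records")` produces rows in this shape. Missing values from pandas arrive as NaN rather than None, so drop them first (for example with `dropna`).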
