Graph plotted from data in ER-wait-times.csv.
This project is designed to scrape emergency room wait times from a specified URL (https://www.lhsc.on.ca/adult-ed/emergency-department-wait-times), process the data, and log it into CSV files. The project includes functions for web scraping using different libraries, string manipulation, and logging.
To run the scraping and logging process, execute the app.py script. This starts an infinite loop that scrapes the data at a set interval (every 15 minutes by default).

    python app.py

Ensure you have the necessary libraries installed, which you can do using the following command:

    pip install requests beautifulsoup4 cloudscraper requests_html pandas

- Modify the default file names for data and log storage in the main function of app.py.
- Adjust the scrape_interval variable in app.py to change the frequency of scraping.
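
The interval-driven loop that app.py runs might look roughly like the sketch below. This is an illustration only, not the code in app.py: `run_scrape_loop`, `scrape_once`, `max_iterations`, and the injectable `sleep` parameter are hypothetical names chosen for the example, and the 15-minute default simply mirrors the documented scrape interval.

```python
import time
from typing import Callable, Optional


def run_scrape_loop(
    scrape_once: Callable[[], None],
    interval_seconds: float = 15 * 60,  # assumed to mirror the 15-minute default
    max_iterations: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call scrape_once repeatedly, pausing interval_seconds between calls.

    max_iterations=None loops forever; a number stops after that many runs.
    Returns the number of completed iterations.
    """
    completed = 0
    while max_iterations is None or completed < max_iterations:
        try:
            scrape_once()
        except Exception as exc:
            # A failed scrape is reported but does not end the loop.
            print(f"scrape failed: {exc}")
        completed += 1
        # Skip the pause after the final iteration of a bounded run.
        if max_iterations is None or completed < max_iterations:
            sleep(interval_seconds)
    return completed
```

Passing `max_iterations` and a stand-in `sleep` makes the loop easy to exercise without waiting; leaving both at their defaults gives the run-forever behaviour described above.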
project/
│
├── request_soup.py # Contains various functions for web scraping using requests, cloudscraper, and requests_html.
├── helper_fns.py # Contains utility functions for string manipulation and HTTP request status formatting.
├── file_builders.py # Contains a function to log data to a CSV file, either creating a new file or appending to an existing one.
├── app.py # The main script that orchestrates the scraping, processing, and logging of emergency room wait times.
├── plot_csv.py # Plots the data from the CSV file generated by app.py.
└── README.md # Project readme file.
- linkToSoup_scrapingAnt: Uses the ScrapingAnt API to fetch and parse a webpage, optionally using a proxy country and CSS selector.
- linkToSoup: Fetches and parses a webpage using the requests library, with optional configurations for headers, cookies, etc.
- linkToSoup_h: Fetches and parses a webpage using the requests_html library.
- linkToSoup_c: Fetches and parses a webpage using the cloudscraper library to bypass anti-bot measures.
- stripStr: Removes extra whitespace from a string.
- truncateIfLong: Truncates a string to a specified maximum length, adding '...' if the string is too long.
- miniStr: Converts an object to a string, removes extra whitespace, joins lines and words with specified separators, and optionally truncates the string.
- reqStatus: Formats the status of an HTTP request as a string, including status code, reason, elapsed time, and URL.
- log_data: Logs data to a CSV file. Creates a new file if necessary, or appends to an existing file.
- main: Scrapes emergency room wait times from a specified URL and logs the data. Handles scenarios including successful data retrieval, warnings for unexpected data formats, and errors during the scraping process.
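
As a rough illustration of the string helpers and the create-or-append logging described above, the following sketch uses only the standard library. The names `strip_str`, `truncate_if_long`, and `append_row`, their parameters, and the column names in the usage note are assumptions made for this example; the actual signatures in helper_fns.py and file_builders.py may differ.

```python
import csv
import os


def strip_str(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return " ".join(str(text).split())


def truncate_if_long(text: str, max_len: int = 50) -> str:
    """Return text unchanged if short enough, else cut it and append '...'.

    In this sketch the '...' is added after max_len characters, so a
    truncated result is max_len + 3 characters long.
    """
    if len(text) <= max_len:
        return text
    return text[:max_len] + "..."


def append_row(path: str, row: dict) -> None:
    """Append one row to a CSV file, writing the header only for a new file."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(row.keys()))
        if is_new:
            writer.writeheader()
        writer.writerow(row)
```

A scraped value could then be cleaned with `strip_str` and written with something like `append_row("ER-wait-times.csv", {"timestamp": ..., "wait_time": ...})`, where the column names are placeholders. Every call is expected to pass the same keys in the same order, since the header is written only once.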
This project is licensed under the MIT License. See the LICENSE file for details.
