Skip to content

Pauly-DData/Data-Scraping-with-Selenium

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 

Repository files navigation

Data-Scraping-with-Selenium

Web Scraping for Data Collection

This project is a part of a master's thesis which aims to collect data on standard questions available on websites like Datacamp. The main tools used for this task are the Selenium and Pandas libraries.

Overview

  • Selenium WebDriver: A tool that automates web browsers, simulating user interactions with a website.
  • Pandas: A powerful data manipulation and analysis library.

Modules and Packages

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import pandas as pd

Workflow

URL Collection:

As a part of my master's thesis project, I needed to collect data on standard questions available on websites like Datacamp. For this assignment, I used the Selenium and Pandas libraries to complete the task.

The URLs are stored in a list called urls. The script imports the necessary modules and packages such as webdriver, Service, By, and pandas.

The Selenium WebDriver is a tool that automates web browsers and allows us to simulate user interactions with a website. The webdriver module is used to create a WebDriver instance for a specific browser. In this case, we are using Chrome, which is why we also import the Service module.

The By module is used to locate elements on a webpage using various methods like ID, class, tag name, etc.

The pandas module is used to manipulate and analyze data that has been scraped from the webpages.

The urls list contains a list of URLs to scrape. These URLs are all from the same GitHub repository, and each URL points to a specific Markdown file containing instructions for a different chapter of a Python data science course.

Data Extraction:

For each URL in the urls list, the script uses the WebDriver to load the page and extract specific data. The extracted data is then stored in a pandas DataFrame.

Data Export:

Finally, the data stored in the DataFrame is exported to a CSV file named 'Finale.csv'.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages