sinemistoktas/project-big-data

project-big-data

Amazon Books Data Analysis Project

Overview

This repository contains the code and analysis for the Amazon Books Data Analysis Project, carried out for the course X_400645: Project Big Data at Vrije Universiteit Amsterdam during period P6/2024. The analysis explores trends in books, authors, and publishers, and examines how the rating behavior of Amazon Books users varies with age and location.

Authors

  • Sinemis Toktaş
  • Arda Cem Çakmak
  • Isabelle de Beijer
  • Haojia Lu

Course Information

  • Course Name: X_400645: Project Big Data
  • Period/Year: P6/2024
  • Instructor: Dr. Alessandro Zocca

Contents

Report

The full report is available in Amazon_Books_Report.pdf and is organized into the following sections:

  1. Introduction: Overview of Amazon Books and the objectives of our analysis.
  2. Dataset Description: Description of the datasets used (books.csv, ratings.csv, and users.csv) and their structure.
  3. Data Cleaning: Steps taken to clean the datasets, including handling missing values and filtering erroneous data.
  4. Exploratory Data Analysis (EDA): In-depth analysis of the data to answer specific research questions, including:
    • Best & Worst Authors, Publishers, and Books for All and for Different Age Groups
    • The Effect of Location and Age on User Ratings
    • Gender Bias in Author Ratings
    • Sentiment Analysis of Book Titles
    • Prediction and Recommendation Models
  5. Conclusion: Summary of findings and implications of the analysis.
  6. References: Sources and references used in the report.
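As a rough illustration of the "Best & Worst Authors, Publishers, and Books" question listed above, the sketch below ranks groups by mean rating. It is not taken from the notebook: the function name `rank_by_mean_rating`, the field names `author` and `rating`, the `min_ratings` threshold, and the sample data are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def rank_by_mean_rating(records, key, min_ratings=2):
    """Return (name, mean rating, count) tuples sorted from best to worst.

    `records` is an iterable of dicts; `key` names the field to group by
    (here a hypothetical "author" field). Groups with fewer than
    `min_ratings` ratings are skipped so one-off ratings do not dominate.
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[record[key]].append(record["rating"])
    ranked = [
        (name, mean(values), len(values))
        for name, values in grouped.items()
        if len(values) >= min_ratings
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Made-up sample data, for illustration only.
sample = [
    {"author": "Author A", "rating": 9},
    {"author": "Author A", "rating": 7},
    {"author": "Author B", "rating": 4},
    {"author": "Author B", "rating": 6},
    {"author": "Author C", "rating": 10},
]
ranking = rank_by_mean_rating(sample, key="author")
```

The same function could be applied per age group by first filtering the records to one age bracket. The minimum-count threshold is a design choice for this sketch; the report may handle sparsely rated items differently.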

Code

The code used for the analysis is provided in the Jupyter Notebook Amazon_Books_Analysis.ipynb. The notebook includes:

  • Data loading and cleaning
  • Exploratory data analysis and visualizations
  • Statistical tests and models
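The data loading and cleaning step might look roughly like the following minimal sketch, which drops user rows with a missing or implausible age. It uses only the standard library and an in-memory stand-in for users.csv; the column names (`user_id`, `location`, `age`), the age bounds, and the helper `clean_users` are assumptions for illustration, not the notebook's actual code.

```python
import csv
import io


def clean_users(rows, min_age=5, max_age=100):
    """Keep rows whose age parses as a number inside [min_age, max_age]."""
    cleaned = []
    for row in rows:
        raw_age = (row.get("age") or "").strip()
        try:
            age = float(raw_age)
        except ValueError:
            continue  # missing or non-numeric age
        if min_age <= age <= max_age:
            cleaned.append({**row, "age": int(age)})
    return cleaned


# In-memory stand-in for a users.csv file; the contents are made up.
raw_csv = io.StringIO(
    "user_id,location,age\n"
    "1,\"amsterdam, netherlands\",25\n"
    "2,\"izmir, turkey\",\n"
    "3,\"shanghai, china\",244\n"
    "4,\"utrecht, netherlands\",31.0\n"
)
users = clean_users(csv.DictReader(raw_csv))
```

Here the row with an empty age and the row with age 244 are discarded, while the remaining ages are normalized to integers. With a real file, `open("users.csv", newline="")` would take the place of the `io.StringIO` object.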

Midway Presentation

The project presentation, reflecting the state of the project midway through, is available in the Amazon_Books_Presentation.pdf file. This presentation provides a snapshot of our interim findings, methodology, and progress at that point in time.

License

This project is licensed under the MIT License - see the LICENSE file for details.
