AWS Lambda function that automatically backs up Google Drive files to S3 using the Changes API for incremental sync.

tcampbell01/google-drive-to-s3

Google Drive to S3 Backup

AWS Lambda function that automatically backs up Google Drive files to Amazon S3.

Overview

This system backs up files from a specific Google Drive folder to S3 on a daily schedule. Two implementations are available: a simple folder sync (currently deployed) and an advanced incremental sync (available for future enhancement).

Current Implementation (Simple Version)

File: lambda_function.py

Features

  • Folder Sync: Backs up the "Mac Backup" folder from Google Drive
  • Scheduled Execution: Runs daily at 6 AM Eastern via CloudWatch Events
  • Duplicate Prevention: Checks if files exist in S3 before uploading
  • Folder Structure: Maintains original Drive folder hierarchy in S3
  • Secure Storage: Credentials stored in AWS Secrets Manager
  • Binary Files Only: Skips Google Workspace files (Docs, Sheets, Slides)

Architecture (Current)

[Figure: architecture diagram]

The current system follows this flow:

  1. CloudWatch Events triggers Lambda daily at 6 AM Eastern
  2. Lambda authenticates using credentials from Secrets Manager
  3. Lambda scans the "Mac Backup" folder in Google Drive
  4. Lambda downloads files that don't already exist in S3
  5. Lambda uploads files to S3 maintaining folder structure
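
Steps 4 and 5 can be sketched roughly as follows. This is an illustration, not the code in lambda_function.py: the function names, the `files` structure (a list of `(key, data)` pairs), and the use of `list_objects_v2` for the existence check are all assumptions. The `s3` argument is expected to be a `boto3.client("s3")`.

```python
def object_exists(s3, bucket, key):
    """Return True if an object with exactly this key is already in the bucket."""
    # An exact key sorts first among all keys that share it as a prefix,
    # so asking for a single result is enough to decide.
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    return any(obj["Key"] == key for obj in resp.get("Contents", []))


def backup_files(s3, bucket, files):
    """Upload each (key, data) pair unless the key already exists in S3."""
    uploaded = skipped = 0
    for key, data in files:
        if object_exists(s3, bucket, key):
            skipped += 1
            continue
        s3.put_object(Bucket=bucket, Key=key, Body=data)
        uploaded += 1
    return {"status": "ok", "uploaded": uploaded, "skipped": skipped}
```

Passing the client in as an argument keeps the logic easy to exercise with a stub. Note that `list_objects_v2` requires the `s3:ListBucket` permission on the bucket.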

Future Enhancement (Advanced Version)

File: drive_to_s3_backup.py

Additional Features

  • Incremental Sync: Uses Google Drive Changes API to only process modified files
  • Format Conversion: Converts Google Docs/Sheets/Slides to .docx/.xlsx/.pptx
  • State Management: Tracks sync progress via SSM Parameter Store
  • Stable S3 Keys: Prevents file duplication with consistent naming
  • Full Drive Sync: Can sync entire Drive, not just specific folders

Advanced Architecture Flow

  1. CloudWatch Events triggers Lambda on schedule
  2. Lambda authenticates using credentials from Secrets Manager
  3. Lambda checks SSM Parameter Store for last sync state
  4. Lambda queries Google Drive Changes API for modified files
  5. Lambda downloads and converts files, then uploads to S3
  6. Lambda updates sync state in SSM Parameter Store
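
The flow above might look like the sketch below. It is not taken from drive_to_s3_backup.py: the function names and the `handle_change` callback are hypothetical. `service` is assumed to be a Drive v3 client built with `googleapiclient.discovery.build("drive", "v3", ...)` and `ssm` a `boto3.client("ssm")`.

```python
def collect_changes(service, page_token):
    """Page through the Drive Changes API starting at page_token.

    Returns (changes, new_start_token); new_start_token is the token
    the next run should resume from.
    """
    changes = []
    new_start_token = page_token
    while page_token is not None:
        resp = service.changes().list(
            pageToken=page_token,
            fields="nextPageToken,newStartPageToken,"
                   "changes(fileId,removed,file(name,mimeType))",
        ).execute()
        changes.extend(resp.get("changes", []))
        # newStartPageToken is only present on the last page.
        new_start_token = resp.get("newStartPageToken", new_start_token)
        page_token = resp.get("nextPageToken")
    return changes, new_start_token


def run_incremental_sync(service, ssm, param_name, handle_change):
    """Read the saved token, process changes, then save the new token."""
    token = ssm.get_parameter(Name=param_name)["Parameter"]["Value"]
    changes, new_token = collect_changes(service, token)
    for change in changes:
        handle_change(change)
    # Persist only after every change was handled, so a failed run is retried.
    ssm.put_parameter(Name=param_name, Value=new_token, Type="String", Overwrite=True)
    return len(changes)
```

Writing the token last means a crash mid-run reprocesses the same changes on the next invocation rather than silently dropping them. On the very first run the parameter would need to be seeded, for example from `service.changes().getStartPageToken().execute()["startPageToken"]`.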

Quick Start

  1. Install dependencies:

    pip install -r requirements.txt
  2. Set up AWS resources:

    • Create S3 bucket: google-drivesync-backup
    • Store Google OAuth credentials in Secrets Manager
    • Create SSM parameter for state tracking
  3. Deploy to Lambda:

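    pip install -r requirements.txt -t package/
    (cd package && zip -r ../function.zip .)
    zip function.zip drive_to_s3_backup.py
    # Upload function.zip (handler plus bundled dependencies) to AWS Lambda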

Configuration

Environment Variables

SECRET_ID = "drivesync/google-oauth"     # Secrets Manager secret ID
SSM_PARAM = "/drivesync/startPageToken"  # SSM parameter name
S3_BUCKET = "google-drivesync-backup"    # S3 bucket name
S3_PREFIX = "drivesync"                  # S3 key prefix
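
One way to read these settings inside the function is sketched below. The `Config` class and `load_config` helper are hypothetical names, and treating the documented values as defaults is an assumption about how the script behaves.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    secret_id: str
    ssm_param: str
    s3_bucket: str
    s3_prefix: str


def load_config(env=os.environ):
    """Build a Config from environment variables, falling back to the documented values."""
    return Config(
        secret_id=env.get("SECRET_ID", "drivesync/google-oauth"),
        ssm_param=env.get("SSM_PARAM", "/drivesync/startPageToken"),
        s3_bucket=env.get("S3_BUCKET", "google-drivesync-backup"),
        # Strip slashes so keys can be joined as f"{prefix}/{rest}".
        s3_prefix=env.get("S3_PREFIX", "drivesync").strip("/"),
    )
```

Calling `load_config()` once at the top of the handler keeps the rest of the code free of direct `os.environ` lookups.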

Google OAuth Setup

Store credentials in AWS Secrets Manager as JSON:

{
  "token": {
    "refresh_token": "your_refresh_token",
    "client_id": "your_client_id", 
    "client_secret": "your_client_secret",
    "token_uri": "https://oauth2.googleapis.com/token",
    "scopes": ["https://www.googleapis.com/auth/drive.readonly"]
  }
}
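
A secret in this shape could be turned into Google credentials roughly as follows. The helper names are hypothetical and the validation step is an addition for illustration; the secret string itself would come from `get_secret_value(SecretId=...)["SecretString"]` on a `boto3.client("secretsmanager")`.

```python
import json


def credentials_kwargs(secret_string):
    """Turn the Secrets Manager JSON string into keyword arguments for Credentials."""
    token = json.loads(secret_string)["token"]
    required = ("refresh_token", "client_id", "client_secret", "token_uri")
    missing = [name for name in required if not token.get(name)]
    if missing:
        raise ValueError(f"secret is missing fields: {', '.join(missing)}")
    kwargs = {name: token[name] for name in required}
    kwargs["scopes"] = token.get("scopes", [])
    return kwargs


def build_credentials(secret_string):
    """Create refreshable credentials; no access token exists yet, so token=None."""
    # Imported lazily so the parsing helper above works without google-auth installed.
    from google.oauth2.credentials import Credentials

    return Credentials(token=None, **credentials_kwargs(secret_string))
```

Because only a refresh token is stored, the client library obtains a fresh access token from `token_uri` when the first API call is made.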

File Processing

Supported Conversions

  • Google Docs → .docx (Word)
  • Google Sheets → .xlsx (Excel)
  • Google Slides → .pptx (PowerPoint)
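
The conversions listed above correspond to the MIME types below. The lookup table and the `export_plan` helper are a sketch with hypothetical names; how the script actually selects export formats is not shown in this README.

```python
GOOGLE_EXPORTS = {
    "application/vnd.google-apps.document": (
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        ".docx",
    ),
    "application/vnd.google-apps.spreadsheet": (
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        ".xlsx",
    ),
    "application/vnd.google-apps.presentation": (
        "application/vnd.openxmlformats-officedocument.presentationml.presentation",
        ".pptx",
    ),
}


def export_plan(name, mime_type):
    """Return (export_mime_type, output_name); export_mime_type is None for binary files."""
    if mime_type in GOOGLE_EXPORTS:
        export_mime, ext = GOOGLE_EXPORTS[mime_type]
        # Avoid doubling the extension if the Drive title already ends with it.
        out = name if name.lower().endswith(ext) else name + ext
        return export_mime, out
    return None, name
```

When `export_mime_type` is set, the download would go through `service.files().export_media(fileId=..., mimeType=export_mime_type)`; otherwise a plain `service.files().get_media(fileId=...)` applies.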

S3 Key Structure

s3://bucket/drivesync/folder/filename__file_id
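
A key in this shape could be built as in the sketch below. The `s3_key` helper is hypothetical, and replacing slashes in file names is an assumption added for illustration. Drive allows several files with the same name in one folder, so the file ID suffix keeps them from overwriting each other.

```python
def s3_key(prefix, folder_path, filename, file_id):
    """Build 'prefix/folder/filename__file_id' from its parts."""
    parts = [prefix.strip("/")]
    parts += [p for p in folder_path.split("/") if p]
    # A slash in a Drive file name would create an unintended S3 "folder".
    safe_name = filename.replace("/", "_")
    parts.append(f"{safe_name}__{file_id}")
    return "/".join(p for p in parts if p)
```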

AWS Permissions

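The Lambda execution role needs the permissions below. The duplicate check in the simple version reads from S3, so it also needs s3:GetObject (for HeadObject) or s3:ListBucket (for ListObjectsV2), depending on how the check is implemented: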

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "secretsmanager:GetSecretValue",
        "ssm:GetParameter",
        "ssm:PutParameter"
      ],
      "Resource": "*"
    }
  ]
}
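
The policy above grants these actions on every resource. A narrower variant could be generated as sketched here; the `scoped_policy` helper is hypothetical and the ARNs in the example are placeholders, not values from this project.

```python
import json


def scoped_policy(bucket, prefix, secret_arn, parameter_arn):
    """Return an IAM policy document limited to the resources the backup touches."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": secret_arn,
            },
            {
                "Effect": "Allow",
                "Action": ["ssm:GetParameter", "ssm:PutParameter"],
                "Resource": parameter_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder ARNs; substitute the real region and account ID.
    print(json.dumps(scoped_policy(
        "google-drivesync-backup",
        "drivesync",
        "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:drivesync/google-oauth-*",
        "arn:aws:ssm:REGION:ACCOUNT_ID:parameter/drivesync/startPageToken",
    ), indent=2))
```

Splitting the statement per service lets each action be tied to the one bucket prefix, secret, or parameter it actually uses.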

Scheduling

Set up a CloudWatch Events (now Amazon EventBridge) rule with a schedule expression such as:

  • Hourly: rate(1 hour)
  • Every 6 hours: cron(0 */6 * * ? *)

Monitoring

Return Values

{
  "status": "ok",
  "uploaded": 5,
  "skipped": 12
}

CloudWatch Logs

  • File processing details
  • Error messages
  • Performance metrics
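
The three kinds of log output above could be produced with the standard `logging` module, whose output the Lambda Python runtime forwards to CloudWatch Logs. This is a sketch only: `run_with_logging` and the `process_file` callback are hypothetical, and the `"error"` status value is an assumption, since the README documents only `"ok"`.

```python
import logging
import time

logger = logging.getLogger("drivesync")
logger.setLevel(logging.INFO)


def run_with_logging(files, process_file):
    """Process files, logging per-file details, errors, and total duration.

    process_file(file) should return True if the file was uploaded
    and False if it was skipped.
    """
    started = time.perf_counter()
    uploaded = skipped = failed = 0
    for f in files:
        try:
            if process_file(f):
                uploaded += 1
                logger.info("uploaded %s", f)
            else:
                skipped += 1
                logger.info("skipped %s", f)
        except Exception:
            failed += 1
            # logger.exception records the traceback along with the message.
            logger.exception("failed to process %s", f)
    elapsed = time.perf_counter() - started
    logger.info("processed %d files in %.2fs", len(files), elapsed)
    status = "ok" if failed == 0 else "error"
    return {"status": status, "uploaded": uploaded, "skipped": skipped}
```

Catching exceptions per file lets one bad file be logged without aborting the rest of the run.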

Dependencies

  • google-auth: Google API authentication
  • google-api-python-client: Drive API client
  • boto3: AWS SDK

Related Projects

Part of a complete file management system:

License

MIT License
