Add Robust Backup System (Manual, Automatic, Integrity, and Restore) #276

Open
PStarH wants to merge 10 commits into mindverse:develop from PStarH:master
@PStarH PStarH commented Apr 24, 2025

Overview

This PR introduces a comprehensive backup system for the project, including:

  • Manual and automatic backup creation
  • Compression and encryption support
  • Integrity verification with SHA-256 checksums
  • Backup listing, restore, and deletion APIs
  • Automatic backup scheduling and cleanup
  • Seamless integration with model training and data directories

Key Features

1. Backup Core (BackupService, BackupIntegrity)

  • Manual backup: Create backups of key directories (resources, database, vector DB) via API or service call.
  • Compression: Optionally compress backup files using zlib, configurable via settings.
  • Encryption: Optionally encrypt backup files using Fernet symmetric encryption.
  • Integrity manifest: Each backup includes a manifest with checksums and file sizes for later verification.
  • Restore: Restore data from any backup, with decompression/decryption and integrity checks.
  • Delete: Remove backups by ID, with safe cleanup.
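
As an illustration of the integrity manifest described above, here is a minimal sketch of how per-file SHA-256 checksums and sizes could be recorded and later checked. The function names (`sha256_of_file`, `build_manifest`, `verify_manifest`) and the manifest layout are assumptions made for this example; they are not taken from `BackupIntegrity`.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(backup_dir: Path) -> dict:
    """Record size and SHA-256 for every file under backup_dir (hypothetical layout)."""
    files = {}
    for path in sorted(p for p in backup_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(backup_dir).as_posix()
        files[rel] = {"size": path.stat().st_size, "sha256": sha256_of_file(path)}
    return {"files": files}


def verify_manifest(backup_dir: Path, manifest: dict) -> list[str]:
    """Return a list of problems; an empty list means every file verified."""
    problems = []
    for rel, expected in manifest["files"].items():
        path = backup_dir / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
        elif path.stat().st_size != expected["size"]:
            problems.append(f"size mismatch: {rel}")
        elif sha256_of_file(path) != expected["sha256"]:
            problems.append(f"checksum mismatch: {rel}")
    return problems
```

Checking the size before the checksum is a cheap way to catch truncated files without hashing them. In this sketch the manifest is assumed to be stored outside `backup_dir`, so it does not need to describe itself.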

2. Automatic Backup (AutoBackupManager)

  • Pre-training backup: Automatically create a backup before model training starts.
  • Periodic backup: Schedule regular backups during long-running training jobs.
  • Failure handling: Retries with exponential backoff and logging.
  • Cleanup: Automatically remove old backups based on count and size limits.
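
The failure handling above can be pictured with a short sketch of retries with exponential backoff, random jitter, and logging. This is an assumption-laden example, not the `AutoBackupManager` code: `backoff_delays`, `run_with_retries`, and all default values are hypothetical.

```python
import logging
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("backup.retry")  # hypothetical logger name


def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Upper bounds for each wait: base, 2*base, 4*base, ... limited to cap."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


def run_with_retries(
    operation: Callable[[], T],
    retries: int = 3,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation; after each failure wait a jittered delay, then try again.

    Makes at most retries + 1 attempts. The last failure is re-raised.
    """
    for attempt, delay in enumerate(backoff_delays(retries, base, cap), start=1):
        try:
            return operation()
        except Exception as exc:
            wait = random.uniform(0, delay)  # "full jitter": anywhere in [0, delay]
            logger.warning("attempt %d failed (%s); retrying in %.2fs", attempt, exc, wait)
            sleep(wait)
    return operation()  # final attempt; any exception propagates to the caller
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. A call might look like `run_with_retries(lambda: service.create_backup(), retries=5)`, where `service.create_backup` stands in for whatever backup call is being retried.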

3. API Integration

  • RESTful endpoints for creating, listing, restoring, and deleting backups.
  • Status and error reporting for all operations.

4. Configuration

  • All backup parameters (paths, compression, encryption, intervals, limits) are configurable via the main config system.
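
To make the configurable limits more concrete, the sketch below shows one way count and size limits could drive cleanup. The class names, field names, and default values are invented for illustration and do not reflect the PR's actual config keys. The policy shown (keep the newest backups while both limits hold, always keep the most recent one) is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetentionLimits:
    # Hypothetical settings; assumes max_count >= 1.
    max_count: int = 10
    max_total_bytes: int = 5 * 1024**3


@dataclass(frozen=True)
class BackupInfo:
    backup_id: str
    created_at: float  # e.g. a Unix timestamp
    size_bytes: int


def select_backups_to_delete(
    backups: list[BackupInfo], limits: RetentionLimits
) -> list[BackupInfo]:
    """Return the backups to remove, newest first, so that the kept set
    is the longest run of newest backups satisfying both limits."""
    newest_first = sorted(backups, key=lambda b: b.created_at, reverse=True)
    kept_count = 0
    kept_bytes = 0
    to_delete: list[BackupInfo] = []
    for backup in newest_first:
        within_limits = (
            kept_count < limits.max_count
            and kept_bytes + backup.size_bytes <= limits.max_total_bytes
        )
        # Always keep the most recent backup; once one backup is dropped,
        # every older backup is dropped as well.
        if not to_delete and (kept_count == 0 or within_limits):
            kept_count += 1
            kept_bytes += backup.size_bytes
        else:
            to_delete.append(backup)
    return to_delete
```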

Integration

  • The backup system is fully integrated with the training workflow and data storage.
  • All model checkpoints, training data, and metadata are included in the backup if they reside in the configured resource/data directories.
  • Automatic backup is triggered at key points in the training lifecycle.

Usage

  • Manual backup: Trigger via API or service call.
  • Automatic backup: Enabled by default, with configurable interval and retention.
  • Restore: Use the API to restore any backup by ID.
  • Delete: Use the API to delete old or unwanted backups.

Files Added/Modified

  • lpm_kernel/backup/backup_service.py
  • lpm_kernel/backup/backup_integrity.py
  • lpm_kernel/backup/auto_backup.py
  • lpm_kernel/backup/__init__.py
  • lpm_kernel/api/domains/backup/routes.py
  • (Plus config and logging integration)

PStarH added 7 commits April 24, 2025 12:00
Add API Authentication and improve security
Optimized backup and recovery processes with enhanced progress reporting and detailed error handling. Automatic backups are now more robust with improved monitoring, logging, and failure recovery. Backup status tracking is also refined to provide more detailed statistics and handle partial successes.
Optimized backup retries with smart exponential backoff and random jitter for efficiency.
Improved file compression with configurable levels (1-9), chunked large file processing, and compression ratio display.
Enhanced backup cleanup strategy with dual limits (quantity and size) and dynamic management.
Optimized log output to include more detailed backup status like compression ratio and size.
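
The commit notes above mention configurable compression levels (1-9), chunked processing of large files, and a reported compression ratio. A minimal sketch of that idea using the standard-library `zlib` streaming objects follows. The function names and default chunk size are assumptions for illustration, not the PR's implementation.

```python
import zlib
from pathlib import Path


def compress_file(src: Path, dst: Path, level: int = 6, chunk_size: int = 1 << 20) -> float:
    """Stream-compress src into dst; return compressed size / original size."""
    if not 1 <= level <= 9:
        raise ValueError("level must be between 1 and 9")
    compressor = zlib.compressobj(level)
    original = compressed = 0
    with src.open("rb") as fin, dst.open("wb") as fout:
        for chunk in iter(lambda: fin.read(chunk_size), b""):
            original += len(chunk)
            out = compressor.compress(chunk)  # may be empty while zlib buffers
            compressed += len(out)
            fout.write(out)
        tail = compressor.flush()  # emit whatever is still buffered
        compressed += len(tail)
        fout.write(tail)
    return compressed / original if original else 1.0


def decompress_file(src: Path, dst: Path, chunk_size: int = 1 << 20) -> None:
    """Inverse of compress_file, also processed chunk by chunk."""
    decompressor = zlib.decompressobj()
    with src.open("rb") as fin, dst.open("wb") as fout:
        for chunk in iter(lambda: fin.read(chunk_size), b""):
            fout.write(decompressor.decompress(chunk))
        fout.write(decompressor.flush())
```

The returned ratio can be logged directly, for example as `f"{ratio:.1%} of original size"`.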
@kevin-mindverse kevin-mindverse changed the base branch from master to develop April 24, 2025 08:47
@kevin-mindverse

Hey there. Thank you for your contribution!

According to the announcement published yesterday, I've already redirected this PR to develop. There's also a small conflict to resolve :)

PStarH commented Apr 24, 2025

The conflict is resolved now, I think.

PStarH commented May 23, 2025

The conflict is resolved now, I think.
