Performance: Redundant Hash Calculations #25

@kitzy

Performance Inefficiency

Location: Lines 743-750 in FleetImporter/FleetImporter.py

Issue: When the package was not uploaded during the current run, the code downloads the entire file from S3 just to calculate its SHA256 hash, which is wasteful for large packages.

if package_was_uploaded:
    hash_sha256 = self._calculate_file_sha256(pkg_path)
else:
    hash_sha256 = self._calculate_s3_file_sha256(aws_s3_bucket, s3_key)

Impact:

  • Unnecessary network bandwidth usage
  • Slower execution for large packages
  • Increased AWS data transfer costs

Current Implementation:
The _calculate_s3_file_sha256 method downloads the entire file just to hash it.

Recommended Fix:

  1. Store SHA256 hash as S3 object metadata during upload
  2. Retrieve hash from metadata instead of downloading file
  3. Optionally use the S3 ETag for verification (note that for multipart uploads the ETag is not a plain MD5 of the object, so it cannot be compared directly)

# During upload
s3_client.put_object(
    Bucket=bucket,
    Key=s3_key,
    Body=file_data,
    Metadata={
        'sha256': calculated_hash
    }
)

# During retrieval
head_response = s3_client.head_object(Bucket=bucket, Key=s3_key)
hash_sha256 = head_response['Metadata'].get('sha256')
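
Objects that were uploaded before the metadata is introduced will not have a `sha256` entry, so `.get('sha256')` would return `None` for them. A minimal sketch of a lookup with a fallback is shown here. The function name `get_s3_object_sha256` and the `chunk_size` parameter are hypothetical and not part of FleetImporter; `s3_client` is assumed to be a boto3 S3 client (or anything exposing `head_object` and `get_object` the same way).

```python
import hashlib


def get_s3_object_sha256(s3_client, bucket, s3_key, chunk_size=1024 * 1024):
    """Return the SHA256 hex digest of an S3 object.

    Hypothetical helper: prefers the 'sha256' user metadata entry and only
    downloads the object when that entry is missing.
    """
    # HEAD request: returns headers and user metadata, not the object body
    head_response = s3_client.head_object(Bucket=bucket, Key=s3_key)
    stored_hash = head_response.get("Metadata", {}).get("sha256")
    if stored_hash:
        return stored_hash

    # Fallback for objects without the metadata entry: stream the body in
    # chunks so the whole file is never held in memory at once
    response = s3_client.get_object(Bucket=bucket, Key=s3_key)
    body = response["Body"]
    digest = hashlib.sha256()
    for chunk in iter(lambda: body.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()
```

With this shape, the `else` branch in the current code could call the helper instead of `_calculate_s3_file_sha256`, and the full download would only happen for objects that predate the metadata.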

Priority: Low-Medium - Optimization opportunity

Labels: enhancement, good first issue, performance