This service is a component of the ScanFlow system, responsible for scanning files using ClamAV. It is designed to run as a Google Cloud Run Job or a standalone container.
- ClamAV Integration: Scans files for viruses and malware.
- Google Cloud Storage: Downloads files from a landing bucket and uploads clean files to a clean bucket.
- Pub/Sub Notification: Publishes messages to a Pub/Sub topic when an infected file is detected.
- Cloud Run Job Support: Can be triggered via Cloud Run Jobs for serverless execution.
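The scan-and-route flow described by these features can be sketched as follows. This is a minimal sketch, not the code in scanner.py. It assumes files are scanned with the `clamscan` command-line tool, whose documented exit codes are 0 (no virus found), 1 (virus found), and 2 (an error occurred). The GCS move and the Pub/Sub publish are passed in as callables so the routing logic runs without cloud credentials. The names `ScanVerdict`, `interpret_clamscan_exit`, `route_scanned_file`, `move_to_clean`, and `publish_infected` are hypothetical.

```python
from enum import Enum
from typing import Callable


class ScanVerdict(Enum):
    CLEAN = "clean"
    INFECTED = "infected"
    ERROR = "error"


def interpret_clamscan_exit(returncode: int) -> ScanVerdict:
    """Map a clamscan exit code to a verdict (0 = clean, 1 = infected, other = error)."""
    if returncode == 0:
        return ScanVerdict.CLEAN
    if returncode == 1:
        return ScanVerdict.INFECTED
    return ScanVerdict.ERROR


def route_scanned_file(
    object_name: str,
    returncode: int,
    move_to_clean: Callable[[str], None],      # hypothetical: copies the object to CLEAN_BUCKET
    publish_infected: Callable[[str], None],   # hypothetical: publishes to INFECTED_TOPIC
) -> ScanVerdict:
    """Route one scanned object according to the scanner's exit code.

    Clean files go to the clean bucket; infected files trigger a notification.
    A scan error raises, so an unscanned file is never treated as clean.
    """
    verdict = interpret_clamscan_exit(returncode)
    if verdict is ScanVerdict.CLEAN:
        move_to_clean(object_name)
    elif verdict is ScanVerdict.INFECTED:
        publish_infected(object_name)
    else:
        raise RuntimeError(f"clamscan failed on {object_name!r} (exit code {returncode})")
    return verdict


# Example with in-memory stand-ins for the GCS and Pub/Sub calls.
moved, alerts = [], []
route_scanned_file("report.pdf", 0, moved.append, alerts.append)
route_scanned_file("eicar.com", 1, moved.append, alerts.append)
print(moved, alerts)  # ['report.pdf'] ['eicar.com']
```

In a real job the exit code could come from something like `subprocess.run(["clamscan", "--no-summary", local_path]).returncode`. Treating any unexpected code as an error, rather than as clean, is the conservative choice for a malware gate.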
- scanner.py: Main logic for scanning files.
- trigger_service/: A Flask service that triggers the scanner job (e.g., from Pub/Sub push subscriptions).
- Dockerfile: Container definition, including ClamAV and the Python dependencies.
- entrypoint.sh: Entrypoint script that handles ClamAV database updates and runs the scanner.
- utils.py: Helper functions for GCS operations.
- clamav_updater.py: Script to update ClamAV definitions.
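trigger_service/ receives Pub/Sub push requests. A push delivery is an HTTP POST whose JSON body wraps the message as `{"message": {"data": <base64>, "attributes": {...}, "messageId": ...}, "subscription": ...}`. If the topic is fed by Cloud Storage object notifications (an assumption; this README does not say what publishes to it), the bucket and object names arrive in the `bucketId` and `objectId` attributes, and `eventType` is `OBJECT_FINALIZE` for a newly written object. The stdlib-only sketch below extracts those fields; `ScanRequest` and `parse_push_request` are hypothetical names, and the real service may read the payload differently.

```python
import base64
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScanRequest:
    """Hypothetical container for the object the scanner job should process."""
    bucket: str
    object_name: str


def parse_push_request(body: bytes) -> Optional[ScanRequest]:
    """Extract the uploaded object from a Pub/Sub push request body.

    Assumes the message is a Cloud Storage object notification. Returns None
    for event types other than OBJECT_FINALIZE (e.g. deletes), so only newly
    written objects are scanned.
    """
    envelope = json.loads(body)
    message = envelope.get("message")
    if not isinstance(message, dict):
        raise ValueError("not a Pub/Sub push envelope: missing 'message'")
    attributes = message.get("attributes") or {}
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return None
    bucket = attributes.get("bucketId")
    object_name = attributes.get("objectId")
    if not bucket or not object_name:
        raise ValueError("notification is missing bucketId or objectId")
    return ScanRequest(bucket=bucket, object_name=object_name)


# Example: a push body shaped like the one Pub/Sub POSTs (values are made up).
sample = {
    "message": {
        "data": base64.b64encode(b"{}").decode("ascii"),
        "attributes": {
            "eventType": "OBJECT_FINALIZE",
            "bucketId": "landing-bucket",
            "objectId": "uploads/report.pdf",
        },
        "messageId": "1",
    },
    "subscription": "projects/my-project/subscriptions/scan-trigger",
}
print(parse_push_request(json.dumps(sample).encode("utf-8")))
# ScanRequest(bucket='landing-bucket', object_name='uploads/report.pdf')
```

How the trigger then hands this object to the Cloud Run Job is not described here, so the sketch stops at parsing.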
- Docker
- Google Cloud SDK (for deploying to GCP)
The following environment variables are required:
- LANDING_BUCKET: Name of the GCS bucket where files are uploaded.
- CLEAN_BUCKET: Name of the GCS bucket where clean files are moved.
- INFECTED_TOPIC: Pub/Sub topic for infected file notifications.
- PROJECT_ID: GCP project ID.
- REGION: GCP region (for Cloud Run).
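Because these variables are required, a scanner can check them once at startup and fail fast instead of erroring midway through a scan. The sketch below is an illustration, not the project's code: `ScannerConfig`, `load_config`, and `topic_path` are hypothetical names, and it assumes INFECTED_TOPIC may hold either a short topic ID or a full `projects/<project>/topics/<topic>` path, which is the form Pub/Sub uses for fully qualified topic names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Variable names from the list above; all are treated as required.
REQUIRED_VARS = ("LANDING_BUCKET", "CLEAN_BUCKET", "INFECTED_TOPIC", "PROJECT_ID", "REGION")


@dataclass(frozen=True)
class ScannerConfig:
    landing_bucket: str
    clean_bucket: str
    infected_topic: str
    project_id: str
    region: str

    @property
    def topic_path(self) -> str:
        """Fully qualified topic name (assumes INFECTED_TOPIC is an ID or a full path)."""
        if self.infected_topic.startswith("projects/"):
            return self.infected_topic
        return f"projects/{self.project_id}/topics/{self.infected_topic}"


def load_config(environ: Optional[Mapping[str, str]] = None) -> ScannerConfig:
    """Read and validate the required variables, reporting every missing one at once."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return ScannerConfig(
        landing_bucket=env["LANDING_BUCKET"],
        clean_bucket=env["CLEAN_BUCKET"],
        infected_topic=env["INFECTED_TOPIC"],
        project_id=env["PROJECT_ID"],
        region=env["REGION"],
    )


# Example with an explicit mapping instead of the process environment.
cfg = load_config({
    "LANDING_BUCKET": "landing",
    "CLEAN_BUCKET": "clean",
    "INFECTED_TOPIC": "infected-files",
    "PROJECT_ID": "my-project",
    "REGION": "us-central1",
})
print(cfg.topic_path)  # projects/my-project/topics/infected-files
```

Accepting an optional mapping keeps the function testable without mutating `os.environ`.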
- Build the Docker image: `docker build -t scanner .`
- Run the container, making sure Google Cloud credentials are mounted or otherwise available inside it: `docker run -e LANDING_BUCKET=... -e CLEAN_BUCKET=... -e INFECTED_TOPIC=... -e PROJECT_ID=... scanner`