Commit ab01b02

feat: implement Python storage cleaner with Appwrite integration (#338)
* feat: implement Python storage cleaner with Appwrite integration
* feat: update Appwrite service and improve error handling in storage cleaner
* fix: improve error handling for failed file deletions and handle missing API key gracefully
* feat: Implement concurrent file deletion, improve error handling for missing environment variables, and standardize JSON responses.
* feat: Introduce local development utilities, including a storage population script and a local runner for the storage cleaner.
1 parent 512c5db commit ab01b02

File tree

6 files changed: 356 additions, 0 deletions

python/storage-cleaner/
├── .gitignore
├── README.md
├── requirements.txt
└── src/
    ├── appwrite_service.py
    ├── main.py
    └── utils.py


python/storage-cleaner/.gitignore

Lines changed: 163 additions & 0 deletions
```gitignore
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

# Directory used by Appwrite CLI for local development
.appwrite
```

python/storage-cleaner/README.md

Lines changed: 45 additions & 0 deletions
```markdown
# 🧹 Python Storage Cleaner Function

Storage cleaner function to remove all files older than X number of days from the specified bucket.

## 🧰 Usage

### GET /

Remove files older than X days from the specified bucket

**Response**

Sample `200` Response: Buckets cleaned

## ⚙️ Configuration

| Setting           | Value                             |
| ----------------- | --------------------------------- |
| Runtime           | Python (3.9)                      |
| Entrypoint        | `src/main.py`                     |
| Build Commands    | `pip install -r requirements.txt` |
| Permissions       | `any`                             |
| CRON              | `0 1 * * *`                       |
| Timeout (Seconds) | 15                                |

## 🔒 Environment Variables

### RETENTION_PERIOD_DAYS

The number of days you want to retain a file.

| Question     | Answer |
| ------------ | ------ |
| Required     | Yes    |
| Sample Value | `1`    |

### APPWRITE_BUCKET_ID

The ID of the bucket from which the files are to be deleted.

| Question     | Answer         |
| ------------ | -------------- |
| Required     | Yes            |
| Sample Value | `652d...b4daf` |
```
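For a quick check outside the daily CRON schedule, the cleaner can also be triggered on demand through the Appwrite SDK, matching the `GET /` usage described in the README above. A minimal sketch, assuming an already-deployed function; the endpoint, project, key, and function ID are all placeholders and are not part of this commit:

```python
from appwrite.client import Client
from appwrite.services.functions import Functions

# Placeholder endpoint, project, and API key; substitute your own values.
client = (
    Client()
    .set_endpoint("https://cloud.appwrite.io/v1")
    .set_project("<PROJECT_ID>")
    .set_key("<API_KEY>")
)

# Execute the deployed storage-cleaner function once and print the execution result.
execution = Functions(client).create_execution("<FUNCTION_ID>")
print(execution)
```
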
python/storage-cleaner/requirements.txt

Lines changed: 1 addition & 0 deletions

```text
appwrite
```

python/storage-cleaner/src/appwrite_service.py

Lines changed: 78 additions & 0 deletions

```python
"""
Appwrite service module to handle storage cleanup operations.
"""

import os
from appwrite.client import Client
from appwrite.services.storage import Storage
from appwrite.query import Query
from .utils import get_expiry_date
from concurrent.futures import ThreadPoolExecutor, as_completed


class AppwriteService:
    """
    Service class to interact with Appwrite's storage service.
    """

    def __init__(self, api_key: str):
        client = (
            Client()
            .set_endpoint(os.getenv("APPWRITE_FUNCTION_API_ENDPOINT"))
            .set_project(os.getenv("APPWRITE_FUNCTION_PROJECT_ID"))
            .set_key(api_key)
        )
        self.storage = Storage(client)

    def clean_bucket(self, bucket_id: str):
        """
        Clean up files from the storage bucket by removing files older than a
        specified retention period.

        :param bucket_id: The ID of the storage bucket to clean.
        """
        queries = [
            Query.less_than("$createdAt", get_expiry_date()),
            Query.limit(25),
        ]

        deleted_files_count = 0
        failed_files = []

        while True:
            try:
                response = self.storage.list_files(bucket_id, queries)
            except Exception as e:
                raise RuntimeError(
                    f"Failed to list files from bucket {bucket_id}: {str(e)}"
                ) from e
            files = response.get("files", [])

            if not files:
                break

            batch_failed = False
            with ThreadPoolExecutor() as executor:
                future_to_file = {
                    executor.submit(self.storage.delete_file, bucket_id, f.get("$id")): f
                    for f in files
                    if f.get("$id")
                }

                for future in as_completed(future_to_file):
                    file_info = future_to_file[future]
                    file_id = file_info.get("$id")
                    try:
                        future.result()
                        deleted_files_count += 1
                    except Exception as e:
                        failed_files.append({"id": file_id, "error": str(e)})
                        batch_failed = True

            if batch_failed:
                break

        if failed_files:
            raise RuntimeError(
                f"Deleted {deleted_files_count} files, but failed to delete {len(failed_files)} files: {failed_files}"
            )
```
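The constructor resolves its endpoint and project from the variables that Appwrite injects into the function runtime, so the service can also be driven outside that runtime by exporting them manually. A minimal local sketch under that assumption; the endpoint, project, key, and bucket values are placeholders, and this is only an illustration rather than the local runner mentioned in the commit message:

```python
import os

from src.appwrite_service import AppwriteService  # assumes running from python/storage-cleaner

# Variables normally injected by the Appwrite function runtime; set them by hand
# for a local run. All values below are placeholders.
os.environ["APPWRITE_FUNCTION_API_ENDPOINT"] = "https://cloud.appwrite.io/v1"
os.environ["APPWRITE_FUNCTION_PROJECT_ID"] = "<PROJECT_ID>"
os.environ["RETENTION_PERIOD_DAYS"] = "30"

service = AppwriteService(api_key="<API_KEY>")  # key needs storage read/write scopes
service.clean_bucket("<BUCKET_ID>")
```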

python/storage-cleaner/src/main.py

Lines changed: 28 additions & 0 deletions
```python
import os
from .appwrite_service import AppwriteService
from .utils import throw_if_missing


def main(context):
    try:
        throw_if_missing(os.environ, ["RETENTION_PERIOD_DAYS", "APPWRITE_BUCKET_ID"])
    except ValueError as e:
        return context.res.json({"error": str(e)}, 500)

    api_key = context.req.headers.get("x-appwrite-key")

    if not api_key:
        return context.res.json(
            {"error": "Missing API key in x-appwrite-key header"}, 401
        )

    appwrite = AppwriteService(api_key)

    try:
        appwrite.clean_bucket(os.environ["APPWRITE_BUCKET_ID"])
        return context.res.json({"message": "Buckets cleaned"}, 200)
    except ValueError as e:
        return context.res.json({"error": str(e)}, 400)
    except Exception as e:
        print(f"Error cleaning bucket: {e}")
        return context.res.json({"error": "Failed to clean bucket"}, 500)
```
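The handler only relies on `context.req.headers` and `context.res.json`, so it can be exercised without the Appwrite runtime by handing it a small stand-in object. A rough sketch under that assumption; the mock context and header value below are illustrative only, and the environment variables from the README still need to be exported first:

```python
from types import SimpleNamespace

from src.main import main  # assumes running from python/storage-cleaner with env vars set


class MockResponse:
    """Mimics the subset of context.res that main() uses: json(body, status)."""

    def json(self, body, status_code=200):
        return {"body": body, "statusCode": status_code}


# Only the attributes main() actually reads are stubbed out here.
context = SimpleNamespace(
    req=SimpleNamespace(headers={"x-appwrite-key": "<API_KEY>"}),
    res=MockResponse(),
)

print(main(context))  # e.g. {'body': {'message': 'Buckets cleaned'}, 'statusCode': 200}
```
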
python/storage-cleaner/src/utils.py

Lines changed: 41 additions & 0 deletions

```python
"""
Utility functions for storage cleaner.
Includes functions for calculating expiry dates and validating required fields.
"""

import os
from datetime import datetime, timedelta, timezone


def get_expiry_date():
    """
    Returns a date subtracted by the retention period from the current date.
    The retention period is fetched from the RETENTION_PERIOD_DAYS environment variable.
    Defaults to 30 days if the environment variable is not set or invalid.

    :return: The calculated expiry date in ISO 8601 format.
    """
    try:
        retention_period = int(os.getenv("RETENTION_PERIOD_DAYS", "30"))
    except ValueError:
        retention_period = 30

    expiry_date = datetime.now(timezone.utc) - timedelta(days=retention_period)
    # An aware UTC datetime's isoformat() already ends in "+00:00"; swap it for "Z"
    # instead of appending a second UTC designator.
    return expiry_date.isoformat().replace("+00:00", "Z")


def throw_if_missing(obj, keys):
    """
    Throws an error if any of the keys are missing from the dictionary or None/0.

    :param obj: Dictionary to check
    :param keys: List of required keys
    :raises ValueError: If required keys are missing
    """
    missing = []
    for key in keys:
        # Disallow 0 retention to prevent immediate deletion of objects, which can cause data loss.
        if key not in obj or obj[key] is None or obj[key] == 0:
            missing.append(key)

    if missing:
        raise ValueError(f"Missing required fields: {', '.join(missing)}")
```
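Taken together, the expiry cutoff is simply the current UTC time minus the retention period, and missing configuration surfaces as one consolidated ValueError. A small illustrative check, where the printed values are examples rather than captured output:

```python
import os

from src.utils import get_expiry_date, throw_if_missing

os.environ["RETENTION_PERIOD_DAYS"] = "7"
print(get_expiry_date())  # e.g. 2024-05-01T12:00:00.000000Z (seven days before now, UTC)

try:
    # APPWRITE_BUCKET_ID is present; RETENTION_PERIOD_DAYS is deliberately omitted.
    throw_if_missing({"APPWRITE_BUCKET_ID": "652d...b4daf"}, ["RETENTION_PERIOD_DAYS", "APPWRITE_BUCKET_ID"])
except ValueError as e:
    print(e)  # Missing required fields: RETENTION_PERIOD_DAYS
```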
