-
Notifications
You must be signed in to change notification settings - Fork 1
Open
Labels
enhancementNew feature or requestNew feature or request
Description
Part of Epic #48
User Story
As a developer
I want to create a Lambda function that downloads and processes student data from SFTP
So that the system can import data automatically
Technical Requirements
Lambda Function Setup
Create CDK construct in infra/lib/student-data-import-stack.ts:
Configuration:
- Runtime: Node.js 20.x
- Memory: 1024 MB (adjust based on file size)
- Timeout: 10 minutes (600 seconds)
- Environment Variables:
SECRET_NAME: jockular-kangaroo/sftp-credentialsDATABASE_SECRET_ARNDATABASE_ARNLOG_LEVEL: info
IAM Permissions:
secretsmanager:GetSecretValuefor SFTP credentialsrds-data:ExecuteStatementfor database accessrds-data:BatchExecuteStatementfor batch inserts- CloudWatch Logs write permissions
Lambda Handler Code
Create infra/student-data-import/lambda/index.ts:
import { RDSDataClient, ExecuteStatementCommand } from "@aws-sdk/client-rds-data"
import { SecretsManagerClient, GetSecretValueCommand } from "@aws-sdk/client-secrets-manager"
import Client from "ssh2-sftp-client"
import { parse } from "csv-parse/sync"
import { z } from "zod"
interface ImportEvent {
jobId: number
dryRun?: boolean
triggeredBy: 'manual' | 'scheduled'
userId?: number
}
export async function handler(event: ImportEvent) {
// 1. Fetch SFTP credentials from Secrets Manager
// 2. Connect to SFTP server
// 3. Download file to /tmp
// 4. Parse CSV file
// 5. Validate each row with Zod
// 6. Batch upsert to database
// 7. Update job status in jobs table
// 8. Return results
}Dependencies (package.json)
{
"dependencies": {
"@aws-sdk/client-rds-data": "^3.893.0",
"@aws-sdk/client-secrets-manager": "^3.893.0",
"ssh2-sftp-client": "^12.0.1",
"csv-parse": "^5.5.6",
"zod": "^4.1.11"
}
}CSV Parser
Create infra/student-data-import/lambda/parser.ts:
import { parse } from "csv-parse/sync"
import { StudentDataRow, StudentDataRowSchema } from "./schemas"
export function parseStudentDataCsv(fileBuffer: Buffer): StudentDataRow[]
export function validateRow(row: unknown): StudentDataRow | { error: string }Data Validation Schema
Create infra/student-data-import/lambda/schemas.ts:
import { z } from "zod"
export const StudentDataRowSchema = z.object({
studentId: z.string(),
ssid: z.string(),
firstName: z.string(),
lastName: z.string(),
preferredName: z.string().optional(),
absencesYtd: z.coerce.number().int().min(0).nullable(),
tardiesYtd: z.coerce.number().int().min(0).nullable(),
// ... all 56 fields with validation
})
export type StudentDataRow = z.infer<typeof StudentDataRowSchema>Database Upsert Logic
Create infra/student-data-import/lambda/db.ts:
import { RDSDataClient, ExecuteStatementCommand, BatchExecuteStatementCommand } from "@aws-sdk/client-rds-data"
export async function upsertStudentData(
client: RDSDataClient,
rows: StudentDataRow[],
jobId: number
): Promise<{ inserted: number; updated: number; errors: string[] }>Use PostgreSQL INSERT ... ON CONFLICT (student_id) DO UPDATE SET ...
Structured Logging
Create infra/student-data-import/lambda/logger.ts:
interface LogContext {
jobId: number
phase: string
[key: string]: unknown
}
export function createLogger(jobId: number)
export function logInfo(context: LogContext, message: string)
export function logError(context: LogContext, message: string, error?: Error)Acceptance Criteria
- CDK construct created for Lambda function
- Lambda handler fetches SFTP credentials from Secrets Manager
- Lambda connects to SFTP server using ssh2-sftp-client
- Lambda downloads file to /tmp directory
- CSV parser handles headers (case-insensitive matching)
- Zod schema validates all 56 data fields
- Database upsert uses ON CONFLICT to update existing records
- Batch operations for performance (100 rows per batch)
- Job status updated in jobs table (pending → running → completed/failed)
- Structured logging to CloudWatch
- Error handling with detailed messages
- Unit tests for parser and validation
Testing
- Test with sample CSV file (10 rows)
- Test with large file (10,000 rows)
- Test with invalid data (validation errors)
- Test SFTP connection failures
- Test database transaction rollback on errors
- Verify CloudWatch logs are structured and searchable
Labels: enhancement
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or request