Infrastructure: Lambda function for SFTP import processing #52

Description

Part of Epic #48

User Story

As a developer
I want to create a Lambda function that downloads and processes student data from SFTP
So that student data can be imported automatically on a schedule or on demand

Technical Requirements

Lambda Function Setup

Create a CDK construct in infra/lib/student-data-import-stack.ts (a sketch follows the IAM list below):

Configuration:

  • Runtime: Node.js 20.x
  • Memory: 1024 MB (adjust based on file size)
  • Timeout: 10 minutes (600 seconds)
  • Environment Variables:
    • SECRET_NAME: jockular-kangaroo/sftp-credentials
    • DATABASE_SECRET_ARN
    • DATABASE_ARN
    • LOG_LEVEL: info

IAM Permissions:

  • secretsmanager:GetSecretValue for SFTP credentials
  • rds-data:ExecuteStatement for database access
  • rds-data:BatchExecuteStatement for batch inserts
  • CloudWatch Logs write permissions
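
A possible shape for the construct, covering the configuration and permissions above. This is a minimal sketch: the class and prop names, and the idea of passing the cluster and secret ARNs in as props, are assumptions rather than requirements.

import * as cdk from "aws-cdk-lib"
import { Construct } from "constructs"
import { NodejsFunction } from "aws-cdk-lib/aws-lambda-nodejs"
import { Runtime } from "aws-cdk-lib/aws-lambda"
import * as iam from "aws-cdk-lib/aws-iam"

// Assumption: the stack receives the Aurora cluster ARN and the two secret ARNs.
export interface StudentDataImportStackProps extends cdk.StackProps {
  databaseArn: string
  databaseSecretArn: string
  sftpSecretArn: string // ARN of jockular-kangaroo/sftp-credentials
}

export class StudentDataImportStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props: StudentDataImportStackProps) {
    super(scope, id, props)

    const importFn = new NodejsFunction(this, "StudentDataImportFn", {
      entry: "infra/student-data-import/lambda/index.ts",
      runtime: Runtime.NODEJS_20_X,
      memorySize: 1024,
      timeout: cdk.Duration.minutes(10),
      environment: {
        SECRET_NAME: "jockular-kangaroo/sftp-credentials",
        DATABASE_SECRET_ARN: props.databaseSecretArn,
        DATABASE_ARN: props.databaseArn,
        LOG_LEVEL: "info",
      },
    })

    // Secrets Manager: read the SFTP credentials and the DB credentials secret
    // (the Data API also needs GetSecretValue on the database secret).
    importFn.addToRolePolicy(new iam.PolicyStatement({
      actions: ["secretsmanager:GetSecretValue"],
      resources: [props.sftpSecretArn, props.databaseSecretArn],
    }))

    // RDS Data API: single and batched statements
    importFn.addToRolePolicy(new iam.PolicyStatement({
      actions: ["rds-data:ExecuteStatement", "rds-data:BatchExecuteStatement"],
      resources: [props.databaseArn],
    }))

    // CloudWatch Logs write permissions come with the default Lambda execution role.
  }
}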

Lambda Handler Code

Create infra/student-data-import/lambda/index.ts:

import { RDSDataClient, ExecuteStatementCommand } from "@aws-sdk/client-rds-data"
import { SecretsManagerClient, GetSecretValueCommand } from "@aws-sdk/client-secrets-manager"
import Client from "ssh2-sftp-client"
import { parse } from "csv-parse/sync"
import { z } from "zod"

interface ImportEvent {
  jobId: number
  dryRun?: boolean
  triggeredBy: 'manual' | 'scheduled'
  userId?: number
}

export async function handler(event: ImportEvent) {
  // 1. Fetch SFTP credentials from Secrets Manager
  // 2. Connect to SFTP server
  // 3. Download file to /tmp
  // 4. Parse CSV file
  // 5. Validate each row with Zod
  // 6. Batch upsert to database
  // 7. Update job status in jobs table
  // 8. Return results
}
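
A hedged sketch of steps 1–3, reusing the imports above. The secret's field names (host, port, username, password) and the remote file path are assumptions; the actual secret layout is defined elsewhere in the epic.

import { readFileSync } from "node:fs"

const secrets = new SecretsManagerClient({})

async function downloadStudentFile(): Promise<Buffer> {
  // 1. Fetch SFTP credentials from Secrets Manager
  const secret = await secrets.send(
    new GetSecretValueCommand({ SecretId: process.env.SECRET_NAME! })
  )
  const creds = JSON.parse(secret.SecretString ?? "{}")

  // 2. Connect to the SFTP server
  const sftp = new Client()
  await sftp.connect({
    host: creds.host, // assumption: secret field names
    port: creds.port ?? 22,
    username: creds.username,
    password: creds.password,
  })

  try {
    // 3. Download to /tmp, the only writable path in Lambda
    await sftp.fastGet("/exports/student-data.csv", "/tmp/student-data.csv") // assumption: remote path
    return readFileSync("/tmp/student-data.csv")
  } finally {
    await sftp.end()
  }
}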

Dependencies (package.json)

{
  "dependencies": {
    "@aws-sdk/client-rds-data": "^3.893.0",
    "@aws-sdk/client-secrets-manager": "^3.893.0",
    "ssh2-sftp-client": "^12.0.1",
    "csv-parse": "^5.5.6",
    "zod": "^4.1.11"
  }
}

CSV Parser

Create infra/student-data-import/lambda/parser.ts:

import { parse } from "csv-parse/sync"
import { StudentDataRow, StudentDataRowSchema } from "./schemas"

export function parseStudentDataCsv(fileBuffer: Buffer): StudentDataRow[]
export function validateRow(row: unknown): StudentDataRow | { error: string }
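
A possible shape for these two functions, using csv-parse's columns callback for the case-insensitive header matching. The header map below is illustrative, not the full 56-column mapping, and the throw-on-first-error policy is one choice the signature leaves open (collecting errors per row is another).

// Assumption: headers are normalized to camelCase keys via a lookup table.
const HEADER_MAP: Record<string, string> = {
  "student id": "studentId",
  "ssid": "ssid",
  "first name": "firstName",
  "last name": "lastName",
  "absences ytd": "absencesYtd",
  "tardies ytd": "tardiesYtd",
  // ... remaining columns map the same way
}

export function parseStudentDataCsv(fileBuffer: Buffer): StudentDataRow[] {
  const records = parse(fileBuffer, {
    // Lowercase each header before lookup, so matching is case-insensitive.
    columns: (header: string[]) =>
      header.map((h) => HEADER_MAP[h.trim().toLowerCase()] ?? h),
    skip_empty_lines: true,
    trim: true,
  })
  return records.map((r: unknown, i: number) => {
    const result = validateRow(r)
    if ("error" in result) throw new Error(`Row ${i + 1}: ${result.error}`)
    return result
  })
}

export function validateRow(row: unknown): StudentDataRow | { error: string } {
  const parsed = StudentDataRowSchema.safeParse(row)
  return parsed.success
    ? parsed.data
    : { error: parsed.error.issues.map((i) => i.message).join("; ") }
}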

Data Validation Schema

Create infra/student-data-import/lambda/schemas.ts:

import { z } from "zod"

export const StudentDataRowSchema = z.object({
  studentId: z.string(),
  ssid: z.string(),
  firstName: z.string(),
  lastName: z.string(),
  preferredName: z.string().optional(),
  absencesYtd: z.coerce.number().int().min(0).nullable(),
  tardiesYtd: z.coerce.number().int().min(0).nullable(),
  // ... all 56 fields with validation
})

export type StudentDataRow = z.infer<typeof StudentDataRowSchema>
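
Because numeric columns use z.coerce, string cells from the CSV pass validation. An illustrative check against the abbreviated schema above (the field values are made up; the full 56-field schema would require more keys):

// Illustrative only: CSV cells arrive as strings; z.coerce turns "3" into 3.
const result = StudentDataRowSchema.safeParse({
  studentId: "12345",
  ssid: "WA-0001",
  firstName: "Ada",
  lastName: "Lovelace",
  absencesYtd: "3", // coerced to 3, then checked against .int().min(0)
  tardiesYtd: null, // allowed by .nullable()
})
if (!result.success) console.error(result.error.issues)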

Database Upsert Logic

Create infra/student-data-import/lambda/db.ts:

import { RDSDataClient, ExecuteStatementCommand, BatchExecuteStatementCommand } from "@aws-sdk/client-rds-data"

export async function upsertStudentData(
  client: RDSDataClient,
  rows: StudentDataRow[],
  jobId: number
): Promise<{ inserted: number; updated: number; errors: string[] }>

Use PostgreSQL's INSERT ... ON CONFLICT (student_id) DO UPDATE SET ... so that re-imported rows update existing records instead of creating duplicates; a sketch follows.
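
Continuing db.ts, a sketch of the batching and the ON CONFLICT statement. The student_data table name and columns are assumptions, and only a handful of the 56 columns are shown.

import { StudentDataRow } from "./schemas"

const BATCH_SIZE = 100 // per the acceptance criteria

export async function upsertStudentData(
  client: RDSDataClient,
  rows: StudentDataRow[],
  jobId: number
): Promise<{ inserted: number; updated: number; errors: string[] }> {
  const errors: string[] = []
  for (let i = 0; i < rows.length; i += BATCH_SIZE) {
    const batch = rows.slice(i, i + BATCH_SIZE)
    try {
      await client.send(new BatchExecuteStatementCommand({
        resourceArn: process.env.DATABASE_ARN!,
        secretArn: process.env.DATABASE_SECRET_ARN!,
        sql: `INSERT INTO student_data (student_id, ssid, first_name, last_name, import_job_id)
              VALUES (:studentId, :ssid, :firstName, :lastName, :jobId)
              ON CONFLICT (student_id) DO UPDATE SET
                ssid = EXCLUDED.ssid,
                first_name = EXCLUDED.first_name,
                last_name = EXCLUDED.last_name,
                import_job_id = EXCLUDED.import_job_id`,
        // One parameter set per row in the batch.
        parameterSets: batch.map((row) => [
          { name: "studentId", value: { stringValue: row.studentId } },
          { name: "ssid", value: { stringValue: row.ssid } },
          { name: "firstName", value: { stringValue: row.firstName } },
          { name: "lastName", value: { stringValue: row.lastName } },
          { name: "jobId", value: { longValue: jobId } },
        ]),
      }))
    } catch (err) {
      errors.push(`Batch starting at row ${i}: ${(err as Error).message}`)
    }
  }
  // Note: BatchExecuteStatement does not report insert-vs-update per row;
  // splitting the counts would need e.g. RETURNING (xmax = 0) via ExecuteStatement.
  return { inserted: 0, updated: 0, errors }
}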

Structured Logging

Create infra/student-data-import/lambda/logger.ts:

interface LogContext {
  jobId: number
  phase: string
  [key: string]: unknown
}

export function createLogger(jobId: number)
export function logInfo(context: LogContext, message: string)
export function logError(context: LogContext, message: string, error?: Error)
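
A minimal sketch of these functions: one JSON object per console line, so CloudWatch Logs Insights can filter on jobId and phase. The exact field set and the LOG_LEVEL gating are assumptions.

const LEVELS = ["debug", "info", "warn", "error"]
const minLevel = Math.max(0, LEVELS.indexOf(process.env.LOG_LEVEL ?? "info"))

// Emit a single structured JSON document per log line.
function emit(level: string, context: LogContext, message: string, error?: Error) {
  if (LEVELS.indexOf(level) < minLevel) return
  console.log(JSON.stringify({
    level,
    message,
    timestamp: new Date().toISOString(),
    ...context,
    ...(error && { error: { name: error.name, message: error.message, stack: error.stack } }),
  }))
}

export function logInfo(context: LogContext, message: string) {
  emit("info", context, message)
}

export function logError(context: LogContext, message: string, error?: Error) {
  emit("error", context, message, error)
}

// Convenience wrapper that pins the jobId for the life of one import run.
export function createLogger(jobId: number) {
  return {
    info: (phase: string, message: string) => logInfo({ jobId, phase }, message),
    error: (phase: string, message: string, error?: Error) => logError({ jobId, phase }, message, error),
  }
}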

Acceptance Criteria

  • CDK construct created for Lambda function
  • Lambda handler fetches SFTP credentials from Secrets Manager
  • Lambda connects to SFTP server using ssh2-sftp-client
  • Lambda downloads file to /tmp directory
  • CSV parser handles headers (case-insensitive matching)
  • Zod schema validates all 56 data fields
  • Database upsert uses ON CONFLICT to update existing records
  • Batch operations for performance (100 rows per batch)
  • Job status updated in jobs table (pending → running → completed/failed)
  • Structured logging to CloudWatch
  • Error handling with detailed messages
  • Unit tests for parser and validation

Testing

  • Test with sample CSV file (10 rows)
  • Test with large file (10,000 rows)
  • Test with invalid data (validation errors)
  • Test SFTP connection failures
  • Test database transaction rollback on errors
  • Verify CloudWatch logs are structured and searchable
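
A starting point for the parser unit tests, assuming Vitest and the abbreviated header map sketched above; the fixture values are made up.

// infra/student-data-import/lambda/parser.test.ts — sketch only
import { describe, it, expect } from "vitest"
import { parseStudentDataCsv, validateRow } from "./parser"

describe("parseStudentDataCsv", () => {
  it("matches headers case-insensitively", () => {
    const csv = Buffer.from(
      "STUDENT ID,SSID,FIRST NAME,LAST NAME,ABSENCES YTD,TARDIES YTD\n" +
      "12345,WA-0001,Ada,Lovelace,3,0\n"
    )
    const rows = parseStudentDataCsv(csv)
    expect(rows[0].studentId).toBe("12345")
  })

  it("reports a validation error for bad numeric fields", () => {
    const result = validateRow({ studentId: "12345", absencesYtd: "not-a-number" })
    expect(result).toHaveProperty("error")
  })
})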

Labels: enhancement
