Skip to content

Epic: Automated Student Data Import System #48

@krishagel

Description

@krishagel

Product Requirements Document: Student Data Import System

1. Executive Summary

Problem Statement

The Jocular Kangaroo intervention tracking system currently lacks automated student demographic and assessment data. Staff must manually enter or update attendance records, behavioral incidents, i-Ready scores, and DIBELS assessments, creating data entry burden and increasing the risk of outdated or incorrect information that affects intervention decisions.

Proposed Solution

Build an automated student data import system that connects to an SFTP server to retrieve comprehensive student data files nightly. The system will provide administrators with manual import capabilities initially, with automatic nightly synchronization added subsequently. All imported data will overwrite existing records (no historical versioning) to maintain current assessment and attendance metrics.

Expected Outcomes

  • Efficiency: Eliminate manual data entry for 50+ student data fields
  • Accuracy: Ensure intervention decisions based on current, accurate data
  • Timeliness: Daily automatic updates keep all metrics current
  • Compliance: Maintain audit trails of all import operations

Timeline & Resources

  • Phase 1 (MVP): 3-4 weeks - Manual import capability
  • Phase 2: 1-2 weeks - Automated nightly imports
  • Engineers Required: 1 full-stack engineer, AWS/CDK experience preferred

2. User Personas & Stories

Primary Persona: District Administrator

  • Role: System administrator with full access
  • Jobs to be Done:
    • Configure SFTP connection settings securely
    • Manually trigger data imports to verify functionality
    • Monitor import job status and troubleshoot failures
    • Review import logs and audit trails
  • Pain Points:
    • No automated way to keep student data current
    • Manual entry of assessment scores is time-consuming and error-prone
    • Outdated attendance/behavior data affects intervention planning
  • Success Criteria:
    • Can configure SFTP settings in < 5 minutes
    • Manually triggered imports complete in < 2 minutes for typical datasets
    • Clear error messages when imports fail

User Stories

Story 1: SFTP Configuration

As a district administrator
I want to securely configure SFTP server credentials
So that the system can connect to our SIS data export location

Acceptance Criteria:

  • Given I am an administrator in the Settings section
  • When I enter SFTP host, port, username, and authentication credentials
  • Then credentials are encrypted and stored in AWS Secrets Manager
  • And I can test the connection before saving
  • And I receive clear error messages for connection failures

Story 2: Manual Data Import

As a district administrator
I want to manually trigger a student data import
So that I can verify the import process works correctly

Acceptance Criteria:

  • Given SFTP settings are configured
  • When I click "Import Student Data" button
  • Then the system fetches the file from SFTP server
  • And displays import progress (connecting, downloading, parsing, importing)
  • And shows success message with record count imported
  • And displays errors if the import fails with actionable guidance

Story 3: Student Data Table Structure

As a developer
I want to create a database table for student snapshot data
So that imported data has a structured storage location

Acceptance Criteria:

  • Given the schema definition for student_data_snapshots table
  • When a database migration is created and applied
  • Then the table contains all 56 data fields with appropriate types
  • And includes student_id as foreign key to students table
  • And includes import metadata (imported_at, import_job_id)
  • And supports upsert operations (replace existing data)

Story 4: File Parsing & Validation

As a developer
I want to parse and validate imported CSV/Excel files
So that only valid data is imported into the database

Acceptance Criteria:

  • Given a file downloaded from SFTP
  • When the parser processes the file
  • Then it validates required fields (Student ID, SSID)
  • And maps column headers to database fields (case-insensitive)
  • And validates data types (numbers, dates, enums)
  • And reports validation errors with row numbers
  • And supports both CSV and Excel formats

Story 5: Import Job Tracking

As a district administrator
I want to view import job history and status
So that I can monitor system reliability and troubleshoot issues

Acceptance Criteria:

  • Given import jobs have been executed
  • When I view the Import History page
  • Then I see a list of all import jobs with timestamps
  • And each job shows status (pending, running, completed, failed)
  • And I can view detailed logs for each job
  • And I can see record counts (total, inserted, updated, errors)

Story 6: Automated Nightly Import (Phase 2)

As a district administrator
I want to schedule automatic nightly imports
So that student data stays current without manual intervention

Acceptance Criteria:

  • Given SFTP settings are configured
  • When the scheduled time arrives (e.g., 2:00 AM daily)
  • Then the import runs automatically via EventBridge + Lambda
  • And sends email notifications on failure
  • And retries failed imports up to 3 times
  • And logs all operations for audit purposes

3. Feature Specifications

Functional Requirements

Priority Requirement Success Criteria
P0 (MVP) SFTP Configuration UI Settings page with SFTP fields, connection test, credential encryption via Secrets Manager
P0 (MVP) Manual Import Button Admin can trigger import, see progress, receive success/error feedback
P0 (MVP) Database Schema student_data_snapshots table with 56 fields, proper types, indexes on student_id
P0 (MVP) File Parser Supports CSV/Excel, validates required fields, maps headers to DB columns
P0 (MVP) Import Logic Fetches file via SFTP, parses, upserts data to DB, creates audit logs
P0 (MVP) Job Tracking jobs table records each import with status, timestamps, metadata
P0 (MVP) Error Handling Clear error messages, logging, graceful failures
P1 (Phase 2) Automated Scheduling EventBridge rule triggers Lambda at 2:00 AM daily
P1 (Phase 2) Import History UI Page showing all import jobs, filterable, detailed logs
P1 (Phase 2) Email Notifications Send alerts on import failures to configured admins

Non-Functional Requirements

Performance

  • Import Speed: Process 10,000 student records in < 2 minutes
  • SFTP Connection: Establish connection in < 5 seconds
  • UI Responsiveness: Settings page loads in < 1 second
  • Database Performance: Upsert operations < 50ms p95

Security

  • Credential Storage: All SFTP credentials stored in AWS Secrets Manager (encrypted at rest)
  • Access Control: Only administrators can configure SFTP settings or trigger imports
  • Audit Logging: All import operations logged with user ID, timestamp, operation details
  • Data Encryption: Files encrypted in transit (SFTP), at rest (RDS encryption)
  • Input Validation: All imported data validated and sanitized before DB insertion

4. Technical Architecture

System Design

┌─────────────────────────────────────────────────────────────────┐
│                          USER INTERFACE                          │
├─────────────────────────────────────────────────────────────────┤
│ Settings Page (Admin Only)                                      │
│  - SFTP Configuration Form                                      │
│  - "Import Student Data" Button                                 │
│  - Connection Test Button                                       │
│                                                                  │
│ Import History Page (Admin Only)                                │
│  - Job List (status, timestamps, record counts)                │
│  - Detailed Logs View                                           │
└─────────────────────────────────────────────────────────────────┘
                              │
                              │ HTTP(S)
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      NEXT.JS APPLICATION                         │
├─────────────────────────────────────────────────────────────────┤
│ Server Actions:                                                  │
│  - importStudentData.actions.ts (trigger manual import)         │
│  - sftpSettings.actions.ts (save/retrieve credentials)         │
│  - importJobs.actions.ts (fetch job history)                   │
│                                                                  │
│ API Routes:                                                      │
│  - POST /api/admin/import/trigger (invoke Lambda)              │
│  - GET /api/admin/import/jobs (list jobs)                      │
│  - GET /api/admin/import/jobs/[id] (job details)              │
└─────────────────────────────────────────────────────────────────┘
                              │
                              │ AWS SDK
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                        AWS SERVICES                              │
├─────────────────────────────────────────────────────────────────┤
│ Lambda Function: StudentDataImportHandler                       │
│  - Fetch credentials from Secrets Manager                       │
│  - Connect to SFTP server (ssh2-sftp-client)                   │
│  - Download student data file                                   │
│  - Parse CSV/Excel (csv-parse / xlsx)                          │
│  - Validate data (Zod schemas)                                  │
│  - Upsert to Aurora via RDS Data API                           │
│  - Update job status in jobs table                             │
│                                                                  │
│ EventBridge Rule: NightlyStudentDataImport (Phase 2)           │
│  - Trigger: cron(0 2 * * ? *)  # 2:00 AM daily                │
│  - Target: StudentDataImportHandler Lambda                      │
│                                                                  │
│ Secrets Manager: jockular-kangaroo/sftp                        │
│  - host, port, username, password/privateKey                   │
│                                                                  │
│ Aurora Serverless v2: jockular-kangaroo-db                     │
│  - Tables: student_data_snapshots, jobs, settings              │
└─────────────────────────────────────────────────────────────────┘
                              │
                              │ SFTP
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                     EXTERNAL SFTP SERVER                         │
│  - Student Information System (SIS) Export                      │
│  - File: student_data_export.csv (or .xlsx)                    │
└─────────────────────────────────────────────────────────────────┘

Data Model

New Table: student_data_snapshots

CREATE TABLE student_data_snapshots (
  id SERIAL PRIMARY KEY,
  student_id INTEGER NOT NULL REFERENCES students(id) ON DELETE CASCADE,
  ssid VARCHAR(50),
  
  -- Name fields (for reconciliation)
  first_name VARCHAR(100),
  last_name VARCHAR(100),
  preferred_name VARCHAR(100),
  
  -- Attendance metrics (14 fields)
  absences_ytd INTEGER,
  tardies_ytd INTEGER,
  absences_last_30_days INTEGER,
  tardies_last_30_days INTEGER,
  attendance_percentage NUMERIC(5,2),
  absence_percentage NUMERIC(5,2),
  unexcused_absences_ytd INTEGER,
  excused_absences_ytd INTEGER,
  unexcused_absences_last_30_days INTEGER,
  excused_absences_last_30_days INTEGER,
  
  -- Behavior metrics (2 fields)
  incident_count_ytd INTEGER,
  incident_count_last_30_days INTEGER,
  
  -- i-Ready assessments Spring & Fall 2025 (12 fields)
  spring_2025_iready_reading_score INTEGER,
  spring_2025_iready_reading_level VARCHAR(50),
  spring_2025_iready_reading_met_standard BOOLEAN,
  spring_2025_iready_math_score INTEGER,
  spring_2025_iready_math_level VARCHAR(50),
  spring_2025_iready_math_met_standard BOOLEAN,
  fall_2025_iready_reading_score INTEGER,
  fall_2025_iready_reading_level VARCHAR(50),
  fall_2025_iready_reading_met_standard BOOLEAN,
  fall_2025_iready_math_score INTEGER,
  fall_2025_iready_math_level VARCHAR(50),
  fall_2025_iready_math_met_standard BOOLEAN,
  
  -- DIBELS assessments Spring & Fall 2025 (24 fields)
  spring_2025_dibels_orf_wc_score INTEGER,
  spring_2025_dibels_orf_wc_level VARCHAR(50),
  spring_2025_dibels_orf_acc_score INTEGER,
  spring_2025_dibels_orf_acc_level VARCHAR(50),
  spring_2025_dibels_nwf_wrc_score INTEGER,
  spring_2025_dibels_nwf_wrc_level VARCHAR(50),
  spring_2025_dibels_nwf_cls_score INTEGER,
  spring_2025_dibels_nwf_cls_level VARCHAR(50),
  spring_2025_dibels_psf_score INTEGER,
  spring_2025_dibels_psf_level VARCHAR(50),
  spring_2025_dibels_lnf_score INTEGER,
  spring_2025_dibels_lnf_level VARCHAR(50),
  fall_2025_dibels_orf_wc_score INTEGER,
  fall_2025_dibels_orf_wc_level VARCHAR(50),
  fall_2025_dibels_orf_acc_score INTEGER,
  fall_2025_dibels_orf_acc_level VARCHAR(50),
  fall_2025_dibels_nwf_wrc_score INTEGER,
  fall_2025_dibels_nwf_wrc_level VARCHAR(50),
  fall_2025_dibels_nwf_cls_score INTEGER,
  fall_2025_dibels_nwf_cls_level VARCHAR(50),
  fall_2025_dibels_psf_score INTEGER,
  fall_2025_dibels_psf_level VARCHAR(50),
  fall_2025_dibels_lnf_score INTEGER,
  fall_2025_dibels_lnf_level VARCHAR(50),
  
  -- School info (2 fields)
  school VARCHAR(255),
  grade VARCHAR(10),
  
  -- Import metadata
  imported_at TIMESTAMP NOT NULL DEFAULT NOW(),
  import_job_id INTEGER REFERENCES jobs(id),
  created_at TIMESTAMP NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
  
  -- Ensure one snapshot per student (upsert on student_id)
  UNIQUE(student_id)
);

CREATE INDEX idx_student_data_snapshots_student_id ON student_data_snapshots(student_id);
CREATE INDEX idx_student_data_snapshots_imported_at ON student_data_snapshots(imported_at);
CREATE INDEX idx_student_data_snapshots_import_job_id ON student_data_snapshots(import_job_id);

Total Fields: 56 student data fields + 5 metadata fields = 61 columns


5. Implementation Plan

Phase 1: MVP (Weeks 1-4)

Week 1: Foundation & Database Schema

  • Day 1-2: Create database migration for student_data_snapshots table
  • Day 3-4: Create settings UI for SFTP configuration
  • Day 5: Create server actions for SFTP settings

Week 2: Lambda Function Development

  • Day 1-2: Set up Lambda infrastructure (CDK)
  • Day 3-4: Implement file download and parsing
  • Day 5: Implement database upsert logic

Week 3: Integration & UI

  • Day 1-2: Create manual import trigger
  • Day 3-4: Implement job tracking
  • Day 5: Error handling and logging

Week 4: Testing & Documentation

  • Day 1-2: Unit and integration tests
  • Day 3: End-to-end testing
  • Day 4: Documentation
  • Day 5: Deploy to production

Phase 2: Automation (Weeks 5-6)

Week 5: Scheduled Imports

  • Day 1: Create EventBridge rule in CDK
  • Day 2-3: Implement email notifications
  • Day 4: Retry logic and idempotency
  • Day 5: Monitoring and alerting

Week 6: Polish & Launch

  • Day 1-2: UI enhancements
  • Day 3: Performance optimization
  • Day 4: User acceptance testing
  • Day 5: Production launch

6. Success Metrics

30 Days Post-Launch

  • Adoption: 100% of administrators trained on SFTP configuration
  • Reliability: 95%+ automated import success rate
  • Performance: Average import time < 2 minutes for typical datasets
  • Data Quality: < 1% validation errors in imported data

60 Days Post-Launch

  • Efficiency: Zero manual data entry required for 56 tracked fields
  • Engagement: Staff report high confidence in data freshness
  • Reliability: 99%+ automated import success rate

90 Days Post-Launch

  • Business Impact: Intervention planning time reduced by 30%
  • Data Coverage: 98%+ of active students have current assessment data
  • System Health: Zero unplanned downtime for import system

7. Student Data Fields (56 Total)

Identification (5 fields)

  1. Student ID
  2. SSID
  3. Last Name
  4. First Name
  5. Preferred Name

Attendance (14 fields)

  1. Absences YTD
  2. Tardies YTD
  3. Absences in last 30 days
  4. Tardies in last 30 days
  5. Attendance %
  6. Absence %
  7. Unexcused absences YTD
  8. Excused absences YTD
  9. Unexcused absences in last 30 days
  10. Excused absences in last 30 days

Behavior (2 fields)

  1. Incident count YTD
  2. Incident count in last 30 days

i-Ready Assessments (12 fields)

  1. Spring 2025 i-Ready Reading Overall Score
  2. Spring 2025 i-Ready Reading Overall Performance Level
  3. Spring 2025 i-Ready Reading Overall Met Standard
  4. Spring 2025 i-Ready Math Overall Score
  5. Spring 2025 i-Ready Math Overall Performance Level
  6. Spring 2025 i-Ready Math Overall Met Standard
  7. Fall 2025 i-Ready Reading Overall Score
  8. Fall 2025 i-Ready Reading Overall Performance Level
  9. Fall 2025 i-Ready Reading Overall Met Standard
  10. Fall 2025 i-Ready Math Overall Score
  11. Fall 2025 i-Ready Math Overall Performance Level
  12. Fall 2025 i-Ready Math Overall Met Standard

DIBELS Assessments Spring 2025 (12 fields)

  1. Spring 2025 DIBELS ORF WC Score
  2. Spring 2025 DIBELS ORF WC Performance Level
  3. Spring 2025 DIBELS ORF Acc Score
  4. Spring 2025 DIBELS ORF Acc Performance Level
  5. Spring 2025 DIBELS NWF WRC Score
  6. Spring 2025 DIBELS NWF WRC Performance Level
  7. Spring 2025 DIBELS NWF CLS Score
  8. Spring 2025 DIBELS NWF CLS Performance Level
  9. Spring 2025 DIBELS PSF Score
  10. Spring 2025 DIBELS PSF Performance Level
  11. Spring 2025 DIBELS LNF Score
  12. Spring 2025 DIBELS LNF Performance Level

DIBELS Assessments Fall 2025 (12 fields)

  1. Fall 2025 DIBELS ORF WC Score
  2. Fall 2025 DIBELS ORF WC Performance Level
  3. Fall 2025 DIBELS ORF Acc Score
  4. Fall 2025 DIBELS ORF Acc Performance Level
  5. Fall 2025 DIBELS NWF WRC Score
  6. Fall 2025 DIBELS NWF WRC Performance Level
  7. Fall 2025 DIBELS NWF CLS Score
  8. Fall 2025 DIBELS NWF CLS Performance Level
  9. Fall 2025 DIBELS PSF Score
  10. Fall 2025 DIBELS PSF Performance Level
  11. Fall 2025 DIBELS LNF Score
  12. Fall 2025 DIBELS LNF Performance Level

School Information (2 fields)

  1. School
  2. Grade

8. Open Questions & Recommendations

  1. File Format: Will the file always be CSV, or could it be Excel/TSV?

    • Recommendation: Start with CSV, add Excel support if needed
  2. File Naming: Is the filename consistent, or does it include dates/timestamps?

    • Recommendation: Support glob patterns (e.g., student_data_*.csv)
  3. Student Matching: How do we match imported records to existing students?

    • Recommendation: Primary key = student_id, fallback to SSID if needed
  4. Missing Data: How should we handle blank/null assessment scores?

    • Recommendation: Store as NULL, don't overwrite existing values with NULL
  5. Notification List: Who should receive email alerts for failed imports?

    • Recommendation: Configurable in settings table (comma-separated emails)

Related Issues:

  • Will be broken into child issues for implementation

Labels: enhancement

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions