Skip to content

npresearchlab/navcity-analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NavCity Analysis — Efficient Implementation

A refactored, performance-optimized Python implementation of the NavCity analysis pipeline.

What's Here?

This folder contains a rewritten version of the original Jupyter notebook pipeline. The analysis logic is identical, but the implementation has been optimized for better performance, maintainability, and usability.

📌 Note The original Jupyter notebooks are preserved in the archive/ subfolder for reference.


Table of Contents


Why Refactor?

The original notebooks worked correctly but had several inefficiencies common in early-stage analysis code:

Issue Impact
Row-by-row iterrows() loops Slow execution on large datasets
Repeated file I/O between scripts Unnecessary disk operations
Matplotlib figures not closed Memory leaks, warning messages
Hardcoded file paths Required editing source code
Jupyter %store magic for data passing Not portable to standard Python
Repeated pd.concat() in loops Quadratic memory allocation

This refactored version addresses all of these issues while maintaining identical output.


Module Structure

code/
│
├── run_analysis.py       # CLI entry point (replaces 0_runall.ipynb)
├── metrics.py            # Navigation metric calculations
├── visualization.py      # Plotting and trajectory extraction
├── post_processing.py    # File organization and data corrections
├── __init__.py           # Package initialization
├── README.md             # This file
│
└── archive/              # Original Jupyter notebooks (for reference)
    ├── 0_runall.ipynb
    ├── 1_calculate_outcomes.ipynb
    ├── 2_merge_data.ipynb
    ├── 3_average_data.ipynb
    ├── 4_target_data.ipynb
    ├── 5_graph_data.ipynb
    └── 6_post_analyses.ipynb

Module Mapping

Original Notebook New Module Function(s)
0_runall.ipynb run_analysis.py main(), CLI argument parsing
1_calculate_outcomes.ipynb metrics.py process_raw_data(), calculate_all_metrics()
2_merge_data.ipynb metrics.py merge_block_results()
3_average_data.ipynb metrics.py average_metrics()
4_target_data.ipynb visualization.py extract_target_trajectories()
5_graph_data.ipynb visualization.py plot_target_maps(), generate_participant_movement_plots()
6_post_analyses.ipynb post_processing.py post_analysis_cleanup(), organize_and_rename_files()

Key Improvements

1. Vectorized Operations

Before (slow):

for index, row in group_data.iterrows():
    if row['X'] == 0 and row['Z'] == -4.1:
        count += row['Time_Diff']

After (fast):

at_start = (group['X'] == START_X) & (group['Z'] == START_Z)
orientation_time = group.loc[at_start, 'Time_Diff'].sum()

2. Batch File Operations

Before: Append to CSV file on each iteration

data.to_csv(filepath, mode='a', header=False)  # Called N times

After: Accumulate in memory, write once

all_results.append(df)  # Fast list append
pd.concat(all_results).to_csv(filepath)  # Single write

3. Proper Memory Management

Before: Figures accumulate in memory

plt.figure()
plt.clf()  # Clears but doesn't release memory

After: Figures properly closed

fig, ax = plt.subplots()
plt.close(fig)  # Releases memory

4. Command-Line Interface

Before: Edit hardcoded paths in notebook

fp_folders = ['/Volumes/YB_Drive/NavAging_Paper/data/YA_Data/']

After: Pass paths as arguments

python run_analysis.py --data-folders /path/to/YA_Data /path/to/OA_Data

Requirements

Software Dependencies

Python 3.8+ with the following packages:

numpy>=1.20.0
pandas>=1.3.0
matplotlib>=3.4.0

Install with:

pip install numpy pandas matplotlib

Note: This implementation removes the Jupyter dependency. You can run directly from the command line or import as a library.


Usage

Command Line

Navigate to the code/ directory:

cd /path/to/navcity-analysis/code

Run Full Pipeline

python run_analysis.py \
    --data-folders /Volumes/YB_Drive/NavAging_Paper/data/YA_Data \
                   /Volumes/YB_Drive/NavAging_Paper/data/OA_Data \
    --base-folder /Volumes/YB_Drive/NavAging_Paper/data

Save Outputs to a Different Directory

Use --output-dir to save all results to a separate location (instead of the input data folders):

# Single data folder - outputs saved directly to output directory
python run_analysis.py \
    --data-folders /path/to/YA_Data \
    --output-dir /path/to/results

# Multiple data folders - creates subdirectories (YA_Data/, OA_Data/) in output directory
python run_analysis.py \
    --data-folders /path/to/YA_Data /path/to/OA_Data \
    --output-dir /path/to/results

If --output-dir is not specified, results are saved to the input data folders (original behavior).

Run Specific Steps

# Only calculate metrics
python run_analysis.py --data-folders /path/to/data --steps metrics

# Calculate and merge (no plots)
python run_analysis.py --data-folders /path/to/data --steps metrics merge average

# Only generate visualizations (assumes metrics already calculated)
python run_analysis.py --data-folders /path/to/data --steps trajectories plots

# Only run post-processing
python run_analysis.py --base-folder /path/to/data --steps post-process

Available Steps

Step Description Output
metrics Calculate navigation metrics per participant/block {participant}/b{1,2,3}_results.csv
merge Combine all block results merged_results.csv
average Average metrics across targets averaged_results.csv
trajectories Extract per-target coordinate data Target_Data/*.csv
plots Generate movement visualizations *.png files
post-process Organize files, fix known errors Renamed/moved files

As a Library

from metrics import process_raw_data, calculate_all_metrics
from visualization import plot_participant_movement

# Process a single file
data = process_raw_data('/path/to/Saved_data_BNC01_t1.csv')
metrics = calculate_all_metrics(data)
print(metrics)

# Generate a plot
plot_participant_movement(data, '/path/to/output.png', title='BNC01 Block 1')

Comparison to Original

Output Compatibility

The efficient implementation produces identical output files:

  • Same file names (b1_results.csv, merged_results.csv, etc.)
  • Same column names and order
  • Same calculated values
  • Same directory structure

You can safely replace the original pipeline with this implementation.

Performance

Operation Original Efficient Speedup
Metric calculation (per block) ~2-3s ~0.3-0.5s ~5-6x
Full pipeline (22 participants) ~5-10 min ~1-2 min ~5x
Memory usage (plotting) Grows unbounded Constant N/A

Actual performance depends on hardware and dataset size.


API Reference

metrics.py

process_raw_data(filepath: str) -> pd.DataFrame
    """Load and preprocess raw NavCity CSV data."""

calculate_all_metrics(data: pd.DataFrame) -> pd.DataFrame
    """Calculate all navigation metrics for each target."""

merge_block_results(data_folder: str, participant_ids: list) -> pd.DataFrame
    """Merge all block results into a single DataFrame."""

average_metrics(data_folder: str, participant_ids: list) -> pd.DataFrame
    """Calculate average metrics across targets for each participant/block."""

visualization.py

plot_participant_movement(data: pd.DataFrame, output_path: str, title: str = None)
    """Plot movement trajectories for a single participant/block."""

generate_participant_movement_plots(data_folder: str, participant_ids: list, output_dir: str = None)
    """Generate movement plots for all participants and blocks."""

extract_target_trajectories(data_folder: str, participant_ids: list, output_dir: str = None)
    """Extract and save trajectory data organized by target."""

plot_target_maps(data_folder: str, blocks: list = None)
    """Generate overhead maps showing all participant trajectories."""

post_processing.py

organize_and_rename_files(output_folder: str, ya_subfolder: str, oa_subfolder: str)
    """Move and rename output files with age group prefixes."""

fix_erroneous_data(merged_path: str, averaged_path: str, participant: str, ...)
    """Fix erroneous data for a specific participant/block/target."""

post_analysis_cleanup(output_folder: str)
    """Run all post-analysis cleanup operations."""

Metrics Calculated

Metric Description
Total_Time Complete time spent navigating to target
Orientation_Time Time spent at starting position (X=0, Z=-4.1)
Navigation_Time Active movement time (Total - Orientation)
Distance Total path length traveled
Speed Distance / Navigation_Time
Mean_Dwell Average time spent at each unique position
Teleportations Count of unique positions visited
Mean_Teleport_Distance Average distance between consecutive unique positions

Target Locations

The eight navigation targets in canonical order:

  1. Automobile shop
  2. Police station
  3. Fire Station
  4. Bank
  5. Pawn Shop
  6. Pizzeria
  7. Quattroki Restaurant
  8. High School

License

Same as parent repository: MIT License


Last Updated: January 2026 Original Author: Yasmine Bassil Refactored By: Claude (Anthropic)

About

Tools to analyze NavCity data to create outcome measures.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages