A refactored, performance-optimized Python implementation of the NavCity analysis pipeline.
This folder contains a rewritten version of the original Jupyter notebook pipeline. The analysis logic is identical, but the implementation has been optimized for better performance, maintainability, and usability.
📌 Note The original Jupyter notebooks are preserved in the
archive/subfolder for reference.
- Why Refactor?
- Module Structure
- Key Improvements
- Requirements
- Usage
- Comparison to Original
- API Reference
The original notebooks worked correctly but had several inefficiencies common in early-stage analysis code:
| Issue | Impact |
|---|---|
Row-by-row iterrows() loops |
Slow execution on large datasets |
| Repeated file I/O between scripts | Unnecessary disk operations |
| Matplotlib figures not closed | Memory leaks, warning messages |
| Hardcoded file paths | Required editing source code |
Jupyter %store magic for data passing |
Not portable to standard Python |
Repeated pd.concat() in loops |
Quadratic memory allocation |
This refactored version addresses all of these issues while maintaining identical output.
code/
│
├── run_analysis.py # CLI entry point (replaces 0_runall.ipynb)
├── metrics.py # Navigation metric calculations
├── visualization.py # Plotting and trajectory extraction
├── post_processing.py # File organization and data corrections
├── __init__.py # Package initialization
├── README.md # This file
│
└── archive/ # Original Jupyter notebooks (for reference)
├── 0_runall.ipynb
├── 1_calculate_outcomes.ipynb
├── 2_merge_data.ipynb
├── 3_average_data.ipynb
├── 4_target_data.ipynb
├── 5_graph_data.ipynb
└── 6_post_analyses.ipynb
| Original Notebook | New Module | Function(s) |
|---|---|---|
0_runall.ipynb |
run_analysis.py |
main(), CLI argument parsing |
1_calculate_outcomes.ipynb |
metrics.py |
process_raw_data(), calculate_all_metrics() |
2_merge_data.ipynb |
metrics.py |
merge_block_results() |
3_average_data.ipynb |
metrics.py |
average_metrics() |
4_target_data.ipynb |
visualization.py |
extract_target_trajectories() |
5_graph_data.ipynb |
visualization.py |
plot_target_maps(), generate_participant_movement_plots() |
6_post_analyses.ipynb |
post_processing.py |
post_analysis_cleanup(), organize_and_rename_files() |
Before (slow):
for index, row in group_data.iterrows():
if row['X'] == 0 and row['Z'] == -4.1:
count += row['Time_Diff']After (fast):
at_start = (group['X'] == START_X) & (group['Z'] == START_Z)
orientation_time = group.loc[at_start, 'Time_Diff'].sum()Before: Append to CSV file on each iteration
data.to_csv(filepath, mode='a', header=False) # Called N timesAfter: Accumulate in memory, write once
all_results.append(df) # Fast list append
pd.concat(all_results).to_csv(filepath) # Single writeBefore: Figures accumulate in memory
plt.figure()
plt.clf() # Clears but doesn't release memoryAfter: Figures properly closed
fig, ax = plt.subplots()
plt.close(fig) # Releases memoryBefore: Edit hardcoded paths in notebook
fp_folders = ['/Volumes/YB_Drive/NavAging_Paper/data/YA_Data/']After: Pass paths as arguments
python run_analysis.py --data-folders /path/to/YA_Data /path/to/OA_DataPython 3.8+ with the following packages:
numpy>=1.20.0
pandas>=1.3.0
matplotlib>=3.4.0
Install with:
pip install numpy pandas matplotlibNote: This implementation removes the Jupyter dependency. You can run directly from the command line or import as a library.
Navigate to the code/ directory:
cd /path/to/navcity-analysis/codepython run_analysis.py \
--data-folders /Volumes/YB_Drive/NavAging_Paper/data/YA_Data \
/Volumes/YB_Drive/NavAging_Paper/data/OA_Data \
--base-folder /Volumes/YB_Drive/NavAging_Paper/dataUse --output-dir to save all results to a separate location (instead of the input data folders):
# Single data folder - outputs saved directly to output directory
python run_analysis.py \
--data-folders /path/to/YA_Data \
--output-dir /path/to/results
# Multiple data folders - creates subdirectories (YA_Data/, OA_Data/) in output directory
python run_analysis.py \
--data-folders /path/to/YA_Data /path/to/OA_Data \
--output-dir /path/to/resultsIf --output-dir is not specified, results are saved to the input data folders (original behavior).
# Only calculate metrics
python run_analysis.py --data-folders /path/to/data --steps metrics
# Calculate and merge (no plots)
python run_analysis.py --data-folders /path/to/data --steps metrics merge average
# Only generate visualizations (assumes metrics already calculated)
python run_analysis.py --data-folders /path/to/data --steps trajectories plots
# Only run post-processing
python run_analysis.py --base-folder /path/to/data --steps post-process| Step | Description | Output |
|---|---|---|
metrics |
Calculate navigation metrics per participant/block | {participant}/b{1,2,3}_results.csv |
merge |
Combine all block results | merged_results.csv |
average |
Average metrics across targets | averaged_results.csv |
trajectories |
Extract per-target coordinate data | Target_Data/*.csv |
plots |
Generate movement visualizations | *.png files |
post-process |
Organize files, fix known errors | Renamed/moved files |
from metrics import process_raw_data, calculate_all_metrics
from visualization import plot_participant_movement
# Process a single file
data = process_raw_data('/path/to/Saved_data_BNC01_t1.csv')
metrics = calculate_all_metrics(data)
print(metrics)
# Generate a plot
plot_participant_movement(data, '/path/to/output.png', title='BNC01 Block 1')The efficient implementation produces identical output files:
- Same file names (
b1_results.csv,merged_results.csv, etc.) - Same column names and order
- Same calculated values
- Same directory structure
You can safely replace the original pipeline with this implementation.
| Operation | Original | Efficient | Speedup |
|---|---|---|---|
| Metric calculation (per block) | ~2-3s | ~0.3-0.5s | ~5-6x |
| Full pipeline (22 participants) | ~5-10 min | ~1-2 min | ~5x |
| Memory usage (plotting) | Grows unbounded | Constant | N/A |
Actual performance depends on hardware and dataset size.
process_raw_data(filepath: str) -> pd.DataFrame
"""Load and preprocess raw NavCity CSV data."""
calculate_all_metrics(data: pd.DataFrame) -> pd.DataFrame
"""Calculate all navigation metrics for each target."""
merge_block_results(data_folder: str, participant_ids: list) -> pd.DataFrame
"""Merge all block results into a single DataFrame."""
average_metrics(data_folder: str, participant_ids: list) -> pd.DataFrame
"""Calculate average metrics across targets for each participant/block."""plot_participant_movement(data: pd.DataFrame, output_path: str, title: str = None)
"""Plot movement trajectories for a single participant/block."""
generate_participant_movement_plots(data_folder: str, participant_ids: list, output_dir: str = None)
"""Generate movement plots for all participants and blocks."""
extract_target_trajectories(data_folder: str, participant_ids: list, output_dir: str = None)
"""Extract and save trajectory data organized by target."""
plot_target_maps(data_folder: str, blocks: list = None)
"""Generate overhead maps showing all participant trajectories."""organize_and_rename_files(output_folder: str, ya_subfolder: str, oa_subfolder: str)
"""Move and rename output files with age group prefixes."""
fix_erroneous_data(merged_path: str, averaged_path: str, participant: str, ...)
"""Fix erroneous data for a specific participant/block/target."""
post_analysis_cleanup(output_folder: str)
"""Run all post-analysis cleanup operations."""| Metric | Description |
|---|---|
Total_Time |
Complete time spent navigating to target |
Orientation_Time |
Time spent at starting position (X=0, Z=-4.1) |
Navigation_Time |
Active movement time (Total - Orientation) |
Distance |
Total path length traveled |
Speed |
Distance / Navigation_Time |
Mean_Dwell |
Average time spent at each unique position |
Teleportations |
Count of unique positions visited |
Mean_Teleport_Distance |
Average distance between consecutive unique positions |
The eight navigation targets in canonical order:
- Automobile shop
- Police station
- Fire Station
- Bank
- Pawn Shop
- Pizzeria
- Quattroki Restaurant
- High School
Same as parent repository: MIT License
Last Updated: January 2026 Original Author: Yasmine Bassil Refactored By: Claude (Anthropic)