NavCity Analysis — Efficient Implementation

A refactored, performance-optimized Python implementation of the NavCity analysis pipeline.

What's Here?

This folder contains a rewritten version of the original Jupyter notebook pipeline. The analysis logic is identical, but the implementation has been optimized for better performance, maintainability, and usability.

📌 Note The original Jupyter notebooks are preserved in the archive/ subfolder for reference.

Why Refactor?

The original notebooks worked correctly but had several inefficiencies common in early-stage analysis code:

Issue	Impact
Row-by-row `iterrows()` loops	Slow execution on large datasets
Repeated file I/O between scripts	Unnecessary disk operations
Matplotlib figures not closed	Memory leaks, warning messages
Hardcoded file paths	Required editing source code
Jupyter `%store` magic for data passing	Not portable to standard Python
Repeated `pd.concat()` in loops	Quadratic memory allocation

This refactored version addresses all of these issues while maintaining identical output.

Module Structure

code/
│
├── run_analysis.py       # CLI entry point (replaces 0_runall.ipynb)
├── metrics.py            # Navigation metric calculations
├── visualization.py      # Plotting and trajectory extraction
├── post_processing.py    # File organization and data corrections
├── __init__.py           # Package initialization
├── README.md             # This file
│
└── archive/              # Original Jupyter notebooks (for reference)
    ├── 0_runall.ipynb
    ├── 1_calculate_outcomes.ipynb
    ├── 2_merge_data.ipynb
    ├── 3_average_data.ipynb
    ├── 4_target_data.ipynb
    ├── 5_graph_data.ipynb
    └── 6_post_analyses.ipynb

Module Mapping

Original Notebook	New Module	Function(s)
`0_runall.ipynb`	`run_analysis.py`	`main()`, CLI argument parsing
`1_calculate_outcomes.ipynb`	`metrics.py`	`process_raw_data()`, `calculate_all_metrics()`
`2_merge_data.ipynb`	`metrics.py`	`merge_block_results()`
`3_average_data.ipynb`	`metrics.py`	`average_metrics()`
`4_target_data.ipynb`	`visualization.py`	`extract_target_trajectories()`
`5_graph_data.ipynb`	`visualization.py`	`plot_target_maps()`, `generate_participant_movement_plots()`
`6_post_analyses.ipynb`	`post_processing.py`	`post_analysis_cleanup()`, `organize_and_rename_files()`

Key Improvements

1. Vectorized Operations

Before (slow):

for index, row in group_data.iterrows():
    if row['X'] == 0 and row['Z'] == -4.1:
        count += row['Time_Diff']

After (fast):

at_start = (group['X'] == START_X) & (group['Z'] == START_Z)
orientation_time = group.loc[at_start, 'Time_Diff'].sum()

2. Batch File Operations

Before: Append to CSV file on each iteration

data.to_csv(filepath, mode='a', header=False)  # Called N times

After: Accumulate in memory, write once

all_results.append(df)  # Fast list append
pd.concat(all_results).to_csv(filepath)  # Single write

3. Proper Memory Management

Before: Figures accumulate in memory

plt.figure()
plt.clf()  # Clears but doesn't release memory

After: Figures properly closed

fig, ax = plt.subplots()
plt.close(fig)  # Releases memory

4. Command-Line Interface

Before: Edit hardcoded paths in notebook

fp_folders = ['/Volumes/YB_Drive/NavAging_Paper/data/YA_Data/']

After: Pass paths as arguments

python run_analysis.py --data-folders /path/to/YA_Data /path/to/OA_Data

Requirements

Software Dependencies

Python 3.8+ with the following packages:

numpy>=1.20.0
pandas>=1.3.0
matplotlib>=3.4.0

Install with:

pip install numpy pandas matplotlib

Note: This implementation removes the Jupyter dependency. You can run directly from the command line or import as a library.

Usage

Command Line

Navigate to the code/ directory:

cd /path/to/navcity-analysis/code

Run Full Pipeline

python run_analysis.py \
    --data-folders /Volumes/YB_Drive/NavAging_Paper/data/YA_Data \
                   /Volumes/YB_Drive/NavAging_Paper/data/OA_Data \
    --base-folder /Volumes/YB_Drive/NavAging_Paper/data

Save Outputs to a Different Directory

Use --output-dir to save all results to a separate location (instead of the input data folders):

# Single data folder - outputs saved directly to output directory
python run_analysis.py \
    --data-folders /path/to/YA_Data \
    --output-dir /path/to/results

# Multiple data folders - creates subdirectories (YA_Data/, OA_Data/) in output directory
python run_analysis.py \
    --data-folders /path/to/YA_Data /path/to/OA_Data \
    --output-dir /path/to/results

If --output-dir is not specified, results are saved to the input data folders (original behavior).

Run Specific Steps

# Only calculate metrics
python run_analysis.py --data-folders /path/to/data --steps metrics

# Calculate and merge (no plots)
python run_analysis.py --data-folders /path/to/data --steps metrics merge average

# Only generate visualizations (assumes metrics already calculated)
python run_analysis.py --data-folders /path/to/data --steps trajectories plots

# Only run post-processing
python run_analysis.py --base-folder /path/to/data --steps post-process

Available Steps

Step	Description	Output
`metrics`	Calculate navigation metrics per participant/block	`{participant}/b{1,2,3}_results.csv`
`merge`	Combine all block results	`merged_results.csv`
`average`	Average metrics across targets	`averaged_results.csv`
`trajectories`	Extract per-target coordinate data	`Target_Data/*.csv`
`plots`	Generate movement visualizations	`*.png` files
`post-process`	Organize files, fix known errors	Renamed/moved files

As a Library

from metrics import process_raw_data, calculate_all_metrics
from visualization import plot_participant_movement

# Process a single file
data = process_raw_data('/path/to/Saved_data_BNC01_t1.csv')
metrics = calculate_all_metrics(data)
print(metrics)

# Generate a plot
plot_participant_movement(data, '/path/to/output.png', title='BNC01 Block 1')

Comparison to Original

Output Compatibility

The efficient implementation produces identical output files:

Same file names (b1_results.csv, merged_results.csv, etc.)
Same column names and order
Same calculated values
Same directory structure

You can safely replace the original pipeline with this implementation.

Performance

Operation	Original	Efficient	Speedup
Metric calculation (per block)	~2-3s	~0.3-0.5s	~5-6x
Full pipeline (22 participants)	~5-10 min	~1-2 min	~5x
Memory usage (plotting)	Grows unbounded	Constant	N/A

Actual performance depends on hardware and dataset size.

API Reference

metrics.py

process_raw_data(filepath: str) -> pd.DataFrame
    """Load and preprocess raw NavCity CSV data."""

calculate_all_metrics(data: pd.DataFrame) -> pd.DataFrame
    """Calculate all navigation metrics for each target."""

merge_block_results(data_folder: str, participant_ids: list) -> pd.DataFrame
    """Merge all block results into a single DataFrame."""

average_metrics(data_folder: str, participant_ids: list) -> pd.DataFrame
    """Calculate average metrics across targets for each participant/block."""

visualization.py

plot_participant_movement(data: pd.DataFrame, output_path: str, title: str = None)
    """Plot movement trajectories for a single participant/block."""

generate_participant_movement_plots(data_folder: str, participant_ids: list, output_dir: str = None)
    """Generate movement plots for all participants and blocks."""

extract_target_trajectories(data_folder: str, participant_ids: list, output_dir: str = None)
    """Extract and save trajectory data organized by target."""

plot_target_maps(data_folder: str, blocks: list = None)
    """Generate overhead maps showing all participant trajectories."""

post_processing.py

organize_and_rename_files(output_folder: str, ya_subfolder: str, oa_subfolder: str)
    """Move and rename output files with age group prefixes."""

fix_erroneous_data(merged_path: str, averaged_path: str, participant: str, ...)
    """Fix erroneous data for a specific participant/block/target."""

post_analysis_cleanup(output_folder: str)
    """Run all post-analysis cleanup operations."""

Metrics Calculated

Metric	Description
`Total_Time`	Complete time spent navigating to target
`Orientation_Time`	Time spent at starting position (X=0, Z=-4.1)
`Navigation_Time`	Active movement time (Total - Orientation)
`Distance`	Total path length traveled
`Speed`	Distance / Navigation_Time
`Mean_Dwell`	Average time spent at each unique position
`Teleportations`	Count of unique positions visited
`Mean_Teleport_Distance`	Average distance between consecutive unique positions

Target Locations

The eight navigation targets in canonical order:

Automobile shop
Police station
Fire Station
Bank
Pawn Shop
Pizzeria
Quattroki Restaurant
High School

License

Same as parent repository: MIT License

Last Updated: January 2026 Original Author: Yasmine Bassil Refactored By: Claude (Anthropic)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NavCity Analysis — Efficient Implementation

What's Here?

Table of Contents

Why Refactor?

Module Structure

Module Mapping

Key Improvements

1. Vectorized Operations

2. Batch File Operations

3. Proper Memory Management

4. Command-Line Interface

Requirements

Software Dependencies

Usage

Command Line

Run Full Pipeline

Save Outputs to a Different Directory

Run Specific Steps

Available Steps

As a Library

Comparison to Original

Output Compatibility

Performance

API Reference

metrics.py

visualization.py

post_processing.py

Metrics Calculated

Target Locations

License

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
__init__.py		__init__.py
metrics.py		metrics.py
post_processing.py		post_processing.py
run_analysis.py		run_analysis.py
visualization.py		visualization.py

License

npresearchlab/navcity-analysis

Folders and files

Latest commit

History

Repository files navigation

NavCity Analysis — Efficient Implementation

What's Here?

Table of Contents

Why Refactor?

Module Structure

Module Mapping

Key Improvements

1. Vectorized Operations

2. Batch File Operations

3. Proper Memory Management

4. Command-Line Interface

Requirements

Software Dependencies

Usage

Command Line

Run Full Pipeline

Save Outputs to a Different Directory

Run Specific Steps

Available Steps

As a Library

Comparison to Original

Output Compatibility

Performance

API Reference

metrics.py

visualization.py

post_processing.py

Metrics Calculated

Target Locations

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages