npresearchlab/ymaze_analysis

Toolkit to analyze navigation behavior from the desktop y-maze.
Y-Maze Behavioral Analysis Pipeline

A Python pipeline for processing Y-Maze behavioral data from Unity experiments.

Overview

This pipeline provides three standalone scripts:

| Script | Purpose |
| --- | --- |
| create_ymaze_timing.py | Process coordinates → FSL timing files |
| create_ymaze_outcomes.py | Calculate behavioral outcome measures |
| run_preference_analysis.py | Aggregate preference classifications |

The timing script processes coordinate data to:

  1. Detect teleportation events - Identify when participants are teleported back to starting positions
  2. Label trials - Categorize events as probe (p) or non-probe (np1, np2, etc.)
  3. Extract timing - Calculate trial onset times and durations
  4. Generate FSL files - Create timing files for neuroimaging analysis

Installation

# Install dependencies
pip install pandas numpy matplotlib

Quick Start

1. Download the pipeline

Download the ymaze_analysis folder from GitHub: https://github.com/npresearchlab/ymaze_analysis

2. Set up your folder structure

your_project/
├── data/                        ← Put your raw data here
│   ├── NAV066/
│   │   ├── coordinates_NAV066C.csv
│   │   ├── coordinates_NAV066S.csv
│   │   ├── preferences_NAV066C.csv
│   │   └── preferences_NAV066S.csv
│   ├── NAV067/
│   │   ├── coordinates_NAV067.csv
│   │   └── preferences_NAV067.csv
│   └── ...
├── output/                      ← Results will go here (create this folder)
│
└── ymaze_analysis/              ← The analysis pipeline (from GitHub)
    ├── create_ymaze_timing.py   # Generate FSL timing files from coordinates
    ├── create_ymaze_outcomes.py # Calculate behavioral outcome measures
    ├── run_preference_analysis.py  # Aggregate preference data
    ├── config.py                # Configuration settings
    ├── README.md
    │
    ├── data_io/                 # Input/output utilities
    │   ├── __init__.py
    │   ├── loaders.py
    │   └── writers.py
    │
    ├── processing/              # Core analysis modules
    │   ├── __init__.py
    │   ├── teleport_detection.py
    │   ├── trial_labeling.py
    │   └── timing_extraction.py
    │
    ├── post/                    # Post-processing and aggregation
    │   ├── __init__.py
    │   ├── outcome_measures.py
    │   └── preference_aggregation.py
    │
    └── visualization/           # Plotting utilities
        ├── __init__.py
        └── trajectory_plots.py

3. Run the analysis

# Navigate to the ymaze_analysis folder
cd ymaze_analysis

# Step 1: Generate FSL timing files from coordinate data
python create_ymaze_timing.py --data-dir ../data --output-dir ../output

# Step 2 (optional): Calculate behavioral outcome measures
python create_ymaze_outcomes.py --output-dir ../output

# Step 3 (optional): Aggregate preference data
python run_preference_analysis.py --data-dir ../data --output-dir ../output

The timing script processes all participants and generates FSL timing files.

Input Data Format

Coordinate Files

CSV files with columns (note: original files may have leading spaces):

  • Environment - Environment identifier
  • Cummulative_Time - Timestamp
  • X - X coordinate
  • Z - Z coordinate
  • Rotation - Rotation angle

Task Restart Handling

If a coordinate file contains multiple header rows (indicating the task was restarted), the pipeline automatically keeps only the data from the last restart. A log message will indicate when this occurs:

Found 1 task restart(s) in coordinates_BNC21.csv, using data from last restart (row 236)
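The restart handling above can be sketched as follows. This is an illustrative helper, not the pipeline's actual implementation: it treats every repeated header row as a restart marker and keeps only the rows after the last one.

```python
# Sketch (hypothetical helper): keep only rows after the last repeated
# header row, mirroring the pipeline's restart handling.
import io
import pandas as pd

def load_after_last_restart(text: str) -> pd.DataFrame:
    lines = text.splitlines()
    header = lines[0]
    # Every later occurrence of the header row marks a task restart.
    restarts = [i for i, line in enumerate(lines[1:], start=1) if line == header]
    if restarts:
        print(f"Found {len(restarts)} task restart(s), "
              f"using data from last restart (row {restarts[-1]})")
        lines = [header] + lines[restarts[-1] + 1:]
    return pd.read_csv(io.StringIO("\n".join(lines)))
```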

Preference Files

The preference analysis script (run_preference_analysis.py) looks for files with "preferences" in the filename. These should follow the same naming convention as coordinate files:

preferences_SUBID[C|S][_N].csv

Transition Period Handling

The Y-maze task switches environment labels during inter-trial pauses (black screen) before the participant is actually teleported to the new environment. The pipeline automatically detects and marks these ~6-second transition periods:

  • An is_transition column is added to the raw data (True during pauses)
  • Environment labels are preserved for correct trial attribution
  • This ensures probe trials are correctly assigned to their actual environment

Probe Trial Detection

Probe trials are identified by their destination, not just position in the sequence:

  • NP trials teleport to one position (e.g., 36.52, 22.00)
  • Probe trials teleport to the opposite position (e.g., 0.00, -32.00)

Auto-detection: The pipeline automatically detects the two start positions by analyzing teleport destinations in the data. This means it works with any maze configuration — no hardcoded coordinates.

If a participant stops mid-environment (incomplete), the last teleport still goes to the NP position — so it's correctly labeled as an NP trial, not a probe. A warning is logged when incomplete environments are detected.
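The destination-based labeling can be illustrated with a minimal sketch (the pipeline's auto-detection may differ in detail): the most frequent teleport destination is treated as the NP start position, and teleports landing at the other detected position are probes.

```python
# Illustrative sketch of destination-based probe labeling
# (function name is hypothetical, not the pipeline's API).
from collections import Counter

def label_by_destination(destinations):
    """destinations: list of (x, z) teleport landing points."""
    counts = Counter(destinations)
    np_pos, _ = counts.most_common(1)[0]  # NP start: used by most teleports
    return ["np" if dest == np_pos else "p" for dest in destinations]
```

With the example coordinates above, three teleports to (36.52, 22.00) and one to (0.00, -32.00) would label the last teleport as the probe.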

Expected Directory Structure

data/
├── NAV066/
│   ├── coordinates_NAV066C.csv
│   ├── coordinates_NAV066S.csv
│   ├── preferences_NAV066C.csv
│   └── preferences_NAV066S.csv
├── NAV067/
│   ├── coordinates_NAV067.csv
│   └── preferences_NAV067.csv
└── ...

File Naming Convention

Files must follow this naming pattern:

| File Type | Pattern | Examples |
| --- | --- | --- |
| Coordinates | coordinates_SUBID[C\|S][_N].csv | coordinates_NAV066.csv, coordinates_NAV066C.csv, coordinates_NAV090S_2.csv |
| Preferences | preferences_SUBID[C\|S][_N].csv | preferences_NAV066.csv, preferences_NAV066S.csv, preferences_NAV090S_2.csv |

Where:

  • SUBID = Subject ID (e.g., NAV066, BNC21)
  • C or S = Optional run marker (Computer or Scanner)
  • _N = Optional run number for repeat runs (e.g., _2, _3)

Multiple runs per subject: If a subject has multiple runs (e.g., scanner and computer), add C or S after the subject ID:

NAV066/
├── coordinates_NAV066C.csv   ← Computer/desktop run
├── coordinates_NAV066S.csv   ← Scanner run
├── preferences_NAV066C.csv
└── preferences_NAV066S.csv

Repeat runs: If a run was repeated, add a number suffix:

NAV090/
├── coordinates_NAV090S.csv     ← First scanner run
├── coordinates_NAV090S_2.csv   ← Second scanner run (repeat)
├── preferences_NAV090S.csv
└── preferences_NAV090S_2.csv

Note: The pipeline only processes files with "coordinates" (case-insensitive) in the filename for timing analysis. Preference files are processed separately by run_preference_analysis.py.

NAV066/
├── coordinates_NAV066C.csv   ← ✓ Processed by create_ymaze_timing.py
├── preferences_NAV066C.csv   ← ✓ Processed by run_preference_analysis.py
├── NAV066_Log.txt            ← Ignored
├── notes.docx                ← Ignored
└── .DS_Store                 ← Ignored
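The naming convention can be expressed as a regex. This is a hypothetical helper for checking your own filenames, not code from the pipeline (which only checks for "coordinates" in the name); the group names are illustrative.

```python
# Sketch: the coordinates naming pattern as a regex (hypothetical helper).
import re

PATTERN = re.compile(
    r"^coordinates_(?P<subid>[A-Za-z]+\d+)"   # SUBID, e.g. NAV066, BNC21
    r"(?P<run>[CS])?"                         # optional run marker
    r"(?:_(?P<rep>\d+))?"                     # optional repeat number
    r"\.csv$",
    re.IGNORECASE,
)

for name in ["coordinates_NAV066.csv", "coordinates_NAV090S_2.csv", "notes.docx"]:
    m = PATTERN.match(name)
    print(name, "->", m.groupdict() if m else "ignored")
```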

Output Files

For each subject, the original participant number is preserved (e.g., NAV066 → sub-066):

Single run:

output/
├── all_outcomes.csv                    # Aggregated task outcomes (from create_ymaze_outcomes.py)
├── all_preferences.csv                 # Aggregated preferences (from run_preference_analysis.py)
│
└── sub-066/
    ├── sub-066_labeled_final.csv       # Full labeled data with trial_num column
    ├── sub-066_trial_outcomes.csv      # Trial-level outcomes (from create_ymaze_outcomes.py)
    ├── sub-066_env_outcomes.csv        # Environment-level outcomes (from create_ymaze_outcomes.py)
    ├── sub-066_task_outcomes.csv       # Task-level outcomes (from create_ymaze_outcomes.py)
    │
    ├── trial_numbers/                  # Raw trial-numbered FSL files
    │   ├── trial_1.txt                 # All 1st trials across environments
    │   ├── trial_2.txt
    │   ├── trial_3.txt
    │   ├── ...                         # (varies by participant)
    │   └── rest.txt
    │
    ├── labeled_trials/                 # Semantically labeled FSL files
    │   ├── explore.txt                 # First trial (trial_1)
    │   ├── cond1.txt                   # Conditioning trial 1
    │   ├── cond2.txt                   # Conditioning trial 2
    │   ├── cond3.txt                   # Conditioning trial 3
    │   ├── preprobe.txt                # Trial before probe
    │   ├── probe.txt                   # Probe trials
    │   └── rest.txt
    │
    └── sub-066_trajectory.png          # Trajectory visualization

Multiple runs (C and S):

output/
├── all_outcomes.csv                    # Aggregated across all subjects/runs
│
└── sub-066/
    ├── C/
    │   ├── sub-066_labeled_final.csv
    │   ├── sub-066_trial_outcomes.csv  # Outcome files (from create_ymaze_outcomes.py)
    │   ├── sub-066_env_outcomes.csv
    │   ├── sub-066_task_outcomes.csv
    │   ├── sub-066_trajectory.png
    │   ├── trial_numbers/
    │   │   └── ...
    │   └── labeled_trials/
    │       └── ...
    │
    └── S/
        ├── sub-066_labeled_final.csv
        ├── sub-066_trial_outcomes.csv
        ├── sub-066_env_outcomes.csv
        ├── sub-066_task_outcomes.csv
        ├── sub-066_trajectory.png
        ├── trial_numbers/
        │   └── ...
        └── labeled_trials/
            └── ...

Labeled Trials Scheme

The labeled_trials/ folder maps trial numbers to semantic labels based on position relative to the probe:

| Trial Position | Label | Description |
| --- | --- | --- |
| trial_1 | explore | First/exploration trial |
| max - 4 | cond1 | Conditioning trial 1 (skipped if = trial_1) |
| max - 3 | cond2 | Conditioning trial 2 |
| max - 2 | cond3 | Conditioning trial 3 |
| max - 1 | preprobe | Trial immediately before probe |
| max (last) | probe | Probe trial |

Note: Only COMPLETED environments (those with a probe trial) are included in labeled_trials/. If a participant stopped mid-environment, those trials are excluded from labeled files but remain in trial_numbers/.
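The positional mapping above can be sketched as a small function (illustrative, not the pipeline's code): given the highest trial number in a completed environment, it assigns the semantic labels, with trial_1 always kept as explore even when it collides with cond1.

```python
# Sketch of the trial-number -> label mapping described above.
def label_map(max_trial: int) -> dict:
    labels = {
        max_trial - 4: "cond1",
        max_trial - 3: "cond2",
        max_trial - 2: "cond3",
        max_trial - 1: "preprobe",
        max_trial:     "probe",
    }
    labels[1] = "explore"  # trial_1 is always explore; cond1 is skipped if it collides
    return labels
```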

labeled_final.csv columns:

| Column | Description |
| --- | --- |
| env | Environment name |
| trial | Trial type label (np1, np2, p, etc.) |
| trial_num | Sequential trial number within environment (1, 2, 3...) |
| start | Trial start time (normalized) |
| end | Trial end time |
| duration | Trial duration |

FSL Timing Format

Three-column format: onset duration weight

0.00 15.23 1
45.67 12.89 1
...
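Writing this format is straightforward; a minimal sketch (function name is illustrative, not the pipeline's API):

```python
# Sketch: write the three-column FSL format (onset, duration, weight).
def write_fsl_timing(path, trials, weight=1):
    """trials: iterable of (onset, duration) tuples in seconds."""
    with open(path, "w") as f:
        for onset, duration in trials:
            f.write(f"{onset:.2f} {duration:.2f} {weight}\n")
```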

Outcome Measures

The create_ymaze_outcomes.py script reads labeled_final.csv files and calculates behavioral outcome measures at three levels.

# Run after create_ymaze_timing.py has generated labeled_final.csv files
python create_ymaze_outcomes.py --output-dir ../output

Trial-Level Outcomes (sub-XXX_trial_outcomes.csv)

One row per trial:

| Column | Description |
| --- | --- |
| subject_id | Subject identifier |
| environment | Environment name |
| trial_num | Trial number within environment (1, 2, 3...) |
| trial_type | np (non-probe) or p (probe) |
| duration | Trial duration in seconds |
| start_time | Trial start time (normalized) |
| end_time | Trial end time |

Environment-Level Outcomes (sub-XXX_env_outcomes.csv)

One row per environment:

| Column | Description |
| --- | --- |
| subject_id | Subject identifier |
| environment | Environment name |
| n_np_trials | Number of non-probe trials |
| total_time | Total time for this environment |
| avg_np_duration | Average non-probe trial duration |
| probe_duration | Probe trial duration (if exists) |
| has_probe | Whether environment had a probe trial |
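As a sketch of how these environment-level measures can be derived from the trial-level table (column names follow the tables above; the aggregation code itself is illustrative, not the pipeline's):

```python
# Sketch: derive environment-level outcomes from trial-level rows with pandas.
import pandas as pd

trials = pd.DataFrame({
    "environment": ["envA"] * 4,
    "trial_type":  ["np", "np", "np", "p"],
    "duration":    [15.2, 12.9, 10.1, 8.4],
})

env = trials.groupby("environment").agg(
    n_np_trials=("trial_type", lambda s: int((s == "np").sum())),
    total_time=("duration", "sum"),
    has_probe=("trial_type", lambda s: bool((s == "p").any())),
)
# Average non-probe duration needs a cross-column filter, so compute it separately.
np_only = trials[trials["trial_type"] == "np"]
env["avg_np_duration"] = np_only.groupby("environment")["duration"].mean()
print(env)
```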

Task-Level Outcomes (sub-XXX_task_outcomes.csv)

One row per subject/run:

| Column | Description |
| --- | --- |
| subject_id | Subject identifier |
| run_suffix | C, S, or empty |
| n_environments | Number of environments processed |
| n_complete_environments | Environments with probe trials |
| total_time | Total task time |
| total_np_trials | Total non-probe trials across all environments |
| total_probe_trials | Total probe trials |
| avg_np_per_environment | Average non-probe trials per environment |
| avg_env_duration | Average environment duration |
| avg_np_duration | Average non-probe trial duration (across all) |
| avg_probe_duration | Average probe trial duration (across all) |

Group-Level Aggregation (all_outcomes.csv)

Task-level outcomes aggregated across all subjects, saved to the output root directory.

Preference Aggregation

The preference analysis script aggregates preference responses and applies classification schemes.

Usage

# Aggregate preferences from all subjects
python run_preference_analysis.py --data-dir ../data --output-dir ../output

Input Format

Preference CSV files with columns:

  • ID - Subject identifier
  • MazeName - Environment/maze name
  • Preference - Place or Response

Output (all_preferences.csv)

| Column | Description |
| --- | --- |
| subject_id | Subject identifier |
| n_environments | Number of environments with preference data |
| n_place | Number of Place responses |
| n_response | Number of Response responses |
| pct_place | Percentage Place responses |
| pct_response | Percentage Response responses |
| majority | Classification: allocentric if >= 50% Place |
| ego_presence | Classification: allocentric only if 100% Place |
| ego_exclusive | Classification: egocentric only if 0% Place |

Classification Schemes

| Scheme | Allocentric | Egocentric |
| --- | --- | --- |
| Majority | >= 50% Place | < 50% Place |
| Ego_Presence | 100% Place | < 100% Place |
| Ego_Exclusive | > 0% Place | 0% Place |
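The three schemes above amount to simple thresholds on the percentage of Place responses; a minimal sketch (the function name is illustrative, not the pipeline's API):

```python
# Sketch: the three classification schemes as thresholds on pct_place.
def classify(pct_place: float) -> dict:
    return {
        "majority":      "allocentric" if pct_place >= 50 else "egocentric",
        "ego_presence":  "allocentric" if pct_place == 100 else "egocentric",
        "ego_exclusive": "egocentric" if pct_place == 0 else "allocentric",
    }
```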

Command Line Options

FSL Timing Generator (create_ymaze_timing.py)

usage: create_ymaze_timing.py [-h] [--data-dir PATH] [--output-dir PATH]
                               [--subject-prefix PREFIX] [--debug] [--no-plots]
                               [--verbose] [--log-file PATH]

options:
  --data-dir, -d        Path to data directory (default: ./data)
  --output-dir, -o      Path to output directory (default: ./output)
  --subject-prefix, -p  Prefix for subject IDs (default: sub)
                        Numbers are extracted from folder names:
                        NAV066 → sub-066, NAV123 → sub-123
  --debug               Save intermediate processing files
  --no-plots            Skip trajectory visualizations
  --verbose, -v         Enable verbose logging
  --log-file            Save logs to file
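The folder-name to subject-ID mapping shown for --subject-prefix can be sketched as follows (the pipeline's exact rule may differ; this illustrative helper just extracts the digits and joins them to the prefix):

```python
# Sketch: map a folder name like NAV066 to sub-066 (hypothetical helper).
import re

def subject_id(folder: str, prefix: str = "sub") -> str:
    digits = re.search(r"\d+", folder).group()
    return f"{prefix}-{digits}"
```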

Outcome Measures Calculator (create_ymaze_outcomes.py)

This script can run in two modes:

  1. Use existing files: If labeled_final.csv files exist (from create_ymaze_timing.py), use those
  2. Process raw data: If no labeled files found, process raw coordinate files directly from --data-dir

usage: create_ymaze_outcomes.py [-h] [--data-dir PATH] [--output-dir PATH]
                                 [--subject-prefix PREFIX] [--verbose] [--log-file PATH]

options:
  --data-dir, -d        Path to data directory with raw coordinate files
                        (used only if no labeled_final.csv files found)
  --output-dir, -o      Path to output directory (default: ./output)
  --subject-prefix, -p  Prefix for subject IDs when processing raw files (default: sub)
  --verbose, -v         Enable verbose logging
  --log-file            Save logs to file

Preference Aggregation (run_preference_analysis.py)

usage: run_preference_analysis.py [-h] [--data-dir PATH] [--output-dir PATH]
                                   [--verbose] [--log-file PATH]

options:
  --data-dir, -d        Path to data directory containing preference files (default: ./data)
  --output-dir, -o      Path to output directory (default: ./output)
  --verbose, -v         Enable verbose logging
  --log-file            Save logs to file

Trial Labeling Scheme

| Label | Description |
| --- | --- |
| np1 | First non-probe trial (always assigned to first row) |
| np2, np3, ... | Subsequent non-probe trials |
| p | Probe trial (last teleportation in each environment) |
| np1end, np2end, ... | End markers for non-probe trials |
| pend | End marker for probe trials |

Troubleshooting

"Data directory not found"

  • Make sure you have a data/ folder in your current directory, OR
  • Specify the path explicitly: --data-dir /your/path/to/data

"Output directory not found"

  • Create an output/ folder in your current directory, OR
  • Specify the path explicitly: --output-dir /your/path/to/output

"No coordinate file found"

  • Check that files follow the naming convention: coordinates_SUBID.csv (e.g., coordinates_NAV066.csv)
  • Files must contain "coordinates" (case-insensitive) in the filename
  • Verify files have .csv extension

"Missing required columns"

  • Column names may have leading spaces (the loader handles this)
  • Ensure columns exist: Environment, Cummulative_Time, X, Z

Advanced Configuration

Most users won't need to change these settings, but if needed, you can edit config.py:

@dataclass
class TeleportConfig:
    distance_threshold: float = 5.0      # Min distance to count as a teleport
    target_positions: List[Tuple[float, float]] = field(
        default_factory=lambda: [(36.52, 22.0), (0.0, -32.0)]  # Used for visualization only
    )

Note: The pipeline auto-detects start positions from the data, so target_positions is only used as a fallback for visualization if no teleports are found.

Advanced Usage

Processing a single subject programmatically

from data_io import load_coordinates
from processing import detect_teleportations, label_trials, extract_timing

# Load data
df = load_coordinates("/path/to/coordinates_NAV066.csv")

# Detect teleportations
teleport_df, events = detect_teleportations(df)

# Label trials
labeled_df, summary = label_trials(teleport_df)

# Extract timing
timing = extract_timing(labeled_df)

print(f"Found {len(timing.trials)} total trials")
print(f"Max trial number: {timing.max_trial_num}")

Calculating outcome measures programmatically

from post import calculate_outcome_measures

# After extracting timing (see above)
outcomes = calculate_outcome_measures(timing, subject_id="sub-066")

# Access outcome data
print(f"Total trials: {len(outcomes.trial_outcomes)}")
print(f"Environments: {len(outcomes.env_outcomes)}")
print(f"Task duration: {outcomes.task_outcome.total_time:.1f}s")

# Convert to DataFrames
trial_df = outcomes.trial_outcomes_to_dataframe()
env_df = outcomes.env_outcomes_to_dataframe()
task_df = outcomes.task_outcome_to_dataframe()
