
v0.AutomaticCuration.get_labels unexpected behavior #1513

@CBroz1

Description


Summary

AutomaticCuration.get_labels() in the v0 spikesorting pipeline has three bugs that cause incorrect unit labels to be stored in Curation.curation_labels and propagated to CuratedSpikeSorting, analysis NWB files, and downstream decoding tables.

Two of these bugs (B, C) have existed since the function was introduced very early in the project (2022-03-25). A third (A) was added in PR #1281 (2025-04-22). To review impact, I have generated a temporary table here.

The v1 pipeline (MetricCuration._compute_labels) is not affected.


Bug A: Early return after first metric

Introduced: PR #1281 (commit 351a698d, 2025-04-22)

During a refactor, return parent_labels was indented inside the for metric in label_params loop, causing the function to return after processing only the first metric.

# Buggy (PR #1281)                    # Fixed
for metric in label_params:           for metric in label_params:
    ...                                   ...
    return parent_labels              return parent_labels

Impact: Any entry with more than one metric in label_params that overlaps with quality_metrics will have labels from only the first metric. Labels from all subsequent metrics are silently dropped.

Scope: Entries created after 2025-04-22 with >1 overlapping metric. As measured by this table, ~400 cases in the database.
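The control-flow difference can be shown with a minimal, self-contained sketch (function and variable names are hypothetical simplifications, not the actual get_labels code):

```python
# Simplified stand-in for the v0 metric loop. The buggy version returns
# inside the loop; the fixed version returns after it.
def get_labels_buggy(label_params, quality_metrics):
    parent_labels = {}
    for metric in label_params:
        if metric not in quality_metrics:
            continue
        for unit_id in quality_metrics[metric]:
            parent_labels.setdefault(unit_id, []).append(metric)
        return parent_labels  # BUG: exits after the first overlapping metric


def get_labels_fixed(label_params, quality_metrics):
    parent_labels = {}
    for metric in label_params:
        if metric not in quality_metrics:
            continue
        for unit_id in quality_metrics[metric]:
            parent_labels.setdefault(unit_id, []).append(metric)
    return parent_labels  # return outside the loop: all metrics processed


params = ["noise_overlap", "isi_violation"]
metrics = {"noise_overlap": [1], "isi_violation": [1]}
print(get_labels_buggy(params, metrics))  # {1: ['noise_overlap']}
print(get_labels_fixed(params, metrics))  # {1: ['noise_overlap', 'isi_violation']}
```

With two overlapping metrics, the buggy version silently drops the second metric's label for unit 1.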


Bug B: List aliasing across units

Introduced: commit 25689c4 (2022-03-25)

When a unit first matches a metric, the label list is assigned by reference:

# Buggy (since PR #807)               # Fixed
parent_labels[uid] = label[2]         parent_labels[uid] = label[2].copy()

Because label[2] comes from label_params[metric], every unit matching the same metric shares the same list object. When a later metric appends or extends labels on one of these units, the mutation propagates to all units that share the reference — and back into label_params itself.

Impact: Units that should have distinct labels end up with identical (and incorrect) label lists. The corruption compounds with Bug C.

Scope: Any entry where multiple units match the same metric, regardless of creation date. Current cases: ~144k.
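The aliasing can be reproduced in isolation (the shared list here stands in for label[2] from label_params; the label values are illustrative):

```python
# Two units assigned the same list object, as in the buggy assignment
# parent_labels[uid] = label[2]
shared = ["noise", "reject"]             # stands in for label[2]
parent_labels = {1: shared, 2: shared}   # BUG: both units alias one list

parent_labels[1].append("mua")           # mutation intended for unit 1...
assert parent_labels[2] == ["noise", "reject", "mua"]  # ...leaks to unit 2
assert shared == ["noise", "reject", "mua"]            # ...and to label_params

# Fixed: give each unit its own copy, so mutations stay per-unit
template = ["noise", "reject"]
parent_labels = {1: template.copy(), 2: template.copy()}
parent_labels[1].append("mua")
assert parent_labels[2] == ["noise", "reject"]  # unit 2 unaffected
```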


Bug C: Duplicate label comparison

Introduced: commit 25689c4 (2022-03-25)

When a unit already has labels and matches an additional metric, the code checks whether the new labels are already present:

# Buggy (since PR #807)
elif label[2] not in parent_labels[unit_id]:
    parent_labels[unit_id].extend(label[2])

# Fixed
for element in label[2].copy():
    if element not in parent_labels[unit_id]:
        parent_labels[unit_id].append(element)

The in operator checks whether label[2] (a list, e.g. ["noise", "reject"]) is an element of parent_labels[unit_id] (a list of strings). A list is never an element of a flat string list, so the check is always True, and .extend() always runs. This produces duplicate labels like ["noise", "reject", "noise", "reject"].

Impact: Units with pre-existing labels accumulate duplicates on every matching metric. Combined with Bug B (aliasing), the duplicates also leak to other units sharing the same list object.

Scope: Any entry where a unit already has labels (from parent_labels or a prior metric) and matches another metric, regardless of creation date. Current cases: ~146k.
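The always-True membership test is easy to verify directly (label values are illustrative):

```python
# Buggy check: `label[2] not in parent_labels[unit_id]` asks whether the
# whole list ["noise", "reject"] is an *element* of a flat list of
# strings. It never is, so extend() always runs.
parent_labels = {7: ["noise", "reject"]}   # unit 7 already labeled
label = [">", 0.1, ["noise", "reject"]]    # same labels from another metric

if label[2] not in parent_labels[7]:       # always True for a flat string list
    parent_labels[7].extend(label[2])
assert parent_labels[7] == ["noise", "reject", "noise", "reject"]  # duplicates

# Fixed: compare element by element, appending only missing labels
parent_labels = {7: ["noise", "reject"]}
for element in label[2].copy():
    if element not in parent_labels[7]:
        parent_labels[7].append(element)
assert parent_labels[7] == ["noise", "reject"]  # no duplicates
```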


Downstream data impact

The flawed labels propagate through the pipeline:

AutomaticCuration.get_labels()        ← bugs here
  → Curation.curation_labels          ← incorrect labels stored
    → CuratedSpikeSorting.make()
      → accepted_units filter          ← units with "reject" are excluded
      → CuratedSpikeSorting.Unit.label ← per-unit label column
      → AnalysisNwbfile (NWB units)    ← label column in NWB file
        → SpikeSortingOutput
          → decoding/v0 tables         ← built on wrong unit sets

The most critical effect is on the accepted_units filter: if a unit should have received a "reject" label but didn't (due to Bug A dropping the relevant metric), that unit's spike data is included in analysis files and downstream decoding when it should have been excluded — or vice versa.

v1 pipeline

MetricCuration._compute_labels() (v1/metric_curation.py) is not affected. It pre-allocates a separate empty list per unit, uses .extend() without aliasing, has no list-in-list comparison, and has return labels correctly outside the metric loop.

Fix

Each bug can be fixed relatively easily, using a new function provided by @sytseng:

  • Bug A: return parent_labels unindented to outside the for metric loop.
  • Bug B: label[2] → label[2].copy() to prevent aliasing.
  • Bug C: Per-element if element not in parent_labels[uid] check replaces
    the list-in-list in comparison.
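Taken together, the corrected loop might look like the sketch below. This is a hedged simplification, not the actual v0 code: the comparison operators, the [op, threshold, labels] parameter structure, and all names are assumptions for illustration.

```python
import operator

# Hypothetical comparison table standing in for whatever threshold logic
# the real function uses.
OPS = {">": operator.gt, "<": operator.lt}


def get_labels_fixed(label_params, quality_metrics, parent_labels=None):
    parent_labels = {} if parent_labels is None else parent_labels
    for metric in label_params:
        if metric not in quality_metrics:
            continue
        for label in label_params[metric]:   # label = [op, threshold, labels]
            op, threshold, new_labels = label
            for unit_id, value in quality_metrics[metric].items():
                if not OPS[op](value, threshold):
                    continue
                if unit_id not in parent_labels:
                    parent_labels[unit_id] = new_labels.copy()  # Bug B fix
                else:
                    for element in new_labels:                  # Bug C fix
                        if element not in parent_labels[unit_id]:
                            parent_labels[unit_id].append(element)
    return parent_labels  # Bug A fix: return outside the metric loop


params = {
    "noise_overlap": [[">", 0.1, ["noise", "reject"]]],
    "isi_violation": [[">", 0.5, ["mua"]]],
}
metrics = {"noise_overlap": {1: 0.2, 2: 0.3}, "isi_violation": {1: 0.9}}
print(get_labels_fixed(params, metrics))
# {1: ['noise', 'reject', 'mua'], 2: ['noise', 'reject']}
```

Unit 1 accumulates labels from both metrics without duplicates, and unit 2's list is a separate object, so later mutations cannot leak between units.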

A repair function (AutomaticCuration.fix_15XX) could recompute and correct labels for affected entries, including downstream CuratedSpikeSorting.Unit rows and NWB analysis files.

Retroactively repairing labels affected by Bug A is clear-cut, since the intended
behavior is unambiguous. Fixing B and C is a more complex question about what
previous pipeline versions should do for reproducibility. Options include:

  1. Fix all cases to the correct labels, potentially changing past results.
  2. Leave past entries as-is for reproducibility, but fix going forward.
  3. Alter an upstream parameter table to add a flag to request the corrected
    behavior when running, preserving past behavior by default.
