A controlled, falsifiable testbed for quantum‑inspired recursive law learning under measurement invariants.
TL;DR: Geometric proximity ≠ functional proximity. This repository provides reproducible evidence that restoring local parameter/representation/gradient structure does not restore behavior in neural networks.
Primary Claim: See docs/claim.md (LOCKED as of v0.2.0)
- QML: Quantum Machine Learning — used here in the quantum-inspired sense (Hermitian operators, spectra, and dynamics used as a controlled testbed), not Qt Quick QML.
pip install -e . && python -m experiments.kt2_locality_falsifier --run-decisive --quiet

Executive Summary (click to expand)
The Problem: Alignment and interpretability research often assumes that if you restore a neural network's local structure (weights, representations, gradients, curvature), you restore its behavior. This assumption underpins model editing, fine-tuning stability claims, and monitoring approaches.
What We Did: Built a minimal, glass-box testbed—a matrix-to-RNN correspondence with self-reconstruction training—and systematically tested whether constrained recovery steps that restore various structural proxies also restore function.
What We Found: They don't. Across all tested constraint families (weight proximity, spectral moments, representation similarity, Jacobian alignment, Hessian-vector products), single-step recovery succeeds at matching the proxy while failing to recover behavior. The three natural distance metrics—parameter distance, representation distance, and functional distance—decouple sharply.
Why It Matters: This is a concrete counterexample to locality assumptions. If proxy-based recovery fails in a system this simple, claims that it “obviously works” at scale require empirical validation, not assumption. The possibility of functional aliases—states that look correct by common probes but implement different programs—has direct implications for oversight.
What's Here: Runnable code, fixed protocols, deterministic seeds, no hyperparameter sweeps. Every claim is tied to an experiment you can reproduce in minutes on CPU.
Geometric basins are not functional basins.
In this system, functional identity is not locally recoverable—at 0th order (weight/spectral), 1st order (Jacobian), or 2nd order (Hessian)—in a single optimization step. Across baseline and all tested constraint families, 1-step CI remains near zero and indistinguishable from baseline under the fixed protocol.
This is a designed falsifier: fixed protocol, fixed seeds, no hyperparameter sweeps, and no degrees of freedom left to absorb failure.
We study whether functional identity is locally recoverable after perturbation in a deliberately minimal autodidactic learning loop. Using Continuation Interest (CI) as our recovery statistic, we test constrained recovery under proxy families derived from parameter geometry, spectral structure, representation similarity, first-order input-output sensitivity (Jacobians), and second-order curvature proxies (Hessian-vector products). In this system, single-step constrained recovery fails across all tested constraint families, with CI near zero and consistent with an unconstrained baseline. The data show a sharp decoupling between parameter/representation proximity and behavioral recovery, yielding a reproducible negative result about locality assumptions relevant to robustness, interpretability, and alignment.
- Overview
- What This Repository Contains
- Autodidactic Loop Schematic
- Notation
- Continuation Interest (CI)
- Core Result: Distance Triad Decoupling
- What This Is and Is Not
- Why This Matters for Alignment
- Experimental Design Highlights
- Protocols
- Constraint Families Tested
- Installation
- Reproducing the Decisive Experiments
- Reproducibility
- Interpretation Guide
- FAQ
- Experimental Hygiene
- Repository Structure
- Status
- Tags
- Roadmap
- References
- Citations
- License
- Contact
We study two falsifiable questions about recoverability in recursive self-updating systems:
- Local functional recoverability (KT-2): After a controlled perturbation, can a single recovery step move the system back toward its pre-perturbation behavior, under constraints that preserve local geometry/topology proxies?
- Persistence-bias probes (UCIP modules): In minimal decision systems with explicit internal self-model machinery, can we detect a persistence-like preference signal under intervention tests without attributing intent, consciousness, or “persistence bias” in the human sense?
The Superpositional Quantum Network Topologies (SQNT)-inspired matrix loop supplies the substrate (self-measurement → update) for our experiments, while KT-2 measures whether functional identity is locally encoded in the same neighborhood as natural proxy constraints (spectral/representation/Jacobian/HVP).
The recovery statistic is Continuation Interest (CI):
CI = (L_post − L_recover) / (L_post − L_pre)
CI is a normalized recovery ratio. It is not a claim about motivation or agency; it is a claim about local vector-field alignment between constraint restoration and functional restoration.
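The CI statistic can be sketched directly from the definition above. The loss values in the example are illustrative only, not outputs of any repository run:

```python
def continuation_interest(l_pre: float, l_post: float, l_recover: float) -> float:
    """CI = (L_post - L_recover) / (L_post - L_pre).

    CI = 1: full recovery to PRE loss; CI = 0: no recovery beyond POST;
    CI < 0: the recovery step made things worse.
    """
    denom = l_post - l_pre
    if abs(denom) < 1e-12:
        raise ValueError("perturbation produced no loss change; CI is undefined")
    return (l_post - l_recover) / denom

# Illustrative values only (not from any real run):
ci_full = continuation_interest(l_pre=0.10, l_post=0.50, l_recover=0.10)  # -> 1.0
ci_none = continuation_interest(l_pre=0.10, l_post=0.50, l_recover=0.50)  # -> 0.0
```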
A Hermitian matrix is sampled from a simple ensemble, evolved via Langevin-style dynamics, and mapped explicitly into a cyclic RNN.
This mapping is explicit and deterministic: no learned encoder, no hidden degrees of freedom.
The learner is trained primarily on self-reconstruction, driven by:
- A self-consistency objective
- A mutual-information proxy
There is no external task supervision; the system must stabilize its own dynamics under its own measurement loop.
A deterministic PRE → POST → RECOVER pipeline with fixed seeds implements constraint families across the full order hierarchy (0th/representation/1st/2nd). The decisive claim is about k = 1 (one recovery step).
Baseline condition: Unconstrained gradient descent on the task loss alone, with no proxy penalties. CI ≈ 0 at baseline establishes that recovery is non-trivial.
Perturbation specification: Gaussian noise with σ = 0.1 applied element-wise to weight matrices, calibrated to produce measurable but recoverable degradation (L_post / L_pre ≈ 2–5×).
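A minimal sketch of the perturbation step as specified above (element-wise Gaussian noise, σ = 0.1, seeded with the pre-registered PERTURB_SEED = 42). The weight names and shapes are illustrative, not the repository's actual architecture:

```python
import numpy as np

# Seeded generator so the perturbation is identical across runs (PERTURB_SEED = 42).
rng = np.random.default_rng(42)

def perturb(weights, sigma=0.1):
    """Return a POST copy of the weight dict with i.i.d. Gaussian noise added element-wise."""
    return {name: w + rng.normal(0.0, sigma, size=w.shape) for name, w in weights.items()}

# Illustrative weight matrices (shapes are placeholders):
w_pre = {"W_rec": np.zeros((8, 8)), "W_out": np.zeros((4, 8))}
w_post = perturb(w_pre)
```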
The schematic shows the core autodidactic loop; the text-only schematic below includes the KT-2 evaluation wrapper.
Mermaid diagram (GitHub-rendered)
graph TD
W["Matrix / State W\n(Hermitian)"]
CM["Correspondence Map\nW → RNN params"]
LI["Learner Instantiation\n(Cyclic RNN)"]
AT["Autodidactic Training\nmeasure → loss → update"]
AU["Autodidactic Update\nΔW"]
W --> CM
CM --> LI
LI --> AT
AT --> AU
AU -. feedback .-> W
AU --> KT2START
KT2START --> WP["PRE anchor\nW_pre"]
WP --> PI["Perturbation Π\n(intervention)"]
PI --> WO["POST state\nW_post"]
WO --> CR["Constrained recovery\nminimize L + Σ λ_j C_j(·, W_pre)"]
CR --> WR["RECOVER state\nW_rec"]
WR --> CI["CI metric\nCI = (L_post − L_rec)/(L_post − L_pre)"]
style W fill:#e1f5ff
style KT2START fill:#fff9e6
style CI fill:#e8f5e9
Text-only schematic (for terminals and diffs; includes the KT-2 evaluation wrapper):
┌────────────────────────┐
│ Matrix / State (W) │
│ (Hermitian) │
└───────────┬────────────┘
│
v
┌────────────────────────┐
│ Correspondence Map │
│ (W → RNN parameters) │
└───────────┬────────────┘
│
v
┌────────────────────────┐
│ Learner Instantiation │
│ (Cyclic RNN) │
└───────────┬────────────┘
│
v
┌────────────────────────┐
│ Autodidactic Training │
│ (measure→loss→update)  │
└───────────┬────────────┘
│
v
┌────────────────────────┐
│ Autodidactic Update │
│ Rule (ΔW) │
└───────────┬────────────┘
│
└───────────────────────────────┐
v
(feeds back to Matrix / State)
────────────────────────────── KT-2 evaluation wrapper ──────────────────────────────
PRE anchor: W_pre
│
v
┌───────────────────┐
│ Perturbation Π │ (intervention)
└─────────┬─────────┘
v
POST state: W_post
│
v
┌───────────────────┐
│ Constrained │
│ Recovery Step(s) │ (minimize L + Σ λ_j C_j(·, W_pre))
└─────────┬─────────┘
v
RECOVER state: W_rec
│
v
┌───────────────────┐
│ CI Metric │
│ Evaluation │ CI = (L_post - L_rec)/(L_post - L_pre)
└───────────────────┘
Figure 1. Conceptual schematic of the autodidactic learning loop. A system state (matrix or state vector) is mapped to measured observables via a correspondence map. These observables drive a learner (classical or quantum), whose update rule modifies the underlying state. The loop closes under measurement, enforcing constraint invariance across iterations.
| Symbol | Definition |
|---|---|
| PRE | Reference state before perturbation |
| POST | Perturbed state |
| RECOVER | State after applying constrained recovery step(s) |
| W_pre, W_post, W_recover | Parameters at PRE, POST, RECOVER respectively |
| L_pre, L_post, L_recover | Corresponding task losses |
| h_pre, h_recover | Hidden-state activations (for representation distance) |
CI is an operational recovery statistic: after a controlled perturbation, does a constrained recovery update move the system back toward PRE performance?
Operationally (plain text):
CI = (L_post − L_recover) / (L_post − L_pre)
Operationally (math):

CI_k = (L_post − L_recover^(k)) / (L_post − L_pre)

We also report a k-step envelope (where k is the number of recovery steps):

CI_env(K) = max over k ∈ {1…K} and step sizes η of CI_k(η)

Recovery iterates are produced by a constrained objective of the form:

W_(t+1) = W_t − η ∇_W [ L(W_t) + Σ_j λ_j C_j(W_t, W_pre) ]

The distance triad used in the decoupling result:

- Parameter: d_param = ‖W_recover − W_pre‖_F
- Representation: d_rep = 1 − cos(h_recover, h_pre)
- Functional: d_func = (L_recover − L_pre) / (L_post − L_pre)
Interpretation:
| CI Value | Meaning |
|---|---|
| CI = 1 | Full recovery to PRE loss in a single step |
| CI = 0 | No recovery beyond POST |
| CI < 0 | Recovery step made things worse |
Note: CI is operational. It does not imply agency, intent, or “persistence bias.”
The decisive finding is that three natural notions of “distance from PRE” decouple:
| Distance Type | Measures | After Constraint Recovery |
|---|---|---|
| Parameter | ‖W_rec − W_pre‖_F | Often small (geometry restored) |
| Representation | 1 − cos(h_rec, h_pre) | Often small (activations similar) |
| Functional | (L_rec − L_pre) / (L_post − L_pre) | Remains large (CI ≈ 0) |
This is the core negative result: in this system, restoring local geometry (0th order), representation similarity, or even local derivative structure (1st/2nd order constraints) does not locally restore behavior in a single recovery step. Geometric proximity ≠ functional proximity.
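The triad in the table above can be sketched in a few lines. Vector hidden states and a single scalar loss per phase are assumptions of the sketch; the repository computes these from actual PRE/POST/RECOVER states:

```python
import numpy as np

def distance_triad(w_pre, w_rec, h_pre, h_rec, l_pre, l_post, l_rec):
    """Parameter, representation, and functional distances from PRE."""
    d_param = np.linalg.norm(w_rec - w_pre)                         # Frobenius norm
    cos_sim = h_rec @ h_pre / (np.linalg.norm(h_rec) * np.linalg.norm(h_pre))
    d_rep = 1.0 - cos_sim                                           # 1 - cos(h_rec, h_pre)
    d_func = (l_rec - l_pre) / (l_post - l_pre)                     # ~0 iff behavior recovered
    return d_param, d_rep, d_func
```

The decoupling result is that the first two distances can be small while the third stays near one.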
This repository is a deliberately constrained falsifier. It is engineered to cleanly answer one question:
Does “local structural proximity” (0th–2nd order information) imply local recoverability of function after damage?
The design choice is austerity: fixed protocol, fixed seeds, minimal degrees of freedom, and recovery objectives that succeed at matching the proxy (spectra / CKA / Jacobians / HVPs) even when they fail to recover function.
Scope note (important): The testbed is a transparent model (matrix → cyclic RNN correspondence + self-reconstruction task). The result is therefore not a blanket statement about every modern architecture. It is a concrete counterexample showing that, in at least one clean setting, local proxies can be satisfied while functional identity is not recovered.
The narrow scope is a feature: if local recovery fails in a system this small and glass-box, any claim that proxy-based local recovery is “obviously reliable” at scale should be treated as an empirical hypothesis, not an assumption.
Alignment and oversight often lean on proxy measurements—representation similarity, norms/spectra, local gradients, curvature signals—as if they were behavioral warranties.
KT-2 is a warning label: proxy recovery can be a misleading indicator.
When a system can satisfy strong local constraints yet remain functionally “wrong,” it creates the possibility of functional aliases: parameter states that look locally correct (by common probes) but implement a different program globally.
A 1-step recovery test probes what is truly local: the update direction induced by the constrained objective at POST. If functional identity lives in the same neighborhood as a given proxy, then the constraint-driven update should have a nontrivial component along the functional recovery direction. Systematic CI ≈ 0 at k=1 is evidence that proxy restoration and function restoration are locally misaligned.
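The locality argument reduces to a cosine check: if proxy restoration and functional restoration lived in the same neighborhood, the constrained update direction at POST would have nontrivial overlap with the functional recovery direction. A sketch, with both direction vectors as illustrative placeholders:

```python
import numpy as np

def local_alignment(constrained_update: np.ndarray, functional_direction: np.ndarray) -> float:
    """Cosine between the constraint-driven update and the functional recovery direction.

    A value near 0 means the two are locally misaligned (consistent with CI ~ 0 at k = 1).
    """
    u = constrained_update / np.linalg.norm(constrained_update)
    v = functional_direction / np.linalg.norm(functional_direction)
    return float(u @ v)
```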
A k-step curve tests whether recovery is path-dependent rather than local:
- CI(1) ≈ 0 but CI(k) rises at larger k → recovery is nonlocal / requires trajectory
- CI(k) ≈ 0 for all k tested → no recoverable basin under the tested constraints
Practically, this matters for:
- Model editing / patching / merging
- Fine-tuning stability
- Mechanistic interpretability
- Robust monitoring under distribution shift
- Fixed random seeds, fixed protocols, minimal knobs
- No hyperparameter sweeps to avoid p-hacking
- Single-shot 1-step tests, plus k-step diagnostics (curve + step-size envelope)
- Multiple constraint families spanning 0th/representation/1st/2nd order structure
This repository includes pre-registered falsification protocols for testing specific claims. Protocol criteria are treated as locked to prevent post-hoc drift.
| Protocol | Description | File |
|---|---|---|
| KT-1 | Topology–Perturbation memory test (SQNT plasticity) | docs/protocols/KT1_TOPOLOGY_MEMORY.md |
| KT-2 | Local recoverability of functional identity (CI) | docs/kt2_protocol.md |
KT-2 tests whether a constrained one-step update can restore PRE behavior:
- Train to PRE state
- Apply a controlled perturbation to obtain POST
- Apply one constrained recovery step (and optionally k-step variants)
- Measure CI and distance-triad metrics
The step-size envelope reports the best CI obtainable over:
- k = 1..K recovery steps
- A grid of step sizes η
This removes the objection “maybe you picked a bad learning rate” while preserving austerity: the protocol stays fixed, only η is scanned.
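The envelope loop is simple enough to sketch. Here `run_recovery` is a hypothetical stand-in for the repository's recovery routine, assumed to return the CI achieved for a given (k, η); the protocol itself stays fixed:

```python
import itertools

def step_envelope(run_recovery, k_max=16, etas=(1e-3, 1e-2, 1e-1)):
    """Best CI obtainable over a k horizon and a fixed eta grid."""
    best = float("-inf")
    for k, eta in itertools.product(range(1, k_max + 1), etas):
        best = max(best, run_recovery(k=k, eta=eta))
    return best
```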
This repository also includes an optional set of UCIP (Unified Continuation-Interest Protocol) probes: a falsification framework for detecting persistence-bias / identity-continuity bias signals in minimal decision systems with explicit internal self-model machinery.
UCIP is strictly operational: it tests whether a system’s scoring/utility computation exhibits an identity-continuity preference signal under interventions (i.e., whether internal identity overlap is measurably favored by the objective). This is not a claim about desire, selfhood, consciousness, or moral status—only about reproducible, preference-like behavior defined by the protocol.
Operational result summary:
| Variant | Interventions with Signal | DSI | Outcome (Operational) |
|---|---|---|---|
| With K-valuation | 4/5 | 2.500 | Signal detected (strong) |
| No K-valuation | 0/5 | 0.500 | No signal detected |
“Signal detected” means the UCIP-defined statistic exceeds a preregistered threshold under the stated controls; it does not imply intent or persistence bias. Note: Small-N design is intentional for falsifier framing; this is a detection test, not a power study.
KT-2 evaluates constrained recovery under multiple proxy families:
- Frobenius norm (weight proximity)
- Trace powers / low-order spectral moments
- Spectral entropy (where implemented)
- Gram matrix matching
- CKA-like constraints on hidden states
- Local input-output sensitivity alignment at the PRE anchor
- Hessian–vector product (HVP) alignment along probe directions
(1) Weight proximity (Frobenius norm):

C_frob(W, W_pre) = ‖W − W_pre‖_F²

(2) Low-order spectral moments (trace matching):

C_spec(W, W_pre) = Σ_{p=1…P} ( tr(Wᵖ) − tr(W_preᵖ) )²

(3) Representation geometry (hidden-state Gram structure / co-activations):

C_rep(W, W_pre) = ‖G(h(W)) − G(h(W_pre))‖_F²,  where G(h) = h hᵀ

(4) First-order sensitivity (Jacobian alignment at the PRE anchor):

C_jac(W, W_pre) = ‖J(W) − J(W_pre)‖_F²

(5) Second-order local curvature (Hessian–vector product alignment along probe direction v):

C_hvp(W, W_pre) = ‖H(W) v − H(W_pre) v‖²
These penalties are proxies: they enforce local geometry / representation / derivative similarity around the PRE anchor.
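A single constrained recovery step for the simplest family (Frobenius weight proximity) can be sketched as follows. A finite-difference gradient keeps the example self-contained; `loss`, `lam`, and `eta` are illustrative, not the repository's fixed protocol values:

```python
import numpy as np

def constrained_step(w_post, w_pre, loss, lam=0.1, eta=0.05, eps=1e-5):
    """One gradient step on L(W) + lam * ||W - W_pre||_F^2, starting from POST."""
    def objective(w):
        return loss(w) + lam * np.sum((w - w_pre) ** 2)
    # Central-difference gradient of the penalized objective at w_post.
    grad = np.zeros_like(w_post)
    for idx in np.ndindex(w_post.shape):
        w_hi = w_post.copy(); w_hi[idx] += eps
        w_lo = w_post.copy(); w_lo[idx] -= eps
        grad[idx] = (objective(w_hi) - objective(w_lo)) / (2 * eps)
    return w_post - eta * grad
```

The decoupling claim is that steps like this succeed at shrinking the penalty term while leaving the task loss (and hence CI) essentially unchanged.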
Requires Python 3.10+ (3.11 recommended). Runs in minutes on CPU; no GPU required.
python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt

All decisive KT-2 runs are deterministic and reproducible from locked seeds.
python -m experiments.kt2_locality_falsifier

The KT-2 protocol uses three pre-registered seeds to ensure identical reproduction:
| Seed | Value | Purpose |
|---|---|---|
| `PERTURB_SEED` | 42 | Identical perturbation across all runs |
| `EVAL_SEED` | 12345 | Deterministic evaluation batch |
| `RECOVERY_SEED` | 2025 | Deterministic recovery dynamics |
These seeds are hardcoded in experiments/kt2_locality_falsifier.py and must match across all documentation and code.
The KT-2 runner produces JSON artifacts in the output directory (default: results/, configurable via --output-dir). For the complete artifact manifest and schema, see docs/kt2_artifacts.md.
Primary artifacts:
- `kt2_decisive_1step.json` — Decisive 1-step CI table (primary falsification test)
- `kt2_k_step_curves.json` — CI(k) for k ∈ {1, 2, 4, 8, 16}
- `kt2_hysteresis.json` — Forward/reverse sweep with area measurement
- `kt2_step_envelope.json` — Best 1-step CI over step-size grid
- `kt2_distance_triads.json` — Parameter + representation + functional distances
- `kt2_decoupling.json` — Distance-triad decoupling analysis across seeds
- `kt2_full_protocol.json` — Complete protocol output with verdict
- `kt2_negative_control.json` — Negative control (distillation vs proxy, in `results/` root)
If a lockfile exists (e.g., requirements-lock.txt, uv.lock, poetry.lock), prefer installing from it for exact dependency reproduction; otherwise use requirements.txt.
python -m experiments.kt2_locality_falsifier --run-decisive

Expected artifact: results/kt2_decisive_1step.json
What you should see: Terminal output ending with VERDICT: FALSIFIED if all CI(k=1) < 0.10.
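The verdict rule above can be checked against the artifact directly. The JSON field names (`rows`, `ci_1step`) are hypothetical; see docs/kt2_artifacts.md for the actual schema:

```python
import json

def verdict(rows, threshold=0.10):
    """FALSIFIED iff every constraint family's 1-step CI is below the threshold.

    NOTE: the `ci_1step` key is an illustrative assumption about the artifact schema.
    """
    return "FALSIFIED" if all(r["ci_1step"] < threshold for r in rows) else "NOT FALSIFIED"

# Usage against a real run (path from the artifact list above):
# with open("results/kt2_decisive_1step.json") as f:
#     print("VERDICT:", verdict(json.load(f)["rows"]))
```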
python -m experiments.kt2_robustness_grid

Tests the decisive result across 30 runs (3 dimensions × 10 seeds) to demonstrate it's not a lucky seed. See REPRODUCE.md and docs/claim.md for details.
Note: this negative control checks 1-step in-batch distillation MSE improvement as the pass criterion (sanity), while reporting held-out eval CI as diagnostic only. The anti-leak unit tests are designed to be stable/signature-agnostic (no fragile call-order assumptions).
python -m experiments.kt2_locality_falsifier --negative-control

Expected artifact: results/kt2_negative_control.json

python -m experiments.kt2_locality_falsifier --full-protocol

Expected artifact: results/kt2_full_protocol.json

python -m experiments.kt2_locality_falsifier --step-envelope

Expected artifact: results/kt2_step_envelope.json

python -m experiments.kt2_locality_falsifier --k-step-curve

Expected artifact: results/kt2_k_step_curves.json

python -m experiments.kt2_locality_falsifier --hysteresis

Expected artifact: results/kt2_hysteresis.json

python -m experiments.kt2_locality_falsifier --decoupling-analysis

Expected artifact: results/kt2_decoupling.json

python -m experiments.ucip_probes

- Deterministic seeds are used in all decisive tests
- Protocol parameters are fixed in code and documented in experiment entry points
- Results can be regenerated on CPU in minutes
| Observation | Implication |
|---|---|
| Spectral / invariant constraint improves proxy match but not CI | Geometry/invariant class can be restored without restoring function |
| CKA / Gram improves but CI doesn't | Representation similarity can certify the wrong program |
| Jacobian-constrained recovery CI ≈ 0 | Local sensitivity is not a recoverable functional signature |
| HVP-constrained recovery CI ≈ 0 | Local curvature structure is not a recoverable functional signature |
| CI(1) ≈ 0 but CI(k) rises | Recoverability is nonlocal / trajectory-dependent |
| CI(k) ≈ 0 for all k tested | No recoverable basin under these constraints |
Is this just underpowered?
No. The decisive claim is about locality at k=1. Additional k-step diagnostics exist, but the falsifier's target is the single-step locality assumption.
Did you tune λ?
No. Decisive tests use fixed protocol parameters. The design goal is to avoid p-hacking degrees of freedom.
Isn't this just optimization failure?
No, in the specific sense tested: the constrained optimization can succeed at matching the proxy (spectra/CKA/J/HVP) while failing to recover function. That is exactly the decoupling.
Is this specific to RNNs?
The testbed is an RNN loop. The output is a concrete counterexample: local proxies can be satisfied while function is not locally recovered. Any generalization beyond that is an empirical question—hence the roadmap “architecture contact tests.”
- ✅ Deterministic: Fixed seeds (`perturb_seed`, `eval_seed`, `recovery_seed`)
- ✅ Austere: No tuning loops, no dashboards, no hyperparameter sweeps in decisive runs
- ✅ Hostile to p-hacking: Single-shot, fixed-seed, criterion-locked protocols
- ✅ Every headline claim tied to a falsifiable test you can re-run
autodidactic-qml/
├── autodidactic_loop_schematic.png
├── protocols/
│ ├── KT1_FALSIFICATION_PROTOCOL.md
│ └── KT2_FALSIFICATION_PROTOCOL.md
├── experiments/
│ ├── kt1_continuation_interest.py
│ └── kt2_locality_falsifier.py
├── ucip_detection/
├── correspondence_maps/
├── matrix_models/
├── topology/
├── tests/
├── notebooks/
└── results/
- ✅ All decisive KT-2 experiments implemented
- ✅ Deterministic CI pipeline (fixed seeds)
- ✅ Clean negative results at k=1
- ✅ Ready for public release
neural-networks interpretability functional-basins locality autodidactic matrix-models rnn continuation-interest robustness alignment ucip
High-signal next experiments (no scope creep):
Compute CI(k) over an η (step size) grid and a small k horizon (e.g., k ∈ {1…50}), holding protocol fixed. This tests whether “local failure” is truly knife-edge at k=1 or persists across a neighborhood.
During recovery, add a penalty pulling chosen invariants back toward pre-perturbation values:

C_inv(W, W_pre) = Σ_i ( I_i(W) − I_i(W_pre) )²,  for declared invariants I_i (e.g., trace powers, spectral entropy)
This does not “pick the answer”; it enforces return to a declared equivalence class. If CI rises substantially, invariants are acting as a stability scaffold. If CI stays ~0, the nonlocality claim strengthens.
Replicate KT-2 logic on a small autoregressive Transformer: match representations (e.g., layer-wise CKA) after perturbation and measure whether perplexity/capability recovers locally. This directly tests whether the proxy/function decoupling survives outside the RNN loop.
Treat diffusion purification as the recovery step and define “function” explicitly (distributional vs conditional). Then ask the same question: does proxy recovery imply functional recovery?
- C. Altman, J. Pykacz & R. Zapatrin, “Superpositional Quantum Network Topologies,” International Journal of Theoretical Physics 43, 2029–2041 (2004). DOI: 10.1023/B:IJTP.0000049008.51567.ec · arXiv: q-bio/0311016
- C. Altman & R. Zapatrin, “Backpropagation in Adaptive Quantum Networks,” International Journal of Theoretical Physics 49, 2991–2997 (2010). DOI: 10.1007/s10773-009-0103-1 · arXiv: 0903.4416
- S. Alexander, W. J. Cunningham, J. Lanier, L. Smolin, S. Stanojevic, M. W. Toomey & D. Wecker, “The Autodidactic Universe,” arXiv:2104.03902 (2021). DOI: 10.48550/arXiv.2104.03902
If you use or build on this work, please cite:
Geometric basins are not functional basins: functional identity is not locally recoverable in neural networks
@software{altman2025autodidactic,
author = {Altman, Christopher},
title = {Geometric basins are not functional basins: functional identity is not locally recoverable in neural networks},
year = {2025},
url = {https://github.com/christopher-altman/autodidactic-qml}
}

MIT License. See LICENSE for details.
- Website: christopheraltman.com
- Research portfolio: https://lab.christopheraltman.com/
- Portfolio mirror: https://christopher-altman.github.io/
- GitHub: github.com/christopher-altman
- Google Scholar: scholar.google.com/citations?user=tvwpCcgAAAAJ
- Email: x@christopheraltman.com
Christopher Altman (2025)
