Uniform Manifold Approximation with Two-phase Optimization (IEEE VIS 2022 short)

hyungkwonko/umato

UMATO

Uniform Manifold Approximation with Two-phase Optimization


Updated paper: IEEE TVCG (2025, author version)
DOI: 10.1109/TVCG.2025.3602735

UMATO is a dimensionality reduction (DR) technique designed to preserve both local neighborhoods and global manifold relationships in high-dimensional data. Existing DR methods often prioritize one of the two, which can lead to misleading interpretations of how manifolds are arranged. UMATO addresses this with a two-phase optimization strategy, improving reliability for visual analytics.

Key Contributions

  • Bridges local and global structures in a single projection workflow.
  • Two-phase optimization:
    1. Build a global skeletal layout using representative (hub) points.
    2. Project and optimize remaining points while preserving regional characteristics.
  • Improved stability against initialization and subsampling variation.
  • Strong scalability and competitive runtime on large datasets.
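
The hub-first idea behind the two phases can be illustrated with a deliberately simplified sketch. It is not UMATO's optimizer: hub selection here is random, the hub layout is plain PCA, and the remaining points are placed at an inverse-distance-weighted average of their nearest hubs with no further optimization. The function name and its arguments are hypothetical, and the input is assumed to have at least two features.

```python
import numpy as np


def toy_two_phase_layout(X, hub_num=10, n_nearest_hubs=3, seed=0):
    """Toy hub-first 2D layout. Illustration only, NOT UMATO's algorithm."""
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    if not 2 <= hub_num < n_samples:
        raise ValueError("hub_num must satisfy 2 <= hub_num < n_samples")

    # Phase 1 (assumption: hubs are picked at random here; UMATO selects
    # representative points instead).
    rng = np.random.default_rng(seed)
    hub_idx = rng.choice(n_samples, size=hub_num, replace=False)
    hubs = X[hub_idx]

    # Lay out the hubs in 2D with PCA computed through an SVD.
    centered = hubs - hubs.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    hub_emb = centered @ vt[:2].T

    # Phase 2 (assumption: no optimization, only interpolation). Each point
    # is placed at the weighted mean of its nearest hubs' 2D positions.
    dists = np.linalg.norm(X[:, None, :] - hubs[None, :, :], axis=2)
    k = min(n_nearest_hubs, hub_num)
    nearest = np.argsort(dists, axis=1)[:, :k]
    near_d = np.take_along_axis(dists, nearest, axis=1)
    weights = 1.0 / (near_d + 1e-12)
    weights /= weights.sum(axis=1, keepdims=True)
    emb = (weights[:, :, None] * hub_emb[nearest]).sum(axis=1)

    # Hubs keep their phase-1 positions exactly.
    emb[hub_idx] = hub_emb
    return emb, hub_idx
```

The sketch only shows the order of operations (global skeleton first, remaining points second). For real projections, use the UMATO class described in the API Reference.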

System Requirements

  • Python 3.9 or greater
  • scikit-learn
  • numpy
  • scipy
  • numba

Installation

UMATO is available via pip.

pip install umato

Quickstart

import umato
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
emb = umato.UMATO(hub_num=50).fit_transform(X)

For detailed algorithm background, see the API documentation.

API Reference

Main Class

from umato import UMATO

model = UMATO(
    n_neighbors=50,
    n_components=2,
    hub_num=300,
    metric="euclidean",
    global_n_epochs=100,
    local_n_epochs=50,
    global_learning_rate=0.0065,
    local_learning_rate=0.01,
    min_dist=0.1,
    spread=1.0,
    gamma=0.1,
    negative_sample_rate=5,
    init="pca",
    random_state=42,
    verbose=False,
    n_jobs=None,
    execution_mode="deterministic",
)

Key Parameters

  • n_neighbors (int, default=50): neighborhood size used to build local structure.
  • hub_num (int, default=300): number of representative hubs used in the global phase (2 <= hub_num < n_samples).
  • n_components (int, default=2): output embedding dimensionality.
  • metric (str, default="euclidean"): distance metric for neighbor search (e.g., "euclidean", "cosine", "precomputed").
  • init ("pca" | "random" | "spectral" or ndarray, default="pca"): initialization for hub layout.
  • global_n_epochs / local_n_epochs (int): optimization epochs for each phase (defaults: 100 / 50).
  • global_learning_rate / local_learning_rate (float): learning rates for global and local optimization.
  • min_dist, spread: shape parameters controlling embedding compactness and spacing.
  • gamma (float, default=0.1): repulsion strength in local optimization.
  • negative_sample_rate (int, default=5): number of negative samples per positive edge.
  • random_state: seed or RandomState for reproducibility.
  • verbose (bool): print progress logs.
  • n_jobs (int | None, default=None): thread count for internal computation. Use -1 to use all available CPU cores; use a positive integer to pin a specific thread count.
  • execution_mode (str, default="deterministic"): optimization mode. Allowed values are "deterministic" and "fast".
    • "deterministic": recommended default for reproducibility.
    • "fast": may reduce runtime on large datasets by enabling more aggressive parallel updates.
    • invalid values raise ValueError.
  • Validation constraints:
    • n_jobs must be None, -1, or a positive integer. 0 and values below -1 raise ValueError.
    • execution_mode must be one of "deterministic" or "fast"; any other value raises ValueError.
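
The constraints listed above can be checked before constructing a model. The following is a minimal sketch of such a pre-flight check; `check_umato_args` is a hypothetical helper, not part of the umato package, and it only mirrors the rules stated in this section (the library's own error messages and any additional checks are not reproduced).

```python
def check_umato_args(n_samples, hub_num=300, n_jobs=None,
                     execution_mode="deterministic"):
    """Hypothetical pre-flight check mirroring the documented constraints."""
    # Documented rule: 2 <= hub_num < n_samples.
    if not 2 <= hub_num < n_samples:
        raise ValueError(
            f"hub_num={hub_num} must satisfy 2 <= hub_num < n_samples "
            f"(n_samples={n_samples})"
        )
    # Documented rule: n_jobs is None, -1, or a positive integer.
    if n_jobs is not None:
        if isinstance(n_jobs, bool) or not isinstance(n_jobs, int):
            raise ValueError("n_jobs must be None, -1, or a positive integer")
        if n_jobs == 0 or n_jobs < -1:
            raise ValueError("n_jobs must be None, -1, or a positive integer")
    # Documented rule: only two execution modes are allowed.
    if execution_mode not in ("deterministic", "fast"):
        raise ValueError('execution_mode must be "deterministic" or "fast"')
```

Note that with the default hub_num=300 this check fails for the 150-sample iris dataset, which is why the examples in this README pass hub_num=50.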

Methods

  • fit(X): learn embedding from input X.
  • fit_transform(X) -> ndarray: fit and return low-dimensional embedding.

Attributes (after fitting)

  • embedding_: final embedding of shape (n_samples, n_components).
  • graph_: fuzzy simplicial graph used in optimization.
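
As a sketch of the fit-then-read pattern, the helper below calls fit(X) and then reads embedding_, checking the documented (n_samples, n_components) shape. `fit_and_get_embedding` is a hypothetical helper; it works with any object exposing fit and embedding_, so it can be tried without umato installed.

```python
def fit_and_get_embedding(model, X, n_components=2):
    """Fit `model`, then return model.embedding_ after a basic shape check."""
    # Call fit and read the attribute separately, so no assumption is made
    # about what fit returns.
    model.fit(X)
    emb = model.embedding_
    if len(emb) != len(X):
        raise RuntimeError("embedding_ has a different number of rows than X")
    if any(len(row) != n_components for row in emb):
        raise RuntimeError("embedding_ rows do not have n_components columns")
    return emb


# With umato and scikit-learn installed, usage would look like this:
#
#   from umato import UMATO
#   from sklearn.datasets import load_iris
#   X, _ = load_iris(return_X_y=True)
#   emb = fit_and_get_embedding(UMATO(hub_num=50, random_state=42), X)
```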

Example

import umato
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
emb = umato.UMATO(
    n_neighbors=30,
    hub_num=50,
    init="pca",
    random_state=42,
).fit_transform(X)

Speed-oriented Example

import umato
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
emb = umato.UMATO(
    n_neighbors=30,
    hub_num=50,
    init="pca",
    random_state=42,
    n_jobs=-1,
    execution_mode="fast",
).fit_transform(X)
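
Whether "fast" mode actually helps depends on the dataset and machine, so it is worth measuring. Below is a minimal timing sketch; `time_fit_transform` is a hypothetical helper that accepts any zero-argument factory returning an object with fit_transform. It reports the best of several runs because the first call may include one-time costs such as numba compilation.

```python
import time


def time_fit_transform(make_model, X, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` fresh fits."""
    best = float("inf")
    for _ in range(repeats):
        model = make_model()  # fresh model for every run
        start = time.perf_counter()
        model.fit_transform(X)
        best = min(best, time.perf_counter() - start)
    return best


# With umato installed, a comparison could look like this:
#
#   import umato
#   t_det = time_fit_transform(
#       lambda: umato.UMATO(hub_num=50, random_state=42), X)
#   t_fast = time_fit_transform(
#       lambda: umato.UMATO(hub_num=50, random_state=42,
#                           n_jobs=-1, execution_mode="fast"), X)
```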

Execution Modes

  • deterministic (default, recommended): prioritizes reproducibility and stable behavior across runs.
  • fast: may reduce runtime on large datasets. Because updates are applied in parallel and their order can vary between runs, small embedding differences can occur compared with deterministic mode.
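
To see how much two runs differ on a given dataset, fit two fresh models with the same settings and compare the results. The following sketch assumes nothing about umato beyond fit_transform returning a 2D array-like; both helper names are hypothetical, and no particular difference value is promised for either mode.

```python
def max_abs_difference(a, b):
    """Largest element-wise absolute difference between two 2D array-likes."""
    return float(max(
        abs(x - y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    ))


def compare_runs(make_model, X):
    """Fit two fresh models on X and report how far their embeddings differ."""
    first = make_model().fit_transform(X)
    second = make_model().fit_transform(X)
    return max_abs_difference(first, second)


# With umato installed, usage would look like this:
#
#   import umato
#   diff = compare_runs(
#       lambda: umato.UMATO(hub_num=50, random_state=42,
#                           execution_mode="deterministic"), X)
```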

When to Use UMATO

UMATO is particularly useful when you need to:

  • Inspect cluster-level/global manifold arrangement without giving up local neighborhood readability.
  • Reduce interpretation risk from projections that over-emphasize either local compactness or global distance alone.
  • Analyze high-dimensional datasets with a focus on reliable visual analytics.

Findings

Detailed statistical data supporting UMATO’s accuracy and scalability are shown below.

Figure 1: Accuracy Analysis between Dimensionality Reduction Techniques

Average scores of nine DR techniques in the accuracy analysis. UMATO substantially outperforms baselines in global metrics with a slight sacrifice in local metrics.

Figure 2: Local and Global Metric Rankings

Ranking by local/global quality metrics. UMATO shows the strongest global-structure performance among compared methods.

Figure 3: Scalability with Large Datasets

Runtime analysis for large datasets. UMATO outperforms most nonlinear baselines and shows strong practical scalability.

Figure 4: Projection Subset Analysis

Projection subsets from the accuracy analysis. UMATO preserves global arrangement while remaining competitive on local structure.

Figure 5: Scalability with Small Datasets

Runtime analysis for small datasets. UMATO remains efficient while maintaining projection quality.

Citation

IEEE TVCG (2025) — Recommended

@article{jeon2025umato,
  title={UMATO: Bridging Local and Global Structures for Reliable Visual Analytics with Dimensionality Reduction},
  author={Jeon, Hyeon and Ko, Kwon and Lee, Soohyun and Hyun, Jake and Yang, Taehyun and Go, Gyehun and Jo, Jaemin and Seo, Jinwook},
  journal={IEEE Transactions on Visualization and Computer Graphics},
  year={2025},
  doi={10.1109/TVCG.2025.3602735}
}

IEEE VIS (2022) — Original Conference Paper

@inproceedings{jeon2022vis,
  title={Uniform Manifold Approximation with Two-phase Optimization},
  author={Jeon, Hyeon and Ko, Hyung-Kwon and Lee, Soohyun and Jo, Jaemin and Seo, Jinwook},
  booktitle={2022 IEEE Visualization and Visual Analytics (VIS)},
  pages={80--84},
  year={2022},
  organization={IEEE}
}
