Releases: unum-cloud/UForm

v2.1.0

14 Apr 00:50

2.1.0 (2024-04-14)

Fix

  • Image preprocessing in Swift (f2772d0)

Improve

  • Fetching nested configs (729b9d9)

v2.0.2

28 Mar 20:43

2.0.2 (2024-03-28)

Make

  • Fix PyPi CI version with hash (364afe6)

v2.0.1

28 Mar 20:38

2.0.1 (2024-03-28)

Multimodal Matryoshka, Multimodal DPO, and ONNX πŸŽ‰

28 Mar 20:35

DPO Preview

Today we are releasing a new batch of multimodal models trained with Nebius and already available on HuggingFace πŸ€—

  1. Matryoshka-style multimodal embeddings that can be truncated to 64, 256, or 768 dimensions πŸ–ΌοΈ
  2. Improved multimodal chat in 1.2B parameters, tuned with Direct Preference Optimization πŸ’¬
  3. ONNX backend, making the PyTorch dependency optional for lightning-fast deployments ⚑
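Matryoshka-style training makes a prefix of the full embedding a usable embedding on its own, so at query time you can simply slice the vector to 64, 256, or 768 dimensions and re-normalize. A minimal NumPy sketch of that slicing step (illustrative only, not UForm's actual API):

```python
import numpy as np

def truncate_matryoshka(embedding: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates and re-normalize to unit length."""
    head = embedding[:dim]
    return head / np.linalg.norm(head)

# A toy unit-norm "full" embedding standing in for a real model output.
full = np.random.default_rng(0).normal(size=768)
full /= np.linalg.norm(full)

for dim in (64, 256, 768):
    small = truncate_matryoshka(full, dim)
    assert small.shape == (dim,)
    assert np.isclose(np.linalg.norm(small), 1.0)
```

Truncating to 64 dimensions shrinks vector-index memory 12x relative to the full 768, at some cost in retrieval quality.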

v1.1.1: Polishing the Repo

23 Feb 18:14

Many thanks to @lmmx, @blackforestboi, and @kapulkin for their patches to the project!


  • Performance observations for M2 CPUs (#56) (8374ef6), closes #56
  • Passing labels to text_decoder to compute loss. (#65) (f445a8b), closes #65
  • Larger batch benchmarks (fdc8587)
  • pre-commit config and linters (#62) (0a3efac), closes #62

v1.1.0

15 Feb 18:08

1.1.0 (2024-02-15)

v1.0.3

29 Dec 01:45

1.0.3 (2023-12-29)

v1.0.2

28 Dec 17:46

1.0.2 (2023-12-28)

UForm v1: Multimodal Chat in 1.5 Billion Parameters

28 Dec 17:33

The UForm family of tiny multimodal transformer models just got bigger! In addition to the existing CLIP-like embedding models, we now have a generative model useful for image captioning, visual question answering, and multimodal chats. All that in just 1.5 billion parameters, small enough to fit even on mobile devices πŸŽ‰

Repository: https://github.com/unum-cloud/uform
Generative model: https://huggingface.co/unum-cloud/uform-gen
Chat model: https://huggingface.co/unum-cloud/uform-gen-chat

Evaluation Metrics

Being the smallest model of its kind, unum-cloud/uform-gen is hard to compare to others. The next-smallest alternatives, LLaVA and InstructBLIP, are roughly 5x larger at 7 billion parameters. LLaVA performs noticeably better on VQAv2 (78.5 vs. 66.5); on captioning, CLIPScore and RefCLIPScore are relatively close across all models.

| Model | Size | Caption Length | CLIPScore | RefCLIPScore |
| :--- | :--- | :--- | :--- | :--- |
| llava-hf/llava-1.5-7b-hf | 7B | Long | 0.878 | 0.529 |
| llava-hf/llava-1.5-7b-hf | 7B | Short | 0.886 | 0.531 |
| Salesforce/instructblip-vicuna-7b | 7B | Long | 0.902 | 0.534 |
| Salesforce/instructblip-vicuna-7b | 7B | Short | 0.848 | 0.523 |
| unum-cloud/uform-gen | 1.5B | Long | 0.847 | 0.523 |
| unum-cloud/uform-gen | 1.5B | Short | 0.842 | 0.522 |
| unum-cloud/uform-gen-chat | 1.5B | Long | 0.860 | 0.525 |
| unum-cloud/uform-gen-chat | 1.5B | Short | 0.858 | 0.525 |
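CLIPScore is reference-free: it scores a caption by the cosine similarity between CLIP embeddings of the image and the generated text (RefCLIPScore additionally incorporates reference captions). A minimal sketch of the core computation with toy vectors; the clipping-and-scaling form with weight 2.5 follows the original CLIPScore paper and is an assumption here, not necessarily the exact protocol behind the table above:

```python
import numpy as np

def clipscore(image_emb: np.ndarray, caption_emb: np.ndarray, w: float = 2.5) -> float:
    """Reference-free CLIPScore: w * max(cos(image, caption), 0)."""
    img = image_emb / np.linalg.norm(image_emb)
    cap = caption_emb / np.linalg.norm(caption_emb)
    return w * max(float(img @ cap), 0.0)

# Toy embeddings standing in for real CLIP outputs.
img = np.array([0.6, 0.8, 0.0])
good = np.array([0.6, 0.8, 0.1])   # nearly aligned with the image
bad = np.array([-0.6, -0.8, 0.0])  # opposite direction

assert clipscore(img, good) > clipscore(img, bad)
assert clipscore(img, bad) == 0.0  # negative cosine is clipped to zero
```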

Throughput

On an RTX 3090, using vanilla PyTorch for inference with bfloat16 arithmetic and greedy decoding, one should expect roughly the following throughput.

| Model | Size | Speed | Speedup |
| :--- | :--- | :--- | :--- |
| llava-hf/llava-1.5-7b-hf | 7B | ~ 40 tokens/second | |
| Salesforce/instructblip-vicuna-7b | 7B | ~ 40 tokens/second | |
| unum-cloud/uform-gen | 1.5B | ~ 140 tokens/second | x 3.5 |
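Greedy decoding, used for the benchmark above, simply appends the highest-scoring token at every step with no sampling, so throughput scales directly with per-step model latency. A minimal sketch with a toy logits function standing in for the real forward pass (all names here are illustrative, not UForm's API):

```python
import numpy as np

def greedy_decode(step, prompt, max_new_tokens=8, eos=0):
    """Greedy decoding: append the argmax token at each step.

    `step` maps the token sequence so far to next-token logits;
    here a toy stand-in replaces a real model forward pass.
    """
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        next_token = int(np.argmax(step(tokens)))
        tokens.append(next_token)
        if next_token == eos:  # stop at the end-of-sequence token
            break
    return tokens

def toy_step(tokens, vocab=5):
    # Deterministic toy logits: always favor (last token + 1) mod vocab.
    logits = np.zeros(vocab)
    logits[(tokens[-1] + 1) % vocab] = 1.0
    return logits

out = greedy_decode(toy_step, [3])  # emits 4, then 0 (eos) and stops
```

In a real benchmark, `step` would be the bfloat16 model forward pass, and tokens/second is just tokens generated divided by wall-clock time; the 3.5x figure above is 140 / 40.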

v0.4.8

13 Oct 05:07

0.4.8 (2023-10-13)

Make

  • pass ANACONDA_API_TOKEN as env. var. (ed020d3)