How to use CoreML on MacOS #81
I'm trying to enable CoreML on a FeatureExtraction pipeline, on an M1, with the model jinaai/jina-embeddings-v2-base-code. I start the session with:

```go
session, err := hugot.NewORTSession(
	options.WithOnnxLibraryPath(onnxPath),
	options.WithCoreML(make(map[string]string)),
)
```

Then I create the pipeline:

```go
config := hugot.FeatureExtractionConfig{
	ModelPath: modelPath,
	Name:      modelName,
}
p, err := hugot.NewPipeline(session, config)
```

And finally I run the pipeline:

```go
results, err := pipeline.RunPipeline([]string{text})
```

I build with the ORT tag, and if I remove the

Maybe it's the empty map of options, but I couldn't find any reference for the possible option values, only for the old version where a uint was used.
Hi @fabiodcorreia, it would be good to work through this together, as neither Riccardo nor I have any Apple devices to test on. So far I have just enabled the execution provider that was added in this commit: yalue/onnxruntime_go@217ac59. It shows the options that can be set, which are also documented here: https://onnxruntime.ai/docs/execution-providers/CoreML-ExecutionProvider.html. Could you also send me the path to the ORT library you installed? Was it the one here? https://github.com/microsoft/onnxruntime/releases/download/v1.22.0/onnxruntime-osx-arm64-1.22.0.tgz Perhaps also try the args from this Python example:
Here is what I found so far. Generating embeddings from 57 files with chunks of up to 2500 chars, using the model jina-embeddings-v2-base-code with one goroutine per core, I get the following results. Using just ONNX without CoreML takes ~1:30 and memory usage is flat at around 2 GB. Enabling CoreML:

When using MLProgram with other flags like MLComputeUnits, it looks like it falls back to ONNX without CoreML, because it takes the same time, memory is also flat at around 2 GB, and there is no GPU usage. With a smaller model, sentence-transformers/all-mpnet-base-v2, ONNX without CoreML takes 14s and with CoreML 44s. It looks like, regardless of scale, CoreML is always slower, and the strange thing is that I don't see any GPU activity. Besides CoreML, are there other options to use Metal on macOS? I would prefer ORT because I already have the scripts to download the ONNX runtime and everything ready for cross-compilation, and XLA looked more complex to ship as a dependency.
Hi @fabiodcorreia, before seeing this chat I did start taking a look yesterday at implementing the crossEncoder pipeline, since everyone seems to want it lol. Do you want me to leave it for you to contribute, or do you prefer that I implement it?
With Python I got the same behavior, but it shows more warning logs.

Setting `RequireStaticInputShapes=1` is also a lot faster compared to the value 0; with 0 it consumes all the memory of the computer most of the time. So in summary, the behavior of hugot is the same as Python's. Bad news for me, because after a run of 150 files I could fry eggs on the bottom of my MacBook, and my goal is to reach 4k files :/
Also, I don't see any advantage of CoreML vs ONNX only, at least for text embedding and cross-encoding text. Maybe for…