
Bug 2010698 - add a basic concept interface for an LLM #45

Closed
MatthewTighe wants to merge 1 commit into mozilla-firefox:autoland from MatthewTighe:s2s-concept

Conversation

@MatthewTighe

No description provided.

@github-actions

Warning

The base branch is currently set to main. Please Edit this PR and set the base to autoland.

@github-actions

View this pull request in Lando to land it once approved.

@segunfamisa

@MatthewTighe I think this PR should be targeting autoland

Comment on lines 34 to 70
FeatureStatus.DOWNLOADING -> {
    Llm.Response.InProgress("already downloading")
}
FeatureStatus.DOWNLOADABLE -> {
    val result = model.download().onEach {
        logger("Download update: $it")
        yield()
    }.first { status ->
        status == DownloadStatus.DownloadCompleted || status is DownloadStatus.DownloadFailed
    }

    if (result is DownloadStatus.DownloadFailed) {
        Llm.Response.Failure("download failed")
    } else {
        model.getPromptResponse(prompt)
    }
@MatthewTighe (author):

I wasn't exactly sure what to do here - in our use case we will probably intentionally block summarization from calling prompt twice but we can't guarantee it. Without knowing more about how model.download() functions, I didn't feel confident trying to share the flow between the DOWNLOADING and DOWNLOADABLE cases. For example, if it's called twice, does it avoid interrupting an existing download? Basically, I had been considering something like

Suggested change
Current:
FeatureStatus.DOWNLOADING -> {
    Llm.Response.InProgress("already downloading")
}
FeatureStatus.DOWNLOADABLE -> {
    val result = model.download().onEach {
        logger("Download update: $it")
        yield()
    }.first { status ->
        status == DownloadStatus.DownloadCompleted || status is DownloadStatus.DownloadFailed
    }
    if (result is DownloadStatus.DownloadFailed) {
        Llm.Response.Failure("download failed")
    } else {
        model.getPromptResponse(prompt)
    }

Suggested:
FeatureStatus.DOWNLOADING -> {
    model.resumeDownload(prompt)
}
FeatureStatus.DOWNLOADABLE -> {
    model.resumeDownload(prompt)
}
...
suspend fun GenerativeModel.resumeDownload(prompt: Prompt): Llm.Response {
    // if a download flow is already active, cancel it and return Llm.Response.InProgress
    // otherwise, do the same thing we're doing above
    val result = download().onEach {
        logger("Download update: $it")
        yield()
    }.first { status ->
        status == DownloadStatus.DownloadCompleted || status is DownloadStatus.DownloadFailed
    }
    return if (result is DownloadStatus.DownloadFailed) {
        Llm.Response.Failure("download failed")
    } else {
        getPromptResponse(prompt)
    }
}

but I don't think we have a good mechanism for canceling the in-progress Flow

@segunfamisa:

I believe that cancelling the job or the scope in which we start observing the flow should cancel the request.

We can verify this with our PoC branch. I think we should expect to see that it will cancel.
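
For reference, a minimal standalone sketch (fakeDownload() is a hypothetical stand-in for model.download(), not a real API) showing that cancelling the collecting job also cancels the cold flow's producer:

import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for model.download(): a cold flow that emits progress updates.
fun fakeDownload(): Flow<Int> = flow {
    repeat(100) { pct ->
        emit(pct)
        delay(50)
    }
}

fun main() = runBlocking {
    val scope = CoroutineScope(Dispatchers.Default)
    val job = scope.launch {
        fakeDownload().collect { println("Download update: $it%") }
    }
    delay(200)
    // Cancelling the collecting job (or its scope) cancels the flow's producer too.
    job.cancel()
    job.join()
    println("download collection cancelled")
}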

@MatthewTighe

I went ahead and added an implementation as we discussed on Slack. None of what's here is necessarily expected to survive first contact with how we actually want to use it, but it felt like a good starting place.

@MatthewTighe changed the base branch from main to autoland on January 28, 2026 at 00:44

@segunfamisa left a comment:

Thanks for this PR. We looked at this at today's review-n-brew and there were some questions, some I could answer, and some I couldn't.

In the end, it turned out to be a good idea to try to work out an implementation alongside the interface we thought about.

I posted a mix of those comments with my own review comments. Let me know your thoughts.

/**
 * A prompt request delivered to the LLM for inference.
 */
suspend fun prompt(prompt: Prompt): Response
@segunfamisa:

Q from review-n-brew:

We think that the prompt streams the response, and we should consider making this a Flow since we may get more than one response
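
For illustration, one possible Flow-based shape (Prompt and Response here are simplified stand-ins, not the actual types in this patch):

import kotlinx.coroutines.flow.Flow

// Simplified stand-in types for illustration only.
data class Prompt(val text: String)

sealed interface Response {
    data class Chunk(val text: String) : Response   // partial output as it streams
    data class Failure(val reason: String) : Response
}

interface StreamingLlm {
    // Emits partial responses as the model produces them; the flow completes
    // when inference finishes (or after emitting a Failure).
    fun prompt(prompt: Prompt): Flow<Response>
}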

@segunfamisa:

I looked in the docs, and saw that the Gemini Nano one has both a streaming and non-streaming API.

I think our LLM abstraction should accommodate both, but if we just want to simplify things in the beginning, then we need to explicitly state the behavior of the prompt() function in the doc comment - whether it returns only when the request is complete, or whether it has streaming capability.
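
A rough sketch of an interface that accommodates both, with the behavior spelled out in doc comments (names and stand-in types are illustrative, not from this patch):

import kotlinx.coroutines.flow.Flow

// Stand-in types for illustration.
data class Prompt(val text: String)
data class Response(val text: String)

interface Llm {
    /** Suspends until inference is complete and returns the full response; no partial output. */
    suspend fun prompt(prompt: Prompt): Response

    /** Emits partial responses as they are produced; completes when inference finishes. */
    fun promptStreaming(prompt: Prompt): Flow<Response>
}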

@segunfamisa:

I really think we should consider using the streaming response, because with the non-streamed version we will have to wait until the model is done with inference before the user sees anything, and then we start to fake a "streaming" UI.

That's going to feel verrrryyyy slow.

FeatureStatus.DOWNLOADING -> {
    Llm.Response.InProgress("already downloading")
}
FeatureStatus.DOWNLOADABLE -> {
@segunfamisa:

Q from review-n-brew:

How is this going to be represented to the user? Especially since there will be a download state.

 * inference.
 */
class GeminiNanoLlm(
    private val buildModel: () -> GenerativeModel = { Generation.getClient() },
@segunfamisa, Jan 28, 2026:

I think this way, we stand the risk of calling buildModel more than once, and it seems like it could be expensive.

I think we could create a lazy member that calls it, to ensure we only build the model once.

class GeminiNanoLlm(....) {

    // buildModel() is invoked at most once, on first access of `model`.
    private val model by lazy { buildModel() }
}

Comment on lines 38 to 43
val result = model.download().onEach {
    logger("Download update: $it")
    yield()
}.first { status ->
    status == DownloadStatus.DownloadCompleted || status is DownloadStatus.DownloadFailed
}
@segunfamisa:

Some questions that came out:

  1. How long does this download take? Do we know the size of the download?
  2. Do we need to communicate to the user that we are going to need to download the model?
  3. Do we need to do things like ensure we're on WiFi or an unmetered network before we do a download?
  4. At what point in the user journey do we get here?

If we are going to need to let the user know that we are going to download, then we might need an intermediate state where we ask the user for permission, or some other way to communicate that we will need to download the models.

It seems we may need a sequence diagram to surface the user flow for us, but that may depend on UX.

@MatthewTighe (author):

This is a question I was grappling with quite a bit yesterday, and it was the motivation behind the Preparing availability status. Each of our implementations will have intermediate states, but they are different.

For Nano, this will be downloading the model.
For MLPA, this will be authentication.

These are probably both worth visualizing to the user, but that requires abstracting over some kind of visualization or messaging if we want to share a common API. I think for now we can just expose strings (resources, once we have copy) to distinguish between the different types of preparing.

I agree that we want user consent for network downloading (or at least strong network detection), but I think we could probably handle it in a follow-up and should get UX/product involvement.
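
For example, a minimal sketch of what that could look like (names are illustrative, not the actual types in this patch):

sealed interface AvailabilityStatus {
    object Available : AvailabilityStatus
    object Unavailable : AvailabilityStatus

    // Nano would report something like "downloading model"; MLPA "authenticating".
    data class Preparing(val detail: String) : AvailabilityStatus
}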

@MatthewTighe (author):

I've engaged UX for this in our Slack channel


/**
 * An abstract definition of an LLM that can receive prompts.
 */
interface Llm {
@segunfamisa:

I think we should consider adding the ability to warm up and close the LLM for cleaning up resources.
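
Something along these lines (illustrative additions only; the existing prompt() is elided):

interface Llm {
    // ... existing prompt() elided ...

    /** Eagerly prepares the underlying model so the first prompt is fast. */
    suspend fun warmUp()

    /** Releases any resources held by the underlying model. */
    fun close()
}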

@MatthewTighe (author):

Can we handle this in a follow-up if and when it becomes necessary? Or do you see a need for it now?

/**
 * A prompt request delivered to the LLM for inference.
 */
suspend fun prompt(prompt: Prompt): Response
@segunfamisa:

I feel like the naming of this API could box us in.

For instance, if we decide to pass various configurations into the LLM, then the parameter name prompt no longer holds true.

I think a Request is generic enough to hold a prompt and future extensions.

@MatthewTighe (author), Jan 28, 2026:

Would it make sense to extend the API at that point, rather than try to plan for the future? Or perhaps prompt means something specific that I'm not understanding? I take it to mean any arbitrary string instruction input into an LLM.

For example, I can imagine a future where:

interface Llm {
    suspend fun prompt(prompt: Prompt)
    
    suspend fun request(request: Request)

    data class Request(val startingImage: Image, val config: Config, val prompt: Prompt)
}

I am open to the suggestion; I guess I'm just not fully understanding what use cases you foresee us not being able to accommodate down the line.

@MatthewTighe (author):

Overall, I think handling arbitrary string prompts is already pretty open, and we can build additional functionality (image processing, etc.) on top. I can't immediately think of a way to limit the prompt to only do summarization while still keeping the underlying types abstract.

@segunfamisa:

That makes sense. I think it's fine to leave it as is. It's not a strongly held thought from my end. Just wanted to float the idea.

/**
 * An abstract definition of an LLM that can receive prompts.
 */
interface Llm {
@segunfamisa:

The more I think about it, the more the name Llm suggests something much more abstract than what the rest of the class accommodates.

For example, an LLM can respond to images, generate content, proofread, etc., but the GeminiNanoLlm only does summarization. If we needed a proof-reading Gemini Nano implementation, we would not be able to use that same class, despite it being a GeminiNanoLlm.

For the specifics of our case, I think what we are really doing is using its ability to summarize, and I think the name Summarizer is more fitting.

That way, the Gemini Nano one can be named OnDeviceGeminiSummarizer.

Future developers who are using other capabilities of an LLM will then not assume they can use this just because it's named an Llm, and the usage will be specific enough.

My proposal is something like:

@JvmInline
value class SummarizationRequest(val content: String)

interface Summarizer {

    /** This does an async summarization request. It does *not* stream the response */
    suspend fun summarize(request: SummarizationRequest): Result<SummarizationResponse>
}

@MatthewTighe (author):

For example, an LLM can respond to images, generate content, proofread, etc., but the GeminiNanoLlm only does summarization.

Technically, GeminiNanoLlm as written will handle any text prompt, so it could be used to generate content, proofread, etc. Yesterday I was wondering if it makes sense to build an even more opinionated API on top of it to handle summarization specifically. We will want to construct the actual Llm somewhere. I'm thinking now that our planned LlmProvider might be the place to do that, and that we may need another abstraction layer between them. Something like:

// stick this in feature-summarizer
class Summarizer(val llm: Llm) {
    private fun buildPrompt(content: String) =
        Prompt("Summarize the following, using all these rules.......: $content")

    suspend fun summarize(content: String) =
        llm.prompt(buildPrompt(content))
}

object SummarizationLlmProvider {
    fun buildSummarizer(config: Config): Summarizer {
        val llm = /* logic to determine nano or mlpa based on config */
        return Summarizer(llm)
    }
}

WDYT?

@segunfamisa:

Makes sense. I think I misunderstood the Gemini API at the time I was reviewing. I thought we were using this one https://developers.google.com/ml-kit/genai/summarization/android that is explicitly a summarizer.

That's why I was struggling with the "Llm" abstraction, because I thought we were using the Summarizer model.

@segunfamisa:

That then makes me ask: why are we using "prompt" over "summarizer"? Do they use different models?

Or is it just a wrapper around the "prompt" model?

@MatthewTighe (author):

The "Summarizer" package strictly controls the output - IIRC you can only get output in a 3 bullet point format and you can't specify additional instructions. I've seen in one of the iOS briefs things like "for recipes, summarize this way..." etc.

The "prompt" model allows for a more unrestricted, conversational style of interaction.

}
}

override suspend fun checkAvailability(): Llm.AvailabilityStatus = when (buildModel().checkStatus()) {
@segunfamisa:

What is this API going to be used for? My suspicion is that it is for us to know whether this "LLM" has the capability we want, for example if we are running on a device that does not have Gemini Nano.

And are we going to do anything specifically if it's "Preparing"?

If we want to use it to know whether we have this capability or not, then I wonder whether all we need from checkAvailability() is a simple yes or no.


My line of questioning is edging towards the idea of unifying both APIs into one flow, such that the result here represents the "status" of the prompt request.

something like this:

[diagram: a single flow whose emissions represent the status of the prompt request]
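
In code, the idea might look roughly like this (state names and types are illustrative, not from this patch):

import kotlinx.coroutines.flow.Flow

// Stand-in type for illustration.
data class Prompt(val text: String)

// One flow carries both preparation status and the eventual result.
sealed interface PromptState {
    data class Preparing(val detail: String) : PromptState   // e.g. downloading model, authenticating
    data class Responding(val partial: String) : PromptState // streamed chunks, if supported
    data class Complete(val text: String) : PromptState
    data class Failed(val reason: String) : PromptState
}

interface Llm {
    /** Single entry point: callers observe preparation, progress, and the response from one flow. */
    fun prompt(prompt: Prompt): Flow<PromptState>
}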

@MatthewTighe (author):

I think I was just overthinking this yesterday - combining the two APIs makes total sense. I was trying to allow for a second API consumer to inspect the model state and proactively choose not to engage with it - like if it were unavailable or already processing - but a flow already accomplishes that. Thanks for such a helpful diagram!

@segunfamisa left a comment:

Thanks a lot for this patch, Matt.

I appreciate the effort and the conversations that have come out of it.

As we discussed on Slack, I think we have gotten the value we can get from review - and it's still heading in the direction of our design, though specifics will evolve as the feature gets more defined.

We can proceed now and iterate, instead of being blocked on figuring everything out in one go.

Thank you!

@MatthewTighe force-pushed the s2s-concept branch 3 times, most recently from cb72eaa to d06df93 on January 30, 2026 at 18:56
@lando-prod-mozilla

Pull request closed by commit 423a0aa
