feat: Add social media video import (YouTube, TikTok, Instagram) #6764
AurelienPautet wants to merge 17 commits into mealie-recipes:mealie-next from …
… insta, facebook, youtube...)
…into video-parser
docs/docs/documentation/getting-started/installation/backend-config.md
Co-authored-by: Maxime Louward <61564950+mlouward@users.noreply.github.com>
At a high level this looks good, I like the usage of … Would it be better to build this into the URL import, instead of having a dedicated page for it? I think it would be nicer to have a single "URL" entrypoint for users (and the UI is cleaner, our import page is already a bit bloated). I haven't looked into the mechanics of how you locate the video before downloading/processing; if we're unable to do that automatically then I see a good reason to keep it as a separate page.
Having it under the same URL import might make API clients easier to implement, e.g. the iOS Shortcut or Home Assistant. The iOS Shortcut is the primary way I get videos into Mealie, and the Shortcut's interface isn't ideal for figuring out whether something is a video URL.

I used another repo, but I also added a "choose best thumbnail" AI step so the Mealie thumbnails would be better:

```typescript
// Assumes `generateText` and a configured `textModel` are already in scope;
// neither is defined in this snippet.
export async function selectBestFoodThumbnail(
  thumbnailUrls: string[],
): Promise<string> {
  // Nothing to choose between: return the only thumbnail, or '' if there is none.
  if (thumbnailUrls.length <= 1) {
    return thumbnailUrls[0] || '';
  }
  try {
    const { text } = await generateText({
      model: textModel,
      messages: [
        {
          role: 'user',
          content: [
            {
              type: 'text',
              text: `You are analyzing thumbnails from a cooking video to select the best one for a recipe.
Please analyze these ${thumbnailUrls.length} thumbnails and return ONLY the index number (0-${thumbnailUrls.length - 1}) of the thumbnail that:
1. Shows food most prominently
2. Has the best visual quality/clarity
3. Would be most appealing as a recipe thumbnail
Return only a single number (the index), no other text.`,
            },
            ...thumbnailUrls.map(url => ({
              type: 'image' as const,
              image: url,
            })),
          ],
        },
      ],
    });
    // Parse with an explicit radix and reject anything outside the valid range.
    const selectedIndex = parseInt(text.trim(), 10);
    if (
      Number.isNaN(selectedIndex) ||
      selectedIndex < 0 ||
      selectedIndex >= thumbnailUrls.length
    ) {
      console.warn(
        'AI returned invalid thumbnail index, using first thumbnail',
      );
      return thumbnailUrls[0];
    }
    console.log(
      `AI selected thumbnail ${selectedIndex} out of ${thumbnailUrls.length} options`,
    );
    return thumbnailUrls[selectedIndex];
  } catch (error) {
    console.error('Error selecting best thumbnail with AI:', error);
    // Fallback to first thumbnail
    return thumbnailUrls[0];
  }
}
```
…allback if the web scraping failed. And also better error handling + bulk video url import working
I totally agree with you. I've updated the PR so that both web and video URL scraping are handled through a single URL entrypoint (I've retained the video URL route exclusively for API use). Now, when calling …
This way, all websites supported by the yt-dlp library can now be used to import recipes into Mealie.

This is so much better than my share-to-email, n8n, social-to-Mealie automation :)

I'm quite excited to get this one in, just haven't had the time to properly review and provide feedback yet!
michael-genson left a comment
Overall looks great! I made a few small tweaks:
- updated docs to include version tags
- simplified the prompt a bit to be more in line with our new prompts
I provided some feedback, only one major issue. I want to test a few different video sources and see how well it works but otherwise this is pretty close to being ready.
```diff
- "url-form-hint": "Copy and paste a link from your favorite recipe website",
+ "url-form-hint": "Copy and paste a link from your favorite recipe website or a link to a social media video",
```
Let's simplify this to "Copy and paste a link from your favorite website" (drop the word recipe from the original). More on this below.
```diff
- "scrape-recipe-description": "Scrape a recipe by url. Provide the url for the site you want to scrape, and Mealie will attempt to scrape the recipe from that site and add it to your collection.",
+ "scrape-recipe-description": "Scrape a recipe by url. Provide the url for the site or the video you want to scrape, and Mealie will attempt to scrape the recipe from that site and add it to your collection.",
```
Since these options are only available if transcriptions are enabled, can we separate this out? Something like:
"Scrape a recipe by url. Provide the url for the site you want to scrape, and Mealie will attempt to scrape the recipe from that site and add it to your collection."
(if transcriptions) "You can also provide the url to a video and Mealie will attempt to transcribe it into a recipe."
```json
"error-title": "Looks Like We Couldn't Find Anything",
"error-title-rate-limit": "Rate Limit Exceeded",
"error-details-rate-limit": "The AI service is currently rate-limited. Please wait a moment and try again.",
"error-title-server": "Something Went Wrong",
```
Can we use events.something-went-wrong instead?
```json
"error-title-rate-limit": "Rate Limit Exceeded",
"error-details-rate-limit": "The AI service is currently rate-limited. Please wait a moment and try again.",
"error-title-server": "Something Went Wrong",
"error-details-server": "An unexpected error occurred while processing your request. Please try again later.",
```
Can we switch this to "an-unexpected-error-occurred-request": "...same-text" and move it to general?
Actually, if this is for server errors (500 errors) we can probably drop this entirely and just stick with "Something went wrong". We use this pattern elsewhere in the app
```python
video_fallback_enabled = self.settings.OPENAI_ENABLED and self.settings.OPENAI_ENABLE_TRANSCRIPTION_SERVICES

try:
    return await self._create_recipe_from_web(req)
except HTTPException as e:
    if e.status_code != 400:
        raise
    # If OpenAI transcription is not available so re-raises the original error
    if not video_fallback_enabled:
        raise

# Normal scraping failed so try parsing as a video URL
return await self._create_recipe_from_video_url(req.url, translate_language=translate_language)
```
We have multiple scraper strategies prioritized in mealie.services.scraper.recipe_scraper. Particularly the RecipeScraperOpenGraph strategy works on most websites, so waiting for an exception and falling back to video processing won't work (try a YT link, e.g. https://www.youtube.com/watch?v=Cyskqnp1j64).
Can we add some logic in the OpenAI scraper which does this? I imagine it goes something like:
- Attempt to download the video. If successful, process it like a video
- If that fails, process the HTML (the existing way)
- If that fails, assume OpenAI cannot process the recipe
Open to better suggestions than that, that's just my gut, but we definitely shouldn't rely on route-level exception handling to trigger the fallback.
Alternatively we can create a new scraper strategy called OpenAIVideo or something, and that inherits from the existing OpenAI service, then just register that before the existing one. This is probably cleaner.
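A rough sketch of the strategy-ordering idea discussed above. This is not Mealie's actual scraper code: `Strategy`, `scrape_with_strategies`, and the three toy strategies are hypothetical names used only for illustration. It shows a video strategy registered ahead of the others, so a video URL is claimed before a permissive fallback strategy gets a chance to return a thin result.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Optional

# Hypothetical strategy signature: return a recipe dict, or None when the
# strategy cannot handle the URL. Mealie's real strategies are classes.
Strategy = Callable[[str], Awaitable[Optional[dict]]]


async def video_strategy(url: str) -> Optional[dict]:
    # Stand-in for "attempt to download the video". A real implementation
    # would ask the downloader whether it can extract anything from the URL.
    if "youtube.com/watch" in url or "tiktok.com" in url:
        return {"source": "video", "url": url}
    return None


async def html_ai_strategy(url: str) -> Optional[dict]:
    # Stand-in for the existing HTML-based AI strategy; always declines here.
    return None


async def open_graph_strategy(url: str) -> Optional[dict]:
    # Stand-in for a permissive strategy that succeeds on most pages.
    return {"source": "open_graph", "url": url}


async def scrape_with_strategies(url: str, strategies: list[Strategy]) -> Optional[dict]:
    """Try each strategy in order and return the first non-empty result."""
    for strategy in strategies:
        result = await strategy(url)
        if result is not None:
            return result
    return None


# Order matters: the video strategy is registered before the others.
STRATEGIES: list[Strategy] = [video_strategy, html_ai_strategy, open_graph_strategy]

if __name__ == "__main__":
    print(asyncio.run(scrape_with_strategies("https://www.youtube.com/watch?v=Cyskqnp1j64", STRATEGIES)))
```

With this shape the route handler no longer needs to catch an exception to trigger the fallback; the ordering of the list carries that decision.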
```python
temp_id = os.getpid()
output_template = f"/tmp/mealie_{temp_id}"  # No extension here
```
Change this to use get_temporary_path (from mealie.core.dependencies.dependencies import get_temporary_path)
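To illustrate why a PID-based path is fragile: two concurrent requests handled by the same worker process share a PID and would write to the same file. The sketch below uses the standard library's `tempfile.TemporaryDirectory` as a stand-in. It is not Mealie's `get_temporary_path`, whose signature is not shown in this thread, and `unique_output_template` is a hypothetical helper.

```python
import tempfile
from collections.abc import Iterator
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def unique_output_template(prefix: str = "mealie_") -> Iterator[Path]:
    """Yield an extension-less output path inside a fresh temporary directory.

    The directory and everything in it is removed when the block exits.
    """
    with tempfile.TemporaryDirectory(prefix=prefix) as tmp_dir:
        # No extension here; the downloader is expected to append one.
        yield Path(tmp_dir) / "download"


if __name__ == "__main__":
    with unique_output_template() as first, unique_output_template() as second:
        print(first != second)  # each call gets its own directory
```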
```python
for line in subtitle_content.split("\n"):
    if line.strip() and not line.startswith("WEBVTT") and "-->" not in line and not line.isdigit():
        lines.append(line.strip())
```
Is there a better way to parse this? I'm okay leaving this for a future PR if there's not a quick solution.
For instance, from my YT video, all the text is wrapped in XML:
<00:02:58.000><c> beef</c>
Which adds a lot of unneeded tokens/cost to the OpenAI request.
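One possible quick fix, sketched under assumptions: strip inline cue timestamps and styling tags with a regular expression and drop consecutive duplicate lines, which auto-generated captions tend to produce. `clean_vtt` is a hypothetical helper name, and the `Kind:`/`Language:` header check assumes the file looks like YouTube's generated VTT; a dedicated WebVTT parser would be more robust.

```python
import re

# Matches inline cue timestamps such as <00:02:58.000> and tags such as <c> or </c>.
_INLINE_TAG = re.compile(r"</?[^>]+>")
_HEADER_PREFIXES = ("WEBVTT", "Kind:", "Language:")


def clean_vtt(subtitle_content: str) -> str:
    """Reduce a WebVTT document to its spoken text, one caption line per row."""
    lines: list[str] = []
    for raw in subtitle_content.splitlines():
        line = raw.strip()
        # Skip blanks, file headers, cue timing lines, and numeric cue identifiers.
        if not line or line.startswith(_HEADER_PREFIXES):
            continue
        if "-->" in line or line.isdigit():
            continue
        # Remove inline tags, then collapse any leftover runs of whitespace.
        text = re.sub(r"\s+", " ", _INLINE_TAG.sub("", line)).strip()
        # Skip a line that merely repeats the previous one.
        if text and (not lines or lines[-1] != text):
            lines.append(text)
    return "\n".join(lines)
```

Only consecutive repeats are removed, so a phrase that legitimately recurs later in the video is kept.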
```html
  <BaseButton
    :disabled="recipeUrl === null"
    rounded
    block
    type="submit"
    :loading="loading"
  />
</div>
```
This is probably not in scope of this PR, but just wanted to comment on it. Right now we have a single loading state for all import strategies, and video processing takes waaaayyyy longer than other strategies, so users might start to think something's broken.
I don't think there's a quick solution to this (since the backend determines the strategy and doesn't communicate it until the very end), but something to keep in mind for a follow-up PR if you (or anyone) thinks of something.
What this PR does / why we need it:
This PR introduces the ability to import recipes directly from social media video URLs (Instagram Reels, TikTok, Facebook, and YouTube).
Currently, Mealie excels at importing from blogs, but many modern recipes exist primarily in video format where the instructions are spoken rather than written. This feature bridges that gap by using AI to transcribe and parse video content into structured recipe data.
Technical Implementation
I opted for a native implementation using yt-dlp and ffmpeg rather than third-party scraping APIs (like Apify) to keep dependencies local and avoid vendor lock-in.

Following some advice from michael-genson, the import from video URL is now on the same page as the classic web import, so the modified workflow is the following:
Moreover, this also works with the bulk importer.
The video URL scraper workflow operates as follows:
- yt-dlp fetches the video title, description, and thumbnail.
- … ffmpeg into a lightweight, mono-channel MP3 to minimize bandwidth.
- … .env).

Docker Changes:
I added ffmpeg to the Dockerfile. This is a standard, lightweight tool required for yt-dlp's audio post-processing. It allows us to standardize audio input from various platforms and consumes zero system resources when idle.

Here is a demo of the new import from video URL page:
Enregistrement.de.l.ecran.2025-12-22.a.13.53.05.mp4
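For readers unfamiliar with yt-dlp, the download step of the workflow described above might look roughly like the sketch below. It is an illustration, not the code from this PR: `build_audio_options` and `download_audio` are hypothetical names, and the 64 kbps quality setting is an assumption. Running `download_audio` requires the `yt-dlp` package and an `ffmpeg` binary.

```python
from pathlib import Path
from typing import Any


def build_audio_options(output_template: Path) -> dict[str, Any]:
    """Build yt-dlp options: best audio stream, converted by ffmpeg to mono MP3."""
    return {
        "format": "bestaudio/best",
        "outtmpl": str(output_template) + ".%(ext)s",
        "noplaylist": True,
        "quiet": True,
        "postprocessors": [
            # Assumed quality value; lower bitrates keep the upload small.
            {"key": "FFmpegExtractAudio", "preferredcodec": "mp3", "preferredquality": "64"}
        ],
        # Extra ffmpeg arguments: downmix to a single audio channel.
        "postprocessor_args": {"ffmpeg": ["-ac", "1"]},
    }


def download_audio(url: str, output_template: Path) -> dict[str, Any]:
    """Download the audio and return the metadata fields the workflow mentions."""
    from yt_dlp import YoutubeDL  # imported lazily so the options can be built without it

    with YoutubeDL(build_audio_options(output_template)) as ydl:
        info = ydl.extract_info(url, download=True)
    return {
        "title": info.get("title"),
        "description": info.get("description"),
        "thumbnail": info.get("thumbnail"),
        "audio_path": Path(str(output_template) + ".mp3"),
    }
```

Keeping the options in a separate function makes them easy to unit test without network access.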
Special notes for your reviewer:
It’s my biggest contribution to Mealie yet, and I’m not sure whether my code is structured perfectly.
Like, I don’t know if my functions are always in the best matching folders and files.
Testing
I have added unit tests using mock responses to verify the new API routes without hitting external services.
I also performed extensive manual testing of the full flow using:
Both providers successfully generated valid recipes, with Gemini showing slightly faster processing times during my tests.