
LocalAI

Drop-in OpenAI replacement

Ettore Di Giacinto

Open App Store on your umbrelOS device to install this app
About this app

LocalAI is the free, Open Source OpenAI alternative. LocalAI acts as a drop-in replacement REST API that's compatible with the OpenAI API specification for local inferencing.

It allows you to run LLMs and generate images and audio locally on consumer-grade hardware, supporting multiple model families and architectures.
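Because the API is OpenAI-compatible, existing OpenAI client libraries can be pointed at LocalAI by changing only the base URL. Below is a minimal sketch using the official OpenAI Python client; the host, port, and model name are placeholders, assuming LocalAI is reachable on its default port and a chat model has already been installed from the gallery (adjust all three for your umbrelOS install).

```python
# Minimal sketch: talking to LocalAI with the official OpenAI Python client.
# The base URL and model name are placeholders, not guaranteed values for
# your installation; LocalAI does not require an API key by default.
from openai import OpenAI

client = OpenAI(
    base_url="http://umbrel.local:8080/v1",  # assumed address of your LocalAI instance
    api_key="not-needed",                    # any non-empty string works when no key is configured
)

response = client.chat.completions.create(
    model="your-model-name",  # placeholder: any chat model you have installed locally
    messages=[{"role": "user", "content": "Summarize what LocalAI does in one sentence."}],
)
print(response.choices[0].message.content)
```

The same pattern applies to the other OpenAI-style endpoints LocalAI exposes, such as image and audio generation.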

⚠️ Note: Before running a model, make sure your device has enough free RAM to support it. Attempting to run a model that exceeds your available memory could cause your device to crash or become unresponsive. Always check the model requirements before downloading or starting it.

What's new
Version v3.7.0 (4 days ago)

This release introduces powerful new features and improvements.

Key highlights include:

  • 🤖 Agentic MCP Support with full WebUI integration - Build AI agents that use real tools like web search and code execution, fully OpenAI-compatible
  • 🎙️ New neutts TTS Backend - Generate natural, high-quality speech with low-latency audio powered by Neuphonic
  • 🖼️ WebUI enhancements - Faster, cleaner UI with real-time updates and full YAML model control
  • 💬 Long-Text TTS Chunking for Chatterbox - Generate natural-sounding long-form audio by intelligently splitting text
  • 🧩 Advanced Agent Controls - Fine-tune agent behavior with new options for retries, reasoning, and re-evaluation
  • 📸 New Video Creation Endpoint - OpenAI-compatible /v1/videos endpoint for text-to-video generation (see the sketch below)
  • 🔍 Fuzzy Gallery Search - Find models even with typos
  • 🧠 Qwen 3 VL Support - Support for Qwen 3 VL with llama.cpp/gguf models
  • Enhanced Whisper compatibility - Now supported on various CPU variants to prevent crashes
  • Multiple bug fixes for improved stability and OpenAI API compliance
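The video endpoint follows the same OpenAI-style request pattern as the other APIs. The sketch below is illustrative only: the host, port, model name, and request fields (model, prompt) are assumptions based on OpenAI conventions rather than a documented LocalAI schema, so check the LocalAI docs for the exact payload.

```python
# Hedged sketch of a text-to-video request against the /v1/videos endpoint.
# Field names below are assumptions modelled on OpenAI-style APIs; consult the
# LocalAI documentation for the authoritative request and response shapes.
import requests

resp = requests.post(
    "http://umbrel.local:8080/v1/videos",  # adjust host/port for your install
    json={
        "model": "your-video-model",       # placeholder: a video-capable model you have installed
        "prompt": "A timelapse of clouds rolling over a mountain ridge",
    },
    timeout=600,                           # video generation can take several minutes
)
resp.raise_for_status()
print(resp.json())                         # response shape depends on the backend/model
```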

Full release notes can be found at https://github.com/mudler/LocalAI/releases

Version v3.6.0 (last month)

This release includes several improvements and new features:

  • Added support for multilingual capabilities in Chatterbox
  • Improved reranking models for the llama.cpp backend
  • Added support for L4T devices in Kokoro
  • Introduced new models to the gallery, including Qwen image edit and IBM Granite variants
  • Enhanced chat completion and function calling
  • Improved backend runtime capability detection

Full release notes can be found at https://github.com/mudler/LocalAI/releases

Version v3.5.4 (last month)

This release includes several improvements and bug fixes:

  • Improved process management and shutdown procedures
  • Enhanced chat completion and function calling
  • Added support for new models in the gallery
  • Improved backend runtime capability detection
  • Enhanced support for various GPU architectures

Full release notes can be found at https://github.com/mudler/LocalAI/releases

Version v3.5.1 (last month)

This release includes several improvements and bug fixes:

  • Improved process management and shutdown procedures
  • Enhanced P2P worker functionality
  • Refined chat completion and function calling
  • Added support for new models in the gallery
  • Improved backend runtime capability detection
  • Enhanced support for various GPU architectures

Full release notes can be found at https://github.com/mudler/LocalAI/releases

Version v3.5.0 (last month)

This release includes several improvements and new features:

  • New backend support for MLX, including audio and visual language models
  • Added WAN support for video generation
  • New CPU and MPS version of the diffusers backend for image generation without a GPU
  • WebUI now allows downloading model configurations and manual model refresh
  • Added stop button for running backends in WebUI
  • Models can now be imported and edited via the WebUI
  • Improved Whisper backend with integrated Voice Activity Detection
  • New LocalAI Launcher App (Alpha) for easier installation and management
  • Fixed various issues related to AMD graphics cards, macOS compatibility, and CUDA detection
  • Enhanced support for macOS, including whisper, diffusers, llama.cpp, MLX, and stable-diffusion.cpp

Full release notes can be found at https://github.com/mudler/LocalAI/releases

Version v3.4.0 (3 months ago)

This release includes several improvements and new features:

  • WebUI improvements: image size can now be set during image generation
  • New backends: KittenTTS, kokoro, and dia are now available as backends, and models can be installed directly from the gallery
  • Support for reasoning effort in the OpenAI chat completion (see the sketch below)
  • Diffusers backend now available for L4T images and devices
  • Backends can now be sideloaded from the system

⚠️ Note: New backends need to be warmed up during the first call to download the model files.
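For the reasoning-effort support mentioned above, requests follow the OpenAI chat completions spec, which carries a reasoning_effort field. A minimal sketch, assuming a reasoning-capable model is installed and using placeholder host, port, and model values:

```python
# Sketch: passing reasoning_effort through the OpenAI-compatible chat endpoint.
# Host, port, and model name are placeholders; reasoning_effort values follow
# the OpenAI spec ("low", "medium", "high") and only apply to models that support it.
import requests

resp = requests.post(
    "http://umbrel.local:8080/v1/chat/completions",  # adjust for your install
    json={
        "model": "your-reasoning-model",
        "messages": [{"role": "user", "content": "Briefly: why does the sky look blue?"}],
        "reasoning_effort": "low",
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```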

Full release notes can be found at https://github.com/mudler/LocalAI/releases

Version v3.3.2 (3 months ago)

This release includes several improvements and new features:

  • Support for installing backends from local paths
  • Improved responsiveness of documentation tables
  • Updates to various dependencies and components

⚠️ Breaking change: Intel GPU images have been renamed. If you're using Intel GPU images, please update your configurations accordingly.

Full release notes can be found at https://github.com/mudler/LocalAI/releases

Version v3.3.1 (3 months ago)

This release includes several improvements and new features:

  • Support for Flux Kontext and Flux krea for image editing
  • Improved backend download reliability
  • Bug fixes for Intel GPU images and container images

⚠️ Breaking change: Intel GPU images have been renamed. Please check the full release notes for details.

Full release notes can be found at https://github.com/mudler/LocalAI/releases

Version v3.3.0 (3 months ago)

This release introduces object detection capabilities and includes several improvements:

  • New API for fast object detection (install the 'rfdetr-base' model to use)
  • Improved backend download reliability with defined mirrors
  • Various bug fixes across container images, backends, and installation scripts

Full release notes can be found at https://github.com/mudler/LocalAI/releases

Version v3.2.3 (3 months ago)

This release introduces a new modular architecture, making LocalAI lighter and more flexible:

  • Backends are now separate from the core binary, allowing for easier updates and customization
  • Automatic backend installation based on your hardware (CPU, NVIDIA, AMD, Intel)
  • New CLI commands for managing backends
  • Enhanced realtime audio APIs with speech started/stopped events
  • Intel GPU acceleration for Whisper transcriptions
  • Over 50 new models added to the gallery

⚠️ Note: After upgrading, you may need to manually install required backends for existing models from the WebUI or CLI.

Full release notes can be found at https://github.com/mudler/LocalAI/releases

Version v3.1.1 (4 months ago)

This release includes several improvements and new features:

  • Automatically install missing backends along with models
  • Support for Gemma 3n models (text generation only)
  • Meta packages in backend galleries for easier GPU-specific setup
  • Improved backend gallery with descriptions and re-ordering
  • Various bug fixes and dependency updates

Full release notes can be found at https://github.com/mudler/LocalAI/releases

Version v3.0.0 (4 months ago)

This new release of LocalAI comes with major improvements and new features.

Key highlights include:

  • Backend Gallery: Install and remove backends on the fly with API-driven customization
  • Audio Support: Upload audio, PDFs, or text files directly in the UI
  • Realtime API: WebSocket support compatible with OpenAI clients for chat apps and agents
  • Reasoning UI: Visual thinking indicators for smart models during inference
  • Dynamic VRAM Handling: Smarter GPU usage with automatic layer offloading
  • 50+ New Models: Huge model gallery update including Qwen3, reasoning models, and multimodal LLMs
  • Experimental Video Generation: New video generation endpoint

Full release notes can be found at https://github.com/mudler/LocalAI/releases

Version v2.29.0 (5 months ago)

New features and improvements include:

  • Support for Qwen3 model family
  • Experimental video generation endpoint
  • Default images are now slimmer without extra Python libraries
  • FFmpeg is now included in all images

Full release notes can be found at https://github.com/mudler/LocalAI/releases

Version v2.28.0 (6 months ago)

This release brings exciting updates to LocalAI:

  • New LocalAI logo and branding
  • SYCL support added for stablediffusion.cpp
  • WebUI enhancements for improved user experience
  • Support for the Lumina model family for image generation
  • Bug fixes, including issues related to LOCALAI_SINGLE_ACTIVE_BACKEND

Full release notes can be found at https://github.com/mudler/LocalAI/releases
