Ollama
Self-host open source AI models like DeepSeek-R1, Llama, and more
Ollama allows you to download and run advanced AI models directly on your own hardware. Self-hosting AI models ensures full control over your data and protects your privacy.
⚠️ Before running a model, make sure your device has enough free RAM to support it. Attempting to run a model that exceeds your available memory could cause your device to crash or become unresponsive. Always check the model requirements before downloading or starting it.
Getting Started: The easiest way to get started with Ollama is to install the Open WebUI app from the Umbrel App Store. Open WebUI will automatically connect to your Ollama setup, allowing you to manage model downloads and chat with your AI models effortlessly.
Advanced Setup: If you want to connect Ollama to other apps or devices, here's how:
- Apps running on UmbrelOS: Use ollama_ollama_1 as the host and 11434 as the port when configuring other apps to connect to Ollama. For example, the API Base URL would be: http://ollama_ollama_1:11434.
- Custom Integrations: Connect Ollama to third-party apps or your own code using your UmbrelOS local domain (e.g., http://umbrel.local:11434) or your device's IP address, which you can find in the UmbrelOS Settings page (e.g., http://192.168.4.74:11434).
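As a sketch of what a custom integration might look like, the snippet below builds and sends a non-streaming request to Ollama's /api/generate endpoint using only Python's standard library. The http://umbrel.local:11434 host and the llama3.2 model name are placeholders; substitute your device's address and any model you have downloaded.

```python
import json
import urllib.request

# Placeholder base URL -- use http://umbrel.local:11434, your device's IP
# address, or http://ollama_ollama_1:11434 from another UmbrelOS app.
OLLAMA_URL = "http://umbrel.local:11434"

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON response instead of a stream
    }).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def generate(model: str, prompt: str) -> str:
    """Send the request and return the model's text response."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]

# Requires a reachable Ollama instance and a downloaded model, e.g.:
# print(generate("llama3.2", "Why is the sky blue?"))
```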
This release contains various improvements and bug fixes.
Key highlights in this release:
- ollama run now works with embedding models to generate vector embeddings from text
- Fixed errors when running qwen3-vl:235b and qwen3-vl:235b-instruct
- Fixed hanging issues due to CPU discovery
- Improved performance for qwen3-vl models with flash attention support
- Ollama will now stop running a model before removing it
Full release notes are available at https://github.com/ollama/ollama/releases
New models available:
- Qwen3-VL: now available in all parameter sizes ranging from 2B to 235B
- MiniMax-M2: a 230 billion parameter model built for coding and agentic workflows
Key highlights in this release:
- Fixed embedding results being incorrect when running embeddinggemma
- Fixed truncation error when generating embeddings
- Increased speed when scheduling models
Full release notes are available at https://github.com/ollama/ollama/releases
Key highlights in this release:
- Flash attention is now enabled by default for Gemma 3, improving performance and memory utilization
- Fixed issue where Ollama would hang while generating responses
- Fixed issue where qwen3-coder would act in raw mode when using /api/generate or ollama run qwen3-coder <prompt>
- Fixed qwen3-embedding providing invalid results
- Ollama will now evict models correctly when num_gpu is set
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key highlights in this release:
- Thinking models now support structured outputs when using the /api/chat API
- Fixed output issues for some models
- Improved parsing of tool calls
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key highlights in this release:
- New models: DeepSeek-V3.1-Terminus and Kimi-K2-Instruct-0905
- Web search API with a free tier for individuals
- Improved parsing of tool calls
- Fixed issues with model loading and output
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key highlights in this release:
- New web search API with a free tier for individuals
- Cloud models now available in preview, allowing you to run larger models with fast, datacenter-grade hardware
- Improved memory usage and estimates for various model types
- Added 'dimensions' field to embed requests
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key highlights in this release:
- Improved memory estimates for hybrid and recurrent models
- Added 'dimensions' field to embed requests
- Support for EmbeddingGemma, a new open embedding model that delivers best-in-class performance for its size
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key highlights in this release:
- Support for EmbeddingGemma, a new open embedding model
- Improved performance via overlapping GPU and CPU computations
- Fixed issues with unrecognized AMD GPUs
- Reduced crashes due to unhandled errors on some Mac and Linux installations
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key highlights in this release:
- Support for DeepSeek-V3.1
- Fixed model loading issues on CPU-only systems
- Improved handling of models without initial <think> tags
- Fixed unwanted text output when <think> tag is missing
- Fixed parsing of tool calls with curly braces
- gpt-oss now has flash attention enabled by default on systems that support it
- Improved load times for gpt-oss
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key highlights in this release:
- Improved performance when using flash attention
- Fixed boundary case when encoding text using BPE
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key highlights in this release:
- Improved handling of messages with both content and tool calls
- Enhanced OpenAI API compatibility for tool calls
- Better memory management and performance optimizations
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key highlights in this release:
- Fixed issue where gpt-oss would consume too much VRAM when split across GPU & CPU or multiple GPUs
- Statically linked C++ libraries on Windows for better compatibility
- Fixed crash in gpt-oss when using kv cache quantization
- Welcome OpenAI's gpt-oss models to Ollama
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key highlights in this release:
- Fixed unicode character input for Japanese and other languages
- Improved performance in Gemma 3n models
- Parallel request processing now defaults to 1
- Fixed issues with tool calling for certain models
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key highlights in this release:
- Fixed styling issue in launch screen
- Improved handling of tool messages in chat API
- The directory in which models are stored can now be modified
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key improvements in this release:
- The directory in which models are stored can now be modified.
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes important bug fixes and stability improvements.
Key improvements in this release:
- Ollama will now limit context length to what the model was trained against to avoid strange overflow behavior
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes several bug fixes and improvements to tool calling functionality.
Key improvements in this release:
- Fixed issue where tool calls without parameters would not be returned correctly
- Fixed errors that previously showed "does not support generate"
- Fixed issue where some special tokens would not be tokenized properly for certain model architectures
Tool calling support has been added for new models:
- DeepSeek-R1-2508 (671B model)
- Magistral
Tool calling reliability has also been improved for existing models including Llama 4 and Mistral.
Full release notes are available at https://github.com/ollama/ollama/releases
Tool calling reliability and performance have been improved for several models including Magistral, Llama 4, Mistral, and DeepSeek-R1-2508.
Key highlights in this release:
- Magistral now supports disabling thinking mode
- More informative error messages
- Improved tool calling reliability
- Fixed issue on Windows where ollama run would not start automatically
- New models: DeepSeek-R1-2508 with improved reasoning capabilities
Ollama now has the ability to enable or disable thinking mode, giving users flexibility to choose the model's thinking behavior for different applications. When thinking is enabled, the output separates the model's thoughts from the actual response. Models that support thinking include DeepSeek R1 and Qwen 3.
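For API users, thinking mode can be toggled per request. The sketch below builds an /api/chat request body with the think flag (an assumption here: an Ollama version with thinking support and a thinking-capable model such as deepseek-r1; when thinking is enabled, the reply separates the model's thoughts from its answer):

```python
import json

def build_chat_payload(model: str, prompt: str, think: bool) -> bytes:
    """Build an /api/chat request body that enables or disables thinking mode."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "think": think,   # True: the response separates thoughts from the answer
        "stream": False,  # return a single JSON object instead of a stream
    }).encode("utf-8")

# POST this body to http://umbrel.local:11434/api/chat (placeholder host).
# With think=True, the assistant message carries the model's thoughts in a
# field separate from its final answer.
```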
Full release notes are available at https://github.com/ollama/ollama/releases
Ollama now supports thinking mode for models that support it, such as DeepSeek-R1 and Qwen3.
Key highlights in this release:
- Added support for thinking mode, displaying the model's thoughts during processing
- Introduced new models: DeepSeek-R1-0528 and Qwen3
- Improved streaming of responses with tool calls
- Enhanced memory estimation and logging for better debugging
Full release notes are available at https://github.com/ollama/ollama/releases
Key highlights in this release:
- Improved model memory management to prevent crashes when running multimodal models
- Enhanced memory estimation for models
- Fixed crashes related to specific models on certain hardware
- Added support for Alibaba's Qwen 3 and Qwen 2 architectures in Ollama's multimodal engine
Full release notes are available at https://github.com/ollama/ollama/releases
Ollama now supports multimodal models via Ollama's new engine, starting with new vision multimodal models like Meta Llama 4, Google Gemma 3, Qwen 2.5 VL, Mistral Small 3.1, and more.
Key highlights in this release:
- Added support for WebP images as input to multimodal models
- Improved performance of importing safetensors models
- Improved API responses for unsupported methods
Full release notes are available at https://github.com/ollama/ollama/releases
Highlights:
- Performance improvements for Qwen 3 MoE models on NVIDIA and AMD GPUs
- Fixed issues with conflicting installations and memory leaks
- Improved labeling for older vision models
- Reduced out of memory errors
- Fixed context canceled error
Full release notes are available at https://github.com/ollama/ollama/releases
Highlights:
- Added support for new models: Qwen 3, Phi 4 reasoning, Phi-4-mini-reasoning, and Llama 4
- Increased default context window to 4096 tokens
- Improved output quality when using JSON mode in certain scenarios
- Fixed various issues related to model stopping, image path recognition, and tensor operations
Full release notes are available at https://github.com/ollama/ollama/releases
Highlights:
- New, faster model downloading with improved performance and reliability
- Fixed memory leak issues for various models
- Improved performance when importing models from Safetensors
- Enhanced tool function parameter handling
- Fixed out of memory issues and model unload order
Full release notes are available at https://github.com/ollama/ollama/releases
Highlights:
- Support for Mistral Small 3.1, the best performing vision model in its weight class
- Improved model loading times for Gemma 3 on network-backed filesystems
Full release notes are available at https://github.com/ollama/ollama/releases
Highlights:
- /api/show now includes model capabilities like vision
- Fixed out-of-memory errors with parallel requests on Gemma 3
- Improved Gemma 3's handling of multilingual characters
- Fixed context shifting issues in DeepSeek models
- Resolved Gemma 3 output degradation after 512/1024 tokens in 0.6.3
Full release notes are available at https://github.com/ollama/ollama/releases
Highlights:
- New sliding window attention optimizations for Gemma 3, improving inference speed and memory allocation for long context windows
- Improved loading speed of Gemma 3
- Fixed various errors when running models
Full release notes are available at https://github.com/ollama/ollama/releases
Highlights:
- New model: Command A, a 111 billion parameter model for enterprise use
- Multiple images are now supported in Gemma 3
- Fixed issue where running Gemma 3 would consume a large amount of system memory
- Fixed issue where '/save' would not work if running a model with '/' in the name
Full release notes are available at https://github.com/ollama/ollama/releases
Highlights:
- New Google Gemma 3 model available in 1B, 4B, 12B, and 27B parameter sizes
- Fixed issues with snowflake-arctic-embed and snowflake-arctic-embed2 models
Full release notes are available at https://github.com/ollama/ollama/releases
Highlights:
- New models: Phi-4-Mini, Granite-3.2-Vision, and Command R7B Arabic
- Default context length can now be set with OLLAMA_CONTEXT_LENGTH environment variable
- Ollama is now compiled for NVIDIA Blackwell
- Fixed issue where bf16 GGUF files could not be imported
- Ollama now accepts requests from Visual Studio Code and Cursor
Full release notes are available at https://github.com/ollama/ollama/releases
Highlights:
- Perplexity R1 1776: a version of the DeepSeek-R1 model that has been post-trained to remove its refusal to respond to some sensitive topics
- The OpenAI-compatible API will now return tool_calls if the model called a tool
- Performance on certain Intel Xeon processors should now be restored
- Fixed permission denied issues after installing Ollama on Linux
- The progress bar will no longer flicker when running ollama pull
- Fixed issue where running a model would fail on Linux if Ollama was installed in a path with UTF-8 characters
Full release notes are available at https://github.com/ollama/ollama/releases
Highlights:
- Ollama will now use AVX-512 instructions where available for additional CPU acceleration
- Fixed indexing error that would occur when downloading a model with ollama run or ollama pull
- Fixed cases where download progress would reverse
- DeepScaleR: A fine-tuned version of Deepseek-R1-Distilled-Qwen-1.5B that surpasses the performance of OpenAI's o1-preview with just 1.5B parameters on popular math evaluations.
- OpenThinker: A fully open-source family of reasoning models built using a dataset derived by distilling DeepSeek-R1.
- Fixed "The system cannot find the path specified" errors when running models in some cases on Windows
Full release notes are available at https://github.com/ollama/ollama/releases
