Ollama
Self-host open source AI models like DeepSeek-R1, Llama, and more
Ollama allows you to download and run advanced AI models directly on your own hardware. Self-hosting AI models ensures full control over your data and protects your privacy.
⚠️ Before running a model, make sure your device has enough free RAM to support it. Attempting to run a model that exceeds your available memory could cause your device to crash or become unresponsive. Always check the model requirements before downloading or starting it.
Getting Started: The easiest way to get started with Ollama is to install the Open WebUI app from the Umbrel App Store. Open WebUI will automatically connect to your Ollama setup, allowing you to manage model downloads and chat with your AI models effortlessly.
Advanced Setup: If you want to connect Ollama to other apps or devices, here's how:
- Apps running on UmbrelOS: Use ollama_ollama_1 as the host and 11434 as the port when configuring other apps to connect to Ollama. For example, the API Base URL would be: http://ollama_ollama_1:11434.
- Custom Integrations: Connect Ollama to third-party apps or your own code using your UmbrelOS local domain (e.g., http://umbrel.local:11434) or your device's IP address, which you can find in the UmbrelOS Settings page (e.g., http://192.168.4.74:11434).
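As a sketch of what a custom integration might look like, the snippet below builds and sends a non-streaming request to Ollama's /api/generate endpoint using only Python's standard library. The http://umbrel.local:11434 host and the llama3.2 model name are placeholders; substitute your device's address and any model you have downloaded.

```python
import json
import urllib.request

# Placeholder base URL -- use http://umbrel.local:11434, your device's IP
# address, or http://ollama_ollama_1:11434 from another UmbrelOS app.
OLLAMA_URL = "http://umbrel.local:11434"

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON response instead of a stream
    }).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def generate(model: str, prompt: str) -> str:
    """Send the request and return the model's text response."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]

# Requires a reachable Ollama instance and a downloaded model, e.g.:
# print(generate("llama3.2", "Why is the sky blue?"))
```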
This release contains various improvements and bug fixes.
Key highlights in this release:
- ollama run now works with embedding models to generate vector embeddings from text
- Fixed errors when running qwen3-vl:235b and qwen3-vl:235b-instruct
- Fixed hanging issues due to CPU discovery
- Improved performance for qwen3-vl models with flash attention support
- Ollama will now stop running a model before removing it
Full release notes are available at https://github.com/ollama/ollama/releases
New models available:
- Qwen3-VL: now available in all parameter sizes ranging from 2B to 235B
- MiniMax-M2: a 230 billion parameter model built for coding and agentic workflows
Key highlights in this release:
- Fixed embedding results being incorrect when running embeddinggemma
- Fixed truncation error when generating embeddings
- Increased speed when scheduling models
Full release notes are available at https://github.com/ollama/ollama/releases
Key highlights in this release:
- Flash attention is now enabled by default for Gemma 3, improving performance and memory utilization
- Fixed issue where Ollama would hang while generating responses
- Fixed issue where qwen3-coder would act in raw mode when using /api/generate or ollama run qwen3-coder <prompt>
- Fixed qwen3-embedding providing invalid results
- Ollama will now evict models correctly when num_gpu is set
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key highlights in this release:
- Thinking models now support structured outputs when using the /api/chat API
- Fixed output issues for some models
- Improved parsing of tool calls
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key highlights in this release:
- New models: DeepSeek-V3.1-Terminus and Kimi-K2-Instruct-0905
- Web search API with a free tier for individuals
- Improved parsing of tool calls
- Fixed issues with model loading and output
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key highlights in this release:
- New web search API with a free tier for individuals
- Cloud models now available in preview, allowing you to run larger models with fast, datacenter-grade hardware
- Improved memory usage and estimates for various model types
- Added 'dimensions' field to embed requests
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key highlights in this release:
- Improved memory estimates for hybrid and recurrent models
- Added 'dimensions' field to embed requests
- Support for EmbeddingGemma, a new open embedding model that delivers best-in-class performance for its size
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key highlights in this release:
- Support for EmbeddingGemma, a new open embedding model
- Improved performance via overlapping GPU and CPU computations
- Fixed issues with unrecognized AMD GPUs
- Reduced crashes due to unhandled errors on some Mac and Linux installations
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key highlights in this release:
- Support for DeepSeek-V3.1
- Fixed model loading issues on CPU-only systems
- Improved handling of models without initial <think> tags
- Fixed unwanted text output when <think> tag is missing
- Fixed parsing of tool calls with curly braces
- gpt-oss now has flash attention enabled by default on systems that support it
- Improved load times for gpt-oss
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key highlights in this release:
- Improved performance when using flash attention
- Fixed boundary case when encoding text using BPE
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key highlights in this release:
- Improved handling of messages with both content and tool calls
- Enhanced OpenAI API compatibility for tool calls
- Better memory management and performance optimizations
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key highlights in this release:
- Fixed issue where gpt-oss would consume too much VRAM when split across GPU & CPU or multiple GPUs
- Statically linked C++ libraries on Windows for better compatibility
- Fixed crash in gpt-oss when using kv cache quantization
- Welcome OpenAI's gpt-oss models to Ollama
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key highlights in this release:
- Fixed unicode character input for Japanese and other languages
- Improved performance in Gemma 3n models
- Parallel request processing now defaults to 1
- Fixed issues with tool calling for certain models
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key highlights in this release:
- Fixed styling issue in launch screen
- Improved handling of tool messages in chat API
- The directory in which models are stored can now be modified
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes new features and improvements.
Key improvements in this release:
- The directory in which models are stored can now be modified.
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes important bug fixes and stability improvements.
Key improvements in this release:
- Ollama will now limit context length to what the model was trained against to avoid strange overflow behavior
Full release notes are available at https://github.com/ollama/ollama/releases
This release includes several bug fixes and improvements to tool calling functionality.
Key improvements in this release:
- Fixed issue where tool calls without parameters would not be returned correctly
- Fixed errors that previously showed "does not support generate"
- Fixed issue where some special tokens would not be tokenized properly for certain model architectures
Tool calling support has been added for new models:
- DeepSeek-R1-2508 (671B model)
- Magistral
Tool calling reliability has also been improved for existing models including Llama 4 and Mistral.
Full release notes are available at https://github.com/ollama/ollama/releases
Tool calling reliability and performance have been improved for several models including Magistral, Llama 4, Mistral, and DeepSeek-R1-2508.
Key highlights in this release:
- Magistral now supports disabling thinking mode
- More informative error messages
- Improved tool calling reliability
- Fixed issue on Windows where ollama run would not start automatically
- New models: DeepSeek-R1-2508 with improved reasoning capabilities
Ollama now has the ability to enable or disable thinking mode, giving users flexibility to choose the model's thinking behavior for different applications. When thinking is enabled, the output separates the model's thoughts from the actual response. Models that support thinking include DeepSeek R1 and Qwen 3.
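For API users, thinking mode can be toggled per request. The sketch below builds an /api/chat request body with the think flag (an assumption here: an Ollama version with thinking support and a thinking-capable model such as deepseek-r1; when thinking is enabled, the reply separates the model's thoughts from its answer):

```python
import json

def build_chat_payload(model: str, prompt: str, think: bool) -> bytes:
    """Build an /api/chat request body that enables or disables thinking mode."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "think": think,   # True: the response separates thoughts from the answer
        "stream": False,  # return a single JSON object instead of a stream
    }).encode("utf-8")

# POST this body to http://umbrel.local:11434/api/chat (placeholder host).
# With think=True, the assistant message carries the model's thoughts in a
# field separate from its final answer.
```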
Full release notes are available at https://github.com/ollama/ollama/releases
Ollama now supports thinking mode for models that support it, such as DeepSeek-R1 and Qwen3.
Key highlights in this release:
- Added support for thinking mode, displaying the model's thoughts during processing
- Introduced new models: DeepSeek-R1-0528 and Qwen3
- Improved streaming of responses with tool calls
- Enhanced memory estimation and logging for better debugging
Full release notes are available at https://github.com/ollama/ollama/releases
Key highlights in this release:
- Improved model memory management to prevent crashes when running multimodal models
- Enhanced memory estimation for models
- Fixed crashes related to specific models on certain hardware
- Added support for Alibaba's Qwen 3 and Qwen 2 architectures in Ollama's multimodal engine
Full release notes are available at https://github.com/ollama/ollama/releases
Ollama now supports multimodal models via Ollama's new engine, starting with new vision multimodal models like Meta Llama 4, Google Gemma 3, Qwen 2.5 VL, Mistral Small 3.1, and more.
Key highlights in this release:
- Added support for WebP images as input to multimodal models
- Improved performance of importing safetensors models
- Improved API responses for unsupported methods
Full release notes are available at https://github.com/ollama/ollama/releases
Highlights:
- Performance improvements for Qwen 3 MoE models on NVIDIA and AMD GPUs
- Fixed issues with conflicting installations and memory leaks
- Improved labeling for older vision models
- Reduced out of memory errors
- Fixed context canceled error
Full release notes are available at https://github.com/ollama/ollama/releases
Highlights:
- Added support for new models: Qwen 3, Phi 4 reasoning, Phi-4-mini-reasoning, and Llama 4
- Increased default context window to 4096 tokens
- Improved output quality when using JSON mode in certain scenarios
- Fixed various issues related to model stopping, image path recognition, and tensor operations
Full release notes are available at https://github.com/ollama/ollama/releases
Highlights:
- New, faster model downloading with improved performance and reliability
- Fixed memory leak issues for various models
- Improved performance when importing models from Safetensors
- Enhanced tool function parameter handling
- Fixed out of memory issues and model unload order
Full release notes are available at https://github.com/ollama/ollama/releases
Highlights:
- Support for Mistral Small 3.1, the best performing vision model in its weight class
- Improved model loading times for Gemma 3 on network-backed filesystems
Full release notes are available at https://github.com/ollama/ollama/releases
Highlights:
- /api/show now includes model capabilities like vision
- Fixed out-of-memory errors with parallel requests on Gemma 3
- Improved Gemma 3's handling of multilingual characters
- Fixed context shifting issues in DeepSeek models
- Resolved Gemma 3 output degradation after 512/1024 tokens in 0.6.3
Full release notes are available at https://github.com/ollama/ollama/releases
Highlights:
- New sliding window attention optimizations for Gemma 3, improving inference speed and memory allocation for long context windows
- Improved loading speed of Gemma 3
- Fixed various errors when running models
Full release notes are available at https://github.com/ollama/ollama/releases
Highlights:
- New model: Command A, a 111 billion parameter model for enterprise use
- Multiple images are now supported in Gemma 3
- Fixed issue where running Gemma 3 would consume a large amount of system memory
- Fixed issue where '/save' would not work if running a model with '/' in the name
Full release notes are available at https://github.com/ollama/ollama/releases
Highlights:
- New Google Gemma 3 model available in 1B, 4B, 12B, and 27B parameter sizes
- Fixed issues with snowflake-arctic-embed and snowflake-arctic-embed2 models
Full release notes are available at https://github.com/ollama/ollama/releases
Highlights:
- New models: Phi-4-Mini, Granite-3.2-Vision, and Command R7B Arabic
- Default context length can now be set with OLLAMA_CONTEXT_LENGTH environment variable
- Ollama is now compiled for NVIDIA Blackwell
- Fixed issue where bf16 GGUF files could not be imported
- Ollama now accepts requests from Visual Studio Code and Cursor
Full release notes are available at https://github.com/ollama/ollama/releases
Highlights:
- Perplexity R1 1776: a version of the DeepSeek-R1 model that has been post-trained to remove its refusal to respond to some sensitive topics
- The OpenAI-compatible API will now return tool_calls if the model called a tool
- Performance on certain Intel Xeon processors should now be restored
- Fixed permission denied issues after installing Ollama on Linux
- The progress bar will no longer flicker when running ollama pull
- Fixed issue where running a model would fail on Linux if Ollama was installed in a path with UTF-8 characters
Full release notes are available at https://github.com/ollama/ollama/releases
Highlights:
- Ollama will now use AVX-512 instructions where available for additional CPU acceleration
- Fixed indexing error that would occur when downloading a model with ollama run or ollama pull
- Fixed cases where download progress would reverse
- DeepScaleR: A fine-tuned version of Deepseek-R1-Distilled-Qwen-1.5B that surpasses the performance of OpenAI's o1-preview with just 1.5B parameters on popular math evaluations.
- OpenThinker: A fully open-source family of reasoning models built using a dataset derived by distilling DeepSeek-R1.
- Fixed "The system cannot find the path specified" errors when running models in some cases on Windows
Full release notes are available at https://github.com/ollama/ollama/releases
