Local AI troubleshooting

v2.2 · 2026-06-12

DIAGNOSTIC_PAGES: start with the visible error text and whether the model fits your hardware before guessing at config changes.

Ollama model will not load

This usually points to model name, download, memory limit, or runtime state. Start with the simplest visible checks before changing advanced settings.

The model name may be wrong, the download may be incomplete, the model may need more memory than is free, or the runtime may be misconfigured.
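A quick first check for a load failure is whether the exact name you request is actually installed. This is a minimal sketch. It assumes you can reach Ollama's local HTTP API at its default address (`http://localhost:11434`) and that `GET /api/tags` returns the same list that `ollama list` shows. The helper names are hypothetical, and treating an untagged name as `:latest` follows Ollama's usual naming convention.

```python
import json
from urllib.request import urlopen


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> str:
    """Raw JSON from Ollama's installed-models listing (assumed default address)."""
    with urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def normalize(name: str) -> str:
    """Treat an untagged name such as 'llama3' as 'llama3:latest'."""
    # Only look for a tag after the last '/', so registry prefixes are ignored.
    return name if ":" in name.rsplit("/", 1)[-1] else f"{name}:latest"


def model_is_installed(tags_json: str, requested: str) -> bool:
    """True if the requested name matches an installed model after normalization."""
    data = json.loads(tags_json)
    installed = {normalize(m["name"]) for m in data.get("models", [])}
    return normalize(requested) in installed


if __name__ == "__main__":
    sample = '{"models": [{"name": "llama3:latest"}, {"name": "mistral:7b"}]}'
    print(model_is_installed(sample, "llama3"))   # True
    print(model_is_installed(sample, "mistral"))  # False: only the 7b tag is installed
```

In practice you would pass `fetch_tags()` instead of the sample string; a `False` here means a pull or a name fix comes before any memory tuning.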

Ollama is slow

This usually points to model size, CPU fallback, context length, GPU offload, or hardware limits.

Model size, GPU offload, context length, or CPU-only hardware may be the bottleneck.
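To confirm CPU fallback rather than guess at it, compare how much of a loaded model sits in VRAM. This is a minimal sketch. It assumes you have the JSON from Ollama's `GET /api/ps` endpoint (the data behind `ollama ps`), where each loaded model reports `size` and `size_vram` in bytes. The function names and the output wording are hypothetical.

```python
import json


def processor_split(entry: dict) -> str:
    """Describe where one loaded model lives, from a single /api/ps 'models' entry."""
    total = entry.get("size", 0)
    on_gpu = entry.get("size_vram", 0)
    if total <= 0:
        return "unknown"
    if on_gpu >= total:
        return "100% GPU"
    if on_gpu <= 0:
        return "100% CPU"
    gpu_pct = round(100 * on_gpu / total)
    return f"{100 - gpu_pct}%/{gpu_pct}% CPU/GPU"


def summarize_ps(ps_json: str) -> dict[str, str]:
    """Map each loaded model name to its CPU/GPU placement."""
    data = json.loads(ps_json)
    return {m["name"]: processor_split(m) for m in data.get("models", [])}
```

A split such as `75%/25% CPU/GPU` usually means the weights or the context did not fit in VRAM, so a smaller quantization or a shorter context is the usual first thing to try.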

Out-of-memory errors

This usually points to model weights, the context window, runtime overhead, or background apps holding memory.

Model weights, context length, or runtime overhead exceed available memory.
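Context length is the term that most often gets underestimated, because the KV cache grows linearly with it. Below is a rough sketch of the standard estimate: one key and one value vector per token, per layer, per KV head. The model shape is a hypothetical Llama-style example, not a specific model's published config. Real runtimes also allocate compute buffers on top of this.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers: int, n_ctx: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Rough KV-cache size: keys + values for every token, layer, and KV head."""
    # Factor 2 = one key vector and one value vector; bytes_per_elem=2 assumes fp16.
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical shape: 32 layers, 8 KV heads, head_dim 128, fp16 cache.
    for ctx in (2048, 8192, 32768):
        size_gib = kv_cache_bytes(32, ctx, 8, 128) / GIB
        print(f"ctx={ctx}: {size_gib:.2f} GiB")
    # ctx=2048: 0.25 GiB
    # ctx=8192: 1.00 GiB
    # ctx=32768: 4.00 GiB
```

Quadrupling the context quadruples this term. That is why lowering the context setting can clear an out-of-memory error without changing the model.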

Out-of-memory errors explained

This page covers how RAM, VRAM, unified memory, quantization, context length, and runtime overhead add up.

RAM, VRAM, unified memory, quantization, context, and runtime overhead can all contribute.
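Quantization mostly changes the weight term. Weight bytes are roughly the parameter count times bits per weight, divided by eight. This is a rough sketch under stated assumptions:

- The bits-per-weight figures are illustrative. Real quantized files carry scales and metadata, so effective bits sit somewhat above the nominal label.
- The overhead figure is a placeholder. Replace it with what your runtime actually reports.
- On unified-memory machines, `available_bytes` should be the memory the OS can actually give the model, not the total installed.

```python
GIB = 1024 ** 3


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone."""
    return n_params * bits_per_weight / 8


def fits(n_params: float, bits_per_weight: float, kv_bytes: float,
         overhead_bytes: float, available_bytes: float) -> bool:
    """True if weights + KV cache + runtime overhead stay within available memory."""
    needed = weight_bytes(n_params, bits_per_weight) + kv_bytes + overhead_bytes
    return needed <= available_bytes


if __name__ == "__main__":
    # Hypothetical 7B model, 1 GiB KV cache, 1 GiB assumed overhead, 8 GiB free.
    for bits in (16, 8, 4.5):
        w = weight_bytes(7e9, bits) / GIB
        ok = fits(7e9, bits, 1 * GIB, 1 * GIB, 8 * GIB)
        print(f"{bits} bits/weight: weights {w:.2f} GiB, fits: {ok}")
    # Only the 4.5-bit row fits under these assumptions.
```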

Open WebUI cannot connect

This usually points to server address, Docker networking, provider settings, or service startup.

Ollama server address, Docker networking, or service startup may be misconfigured.
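The most common Docker mistake is pointing Open WebUI at `localhost`, which inside a container means the container itself. This minimal sketch lists candidate base URLs and probes them with the standard library. It rests on several assumptions:

- Ollama uses its default port 11434, and its root path answers HTTP 200 when the server is up.
- `/.dockerenv` is only a heuristic for detecting Docker.
- The `ollama` hostname is a hypothetical Compose service name.
- On Linux, `host.docker.internal` generally resolves only if the container was started with `--add-host=host.docker.internal:host-gateway`.

```python
import os
from urllib.request import urlopen

DEFAULT_PORT = 11434  # Ollama's default listening port


def running_in_container() -> bool:
    """Heuristic only: Docker usually creates /.dockerenv; other runtimes may not."""
    return os.path.exists("/.dockerenv")


def candidate_base_urls(in_container: bool) -> list[str]:
    """Base URLs worth trying from wherever the client (e.g. Open WebUI) runs."""
    if in_container:
        # Inside a container, localhost is the container itself, not the host.
        return [
            f"http://host.docker.internal:{DEFAULT_PORT}",
            f"http://ollama:{DEFAULT_PORT}",  # hypothetical Compose service name
        ]
    return [f"http://localhost:{DEFAULT_PORT}", f"http://127.0.0.1:{DEFAULT_PORT}"]


def is_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """True if the base URL answers HTTP 200 within the timeout."""
    try:
        with urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # URLError, HTTPError, refused connections, and timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False
```

Run the probe from wherever Open WebUI runs, for example inside its container: `[u for u in candidate_base_urls(running_in_container()) if is_reachable(u)]`. Use a URL that answers there as the Ollama base URL.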

Open WebUI cannot connect to Ollama

This usually points to the Ollama endpoint, container networking, host access, or service state.

Ollama endpoint, container networking, service startup, or host permissions may be misconfigured.
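Once you know where Open WebUI is running, a few endpoint mistakes can be caught without any network traffic. This is a minimal sketch, assuming Ollama's default port 11434. The function name and messages are hypothetical. Also check the server side. Ollama listens on `127.0.0.1` by default, so a container that reaches the host through a bridge address may need Ollama started with `OLLAMA_HOST` set to a non-loopback address. That setting has privacy consequences of its own.

```python
import ipaddress
from urllib.parse import urlsplit

DEFAULT_PORT = 11434  # Ollama's default listening port


def _is_loopback(host: str) -> bool:
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a hostname such as host.docker.internal


def lint_ollama_url(url: str, in_container: bool) -> list[str]:
    """Return human-readable problems with an Ollama base URL (empty list = none found)."""
    if "://" not in url:
        return ["missing scheme: write it as http://HOST:PORT"]
    parts = urlsplit(url)
    problems = []
    if parts.scheme not in ("http", "https"):
        problems.append(f"unexpected scheme {parts.scheme!r}")
    host = parts.hostname or ""
    if not host:
        problems.append("missing host")
    elif in_container and _is_loopback(host):
        problems.append("loopback host inside a container points at the container, not the host")
    try:
        port = parts.port
    except ValueError:
        problems.append("port is not a valid number")
    else:
        if port is None:
            problems.append(f"no port given; Ollama listens on {DEFAULT_PORT} by default")
    return problems
```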

Local model is too slow

This usually points to model size, offload path, memory bandwidth, thermal throttling, or context length.

Model size, CPU/offload path, context length, or hardware limits may be the bottleneck.
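Measure before changing anything. Separating prompt-processing speed from generation speed shows whether a long context or the offload path is the bottleneck. This is a minimal sketch. It assumes the final response from Ollama's `/api/generate` or `/api/chat` (non-streamed, or the last streamed message), which reports `prompt_eval_count`, `prompt_eval_duration`, `eval_count`, and `eval_duration`, with durations in nanoseconds. The helper names are hypothetical.

```python
NS_PER_S = 1_000_000_000  # Ollama reports durations in nanoseconds


def tokens_per_second(count: int, duration_ns: int) -> float:
    """Throughput for one phase; 0.0 if the phase did not run."""
    if duration_ns <= 0:
        return 0.0
    return count / (duration_ns / NS_PER_S)


def speed_report(final: dict) -> dict[str, float]:
    """Split throughput into prompt processing and token generation."""
    return {
        "prompt_tok_per_s": tokens_per_second(final.get("prompt_eval_count", 0),
                                              final.get("prompt_eval_duration", 0)),
        "gen_tok_per_s": tokens_per_second(final.get("eval_count", 0),
                                           final.get("eval_duration", 0)),
    }


if __name__ == "__main__":
    # Hypothetical numbers, shaped like a final /api/generate response.
    sample = {"prompt_eval_count": 400, "prompt_eval_duration": 500_000_000,
              "eval_count": 120, "eval_duration": 6_000_000_000}
    print(speed_report(sample))  # {'prompt_tok_per_s': 800.0, 'gen_tok_per_s': 20.0}
```

Slow prompt processing with acceptable generation usually points at context length. Slow generation usually points at model size, CPU offload, or memory bandwidth.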

PDF chat not working

This usually points to document parsing, the embedding provider, retrieval settings, or model selection.

Document ingestion, embeddings, retrieval settings, or model quality may be failing.
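Chunk size and overlap are among the retrieval settings that most often break PDF chat. Below is a minimal character-based sketch of fixed-size chunking with overlap. The 800/100 defaults are illustrative and not taken from any particular app, and real pipelines often split on tokens, sentences, or headings instead.

```python
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character chunks that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk.strip():  # skip whitespace-only chunks
            chunks.append(chunk)
        if start + size >= len(text):  # this chunk already reached the end
            break
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Chunks that are too small lose the surrounding context an answer needs. Chunks that are too large dilute the embedding, so retrieval returns loosely related text.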

Local PDF chat gives bad answers

This usually points to bad extraction, weak retrieval, poor chunking, missing OCR, or hallucination.

Poor extraction, chunking, embeddings, retrieval, or source-grounding may be failing.
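Two cheap checks separate extraction failures from grounding failures. This minimal sketch assumes you already have the per-page extracted text and the retrieved chunks for an answer:

- Pages with almost no text are likely scanned images that need OCR.
- Double-quoted passages in an answer that appear in no retrieved chunk suggest the model is not grounded in its sources.

The 50-character threshold and the helper names are hypothetical, and curly quotes are not handled.

```python
import re


def pages_needing_ocr(page_texts: list[str], min_chars: int = 50) -> list[int]:
    """1-based page numbers whose extracted text (whitespace ignored) is suspiciously short."""
    return [number for number, text in enumerate(page_texts, start=1)
            if len("".join(text.split())) < min_chars]


def _normalize(text: str) -> str:
    return " ".join(text.split()).lower()


def unsupported_quotes(answer: str, chunks: list[str], min_len: int = 8) -> list[str]:
    """Double-quoted passages in the answer that appear in none of the retrieved chunks."""
    normalized_chunks = [_normalize(c) for c in chunks]
    quotes = re.findall(r'"([^"]+)"', answer)
    return [q for q in quotes
            if len(q) >= min_len and not any(_normalize(q) in c for c in normalized_chunks)]
```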

GPU not detected

This usually points to the driver, the compute backend, the runtime build, or unsupported hardware.

The driver, compute backend, or runtime build may be missing or misconfigured, or the hardware may not be supported.
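Before changing runtime settings, confirm the driver layer can see the GPU at all. This minimal sketch looks for vendor tools on `PATH` (`nvidia-smi` for NVIDIA, `rocm-smi` for AMD ROCm) and asks `nvidia-smi` for GPU names. A tool being present does not prove your runtime build uses that backend. Apple Silicon uses Metal, which has no equivalent command here. The function names are hypothetical.

```python
import shutil
import subprocess

# Vendor command-line tools; usually present only if the matching driver stack is installed.
VENDOR_TOOLS = {"nvidia": "nvidia-smi", "amd": "rocm-smi"}


def vendor_tools_present() -> dict[str, bool]:
    """Map each vendor to whether its tool is on PATH."""
    return {vendor: shutil.which(tool) is not None
            for vendor, tool in VENDOR_TOOLS.items()}


def nvidia_gpu_names(timeout: float = 10.0) -> list[str]:
    """GPU names reported by nvidia-smi, or an empty list if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        # For example, the tool is installed but cannot talk to the kernel driver.
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

An empty result here means the problem sits in the driver layer, below Ollama or any other runtime.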

Model download fails

This usually points to network, disk space, model name, provider availability, or permissions.

Network, storage, model name, or provider availability may be the issue.
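Running out of disk mid-download is easy to rule out first. This is a minimal sketch under three assumptions. Models land in the directory named by the `OLLAMA_MODELS` environment variable, or else in `~/.ollama/models`; service installs may use a different location. You also know the approximate model size from its listing. The 1.2 safety margin and the function names are hypothetical.

```python
import os
import shutil
from pathlib import Path


def ollama_models_dir() -> Path:
    """OLLAMA_MODELS if set, else the usual per-user default (assumed for service installs)."""
    return Path(os.environ.get("OLLAMA_MODELS") or Path.home() / ".ollama" / "models")


def has_room(path: Path, needed_bytes: int, margin: float = 1.2) -> bool:
    """True if the filesystem holding `path` has needed_bytes * margin free."""
    probe = Path(path).absolute()
    # Walk up to the nearest existing directory so a not-yet-created models dir still works.
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    return shutil.disk_usage(probe).free >= needed_bytes * margin
```

For example, `has_room(ollama_models_dir(), 5 * 1024**3)` checks for a roughly 5 GiB download plus margin.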

Local AI privacy mistakes

This usually points to cloud providers, exposed local servers, sync, plugins, telemetry, or document storage.

A local chat app can still leak data through cloud models, sync, plugins, or remote embeddings.
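One concrete leak path is an Ollama server listening on every interface. This minimal sketch classifies an `OLLAMA_HOST` value as loopback-only or potentially exposed. It assumes Ollama's default of `127.0.0.1:11434` applies when the variable is unset or the host part is empty. It covers only this one path. Cloud model providers, sync, plugins, and remote embeddings still need checking in the app's settings. The function names are hypothetical.

```python
import ipaddress


def bind_host(ollama_host: str) -> str:
    """Extract the host part of an OLLAMA_HOST value such as 0.0.0.0:11434."""
    value = ollama_host.strip()
    if "://" in value:
        value = value.split("://", 1)[1]
    value = value.split("/", 1)[0]            # drop any path
    if value.startswith("["):                 # bracketed IPv6, e.g. [::]:11434
        value = value[1:].split("]", 1)[0]
    elif value.count(":") == 1:               # host:port
        value = value.split(":", 1)[0]
    # Empty host: assume Ollama's loopback default applies.
    return value or "127.0.0.1"


def is_exposed(ollama_host: str) -> bool:
    """True unless the server is bound to a loopback address only."""
    host = bind_host(ollama_host)
    if host == "localhost":
        return False
    try:
        return not ipaddress.ip_address(host).is_loopback
    except ValueError:
        return True  # an unclassifiable hostname: treat as possibly exposed
```

Read the value from the Ollama service's own environment, not only from your shell, for example `is_exposed(os.environ.get("OLLAMA_HOST", ""))` inside the service context.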

TROUBLESHOOTING CONTEXT

How to triage local AI failures

Intro

Most local AI failures are not mysterious. They usually come from memory limits, model size, runtime configuration, Docker/networking, provider selection, or document-processing settings.

First triage

Symptom	Start here
Model will not load	Ollama model will not load; Out-of-memory errors; Model download fails
Model is slow	Local model is too slow; Ollama is slow
Open WebUI cannot see Ollama	Open WebUI cannot connect to Ollama
PDF answers are bad	Local PDF chat gives bad answers
Privacy concern	Local AI privacy mistakes

Support page standard

Each troubleshooting page should give the first three checks, the likely causes, fixes that do not hide uncertainty, and a stop/switch warning for sensitive data.