Local model is too slow

troubleshooting

DIAGNOSTIC

A slow local model usually points to model size, offload path, memory bandwidth, thermals, or context length. Start with the simplest visible checks before changing advanced settings.

DIAGNOSTIC_OVERVIEW

Stop troubleshooting and switch paths when the workflow involves confidential documents and you cannot explain where prompts, files, embeddings, logs, and provider calls go.

SYMPTOMS

The local model responds more slowly than expected, for example a long wait before the first token or slow token-by-token output.

Local AI Guide has not independently reproduced this diagnostic.
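Putting numbers on "too slow" helps separate a slow model load, slow prompt processing (often tied to context length), and slow generation. The sketch below is one way to do that, assuming the runtime reports Ollama-style timing fields (`load_duration`, `prompt_eval_count`, `prompt_eval_duration`, `eval_count`, `eval_duration`, with durations in nanoseconds). Other runtimes report timings differently, and the sample values are invented for illustration.

```python
def summarize_timings(resp: dict) -> dict:
    """Convert Ollama-style nanosecond timings to seconds and tokens/second."""
    ns_per_s = 1e9

    def rate(count_key: str, duration_key: str):
        count = resp.get(count_key, 0)
        duration_ns = resp.get(duration_key, 0)
        # Return None when the runtime did not report a duration.
        return count / (duration_ns / ns_per_s) if duration_ns else None

    return {
        "load_s": resp.get("load_duration", 0) / ns_per_s,
        "prompt_tok_per_s": rate("prompt_eval_count", "prompt_eval_duration"),
        "gen_tok_per_s": rate("eval_count", "eval_duration"),
    }


# Invented sample values, not measurements from any real machine.
sample = {
    "load_duration": 2_000_000_000,         # 2 s to load the model
    "prompt_eval_count": 512,               # prompt tokens processed
    "prompt_eval_duration": 4_000_000_000,  # 4 s of prompt processing
    "eval_count": 100,                      # tokens generated
    "eval_duration": 20_000_000_000,        # 20 s of generation
}

print(summarize_timings(sample))
```

A low generation rate alongside a healthy prompt rate suggests the decode path is the bottleneck, while a long load time on every request may mean the model is being reloaded between calls.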

LIKELY_CAUSES

Model size, CPU/offload path, context length, or hardware limits may be the bottleneck.
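To see why model size and memory bandwidth interact, a common rule of thumb is that generating each token requires reading roughly all model weights once, so bandwidth divided by model size gives a rough upper bound on tokens per second. The sketch below applies that rule. It assumes a dense model and ignores KV-cache reads, compute limits, and partial offload, and the bandwidth and size figures are illustrative rather than measured.

```python
def decode_ceiling_tok_per_s(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Rough upper bound on generation speed for a bandwidth-bound dense model."""
    return bandwidth_bytes_per_s / model_bytes


GB = 1e9  # decimal gigabytes, for simplicity

model_size = 4 * GB       # illustrative: a quantized model of about 4 GB
cpu_bandwidth = 50 * GB   # illustrative: system RAM at about 50 GB/s
gpu_bandwidth = 400 * GB  # illustrative: GPU VRAM at about 400 GB/s

print(decode_ceiling_tok_per_s(model_size, cpu_bandwidth))  # 12.5
print(decode_ceiling_tok_per_s(model_size, gpu_bandwidth))  # 100.0
```

If measured speed is far below even the CPU-side ceiling, causes such as thermal throttling, swapping, or an unexpected offload path are worth checking before blaming the model itself.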

FIRST_FIXES

Confirm the exact app, model, route, and error text.

Check whether the selected model is realistic for the machine’s RAM, VRAM, and context length.

Confirm whether the issue is a local runtime problem, a UI/provider problem, or a document/retrieval problem.
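The check on whether a model is realistic for the machine can be approximated with arithmetic: weight memory grows with parameter count and bits per weight, and KV-cache memory grows linearly with context length. The sketch below is a rough estimate under stated assumptions. The architecture numbers (layers, KV heads, head dimension) describe a hypothetical 7B-class model, and the 10% headroom factor is an arbitrary choice; real runtimes add their own overhead.

```python
def weights_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV-cache size; the leading 2 covers keys and values."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def fits(total_bytes: float, available_bytes: float, headroom: float = 0.9) -> bool:
    """True if the estimate fits while leaving some memory free (assumed 10%)."""
    return total_bytes <= available_bytes * headroom


# Hypothetical model: 7e9 parameters at about 4.5 bits per weight,
# 32 layers, 8 KV heads, head dimension 128, 8192-token context, 16-bit cache.
total = weights_bytes(7e9, 4.5) + kv_cache_bytes(32, 8, 128, 8192)

print(total / 1e9)        # estimated footprint in decimal GB
print(fits(total, 8e9))   # 8 GB of VRAM available
print(fits(total, 4e9))   # 4 GB of VRAM available
```

With these example numbers the estimate is about 5 GB, which fits in 8 GB but not in 4 GB. When the estimate does not fit, the usual options are a smaller model, a lower-bit quantization, or a shorter context length.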

STOP_AND_SWITCH_WHEN

The workflow involves confidential data and the local/private boundary is unclear.

The hardware cannot realistically fit the selected model.

EVIDENCE

Official documentation reviewed, with caveats. Reviewed: 2026-05-24

CAVEATS

Not a substitute for current official documentation.

Exact fix depends on runtime, OS, hardware, and model.
