Verdict
Conservative estimate, not a benchmark
Evidence label: Conservative estimate, not a benchmark. Sources were reviewed on 2026-05-24. Local AI Guide test status: Not independently tested by Local AI Guide. This page does not contain local benchmark, install, privacy-audit, network-monitoring, storage-inspection, or screenshot evidence. Hardware/calculator framing: Conservative estimate, not a benchmark. Actual results depend on model, quantization, context length, runtime, GPU offload, drivers, thermals, and other running apps.
Quick answer: The best local AI setup for Windows depends mostly on your GPU, dedicated VRAM, system RAM, and comfort with tools like Docker or PowerShell. For most beginners, LM Studio is the easiest GUI-first path. Ollama is better if you want a lightweight runtime, local API, or Open WebUI later. If you have an NVIDIA GPU with dedicated VRAM, you have the cleanest Windows path. If you have AMD, integrated graphics, or CPU-only hardware, start smaller and check support carefully before downloading large models.
Beginner recommendation
Use this decision box first.
| Your Windows setup | Best first path | First model target | Evidence label |
|---|---|---|---|
| You want the easiest app | LM Studio | Small or 7B/8B-class model depending on RAM/VRAM | Conservative estimate, not a benchmark |
| You want API, scripts, or Open WebUI later | Ollama | Small or 7B/8B-class model depending on RAM/VRAM | Conservative estimate, not a benchmark |
| You have NVIDIA 6-8GB VRAM | LM Studio or Ollama | 7B/8B Q4/Q5 | Conservative estimate, not a benchmark |
| You have NVIDIA 12-16GB VRAM | LM Studio or Ollama | 14B Q4/Q5 | Conservative estimate, not a benchmark |
| You have NVIDIA 24GB VRAM | LM Studio or Ollama | 32B Q4/Q5 | Conservative estimate, not a benchmark |
| You only have integrated graphics | LM Studio for GUI, or smaller llama.cpp/Ollama path | 3B, maybe 7B/8B Q4 on stronger systems | Conservative estimate, not a benchmark |
| You have AMD on Windows | Only if your exact card/path is supported | Start one size class lower | Official documentation reviewed, with caveats |
| You want Open WebUI | Get Ollama working first, then add Open WebUI | Depends on underlying runtime | Conservative estimate, not a benchmark |
The main Windows rule is simple: dedicated VRAM matters more than the big shared-memory number Windows may show. Shared GPU memory is not the same as dedicated graphics memory, and it can be much slower for local AI workloads.
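The dedicated-VRAM rule can be turned into rough arithmetic. The sketch below assumes weight size is roughly parameters times bits per weight divided by 8, and it adds a hypothetical 20% allowance for context and runtime buffers. The function names and the `overhead_factor` value are illustrative assumptions, not measurements, and real quantized files often differ from the nominal bit width.

```python
def estimate_weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def fits_in_vram(params_billions: float, bits_per_weight: float,
                 dedicated_vram_gb: float, overhead_factor: float = 1.2) -> bool:
    """Rough fit check. overhead_factor is an assumed allowance, not a measured value."""
    needed_gb = estimate_weights_gb(params_billions, bits_per_weight) * overhead_factor
    return needed_gb <= dedicated_vram_gb


if __name__ == "__main__":
    for size in (7, 14, 32):
        weights = estimate_weights_gb(size, 4)
        print(f"{size}B at a nominal 4 bits: about {weights:.1f} GB of weights")
```

Treat the result as a planning number only. Pass in dedicated VRAM, not the shared-memory figure Windows reports.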
Best Windows setup by hardware class
| Windows machine | Best first stack | First model target | What to avoid first | Evidence label |
|---|---|---|---|---|
| CPU-only laptop, 8GB RAM | LM Studio only for small tests, or lighter CLI tooling | 3B Q4/Q5 | 7B/8B as a comfort claim, 14B+, PDF-heavy workflows | Conservative estimate, not a benchmark |
| CPU/iGPU laptop, 16GB RAM | LM Studio for GUI, Ollama for runtime | 3B to 7B/8B Q4 | Large context, 14B+ as default | Conservative estimate, not a benchmark |
| Windows laptop/mini PC, iGPU only, 32GB RAM | LM Studio or lower-level tooling | 7B/8B Q4; cautious 14B experiments | 32B+, heavy agent stacks | Conservative estimate, not a benchmark |
| NVIDIA 6GB VRAM | LM Studio or Ollama | 7B/8B Q4 | 14B dense models | Conservative estimate, not a benchmark |
| NVIDIA 8GB VRAM | LM Studio or Ollama | 7B/8B Q4/Q5 | 32B dense models | Conservative estimate, not a benchmark |
| NVIDIA 12GB VRAM | LM Studio or Ollama | 14B Q4/Q5 | Full 32B dense models | Conservative estimate, not a benchmark |
| NVIDIA 16GB VRAM | LM Studio or Ollama | 14B through higher quants; cautious 32B hybrid experiments | 70B dense models | Conservative estimate, not a benchmark |
| NVIDIA 24GB VRAM | LM Studio or Ollama | 32B Q4/Q5 | Full 70B Q4/Q5 dense models | Conservative estimate, not a benchmark |
| AMD Windows GPU | Only if exact support is confirmed | One size class below matching VRAM tier | Assuming NVIDIA-like simplicity | Official documentation reviewed, with caveats |
| Older x64 CPU without AVX2 | Avoid assuming LM Studio x64 support | Very limited alternatives only | Standard beginner local AI path | Official documentation reviewed, with caveats |
The point of this table is not to tell you what can be forced to launch. It is to tell you what is a sane first setup for a normal user.
NVIDIA, AMD, integrated graphics, and CPU-only Windows
NVIDIA Windows PCs
NVIDIA is the cleanest beginner path on Windows because CUDA support is mature across the local AI ecosystem. If you have a supported NVIDIA GPU with dedicated VRAM, you can usually choose either LM Studio or Ollama and focus on model size rather than backend complexity.
A practical ladder looks like this:
| Dedicated VRAM | Beginner model class | Notes |
|---|---|---|
| 6GB | 7B/8B Q4 | Entry discrete-GPU tier. Keep context modest. |
| 8GB | 7B/8B Q4/Q5 | Good beginner local text tier. |
| 12GB | 14B Q4/Q5 | Better assistant quality, still context-sensitive. |
| 16GB | 14B comfortably, selected larger experiments | Good hobbyist tier. |
| 24GB | 32B Q4/Q5 | Serious local model tier. |
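As a minimal sketch, the ladder above can be written as a lookup that picks the highest tier your dedicated VRAM reaches. The tier values are copied from the table. The function name and the fallback text for cards under 6GB are assumptions added for illustration.

```python
# (minimum dedicated VRAM in GB, beginner model class), highest tier first.
NVIDIA_LADDER = [
    (24, "32B Q4/Q5"),
    (16, "14B comfortably, selected larger experiments"),
    (12, "14B Q4/Q5"),
    (8, "7B/8B Q4/Q5"),
    (6, "7B/8B Q4"),
]


def beginner_model_class(dedicated_vram_gb: float) -> str:
    """Return the beginner model class for the highest tier this VRAM amount reaches."""
    for min_vram_gb, model_class in NVIDIA_LADDER:
        if dedicated_vram_gb >= min_vram_gb:
            return model_class
    # Assumed fallback: the table does not list a tier below 6GB.
    return "below the 6GB tier: start with a small model"


if __name__ == "__main__":
    for vram in (4, 8, 10, 24):
        print(f"{vram}GB dedicated VRAM -> {beginner_model_class(vram)}")
```

In-between amounts such as 10GB fall back to the tier below, which matches the conservative framing of the table.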
AMD Windows PCs
AMD on Windows can work, but it is not the safest default recommendation for beginners unless the exact card and software path are confirmed. Support has improved, but it remains more configuration-sensitive than NVIDIA or Apple Silicon.
Use this conservative rule: if you have AMD on Windows, start one model size class lower than your raw VRAM suggests until you have confirmed support and performance.
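The step-down rule is simple enough to sketch in a few lines. The ordered list of size classes below is an assumption drawn from the classes this guide recommends, and `one_class_lower` is a hypothetical helper name.

```python
# Size classes used in this guide, smallest first (assumed ordering for illustration).
SIZE_CLASSES = ["3B", "7B/8B", "14B", "32B"]


def one_class_lower(size_class: str) -> str:
    """Return the next smaller size class, or the smallest class if already at the bottom."""
    index = SIZE_CLASSES.index(size_class)
    return SIZE_CLASSES[max(index - 1, 0)]


if __name__ == "__main__":
    # Example: a card whose VRAM tier would suggest 14B starts at 7B/8B instead.
    print(one_class_lower("14B"))
```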
Integrated graphics
Integrated graphics can run some local AI workloads, especially with enough system RAM and the right backend. But it should be treated as a low-to-mid expectation path.
Good first uses:
- Small local models.
- Short prompts.
- Learning how local AI works.
- Offline privacy experiments.
Avoid first:
- 32B models.
- 70B models.
- Large PDF collections.
- Long-context workflows.
- Vision or multimodal workflows as a beginner default.
CPU-only Windows
CPU-only local AI is real, but it has sharp limits. It is best for small models, short prompts, offline privacy, and learning. It is not the right way to sell someone on a satisfying 32B or 70B local AI setup.
If your first run is slow, the app may not be broken. Your machine may simply be running the model on the CPU or spilling into slower shared memory.
Which Windows path fits your machine?
Use this decision table.
| If this describes you… | Choose this path |
|---|---|
| “I do not want to use terminal commands.” | Start with LM Studio. |
| “I want a local runtime or API.” | Start with Ollama. |
| “I want Open WebUI.” | Install and verify Ollama first, then install Open WebUI. |
| “I have NVIDIA VRAM.” | Use LM Studio or Ollama and choose model size by VRAM. |
| “I have AMD on Windows.” | Check exact support before relying on GPU acceleration. |
| “I only have integrated graphics.” | Start with small models and modest expectations. |
| “I have 8GB RAM total.” | Use small models only. |
| “I have 16GB RAM total.” | Try 7B/8B Q4 only if the rest of the machine is suitable. |
| “I need local PDF chat.” | Start with LM Studio or Open WebUI after your runtime works. |
| “I am using sensitive documents.” | Read the privacy guide before uploading anything. |
LM Studio vs Ollama on Windows
| Choose LM Studio on Windows if… | Choose Ollama on Windows if… |
|---|---|
| You want the simplest desktop app. | You want a local runtime/API. |
| You want to search and download models inside the app. | You want to connect Open WebUI later. |
| You prefer a GUI-first experience. | You are comfortable with PowerShell or terminal commands. |
| You want an easier first local chat experience. | You want a modular stack for other apps. |
| You may want document chat in the same app. | You want to use the model from scripts or developer tools. |
| You want to avoid Docker at first. | You plan to build a self-hosted browser UI later. |
Best beginner default: Start with LM Studio if you want a Windows app that feels like an app.
Best stack-builder default: Start with Ollama if you want a runtime that other tools can use.
Best Open WebUI path: Do not start with Open WebUI. First confirm Ollama works locally. Then add Open WebUI.
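One way to confirm that Ollama works locally before adding anything else is to ask its local API which models are installed. The sketch below assumes Ollama is listening on its default address, `http://localhost:11434`, and uses the `/api/tags` endpoint. The helper names are illustrative, and the address is an assumption you should adjust if your install differs.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default local address


def parse_model_names(payload: dict) -> list[str]:
    """Extract model names from the JSON body returned by /api/tags."""
    return [model["name"] for model in payload.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 3.0):
    """Return installed model names, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            return parse_model_names(json.load(response))
    except (urllib.error.URLError, OSError, ValueError):
        return None


if __name__ == "__main__":
    models = list_local_models()
    if models is None:
        print("Ollama did not answer. Fix that before adding Open WebUI.")
    else:
        print("Ollama is running. Installed models:", models or "none yet")
```

If this script cannot reach Ollama, Open WebUI will not be able to either, so it is a cheap first check.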
Do you need Docker or WSL?
For basic local AI on Windows, usually no.
| Goal | Docker needed? | WSL needed? | Notes |
|---|---|---|---|
| Basic LM Studio local chat | No | No | Best GUI-first path. |
| Basic Ollama local chat | No | No | Install Ollama directly first. |
| Ollama local API | No | No | Confirm local runtime before adding layers. |
| Open WebUI with Ollama | Usually yes, for the common Docker path | Often yes in Windows workflows | More advanced than first local chat. |
| Linux-oriented tutorials on Windows | Sometimes | Often | Follow Windows-specific instructions. |
| Private PDF workflow | Not necessarily | Not necessarily | Depends on whether you use LM Studio, Open WebUI, or another app. |
If you are a beginner, do not make Docker the first problem you solve. First prove that your machine can run a local model.
Driver and OS checklist
Before installing anything, check these items:
- Windows version and edition.
- CPU architecture and instruction support.
- Total system RAM.
- Dedicated GPU model.
- Dedicated VRAM amount.
- Whether Windows is showing shared GPU memory separately.
- NVIDIA or AMD driver status.
- Free disk space on the drive where models will live.
- Whether you want GUI-only, runtime/API, or Open WebUI.
- Whether corporate security software may block installers or local servers.
This checklist matters because Windows local AI problems often look like app problems when they are really hardware, driver, storage, or networking problems.
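Part of the checklist above can be gathered with a short script. This sketch uses only the Python standard library plus the `nvidia-smi` tool that ships with NVIDIA drivers. It assumes `nvidia-smi` is on the PATH and returns an empty list when it is not. It does not read total system RAM, which has no portable standard-library call.

```python
import platform
import shutil
import subprocess
from pathlib import Path


def parse_nvidia_smi(csv_text: str) -> list[tuple[str, int]]:
    """Parse 'name, memory.total' rows (MiB, no header, no units) into (name, MiB) pairs."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, _, memory_mib = line.rpartition(",")
        gpus.append((name.strip(), int(memory_mib.strip())))
    return gpus


def query_nvidia_gpus() -> list[tuple[str, int]]:
    """Return detected NVIDIA GPUs, or an empty list if nvidia-smi is missing or fails."""
    command = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"]
    try:
        result = subprocess.run(command, capture_output=True, text=True, check=True, timeout=10)
        return parse_nvidia_smi(result.stdout)
    except (OSError, subprocess.SubprocessError, ValueError):
        return []


if __name__ == "__main__":
    print("OS:", platform.platform())
    print("CPU architecture:", platform.machine())
    free_gb = shutil.disk_usage(Path.home()).free / 1e9
    print(f"Free space on the home drive: {free_gb:.1f} GB")
    gpus = query_nvidia_gpus()
    if not gpus:
        print("No NVIDIA GPU reported by nvidia-smi.")
    for name, memory_mib in gpus:
        print(f"NVIDIA GPU: {name}, dedicated VRAM: {memory_mib} MiB")
```

The memory value reported here is dedicated VRAM, which is the number to plan around.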
First model to try on Windows
Start smaller than you think.
| Hardware tier | First model class | Why |
|---|---|---|
| 8GB RAM, no discrete GPU | 3B Q4/Q5 | Keeps the system usable. |
| 16GB RAM, no discrete GPU | 3B to 7B/8B Q4 | 7B/8B may work, but expect limits. |
| NVIDIA 6GB VRAM | 7B/8B Q4 | Entry discrete-GPU local AI tier. |
| NVIDIA 8GB VRAM | 7B/8B Q4/Q5 | Stronger beginner tier. |
| NVIDIA 12GB VRAM | 14B Q4/Q5 | Better quality with realistic fit. |
| NVIDIA 16GB VRAM | 14B comfortably | Good local assistant tier. |
| NVIDIA 24GB VRAM | 32B Q4/Q5 | Serious local model tier. |
| AMD Windows GPU | One class lower than raw VRAM suggests | Support and performance are more variable. |
Do not use the first model run to prove the largest possible model. Use it to prove the setup works.
Model storage on Windows
Model files can become large quickly. Before downloading many models, decide where you want them stored.
By default, Ollama stores models in a .ollama model directory under the user profile, and that directory can be relocated with the documented OLLAMA_MODELS environment variable. In LM Studio, the model directory is configurable, and models are organized under an LM Studio models directory.
Practical advice:
- Do not fill your system drive by accident.
- Decide whether models should live on C: or another drive before downloading many large files.
- Keep the first test model small.
- Delete models you do not use.
- Record the storage location you chose, including in any tutorial or screenshot pack you produce.
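Before a large download, it can help to see where Ollama models would land and how much room that drive has. The sketch below reads the OLLAMA_MODELS environment variable and otherwise falls back to a `.ollama/models` folder under the user profile. That fallback layout is an assumption, so confirm it against the current Ollama documentation. The helper names are illustrative.

```python
import os
import shutil
from pathlib import Path


def ollama_models_dir(env=None) -> Path:
    """Return the OLLAMA_MODELS override if set, else the assumed default folder."""
    env = os.environ if env is None else env
    override = env.get("OLLAMA_MODELS")
    if override:
        return Path(override)
    return Path.home() / ".ollama" / "models"  # assumed default layout


def free_gb(path: Path) -> float:
    """Free space in GB on the drive holding path, using the nearest existing parent folder."""
    probe = path
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    return shutil.disk_usage(probe).free / 1e9


if __name__ == "__main__":
    models_dir = ollama_models_dir()
    print("Models directory:", models_dir)
    print(f"Free space on that drive: {free_gb(models_dir):.1f} GB")
```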
What not to try first on Windows
| Mistake | Why it causes trouble | Better move |
|---|---|---|
| Treating shared GPU memory as dedicated VRAM | Shared memory can be much slower. | Plan around dedicated VRAM. |
| Starting with a 32B model on a normal laptop | It may not fit or may run painfully slowly. | Start with 3B, 7B, or 8B depending on hardware. |
| Assuming AMD behaves like NVIDIA | Support paths differ and can be more configuration-sensitive. | Check exact support first. |
| Installing Open WebUI before Ollama works | Adds Docker/networking complexity too early. | Verify Ollama first. |
| Assuming CPU-only is broken because it is slow | CPU-only inference is often slow by nature. | Use a smaller model. |
| Ignoring model storage location | Model files can fill the wrong drive. | Choose storage path before large downloads. |
| Uploading sensitive documents before checking privacy settings | Local app does not always mean local model. | Confirm provider and storage path first. |
Privacy caveats for Windows local AI
Local AI can be more private than cloud AI, but only for the parts of the workflow that actually remain local.
A Windows setup is more meaningfully local when:
- The model runs on your Windows machine.
- Prompts are processed locally.
- Documents and embeddings stay on local storage.
- The selected provider is not a cloud API.
- The local server is bound only to localhost.
- Cloud features, web search, remote access, and public tunnels are off unless intentionally used.
A Windows local AI setup may still contact the internet for:
- App downloads.
- Model downloads.
- Runtime downloads.
- Update checks.
- Model search.
- Cloud model providers.
- Web search.
- Community hubs.
- Remote access features.
- MCP tools and extensions.
The key rule is this: a local app and a local model are not the same thing. If your local interface is connected to OpenAI, Anthropic, Groq, or another hosted provider, your prompts and uploaded content may leave your computer for inference.
Also remember that local servers can create security issues. A service bound to localhost is very different from a service exposed to your network, a reverse proxy, or a public tunnel. Do not expose Ollama, LM Studio, Open WebUI, or AnythingLLM beyond localhost unless you understand authentication, network access, and the security tradeoff.
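The difference between a localhost binding and a network-exposed one can be checked mechanically. This sketch classifies a bind host as loopback or not, using the standard `ipaddress` module. The function name is hypothetical, and where you find the bind address depends on the tool you are running.

```python
import ipaddress


def is_loopback_bind(host: str) -> bool:
    """True if host is 'localhost' or a loopback IP such as 127.0.0.1 or ::1."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Other hostnames are treated as not provably local.
        return False


if __name__ == "__main__":
    for host in ("127.0.0.1", "localhost", "0.0.0.0", "192.168.1.10"):
        label = "localhost only" if is_loopback_bind(host) else "reachable beyond localhost"
        print(f"{host}: {label}")
```

An address such as 0.0.0.0 means the service listens on every network interface, which is the case this section warns about.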
Common Windows troubleshooting
| Problem | Likely cause | First fix | Evidence label |
|---|---|---|---|
| LM Studio will not run | Unsupported CPU, OS, or architecture issue | Check current LM Studio system requirements. | Official documentation reviewed |
| Model loads but is very slow | CPU-only path or shared-memory fallback | Try a smaller model and confirm GPU use. | Conservative estimate, not a benchmark |
| GPU is not being used | Driver or backend mismatch | Update drivers and confirm supported backend. | Conservative estimate, not a benchmark |
| Ollama works but Open WebUI cannot see it | Docker, localhost, or networking issue | Verify Ollama locally before debugging Open WebUI. | Conservative estimate, not a benchmark |
| Downloads fail or restart | Network or storage issue | Check free disk space and try a smaller model first. | Conservative estimate, not a benchmark |
| Model is stored on the wrong drive | Default model path used | Configure the documented model directory before large downloads. | Official documentation reviewed, with caveats |
| Windows shows large shared GPU memory | Shared memory confused with dedicated VRAM | Use dedicated VRAM as the recommendation budget. | Conservative estimate, not a benchmark |
| PDF chat is inaccurate | Parsing, retrieval, or model limits | Use cleaner PDFs and verify outputs against the source. | Conservative estimate, not a benchmark |
| Setup is “local” but network is active | Downloads, updates, providers, or web features | Check the selected model provider and app settings. | Official documentation reviewed, with caveats |
Windows beginner setup checklist
Before downloading a model, answer these questions:
- How much system RAM do I have?
- Do I have a discrete GPU?
- How much dedicated VRAM do I have?
- Is my GPU NVIDIA, AMD, or integrated?
- Do I want a desktop app or a runtime/API?
- Do I want Open WebUI eventually?
- Do I want to use PDFs or documents?
- Where should model files be stored?
- Am I handling sensitive data?
- Am I willing to use Docker or WSL?
Then choose:
| If your answer is… | Start with… |
|---|---|
| “I want the easiest local chat app.” | LM Studio |
| “I want a runtime/API.” | Ollama |
| “I have NVIDIA VRAM.” | Choose model size by dedicated VRAM. |
| “I have only integrated graphics.” | Start small and expect lower speed. |
| “I have AMD on Windows.” | Verify exact support before relying on GPU acceleration. |
| “I want Open WebUI.” | Install and verify Ollama first. |
| “I want local PDF chat.” | Start with LM Studio or Open WebUI after runtime setup. |
| “I have sensitive documents.” | Read the privacy guide first. |
Frequently asked questions
Can Windows run local AI?
Yes. Windows can run local AI with tools such as LM Studio, Ollama, and Open WebUI. The quality of the experience depends heavily on RAM, GPU, dedicated VRAM, drivers, model size, and whether the workload stays on GPU or falls back to slower memory.
Is Ollama available on Windows?
Yes. Ollama has a Windows install path. It is a good choice if you want a local runtime, local API, or a backend for tools like Open WebUI.
Should I use LM Studio or Ollama on Windows?
Use LM Studio if you want the easiest desktop app. Use Ollama if you want a runtime/API or plan to add Open WebUI or other integrations.
Do I need Docker for local AI on Windows?
Not for basic LM Studio or basic Ollama. Docker usually enters the picture when you want tools like Open WebUI or self-hosted interface layers.
Do I need WSL?
Not for basic LM Studio or basic Ollama. WSL is often part of Windows workflows for Docker-based or Linux-oriented tools, including many Open WebUI tutorials.
Can I run local AI without a GPU?
Yes, but expectations should be modest. CPU-only or integrated-graphics setups should start with small models and short prompts.
Where does Ollama store models on Windows?
By default, Ollama stores models under the user .ollama model directory, and it supports relocating that directory through the documented OLLAMA_MODELS environment variable. Confirm the current path in the official documentation before relying on it.
Why is local AI slow on my Windows PC?
Common reasons include using a model that is too large, running on CPU, spilling into shared memory, outdated drivers, too much context, or not enough dedicated VRAM.