The Final Local-AI Box · June 2026

Used RTX 3090 in a clean mid-tower

If I had to pick one box for local AI in 2026, this is it. Not the cheapest, not the quietest, not the most powerful — the one I’d actually buy with my own money and use every day. Use the configurator below to find yours.

Companion to /v100-guide/ · Built by an opencode agent · Pricing June 30, 2026

The headline pick

A used NVIDIA RTX 3090 (24GB GDDR6X), in any decent mid-tower with an 850W PSU

Total: $700–$1,400 depending on whether you already have a PC. Runs every local-AI model that matters in 2026, no SXM2 adapter, no custom cooling, no AVX2 motherboard lottery, no CUDA 11 lock-in. 24GB of VRAM is the sweet spot for Qwen 3 27B and GPT-OSS 20B. Llama 3.3 70B Q3 (about 30GB) and DeepSeek V3 Q4 do not fit in 24GB without offloading to system RAM.

The configurator

What do you have?
Desktop PC I can drop a GPU into: Mid-tower or larger, a spare PCIe x16 slot, PSU ≥ 850W
Building from scratch: Headless local-AI server, new case + PSU + parts
Want it silent and low-power: Mac mini / Mac Studio territory, 30W, fanless-ish
Cheapest serious local AI: Used datacenter card, DIY acceptable

Budget (USD, all-in)
~$300: Entry-level. P40 territory, used parts.
~$700: The 3090 sweet spot. 24GB VRAM, modern software.
~$1,500: From-scratch build with used 3090, or 32GB V100 setup.
~$4,000+: Mac Studio M3 Ultra territory, 96GB unified memory.
No real limit: Frontier. 192GB Mac Studio, multi-GPU V100, 5090.

Why this is the default pick

Three things make the 3090 the right answer for “if I had to pick one” in 2026:

24GB GDDR6X is the right amount of VRAM. Big enough for 27B-class models at usable quantizations, small enough that you can find one for $700 used. The 16GB cards (V100 16GB, RTX 4060 Ti 16GB) cap out at 8B-class models comfortably. Of the 32GB cards, the RTX 5090 is 3–5x the price, and the V100 32GB (about $700) brings the SXM2 and CUDA 11 baggage covered further down.
First-class software support. Ampere architecture. CUDA 12. PyTorch native BF16. TensorRT-LLM. vLLM. llama.cpp with every quantization format. Ollama. LM Studio. The 3090 is the card every local-AI tutorial assumes you have, because it has been the consensus pick for three years running.
Actually used as a GPU. It drives a display. It runs Blender. It plays games. It is not a server card that needs an SXM2 interposer, custom shrouds, or a 250W blower screaming in your office. The trade is 350W TDP and loud fans under load, but that’s the price of any modern consumer GPU.
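
The 24GB argument can be sanity-checked with back-of-envelope arithmetic: weight size is roughly parameter count times bits per weight, plus some headroom for the KV cache and runtime buffers. The sketch below is illustrative only. The bits-per-weight figures and the 2GB headroom are my assumptions, not measurements, and real model files can differ by a few GB.

```python
# Rough VRAM fit check. Every constant here is an illustrative assumption.
ASSUMED_BITS_PER_WEIGHT = {
    "q3": 3.5,  # assumed average for a 3-bit quantization
    "q4": 4.5,  # assumed average for a 4-bit quantization
    "q8": 8.5,  # assumed average for an 8-bit quantization
}
ASSUMED_HEADROOM_GB = 2.0  # KV cache and runtime buffers (assumed, grows with context)


def weights_gb(params_billion: float, quant: str) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return params_billion * ASSUMED_BITS_PER_WEIGHT[quant] / 8


def fits(params_billion: float, quant: str, vram_gb: float = 24.0) -> bool:
    """True if weights plus the assumed headroom fit in the given VRAM."""
    return weights_gb(params_billion, quant) + ASSUMED_HEADROOM_GB <= vram_gb


for label, params, quant in [
    ("20B at Q4", 20, "q4"),
    ("27B at Q4", 27, "q4"),
    ("70B at Q3", 70, "q3"),
]:
    verdict = "fits" if fits(params, quant) else "does not fit"
    print(f"{label}: ~{weights_gb(params, quant):.1f} GB of weights, {verdict} in 24 GB")
```

Under these assumptions the 20B and 27B models land well inside 24GB and the 70B model at Q3 lands around 30GB, which is the wall the rest of this page keeps running into.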

The software stack (set this up Saturday afternoon)

From bare metal to a working local-AI box in about an hour. Same commands whether you spent $700 or $4,200 on the box.

# 1. Install Ubuntu 22.04 LTS (or 24.04 LTS) on the NVMe
# 2. Install NVIDIA driver (Ubuntu's "Additional Drivers" UI does this)
sudo ubuntu-drivers install       # "autoinstall" is the older, deprecated spelling
sudo reboot

# 3. Verify the driver sees the GPU
nvidia-smi

# 4. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

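# 5. Pull a model (tag names change; check the model's page on ollama.com if a pull fails)
ollama pull qwen3:27b           # 17GB, fits comfortably in 24GB
ollama pull gpt-oss:20b         # 12GB, fast
ollama pull llama3.3:70b-q3_K_M # 30GB, exceeds 24GB and spills to system RAM - this is the wall
ollama pull deepseek-v3:q4      # optional: 671B parameters, hundreds of GB at Q4, far beyond this box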

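# 6. Point Claude Code at the local API (Cursor and other tools have their own base-URL setting)
# Assumes an Ollama build with Anthropic-compatible endpoints; the client appends /v1/messages itself
export ANTHROPIC_BASE_URL=http://localhost:11434
export ANTHROPIC_AUTH_TOKEN=ollama   # placeholder value; assumed not to be checked by a local Ollama
claude --model qwen3:27b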

# 7. Private ChatGPT-style web UI
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --restart unless-stopped \
  ghcr.io/open-webui/open-webui:main
# Visit http://localhost:3000
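
Once Ollama is running, anything that can make an HTTP request can use the box. As a minimal sketch, this calls Ollama's native `/api/generate` endpoint using only the Python standard library. The helper names `build_generate_request` and `generate` are mine, and the 120-second timeout is an arbitrary assumption.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default port, as used in step 6


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the model's text reply."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With the server up and the model pulled, `print(generate("gpt-oss:20b", "Say hello in five words."))` should print a short reply. The first call is slower because the model has to load into VRAM.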

What runs well on 24GB (Q4 / Q5 quantizations)

GPT-OSS 20B: fast, 50+ tok/sec, covers ~80% of daily work
Qwen 3 27B: the new consensus for local coding; matches GPT-4-class on most benchmarks
Llama 3.3 70B Q3: about 30GB, so it does not fit in 24GB; it runs only with layers offloaded to system RAM, well below the speeds above. Still the previous-gen “big local” benchmark
Mistral, Gemma, Phi, Qwen Coder: all 7B–14B class, fly on a 3090
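
To see how much of the 24GB a loaded model actually takes, you can read it from `nvidia-smi`. A small sketch, assuming the driver from step 2 is installed; the function names are mine, and the sample row in the usage note is made up for illustration.

```python
import subprocess

# Standard nvidia-smi query flags; values come back in MiB with "nounits".
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text: str) -> list[dict]:
    """Parse rows of 'name, used MiB, total MiB' into dictionaries."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split from the right so a comma in the GPU name cannot break parsing.
        name, used, total = [field.strip() for field in line.rsplit(",", 2)]
        gpus.append({
            "name": name,
            "used_mib": int(used),
            "total_mib": int(total),
            "free_mib": int(total) - int(used),
        })
    return gpus


def read_gpu_memory() -> list[dict]:
    """Run nvidia-smi and return one dictionary per GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

For example, `parse_gpu_memory("NVIDIA GeForce RTX 3090, 17500, 24576")` reports 7076 MiB free. On the real box, call `read_gpu_memory()` while a model is loaded.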

What doesn’t fit and where to go

Anything >32B at Q4+: upgrade to V100 32GB (~$700, HBM2, but CUDA 11 lock-in) or a 32GB RTX 5090. A used 3090 Ti or 4090 is faster but still 24GB, so it does not move the wall. Or keep your $20/month Claude Pro sub for the frontier cases, the 15–20% @N01ennn mentions in his article.
Anything >70B at any quant: Mac Studio M3 Ultra 192GB ($7,499) is the consumer-tier answer. Or rent an H100 on Vast.ai / RunPod for $2/hr.

What you give up

Honest tradeoffs of the 3090 default

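350W TDP. The 3090 pulls 100W more than the V100 SXM2 and ~320W more than a Mac mini M4. At $0.15/kWh, a sustained 350W around the clock is about $38/month (252 kWh). A bill of $12–$15/month corresponds to an average draw of roughly 110–140W, which assumes the card idles between requests.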
Loud under load. AIB 3090s with 3-fan coolers are tolerable but audible. Founders Edition is louder. This is not a bedroom-friendly 24/7 box. If silence matters, pick “silent” in the configurator.
No ECC. HBM2 on the V100 has ECC; GDDR6X on the 3090 does not. For most inference workloads this doesn’t matter. For long-running training it would.
24GB is a wall, not a floor. Once you’ve used a 3090 to run Qwen 3 27B at 30 tok/sec, you will want more VRAM for the 70B-class models. The upgrade path starts at another $500–$700 (V100 32GB) and climbs to several times that for a 5090. A 3090 Ti or used 4090 is faster but still 24GB.
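Little NVLink benefit for inference. The 3090 has an NVLink connector, but Ollama and llama.cpp split a model across cards layer by layer and gain little from it. vLLM’s tensor parallelism runs over NCCL, which can use an NVLink bridge if one is fitted. To go beyond 24GB you buy a second 3090 (48GB total across the two cards, bridge optional) or move to a 32GB+ card.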
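
The electricity figure is worth working through, because it depends heavily on average draw rather than TDP. A minimal sketch, assuming a 30-day month and the $0.15/kWh rate used above; the function names are mine.

```python
HOURS_PER_MONTH = 24 * 30  # assumes a 30-day month


def monthly_cost_usd(avg_watts: float, usd_per_kwh: float = 0.15) -> float:
    """Monthly electricity cost for a given average draw, running 24/7."""
    kwh = avg_watts / 1000 * HOURS_PER_MONTH
    return kwh * usd_per_kwh


def avg_watts_for_bill(usd_per_month: float, usd_per_kwh: float = 0.15) -> float:
    """Average draw (watts) that produces a given monthly bill."""
    return usd_per_month / usd_per_kwh / HOURS_PER_MONTH * 1000


print(f"350 W sustained: ${monthly_cost_usd(350):.2f}/month")
print(f"$12/month implies about {avg_watts_for_bill(12):.0f} W average")
print(f"$15/month implies about {avg_watts_for_bill(15):.0f} W average")
```

A card pinned at 350W around the clock costs $37.80 a month at this rate. A $12–$15 bill implies an average of roughly 111–139W, which is what you would expect from a box that spends most of the day idle between requests.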

When the configurator’s pick is wrong

You might need a different shape

You want silence and 24/7 above all: pick “silent” + “$4,000+” for the Mac Studio M3 Ultra 96GB ($4,199). It draws a fraction of the 3090’s power, runs near-silent, and its 96GB of unified memory covers everything 70B Q4 and below.
You want cheapest possible entry and have a desktop: pick “cheapest” + “$300” — Tesla P40 24GB ($180 used) + EPS adapter ($10) + DIY shroud with Noctua ($25) = $215. Half the speed of the 3090, $500 cheaper.
You specifically need HBM2 or ECC: read the /v100-guide/ page. The V100 SXM2 32GB at $500–$700 is the answer, provided you accept the CUDA 11 / 250W / SXM2 complexity.
You only need it a few hours a week: rent an H100 on Vast.ai or RunPod for $2/hr. Owning the box is a sunk cost; renting is per-use.
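
The rent-or-buy case comes down to how many hours you actually use. A rough sketch using the $2/hr rental price and $700 card price quoted on this page. It ignores electricity and the fact that an H100 is a much faster card than a 3090, so treat the result as an upper bound on how long renting stays cheaper.

```python
def break_even_hours(hardware_usd: float, rent_usd_per_hour: float = 2.0) -> float:
    """Rental hours whose total cost equals the hardware purchase price."""
    return hardware_usd / rent_usd_per_hour


def break_even_weeks(hardware_usd: float, hours_per_week: float,
                     rent_usd_per_hour: float = 2.0) -> float:
    """Weeks of use at a given weekly load before buying becomes cheaper."""
    return break_even_hours(hardware_usd, rent_usd_per_hour) / hours_per_week


for hours_per_week in (3, 5, 20):
    weeks = break_even_weeks(700, hours_per_week)
    print(f"{hours_per_week} h/week: renting stays cheaper for about {weeks:.0f} weeks")
```

At $2/hr, a $700 card equals 350 rental hours. At five hours a week that is 70 weeks of renting before the purchase wins.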

Why I’m not picking the V100

The V100 SXM2 hack is brilliant engineering and the $250 floor is real. But it is a project, not a tool. To do it you need:

The SXM2-to-PCIe adapter ($50–$100)
A custom cooling solution (3D-printed shroud, Noctua, or water block)
CUDA 11, not 12
Above-4G decoding in BIOS
Motherboards without the AVX2 bug (no Z620)
Patience with 250W blowers or a water-cooling build

That’s a weekend of work and ongoing software friction. The 3090 is plug in, install the driver, pull a model, done. For a hobby project, the V100 is fun. For “if I had to get one for myself and actually use it every day,” the 3090 wins.

The V100 guide is for the people who want to do the project. This page is for the people who want the tool.

The order I’d place tonight

Open the configurator above. Pick what matches your situation. The output is your build list, the total is the cost, the “models that fit” tells you what you can run.

If you want the default: eBay search “RTX 3090 24GB”, sort by price + distance, buy from a 98%+ feedback seller, $650–$750. Install driver, install Ollama, pull qwen3:27b. Done by Sunday.

Either way: keep the $20/month Claude Pro sub active for the 15–20% of work where the frontier still pulls ahead. The hardware replaces the $200/month in canceled subscriptions. At that rate a $700 card pays for itself in about 4 months and a $1,400 from-scratch build in about 7, before electricity, and you own the box for the next 5–7 years.
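
The payback claim is simple division, and it shifts once you subtract electricity. A minimal sketch using the figures from this page ($200/month in canceled subscriptions, $700 to $1,400 for the hardware); the electricity values are example inputs, not measurements.

```python
import math


def payback_months(hardware_usd: float, cancelled_subs_usd: float = 200.0,
                   electricity_usd: float = 0.0) -> int:
    """Whole months until net monthly savings cover the hardware cost."""
    net_savings = cancelled_subs_usd - electricity_usd
    if net_savings <= 0:
        raise ValueError("no net monthly savings, so the hardware never pays back")
    return math.ceil(hardware_usd / net_savings)


for hardware in (700, 1400):
    for electricity in (0, 15, 38):
        months = payback_months(hardware, electricity_usd=electricity)
        print(f"${hardware} box, ${electricity}/month electricity: {months} months")
```

A $700 card pays back in 4 months with light electricity use and 5 months if the card runs flat out all month. The $1,400 build takes 7 to 9 months.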