Local AI Guide · June 2026

Can you really run GPT-OSS 20B on a $250 Tesla V100 SXM2?

A counter-claim to @N01ennn’s local-AI article, reality-checked against the original article, its replies, the V100 tweet’s replies, and current eBay pricing.

Built by an opencode agent · Pricing and claims verified June 30, 2026

The two tweets, side by side

@N01ennn · Jun 27, 2026

Article: “From $340/month to $5/month: four local GPU setups”

Cancelled 6 of 9 AI subscriptions, saved $140/month. Recommends four setups, cheapest to most expensive: Tesla P40 ($180), Mac mini M4 ($599), used RTX 3090 ($700), Mac Studio M3 Ultra ($4,199). Same software stack on all four: Ollama + Open WebUI + the OpenAI-compatible API on localhost.

The article does not mention the V100. The “V100 counter-take” below comes from a separate tweet by @doublenickk two days later.

@doublenickk · Jun 29, 2026

Tweet: “Tesla V100s from DGX servers are dumped on eBay for $250”

Claims a Tesla V100 SXM2 16GB, originally $10K in 2017, now retires from cloud datacenters as H100s land, is on eBay for $200–$300. Drop a $50 SXM2-to-PCIe adapter into an HP Z8 G4 with thermal pad and the total build is $330, running GPT-OSS 20B at 87°C / 180W. Argues this delivers the same job as the article’s $700 RTX 3090 for one third the price — with HBM2.

The original tweet also flags three gotchas: HP Z620 fails on AVX2 (use Z8 G4 or Z840 instead), install CUDA 11 not 12, and flip Above-4G decoding in BIOS.

So is the $250 V100 real?

This is where the original tweet’s replies got it half right. Multiple replies pushed back hard, claiming V100 16GB listings are actually $700 and 32GB is $1,000+. Web-search-verified current eBay pricing tells a more nuanced story:

Card	Typical 2026 eBay price	Notes
V100 SXM2 16GB	$200–$300	Mainstream range. China-based “pulled-from-server” sellers anchor the low end at $140–$200. Refurbished / reputable sellers sit $250–$300.
V100 SXM2 32GB	$500–$700	Premium. Some seller-condition combos go higher. The “$1,000+” the tweet replies cited are outliers or full DGX-1 partial builds.
SXM2-to-PCIe adapter	$50–$100+	Often bundled with the card. Budget $50 standalone, more for active-cooled variants.

Verdict on the $250 claim: defensible for a 16GB SXM2 from a Chinese pull-it-yourself seller. The $330 total build is plausible if you already own a Z8 G4 (or comparable dual-Xeon board), the adapter, and a PSU big enough. The tweet replies citing $700 for 16GB were either looking at wrong listings or higher-condition cards.

The actual build, if you want to do this

You are building a small server, not upgrading a PC

The SXM2 form factor was never meant for a desktop. It uses a mezzanine connector designed for proprietary server baseboards (DGX-1, HGX). To run one at home you need three things a normal RTX 3090 build does not need.

Chassis

HP Z8 G4 or Z840 — full-size workstation, real server-grade VRMs, enough PCIe lanes for a 250W card.
Do NOT use the Z620. The original tweet calls this out: the Z620 has broken AVX2 support and the V100 path hits it. A working Z8 G4 used is $300–$500.
Any dual-socket LGA3647 or LGA2011-v3 board with enough PCIe lanes works. Avoid single-CPU consumer boards — the slot bifurcation often doesn’t work.

Adapter + cooling

SXM2-to-PCIe interposer board ($50–$100). NVLink is usually lost on the cheaper adapters — fine for inference, blocks NVLink-trained models.
Passive SXM2 = you supply cooling. The card has no fan. You need a custom fan shroud or a 3D-printed blower adapter with a Noctua. The original tweet’s “thermal pad” is the minimum, not a complete solution.
Expect the assembly to live on a shelf, not in a closed case, unless you fab a real shroud.

BIOS + drivers

Enable Above-4G decoding in BIOS. Without it the 16GB BAR won’t enumerate.
Install CUDA 11.x, not 12.x. The V100 is Volta (sm_70) and is now in NVIDIA’s “end of full support” phase. CUDA 12 either drops Volta kernels or runs them through a slower fallback.
Use llama.cpp with Volta-specific build flags, or vLLM with Volta config. Modern TensorRT-LLM and Triton are dropping Volta support — verify your stack before you buy.

Power + noise (this is the real cost)

The card pulls 250W under load, with a transient spike above that. A 850W PSU is the floor.
24/7 inference at $0.15/kWh: $7–$10/month in electricity alone.
The blower solution screams. The original tweet concedes this. Water cooling (per @jeffdavismd in the V100 tweet’s replies) is the only way to make it quiet.

The software stack (same on every local-AI box in 2026)

This is the part @N01ennn got exactly right: the runtime is identical whether you spent $180 or $4,200 on the box.

# Install the runtime
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull gpt-oss:20b
# or
ollama pull qwen3.6:27b

# Point Claude Code at your local model
ANTHROPIC_BASE_URL=http://localhost:11434/v1 claude

# Private ChatGPT-style web UI
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main

How the V100 compares to the article’s four picks

	V100 SXM2 16GB used	Tesla P40 24GB used	RTX 3090 24GB used	Mac mini M4 16–64GB
Price	$200–$300 (+ $50–$100 adapter)	$150–$250	$650–$750	$599–$1,399
VRAM	16GB HBM2 + ECC	24GB GDDR5 + ECC	24GB GDDR6X	16–64GB unified
Memory BW	~900 GB/s	~346 GB/s	~936 GB/s	~120 GB/s
TDP	250W	250W	350W	10–30W
Noise	Loud (blower) / silent (water)	Loud (DIY shroud)	Loud (blower)	Silent
BF16 / quant native	No (FP16 only)	No (FP16 only)	Yes (native)	Yes (native)
Software support 2026	CUDA 11 only, lagging	CUDA 11 only, niche	First-class	First-class
Plug-and-play	No	Mostly	Yes	Yes
Cooling complexity	High (SXM2 adapter)	Medium (DIY shroud)	Low	None
Electricity (24/7)	~$9/mo	~$9/mo	~$12/mo	~$3–5/mo

What the article’s own replies added

@jackccrawford — Titan Xp × 4, $180 on eBay

Four NVIDIA Titan Xp 12GB in an old i7 chassis with 128GB RAM, running 35B models at 25 tok/sec. The most aggressive “ram-everything” build in the thread. 12GB × 4 = 48GB of effective VRAM via PCIe + unified host RAM. No adapters, no ECC, but the cheapest path to running bigger models.

@based_bitcoiner — “Used RTX 3090s are $1000-2000”

Pushes back on the article’s $700 RTX 3090 price. Reality is probably somewhere between: $650–$750 is the common range, $1,000+ for pristine or recently-mined cards. The article’s number is reasonable for 2026 but not the floor.

@MussonKing — “Gotta explain me how your 96GB Studio replaces a 2-3T cloud model”

Hits the actual frontier question. The article’s claim that Mac Studio 192GB can run “Llama 4 Maverick, full DeepSeek V3, Qwen3 235B without quantization tricks” is half-true. You can load those models but inference speed and quality at home on a single machine is not the same as running them on a 2–3T cloud. Local AI in 2026 covers 80–85% of heavy-user needs — frontier work still needs a cloud sub.

What the V100 tweet’s replies added

@RandoCollector: Tesla GP100 (~$170 used), HBM2, no SXM2 adapter, runs GPT-OSS 20B / Qwen2.5-Coder 14B / Gemma4 e4b on NVLink’d pairs. The actual hidden gem.
@jeffdavismd: Water cooling dual V100s makes them quiet. The only good answer to the noise problem.
@paradyse_one: “V100 is obsolete, lacks modern software support, requires complex server-grade cooling. RTX 3090 is the industry standard for plug-and-play local AI.” — the most coherent pushback, mostly correct.
@SweetestEvana: “GPT-OSS 20B is a pretty basic model…” — fair, the V100 SXM2 16GB can only fit the 20B. To run 120B you need 4× 32GB V100s, which is a different budget.

Verdict

The V100 SXM2 hack is real engineering, and the $250 price is more defensible than the original tweet’s replies gave it credit for. But $250 is the floor, not the average, and it only works for the 16GB card — which can only fit GPT-OSS 20B, Qwen3 8B-class models, or aggressively quantized mid-size models.

If you want plug-and-play: the article’s RTX 3090 at $700 is still the sweet spot for 24GB and modern software support.

If you want cheapest possible: the article’s Tesla P40 at $180 is the real entry point, with 24GB of VRAM and no SXM2 gymnastics.

If you want silent and efficient: the article’s Mac mini M4 at $599 is the right answer if you can live with 16–32GB unified memory.

If you want a V100 specifically: the 16GB SXM2 at $250 + $50 adapter into a $300 used Z8 G4 is a real $600 build. The 32GB at $500–$700 is the better card if you can stretch. Skip the SXM2 entirely and look at the GP100 (~$170, HBM2, no adapter) that @RandoCollector flagged — that’s the actual hidden gem.

What the original tweet got wrong: the article it’s countering doesn’t even list the V100. It’s a real engineering counter-claim against an article that never mentioned the card. The fight @doublenickk is picking is with the RTX 3090 pick, not the article as a whole.

What the original tweet got right: the SXM2 + adapter + Z8 G4 build works, runs GPT-OSS 20B fine, costs less than a 3090 if you already own the chassis and adapter, and the gotchas (Z620 AVX2, CUDA 11, Above-4G decoding) are real and would have bitten anyone who didn’t read the footnotes.