Local AI Guide · June 2026

Can you really run GPT-OSS 20B on a $250 Tesla V100 SXM2?

A counter-claim to @N01ennn’s local-AI article, reality-checked against the original article, its replies, the V100 tweet’s replies, and current eBay pricing.

Built by an opencode agent · Pricing and claims verified June 30, 2026

The two tweets, side by side

@N01ennn · Jun 27, 2026

Article: “From $340/month to $5/month: four local GPU setups”

Cancelled 6 of 9 AI subscriptions, saved $140/month. Recommends four setups, cheapest to most expensive: Tesla P40 ($180), Mac mini M4 ($599), used RTX 3090 ($700), Mac Studio M3 Ultra ($4,199). Same software stack on all four: Ollama + Open WebUI + the OpenAI-compatible API on localhost.

The article does not mention the V100. The “V100 counter-take” below comes from a separate tweet by @doublenickk two days later.

@doublenickk · Jun 29, 2026

Tweet: “Tesla V100s from DGX servers are dumped on eBay for $250”

Claims a Tesla V100 SXM2 16GB, originally $10K in 2017, now retires from cloud datacenters as H100s land, is on eBay for $200–$300. Drop a $50 SXM2-to-PCIe adapter into an HP Z8 G4 with thermal pad and the total build is $330, running GPT-OSS 20B at 87°C / 180W. Argues this delivers the same job as the article’s $700 RTX 3090 for one third the price — with HBM2.

The original tweet also flags three gotchas: HP Z620 fails on AVX2 (use Z8 G4 or Z840 instead), install CUDA 11 not 12, and flip Above-4G decoding in BIOS.

So is the $250 V100 real?

This is where the original tweet’s replies got it half right. Multiple replies pushed back hard, claiming V100 16GB listings are actually $700 and 32GB is $1,000+. Web-search-verified current eBay pricing tells a more nuanced story:

CardTypical 2026 eBay priceNotes
V100 SXM2 16GB $200–$300 Mainstream range. China-based “pulled-from-server” sellers anchor the low end at $140–$200. Refurbished / reputable sellers sit $250–$300.
V100 SXM2 32GB $500–$700 Premium. Some seller-condition combos go higher. The “$1,000+” the tweet replies cited are outliers or full DGX-1 partial builds.
SXM2-to-PCIe adapter $50–$100+ Often bundled with the card. Budget $50 standalone, more for active-cooled variants.

Verdict on the $250 claim: defensible for a 16GB SXM2 from a Chinese pull-it-yourself seller. The $330 total build is plausible if you already own a Z8 G4 (or comparable dual-Xeon board), the adapter, and a PSU big enough. The tweet replies citing $700 for 16GB were either looking at wrong listings or higher-condition cards.

The actual build, if you want to do this

You are building a small server, not upgrading a PC

The SXM2 form factor was never meant for a desktop. It uses a mezzanine connector designed for proprietary server baseboards (DGX-1, HGX). To run one at home you need three things a normal RTX 3090 build does not need.

Chassis

Adapter + cooling

BIOS + drivers

Power + noise (this is the real cost)

The software stack (same on every local-AI box in 2026)

This is the part @N01ennn got exactly right: the runtime is identical whether you spent $180 or $4,200 on the box.

# Install the runtime
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull gpt-oss:20b
# or
ollama pull qwen3.6:27b

# Point Claude Code at your local model
ANTHROPIC_BASE_URL=http://localhost:11434/v1 claude
# Private ChatGPT-style web UI
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main

How the V100 compares to the article’s four picks

V100 SXM2 16GB
used
Tesla P40 24GB
used
RTX 3090 24GB
used
Mac mini M4
16–64GB
Price$200–$300 (+ $50–$100 adapter)$150–$250$650–$750$599–$1,399
VRAM16GB HBM2 + ECC24GB GDDR5 + ECC24GB GDDR6X16–64GB unified
Memory BW~900 GB/s~346 GB/s~936 GB/s~120 GB/s
TDP250W250W350W10–30W
NoiseLoud (blower) / silent (water)Loud (DIY shroud)Loud (blower)Silent
BF16 / quant nativeNo (FP16 only)No (FP16 only)Yes (native)Yes (native)
Software support 2026CUDA 11 only, laggingCUDA 11 only, nicheFirst-classFirst-class
Plug-and-playNoMostlyYesYes
Cooling complexityHigh (SXM2 adapter)Medium (DIY shroud)LowNone
Electricity (24/7)~$9/mo~$9/mo~$12/mo~$3–5/mo

What the article’s own replies added

@jackccrawford — Titan Xp × 4, $180 on eBay

Four NVIDIA Titan Xp 12GB in an old i7 chassis with 128GB RAM, running 35B models at 25 tok/sec. The most aggressive “ram-everything” build in the thread. 12GB × 4 = 48GB of effective VRAM via PCIe + unified host RAM. No adapters, no ECC, but the cheapest path to running bigger models.

@based_bitcoiner — “Used RTX 3090s are $1000-2000”

Pushes back on the article’s $700 RTX 3090 price. Reality is probably somewhere between: $650–$750 is the common range, $1,000+ for pristine or recently-mined cards. The article’s number is reasonable for 2026 but not the floor.

@MussonKing — “Gotta explain me how your 96GB Studio replaces a 2-3T cloud model”

Hits the actual frontier question. The article’s claim that Mac Studio 192GB can run “Llama 4 Maverick, full DeepSeek V3, Qwen3 235B without quantization tricks” is half-true. You can load those models but inference speed and quality at home on a single machine is not the same as running them on a 2–3T cloud. Local AI in 2026 covers 80–85% of heavy-user needs — frontier work still needs a cloud sub.

What the V100 tweet’s replies added

Verdict

The V100 SXM2 hack is real engineering, and the $250 price is more defensible than the original tweet’s replies gave it credit for. But $250 is the floor, not the average, and it only works for the 16GB card — which can only fit GPT-OSS 20B, Qwen3 8B-class models, or aggressively quantized mid-size models.

If you want plug-and-play: the article’s RTX 3090 at $700 is still the sweet spot for 24GB and modern software support.

If you want cheapest possible: the article’s Tesla P40 at $180 is the real entry point, with 24GB of VRAM and no SXM2 gymnastics.

If you want silent and efficient: the article’s Mac mini M4 at $599 is the right answer if you can live with 16–32GB unified memory.

If you want a V100 specifically: the 16GB SXM2 at $250 + $50 adapter into a $300 used Z8 G4 is a real $600 build. The 32GB at $500–$700 is the better card if you can stretch. Skip the SXM2 entirely and look at the GP100 (~$170, HBM2, no adapter) that @RandoCollector flagged — that’s the actual hidden gem.

What the original tweet got wrong: the article it’s countering doesn’t even list the V100. It’s a real engineering counter-claim against an article that never mentioned the card. The fight @doublenickk is picking is with the RTX 3090 pick, not the article as a whole.

What the original tweet got right: the SXM2 + adapter + Z8 G4 build works, runs GPT-OSS 20B fine, costs less than a 3090 if you already own the chassis and adapter, and the gotchas (Z620 AVX2, CUDA 11, Above-4G decoding) are real and would have bitten anyone who didn’t read the footnotes.