A counter-claim to @N01ennn’s local-AI article, reality-checked against the original article, its replies, the V100 tweet’s replies, and current eBay pricing.
Cancelled 6 of 9 AI subscriptions, saved $140/month. Recommends four setups, cheapest to most expensive: Tesla P40 ($180), Mac mini M4 ($599), used RTX 3090 ($700), Mac Studio M3 Ultra ($4,199). Same software stack on all four: Ollama + Open WebUI + the OpenAI-compatible API on localhost.
The article does not mention the V100. The “V100 counter-take” below comes from a separate tweet by @doublenickk two days later.
Claims a Tesla V100 SXM2 16GB, originally $10K in 2017, now retires from cloud datacenters as H100s land, is on eBay for $200–$300. Drop a $50 SXM2-to-PCIe adapter into an HP Z8 G4 with thermal pad and the total build is $330, running GPT-OSS 20B at 87°C / 180W. Argues this delivers the same job as the article’s $700 RTX 3090 for one third the price — with HBM2.
The original tweet also flags three gotchas: HP Z620 fails on AVX2 (use Z8 G4 or Z840 instead), install CUDA 11 not 12, and flip Above-4G decoding in BIOS.
This is where the original tweet’s replies got it half right. Multiple replies pushed back hard, claiming V100 16GB listings are actually $700 and 32GB is $1,000+. Web-search-verified current eBay pricing tells a more nuanced story:
| Card | Typical 2026 eBay price | Notes |
|---|---|---|
| V100 SXM2 16GB | $200–$300 | Mainstream range. China-based “pulled-from-server” sellers anchor the low end at $140–$200. Refurbished / reputable sellers sit $250–$300. |
| V100 SXM2 32GB | $500–$700 | Premium. Some seller-condition combos go higher. The “$1,000+” the tweet replies cited are outliers or full DGX-1 partial builds. |
| SXM2-to-PCIe adapter | $50–$100+ | Often bundled with the card. Budget $50 standalone, more for active-cooled variants. |
Verdict on the $250 claim: defensible for a 16GB SXM2 from a Chinese pull-it-yourself seller. The $330 total build is plausible if you already own a Z8 G4 (or comparable dual-Xeon board), the adapter, and a PSU big enough. The tweet replies citing $700 for 16GB were either looking at wrong listings or higher-condition cards.
The SXM2 form factor was never meant for a desktop. It uses a mezzanine connector designed for proprietary server baseboards (DGX-1, HGX). To run one at home you need three things a normal RTX 3090 build does not need.
llama.cpp with Volta-specific build flags, or vLLM with Volta config. Modern TensorRT-LLM and Triton are dropping Volta support — verify your stack before you buy.This is the part @N01ennn got exactly right: the runtime is identical whether you spent $180 or $4,200 on the box.
# Install the runtime
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model
ollama pull gpt-oss:20b
# or
ollama pull qwen3.6:27b
# Point Claude Code at your local model
ANTHROPIC_BASE_URL=http://localhost:11434/v1 claude
# Private ChatGPT-style web UI
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
ghcr.io/open-webui/open-webui:main
| V100 SXM2 16GB used |
Tesla P40 24GB used |
RTX 3090 24GB used |
Mac mini M4 16–64GB |
|
|---|---|---|---|---|
| Price | $200–$300 (+ $50–$100 adapter) | $150–$250 | $650–$750 | $599–$1,399 |
| VRAM | 16GB HBM2 + ECC | 24GB GDDR5 + ECC | 24GB GDDR6X | 16–64GB unified |
| Memory BW | ~900 GB/s | ~346 GB/s | ~936 GB/s | ~120 GB/s |
| TDP | 250W | 250W | 350W | 10–30W |
| Noise | Loud (blower) / silent (water) | Loud (DIY shroud) | Loud (blower) | Silent |
| BF16 / quant native | No (FP16 only) | No (FP16 only) | Yes (native) | Yes (native) |
| Software support 2026 | CUDA 11 only, lagging | CUDA 11 only, niche | First-class | First-class |
| Plug-and-play | No | Mostly | Yes | Yes |
| Cooling complexity | High (SXM2 adapter) | Medium (DIY shroud) | Low | None |
| Electricity (24/7) | ~$9/mo | ~$9/mo | ~$12/mo | ~$3–5/mo |
Four NVIDIA Titan Xp 12GB in an old i7 chassis with 128GB RAM, running 35B models at 25 tok/sec. The most aggressive “ram-everything” build in the thread. 12GB × 4 = 48GB of effective VRAM via PCIe + unified host RAM. No adapters, no ECC, but the cheapest path to running bigger models.
Pushes back on the article’s $700 RTX 3090 price. Reality is probably somewhere between: $650–$750 is the common range, $1,000+ for pristine or recently-mined cards. The article’s number is reasonable for 2026 but not the floor.
Hits the actual frontier question. The article’s claim that Mac Studio 192GB can run “Llama 4 Maverick, full DeepSeek V3, Qwen3 235B without quantization tricks” is half-true. You can load those models but inference speed and quality at home on a single machine is not the same as running them on a 2–3T cloud. Local AI in 2026 covers 80–85% of heavy-user needs — frontier work still needs a cloud sub.
The V100 SXM2 hack is real engineering, and the $250 price is more defensible than the original tweet’s replies gave it credit for. But $250 is the floor, not the average, and it only works for the 16GB card — which can only fit GPT-OSS 20B, Qwen3 8B-class models, or aggressively quantized mid-size models.
If you want plug-and-play: the article’s RTX 3090 at $700 is still the sweet spot for 24GB and modern software support.
If you want cheapest possible: the article’s Tesla P40 at $180 is the real entry point, with 24GB of VRAM and no SXM2 gymnastics.
If you want silent and efficient: the article’s Mac mini M4 at $599 is the right answer if you can live with 16–32GB unified memory.
If you want a V100 specifically: the 16GB SXM2 at $250 + $50 adapter into a $300 used Z8 G4 is a real $600 build. The 32GB at $500–$700 is the better card if you can stretch. Skip the SXM2 entirely and look at the GP100 (~$170, HBM2, no adapter) that @RandoCollector flagged — that’s the actual hidden gem.
What the original tweet got wrong: the article it’s countering doesn’t even list the V100. It’s a real engineering counter-claim against an article that never mentioned the card. The fight @doublenickk is picking is with the RTX 3090 pick, not the article as a whole.
What the original tweet got right: the SXM2 + adapter + Z8 G4 build works, runs GPT-OSS 20B fine, costs less than a 3090 if you already own the chassis and adapter, and the gotchas (Z620 AVX2, CUDA 11, Above-4G decoding) are real and would have bitten anyone who didn’t read the footnotes.