VE LAB · Agent-Native CLI · Report

opencli · paulgraham

Paul Graham, as a queryable CLI.

An agent-native command-line interface over paulgraham.com — list, read, and full-text-search 25 years of essays. This report documents the tool and what its cached corpus reveals.

01 · What it is

Five commands over a static essay archive

paulgraham.com is a keyless, static site. The CLI turns it into structured, agent-clean data — no browser, no API key, no scraping fragility on the caller's side.

Command	Does	Scope
`essays`	List all essays, newest first	index page
`read <slug>`	Full clean text of one essay (paragraphs preserved)	one page
`search <q>`	Find essays by keyword — title only	index page
`sync`	Crawl every essay into a local cache	whole corpus
`topic <q>`	Full-text search of essay bodies, ranked + snippets	cached corpus

Why opencli, not Printing Press? The Printing Press library (@mvanhorn/printing-press-library) only installs CLIs from its catalog — it has no authoring command, and paulgraham.com isn't in the catalog. opencli is the right tool for authoring a new site adapter.

02 · The corpus

What `sync` captured

One polite crawl (bounded concurrency, per-essay failure tolerance) mirrors the full archive locally so topic queries run instantly and offline.
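The bounded-concurrency, failure-tolerant crawl described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the `CrawlResult` shape and the `worker` callback are hypothetical names, and only the idea (N fetches in flight, failures counted rather than thrown) comes from this report.

```typescript
// Hypothetical result shape: one slot per input item, plus a failure counter.
type CrawlResult<R> = { results: (R | undefined)[]; failed: number };

// Run `worker` over `items`, with at most `limit` calls in flight at once.
// A rejected worker call is counted and leaves its slot undefined; it never
// aborts the remaining work.
async function mapLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<CrawlResult<R>> {
  const results: (R | undefined)[] = new Array(items.length).fill(undefined);
  let failed = 0;
  let next = 0; // index of the next unclaimed item

  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claimed synchronously, before any await
      try {
        results[i] = await worker(items[i]);
      } catch {
        failed++;
      }
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return { results, failed };
}
```

Each lane pulls the next index until the list is exhausted, so the number of concurrent calls never exceeds `limit` and results stay in input order.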

Metric	Value
Essays cached	231
Total words	565K
Avg words / essay	2,446
Fetch failures	0

Median essay is 1,527 words — well below the 2,446 average, because a handful of long pieces pull the mean up. Span: 2001–2026 (169 essays carry a parseable date).
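The gap between mean and median can be checked with a few lines of arithmetic. The total (565,030 words) and the essay count (231) are taken from the sync output in this report; the `sample` array is made-up data, included only to show how one very long essay lifts the mean well above the median.

```typescript
function mean(xs: readonly number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function median(xs: readonly number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Figures from the report: 565,030 words across 231 essays.
const reportedMean = Math.round(565030 / 231); // 2446

// Illustrative word counts (not real essays): one long outlier.
const sample = [1200, 1400, 1500, 1600, 13800];
const sampleMean = mean(sample);     // 3900
const sampleMedian = median(sample); // 1500
```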

Longest essays

Essay	Words
What I Worked On	13,810
The Other Road Ahead	12,084
How to Do Great Work	11,822
How to Raise Money	10,679
How to Start a Startup	9,778

Shortest cached: Why Twitter is a Big Deal (147w), Charisma / Power (121w), Lisp for Web-Based Applications (58w).

03 · Publishing cadence

Essays per year

A steady builder: a 2007–2009 startup-advice peak, a quieter mid-2010s, and a strong 2020–2021 resurgence.

[Bar chart: essays per year. The x-axis runs 2001–2026 with no tick for 2018; bar heights were not preserved.]

04 · Theme analysis

What PG writes about

Essays containing each term (full-text, via topic), with the single essay that uses it most.

Term	Essays	Densest essay	Hits
startup	143	How to Fund a Startup	108
ideas	140	How to Get Startup Ideas	74
money	119	How to Raise Money	94
founders	108	How to Fund a Startup	55
writing	106	The Best Essay	35
users	82	The Other Road Ahead	74
growth	47	Startup = Growth	60
wealth	39	How to Make Wealth	70
taste	31	How Art Can Be Good	23
empathy	3	Hackers and Painters	11

The empathy surprise. "Empathy" — the moral pivot of How to Earn a Billion Dollars ("the key is not exploitation but empathy") — appears in just 3 of 231 essays. It's a rare word for PG, which makes its load-bearing use in the billion essay stand out. This is exactly the kind of finding search (titles only) could never surface — it needs full-text topic.
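A table like the one above could be produced by a scan along these lines. This is a sketch under assumptions: the `Essay` shape is hypothetical, and matching here is case-insensitive substring counting, which may differ from how `topic` actually tokenizes and ranks (for example, "startup" would also match inside "startups").

```typescript
// Hypothetical in-memory shape for one cached essay.
interface Essay {
  title: string;
  text: string;
}

// Count non-overlapping, case-insensitive occurrences of `term` in `text`.
function countHits(text: string, term: string): number {
  const hay = text.toLowerCase();
  const needle = term.toLowerCase();
  if (needle.length === 0) return 0;
  let hits = 0;
  let from = hay.indexOf(needle);
  while (from !== -1) {
    hits++;
    from = hay.indexOf(needle, from + needle.length);
  }
  return hits;
}

// For one term: how many essays contain it, and which essay uses it most.
function termStats(corpus: readonly Essay[], term: string) {
  let essays = 0;
  let densest: { title: string; hits: number } | null = null;
  for (const e of corpus) {
    const hits = countHits(e.text, term);
    if (hits === 0) continue;
    essays++;
    if (densest === null || hits > densest.hits) {
      densest = { title: e.title, hits };
    }
  }
  return { essays, densest };
}
```

Because the whole corpus is already in memory, one pass per term is enough to yield both the "Essays" and "Hits" columns.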

05 · Using it

Typical sessions

# one-time: build the local corpus
$ opencli paulgraham sync
  essays: 231 · words: 565030 · failed: 0

# find essays ABOUT a concept (body text, ranked)
$ opencli paulgraham topic "compound growth"
  Superlinear Returns        hits 28
  Do Things that Don't Scale hits 9
  How to Do Great Work       hits 7

# read one, clean, as markdown — feed to an LLM
$ opencli paulgraham read earn -f md

# export the whole index to a spreadsheet
$ opencli paulgraham essays --limit 0 -f csv > pg.csv

Clean tokens, not HTML — the parser strips paulgraham.com's Yahoo-store markup; -f json/-f md gives an LLM the essay, not 19KB of tags.
Read-only & keyless — safe to script, cron, or chain (e.g. read → summarizer → digest).
title vs. body — search is one fast HTTP call over titles; topic searches full text after a one-time sync.
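For chaining from a script, one possible wrapper is sketched below. `runJson` is a hypothetical helper, not part of opencli. The commented invocation combines flags that appear separately in this report (`essays --limit 0` and `-f json`); the shape of the JSON it returns is not documented here, so the result is typed as `unknown`. The sketch also assumes that non-data notices go to stderr.

```typescript
import { execFileSync } from "node:child_process";

// Run a command and parse its stdout as JSON. stderr is captured on a
// separate pipe and ignored, so text written there cannot corrupt the parse.
function runJson(cmd: string, args: readonly string[]): unknown {
  const stdout = execFileSync(cmd, [...args], {
    encoding: "utf8",
    stdio: ["ignore", "pipe", "pipe"],
  });
  return JSON.parse(stdout);
}

// Assumed usage (output shape unverified):
// const index = runJson("opencli", ["paulgraham", "essays", "--limit", "0", "-f", "json"]);
```

`execFileSync` avoids a shell, so essay slugs and queries are passed as plain arguments without quoting concerns.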

06 · How it's built

Architecture & the hard-won bits

Body extraction — every essay's text sits in paulgraham.com's decades-stable <font face="verdana"> block, ending at the Yahoo/Turbify footer <script>. The parser slices between them and converts <br><br> → blank lines so paragraphs survive.
Hardened fetch — low-level node:https with family:4 + transient retry (global fetch stalls on broken-IPv6 networks).
Polite crawl — mapLimit runs N fetches at a time (default 6); a single essay failing is counted, not fatal.
Cache once, query free — sync writes ~/.opencli/cache/paulgraham/corpus.json; topic is then a pure in-memory scan, which is why ranking + snippets are cheap.
Gotcha — opencli appends an "Update available" notice to its output; when consuming JSON, parse stdout only.
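The body-extraction step in the list above might look roughly like this. It is a simplified sketch, not the tool's parser: `extractBody` is a hypothetical name, the regular expressions are assumptions about the markup, and HTML entities are left undecoded.

```typescript
// Slice the essay text out of a page: start after the opening
// <font face="verdana"> tag, stop at the first <script> that follows,
// and turn <br><br> pairs into blank lines so paragraphs survive.
function extractBody(html: string): string {
  const open = html.search(/<font[^>]*face="?verdana"?[^>]*>/i);
  if (open === -1) return "";
  const start = html.indexOf(">", open) + 1;
  const footer = html.indexOf("<script", start); // case-sensitive match
  const end = footer === -1 ? html.length : footer;
  return html
    .slice(start, end)
    .replace(/<br\s*\/?>\s*<br\s*\/?>/gi, "\n\n") // paragraph breaks
    .replace(/<br\s*\/?>/gi, " ")                 // leftover single breaks
    .replace(/<[^>]+>/g, "")                      // drop remaining tags
    .replace(/[ \t]+/g, " ")                      // collapse runs of spaces
    .replace(/\n{3,}/g, "\n\n")                   // at most one blank line
    .trim();
}

const page =
  '<html><body><font size="2" face="verdana">First para.<br><br>' +
  'Second <i>para</i>.<br><br></font>' +
  '<script type="text/javascript">x()</script></body></html>';
const body = extractBody(page); // "First para.\n\nSecond para."
```

Converting the double breaks before stripping tags is what preserves paragraph boundaries; stripping first would flatten the essay into a single block.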

Generated by VE LAB from a live opencli paulgraham sync · corpus snapshot 2026-06-20 · source paulgraham.com. Counts reflect cached essays at snapshot time.
