Open weights · vs · open data · vs · everything
Most "open" models are open weight: you get the trained brain, not the kitchen. A small set of labs release the whole model flow: data, code, checkpoints, and logs.
| Lab / model | Wts (weights) | Data (full corpus) | Code (training) | Ckpt (intermed.) | Rcpe (logs) | Lic (permissive) | Trce (data trace) | Openness (/ 7) |
|---|---|---|---|---|---|---|---|---|
Reading the grid
Lab / model: The organization and its flagship fully-documented release, the specific model the row is scored on.
Wts (weights): The trained parameters, the actual numbers that are the model's "brain." Released = you can download, run, and fine-tune it yourself.
Data (full corpus): Whether the complete training dataset is published, not merely described. This is the rarest column, and the true line between "open weight" and "open source."
Code (training): The scripts and framework used to train the model, so you can reproduce the run or modify it, not just the code to run the finished model.
Ckpt (intermed.): Saved snapshots from points during training, not only the final model. They let researchers study how abilities emerge and branch their own training from midway.
Rcpe (logs): The configs, hyperparameters, data-mix ratios, and training logs: the build "diary" that explains how the model was actually made.
Lic (permissive): Whether the license (Apache 2.0 / MIT) allows free commercial use and redistribution, versus restrictive custom terms that limit who can use it and how.
Trce (data trace): A tool that maps a model's output back to the specific training documents that influenced it (e.g. Ai2's OlmoTrace). Rare even among fully-open models.
Openness (/ 7): How many of the seven dimensions a model meets. A full release counts as 1, a partial as ½. The gauge fills green for released, amber for partial, grey for withheld.
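The scoring rule can be sketched in a few lines of Python. This is a minimal illustration, not the page's own implementation: the column abbreviations come from the grid header, while the status strings (`released`, `partial`, `withheld`), the `openness_score` function, and the example row are hypothetical names and values made up for this sketch.

```python
# Seven scored dimensions, using the column abbreviations from the grid header.
DIMENSIONS = ("Wts", "Data", "Code", "Ckpt", "Rcpe", "Lic", "Trce")

# Assumed status labels: a full release counts as 1, a partial as 1/2.
POINTS = {"released": 1.0, "partial": 0.5, "withheld": 0.0}


def openness_score(row: dict[str, str]) -> float:
    """Sum the points per dimension; a missing dimension counts as withheld."""
    return sum(POINTS[row.get(dim, "withheld")] for dim in DIMENSIONS)


# Hypothetical row for illustration only, not any real lab's scores.
example_row = {
    "Wts": "released",
    "Data": "partial",
    "Code": "released",
    "Ckpt": "withheld",
    "Rcpe": "partial",
    "Lic": "released",
    "Trce": "withheld",
}

score = openness_score(example_row)
print(f"{score} / {len(DIMENSIONS)}")  # prints "4.0 / 7"
```

Here three full releases and two partials add up to 3 + 2 × ½ = 4 out of 7.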
Common questions
The Data column is what separates "open weight" from "open source."

Lic = ✓ means a permissive license: Apache 2.0 and MIT allow commercial use and redistribution. Be careful with the contrast rows: Meta's Llama uses a restrictive community license, and Google's Gemma has custom terms. Always confirm the license on the specific model card, since terms can differ between a lab's models. This isn't legal advice; check the actual license before shipping.

How to read it. Each row is a lab's flagship fully-documented release; "permissive license" means Apache 2.0 / MIT-style terms that allow commercial reuse. The open-weight-only rows (Meta, Google, DeepSeek/Qwen) are included for contrast: note how a model can carry a permissive license while still withholding its data.
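The distinction drawn here, that the Data column decides "open weight" versus "open source" regardless of license, can be expressed as a small check. This is a sketch under stated assumptions: the status strings, the function names, and both example rows are hypothetical, and treating only a full data release as crossing the line is a choice made for this illustration.

```python
def classify(row: dict[str, str]) -> str:
    """Label a row by its Wts and Data cells only; the license plays no part."""
    if row.get("Wts") != "released":
        return "not open"
    # Assumption: only a full data release counts as "open source".
    if row.get("Data") == "released":
        return "open source"
    return "open weight"


def permissive_but_data_withheld(row: dict[str, str]) -> bool:
    """True for the contrast case: permissive license, no full data release."""
    return row.get("Lic") == "released" and row.get("Data") != "released"


# Hypothetical rows for illustration only, not real labs' scores.
full_release = {"Wts": "released", "Data": "released", "Lic": "released"}
weights_only = {"Wts": "released", "Data": "withheld", "Lic": "released"}

for name, row in (("full_release", full_release), ("weights_only", weights_only)):
    print(name, classify(row), permissive_but_data_withheld(row))
```

Both rows carry the same license status, yet only the first is classified as open source, which is the point of the contrast rows.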
Caveats. Cells reflect each lab's posture as of early–mid 2026 and can shift per model; the research-scale rows (Apertus, Marin, Instella, Zyphra) are coarser and worth verifying on each Hugging Face card before you rely on them. Only Ai2 (Olmo-MoE, 2026) and NVIDIA (Nemotron 4) have given concrete public roadmaps — the rest list direction, not dated commitments.