Open weights · vs · open data · vs · everything

Open-Model Transparency Matrix — who actually releases the recipe

Most "open" models are open weight: you get the finished model, not the kitchen that made it. A small set of labs release the whole model flow: data, code, checkpoints, and logs.

✓ released · ~ partial · – withheld

Lab / model	Wts (weights)	Data (full corpus)	Code (training)	Ckpt (intermed.)	Rcpe (logs)	Lic (permissive)	Trce (data trace)	Openness (/ 7)

Reading the grid

What each column means

Lab

Lab / model

The organization and its flagship fully-documented release — the specific model the row is scored on.

Wts

Weights

The trained parameters — the actual numbers that are the model's "brain." Released = you can download, run, and fine-tune it yourself.

Data

Data (full corpus)

Whether the complete training dataset is published, not merely described. This is the rarest column — and the true line between "open weight" and "open source."

Code

Code (training)

The scripts and framework used to train the model, so you can reproduce the run or modify it — not just the code to run the finished model.

Ckpt

Checkpoints (intermediate)

Saved snapshots from points during training, not only the final model. They let researchers study how abilities emerge and branch their own training from midway.

Rcpe

Recipes (logs)

The configs, hyperparameters, data-mix ratios, and training logs — the build "diary" that explains how the model was actually made.

Lic

License (permissive)

Whether the license is permissive (Apache 2.0 or MIT), allowing free commercial use and redistribution, versus restrictive custom terms that limit who can use the model and how.

Trce

Tracing (data trace)

A tool that maps a model's output back to the specific training documents that influenced it (e.g. Ai2's OlmoTrace). Rare even among fully-open models.

/ 7

Openness score

How many of the seven dimensions a model meets — a full release counts as 1, a partial as ½. The gauge fills green for released, amber for partial, grey for withheld.
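The scoring rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column abbreviations and cell symbols come from the grid legend, while the function name `openness_score` and the example row are hypothetical and not taken from the page.

```python
# Points per cell, following the legend: ✓ released, ~ partial, – withheld.
CELL_POINTS = {"✓": 1.0, "~": 0.5, "–": 0.0}

# The seven scored dimensions, in grid order.
DIMENSIONS = ("Wts", "Data", "Code", "Ckpt", "Rcpe", "Lic", "Trce")


def openness_score(row: dict[str, str]) -> float:
    """Sum 1 for each released cell and 0.5 for each partial cell."""
    missing = [d for d in DIMENSIONS if d not in row]
    if missing:
        raise ValueError(f"row is missing dimensions: {missing}")
    return sum(CELL_POINTS[row[d]] for d in DIMENSIONS)


# Hypothetical row for illustration only; not a real entry from the grid.
example_row = {
    "Wts": "✓", "Data": "~", "Code": "✓", "Ckpt": "–",
    "Rcpe": "~", "Lic": "✓", "Trce": "–",
}
print(openness_score(example_row))  # 4.0
```

Three released cells and two partial cells give 3 + 2 × ½ = 4 out of 7.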

Plain-language terms

Glossary

Open weight: The trained parameters are downloadable, so you can run and fine-tune the model — but the data, code, and recipe may stay private. Most "open" models (Llama, Qwen, DeepSeek, Gemma) are this.
Open source (strict): Everything needed to rebuild the model from scratch is released: weights + training code + the dataset. Rare. OLMo, Pythia, and SmolLM qualify; most "open" models do not.
Weights: The billions of numbers a model learns during training. Together they are the model — running it just means doing math with these values.
Training data / corpus: The body of text (and code, math, etc.) a model learns from. A corpus is one curated collection of it, like Dolma or FineWeb.
Checkpoint: A snapshot of the weights saved at a moment in training. Releasing intermediate checkpoints lets others study or resume the process partway through.
Recipe: The full how-to: data mix, hyperparameters, training order, and logs. Without it, weights are a result you can't reproduce or fully understand.
Permissive license (Apache / MIT): Licensing that lets anyone use, modify, and commercialize the model freely. Contrast with custom terms (e.g. Meta's Llama license) that add restrictions.
Model flow: Ai2's term for the entire pipeline from raw data to finished model — every dataset, checkpoint, and decision — rather than just the endpoint.
Token: The unit a model reads and predicts — roughly a word-piece. Training is measured in tokens (e.g. "11.2T tokens" = 11.2 trillion).
Pretraining: The first, biggest stage: the model learns general language and knowledge by predicting the next token across a huge corpus.
Post-training (SFT / RLHF): Later tuning that makes a raw model helpful and safe: supervised fine-tuning on examples, then reinforcement learning from human (or AI) feedback.
Fine-tuning: Continuing to train an existing model on your own narrower data to specialize it — much cheaper than training from scratch, and a key reason open weights matter.
Distillation: Training a smaller/cheaper model on a bigger model's outputs. How a lot of open models absorb frontier ability — often against the source's terms of service.
Synthetic data: Training text generated by another model rather than scraped from the web. Increasingly common (e.g. much of NVIDIA's Nemotron data) and easier to release cleanly.
Mixture-of-Experts (MoE): An efficient architecture that routes each input to a few specialized sub-networks instead of the whole model, giving more capacity at lower running cost.
Reasoning / "thinking" model: A model trained to generate explicit step-by-step working before its answer (like OpenAI's o-series), which lifts math, code, and logic performance.
Frontier model: A model at the current top of capability — today mostly closed (Claude, GPT, Gemini). "Frontier-scale" open work means matching that size/ambition.
Open-data tracing: Tooling that links an answer back to the exact training documents behind it — only possible when the data is open, which is why it's so rare.
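To make the Mixture-of-Experts entry concrete, here is a minimal sketch of top-k routing in plain Python. It assumes a softmax router followed by renormalization over the chosen experts; real MoE layers vary in these details (ordering of softmax and top-k, load-balancing terms), and the function names and scores below are hypothetical.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores: list[float], k: int = 2) -> dict[int, float]:
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:k]
    kept = sum(probs[i] for i in chosen)
    return {i: probs[i] / kept for i in chosen}


# Hypothetical router scores for one token across four experts.
weights = route_top_k([2.0, 0.5, 1.0, -1.0], k=2)
print(sorted(weights))  # [0, 2]
```

Only experts 0 and 2 would run for this token, so the other two cost nothing at inference time even though their parameters still count toward the model's total size.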

Common questions

FAQ

What is the difference between open weight and open source?

Open weight gives you the finished model to run and fine-tune, but the dataset and full recipe stay private: you get the cake, not the kitchen. Open source (strictly) releases weights plus training code plus the data, so the model is reproducible from scratch. Almost everything people loosely call "open source" (Llama, Qwen, DeepSeek, Mistral) is really open weight. In this grid, the Data column is what separates the two.
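The distinction can be expressed as a small rule over a row's cells. The sketch below is one possible reading, not the page's own logic: it assumes the strict definition requires fully released weights, data, and training code, treats a partial (~) cell as not released, and uses a hypothetical function name and a hypothetical "weights withheld" label.

```python
def classify_release(row: dict[str, str]) -> str:
    """Label a grid row using the strict open-source definition."""

    def released(column: str) -> bool:
        # Only a full release (✓) counts; partial (~) does not (assumption).
        return row.get(column) == "✓"

    if not released("Wts"):
        return "weights withheld"  # hypothetical label, not used by the grid
    if released("Data") and released("Code"):
        return "open source (strict)"
    return "open weight"


# Hypothetical rows for illustration only.
print(classify_release({"Wts": "✓", "Data": "✓", "Code": "✓"}))
print(classify_release({"Wts": "✓", "Data": "–", "Code": "~"}))
```

The first row prints "open source (strict)" and the second prints "open weight", since the second withholds its data.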

Does it matter whether the training data is open?

For casual use it may not. It matters when you need to trust, audit, or deeply customize the model: checking for contamination or bias, satisfying compliance in regulated industries, removing specific data influence, or fine-tuning confidently because you know what's already inside. Open data also makes results reproducible, which is essential for research and for anyone who can't risk a model trained on unknown material.

Can I use these models commercially?

Usually yes for the rows marked permissive (Lic = ✓): Apache 2.0 and MIT allow commercial use and redistribution. Be careful with the contrast rows: Meta's Llama uses a restrictive community license, and Google's Gemma has custom terms. Always confirm the license on the specific model card, since terms can differ between a lab's models. This isn't legal advice; check the actual license before shipping.
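As a first-pass filter before reading the actual license text, a check along these lines could flag which candidates need closer review. It is a sketch under assumptions: license identifiers are taken to be SPDX-style strings such as might appear on a model card, the placeholder for restrictive terms is made up, and the name `is_permissive` is hypothetical.

```python
# Licenses the grid treats as permissive. Identifiers are assumed to be
# SPDX-style strings; extend the set only after reading the license text.
PERMISSIVE_LICENSES = frozenset({"apache-2.0", "mit"})


def is_permissive(license_id: str) -> bool:
    """Return True only for an exact match against the permissive set."""
    return license_id.strip().lower() in PERMISSIVE_LICENSES


# "custom-community-license" is a made-up placeholder for restrictive terms.
for candidate in ("Apache-2.0", "MIT", "custom-community-license"):
    print(candidate, is_permissive(candidate))
```

Anything that does not match is treated as needing manual review rather than as forbidden, which keeps the check conservative.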

Why don't more labs release their training data?

Three reasons: copyright liability (publishing the dataset exposes exactly what was used, and lawsuits have made labs cautious), competitive secrecy (data curation is a big part of what makes a model good), and effort (cleaning and legally clearing a corpus for release is enormous work). Notably, these same labs used to disclose data-mixture details before ~2022 and then stopped; researchers have specifically cited litigation as the reason.

How close are open models to the closed frontier?

Close, and closing. By mid-2026 the best open-weight models trail the closed frontier by a small margin and roughly a few months in time, the smallest gap yet. The remaining edge for closed models shows up most on the hardest reasoning, long-horizon agentic tasks, and reliability. For a large share of everyday work, open models are already "good enough," which is why many teams route bulk work to them and reserve the frontier for the hardest steps.

Does a higher openness score mean a better model?

No. The score measures transparency, not capability. Ai2's OLMo is 7/7 and genuinely strong for its size, but if you need the most raw power in an open package you might pick a lower-scoring open-weight model like a top DeepSeek or Qwen. Use the score to judge auditability and reproducibility, then weigh it against capability, size, and license for your actual use case.

Where do DeepSeek, Qwen, Kimi, and GLM fit?

DeepSeek, Alibaba's Qwen, Moonshot's Kimi, and Z.ai's GLM are currently the strongest open-weight models, often under permissive MIT/Apache licenses, and account for a large share of open-model usage. But they're weights-only (the training data isn't released), so they score high on license and capability yet low on data transparency. That's exactly the open-weight-vs-open-source split, shown in the bottom contrast rows.

Which model should I pick?

Roughly: need full auditability / research / compliance → Ai2 OLMo or EleutherAI. Need a small, fully-open model for edge or fine-tuning → SmolLM3 or Instella. Need frontier-scale openness with data → NVIDIA Nemotron. Need maximum raw capability in an open package and can accept weights-only → a top DeepSeek/Qwen. Then verify size, license, and the data column on each model's card.
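The rough guide above can be captured as a lookup table. This is only a sketch: the suggestions are copied from the paragraph, while the key names and the `suggest` function are hypothetical labels introduced for illustration.

```python
# Needs mapped to the starting points named in the guide above.
# The key names are hypothetical shorthand, not terms from the grid.
SHORTLIST = {
    "auditability": ["Ai2 OLMo", "EleutherAI"],
    "small_fully_open": ["SmolLM3", "Instella"],
    "frontier_scale_with_data": ["NVIDIA Nemotron"],
    "max_capability_weights_only": ["DeepSeek", "Qwen"],
}


def suggest(need: str) -> list[str]:
    """Return starting candidates for a need; verify each on its model card."""
    try:
        return list(SHORTLIST[need])
    except KeyError:
        known = ", ".join(sorted(SHORTLIST))
        raise ValueError(f"unknown need {need!r}; expected one of: {known}") from None


print(suggest("small_fully_open"))  # ['SmolLM3', 'Instella']
```

The lookup only narrows the field; size, license, and the data column still need checking per model.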

How to read it. Each row is a lab's flagship fully-documented release; "permissive license" means Apache 2.0 / MIT-style terms that allow commercial reuse. The open-weight-only rows (Meta, Google, DeepSeek/Qwen) are included for contrast — note how a model can carry a permissive license while still withholding its data.

Caveats. Cells reflect each lab's posture as of early–mid 2026 and can shift per model; the research-scale rows (Apertus, Marin, Instella, Zyphra) are coarser and worth verifying on each Hugging Face card before you rely on them. Only Ai2 (Olmo-MoE, 2026) and NVIDIA (Nemotron 4) have given concrete public roadmaps — the rest list direction, not dated commitments.