
Model card

From Wikipedia, the free encyclopedia

A model card is a short document that accompanies a trained machine learning model to communicate its intended uses, performance characteristics, limitations, and ethical considerations. Model cards are intended to enable more informed decisions about deploying, reusing, or building upon AI systems, and to support transparency and accountability in machine learning development.

The concept was introduced by Mitchell et al. (2019) at Google, who drew on the analogy of nutrition labels and proposed standardised performance reporting disaggregated by demographic subgroups. Model cards have since been adopted across the machine learning community, are recommended by the National Institute of Standards and Technology (NIST) AI Risk Management Framework, and are referenced in the European Union's AI Act.

Background


Prior to the introduction of model cards, trained machine learning models were typically distributed with minimal documentation, making it difficult for users to understand how a model would perform in their specific context, on which populations it had been evaluated, or what risks it posed in deployment.

Mitchell et al. (2019) drew explicit inspiration from Datasheets for Datasets, a parallel proposal by Gebru et al. (2018) for standardised documentation of training datasets.[1] The nutrition label analogy recurs frequently in subsequent literature: as nutrition labels communicate the composition of packaged food to consumers who did not prepare it, model cards aim to communicate the properties of trained models to users who did not train them.

Original proposal


The model card concept was introduced in:

Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I. D., & Gebru, T. (2019). "Model Cards for Model Reporting." Proceedings of the Conference on Fairness, Accountability, and Transparency (FAccT 2019), pp. 220–229. ACM.[2]

All nine authors were affiliated with Google or Google Brain at the time of publication. The paper proposed that model cards should present performance metrics broken down by demographic factors such as age, gender, race, and skin type, enabling users to identify disparities that aggregate metrics might conceal. The authors acknowledged key limitations of the format: model cards rely on the integrity of the creating organisation, are flexible enough to be applied inconsistently, and are not a substitute for external auditing.
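The subgroup disaggregation the paper proposed can be sketched in a few lines. This is an illustrative example, not code from the paper; the function name and the toy data are invented for the sketch:

```python
from collections import defaultdict

def disaggregated_accuracy(y_true, y_pred, groups):
    """Accuracy overall and per subgroup, as a model card's
    performance section would report it."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        total["overall"] += 1
        if t == p:
            correct[g] += 1
            correct["overall"] += 1
    return {g: correct[g] / total[g] for g in total}

# An aggregate metric can conceal a subgroup disparity:
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(disaggregated_accuracy(y_true, y_pred, groups))
# overall accuracy is 0.75, but group "a" scores 1.0 while group "b" scores 0.5
```

Reporting only the 0.75 aggregate would conceal the gap between the two groups, which is exactly the failure mode disaggregated reporting is meant to expose.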

Standard contents


Model cards do not follow a single mandated format, but the original proposal and subsequent implementations converge on a common set of sections:

Model details
Name, version, architecture type, developers, release date, and licence.
Intended use
Primary use cases and intended user populations; explicit statement of out-of-scope uses.
Training data
Sources, preprocessing steps, and statistical characteristics of the training corpus.
Evaluation data
Datasets used for benchmarking, and how they relate to expected deployment conditions.
Performance metrics
Aggregate and disaggregated accuracy, precision, recall, or task-specific metrics across demographic and intersectional subgroups.
Limitations
Known failure modes, edge cases, and conditions under which the model should not be used.
Ethical considerations
Privacy implications, potential for misuse, fairness analysis, and mitigation measures taken.
Caveats and recommendations
Deployment warnings, monitoring advice, and guidance for downstream developers.

Later practice, particularly in large language model documentation, has added sections covering environmental impact (compute used and estimated carbon emissions) and safety evaluations.
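A minimal card covering the common sections above might be sketched as follows. Every name, metric, and licence here is a placeholder invented for illustration, not drawn from any real release:

```markdown
# Model card: example-classifier v1.0

## Model details
ResNet-50 image classifier developed by Example Lab. Released 2024. Apache-2.0 licence.

## Intended use
Categorising retail product photos for inventory search.
Out of scope: medical imagery, biometric identification.

## Training data
10 million product photos from partner catalogues; deduplicated, resized to 224×224.

## Evaluation data
Held-out 100,000-image split plus an independently collected low-light test set.

## Performance metrics
Top-1 accuracy: 91.2% overall; 93.0% (studio lighting), 84.1% (low light).

## Limitations
Accuracy degrades on images below 100×100 px; not evaluated on user-generated photos.

## Ethical considerations
No faces retained in training data; use for surveillance is explicitly out of scope.

## Caveats and recommendations
Monitor per-category accuracy drift after deployment.
```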

Adoption


Google


Mitchell et al. published model cards for several Google Cloud Vision API components, including face detection and object recognition models, at modelcards.withgoogle.com — among the first publicly available commercial model cards. In 2020, Google released the open-source Model Card Toolkit (MCT), integrated with TensorFlow Extended (TFX), to automate card generation from training and evaluation artefacts.[3]

Hugging Face


Hugging Face adopted model cards as the standard documentation format for its Model Hub, stored as a README.md file at the root of each model repository. YAML front matter at the top of the card feeds structured metadata into Hub search and filtering. A 2022 landscape analysis of 74,970 model repositories found that 44.2% included a model card, but those models accounted for 90.5% of total download traffic, indicating that more widely used models are disproportionately likely to be documented.[4]
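The front-matter mechanism can be illustrated with a small standard-library sketch. Real Hub cards are parsed with a full YAML library; the flat `key: value` handling below is a simplification for illustration:

```python
def split_front_matter(readme_text):
    """Split a model card README into (metadata_dict, body).
    Handles only flat `key: value` lines; real Hub cards use full YAML."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, readme_text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return {}, readme_text  # unterminated front matter: treat as plain body

card = """---
license: apache-2.0
language: en
---
# My model

Intended use: ..."""

meta, body = split_front_matter(card)
print(meta["license"])  # apache-2.0
```

The metadata block (here `license` and `language`) is what the Hub indexes for search and filtering, while the markdown body below the second `---` is rendered as the human-readable card.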

Meta


Meta accompanied its LLaMA 1 release (February 2023) with a model card and has published detailed cards for subsequent Llama releases. The Llama 3 model card, for example, discloses a training corpus of over 15 trillion tokens, estimated carbon emissions of 2,290 tCO₂eq (fully offset), red-teaming methodology, and safety benchmark results.[5]

Anthropic and OpenAI


Anthropic publishes system cards for its Claude models documenting safety evaluations conducted under its Responsible Scaling Policy, AI Safety Level determinations, and assessments of potential for misuse in chemical, biological, radiological, and nuclear (CBRN) domains. OpenAI uses the same terminology for its GPT-4 and o-series releases.

Regulatory context


The NIST AI Risk Management Framework (AI RMF 1.0, January 2023) explicitly recommends "model cards or similar documentation standards for every major model" as part of responsible AI governance.[6]

The European Union's AI Act (Regulation 2024/1689, in force August 2024) requires that high-risk AI systems be accompanied by documentation specifying capabilities, limitations, and performance across affected population groups — requirements functionally equivalent to a model card.[7]

Criticisms and limitations


Empirical studies of model cards in practice have documented several systematic shortcomings:

  • Incompleteness: Sections covering environmental impact, limitations, and evaluation methodology are completed at consistently low rates. A 2024 analysis of 32,111 Hugging Face model cards found that environmental impact and limitations were among the least frequently completed fields.[8]
  • Tendency to downplay limitations: Authors tend to emphasise successes and minimise weaknesses; no peer-review mechanism exists for model card claims.
  • Transparency washing: Incomplete or selectively disclosed information may satisfy formal documentation requirements while providing little practical guidance.
  • Inaccessibility: Technical language in model cards often renders them minimally useful for non-expert users.
  • No enforcement: Mitchell et al. acknowledged that the format "relies on the integrity of the creating organisation." External auditing is not required and rarely occurs.
See also

  • Datasheets for Datasets — the parallel documentation standard for training datasets that directly inspired model cards.[9]
  • Dataset Nutrition Label — a companion proposal focused on dataset quality metrics.
  • AI Factsheets (IBM) — a related documentation framework.
  • System card — terminology used by Anthropic and OpenAI for safety-focused model documentation, often more detailed than a model card.

References

  1. ^ Gebru, Timnit; Morgenstern, Jamie; Vecchione, Briana; Vaughan, Jennifer Wortman; Wallach, Hanna; Daumé III, Hal; Crawford, Kate (2021). "Datasheets for Datasets". Communications of the ACM. 64 (12): 86–92. doi:10.1145/3458723.
  2. ^ Mitchell, Margaret; Wu, Simone; Zaldivar, Andrew; Barnes, Parker; Vasserman, Lucy; Hutchinson, Ben; Spitzer, Elena; Raji, Inioluwa Deborah; Gebru, Timnit (2019). "Model Cards for Model Reporting". Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM. pp. 220–229. doi:10.1145/3287560.3287596.
  3. ^ "Introducing the Model Card Toolkit for Easier Model Transparency Reporting". Google Research. 2020.
  4. ^ "Model Card Landscape Analysis". Hugging Face.
  5. ^ "Meta Llama 3 Model Card". Meta.
  6. ^ National Institute of Standards and Technology (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0) (PDF) (Report). NIST.
  7. ^ "EU AI Act Article 13: Transparency and provision of information to deployers". European Parliament and Council.
  8. ^ Liang, Weixin; et al. (2024). "What's documented in AI? Systematic analysis of 32K AI model cards". Nature Machine Intelligence. doi:10.1038/s42256-024-00857-z.
  9. ^ Gebru, Timnit; Morgenstern, Jamie; Vecchione, Briana; Vaughan, Jennifer Wortman; Wallach, Hanna; Daumé III, Hal; Crawford, Kate (2021). "Datasheets for Datasets". Communications of the ACM. 64 (12): 86–92. doi:10.1145/3458723.