Model card
A model card is a short document that accompanies a trained machine learning model to communicate its intended uses, performance characteristics, limitations, and ethical considerations. Model cards are intended to enable more informed decisions about deploying, reusing, or building upon AI systems, and to support transparency and accountability in machine learning development.
The concept was introduced by Mitchell et al. (2019) at Google, who drew on the analogy of nutrition labels and proposed standardised performance reporting disaggregated by demographic subgroups. Model cards have since been adopted across the machine learning community, are recommended by the National Institute of Standards and Technology (NIST) AI Risk Management Framework, and are referenced in the European Union's AI Act.
Background
Prior to the introduction of model cards, trained machine learning models were typically distributed with minimal documentation, making it difficult for users to understand how a model would perform in their specific context, on which populations it had been evaluated, or what risks it posed in deployment.
Mitchell et al. (2019) drew explicit inspiration from Datasheets for Datasets, a parallel proposal by Gebru et al. (2018) for standardised documentation of training datasets.[1] The nutrition label analogy recurs frequently in subsequent literature: as nutrition labels communicate the composition of packaged food to consumers who did not prepare it, model cards aim to communicate the properties of trained models to users who did not train them.
Original proposal
The model card concept was introduced in:
- Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I. D., & Gebru, T. (2019). "Model Cards for Model Reporting." Proceedings of the Conference on Fairness, Accountability, and Transparency (FAccT 2019), pp. 220–229. ACM.[2]
All nine authors were affiliated with Google or Google Brain at the time of publication. The paper proposed that model cards should present performance metrics broken down by demographic factors such as age, gender, race, and skin type, enabling users to identify disparities that aggregate metrics might conceal. The authors acknowledged key limitations of the format: model cards rely on the integrity of the creating organisation, are flexible enough to be applied inconsistently, and are not a substitute for external auditing.
Standard contents
Model cards do not follow a single mandated format, but the original proposal and subsequent implementations converge on a common set of sections:
- Model details
- Name, version, architecture type, developers, release date, and licence.
- Intended use
- Primary use cases and intended user populations; explicit statement of out-of-scope uses.
- Training data
- Sources, preprocessing steps, and statistical characteristics of the training corpus.
- Evaluation data
- Datasets used for benchmarking, and how they relate to expected deployment conditions.
- Performance metrics
- Aggregate and disaggregated accuracy, precision, recall, or task-specific metrics across demographic and intersectional subgroups.
- Limitations
- Known failure modes, edge cases, and conditions under which the model should not be used.
- Ethical considerations
- Privacy implications, potential for misuse, fairness analysis, and mitigation measures taken.
- Caveats and recommendations
- Deployment warnings, monitoring advice, and guidance for downstream developers.
Later practice, particularly in large language model documentation, has added sections covering environmental impact (compute used and estimated carbon emissions) and safety evaluations.
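The sections above can be sketched as a simple structured record. The following is a minimal, hypothetical illustration of how a model card's contents might be represented programmatically; the field names are illustrative and do not correspond to any standard schema.

```python
from dataclasses import dataclass, field, asdict
import json

# Hypothetical, minimal representation of the sections listed above.
# Field names are illustrative only, not a standard model card schema.
@dataclass
class ModelCard:
    name: str
    version: str
    intended_use: str
    out_of_scope_uses: list[str] = field(default_factory=list)
    training_data: str = ""
    evaluation_data: str = ""
    # Performance metrics disaggregated by subgroup, as the original
    # proposal recommends, e.g. {"accuracy": {"overall": 0.91, ...}}
    metrics: dict[str, dict[str, float]] = field(default_factory=dict)
    limitations: list[str] = field(default_factory=list)
    ethical_considerations: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise the card for publication alongside the model."""
        return json.dumps(asdict(self), indent=2)

# Fabricated example values, for demonstration only.
card = ModelCard(
    name="toy-classifier",
    version="1.0",
    intended_use="Demonstration only",
    metrics={"accuracy": {"overall": 0.91, "subgroup_a": 0.94, "subgroup_b": 0.82}},
    limitations=["Not evaluated on out-of-distribution data"],
)
print(card.to_json())
```

Note how the disaggregated `metrics` field surfaces a subgroup disparity (0.94 vs. 0.82) that the aggregate figure alone would conceal, which is the central motivation of the original proposal.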
Adoption
Mitchell et al. published model cards for several Google Cloud Vision API components, including face detection and object recognition models, at modelcards.withgoogle.com — among the first publicly available commercial model cards. In 2020, Google released the open-source Model Card Toolkit (MCT), integrated with TensorFlow Extended (TFX), to automate card generation from training and evaluation artefacts.[3]
Hugging Face
Hugging Face adopted model cards as the standard documentation format for its Model Hub, stored as a README.md file at the root of each model repository. YAML front matter at the top of the card feeds structured metadata into Hub search and filtering. A 2022 landscape analysis of 74,970 model repositories found that 44.2% included a model card, but those models accounted for 90.5% of total download traffic, indicating that more widely used models are disproportionately likely to be documented.[4]
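The front-matter mechanism can be illustrated with a short sketch. Hugging Face's own tooling parses the YAML block properly; the toy function below, using only the standard library, simply splits a README-style card at its `---` delimiters and handles flat `key: value` lines, so it is a simplified illustration rather than the Hub's actual implementation.

```python
# Minimal sketch: split a Hugging Face-style README.md into YAML front
# matter and markdown body. The real Hub tooling uses a full YAML
# parser; this toy version only handles flat "key: value" lines.
def split_front_matter(readme: str):
    lines = readme.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, readme  # no front matter block present
    try:
        end = lines[1:].index("---") + 1  # index of closing delimiter
    except ValueError:
        return {}, readme  # unterminated block; treat as plain body
    metadata = {}
    for line in lines[1:end]:
        if ":" in line:
            key, _, value = line.partition(":")
            metadata[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return metadata, body

example = """---
license: apache-2.0
language: en
---
# My model

Model card body.
"""
meta, body = split_front_matter(example)
print(meta)  # {'license': 'apache-2.0', 'language': 'en'}
```

In the real Hub, fields such as `license` and `language` extracted this way drive the search filters mentioned above.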
Meta
Meta accompanied its LLaMA 1 release (February 2023) with a model card and has published detailed cards for subsequent Llama releases. The Llama 3 model card, for example, discloses a training corpus of over 15 trillion tokens, estimated carbon emissions of 2,290 tCO₂eq (fully offset), red-teaming methodology, and safety benchmark results.[5]
Anthropic and OpenAI
Anthropic publishes system cards for its Claude models documenting safety evaluations conducted under its Responsible Scaling Policy, AI Safety Level determinations, and assessments of potential for misuse in chemical, biological, radiological, and nuclear (CBRN) domains. OpenAI likewise publishes system cards for its GPT-4 and o-series releases.
Regulatory context
The NIST AI Risk Management Framework (AI RMF 1.0, January 2023) explicitly recommends "model cards or similar documentation standards for every major model" as part of responsible AI governance.[6]
The European Union's AI Act (Regulation 2024/1689, in force August 2024) requires that high-risk AI systems be accompanied by documentation specifying capabilities, limitations, and performance across affected population groups — requirements functionally equivalent to a model card.[7]
Criticisms and limitations
Empirical studies of model cards in practice have documented several systematic shortcomings:
- Incompleteness: Sections covering environmental impact, limitations, and evaluation methodology have consistently low fill-out rates. A 2024 analysis of 32,111 Hugging Face model cards found that environmental impact and limitations were among the least frequently completed fields.[8]
- Tendency to downplay limitations: Authors tend to emphasise successes and minimise weaknesses; no peer-review mechanism exists for model card claims.
- Transparency washing: Incomplete or selectively disclosed information may satisfy formal documentation requirements while providing little practical guidance.
- Inaccessibility: Technical language in model cards often renders them minimally useful for non-expert users.
- No enforcement: Mitchell et al. acknowledged that the format "relies on the integrity of the creating organisation." External auditing is not required and rarely occurs.
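The "fill-out rate" measure underlying the incompleteness finding can be illustrated with a toy computation: for each documentation section, count the fraction of cards in a collection that actually complete it. The cards below are fabricated examples, not real Hub data.

```python
# Toy illustration of a per-section fill-out rate computation.
# The cards are fabricated; an empty string marks an unfilled section.
cards = [
    {"intended_use": "spam filtering", "limitations": "", "environmental_impact": ""},
    {"intended_use": "OCR", "limitations": "fails on handwriting", "environmental_impact": ""},
    {"intended_use": "translation", "limitations": "", "environmental_impact": ""},
    {"intended_use": "ranking", "limitations": "cold-start bias", "environmental_impact": "12 tCO2eq"},
]

def fill_rates(cards):
    """Fraction of cards with a non-empty entry for each section."""
    sections = cards[0].keys()
    return {s: sum(bool(c[s].strip()) for c in cards) / len(cards) for s in sections}

print(fill_rates(cards))
# Intended use is always filled; limitations and environmental
# impact lag behind, mirroring the pattern reported for the Hub.
```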
Related concepts
- Datasheets for Datasets — the parallel documentation standard for training datasets that directly inspired model cards.[9]
- Dataset Nutrition Label — a companion proposal focused on dataset quality metrics.
- AI Factsheets (IBM) — a related documentation framework.
- System card — terminology used by Anthropic and OpenAI for safety-focused model documentation, often more detailed than a model card.
See also
- Datasheets for datasets
- Algorithmic bias
- Fairness (machine learning)
- AI safety
- AI Act
- Responsible AI
References
- ^ Gebru, Timnit; Morgenstern, Jamie; Vecchione, Briana; Vaughan, Jennifer Wortman; Wallach, Hanna; Daumé III, Hal; Crawford, Kate (2021). "Datasheets for Datasets". Communications of the ACM. 64 (12): 86–92. doi:10.1145/3458723.
- ^ Mitchell, Margaret; Wu, Simone; Zaldivar, Andrew; Barnes, Parker; Vasserman, Lucy; Hutchinson, Ben; Spitzer, Elena; Raji, Inioluwa Deborah; Gebru, Timnit (2019). "Model Cards for Model Reporting". Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM. pp. 220–229. doi:10.1145/3287560.3287596.
- ^ "Introducing the Model Card Toolkit for Easier Model Transparency Reporting". Google Research. 2020.
- ^ "Model Card Landscape Analysis". Hugging Face.
- ^ "Meta Llama 3 Model Card". Meta.
- ^ National Institute of Standards and Technology (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0) (PDF) (Report). NIST.
- ^ "EU AI Act Article 13: Transparency and provision of information to deployers". European Parliament and Council.
- ^ Liang, Weixin; et al. (2024). "What's Documented in AI? Systematic Analysis of 32,111 AI Model Cards". Nature Machine Intelligence. doi:10.1038/s42256-024-00857-z.
- ^ Gebru, Timnit; Morgenstern, Jamie; Vecchione, Briana; Vaughan, Jennifer Wortman; Wallach, Hanna; Daumé III, Hal; Crawford, Kate (2021). "Datasheets for Datasets". Communications of the ACM. 64 (12): 86–92. doi:10.1145/3458723.