"We decide when and how to use AI tools in our work." —

Producing more but understanding less: The risks of AI for scientific research

A psychologist and an anthropologist ponder the epistemic risks AI could pose for science.

Image: 3D illustration of a brain with wires. Current concerns about AI tend to focus on its obvious errors. But psychologist Molly Crockett and anthropologist Lisa Messeri argue that AI also poses potential long-term epistemic risks to the practice of science. Credit: Just_Super/E+ via Getty

Last month, we witnessed the viral sensation of several egregiously bad AI-generated figures published in a peer-reviewed article in Frontiers, a reputable scientific journal. Scientists on social media expressed equal parts shock and ridicule at the images, one of which featured a rat with grotesquely large and bizarre genitals.

As Ars Senior Health Reporter Beth Mole reported, looking closer only revealed more flaws, including the labels "dissilced," "Stemm cells," "iollotte sserotgomar," and "dck." Figure 2 was less graphic but equally mangled, rife with nonsense text and baffling images. Ditto for Figure 3, a collage of small circular images densely annotated with gibberish.

The paper has since been retracted, but that eye-popping rat penis image will remain indelibly imprinted on our collective consciousness. The incident reinforces a growing concern that the increasing use of AI will make published scientific research less trustworthy, even as it increases productivity. While the proliferation of errors is a valid concern, especially in the early days of AI tools like ChatGPT, two researchers argue in a new perspective published in the journal Nature that AI also poses potential long-term epistemic risks to the practice of science.

Molly Crockett is a psychologist at Princeton University who routinely collaborates with researchers from other disciplines in her work on how people learn and make decisions in social situations. Her co-author, Lisa Messeri, is an anthropologist at Yale University whose research focuses on science and technology studies (STS), analyzing the norms and consequences of scientific and technological communities as they forge new fields of knowledge and invention—like AI.

The original impetus for their new paper was a 2019 study published in the Proceedings of the National Academy of Sciences claiming that researchers could use machine learning to predict the replicability of studies based only on an analysis of their texts. Crockett and Messeri co-wrote a letter to the editor disputing that claim, but shortly thereafter, several more studies appeared, claiming that large language models could replace humans in psychological research. The pair realized this was a much bigger issue and decided to work together on an in-depth analysis of how scientists propose to use AI tools throughout the academic pipeline.

They came up with four categories of visions for AI in science. The first is AI as Oracle, in which such tools can help researchers search, evaluate, and summarize the vast scientific literature, as well as generate novel hypotheses. The second is AI as Surrogate, in which AI tools generate surrogate data points, perhaps even replacing human subjects. The third is AI as Quant. In the age of big data, AI tools can overcome the limits of human intellect by analyzing vast and complex datasets. Finally, there is AI as Arbiter, relying on such tools to more efficiently evaluate the scientific merit and replicability of submitted papers, as well as assess funding proposals.

Each category brings undeniable benefits in the form of increased productivity—but also certain risks. Crockett and Messeri particularly caution against three distinct "illusions of understanding" that may arise from over-reliance on AI tools, which can exploit our cognitive limitations. For instance, a scientist may use an AI tool to model a given phenomenon and believe they, therefore, understand that phenomenon more than they actually do (an illusion of explanatory depth). Or a team might think they are exploring all testable hypotheses when they are only really exploring those hypotheses that are testable using AI (an illusion of exploratory breadth). Finally, there is the illusion of objectivity: the belief that AI tools are truly objective and do not have biases or a point of view, unlike humans.

Image: This error-ridden AI-generated image, published in the journal Frontiers, is supposed to show spermatogonial stem cells, isolated, purified, and cultured from rat testes.

The paper's tagline is "producing more while understanding less," and that is the central message the pair hopes to convey. "The goal of scientific knowledge is to understand the world and all of its complexity, diversity, and expansiveness," Messeri told Ars. "Our concern is that even though we might be writing more and more papers, because they are constrained by what AI can and can't do, in the end, we're really only asking questions and producing a lot of papers that are within AI's capabilities."

Neither Crockett nor Messeri is opposed to any use of AI tools by scientists. "It's genuinely useful in my research, and I expect to continue using it in my research," Crockett told Ars. Rather, they take a more agnostic approach. "It's not for me and Molly to say, 'This is what AI ought or ought not to be,'" Messeri said. "Instead, we're making observations of how AI is currently being positioned and then considering the realm of conversation we ought to have about the associated risks."

Ars spoke at length with Crockett and Messeri to learn more.


Ars Technica: You're taking more of an epistemological approach to AI tools, particularly with regard to how scientists envision using them. Why? 

Molly Crockett: There's quite a lot of discussion right now about errors that AI makes and how those kinds of errors and inaccuracies are bad for science. We agree that they're very dangerous. Our paper is actually more concerned with the future of science, when all of those errors have been engineered away and the AI tools work exactly as their creators intend them to. Do we still get ourselves into trouble? Lisa and I think that we do. Usually, there's hype that AI can do a particular thing, and the critique is, no, it can't do that thing. Ours is a different argument. People say, "Look at all these things that AI could do." We respond with, "Great, let's imagine they can do those things. Is that still the world we want?"

I'm a scientist, and I have used AI in my work, and I'm really excited about its potential. At the same time, over the last several years, these tools have become more sophisticated and, in many cases, less interpretable to human users. So, I've been growing more and more uneasy about what the widespread adoption of these tools bodes for the future of science. Lisa's scholarship and the broader world of Science and Technology Studies (STS) offer helpful frameworks for scientists to talk about what makes us nervous about this moment that we're in. Neither of our respective fields alone can really offer the insight that I think we need in this moment.

Lisa Messeri: STS is a vibrant, small field that began in the 1960s and 1970s out of concern for the future of science. Some of the earliest work was in the shadow of World War II and nuclear deterrence. It was this moment when scientists recognized that they had social roles in the world, that they weren't just sitting in an ivory tower but that the scientific knowledge they had created—in this case, a nuclear bomb—actually affected millions of people's lives. So a group of interdisciplinary scholars coming from the sciences, engineering, anthropology, sociology, philosophy, and history decided that in order to aid in this project of understanding science's impact on the world, we needed a set of tools and methods that could answer these questions.

In order to do so, we need to make a stunning claim: Science is a human process and a human practice. This has historically been misinterpreted. The discipline of STS has always walked a very tricky line because its goal is not to discredit science. As soon as you say that science is human, some people think it's a claim that science therefore isn't authoritative. That's not at all what STS is trying to do. It's saying instead that accepting that humans create science within social and cultural institutions might be the path toward a better, more robust, more authoritative, and more trustworthy science. What robust science do we get if we take that radically different way of thinking about science in hand?

Image: An atomic mushroom cloud rises over Nagasaki, Japan, after the detonation of the Fat Man nuclear bomb on August 9, 1945. Credit: Public domain

Ars Technica: I'd like to walk through each category in your taxonomy of how AI might be used by scientists. What are the good things that could come from each one, and what are the associated risks? Let's start with AI as Oracle. 

Lisa Messeri: AI as Oracle is any application in which you take a huge corpus of knowledge and from it get a set of discrete concrete answers or proposals. It's a response to the overwhelming production of scientific knowledge: tools that can objectively and efficiently search, evaluate, and summarize scientific literature and also generate new hypotheses. There's an infinite amount of knowledge to be absorbed, and it only grows every day. Wouldn't it be nice if you could take an AI tool, train it on the existing corpus of published scientific literature, and then ask it to summarize everything, produce your literature review, identify what questions remain to be asked, or take all of the known findings in one subfield and extrapolate where these findings lead?

The vision is seductive. It saves time, it's efficient, it will make us more productive. The main risk is that it filters all these diverse questions through one narrow passage point, which is the AI tool. There are studies where teams were given diverse datasets or given a diverse set of literature and asked to determine what is significant about this literature. Depending on who you are, what questions you're asking, and what research questions you're interested in, you'll have a different answer. That, in turn, raises different questions you might ask of the literature. If you don't have a single passage point through which you're filtering existing literature, you have a much wider base that you're building in terms of potential future projects. If everything starts going through the same oracle to say what is or isn't in the literature, you automatically have a narrowing at the bottom.

Many AI tools reawaken the myth that there can be an objective standpoint-free science in the form of the "objective" AI. But these AI tools don't come from nowhere. They're not a view from nowhere. They're a view from a very particular somewhere. And that somewhere embeds the standpoint of those who create these AI tools: a very narrow set of disciplinary expertise—computer scientists, machine learning experts. Any knowledge we ask from these tools is reinforcing that single standpoint, but it's pretending as if that standpoint doesn't exist.


Molly Crockett: I think every scientist has had the experience of being in a journal club where everyone has read the same paper and somebody else says something that you hadn't thought of. "Oh yeah, I totally missed that. Oh, that's a cool idea; that didn't occur to me." That's the power of doing science in a community with diverse ways of thinking about the world. My worry about AI as Oracle is that we lose that diversity that makes science stronger. There are now studies showing that on many different definitions of diversity, more diverse teams produce science that is more robust, that is more impactful, that is more innovative. This makes the retreat back to the myth of the singular objective knower all the more troubling.

That said, I have occasionally used AI-assisted literature search tools that turned up papers relevant to my project that I hadn't found. That was genuinely useful. AI as Oracle is problematic when it's used to narrow or reduce, but we hope people will think about use cases that might go in the opposite direction. The vision is more, not less—broader, not narrower.


Ars Technica: Let's move on to AI as Surrogate.

Molly Crockett: AI as Surrogate is a vision that primarily came up in the behavioral sciences. We first came across this while reading a paper—published in a top journal—about whether large language models could replace human participants. My immediate reaction was just, 'What the fuck?' The visions for AI that we document are all from Nature, Science, PNAS, etc. It is the top publications that are giving a platform to these visions. So we're not cherry-picking obscure citations here and there in the literature. This is what the top scientists are saying about the future of AI.

AI as Surrogate is also appealing because it promises to make scientists more productive, and we are all operating within institutions that demand a certain level of productivity. In my field, we have moved from collecting data from human participants exclusively in person to large-scale online data collection. It's faster and cheaper, of course. It also helps reach a more diverse body of participants for studies than undergraduates in psychology classes, which was the previous status quo.

The move toward online data collection has had many advantages for behavioral science. It has also helped prime the pump for this idea that we could replace human participants with AI tools for certain research tasks. When you collect data from people online, you lose the face-to-face interaction. It is a very dehumanizing context. The humans who are providing the data for your study are now exclusively showing up as data points in your spreadsheet. So this move toward AI as Surrogate for human subjects is part of a longer historical trajectory that has been happening for some time. But we discovered that scientists in other fields beyond behavioral science were also envisioning AI models to replace data that were missing or difficult to obtain.

Lisa Messeri: My initial assumption was that AI as Surrogate was going to be primarily about the human sciences. But in, say, a field like physics, there are some phenomena with very few data points. So there is also an attractive desire to create more data points, to create a richer dataset so that you could present more robust findings. Creating synthetic datasets even within the physical sciences conforms to what we value as robust and important science, which today is primarily big data findings. The big data revolution happened about a decade before AI. Now that we have big data, what do we do with all this data? We need tools, and AI happens to be an excellent one for solving the problem that arose from science highly valuing data and quantitative understanding over other forms of knowing.
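To make that worry concrete, here is a minimal, hypothetical sketch (not drawn from the paper) of what surrogate data generation can look like when real observations are scarce: fit a simple model to a handful of measurements, then sample synthetic points from the fit. The distribution choice and values below are assumptions for illustration only.

```python
# Illustrative sketch of "surrogate" data generation from sparse observations.
# The synthetic points can only reflect the assumptions baked into the fit.
import numpy as np

rng = np.random.default_rng(42)

# A few hard-won real measurements (hypothetical values).
real_obs = np.array([2.1, 1.9, 2.4, 2.0, 2.2])

# Assume the quantity is roughly Gaussian and generate 1,000 synthetic points.
mu, sigma = real_obs.mean(), real_obs.std(ddof=1)
synthetic = rng.normal(mu, sigma, size=1_000)

# The "richer" dataset inherits the Gaussian assumption: any structure the five
# real points could not reveal (skew, heavy tails, a second mode) is absent.
print(f"real n={real_obs.size}, synthetic n={synthetic.size}, mean={synthetic.mean():.2f}")
```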

Image: Illusions of understanding in AI-driven scientific research. Credit: L. Messeri and M.J. Crockett

Molly Crockett: I think a lot of behavioral scientists share an intuition that because large language models are trained on the entire Internet, they represent the most diverse and comprehensive set of data about humans that exists. That's just not correct. The Internet is really not representative of human language in many ways. As of 2021, almost 3 billion people had never used the Internet before, much less written something on the Internet. The Internet and the text that is on the Internet dramatically over-represents people who are young and the viewpoints that are mainstream or more respected or acceptable in dominant cultures. So even though there's a lot of data on the Internet that has been used to train language models, it's not representative of all humans by a long shot.

Ars Technica: That leads nicely into AI as Quant. 

Lisa Messeri: AI as Oracle is important at the beginning of the research process when you're generating hypotheses, surveying the literature, et cetera. AI as Surrogate is at the moment of data collection: How do you get a robust and large sample dataset that you can then run your analysis on? AI as Quant is the moment of running the analysis on these huge datasets. Again, we are not saying that science should never use AI and could never benefit from AI. Instead, we're asking the community to be thoughtful about under what circumstances we use AI and understanding that these different visions have different potentialities and different risks associated with them.

The caution we offer for AI as Quant is to think about the decisions that have been made that get embedded in these quantitative tools of analysis. It's very easy to quickly lose track of what these decisions were in training the algorithm, in the dataset used to train the tool. There are all these little decisions that go into creating a tool. This is not exclusive to AI. But there is a danger of the creation of monocultures. If a tool proves to be very successful in one circumstance and then gets wide adoption, not only do you get everyone using the same tool for the same methodological problem, but you also increasingly lose sight of some decisions that may have gone into creating that tool.


Molly Crockett: AI as Quant is most relevant to the illusion of explanatory depth. As a scientist who uses machine learning in my work, I know that when your tool works really well, it's easy to lose sight of what it is about the tool that you don't understand. A lot of the most exciting AI quant tools are not functionally interpretable. The algorithm is doing stuff and it works really well and you have no idea why, because there are a billion parameters that are all interacting in some way that is not transparent even to the people who are building the models. Even if it were transparent, it's beyond the capabilities of what the human mind can grasp.

If your goal is to predict a particular outcome and your model works really well in predicting that outcome, great. But there are many documented cases of confusing prediction with explanation or understanding. There is a long philosophical debate about the complementary roles of prediction and explanation in science. Both are important, but we really want to highlight how essential it is that scientists not mistake a good predictive tool for a deep understanding of whatever it is that you're studying. This is especially important if you're going to be using multiple tools at once.

One concrete example: My team built a machine learning algorithm to predict moral outrage expressions on Twitter. It works really well. It does as well as showing a tweet to a human and asking, "Is this person outraged or not?" In order to train that algorithm, we showed a bunch of tweets to human participants and asked them to say whether this tweet contained outrage. Because we have that ground truth of human perception, we can be reasonably certain that our tool is doing what we want it to do.
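As an illustration of the kind of validation Crockett describes (a sketch, not her team's actual pipeline), the snippet below trains a simple text classifier on human-labeled examples and then measures its agreement with independent human judgments on held-out text. The tweets, labels, and model choice (scikit-learn's TF-IDF features plus logistic regression) are hypothetical stand-ins.

```python
# Minimal sketch: train on human-labeled text, then check agreement with
# human judgments on held-out examples ("ground truth of human perception").
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score

# Hypothetical training data: tweets with majority-vote human labels
# (1 = raters judged the tweet to express outrage, 0 = no outrage).
train_tweets = [
    "this ruling is an absolute disgrace",
    "how dare they push this through with no debate",
    "lovely weather for a long run today",
    "just finished a really good book",
]
train_labels = [1, 1, 0, 0]

# Held-out tweets labeled independently by human raters.
test_tweets = ["I am furious about this decision", "excited for the weekend"]
test_labels = [1, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_tweets, train_labels)
preds = model.predict(test_tweets)

# Chance-corrected agreement between the model and human raters; comparing
# this to inter-rater agreement among humans is one way to judge whether
# the tool is "doing what we want it to do."
print("model vs. human kappa:", cohen_kappa_score(test_labels, preds))
```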

Let's say we made the move to AI as Surrogate and used a language model to label tweets with outrage or not and then used those machine-generated labels to train another algorithm to predict how outrage evolves over time or something. Once you have multiple models interacting which are not interpretable and might be making errors in a systematic way that you are not able to recognize, that's where we start to get into dangerous territory. Legal scholar Jonathan Zittrain has called this concept "intellectual debt": As soon as you have multiple systems interacting in a complex environment, you can very quickly get to a point where there are errors propagating through the system, but you don't know where they originate because each individual system is not interpretable to the scientists.
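A toy simulation can show how this kind of "intellectual debt" accrues. In the sketch below, a hypothetical surrogate labeler is 90 percent accurate but errs systematically, and a downstream analysis that trusts its labels quietly inherits the skew; every number here is invented purely for illustration.

```python
# Illustrative simulation (not from the paper) of the cascading-models worry:
# a surrogate labeler with a systematic bias replaces human raters, and any
# analysis built on its labels inherits a distortion no one can see.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_outrage = rng.random(n) < 0.30            # ground truth: 30% of posts are outraged

# Hypothetical "AI surrogate" labeler: 90% accurate on outraged posts, but it
# also falsely flags 20% of neutral posts as outraged.
surrogate_labels = np.where(
    true_outrage,
    rng.random(n) < 0.90,                      # misses 10% of real outrage
    rng.random(n) < 0.20,                      # falsely flags 20% of neutral posts
)

# A downstream analysis that trusts the surrogate labels now measures a
# prevalence of outrage biased upward, and nothing in the pipeline flags
# where the distortion entered.
print("true prevalence:      ", true_outrage.mean())      # ~0.30
print("estimated prevalence: ", surrogate_labels.mean())  # ~0.41
```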

Image: Lisa Messeri (left) and Molly Crockett (right). Credit: Yale University/Princeton University

Ars Technica: Finally, we have AI as Arbiter. 

Lisa Messeri: All of these visions cascade on one another. You have AI as Oracle summarizing the literature. You have AI as Surrogate producing large datasets that otherwise might not be as efficiently and quickly produced. You have AI as Quant efficiently analyzing these large datasets. You have AI as Scribe, which we don't even talk about in this paper, helping us write the papers. We are submitting papers to journals at a much larger rate. Journal editors could use an AI tool to more efficiently sift through this deluge of papers, even if it's just on the first round of desk rejects or the first round of funding proposals. AI as Arbiter is this idea that at the end of the research pipeline, perhaps there's another tool that could help us adjudicate or be an arbiter for peer review or grant reviewing.

I have seen more caution about whether or not this would be a helpful tool for scientific knowledge. Different journals have stated different policies in terms of whether or not AI can be used as part of peer review. On the one hand, there's skepticism about whether AI can actually do this job. On the other, there are people proposing that this could be the ideal solution because AI as Arbiter is objective. Instead of having messy humans decide what papers should or shouldn't get published, we have an AI tool that can take away human judgment.

That's the illusion of objectivity: this idea that an AI, which is created by humans trained on human data, is somehow above human bias and foibles and mistakes. It forgets the fact that in the end, any AI tool is still only representing the data it was trained on and the creators that made it in the first place. At least when we have humans in the loop, we know that there is bias, and we have processes and systems that can account for that bias.

Molly Crockett: I have three reviews that are overdue right now, so I'd really love it if AI as Arbiter could be real. But I am not deluding myself that this would be a good world to live in. One problem that we're already seeing discussed is the application of AI tools for much more consequential decisions than whether my paper gets accepted to a journal or not—decisions about whether to give somebody health insurance and so forth.

Whenever there is a decision that is distributing scarce resources, if you are denied access to a resource, you have a right to an explanation for why that was denied. AI tools that are not interpretable cannot provide that explanation. That might be part of the reason why there is more suspicion about AI as Arbiter relative to the other visions: every scientist has had the experience of feeling like their paper has been unfairly rejected from a journal, especially desk rejections where no explanation is given. I don't think anyone wants to live in that world.


Ars Technica: Your central point is that AI could usher in a paradoxical future where there is much more production of scientific work yet less understanding and real knowledge—and the latter is the ultimate goal of science.

Lisa Messeri: Exactly. There is only a narrow slice of the world that we can understand through AI tools and methods, and we are concerned that an over-reliance and over-investment in these approaches will actually narrow what we know about the world.

Molly Crockett: This is the illusion of exploratory breadth. It's something I've already seen happening in my field with the move toward online research. This was a necessary outgrowth of the replication crisis and recognizing that we need to recruit larger samples for our studies. Well, how are we going to do that? One way to do that is to do research online where we can instantly and affordably get access to populations of hundreds or thousands of participants for our research.

But not every question about human behavior is easily asked of an online sample. What has happened in the published literature is an over-representation of that type of research: self-reported survey questions that can be asked in an online context. Everyone who's moved toward that as a method has done so to the exclusion and the marginalization of the types of research that just don't work in that paradigm.


Lisa Messeri: For example, when fMRI became a tool accessible to a wide range of researchers, suddenly a lot of work skewed toward questions that fMRI was suited to ask and answer, excluding questions that fMRI is not suited to ask and answer, which are abundant. One could fairly ask, 'Isn't AI just another tool that narrows the scope of the work being done?' There are two aspects where AI differs. One is the ubiquity of it. A physicist couldn't really use fMRI in a productive and meaningful way. But AI is being imagined as a tool that runs the gamut from the humanist to the qualitative social scientist to the quantitative social scientist to the quantitative scientist. It runs across the research pipeline. That wide breadth is stunning, and AI therefore poses a much higher risk.

We use the term monoculture, drawing from the agricultural analogy. Part of the appeal of monocultural farming is that it produces more. You can get more bananas, for example, but it is also more prone to single points of failure. As soon as there is one disease in a crop, everything gets wiped out and you don't have the robustness of a polyculture. The other thing that makes AI unique is our tendency to anthropomorphize it and how that affects communities of knowledge.

Molly Crockett: This helps feed into a lot of the hype around AI. The more we think about AI as human-like, the more we extend hopes and dreams and responsibilities to those tools that they don't necessarily deserve. Historically, the research on this question has been around illusions of understanding that arise in communities of knowledge composed of humans. But increasingly, there's attention to the idea that this can also happen with machines.

I think scientists are very appropriately cautious about what these tools bode for the future. At the same time, scientists lack training in the theoretical and methodological approaches from the humanities that are going to help us move through this. If we don't want to end up in a future where we produce more but understand less, we scientists need the insights and collaboration of scholars in the humanities. This is going to require us to rethink the hierarchies that are baked into the academy and to develop a healthier respect for the scholarship that is happening in the humanities. We need it. It can help us.

So much of the discourse around AI pushes this message of inevitability: that AI is here, it is not going away, it's inevitable that this is going to bring us to a bright future and solve all our problems. That message is coming from people who stand to make a lot of money from AI and its uptake all across society, including science. But we decide when and how we are going to use AI tools in our work. This is not inevitable. We just need to be really careful that these tools serve us. We're not saying that they can't. We're just adamant that we need to educate ourselves in the ways that AI introduces epistemic risk to the production of scientific knowledge. Scientists working alone are not going to engineer our way out of those risks.

Nature, 2024. DOI: 10.1038/s41586-024-07146-0  (About DOIs).