1 Introduction
Generative AI (GenAI) tools, defined as any “end user tool [...] whose technical implementation includes a generative model based on deep learning”,1 are the latest in a long line of technologies that raise questions about their impact on the quality of human thought, a line that includes writing (objected to by Socrates), printing (objected to by Trithemius), calculators (objected to by teachers of arithmetic), and the Internet.
Such consternation is not unfounded. Used improperly, technologies can and do result in the deterioration of cognitive faculties that ought to be preserved. As Bainbridge [
7] noted, a key irony of automation is that by mechanising routine tasks and leaving only exception-handling to the human user, it deprives the user of routine opportunities to practice their judgment and strengthen their cognitive musculature, leaving them atrophied and unprepared when the exceptions do arise.
In response, research has begun to look closely at how different activities are affected by GenAI, the extent to which cognitive offloading [8] occurs, and whether this is undesirable. Some work has focused, for instance, on studying the effects of GenAI use on memory (e.g., [
1,
106]) and on creativity (e.g., [
28,
100]). Design research has also been developing interventions that improve people’s ability to think in certain ways (e.g., [
24]). We review these lines of work in Section
2.
In this paper, we focus on a higher-level concept that captures another aspect of thought considered desirable and worthy of preservation:
critical thinking (defined in Section
2). The effect of the use of GenAI tools on critical thinking, as a direct object of inquiry, has not yet been explored.
Moreover, we focus on critical thinking for
knowledge work (as conceptualised by Drucker [
30] and Kidd [
67]). Much research on the effect of GenAI on thinking skills is focused on educational settings, where concern for skill cultivation is most acute (e.g., the effect of GenAI code completion tools on programming and computer science education [
107]). As previously noted [
116,
119], critical thinking has been operationalised in detail in certain specific disciplines, such as academic history, clinical psychology, and nursing. But the ostensible shifts in critical thinking behaviours brought about by GenAI extend to a broad set of professions and knowledge workflows — GenAI tools are now widely used in knowledge work [
13] — and little is known about the critical thinking demands of these. We lack broad-based empirical examples of what
kinds of knowledge work activities are considered by professionals to require critical thinking.
Recent work has motivated the need for critical thinking support in AI-assisted knowledge work [
116,
119]. This need is motivated primarily by the observation that AI-assisted knowledge workflows tend to exhibit “mechanised convergence” [114]: users with access to GenAI tools produce a less diverse set of outcomes for the same task than those without. This tendency towards convergence reflects a lack of personal, contextualised, critical and reflective judgment of AI output, and can thus be interpreted as a deterioration of critical thinking.
However, we lack direct empirical evidence for an interpretation that posits a connection between mechanised convergence and critical thinking. Output diversity is a proxy for critical thinking, and a flawed one. For instance, users who reuse GenAI output without editing it may have nonetheless performed a critical, reflective judgment in forming the decision not to edit it. Such reflective thinking is invisible to measures that focus only on the ultimate artefact produced. Without knowing how knowledge workers enact critical thinking when using GenAI and the associated challenges, we risk creating interventions that do not address workers’ real needs.
In this paper, we aim to address this gap by conducting a survey of a professionally diverse set of knowledge workers (
n = 319), eliciting detailed real-world examples of tasks (936) for which they use GenAI, and directly measuring their perceptions of critical thinking during these tasks: when critical thinking is necessary, how it is enacted, and whether, and to what extent, GenAI tools affect the effort it requires (Section 3). We focus on the “enaction” of critical thinking (i.e., actions that signal or manifest it) rather than critical thinking per se, because critical thinking, as a purely mental phenomenon, is difficult for people to self-observe, reflect on, and report.
Concretely, we aim to answer two research questions:
RQ1
When and how do knowledge workers perceive the enaction of critical thinking when using GenAI?
RQ2
When and why do knowledge workers perceive increased/decreased effort for critical thinking due to GenAI?
With respect to RQ1 (Section
4), the study reveals that knowledge workers engage in critical thinking when using GenAI tools primarily to ensure the quality of their work. They define critical thinking as setting clear goals, refining prompts, and assessing AI-generated content to meet specific criteria and standards. Their reflective approach involves verifying outputs against external sources and their own expertise, especially in tasks that require higher accuracy.
The data identify key motivators for critical thinking: the desire to enhance work quality, avoid negative outcomes, and develop skills. However, several barriers inhibit this reflective process, including lack of awareness, limited motivation due to time pressure or job scope, and difficulty improving AI responses in unfamiliar domains. Surprisingly, while AI can improve efficiency, it may also reduce critical engagement, particularly in routine or lower-stakes tasks in which users simply rely on AI, raising concerns about long-term reliance and diminished independent problem-solving.
Regarding RQ2 (Section
5), GenAI tools appear to reduce the perceived effort required for critical thinking tasks among knowledge workers, especially when they have higher confidence in AI capabilities. However, workers who are confident in their own skills tend to perceive greater effort in these tasks, particularly when evaluating and applying AI responses.
The data show a shift in cognitive effort as knowledge workers increasingly move from task execution to oversight when using GenAI. While this shift “from material production to critical integration” has been observed in prior studies [114], such studies are typically controlled experiments in narrow domains with small participant samples. Our data provide complementary evidence that this shift also occurs in real-world use of GenAI tools, across a wide variety of tasks and professions. For tasks like knowledge retrieval, AI reduces effort by automating information gathering, but workers must now invest more in verifying the accuracy of AI outputs. Similarly, while AI simplifies content creation, workers still need to spend time aligning outputs with specific needs and quality standards.
Our paper makes the following contributions:
•
We review the literature on interaction design interventions for critical thinking, and studies of the effects of automation on knowledge workflows (Section
2).
•
We describe the development and deployment of a survey for gathering empirical evidence for knowledge workers’ experiences and perceptions of the effect of GenAI on critical thinking (Section
3). We find that GenAI tools reduce the perceived effort of critical thinking while also encouraging over-reliance on AI, with confidence in the tool often diminishing independent problem-solving. As workers shift from task execution to AI oversight, they trade hands-on engagement for the challenge of verifying and editing AI outputs, revealing both the efficiency gains and the risks of diminished critical reflection (Sections
4 and
5).
•
Drawing from our survey insights, we highlight how the use of GenAI tools creates new challenges for critical thinking. We outline implications for designing GenAI to support knowledge workers to enhance their awareness, motivation, and ability to think critically (Section
6).
3 Method
To answer our research questions, namely when and how knowledge workers perceive the enaction of critical thinking when using GenAI (RQ1), and when and why they perceive increased or decreased effort for critical thinking due to GenAI (RQ2), we conducted an online survey on the Prolific platform
2 to study knowledge workers’ experiences with critical thinking when using GenAI tools for their work.
To ensure participants fully understood the scope and meaning of our questions on critical thinking, as part of the survey onboarding they were introduced to the concept of critical thinking in the context of using GenAI through concrete examples of how it could be applied at various levels of Bloom’s taxonomy (e.g., checking the tone of generated emails, verifying the accuracy of code snippets, and assessing potential biases in data insights). These examples served to sensitise participants to the various dimensions of critical thinking while avoiding conceptualising it too narrowly. They acted as “cognitive priming”, helping participants better understand the concept and thus better recognise critical thinking behaviours in their daily GenAI use.
In total, we received 319 survey responses, in which participants shared a total of 936 real-world examples where they used a GenAI tool for their work, and shared how critical thinking played a role in these tasks.
To answer RQ1, we created an explanatory regression model with a dependent variable measuring whether participants perceived the enaction of critical thinking when using GenAI tools for the tasks they shared, and independent variables corresponding to two sets of factors that we hypothesised might correlate with the tendency to engage with tasks critically: 1) task factors, i.e., measures about the task at hand (e.g., task type, confidence in doing the task); and 2) user factors, i.e., measures about the users themselves (e.g., age, gender, occupation, tendency to reflect on work, and trust in GenAI). In addition, we analysed participants’ motivators for and inhibitors of critical thinking from their free-text responses.
To answer RQ2, we created explanatory regression models with dependent variables measuring whether participants perceived different cognitive activities constituting critical thinking (e.g., breaking down a problem, putting together ideas) to be more or less effortful when using a GenAI tool for the task compared to when not using one. The independent variables included the same set of factors as for RQ1 above. We also analysed participants’ free-text responses to understand why they perceived these cognitive activities as more or less effortful due to GenAI.
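In our notation (introduced here for clarity, not taken verbatim from the study materials), let $\mathbf{x}_{ij}$ collect the task and user factors for the $j$-th task example shared by participant $i$, and let $u_i$ denote a per-participant random intercept. The RQ1 and RQ2 models then take the form

$$\operatorname{logit}\Pr(\mathrm{CT}_{ij}=1) = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij} + u_i, \qquad u_i \sim \mathcal{N}(0, \sigma_u^2),$$

for the binary perceived-enaction outcome (RQ1), and, for each of the six cognitive activities $k$,

$$\mathrm{Effort}^{(k)}_{ij} = \gamma^{(k)}_0 + \boldsymbol{\gamma}^{(k)\top}\mathbf{x}_{ij} + v^{(k)}_i + \varepsilon^{(k)}_{ij},$$

for the effort-change outcomes (RQ2) described in Section 3.1.3.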
3.1 Survey Design
To model how task and user factors relate to critical thinking activities, we designed a survey as follows (see Appendix
A.1 for the complete survey).
3.1.1 Task-Related Factors.
Prior studies have shown that knowledge workers apply GenAI tools for a range of tasks and express different needs while doing these tasks [
13], and that their perceived confidence in themselves and in AI doing the tasks can influence their use of and reliance on the tool [
20,
22,
83,
130]. We hypothesised that factors relating to the user’s task, including the task type and their confidence in themselves and in AI doing the task, could affect their critical thinking.
Task type.
Brachman et al. [
13] classify knowledge workers’ current usage of GenAI tools into nine types (See Table
1), grouped into three major categories: 1) for
creation, 2) to find or work with
information, 3) to get
advice. This taxonomy offers clear distinctions among the major categories of task type, which we hypothesised would correlate with users’ critical thinking due to differing objectives and requirements. We follow Brachman et al. [
13]’s taxonomy and operationalise their task type categorisation in our survey, focusing on the major categories. For each GenAI tool use example, participants were first asked to describe in detail the task they did (i.e.,
Please tell us: 1) what you were trying to achieve, 2) in what GenAI tool, and 3) how you used the GenAI tool, including any prompts.). Then, they were asked to pick one of the nine task types that best described their task. Using this information, we classified each example as creation, information, or advice, per the Brachman et al. [
13] taxonomy.
Task confidence. Guided by prior studies on user confidence in AI-assisted decision-making [
20,
85,
130], for each self-reported task we consider three aspects of user confidence: 1)
confidence in self (i.e.,
How confident are you in your ability to do this task without GenAI?), 2)
confidence in GenAI (i.e.,
How confident are you in the ability of GenAI to do this task?), and 3)
confidence in evaluation (i.e.,
How confident are you, in the course of your normal work, in evaluating the output that AI produces for this task?). Participants rated each aspect of confidence on a five-point scale ranging from
“not at all confident” (1) to
“extremely confident” (5).
3.1.2 User Factors.
We hypothesised that participants’ general tendency towards reflective thinking and their trust in GenAI would affect their baseline critical thinking awareness and practice, and adapted validated instruments from prior work to measure these.
Tendency to reflect on work. We use Kember et al. [
65]’s Reflective Thinking Inventory to measure participants’ baseline tendency to think reflectively. Reflective thinking is closely related to critical thinking (Section
2) and the Kember et al. inventory can be interpreted as a proxy for the disposition to think critically [
38].
Trust in generative AI. We measure participants’ overall trust in GenAI, which has been shown to correlate with users’ attitudes towards and adoption of the technology [
43,
76]. To that end, we adapted the six-item Propensity to Trust Technology scale [
56], replacing the word “technology” with “GenAI”.
Gender, age, and occupation. We collect demographic information, including gender, age range and occupation. For occupation, participants self-selected the most appropriate occupation category from the Occupational Information Network (O*NET)’s occupational listings
3. We classify occupations as being
at risk of automation based on the economic analyses of Ghosh et al. [
42], including the categories of Office and Administrative Support, Sales and Related, Computer and Mathematical, Business and Financial Operations, and Arts, Design, Entertainment, Sports, and Media.
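A minimal sketch of this classification step (in Python, with a hypothetical occupation_category column holding each participant’s selected O*NET category; not the authors’ actual code):

```python
import pandas as pd

# Categories treated as at risk of automation, per the list above.
AT_RISK_CATEGORIES = {
    "Office and Administrative Support",
    "Sales and Related",
    "Computer and Mathematical",
    "Business and Financial Operations",
    "Arts, Design, Entertainment, Sports, and Media",
}

def flag_at_risk(occupation_category: pd.Series) -> pd.Series:
    """Return a boolean indicator of whether each occupation is at risk."""
    return occupation_category.isin(AT_RISK_CATEGORIES)
```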
3.1.3 Critical Thinking, Associated Cognitive Activities, and Effort.
Perceived enaction of critical thinking. A key dependent variable for RQ1, namely when knowledge workers perceive the enaction of critical thinking, was measured using a pair of questions: first, whether participants perceived that they had performed critical thinking for that task (a binary yes/no question), followed by a free-text question asking them to justify their response. If participants answered “yes” to the first question, they were asked to elaborate on why and how they enacted critical thinking in free text (i.e., Please share one real-world example when you applied the critical thinking tactic(s) to this task, and explain why you did critical thinking.), as well as the challenges, if any, they faced while doing so (i.e., When applying this critical thinking tactic during your use of GenAI tool, have you ever encountered any challenges and obstacles?). If participants answered “no”, they were asked to elaborate in free text on why they did not think critically for the task.
Perceived effort in critical thinking: Bloom’s taxonomy. As discussed in Section
2, we selected Bloom’s taxonomy as the framework to operationalise the measurement of critical thinking activities [
12]. The taxonomy includes six different levels of cognitive activities: Knowledge (i.e., recall), Comprehension (i.e., organising/translating ideas), Application (i.e., problem-solving), Analysis (i.e., breaking down a problem), Synthesis (i.e., putting together ideas), and Evaluation (i.e., evaluating and quality checking). See Table
2 for more details.
For each task example, participants were asked if, and by how much, the use of the GenAI tool changed the effort of critical thinking activities compared to when they did not use the AI tool. We used a five-point scale ranging from “much less effort”, through “less effort”, “about the same”, and “more effort”, to “much more effort” (which we code as integers ranging from −2 to +2). Participants could choose “N/A” if they thought that a cognitive activity was not relevant to the task. Finally, participants were asked to elaborate in free text on why they had marked any critical thinking activities as requiring more or less effort with GenAI.
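A minimal sketch of this coding scheme (assumed response strings; not the authors’ code):

```python
# Map the five-point effort scale to integers -2..+2; "N/A" (activity not
# relevant to the task) becomes a missing value rather than a number.
EFFORT_CODES = {
    "much less effort": -2,
    "less effort": -1,
    "about the same": 0,
    "more effort": 1,
    "much more effort": 2,
    "n/a": None,
}

def code_effort(response: str):
    """Return the integer code for a raw effort response (None for N/A)."""
    return EFFORT_CODES[response.strip().lower()]
```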
3.2 Study Setup and Recruitment
Through the Prolific platform, we recruited participants who self-reported using GenAI tools at work at least once per week. This criterion ensured the study focused on knowledge workers with direct, ongoing experience integrating GenAI tools into their day-to-day work tasks. We received 333 responses but excluded 14 from the analysis due to low response quality (i.e., low-effort free-text responses). For the remaining 319 responses, participants spent an average of 43.19 minutes (SD = 23.13) completing the survey. The 319 participants (159 men, 153 women, 5 non-binary/gender diverse, 2 prefer not to say) came from diverse age groups, occupations, and countries of residence (see Table
3). Participants were compensated with GBP £10 for completing the study. Our study protocol was approved by our institution’s ethics and compliance review board. All participants were briefed and signed a consent form.
3.3 Analysis Procedure
In our survey, participants were asked to share three real examples of their GenAI tool use at work. To increase the variety of examples collected, participants were asked to think of three different examples, one for each task type: Creation, Information, and Advice (see Section
3.1.1). Then, participants were asked to share an example of each task type in detail. The order of task types was randomised to avoid order and fatigue effects. For each example, as mentioned, we measure participants’ perceived enaction of critical thinking, perceived effort in critical cognitive activities, and perceived confidence. All participants shared three examples. However, they were allowed to skip any task type they did not have experience of and substitute another task type — e.g., a participant could share two examples about Creation and one example about Advice, if they had no experience of an Information task.
After participants shared three examples of using GenAI tools, the survey assessed their overall reflective thinking tendency, trust in GenAI, and demographic details such as gender, age group, and occupation.
We employed quantitative and qualitative analyses, guided by our research questions. Both
RQ1 — when and how do knowledge workers perceive the enaction of critical thinking when using GenAI? — and
RQ2 — when and why do knowledge workers perceive increased/decreased effort for critical thinking due to GenAI? — were answered via both quantitative and qualitative analysis (See Figure
1 for an overview of our approach).
3.3.1 Dataset Cleaning and Overview.
Our 319 participants shared a total of 957 real-world examples of their use of GenAI tools at work. We removed 11 examples lacking sufficient detail to analyse (e.g., brief or vague examples like “To build my portfolio.”). We also removed 11 examples that were duplicates or did not describe GenAI tool use.
We retained 936 examples: 374 (39.96%) related to Creation, 303 (32.37%) to Information, and 259 (27.67%) to Advice. Participants self-reported having enacted critical thinking for 555 (59.29%) of the examples they shared, and perceived critical thinking activities, overall, to require less effort when using a GenAI tool compared to when not using one (see
DV distribution in Table
4).
3.3.2 Quantitative Analysis.
To model the relationship of task and user factors (independent variables) with (1) a binary measure of users’ perceived enaction of critical thinking and (2) six five-point scales of users’ perceived effort in cognitive activities associated with critical thinking, we respectively fit (1) one random-intercepts logistic regression model and (2) six random-intercepts linear regression models. To account for repeated measures, we include Participant ID as a random intercept term. For all categorical variables, we selected the most common factor level as the baseline reference. To correct for multiple comparisons, we apply the Benjamini–Hochberg procedure [
9] with a total of 98 hypothesised predictors across the seven models, yielding a corrected p-value threshold of 0.007. We adjust the p-values accordingly and report significant effects based on these corrected values.
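A minimal sketch of this analysis in Python with statsmodels, using hypothetical column names in a long-format DataFrame df (one row per task example); the paper does not state which software was used, and the binary RQ1 outcome would additionally require a mixed-effects logistic regression (e.g., lme4’s glmer in R), indicated here only in a comment:

```python
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests

# Hypothesised predictors: task factors and user factors (hypothetical names).
predictors = (
    "C(task_type) + conf_self + conf_genai + conf_eval"
    " + reflect_overall_z + trust_genai_overall_z"
    " + C(gender) + C(age_group) + C(occupation_at_risk)"
)

# Six random-intercepts linear models (RQ2), one per Bloom category, with
# Participant ID as the random-intercept (grouping) term.
effort_dvs = ["effort_knowledge", "effort_comprehension", "effort_application",
              "effort_analysis", "effort_synthesis", "effort_evaluation"]
fits = {dv: smf.mixedlm(f"{dv} ~ {predictors}", data=df,
                        groups=df["participant_id"]).fit()
        for dv in effort_dvs}
# The seventh model (RQ1) is a mixed-effects logistic regression of the binary
# "perceived enaction of critical thinking" outcome on the same predictors.

# Pool the p-values of the hypothesised predictors across models and apply the
# Benjamini-Hochberg false discovery rate correction.
pvals = [p for fit in fits.values()
         for name, p in fit.pvalues.items()
         if name not in ("Intercept", "Group Var")]
reject, p_adjusted, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
```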
Table
4 summarises the seven models and reports the corrected p-values. For interpretability, we computed z-scores to standardise each numeric user factor (i.e., overall tendency to reflect, overall trust in GenAI). Thus, a positive coefficient implies an increase in the log odds (in the logistic regression model) or in the outcome value (in the linear regression models) for every one-standard-deviation increase in that factor; a negative coefficient implies the opposite. For the confidence scales (i.e., confidence in self, confidence in GenAI, confidence in evaluation), a positive coefficient is the increase in log odds/values for every one-point increase above the base score (1: not at all confident), and a negative coefficient implies the opposite. For categorical and binary factors (i.e., task type, gender, age group, occupation at risk of automation), the coefficient is the predicted difference in log odds/values for a given factor level relative to the baseline level: positive coefficients imply increased log odds/values relative to the reference level, and vice versa.
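For instance, the z-scoring and coefficient reading could look as follows (hypothetical column names, continuing the sketch above; the illustrative logistic coefficient is made up, while the linear-model example reuses the β reported in Section 5.1.1):

```python
import math

# Standardise the numeric user factors so coefficients are per-1-SD changes
# (hypothetical column names, assuming the same DataFrame `df` as above).
for col in ["reflect_overall", "trust_genai_overall"]:
    df[f"{col}_z"] = (df[col] - df[col].mean()) / df[col].std()

# Reading coefficients:
# - Logistic model (RQ1): a coefficient b on a standardised factor shifts the
#   log odds by b per SD of that factor; exp(b) is the odds ratio. A
#   hypothetical b = 0.30 would multiply the odds of perceived critical
#   thinking by exp(0.30), about 1.35, per SD:
odds_ratio = math.exp(0.30)
# - Linear models (RQ2): coefficients are in points on the -2..+2 effort scale,
#   e.g., the reported beta = -0.23 for confidence in GenAI on Evaluation
#   corresponds to ~0.23 points less perceived effort per one-point increase
#   in confidence (Section 5.1.1).
```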
3.3.3 Qualitative Analysis.
Guided by our research questions, we open-coded [
23] participants’ free-text responses on i) why they did or did not think critically when using a GenAI tool for the task, and ii) why they perceived more or less effort to perform critical thinking activities with the GenAI tool. One researcher performed the initial coding on 50 survey responses in discussion with three other researchers to iteratively construct a codebook. Once the initial codebook was constructed, another researcher joined the coding process and was trained on the codebook. The two researchers then coded the remaining 269 survey responses. All research team members met regularly to discuss emerging themes during the coding process. Disagreements were negotiated and resolved at each stage, using negotiated agreement best practices [
87]. We report our findings in Sections
4 and
5, and include the codebook in Appendix Table
5. We also report on how frequently participants discussed the identified themes.
5 Findings for RQ2: When and why do knowledge workers perceive increased/decreased effort for critical thinking due to GenAI?
To answer
RQ2, we report a descriptive analysis of participants’ perceived effort in cognitive activities associated with critical thinking, as defined by Bloom’s taxonomy (Section
5.1) — i.e., recall (Knowledge), organising/translating ideas (Comprehension), problem solving (Application), breaking down a problem (Analysis), putting together ideas (Synthesis), and evaluating and quality checking (Evaluation). We complement this with an analysis of participants’ free text elaborations on why they perceived an increase or decrease in effort due to GenAI, observing three qualitative shifts in critical thinking effort (Section
5.2).
A perceived reduction in effort when using GenAI may be due to participants 1) enacting the same “amount” of critical thinking but feeling supported by GenAI, 2) offloading the work of critical thinking to GenAI, 3) enacting “less” critical thinking overall, or 4) conflating a reduction in cognitive effort in general with a reduction in critical thinking effort specifically. We address each of these interpretations in context.
5.1 When knowledge workers perceive increased/decreased effort for critical thinking due to GenAI
In the majority of examples, knowledge workers perceive decreased effort for cognitive activities associated with critical thinking when using GenAI compared to not using it: examples reported as “much less effort” or “less effort” comprise 72% for Knowledge, 79% for Comprehension, 69% for Application, 72% for Analysis, 76% for Synthesis, and 55% for Evaluation (see Figure
2). Moreover, knowledge workers tend to perceive that GenAI reduces the effort for cognitive activities associated with critical thinking when they have
greater confidence in AI doing the tasks and possess
higher overall trust in GenAI (see Table
4).
5.1.1 Task Factors.
We found that knowledge workers’ confidence in AI doing the tasks correlated negatively with perceived effort for five of the six cognitive activities (all except Application). The higher the participant’s confidence in AI, the greater their perceived reduction in effort for Knowledge (β = −0.11, p = 0.029), Comprehension (β = −0.13, p = 0.014), Analysis (β = −0.15, p = 0.003), Synthesis (β = −0.12, p = 0.026), and Evaluation (β = −0.23, p < 0.001). Moreover, knowledge workers’ confidence in themselves doing the task correlated positively with perceived effort in Application (β = 0.08, p = 0.029) and Evaluation (β = 0.10, p = 0.027). We analyse participant rationales qualitatively and in more detail in the next section, but one explanation for why knowledge workers’ confidence in AI and in themselves had opposite effects on perceived effort is the following: GenAI tools can decrease knowledge workers’ cognitive load by automating a significant portion of their tasks, but the more confidence knowledge workers have in doing the task themselves, the more engaged they are in steering AI responses, especially when applying (Application) and evaluating (Evaluation) those responses.
These findings, along with our quantitative findings for
RQ1, reveal a connection between knowledge workers’ self-confidence and confidence in AI and their perceived critical thinking during GenAI tool use: 1)
a higher confidence in GenAI is associated with less critical thinking even though it is perceived as less effort to do so, and 2)
a higher self-confidence is associated with more critical thinking even though it is perceived as more effort to do so. We discuss this in more detail in Section
6.1.1.
5.1.2 User Factors.
In contrast to our findings about knowledge workers’ perceived enaction of critical thinking (see Section
4.2), we found no significant correlation between their overall tendency to reflect and perceived effort of critical thinking for any cognitive activities. This suggests that knowledge workers who do (or do not) tend to reflect on their work do not necessarily perceive a higher or lower effort of critical thinking with GenAI. However, knowledge workers’
overall trust in GenAI was negatively correlated with perceived effort for four of the six cognitive activities, i.e., higher trust in the technology is associated with less perceived effort for Knowledge (β = −0.12, p = 0.029), Application (β = −0.17, p = 0.002), Analysis (β = −0.12, p = 0.046), and Evaluation (β = −0.24, p < 0.001). Thus, knowledge workers with higher levels of trust in GenAI, generally or for specific tasks, perceive engaging in critical thinking activities to be less effortful. A possible explanation, supplemented with our qualitative analysis in RQ1 (see Section
4.3.2), is that trust and reliance on GenAI inhibit the enaction of critical thinking, i.e., users underinvest in critical thinking when using GenAI.
5.2 Why knowledge workers perceive increased/decreased effort for critical thinking due to GenAI
To understand why participants perceived an increase or decrease in the effort of critical thinking due to GenAI, we analysed the free-text responses in which they were asked to elaborate, mapping the responses onto the six cognitive activities.
We found that GenAI tools shift the effort of critical thinking in three distinct ways: for Knowledge and Comprehension, the effort shifts from information gathering to information verification; for Application, effort shifts from problem-solving to AI response integration; and for Analysis, Synthesis, and Evaluation, effort shifts from task execution to task stewardship.
5.2.1 Knowledge & Comprehension: From information gathering to information verification.
Efforts invested in Knowledge (e.g., retrieving relevant information) and Comprehension (understanding that information) often go hand in hand when using GenAI tools. In general, participants perceived less effort in retrieving and curating task-relevant information, because GenAI automates the process. However, they perceived more effort in verifying the information in the AI response.
Participants perceived less effort to fetch task-specific information at scale, and in real-time (111/319). For instance, P232 shared that her market research results through ChatGPT “are immediate and at a sufficient level of detail for me to get to grips with the basics of the industries. I would otherwise have to read a lot of press reports and subscribe to multiple newsletters.”
GenAI tools are perceived to organise and present information in a readable format (87/319). For example, P86 compared his experience of searching in a web browser with that in ChatGPT: “Research using Google is time-consuming; even clicking on a couple of websites takes more time than asking a single question to an LLM. Also, the LLM produces organized answers... the tools and techniques were categorized by type, and a dotted list was produced for each.” Participants also found it less effortful to re-structure and summarise information in GenAI tools. For example, P137 tried to update protocol documents to comply with a new standard: “I did not have to check the templates one by one... Questions I had related to the procedures were answered by the GenAI, and it helped me to know better this new standard.”
However, many participants shared examples when they perceived more effort in information retrieval because the AI response can be wrong and needs verification (56/319). For example, when a lawyer (P147) used ChatGPT to find relevant laws for a legal case, he noticed “AI tends to make up information to agree with whatever points you are trying to make, so it takes valuable time to manually verify.”
5.2.2 Application: From problem-solving to response integration.
GenAI can contextually apply knowledge to users’ specific questions and examples, reducing perceived effort for Application overall. However, users must instead spend effort integrating GenAI output, in form and content (as mentioned in Section
4.1.3).
Participants perceived less effort in problem-solving and question answering because GenAI tools provide personalised solutions to their problems (77/319). For example, P154 compared his experience in reviewing code with and without ChatGPT: “trying to understand how something works or understanding the problem is the main challenge. People have to “google” a lot. Find the correct information and then try to find people facing similar problems. That takes a lot of effort. GPT simply answers those very fast and easily and mostly correctly.”
With in-context learning, GenAI can also apply users’ examples to new contexts (9/319). For example, participants used GenAI tools to generate text guided by examples: “company has a set out list of possible scenarios and how we can address them, all I have to do is feed it to the AI, and it would generate a set response based on the data given” (P268).
Despite this capacity for contextual tailoring, participants still reported increased effort in applying the responses (19/319) to their tasks and to meet specific needs. For example, when P51 wrote a promotional blog post for their product launch, “the AI-generated content required substantial editing to align with specific marketing guidelines and tone preferences. This editing process could be time-consuming, particularly when ensuring that technical details were accurate and comprehensible to our target audience.” Additional application effort is incurred when knowledge workers integrate AI-generated content with content from other sources, or misjudge the extent to which GenAI output will be contextualised to their scenario. As P36 noted, “the extra effort in determining that the code generated matched my existing code, and making subsequent alterations to make it fit was more effort than just doing it myself in the end.”
5.2.3 Analysis, Synthesis, and Evaluation: From task execution to task stewardship.
Participants perceived these activities, overall, to require less effort due to GenAI tools. Specifically, GenAI helps knowledge workers scaffold complicated tasks and information; it helps knowledge workers automate artefact creation; and it helps form feedback cycles that knowledge workers otherwise do not have access to. Nevertheless, knowledge workers perceived increased effort spent on AI stewardship — translating intentions into queries, steering AI responses, and assessing if the AI response meets their quality standards for work, while retaining accountability for the work.
Analysis. Participants reported reduced effort when GenAI tools helped to scaffold complicated tasks and information (48/319). For instance, P203 used ChatGPT to write a complex Slack message to an unfamiliar colleague, and “GenAI broke down the problem.” This helped her think Analytically, to derive criteria such as to “make sure the message structure is to the point and understandable to someone who doesn’t have the same background knowledge” as well as “ensure that I am not missing elements or being confusing with examples.”
However, GenAI tools also require users to
articulate their needs and translate intentions into a query (45/319), which was perceived to increase Analysis effort. As mentioned in Section
4.1.1, revising queries is a critical thinking activity specific to GenAI use. P24 described several phases of image generation prompting, saying
“Image generation requires more effort for everything except the actual image generation. I have to think of what I want to be drawn, then on how the AI wants it described, then correct it when it makes wacky outputs.”

Synthesis. Participants perceived less effort when GenAI automates the creation process (129/319), such as drafting documents, responding to emails, or generating code.
However, participants noted that the reduced effort in Synthesis could lead to less critical engagement with the task. For instance, P131, when generating advertising campaigns for her business, remarked having “to read what ChatGPT generates and make sure that it’s what I want, but not to [let it] think the whole idea.” Moreover, participants perceived it to be more effort to constantly steer AI responses (48/319), which incurs additional Synthesis effort due to the cost of developing explicit steering prompts. For example, P110 tried to use Copilot to learn a subject more deeply, but realised: “its answers are prone to several [diversions] along the way. I need to constantly make sure the AI is following along the correct ‘thought process’, as inconsistencies evolve and amplify as I keep interacting with the AI.”
Evaluation. Finally, critical thinking is perceived to be less effort because GenAI tools provide personalised feedback loops for tasks (40/319) that users otherwise do not have access to. For example, to edit text P313 said he previously “would often go through multiple rounds of checks by others [humans] for feedback”, but with GenAI could do so “on my own time” by asking the “AI to do alternate versions, and compare what I like and don’t”.
In certain cases where GenAI is perceived to have a strength relative to the user’s own capability (e.g., spelling or grammar in a non-native language), GenAI responses are perceived to contain few mistakes (19/319). Thus, participants perceived reduced effort for Evaluation, as P239 noted: “I can be confident that everything is spelt correctly, I don’t need to second guess myself... I can get the reassurance I need without having to bother another person to check it for me.”
Those cases notwithstanding, as noted in Section
4.1.2, participants needed to
evaluate AI-generated content (42/319) against several objective and subjective criteria, and reported increased effort in doing so.
6 Discussion
6.1 Implications for Designing GenAI Tools That Support Critical Thinking
6.1.1 Self-Confidence and Task Confidence.
Task confidence appears to significantly influence knowledge workers’ perceived enaction of critical thinking and the effort they invest in it. Specifically, a user’s confidence in GenAI is predictive of the extent to which critical thinking is exercised in GenAI-assisted tasks. Both our quantitative and qualitative results suggest that higher confidence in GenAI is associated with less critical thinking, as GenAI tools appear to reduce the perceived effort required for critical thinking tasks among knowledge workers. Conversely, with the important caveat that users’ self-confidence is a subjective measure of their knowledge, experiences, and abilities on the tasks [
20,
59,
85], higher self-confidence is associated with more critical thinking, even though workers who are confident in their own skills tend to perceive greater effort in these tasks, particularly when evaluating and applying AI responses.
Our analysis does not establish causation. However, based on our evidence, it is possible that fostering workers’ domain expertise and associated self-confidence may result in improved critical thinking when using GenAI. Task confidence significantly influences how users engage with AI tools, particularly in the context of human-AI “collaboration” (notwithstanding objections to that term [
113]). Previous frameworks have categorised human-AI collaborations by how often the user or the AI initiates an action [
95], and which entity takes on a “supervisory” role [
88]. Our findings shed light on this issue in the context of GenAI-assisted knowledge work. High task confidence is associated with users’ ability to delegate tasks effectively, fostering better stewardship while maintaining accountability. Conversely, lower self-confidence may lead users to rely more on AI, potentially diminishing their critical engagement and independent problem-solving skills. This reliance on AI can be seen as a form of cognitive offloading [
8], where users depend on AI to perform tasks they feel less confident in handling themselves.
Confidence in AI is associated with reduced critical thinking effort, while self-confidence is associated with increased critical thinking effort. This duality indicates that design strategies should focus on balancing these aspects: the aims are both to improve the quality of AI-assisted tasks and to empower users to develop their skills and maintain a balanced “relationship” with AI. To help recalibrate task confidence, AI tools could incorporate feedback mechanisms that help users gauge the reliability of AI outputs, indicating when to trust the AI and when to apply their own critical thinking skills. This aligns with the goals of explainable AI [
33]. Moreover, the user should remain responsible and accountable for the outcome. AI tools must support users in actively and critically customising and refining AI-generated content. Tools may incorporate explicit controls for users to regulate the extent of AI assistance, depending on their confidence levels and the task’s complexity.
6.1.2 Awareness, Motivation, and Execution of Critical Thinking.
Our study identifies key motivators for and inhibitors of critical thinking among knowledge workers using GenAI. The design implications are clear: critical thinking interventions for GenAI tools should aim to enhance and leverage motivators while mitigating and avoiding inhibitors.
One design approach is to enhance
awareness of critical thinking opportunities. Our findings indicate that knowledge workers tend to forgo critical thinking for tasks perceived as unimportant or secondary, while engaging in it when aiming to improve task quality or avoid negative outcomes. This suggests a need for both proactive and reactive critical thinking interventions. Proactive systems take the initiative [
52] to interrupt the user to highlight the need and opportunity for critical thinking in situations where it is likely to be overlooked; a reactive approach would allow the user to explicitly request critical thinking assistance when it is consciously needed.
Another approach is to increase the
motivation to think critically. Our study reveals that knowledge workers often neglect critical thinking when they perceive it as outside their job scope, but engage in it when aiming to improve their professional skills. Thus, critical thinking interventions for GenAI tools could be positioned as contributing to long-term skill development and professional growth, as opposed to an extraneous “co-auditing” [
46] task that is only relevant on a task-by-task basis.
Finally, design could aim to enhance the
ability to execute critical thinking. We find that knowledge workers often refrain from critical thinking when they lack the skills to inspect, improve, and guide AI-generated responses. GenAI tools could incorporate features that facilitate user learning, such as providing explanations of AI reasoning, suggesting areas for user refinement, or offering guided critiques. The tool could help develop specific critical thinking skills, such as analysing arguments [
72], or cross-referencing facts against authoritative sources. This would align with the motivation-enhancing approach of positioning AI as a partner in skill development.
6.2 Shifts in Critical Thinking Due to Generative AI
Critical thinking in knowledge work involves a range of cognitive activities, such as analysis, synthesis, and evaluation. We observed that the use of GenAI tools shifts the knowledge workers’ perceived critical thinking effort in three ways. Specifically, for recall and comprehension, the focus shifts from information gathering to information verification. For application, the emphasis shifts from problem-solving to AI response integration. Lastly, for analysis, synthesis, and evaluation, effort shifts from task execution to task stewardship.
The use of GenAI in knowledge work creates new cognitive tasks for knowledge workers. The task of response integration is a prime example. Knowledge workers must assess AI-generated content to determine its relevance and applicability to their specific tasks, often modifying the style and tone to align with the intended purpose and audience.
Conversely, some cognitive tasks become less necessary due to GenAI. For instance, information gathering has been significantly reduced. GenAI tools automate the process of fetching and curating task-relevant information, making it less effortful for knowledge workers. As a result, the cognitive load associated with searching for and compiling information has decreased.
Some cognitive tasks remain, but have evolved in nature due to GenAI. One such task is information verification: cross-referencing AI-generated outputs with external sources and one’s own expertise to ensure accuracy and reliability. Workers have always needed to verify the information they work with, but as a tool, GenAI has its own particular strengths and failure modes when it comes to correctness, accuracy, and bias.
With GenAI, knowledge workers also shift from task execution to oversight, requiring them to guide and monitor AI to produce high-quality outputs, a role we describe as “stewardship”. It is not that execution has disappeared altogether, nor is having high-level oversight of a task an entirely new cognitive role, but there is a shift from the former to the latter. Unlike in human-human collaboration, in a human-AI “collaboration” the responsibility and accountability for the work still reside with the human user even though the labour of material production is delegated to the GenAI tool, which makes stewardship strike us as a more appropriate metaphor for the human user’s role than teammate, collaborator, or supervisor.
In light of these changes, training knowledge workers to think critically when working with GenAI should focus on developing skills in information verification, response integration, and task stewardship. Training programs should emphasise the importance of cross-referencing AI outputs, assessing the relevance and applicability of AI-generated content, and continuously refining and guiding AI processes. Additionally, a focus on maintaining foundational skills in information gathering and problem-solving would help workers avoid becoming overreliant on AI [
102].
6.3 Limitations
Our study has limitations that warrant consideration and offer avenues for future research. Firstly, we observed that participants occasionally conflated reduced effort in using GenAI with reduced effort in critical thinking with GenAI. This misconception may stem from the infrequent contemplation of critical thinking in their daily tasks (regardless of whether they use GenAI), potentially leading to inaccurate self-reporting. This conflation often occurred when participants were satisfied with AI-generated responses, suggesting that when AI produces expected outcomes, users may engage in less critical evaluation. Future studies could employ alternative measures of critical thinking, such as think-aloud protocols or task-based assessments, to better differentiate between effort reduction and critical thinking processes.
Secondly, we assess users’ subjective task confidence following prior work on AI-assisted decision-making [
20,
59,
85]. Still, one’s subjective self-confidence may not always be well-calibrated with respect to objective expertise on tasks [
39,
130]. Future work should explore this subjective/objective distinction in the context of critical thinking with GenAI in knowledge work.
Thirdly, our survey was conducted exclusively in English, with participants required to be fluent English speakers. This approach ensured consistency in data collection and feasibility of analysis by our English-speaking research team, but it leaves non-English-speaking populations and multilingual contexts unrepresented. Future research could explore cross-linguistic and cross-cultural perspectives on GenAI usage and critical thinking.
Fourthly, our sample was biased towards younger, more technologically skilled participants who regularly use GenAI tools at work at least once per week. This demographic skew may not fully represent the broader population of knowledge workers, potentially overlooking the experiences and perceptions of older or less tech-oriented professionals.
Lastly, GenAI tools are constantly evolving, and the ways in which knowledge workers interact with these technologies are likely to change over time. We adopted the task taxonomy of Brachman et al. [13] to capture relatively stable and coarse-grained characteristics of tasks without overcomplicating our explanatory models. Future work with different goals can expand our measures with more detailed categorisations and/or task-specific measurements (e.g., task difficulty and skill). To that end, our study provides a valuable baseline for understanding critical thinking in the context of current GenAI tools. In future work, longitudinal studies tracking changes in AI usage patterns and their impact on critical thinking processes would be beneficial. Additionally, developers of GenAI tools could deploy telemetry, within-tool surveys, or experience sampling to gain more insight into how specific tools can evolve to better support critical thinking across different tasks.