Summary.
Large language models are evolving from answer engines into conversational partners that shape decisions by asking their own questions. Research comparing more than 1,600 executives with 13 leading models finds that AI systems favor markedly different question types than executives do, and that no two models share the same questioning profile. Leaders who understand these differences can guide AI inquiry rather than be steered by it.

Competition is fierce among the companies offering gen AI tools based on large language models (LLMs). Tools like ChatGPT, Claude, and DeepSeek release new functionalities on an almost monthly basis. Many of these AI-based systems, all of which use a language-based model that enables them to interact conversationally with users, are evolving from mere “answer providers” into “conversational agents” that ask questions to generate more useful answers. One example of this phenomenon is WaLLM, a chatbot developed to function via WhatsApp. It goes beyond answering simple queries by offering follow-up questions, lists of trending and recent queries, and even a “Top Question of the Day.” Another example is OpenAI’s DeepResearch function, which, like Singapore-based Manus, is designed to ask users questions. When user requests are ambiguous, rather than simply assuming one specific interpretation, systems now often ask a clarifying question to reduce the risk of creating frustration or making mistakes.
These systems thus engage in dialogue like consultants, colleagues, or study partners. Asking questions helps the models collect the information they need to give more appropriate answers and recommendations, an approach that mimics that of effective leaders, who understand that smarter questions lead to better decisions.
But while the semantic structure of LLMs’ questions has been studied, little is known about the types of questions that these systems ask. That creates a liability for business leaders, since the questions we use in our decision making influence the information we gather and thus the decision outcomes. It is therefore critical that we understand whether the queries of LLMs differ from those asked by executives, and also whether they differ from one model to the next, which would directly affect the user’s decision—albeit surreptitiously.
Furthermore, as “agentic” features rapidly evolve, the criticality of questions isn’t limited to the interaction between user and model. Agentic AI involves semi-autonomous systems that act as if they are reasoning; they can draft plans to execute pre-defined tasks, evaluate the outcome of their tasks, make goal-directed decisions, and take actions to achieve those goals. They achieve these tasks by asking themselves questions. Therefore, understanding the models’ questioning styles can help business leaders interpret and evaluate the recommendations they receive.
This article provides evidence that LLMs use different types of questions than humans and that no consistent pattern exists across the models we tested. Therefore, we recommend that managers mindfully adapt their use of these systems, especially in a decision-making context, and we propose avenues to help them do so.
What question types do managers use to assist their decision making?
In the quest to gather relevant information and make better decisions, asking good questions has always been central, and we know that the types of questions that executives ask guide the decisions they make. In our research, we have identified five types of questions that support decision making, which we call the Leaders’ Question Mix (LQM).
- Investigative questions (“What’s known?”) clarify root causes and potential solutions. They help decision makers better understand the problem or the options on the table, using questions such as: What are the root causes of the problem? Or, how feasible is this option?
- Speculative questions (“What if?”) explore alternatives and creative scenarios. They challenge assumptions and ask whether things could be done differently or whether something else might be considered. Using questions such as: “What is a different way to look at the problem?” or “What is our plan B?” spurs innovation and broadens thinking.
- Productive questions (“Now what?”) help adjust the decision-making process. They ask about plans, resources, timing, and readiness. For example: Do we have what we need to proceed? Are we ready to decide? These help turn ideas into execution.
- Interpretive questions (“So what?”) derive insights from the information obtained from the analysis. They encourage reflection on implications: What did we learn? How does this fit with our goals?
- Subjective questions (“What’s unsaid?”) acknowledge the emotional or political elements in the decision-making team—or the broader environment. Questions such as “What aspects of the decision are you the most concerned about?” or “Who will be against us if we go forward?” surface the often unspoken interpersonal dynamics and emotional judgments in decisions.
Unlike physicians, therapists, journalists, or lawyers, business leaders are typically not taught to ask questions. They use their judgment, and may over time develop routines or “go-to” question types. Gravitating more toward some types of questions over others may make them reach different conclusions even when confronting the same situation. If a leader consistently fails to ask questions of a specific type, they might develop blind spots. We argue that effective leaders are good at “disengaging their autopilot.” They adapt their question mix to the specific decision they face rather than defaulting to whichever feels most comfortable to them.
We developed the LQM test to help leaders reflect on the types of questions they naturally prefer to use. We created and pre-tested a list of typical questions of each type that are useful during decision making, then presented pairs of different types to respondents, asking them to rate their preference for one or the other on a -3 to +3 scale. This process was repeated 10 times, so each question type was compared against the other four. We transformed the raw data into percentages that sum to 100%, indicating the manager’s preferred “question mix.” Scores range from 0% (least preferred) to 40% (most preferred).
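The scoring just described can be sketched computationally. The exact point scheme is our assumption, not something the article specifies: suppose each −3 to +3 rating splits six points between the two types in a pair. With five types there are 10 pairs, each type appears in four of them, and a type can therefore earn at most 24 of the 60 points awarded overall, which reproduces the 0% to 40% range mentioned above.

```python
# Hypothetical scoring sketch for a pairwise-comparison test like the LQM.
# Assumption (not from the article): a rating of -3 means "strongly prefer
# the first type in the pair," +3 means "strongly prefer the second," and
# each rating splits 6 points between the two types.
from itertools import combinations

TYPES = ["investigative", "speculative", "productive", "interpretive", "subjective"]

def question_mix(ratings):
    """ratings: dict mapping each (first, second) type pair to an int in [-3, 3]."""
    points = {t: 0 for t in TYPES}
    for (a, b), r in ratings.items():
        points[a] += 3 - r   # negative rating -> more points to the first type
        points[b] += 3 + r   # positive rating -> more points to the second type
    total = sum(points.values())  # always 60 for the 10 pairwise ratings
    return {t: round(100 * p / total, 1) for t, p in points.items()}

# Example: a respondent who is neutral everywhere except a strong
# preference for interpretive questions in every pair containing them.
ratings = {pair: 0 for pair in combinations(TYPES, 2)}
for pair in ratings:
    if "interpretive" in pair:
        ratings[pair] = 3 if pair[1] == "interpretive" else -3

mix = question_mix(ratings)  # interpretive hits the 40% ceiling; the rest split evenly
```

Under this scheme a maximally one-sided respondent tops out at 40% on their favored type, while the remaining four types share the rest of the mix.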
Our research involved administering the LQM test to more than 1,600 executives. On average, executives tend to distribute their inquiries relatively evenly across the five domains: approximately 17% to 22% of their questions fall into each category. These aggregated results provide a reference for the relative usage of different question types by humans and serve as a benchmark for evaluating how AI chatbots formulate questions in a decision-making context.
AI and leaders ask different types of questions
Are AI systems asking the same types of questions in the same proportions as experienced executives? To explore this, we selected 13 widely used LLMs from seven providers (Anthropic, OpenAI, Google, DeepSeek, Mistral, Perplexity, and xAI). We ran the same routine with each LLM as we did for executives, asking each system to take the test 200 times. We compared these results with those of a set of more than 1,600 executives who each took the self-assessment once, and we detected several patterns in these results.
First, LLMs and managers use different question mixes. The question mix of every single LLM we tested differed statistically from that of executives, in three or more question types. Whereas humans balanced relatively uniformly across all five types, LLMs showed much more variation, skewing heavily toward some categories and underrepresenting others. As a group, executives assigned as few as 17.8 points to their least-preferred type (subjective) and as many as 21.8 to their most-preferred (interpretive), a spread of about 4 points between these two extremes. LLMs, however, often had much wider spreads, as high as 23.1 points for Gemini 2.5, which heavily favored interpretive questions over productive ones.
Second, LLMs consistently differ on interpretive and productive questions. Ten out of the 13 models assigned more points than executives did to interpretive questions. And all 13 models asked fewer productive questions than executives did.
Third, models differ widely from one another. We tested each possible pair of models to evaluate how different they were from one another. All 78 different pairs were statistically different, even when these pairs were evolutions of the same model (for example: Grok 3 and Grok 4).
Fourth, LLMs are generally more consistent than managers, but not always. In most cases, the variances in an LLM’s mix were smaller than those of managers, but two LLMs (Gemini 2.5 Pro and Grok 4) exhibited higher variance across several question types.
Why these differences matter
These differences in questioning behavior have practical implications. As LLM models take on greater roles in brainstorming, problem solving, and decision support, imbalanced question mixes may translate to blind spots. Indeed, since each category of questions elicits reflection upon different aspect of the decision, if a model overlooks one aspect, it might never come up in the process. Sometimes executives don’t realize a crucial question was missing until it’s too late. Consider four cases.
First, a model that seldom asks productive questions, like Gemini 2.5 Pro or ChatGPT 5, might generate ideas but leave a team blind to important aspects of appropriate pacing, resource allocation, and implementation how-to’s. In fast-paced environments, this could lead to slower-than-optimal decision making. The result? Poor time allocation in the decision process (deciding too slowly or, possibly, too quickly), which is a recurring pitfall in managerial settings that now gets exacerbated if the AI advisor isn’t helping the decision makers adapt their decision process.
Second, an AI that under-utilizes subjective questions, like Sonar or ChatGPT 5, may overlook the human element. It might rarely ask “How do the stakeholders feel about this change?” or “Have we considered sufficient input from those who will be most affected?” Such omissions could lead to tone-deaf recommendations. For example, a system might enthusiastically suggest a cost-cutting initiative that fails to surface employee morale or unspoken objections. Human leaders often instinctively gauge the mood and alignment of their teams. Overrelying on an AI partner, a manager might miss the weak signals that would once have prevented her from pursuing a decision that looks good on paper but is doomed by affective factors.
Third, the over-emphasis on investigative and interpretive questions by many LLMs can also be problematic. Yes, digging deep into data and its meaning is valuable, but an LLM that does this too much could bog down a discussion in analysis or rehash known facts. Experienced human facilitators know when to switch gears from analysis to action. Without moderation skills, LLMs keep asking interpretive questions (“what does this mean?”) even when the team would be better off moving forward in the decision process.
Fourth, variability among LLMs means that the choice of an AI partner could skew a team’s perspective. If an organization primarily uses one LLM that, say, rarely asks speculative questions (“Could we do this differently?”), the organization might miss out on innovative alternatives that no one, human or machine, ever brought up. On the flip side, an LLM that asks too many clarifying (i.e., investigative) questions can frustrate users (some have reported AI models getting stuck continually seeking confirmation before proceeding).
Effective inquiry requires a diverse, balanced mix of questions—something human leaders strive for and AI needs to learn. In short, a system heavily tilted toward one question type can surreptitiously steer conversations onto a single track, and this becomes a liability for the decision process.
Guiding AI’s inquiry: actions for business leaders
When LLM systems start asking questions, they become seductive partners. For managers, answering inconspicuous questions is probably easier than thinking hard about how to write the most precise prompt possible. The temptation is high to relax and feel that things are being taken care of (e.g., “anyway, the LLM will ask clarifying questions”). One feels like all one has to do is to accept the guidance and follow the proposed menu.
But here is the catch: LLMs do not just suggest a possible menu, they shape it. Business leaders cannot assume that an “inquiry-driven” AI will cover all the important questions or the same questions that a human would. It takes human guidance to transform AI questioning into strategic thinking gains rather than misdirection or analysis paralysis. We propose several concrete actions for leaders to better prepare for, guide, and align LLM-assisted questioning with sound decision making.
For the specific decision you face, resist the tyranny of convenience. What does your decision need? Identify whether you are better served by partnering with an LLM or keeping the process fully human. If the former, identify what kind of LLM will be most useful: one that asks questions or one that only offers suggestions.
Mindfully choose the system you partner with. Our research shows that each LLM has its own question mix. In fact, each version has its own. Be mindful of the system you use and of its questioning profile, so that you can select one that serves you best for the decision you face. Consider not going with just one but with different ones to help you triangulate on a better question mix for your decision. This is especially salient if you’re using a company-specific system, as it may exacerbate the risk of engaging in groupthink or perpetuating biases.
Keep control. This means critically evaluating the LLM’s outputs. LLMs speak with assurance, even when it is unwarranted. A model can trick you into thinking its work is of high quality when it is not, and it can use sycophantic statements that reinforce your confirmation biases. It is often easier to accept competent-seeming suggestions laced with phrases like “this is a great point, I’m glad you mentioned it” (which make users perceive a real person behind the “I”) than to do the hard work ourselves. But managers are first and foremost stewards of their organizations and, by extension, of the decisions they make. In that role, what matters isn’t doing what feels comfortable, but what is necessary.
Test the suggestions you get. Ask yourself “Must I believe this?” (i.e., look for opposing evidence) rather than “Can I believe this?” (look for supporting evidence). Additional ways of keeping control include regularly assessing your strengths and weaknesses against your LLM partner(s). Where is it better than me? Where am I better?
Pressure-test your decisions. Leaders hardly ever have an abundance of time to make their decisions. As a result, they are often compelled to front-load the process with as much analysis as possible and delay the decision until the last possible moment, an approach meant to reduce uncertainty to the fullest extent.
An alternative is to budget your team’s time differently, allotting a moment after the decision point but before execution to conduct a pre-mortem. For example: “Assume we went with option X. It’s now six months later and the result is a disaster. Which questions didn’t we ask?” This future-back approach might help you and your team identify blind spots in your process that neither you nor the LLM assistant would spot in a different setting.
. . .
In the emerging era of inquiry-driven AI, the quality of the questions we get from LLMs may be as important as the quality of their answers. Forward-thinking leaders will treat this not as a quirk of technology but as an invitation to elevate how they decide. By understanding and guiding the way AI systems ask questions, and staying in the lead when we solicit questions from AI systems, we can benefit from their different perspective while compensating for their blind spots. This promises a new kind of partnership, one in which human wisdom and AI insights combine to ask and answer the questions that promote better decision making.
As the old saying (often attributed to Peter Drucker) goes, “The most serious mistakes are not the ones we make by giving the wrong answers, but by asking the wrong questions.” In the age of AI, aiming to ask better questions remains firmly in human hands, even when a system is voicing them.