Generative AI-driven features are now expected on digital platforms of every scale, across teams with both small and large budgets. The challenge product teams face is no longer access to powerful models, but the ability to translate that capability into consistent, valuable experiences.
This paper introduces the AI Value Generation Flywheel, a framework for designing AI-enabled experiences that earn trust through effort, build momentum through contextual awareness, and sustain engagement through the continual delivery of value. It encourages product teams to move beyond optimization for accuracy and to design for re-engagement. To move AI beyond novelty, the flywheel emphasizes intent recognition, differentiated value delivery, and the strategic use of system augmentation.
Each stage of the flywheel is explored in depth, from the imperative of determining user context to the importance of responding with the appropriate mode of output. In addition, this paper offers practical guidance for mitigating overfit, enabling chance discovery, managing ambiguity, and building toward self-sustaining engagement. The result is not just a set of design principles, but an experience strategy for teams aiming to move beyond AI integration and toward AI differentiation.
For the last two years, consumers have witnessed an “AI scramble” as companies have rushed to integrate “AI” into their products, primarily in the form of LLM-driven generative tools and assistants.
2025 marks the end of the AI scramble. OpenAI and Anthropic allow nearly anyone to leverage the inference capabilities of their models. Platforms from a number of providers make it possible for individuals or corporations to prompt-engineer their way to custom AI solutions, even with little to no programming knowledge. Perhaps most important is that model access and related tooling is affordable enough that nearly anyone with the seed of an idea can build a working prototype that has the same underlying capability as an enterprise solution.
There is no question that the capability of AI will continue to advance. Sure, there are conversations about scarcity of data necessary to improve LLMs, but that assumes LLMs are the only path forward. We are still in the early days of development, and the industry is still deciding whether LLMs are the only, or best solution. We are in the mainframe-equivalent era of AI when compared to the dawn of personal computing (see: Benedict Evans, The New Gatekeepers, https://www.ben-evans.com). The majority of capability is held by a handful of behemoths, who offer timeshare access to their models. While that centralization may change in time, what matters most right now is how we use the available tools to consistently achieve desired outcomes, and leverage AI to deliver real, measurable value to end users.
As we move past the initial hype cycle and barriers to entry continue to decline, products will no longer achieve relevance merely by incorporating vague AI capabilities. Instead, AI will become a standard, expected feature, driving relevance through the tangible value it provides to users, and the degree to which it reduces the effort required to achieve a particular outcome. Yet, AI features will be judged with greater scrutiny than other, more traditionally crafted features. Users will assess their value after every interaction to ensure that individual errors don’t quickly compound. They will consider whether any unpredictability or inaccuracy is outweighed by other benefits, including efficiency, ease of use, and reduction of effort.
Ultimately, every single interaction with an AI product will force the user to question whether the feature is truly useful, or merely a novelty, and the answer to that question will determine whether that user returns. A good experience may delay the question being asked again, but it soon will be. This scrutiny will be consistent and relentless, and even a shred of doubt can blossom into a decision not to continue. A decision that once made, will be extremely hard for the product to reverse.
This human scrutiny is even more critical when considered in the broader context of AI’s intended goal: Delivering outcomes so effective that they reduce the need for oversight and genuinely replace human effort. And in that future, consistency will be the clearest measure of value.
In their first wave, generative AI product integrations have followed a fairly uniform approach to communicating capability. This is likely because these integrations have also followed a fairly uniform approach to interaction: an open text field into which the user is expected to input a prompt or a question. This approach resembles a modern-day form of “surprise and delight” UX, where sample prompts stating “simply ask about…” encourage users to put in anything and marvel at the results.
This approach is effectively a hedge on the part of the product. A sample prompt makes no specific promises about how the resulting response will provide tangible value to a user’s individual workflow. This lack of specificity effectively buys the product a “free” engagement with the user for them to trial the output of any supported prompt. Prior to the resulting response, they likely have no preconceived notion as to whether it will be effective. In the case that the AI falls short, the user is prepared to have grace and try again. In the case that the AI succeeds, it has, through lack of specificity, exceeded the user’s expectations. In this way, each interaction with the AI can be compared to a health meter in video games. The more successful responses, the greater the acceptance for failure. Too many successive failures, and the desire to interact runs out.
As AI continues to mature, asking users to first imagine a feature’s capabilities before trying it will become increasingly problematic. Rather than allowing space for failure, users will set their expectations at the sky, set by the ever increasing number of highly-capable AI tools they have interacted with. We will reach this point by the end of 2025. A poor or low-value response, even if it is the user’s first attempt, will be enough to convince the user that the feature is incapable. Another risk, which is arguably worse, is an output that the user determines is “fine.” Such an outcome lacks the ability to create excitement for repeat use, and in the best case drives periodic engagement, and likely for low-stakes tasks. Put in business terms, the user is costing the product inference credits for activity that does not drive advocacy.
The potential outcome in all of these cases is that the user determines the AI feature to be unreliable or inadequate. Even worse, they could extend this determination to the product as a whole. In most cases, though, users will allow for some degree of inaccuracy with AI tools. While this expectation now feels universal, it cannot be overstated how unique it is to AI, and the allowance it gives to experimental functionality.
Regardless, users expect that AI will make every attempt toward accuracy, so the key is to clearly acknowledge that AI can be incorrect, and design the entry point of the experience to demonstrate the functionality that the user should expect.
This approach prevents the user from setting their expectations at the moon only to have them dashed, and instead creates an opportunity for the AI to meet or even exceed expectations.
Each interaction with an AI-driven product resets the user’s expectations. A succession of positive experiences will create some allowance for negative ones, however, enough negative experiences will turn a user off to a product no matter how good their past experience was.
I will cover how this framework for assessment scales to multi-prompt and non-chat experiences, but because chat is an approachable and now familiar form of generative AI, let’s begin there.
In a chat experience, every system response is high-stakes.
An effective answer to the user’s prompt has the potential to drive retention or even advocacy for the AI-driven feature and the surrounding product. A poor answer will invite another attempt at best, but at worst, it can lead to permanent abandonment of both.
This make-or-break approach to experience design likely stems from the “black box” nature of LLMs and the frequently high degree of effort required to craft a prompt or subsequent clarification that attempts to clearly convey one’s intent to the model. Unlike more traditional UX flows, there is frequently not the same ability to undo a decision, or revert to a previous state. Some AI experiences allow for editing or removing a prompt, but a second attempt is no guarantee of success.
So, how do we confidently design an experience given such high stakes, and with no guarantee that every potential output will align to the user’s expectations?
Despite the constant ambiguity of LLMs, the solution for creating an effective experience is no different than any other UX feature: Focus intently on the people using the tool.
If you understand their pain points, their needs, and their motivations at each interaction with the AI, you can augment the system’s output with clear signals that the system is making every effort to provide value. These signals form an implicit dialog with the user that instills confidence, builds trust, and ultimately gives your product the capital it needs when the AI falls short. The experience can effectively refill the health meter by demonstrating effort and ambition to be accurate, even when it is not.
Effort, in this context, is expressed through strong understanding of the unique nature of an AI-enabled user journey, and recognition that both the inputs and outputs must consider the user’s unique point of view, their overarching goal, their immediate need, and their definition of successfully meeting that need. Effort on the part of the product will strengthen user confidence, and user confidence will naturally encourage repeat use.
How can design express effort without cluttering the experience with extraneous elements? By leveraging the three key principles necessary for every AI-enabled tool:
An AI-driven experience is not an end-to-end journey. It is circular. Every instance of user input and machine output is more than just a simple back-and-forth. Instead, it operates as a complete flywheel composed of multiple stages that sustain user attention and satisfaction, and drive continuous engagement through the belief that they can derive limitless value from the experience.
Much like a mechanical flywheel, building initial momentum requires a great deal of energy. Once moving, though, the flywheel can be somewhat self-sustaining, assuming all of the component parts remain effective. If any of the flywheel’s stages fail, momentum will stall, the user will abandon the experience, and restarting will again require a great deal of energy.
Long-term, the goal should be that the flywheel is entirely self-sustaining, maybe even transcending the laws of physics to become a perpetual motion machine. As the accuracy and impact of the AI improves, and context-driven augmentations refine its output, the value delivery will increase, even with fewer, or more abstract inputs. The ultimate result? Less need for individuals to provide detailed oversight and interaction.
This flywheel-driven approach ensures that every interaction contributes to building user trust and perceived value, encouraging continuous engagement with the tool’s output. The “energy” required to set the flywheel in motion is generated through the experience’s deep focus on how to maximize the impact and effectiveness of each stage given what is known about the capabilities of the system and the intent of the user.
There are three instances where this focus will be most impactful:
For new users, and in the early stages of an AI tool, the core AI-driven experience can, and should, be augmented with elements that supplement the AI technology. These elements might not even be driven by AI, and, given the current latency of LLM responses, shouldn’t be. For instance, before a user even inputs their first prompt, the experience should provide clear messaging that describes what the tool can and cannot do, as well as opportunities for getting started. Suggested prompts are an expression of confidence from the system, as are descriptions of supported use-cases. Examples of previously successful outputs can also signal the system’s aptitude and set user expectations accordingly. They may even generate excitement or anticipation for what opportunities the system can provide.
A user who has previously abandoned the experience will require a uniquely different welcome upon their return. An articulation of the tool’s advancement would be table stakes, while specific acknowledgement of the previous shortcoming can work to place the user at the center of the experience and assure them that the product is focused on generating positive outcomes for their individual need.
Once the user takes the leap and inputs their first prompt, every effort must be made to ensure an accurate interpretation of their ask, and to deliver an effective response that meets their expectations, and not only in terms of specific detail. Do they expect speed? Do they expect articulation? Do they expect to converse over the details of their ask? The system must make a judgement call based on what it understands about the user, because in that initial interaction, the outcome the system is really optimizing for is re-engagement.
The AI Value Generation Flywheel exists to provide the structure that ensures re-engagement. Each stage is designed to support the user journey, build trust, and create compounding value, whether the user is brand new, returning after a failed attempt, or already invested.
The AI Value Generation Flywheel comprises five key stages:
Understanding user context is paramount to delivering value with AI. For this reason, it is always the entry point to the flywheel, and feeds all of the following stages. In this case, context isn’t just the establishing prompt fed to the model. It is the complete understanding of the user’s background, expectations, and desires…to the degree that we can capture and ultimately model it.
To begin understanding the user, we must start by asking:
These are by no means the only questions, and, while we can’t expect to answer all of them initially, they form a matrix that we should strive to fill as quickly and as thoroughly as possible. Some information will come via traditional analytics. Other insight about the user may be teased out through clever design. For example, the complexity (or lack thereof) of the first prompt may point to the user’s faith in the feature. Subsequent “clarifying questions” may be asked following the user’s first input that aim to understand them more than their question.
The system’s knowledge of who the user is, what the user knows, and what they wish to know is currency which is used to “buy” quality from the AI, and must be routinely gathered and fed back into the system. Without it, the subsequent stages of the flywheel will be forced to “guess,” creating friction, and reducing the momentum.
The next stage of the flywheel is often perceived by users as the first, since it marks their first material interaction with the system. There are many ways to approach the demonstration of value in an AI tool, but the “just ask...” model popularized by ChatGPT is often insufficient for most solutions. The reason it works for ChatGPT is that, for many users, the initial demonstration of value actually happened outside the product. The promise of its capabilities was communicated in the press, amplified by testimonials, and widely validated before users ever engaged with the system. Expectations were shaped by those examples, which built trust, curiosity, and even a degree of forgiveness for failure.
Over time, the demonstration of value provided by ChatGPT (and more recently by Claude and Perplexity) has become the benchmark against which other AI tools are judged, largely due to widespread exposure and high public expectations (see: OpenAI API). Anything less capable can feel underwhelming by comparison. As a result, AI products must clearly and specifically communicate their own value proposition to focus user expectations and differentiate from the mental model shaped by general-purpose LLMs.
There are two primary approaches to demonstrating the value of an AI-enabled tool.
The first is to speak to the tool’s capabilities. A common version of this is the “try me...” model described above, where suggested prompts or prompt preambles are shown when the user arrives. The thinking behind this approach is straightforward: if the tool encourages a particular type of question, it should be able to answer it. The issue, however, is that just because the tool can answer a question doesn’t mean it will answer it well. This illustrates the value gap between expectation and reality.
The second method is to show the outcomes the tool can produce. In its most literal form, this means displaying the actual response the system would generate for a given prompt. Accurate user context can make these examples feel more relevant, but in most cases the specific subject matter isn’t the point. What matters is that the user sees the quality of the output and thinks, “If the system can respond this well to that prompt, I bet it can respond well to mine.”
A key challenge with the second method is in the visual and experience design. Where can such examples live without disrupting the experience, as would be the case if forcing the user through a paced onboarding activity? Zindango’s deep research agent Apollo offers reports previously written by the agent. AnswerGrid has a video showing a quick but realistic demo of its Sheets product. Google’s AI Overview appeared seemingly overnight, layered onto existing functionality, and showed users how AI could deliver faster answers to their questions (see: Google Search Generative Experience).
The examples above detailed how the AI can demonstrate value before the user completes a journey through the flywheel. Following their first loop, and on every subsequent loop, value demonstration is a product of the combination of the AI’s output, the underlying product’s value stream, and the ancillary augmentations. All of which should be driven by the product’s contextual understanding of the user and their needs.
The structure of the output is as important as the content contained within. What method did the AI use to find its solution? How confident is it in its answer? In what ways might it be wrong? Are there multiple right answers? What steps should the user take to validate the presented information? By their nature, LLMs express confidence, even when they hallucinate, so crafting their response framework to be slightly more self-aware can go far in building user trust. Even a small degree of transparency regarding its interpretation of the user’s input, or its methodology for coming to a response will provide the user with tools to assess the answer, and signal whether, and how, to fact check.
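To make this concrete, the sketch below imagines a response “envelope” that carries the answer alongside the system’s interpretation, methodology, confidence, and suggested verification steps. The shape and field names are illustrative assumptions, not a prescribed schema.

```typescript
// A minimal sketch of a "self-aware" response envelope. Field names are
// illustrative assumptions, not a prescribed schema.
interface ResponseEnvelope {
  answer: string;                          // the primary content shown to the user
  interpretedIntent: string;               // how the system read the prompt
  methodology: string;                     // how the answer was produced
  confidence: "high" | "medium" | "low";   // self-assessed, surfaced to the user
  caveats: string[];                       // known ways the answer could be wrong
  verificationSteps: string[];             // concrete ways the user can fact-check
}

const example: ResponseEnvelope = {
  answer: "Three stations show a drop in charging efficiency this week.",
  interpretedIntent: "Identify underperforming stations in the latest data.",
  methodology: "Compared this week's sessions against the trailing 30-day average.",
  confidence: "medium",
  caveats: ["Two stations reported incomplete session logs."],
  verificationSteps: ["Open the raw session table for the flagged stations."],
};

console.log(JSON.stringify(example, null, 2));
```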
Accuracy shouldn’t be the only metric by which an AI’s output is measured, especially in purpose-built AI tools. Instead, these systems should use AI to build differentiated experiences that deliver unique insights by leveraging the proprietary methodologies of the underlying platform. Put simply, a product’s AI-driven feature should provide greater, more specific value to the user than a general-purpose LLM alone.
Even more value emerges when three layers of intelligence are combined: the underlying model, the system’s unique approach, and the user’s individual perspective.
The first layer is where AI-driven products risk falling short. Too much reliance on the general-purpose model, even if fed with specific context and information about the user, will result in a shallow experience detached from the product’s value stream, or worse, hallucination that misrepresents the product’s core functionality.
If the AI functionality is achieved by prompt engineering an LLM, then the solution must tightly bind the LLM to the product’s value stream. This means adding appropriate guardrails and enabling the model to perform and combine proprietary actions effectively. This ensures that the AI focuses its output on the product’s unique information and methodologies.
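One way this binding might look in practice is a small whitelist of proprietary actions the model can invoke, with everything else refused gracefully. The sketch below is a simplified illustration; the tool names and the keyword-based selection step stand in for a real tool-calling integration.

```typescript
// Sketch of binding an LLM to a product's value stream through a small
// whitelist of proprietary actions. The tool names and the selection step
// are hypothetical placeholders for a real tool-calling integration.
type ToolHandler = (args: Record<string, string>) => Promise<string>;

const proprietaryTools: Record<string, ToolHandler> = {
  // Each tool wraps domain logic a general-purpose model cannot know.
  lookupChargingSessions: async ({ stationId }) =>
    `Sessions for station ${stationId}, pulled from the product's own data store.`,
  scoreAgainstBenchmark: async ({ metric }) =>
    `Benchmark score for ${metric}, computed with the product's proprietary method.`,
};

async function handlePrompt(prompt: string): Promise<string> {
  // In a real system the model would choose a tool via structured tool-calling;
  // here a trivial keyword check stands in for that step.
  const requestedTool = prompt.toLowerCase().includes("benchmark")
    ? "scoreAgainstBenchmark"
    : "lookupChargingSessions";

  const handler = proprietaryTools[requestedTool];
  if (!handler) {
    return "That capability isn't supported yet."; // guardrail: graceful refusal
  }
  return handler({ stationId: "A-12", metric: "efficiency" });
}

handlePrompt("How do my stations compare to the benchmark?").then(console.log);
```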
Moreover, building methods for deeply understanding users will create opportunities to provide additional, richer context to the AI. These actions will help increase trust by demonstrating that the system understands the domain, the decision space, the benchmarks for quality, and, most importantly, the individuals who benefit from its output. These elements form the building blocks for differentiation.
The augmentations to the AI’s primary output may include:
These augmentations serve as further acknowledgment that the system is listening and aligned with the user’s intent and desired outcomes. While valuable in the current experience, they’re ultimately a stop-gap: a way of “guessing” at intent until the system can reliably recognize behavioral patterns and understand the user’s broader goal.
Once that level of intelligence is achieved, the ideal experience will be one where the system automatically identifies the right refinements, anticipates next steps, and delivers a response that fills in any gaps in the user’s input, while producing a meaningful, valuable output.
The system’s ability to fill those gaps will be proof that it can interpret abstract requests and arrive at the same (or better) result that the user would have achieved, had they completed the full process manually.
At that point, the role of augmentations will shift from assisting the user in achieving their desired outcome to helping them understand the knowledge the system applied on their behalf, including the insights it knew they lacked. The user may not always feel the need to explore that knowledge, but will know that if they do, applying it through a personal lens will help the system refine its approach, and continue toward becoming a true extension of the user.
Whether it’s a first-time user or someone evaluating the quality of a new output, these demonstrations of value are essential for building and maintaining trust. Once the user sees potential, the product must help them keep going. That’s where the next stage of the flywheel comes into play: facilitating engagement.
The following example shows how AI-augmented analysis can deliver contextual insights, suggested refinements, and engagement prompts. Here, AI interprets tabular EV charging session data and highlights cold weather inefficiency, demonstrating the flywheel’s “Determine Context,” “Demonstrate Value,” and “Facilitate Engagement” stages in action.
An example conversational AI tool that can reference tabular data, and specific data points, allowing the user to uncover insights with minimal effort and without a detailed understanding of the underlying data.
To increase the flywheel’s momentum while reducing human effort, the system must create relevant and meaningful opportunities for the user to interact. These opportunities should be clear, easily accessible, and require minimal effort.
Engaging with AI takes many forms. For many, the most familiar interaction is typing a prompt into an input field. But often, engagement is as simple as clicking a button when the system has determined a potential next step, such as viewing related information.
These one-click interactions tend to sit at either extreme of the AI’s sophistication. On one end, they appear in less capable systems with a narrow range of supported functionality. On the other, they surface in highly sophisticated tools that can anticipate user needs and offer relevant actions based on recent behavior.
There are three key principles for meaningful interactions with AI, and all three work in tandem to achieve success following each user input.
AI-augmented interfaces also enable users to enrich structured data views. The following example shows an AI-driven table column that calculates charging efficiency, an illustration of the flywheel’s “Facilitate Engagement” and “Accurately Respond” stages.
An example interface for AI-powered table augmentation. Allowing users to integrate contextual AI output in tandem with structured data drives low-effort, high-impact engagement.
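Continuing the EV charging example, the sketch below imagines how such an AI-augmented column could be wired: the calculation stays deterministic, while an AI layer could add the explanatory note or spot broader patterns across rows. The row shape and efficiency formula are illustrative assumptions.

```typescript
// Sketch of an AI-augmented table column, loosely modeled on the EV charging
// example. The row shape and efficiency formula are illustrative assumptions.
interface ChargingSession {
  sessionId: string;
  energyDeliveredKwh: number; // energy that reached the battery
  energyDrawnKwh: number;     // energy drawn from the grid
  ambientTempC: number;
}

// The calculation itself stays deterministic; an AI layer could contribute
// the explanatory note or identify patterns across many rows.
function efficiencyColumn(row: ChargingSession): string {
  const efficiency = row.energyDeliveredKwh / row.energyDrawnKwh;
  const note = row.ambientTempC < 0 ? " (cold-weather loss likely)" : "";
  return `${(efficiency * 100).toFixed(1)}%${note}`;
}

const sessions: ChargingSession[] = [
  { sessionId: "s-001", energyDeliveredKwh: 41.2, energyDrawnKwh: 45.0, ambientTempC: -4 },
  { sessionId: "s-002", energyDeliveredKwh: 43.8, energyDrawnKwh: 45.1, ambientTempC: 18 },
];

sessions.forEach((s) => console.log(s.sessionId, efficiencyColumn(s)));
```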
To create meaningful engagement, AI-driven tools must offer interactions that feel natural, intuitive, and timely. Contextual interactions typically surface in three ways: in relation to the AI’s output, associated with particular content, or in response to user behavior.
For example, an interaction contextually “paired” to the AI’s output could be a suggested follow-up prompt to drive deeper discovery or understanding. An interaction paired to content could take the form of a menu offering AI-enabled actions, such as summarization, the extraction of insights, or collecting similar materials based on that which the user has clicked.
When driven by user behavior, contextual interactions might recognize a broader activity from a series of smaller actions. They can also guide the user toward more efficient methods to achieve their goals. What unites all these interactions is their proactive nature. They do not “wait around” for the user to initiate them. Instead, they anticipate and present potential pathways, and actively encourage engagement.
Maximizing impact and user value through minimal interaction is a general UX principle, but is especially important with AI-driven experiences, where efficiency and ease of use determine long-term adoption. If the user must repeatedly explain themselves or correct the system’s output, then they have expended effort that may have been equally (or more) effective using another, non-AI driven approach. Such friction will result in user frustration, erode trust and slow the flywheel’s momentum.
Just as when demonstrating value, the north-star goal of AI-enabled systems should be to anticipate user needs without requiring them to be articulated in detail. However, until the AI can correctly and reliably “read the user’s mind,” users should expect the interaction pattern to involve confirming or refining the AI’s interpretation of their intent. The accuracy of this interpretation can and should be increased through contextual understanding, but it will continue to vary as AI advances.
Thoughtful selection of input paradigms, such as asking for clarification, prompting yes/no responses, or gauging importance on a sliding scale, can help the AI gather what it needs from the user in order to respond effectively while also continuing to learn.
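As a rough illustration of these paradigms, the sketch below models clarification as a small set of structured input types and picks the lowest-effort one that resolves the ambiguity the system has detected. The types and selection rule are assumptions for the sake of example.

```typescript
// Sketch of structured clarification paradigms. The union members and the
// selection rule are assumptions; the goal is to ask for the minimum input
// needed to resolve the ambiguity the system has detected.
type Clarification =
  | { kind: "yesNo"; question: string }
  | { kind: "choice"; question: string; options: string[] }
  | { kind: "scale"; question: string; min: number; max: number };

function pickClarification(
  ambiguity: "binary" | "categorical" | "degree",
): Clarification {
  switch (ambiguity) {
    case "binary":
      return { kind: "yesNo", question: "Should closed venues be included?" };
    case "categorical":
      return {
        kind: "choice",
        question: "Which area matters most?",
        options: ["Manhattan", "Brooklyn", "Queens"],
      };
    case "degree":
      return { kind: "scale", question: "How important is price?", min: 1, max: 5 };
  }
}

console.log(pickClarification("degree"));
```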
User context gathered early in the flywheel, combined with insights from the user’s chosen interaction, helps the system interpret their intent more accurately than through interaction alone.
Understanding user intent can be thought of as an accumulation of layers, similar to the way an artist paints a canvas. An initial sketch lays out the general form, followed by layers of paint which add further detail until the final subject is revealed.
In AI-driven products, this process of understanding moves from a broad assumption to a refined, confident inference by combining general knowledge of the user’s archetype with detailed insights gathered through their recent and historical behavior. This includes their latest interaction, the previous actions leading up to that moment, and long-term usage patterns. This information can also be compared to the usage patterns of similar users who share key traits and behavior. By analyzing these layers together, the system builds a well-informed expectation of what the user is trying to achieve.
Once the system develops a theory about the user’s broader goal based on context and past behavior, it must translate this assumption and feed it into the AI‘s inference pipeline. This serves two key purposes:
Step two is much like a chess opening, where well-known patterns guide early moves, but as the game progresses, each response shapes a unique path. Similarly, while the system may recognize familiar trajectories toward a goal, user behavior can quickly diverge, forcing the AI to adapt dynamically.
At this point, the system faces a choice:
The AI’s threshold for signaling uncertainty must align with the user’s tolerance for interruptions. A key reference point is the user’s prompt and how they structured their input.
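A simplified way to express that balance is a threshold check that weighs the system’s confidence against the specificity of the prompt and the user’s tolerance for interruptions. The scoring heuristic and thresholds below are illustrative, not recommended values.

```typescript
// Sketch of balancing proactive decisions against clarifying questions.
// The scoring heuristic and thresholds are assumptions for illustration.
interface IntentEstimate {
  confidence: number;             // 0..1, from contextual and behavioral layers
  promptWordCount: number;        // longer, structured prompts signal specificity
  userTolerance: "low" | "high";  // how readily this user accepts interruptions
}

function shouldAskClarifyingQuestion(estimate: IntentEstimate): boolean {
  // A detailed prompt implies the user has already stated what they want,
  // so the bar for interrupting them is higher.
  const specificityBonus = estimate.promptWordCount > 20 ? 0.15 : 0;
  const threshold = estimate.userTolerance === "low" ? 0.45 : 0.65;
  return estimate.confidence + specificityBonus < threshold;
}

console.log(shouldAskClarifyingQuestion({ confidence: 0.5, promptWordCount: 8, userTolerance: "high" }));  // true: ask
console.log(shouldAskClarifyingQuestion({ confidence: 0.5, promptWordCount: 30, userTolerance: "low" }));  // false: proceed
```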
By interpreting these signals correctly, the AI can find the right balance between proactive decision-making and thoughtful inquiry when forming its understanding of the user’s activity. This expression of thoughtfulness will improve the user’s trust of the experience, align their expectations to the system’s capabilities, and keep them engaged.
The process of extrapolating user objectives is extremely valuable if the user is driving toward a large, multifaceted goal. But, what if their input is not in fact the first step toward a highly complex objective? What if they are looking for a single, concise answer, with no additional motive or intent?
The system should always include this eventuality as an option when assessing potential pathways that the user may take. Rather than assuming that every query is part of a larger activity, it must weigh the likelihood that the user is just after a single, direct response.
However, users will likely not signal this intent. Instead, the system must infer it using the same contextual information detailed above: initial framing, prompt structure, language, and behavioral patterns. By recognizing that a single answer might be enough, the AI can avoid overwhelming the user with unnecessary elaboration beyond their request, or truncating an interaction that they expected would uncover greater detail than what was initially requested.
This piece has strongly argued in favor of using every available datapoint to understand the user and drive accuracy in AI-enabled features. However, when discussing AI and machine learning, it is essential to acknowledge the risk of overfitting, and explore strategies to address it. This is especially important for AI-driven solutions designed to surface new opportunities, insights, or pathways the user might not have otherwise considered.
Overfitting is when an algorithm’s inference is too closely tied to its understanding of the world, such that it excels in areas it understands, but struggles when considering new data (see: Microsoft Research, Beyond the Imitation Game). Imagine practicing for a spelling bee with a list of just five three-letter words. During practice, if asked to spell any of those five words, you could do so with 100% accuracy. Your model for the spelling bee is considered to be perfectly accurate in relation to its training data. But what happens on the day of the bee, when the list of words numbers in the hundreds, and their length is not limited? Your model of spelling comprehension, while flawless in testing, was poorly equipped to handle the reality of its intended use.
The risk of overfitting is inherent to any AI solution, but its impact can be mitigated through a number of specific steps, each of which generally aligns to an approach of either consideration or remediation. These approaches are not mutually exclusive. Instead, they are better thought of as successive steps on the journey to fully-capable AI.
Even the most predictable individuals deviate from expected behavior. This truth must be accounted for when using AI systems. Just as the system should account for long-term vs. short-term objectives, it must also recognize when a user is exploring a new approach that doesn’t yet fit their established patterns. They may be experimenting, testing a different workflow, or simply shifting their research style.
To address overfitting, and prevent the user from feeling “tied” to a particular inference, the system must resist the urge to lock into rigid assumptions. It should continue adapting as it learns, recognizing that augmentations aligned to previous behavior patterns may need to shift or evolve over time before the user is comfortable with them disappearing entirely. After all, as the saying goes, “past performance is not indicative of future results.”
There is currently a broad expectation that AI can be inaccurate, and this understanding should continue to be embraced until more accurate systems have been discovered and trained.
Therefore, an effective frame of reference when designing experiences is one that considers methods by which the tools can articulate, either directly, or indirectly, the confidence they have in their inference, as well as the delta between the user’s current behavior, and that which the machine has modeled from previous demonstration.
This doesn’t mean that AI-enabled experiences should expose their statistical analysis of individual interactions, but instead that they should leverage augmentations to surface pathways that account for alternative eventualities driven by unforeseen context or prompts. It can then learn from the outcome, apply the resulting context to the next interaction, and maintain the momentum of the flywheel.
Consideration of the impact of overfitting should not be misconstrued as the need to prevent it entirely. If AI-enabled systems become too cautious in order to avoid mistakes, they will fail to provide the very value they have been built for: offloading work and intellectual labor from their human counterparts.
Until models can be trained more accurately and with less data, there will always be situations in which the AI cannot use its existing knowledge to address all potential circumstances. This lack of knowledge should not prevent the system from taking its best “guess” while allowing users to retain control.
They must be provided clear ways to intervene or reverse course when the AI makes incorrect assumptions, or even be given the opportunity to “teach” the system their preferred outcome in such scenarios.
Interventions may appear simple on paper, for instance allowing the user to stop an action mid-flow. But as AI decisions accumulate, course correction grows more complex. Users must be able to unwind AI-driven outputs, removing short-term context and memory associated with steps they deem irrelevant or incorrect. Without this capability, AI risks leading users down rigid paths that limit true discovery.
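The sketch below illustrates one possible shape for this capability: each AI-driven step is recorded with the short-term context it contributed, so an individual step (and only that step) can be unwound later. The structure is hypothetical.

```typescript
// Sketch of reversible short-term context: each AI-driven step is recorded
// so the user can unwind a specific assumption without restarting the
// session. The shapes are hypothetical.
interface ContextStep {
  id: string;
  summary: string;        // what the AI assumed or produced at this step
  derivedFacts: string[]; // short-term memory contributed by this step
}

class SessionContext {
  private steps: ContextStep[] = [];

  record(step: ContextStep): void {
    this.steps.push(step);
  }

  // Remove one step, and the facts it contributed, while keeping the rest,
  // so a single bad inference doesn't force the user down a rigid path.
  unwind(stepId: string): void {
    this.steps = this.steps.filter((step) => step.id !== stepId);
  }

  activeFacts(): string[] {
    return this.steps.flatMap((step) => step.derivedFacts);
  }
}

const ctx = new SessionContext();
ctx.record({ id: "1", summary: "User is planning a trip", derivedFacts: ["destination: London"] });
ctx.record({ id: "2", summary: "User wants drone footage", derivedFacts: ["activity: drone flight"] });
ctx.unwind("2"); // the user marks the drone assumption as irrelevant
console.log(ctx.activeFacts()); // ["destination: London"]
```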
Within the flywheel, asserting confidence in user intent to the point that augmentations are significantly reduced—or eliminated altogether—should not happen prematurely. Only once the system can recognize the user’s behavior in an overwhelming number of circumstances, and once the flywheel has gained enough momentum, should it shift toward a more confident, “all-knowing” approach.
Context-aware AI can also surface proactive engagement prompts based on user behavior, and increase opportunities for taking action on specific elements within a view. In this example, the user is harnessing AI to explore emerging patterns in the dataset. Such functionality helps drive re-engagement and supports chance discovery.
An example UI for providing context-aware, element-specific engagement, enabled by AI.
The ability for an AI-enabled system to accurately provide information or respond to the user’s prompt comprises the final stage of the flywheel and is also the catalyst for propelling them into another cycle. Every stage prior has led to the system’s response by attempting to collect enough information from the user’s background and circumstance to provide the AI with the information it needs to respond accurately and meaningfully.
The determination of a “good” or “effective” answer is in the balance of accuracy with meaningfulness, as achieving one does not guarantee accomplishing the other. An accurate response could completely ignore the user’s true intent, while a meaningful one could simply be a response that quickly indicates that the tool will be ineffective for the user’s needs.
As has been the theme throughout each stage of the flywheel, the system’s output must adapt to the given context, and shape its answer around what it can infer about the user’s immediate and long-term intent. The element that cannot be overlooked, though, is the need to optimize not only for user satisfaction, but also for re-engagement.
There will be a point at which artificial intelligence is aware of what its users do and do not know, and even what they do and do not have the appetite and capacity to learn. Such a point will represent a sea change in AI, and fundamentally reshape the way that information is delivered. It will also add new considerations to each stage of the flywheel. That moment is not yet here; however, it is an important consideration when thinking about how context informs the content and delivery method of an AI’s response.
An AI-enabled system must first determine how many times a user has previously interacted with it before responding to their prompt. This contextual determination will inform the manner in which the system frames the response in order to effectively drive the flywheel’s momentum.
Initial and early interactions should consider providing the user with additional background detailing the system’s “thought process” in order to instill confidence that the AI’s methodology is aligned to the user’s expectation. The experience doesn’t need to be akin to a full “walkthrough” of the AI’s capabilities, but rather should resemble an initial meeting of two colleagues, where each other’s backgrounds and expertise are shared. This approach will prove most valuable for products that boast a unique value stream amongst similar competitors, as it will be the moment in which the feature can underscore its unique approach and reflect it in the language of the response. The hope is that transparency will build the initial foundation for trust, and prevent any initial inaccuracy from deterring further use. In cases of inaccuracy, the user will learn how to better communicate with the system, and in turn it will learn how to interpret the user’s intent.
Over time, and as the user gains trust for the system, the need for the AI to qualify its responses can be reserved for more complex inputs, or cases where it infers that the user would benefit from further explanation, rather than for every response. Just as when addressing overfit, the system should be firm with this pivot, but allow the user to throttle the progression if necessary. Once the AI has determined that the user is satisfied by the core elements of its response, this understanding can then feed the following context stage of the flywheel, and inform the augmentations on the next rotation.
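A minimal sketch of this progression might tier the response framing by interaction count and prompt complexity, with early interactions carrying more explanation of the system’s reasoning. The tiers and thresholds below are assumptions for illustration.

```typescript
// Sketch of tiering response framing by interaction history. The tiers and
// thresholds are illustrative; early interactions carry more explanation of
// the system's "thought process" than later ones.
type FramingLevel = "explainMethodology" | "lightQualifier" | "answerOnly";

function framingFor(
  interactionCount: number,
  promptComplexity: "simple" | "complex",
): FramingLevel {
  if (interactionCount < 3) return "explainMethodology"; // early: build trust
  if (promptComplexity === "complex") return "lightQualifier";
  return "answerOnly"; // established trust: lead with the answer
}

console.log(framingFor(1, "simple"));   // "explainMethodology"
console.log(framingFor(12, "complex")); // "lightQualifier"
console.log(framingFor(12, "simple"));  // "answerOnly"
```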
While it may seem that there are many cases in which intent can be derived almost directly from the user’s input and contextual information (“How tall is Big Ben?”), the ability of an AI-enabled tool to determine the user’s broader goal will make or break their confidence, and potentially their desire to continue engaging. To use this example, did the user really just want to know how tall the parliamentary clock tower is, or is it just one datapoint referenced in a broader activity?
Is the user:
Their intent changes the definition of a “useful” response. Selecting the wrong one introduces friction at best, and at worst erodes user confidence that the system has actually been listening.
This is where the importance of the flywheel’s previous stages comes into play, in particular Determine Context and Recognize Objectives. By considering the larger context of the conversation as well as the previous stages of the flywheel, the AI should be able to offer both the height of Big Ben (315 feet) and the importance of that datapoint relative to the user’s broader activity, for example, noting that it’s far shorter than many of London’s modern skyscrapers, or falls within the legal ceiling for recreational drone flights.
Determining the output mode for AI systems is currently one of the more complex challenges, as they must feel confident enough in the user’s intent to change or insert a form of output that was not (necessarily) asked for by the user. The simplest example is a case in which the user asks for a collection of entities or attributes. Did they want to receive them as a list? As a table? Or even visualized as a chart?
A more complex example is OpenAI’s introduction of the “Canvas,” which instantiates itself when the AI determines that the user is working on a “creation” activity such as coding or writing. Beyond simply injecting an element into the conversation flow, the Canvas will transform the view entirely, moving the chat area to the left and completely altering the interaction paradigm to be more command oriented. If this appears against the user’s wish, it is jarring and forces their intervention to reverse.
In other situations, the Canvas may not appear despite the user’s expectation or instruction that it should. This too is a frustrating experience which requires human effort to overcome. Despite a certain degree of understanding for AI inaccuracy, a lack of consistency in the user experience beyond just the answer adds additional friction, and erodes trust in the same manner, if not more, than inaccurate inference. For this reason, AI tools must put mechanisms in place to ensure confidence in the selection of the output mode when responding to the user.
The first step necessary for the AI-enabled system to offer variation in modes is to determine and define which modes it has access to or can create. Not only will this help define mode-selection logic, it will also create the necessary guardrail in the case that a user asks for a mode which the system cannot provide. Part of the definition of modes will also include the logic surrounding when each should be used. Some examples include comparing like-information, showing trends, displaying code, compiling a document, or editing a large text.
When writing such logic within an AI-driven system, especially one powered by LLMs, asking the LLM to determine when and what mode to show may appear to be a natural extension of its inference responsibilities. However, the ambiguous nature of LLMs referenced elsewhere in this piece again introduces the potential for unpredictability in the final stage of the flywheel, immediately before the user determines the value of the output and whether to re-engage.
Instead, more predictable frameworks such as conditional logic or symbolic AI should be used for mode determination as a means of tightly controlling when, and how, the various modes are used. The one caveat is that while mode selection should not be left to chance, there may still be opportunities to allow the LLM to interpret the user’s request, and provide that interpretation to the structured logic as the basis for its selection process.
The easiest way to determine the output mode is by asking the user to explicitly indicate their choice. This can be done in two ways. The first is through their language, in effect looking for “trigger words” in their prompt. “Find the newest cocktail bars in New York that also serve food, and present them in a table along with their neighborhood and operating hours” is a clear command that the user is asking for a table with explicit contents. However, a prompt such as “Find me the newest cocktail bars in New York” is far more ambiguous. In such cases, it would be the user’s expectation that the system selects the mode it determines makes the most sense given all previous stages of the flywheel, or offers the ability to change the mode by referencing its library of options.
For example, the system may first respond with a text-based list, but then offer the user the option to view the response as a table, perhaps with columns including the neighborhood, operating hours, and price range. The design of the broader experience would determine whether the table would replace the initial response or simply add to it.
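A simplified version of this selection logic appears below: explicit trigger words win, and otherwise structured rules operate over an interpretation of the request (which, in practice, an LLM might supply), returning both a primary mode and the alternatives to offer. Mode names, trigger words, and rules are illustrative.

```typescript
// Sketch of deterministic mode selection: explicit trigger words win, and
// structured rules over an interpretation of the request (possibly supplied
// by an LLM) handle the ambiguous cases. Mode names, trigger words, and
// rules are illustrative.
type OutputMode = "text" | "list" | "table" | "chart";

const triggerWords: Record<string, OutputMode> = {
  table: "table",
  compare: "table",
  chart: "chart",
  trend: "chart",
  list: "list",
};

interface Interpretation {
  entityCount: number;            // how many items the response will contain
  hasMultipleAttributes: boolean; // e.g. neighborhood, hours, price range
}

function selectMode(
  prompt: string,
  interpretation: Interpretation,
): { primary: OutputMode; offered: OutputMode[] } {
  // 1) An explicit instruction from the user always wins.
  const lower = prompt.toLowerCase();
  for (const [word, mode] of Object.entries(triggerWords)) {
    if (lower.includes(word)) return { primary: mode, offered: [] };
  }
  // 2) Otherwise pick a sensible default and offer alternatives.
  if (interpretation.entityCount > 1 && interpretation.hasMultipleAttributes) {
    return { primary: "list", offered: ["table"] };
  }
  if (interpretation.entityCount > 1) return { primary: "list", offered: [] };
  return { primary: "text", offered: [] };
}

console.log(selectMode("Find me the newest cocktail bars in New York", {
  entityCount: 8,
  hasMultipleAttributes: true,
})); // { primary: "list", offered: ["table"] }
```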
An alternative method for the user to indicate the AI’s response format is for the system to first select from the available modes and present the option or options to the user prior to outputting the response. The approach of clarifying user intent through additional and unexpected conversation does add friction to the experience, but the tradeoff is that it mitigates the risk of even further conversation to alter an incorrect selection of mode. The user experience of asking users “follow-up” questions to their initial prompt is becoming more familiar through reasoning and deep research models, and can be effective when the user has an understanding of the number of follow-ups that will be asked.
The examples referenced so far have referred to the AI’s output as a “response” to the user’s prompt or input. The mental model of a “chat” or conversation was intentionally chosen due to the ubiquitous presence of turn-based conversational interfaces that most people familiar with AI can imagine. However, AI chatbots are just the first phase of more complex AI interfaces, evidenced by more efficient experiences already in-market. Examples include Google Analytics AI Insights, and Booking.com’s use of AI to aid customers with trip planning based on their preferences. Neither of these require their end users to compose verbose, text-based prompts, and instead gather their necessary inputs through other user actions or selections, which are in many cases indirect.
Another approach to AI-driven experiences that is becoming increasingly prevalent is what might be classified as “multi-prompt” interfaces. These are just as they sound, where a single view of a product may incorporate multiple AI input/output pairings to analyze and present information to the user in a more valuable or actionable way. Examples include Kenley (formerly AnswerGrid), Hebbia, and AlphaSense Generative Grid, which allow users to create spreadsheet views where individual columns are populated by generative AI prompts.
Alternatively, platforms may offer users the ability to compose, save, and reuse “prompt chains” to automate repetitive workflows. These workflows often rely on multiple prompts that may be complex or verbose, but can be easily adapted to new activities by modifying just a few key references. These experiences might be described as “agents” or “workflows.” The key distinction is that the user can affect each activity in the chain, as opposed to just the outcome. Some companies making waves with this approach include Rogo and DaisyChain.
While these platforms aren’t conversational in nature, the interaction with the AI’s capability still consists of context and a goal which are fed into the system, initiate the inference, and result in an output, in some cases several outputs per second. Whether there is explicit user input, one prompt, or many, these experiences all qualify as “AI-enabled systems” and must adhere to the flywheel, though in a manner that is slightly nuanced. Multiple prompts used independently to compose a view may “diversify” the AI-driven feature’s opportunity for success, where prompts that generate quality outputs “make up” for those that do not. However, if those underperforming prompts result in substantial user effort to correct, their negative impact has the potential to plague the rest of the experience. It may seem excessive, but every individual AI interaction must be governed by the flywheel, and potentially include its own augmentations, no matter how minimal its perceived impact on the larger experience.
Multi-prompt “chains” compound the potential for a negative experience, as a single inaccurate or ineffective prompt will impact all that follow. Therefore, adequate visibility must be built into each step of the chain in order to provide users with the greatest opportunity to see how the output of one informs the next, so that they can diagnose and correct any inaccuracy. For these “sequenced” experiences, augmentations may be associated with the whole chain, but the design may consider offering the user suggestions about how they may improve individual links. This approach will help the user maximize the value of the individual steps, and ultimately the experience as a whole. While complex, the context gathered through the user’s modification of each aspect of the experience will allow the many flywheels working in harmony to support and drive each other, while contributing to the momentum of the whole.
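One way to build that visibility is to record a trace for every link in the chain, so each prompt and its output can be inspected and corrected individually. The sketch below uses a placeholder in place of a real inference call; the shapes are assumptions.

```typescript
// Sketch of a prompt chain with per-step visibility, so a user can see how
// each output feeds the next and correct individual links. runStep() is a
// placeholder for a real inference call; the shapes are assumptions.
interface ChainStep {
  name: string;
  promptTemplate: (input: string) => string;
}

interface StepTrace {
  name: string;
  prompt: string;
  output: string; // surfaced to the user for diagnosis and correction
}

async function runStep(prompt: string): Promise<string> {
  return `[output for: ${prompt}]`; // stand-in for model inference
}

async function runChain(steps: ChainStep[], initialInput: string): Promise<StepTrace[]> {
  const traces: StepTrace[] = [];
  let carry = initialInput;
  for (const step of steps) {
    const prompt = step.promptTemplate(carry);
    const output = await runStep(prompt);
    traces.push({ name: step.name, prompt, output });
    carry = output; // a weak link here degrades every step downstream
  }
  return traces;
}

runChain(
  [
    { name: "extract", promptTemplate: (input) => `Extract key figures from: ${input}` },
    { name: "summarize", promptTemplate: (input) => `Summarize for an executive: ${input}` },
  ],
  "Q3 charging-network report",
).then((traces) => traces.forEach((t) => console.log(t.name, "->", t.output)));
```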
Whether used in isolation or chained across workflows, each prompt is a moment of earned trust. Multi-prompt systems multiply that responsibility, and the opportunity to compound value.
Multi-prompt interfaces further expand the potential of AI-enabled workflows. The following example shows an interface which provides the ability to save or create a template from a prompt, which can extend the value the user experienced from a single interaction and positively drive re-engagement.
An example interface for allowing the user to save prompts for compilation into a multi-prompt AI tool or agent for use elsewhere within their platform.
Building on the previous example, AI-driven conditional logic allows users to automate various tasks within a platform. The following screen demonstrates an AI-defined “if/then” condition that triggers targeted alerts, enabling scalable, adaptive interaction patterns.
Example interface for AI-driven conditional logic. Users define “if/then” conditions (with AI assistance) to automate responses and trigger alerts, enabling scalable, adaptive workflows aligned to user objectives.
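Conceptually, such a rule might be stored as plain, auditable logic even if AI helps the user author it, as in the hypothetical sketch below (field names and the sample metrics are illustrative).

```typescript
// Sketch of an AI-assisted "if/then" alert rule. AI might help the user
// author the condition, but the stored rule itself remains plain, auditable
// logic. Field names and the sample metrics are illustrative.
interface AlertRule {
  description: string;
  condition: (metrics: Record<string, number>) => boolean;
  alert: string;
}

const coldWeatherRule: AlertRule = {
  description: "Alert when charging efficiency drops below 88% in freezing weather",
  condition: (m) => m.efficiencyPct < 88 && m.ambientTempC < 0,
  alert: "Efficiency dip likely caused by cold weather. Review the affected stations.",
};

const latestMetrics = { efficiencyPct: 86.5, ambientTempC: -3 };
if (coldWeatherRule.condition(latestMetrics)) {
  console.log(coldWeatherRule.alert);
}
```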
The principles and practices outlined in the AI Value Generation Flywheel are not static. As models improve, interfaces evolve, and users grow more fluent in working with AI, the balance between transparency, control, and automation will shift. This will require AI product teams to continuously reevaluate not just how well their tools work, but whether they are building experiences that users can grow with. They must create experiences that scale from guided exploration to confident delegation.
Designing for this shift means looking beyond immediate tasks and toward an AI-enhanced relationship built on long-term trust, where systems become both more capable and more invisible. It is not enough to simply deliver a correct response. AI tools must demonstrate understanding, context, and intent, and they must do so consistently, across every loop of the flywheel.
As AI-enabled features mature from novelty to expectation, the core question facing product teams is no longer how to integrate AI, but how to make it work in a manner that is reliable, consistent, and valuable, such that it naturally drives re-engagement.
The AI Value Generation Flywheel offers a framework for doing just that. More than a conceptual model, it acts as a cross-functional anchor, guiding product, design, engineering, and leadership toward shared goals.
The flywheel is not a feature, it is the sum total of how an AI-enabled experience is conceived, executed, and evolved. It represents a systemic shift in how we apply AI to user problems. And it demands a shift in mindset: from building for AI to building with it, and continuously asking, “Are we earning the next interaction?”
Historically, trust in AI has been measured by accuracy. But as AI becomes more participatory and is embedded not just in tools, but in workflows and decisions, that definition must evolve (see: Nielsen Norman Group, The UX of Trust in AI).
Users will judge trustworthiness not only by what a system gets right, but by how it shows up:
In this sense, trust becomes less about whether the AI’s response is always “right,” and more about the system’s demonstration of effort, care, and growth. Just as humans extend trust to collaborators who show commitment, users will increasingly reward systems that are self-aware, communicative of uncertainty, and aligned to their success.
As AI becomes more autonomous, trust will no longer be won at the model level, but at the experience level. This means that product thinkers, designers, and builders are no longer just shipping features. They are now the stewards of trust, the most critical factor in long-term AI adoption.
The future of AI-driven experiences hinges on consistency, contextual intelligence, and measurable outcomes that stimulate re-engagement. As generative AI becomes table stakes on all platforms, users will evaluate its presence not just by its existence, but by its effectiveness, measured interaction by interaction.
The AI Value Generation Flywheel provides a structure for meeting that demand. It reframes AI experience design as an ongoing relationship, where each loop through the flywheel deepens system understanding and user trust.
The path forward for product builders is not simply to offer generative capability, but to embed differentiated value, through proprietary logic, domain-specific insight, and adaptive outputs that align to user goals. By doing so, AI becomes more than a utility. It becomes a collaborator. And in that future, the best experiences won’t just answer questions, they will understand the user well enough to anticipate what comes next.