Generative AI loading states
Ways to communicate to a user that generative AI is in the process of generating or processing a response.
Components
Avatar
Visual representation of a user or generative AI entity.
Loading bar
A linear loading indicator that informs the user about an ongoing operation with unknown duration.
Key UX concepts
Latency
Generative AI experiences are commonly built on large language models (LLMs) and foundation models (FMs). The models available today vary in how long they take to process a given prompt and generate an output. As a result, the time a user waits for generative AI to return an output for their prompt, also called latency, can vary. To inform users that generative AI is actively working on an output for their prompt, display a loading state for that duration.
Stages of loading
How long a loading state is displayed while generative AI processes and generates a response depends on several factors, such as latency, the complexity of the user's prompt, and the type of content being generated (for example, text or images). Generative AI loading states can be categorized into two main stages: processing and generation.
After a user sends a prompt, generative AI starts processing it and has no output to return to the user yet. This stage is called processing. Once generative AI starts generating a response, while it continues to process the prompt, loading moves into the generation stage. The type of loading state to display depends on the duration and prominence of each of these stages.
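The two stages can be modeled as a small state machine. This is a minimal TypeScript sketch, assuming a chat client that can observe when a prompt is sent, when output arrives, and when the response is finished. The names `LoadingStage`, `LoadingEvent`, and `nextStage`, and the extra `idle` and `complete` states, are hypothetical and not part of any component API.

```typescript
// Hypothetical stages: "processing" and "generation" come from the guidance;
// "idle" and "complete" only mark the start and end of a request.
type LoadingStage = "idle" | "processing" | "generation" | "complete";

// Hypothetical events a chat client might observe for a single prompt.
type LoadingEvent =
  | { kind: "promptSent" }
  | { kind: "outputReceived" } // any chunk of output from the model
  | { kind: "responseFinished" };

function nextStage(stage: LoadingStage, event: LoadingEvent): LoadingStage {
  switch (event.kind) {
    case "promptSent":
      // The prompt has been sent, but there is no output yet.
      return "processing";
    case "outputReceived":
      // The first output moves processing into generation; later output keeps it there.
      return stage === "processing" || stage === "generation" ? "generation" : stage;
    case "responseFinished":
      return "complete";
    default:
      return stage;
  }
}
```

The current stage can then drive which loading indicator is shown, for example the avatar with loading text while the stage is `processing`.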
Loading indicators and text
Loading indicators are visual elements that inform users about an ongoing operation. They can be dynamic and use animation to capture users' attention. Generative AI loading states use the avatar and loading bar components as loading indicators. Pair loading indicators with visible loading text that describes the current state, so that the information is accessible to all users and is not conveyed by animation alone.
Streaming
Streaming is a loading state in which the output is displayed incrementally as the model generates it. It is a built-in capability of a model: as soon as output tokens are generated, they are returned to the user. Streaming can only be enabled for models that support it. When enabled, generative AI returns the response character by character, word by word, or sentence by sentence, depending on its processing capabilities, and usually appears as though it's typing. Streaming starts in the generation stage of generative AI loading.
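Streaming can be illustrated without a real model. In this minimal sketch, a finished string split into word-sized chunks stands in for the model's output stream, and the `render` callback stands in for updating the chat bubble. The functions `simulateChunks` and `streamResponse` and the word-level chunking are hypothetical assumptions; real chunk sizes depend on the model.

```typescript
// Hypothetical stand-in for a model's stream: split a finished response into
// word-sized chunks, keeping any trailing whitespace with each word.
function simulateChunks(response: string): string[] {
  return response.match(/\S+\s*/g) || [];
}

// Append each chunk to the visible text as soon as it arrives and re-render,
// instead of waiting for the complete response.
function streamResponse(chunks: string[], render: (visibleText: string) => void): string {
  let visible = "";
  for (const chunk of chunks) {
    visible += chunk;
    render(visible);
  }
  return visible;
}

// Usage: log what the user would see after each chunk arrives.
streamResponse(simulateChunks("Here is your answer."), (text) => console.log(text));
```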
Keep users informed
Optimizing latency in any generative AI experience depends on model capabilities, system architecture, and prompt complexity. However, keeping users informed while generative AI is loading can improve the overall user experience and help build trust and transparency. Keep the following mechanisms in mind:
Show a loading indicator with loading text. For example, in a conversational generative AI experience, show the avatar component in its loading state with complementary loading text in the chat bubble.
Reduce time to response by incrementally returning a response instead of waiting for the complete response to be generated. For example, streaming a text response if supported by the model.
Show a loading state based on the type of content being loaded. For example, in conversational generative AI experiences, stream text responses and display a loading bar in the chat bubble to show that generative AI is loading a list of resources.
Common use cases
Processing a response
When generative AI is in the processing stage after receiving a prompt from the user in a conversation, display a loading indicator, such as the avatar, with loading text in the chat bubble next to it.
Generating a text response
When generative AI moves to the generation stage in a conversation, start streaming the response if your model supports it. Incrementally return the text response and inline code snippets to the user in the chat bubble. Consider the following factors:
Display the loading avatar to give users an additional cue about the overall state of generative AI. This is especially helpful when streaming slows intermittently, which could otherwise leave users unsure whether loading has finished.
If your model does not support streaming, display the loading avatar and loading text for both processing and generation stages of loading.
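The rules for a text response can be summarized as a function of the loading stage and streaming support. This is a minimal sketch; `ResponseStage`, `TextResponseIndicators`, and `textResponseIndicators` are hypothetical names, and replacing the loading text with streamed text once generation starts is an assumption about how the chat bubble is updated.

```typescript
type ResponseStage = "processing" | "generation";

// Hypothetical view model for one in-progress text response.
interface TextResponseIndicators {
  avatarLoading: boolean; // avatar component shown in its loading state
  loadingText: string | null; // visible loading text in the chat bubble, or null
  streamText: boolean; // whether partial text is rendered as it arrives
}

function textResponseIndicators(
  stage: ResponseStage,
  supportsStreaming: boolean,
  loadingText: string,
): TextResponseIndicators {
  if (stage === "processing" || !supportsStreaming) {
    // No output to show yet, or the model can't stream: avatar plus loading text.
    return { avatarLoading: true, loadingText, streamText: false };
  }
  // Generation stage with streaming: keep the loading avatar as an extra cue
  // in case streaming slows, and show streamed text instead of loading text.
  return { avatarLoading: true, loadingText: null, streamText: true };
}
```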
Generating a response that contains other UI elements
When generative AI moves to the generation stage in a conversation and the response includes other UI elements, such as a list of resources, a table, or a code block, display the loading bar with adjacent loading text. Consider the following factors:
Display the loading bar with loading text in the chat bubble where the respective UI element will be rendered after loading is complete.
Display the loading avatar to indicate the overall state of generative AI, and stream text into the first chat bubble.
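A chat client might plan its bubbles for a mixed response as follows. This is a minimal sketch, assuming the response arrives as separate parts (streamed text and other UI elements). `ResponsePart`, `BubbleView`, and `planResponse` are hypothetical names, and composing the loading text as "Loading" plus the element label is an assumption based on the loading message format in the writing guidelines.

```typescript
// Hypothetical parts of one response: streamed text, or another UI element
// such as a table, code block, or list of resources.
type ResponsePart =
  | { kind: "text"; streamedSoFar: string; done: boolean }
  | { kind: "element"; label: string; loaded: boolean };

// Hypothetical view of each chat bubble while the response loads.
type BubbleView =
  | { kind: "streamingText"; text: string }
  | { kind: "loadingBar"; loadingText: string }
  | { kind: "element"; label: string };

interface ResponseView {
  avatarLoading: boolean; // overall state of generative AI
  bubbles: BubbleView[];
}

function planResponse(parts: ResponsePart[]): ResponseView {
  const bubbles = parts.map((part): BubbleView => {
    if (part.kind === "text") {
      // Text is streamed into its bubble as it arrives.
      return { kind: "streamingText", text: part.streamedSoFar };
    }
    // Until the element has loaded, show a loading bar with loading text
    // in the bubble where the element will be rendered.
    return part.loaded
      ? { kind: "element", label: part.label }
      : { kind: "loadingBar", loadingText: `Loading ${part.label}` };
  });
  // Keep the avatar in its loading state until every part is complete.
  const avatarLoading = parts.some((part) => (part.kind === "text" ? !part.done : !part.loaded));
  return { avatarLoading, bubbles };
}
```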
General guidelines
Do
- Reserve streaming for text and inline code snippets only
Choose a loading state that complements the type of content generative AI is returning to the user. If the response includes several types of content, such as text, inline code snippets, individual code blocks, and tables, apply streaming to text and inline code snippets only. Show the loading bar for other content types, such as tables and code blocks.
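This guideline amounts to a mapping from content type to loading treatment. A minimal sketch; the type names and the `loadingTreatment` function are hypothetical, and `resourceList` stands for the list of resources mentioned earlier.

```typescript
// Content types named in the guidelines; "resourceList" covers a list of resources.
type ContentType = "text" | "inlineCode" | "codeBlock" | "table" | "resourceList";

type LoadingTreatment = "stream" | "loadingBar";

function loadingTreatment(type: ContentType): LoadingTreatment {
  switch (type) {
    case "text":
    case "inlineCode":
      // Only text and inline code snippets are streamed.
      return "stream";
    default:
      // Everything else shows a loading bar until it is fully generated.
      return "loadingBar";
  }
}
```

Sending every type other than text and inline code to the `default` branch means a newly added content type gets a loading bar rather than being streamed by accident.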
Don't
- Avoid displaying a loading state for under one second
Displaying a loading state for under one second can seem jarring to users and can cause flickering in the UI. For example, if the model supports streaming and has a short processing stage, don't display the loading text in the chat bubble; start streaming the response directly instead.
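One possible way to avoid briefly flashing loading text is to wait before showing it. This is a minimal sketch, assuming the client records when the prompt was sent and when the first output arrived, in milliseconds. `showLoadingText` and the 1,000 ms delay, taken from the one-second guidance above, are hypothetical choices rather than a documented API.

```typescript
// Hypothetical delay based on the guideline: don't show a loading state
// that would last under one second.
const LOADING_TEXT_DELAY_MS = 1000;

// Decide whether loading text should be visible at time `nowMs`.
// `firstOutputAtMs` is null until the model returns its first output.
function showLoadingText(
  promptSentAtMs: number,
  nowMs: number,
  firstOutputAtMs: number | null,
  delayMs = LOADING_TEXT_DELAY_MS,
): boolean {
  if (firstOutputAtMs !== null && firstOutputAtMs <= nowMs) {
    // Output is already streaming: show it instead of loading text.
    return false;
  }
  // Show loading text only once processing has lasted at least `delayMs`.
  return nowMs - promptSentAtMs >= delayMs;
}
```

This covers the case where output starts quickly. If the first output arrives shortly after the delay, the loading text still appears briefly; enforcing a minimum display time would prevent that, at the cost of holding back streamed output.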
Writing guidelines
General writing guidelines
Use sentence case, but continue to capitalize proper nouns and brand names correctly in context.
Use end punctuation, except in headers and buttons. Don’t use exclamation points.
Use present-tense verbs and active voice.
Don't use please, thank you, ellipsis (...), ampersand (&), e.g., i.e., or etc. in writing.
Avoid directional language.
For example: use previous not above, use following not below.
Use device-independent language.
For example: use choose or select not click.
Component-specific guidelines
Loading message
Use the format: [Generating/Loading] [specific artifact]
Use generating when generative AI is compiling and building something net new.
For example: Generating a response
Use loading or fetching when generative AI is pulling from something that already exists somewhere.
For example: Loading list of S3 buckets
Avoid end punctuation.
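The message format can be captured in a small helper. A minimal sketch; `loadingMessage` and `LoadingVerb` are hypothetical names, and choosing the verb (generating for net new content, loading or fetching for existing content) is left to the caller.

```typescript
type LoadingVerb = "generating" | "loading" | "fetching";

// Build a loading message in the format [Generating/Loading] [specific artifact].
function loadingMessage(verb: LoadingVerb, artifact: string): string {
  // Sentence case: capitalize only the first letter of the verb.
  const capitalized = verb.charAt(0).toUpperCase() + verb.slice(1);
  // Drop any end punctuation, including an ellipsis.
  const cleaned = artifact.trim().replace(/[.!?]+$/, "");
  return `${capitalized} ${cleaned}`;
}

// Usage
console.log(loadingMessage("generating", "a response"));
console.log(loadingMessage("loading", "list of S3 buckets"));
```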
Accessibility guidelines
General accessibility guidelines
Follow the guidelines on alternative text and Accessible Rich Internet Applications (ARIA) regions for each component.
Make sure to define ARIA labels aligned with the language context of your application.
Don't add unnecessary markup for roles and landmarks. Follow the guidelines for each component.
Provide keyboard functionality to all available content in a logical and predictable order. The flow of information should make sense.
Component-specific guidelines
Follow guidelines for avatar and loading bar.