Generative AI loading states

Components

Avatar

Visual representation of a user or generative AI entity.

Loading bar

A linear loading indicator that informs the user about an ongoing operation with unknown duration.

Key UX concepts

Latency

Generative AI experiences are commonly built on large language models (LLMs) and foundation models (FMs). The models available today vary in how quickly they process a prompt and generate an output. As a result, the time a user waits for generative AI to return an output for their prompt (also called latency) can vary. To inform users that generative AI is actively working on their prompt, display a loading state for that duration.

Stages of loading

The duration for which a loading state is displayed while generative AI processes and generates a response depends on several factors, such as latency, the complexity of the prompt sent by the user, and the type of content being generated, such as text or images. Generative AI loading states can be categorized into two main stages:

Generative AI loading stages: processing and generation

After a user sends a prompt, generative AI starts processing it and has no output to return to the user yet. This stage is called processing. Once generative AI starts generating a response while it continues to process the prompt, it moves into the generation stage. The type of loading state to display depends on the duration and prominence of each stage.

Loading indicators and text

Loading indicators are visual elements that inform users about an ongoing operation. These indicators can be dynamic and use animation to capture users' attention. Generative AI loading states use the avatar and loading bar components as loading indicators. Pair loading indicators with visible loading text that describes the current state, so that the information is accessible to all users.

Streaming

Streaming is a loading state where the output is displayed incrementally as the model generates it. It is a built-in capability of a model: as soon as output tokens are generated, they are returned to the user. Streaming can only be enabled for models that support it. When enabled, generative AI returns a response to the user per character, word, or sentence based on its processing capabilities, and the response usually appears as if it's being typed. Streaming starts in the generation stage of generative AI loading.
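The incremental display described here can be sketched in TypeScript. This is a minimal illustration, not a real model API: `chunkResponse`, `renderIncrementally`, and the `Granularity` type are hypothetical names, and a complete string stands in for the tokens a model would return over time.

```typescript
// Hypothetical granularity at which a response is revealed to the user.
type Granularity = "character" | "word" | "sentence";

// Split a response into chunks. A real model would emit chunks over time;
// here a finished string stands in for that stream (an assumption).
function chunkResponse(text: string, granularity: Granularity): string[] {
  if (granularity === "character") {
    return Array.from(text);
  }
  const pattern =
    granularity === "word"
      ? /\s*\S+\s*/g // each word plus its surrounding whitespace
      : /[^.!?]*[.!?]+\s*|[^.!?]+$/g; // text up to end punctuation, or the remainder
  return text.match(pattern) ?? [];
}

// Build the successive states of the chat bubble as chunks arrive.
function renderIncrementally(chunks: string[]): string[] {
  const frames: string[] = [];
  let shown = "";
  for (const chunk of chunks) {
    shown += chunk;
    frames.push(shown);
  }
  return frames;
}
```

Each entry in the array returned by `renderIncrementally` represents what the chat bubble would contain after one more chunk arrives, which is what produces the typing effect.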

Keep users informed

Optimization of latency in any generative AI experience depends on model capabilities, system architecture and complexity of prompts. However, the overall user experience can be enhanced by keeping users informed when generative AI is loading. It can help build trust and transparency with users. The following are some mechanisms to keep in mind:

Show a loading indicator with loading text. For example, the avatar component loading state with complementary loading text in the chat bubble in a conversational generative AI experience.
Reduce time to response by incrementally returning a response instead of waiting for the complete response to be generated. For example, streaming a text response if supported by the model.
Show a loading state based on the type of content being loaded. For example, in conversational generative AI experiences, stream text responses and display a loading bar in the chat bubble to show that generative AI is loading a list of resources.

Common use cases

Processing a response

When generative AI is in the processing stage after receiving a prompt from the user in a conversation, display a loading indicator, such as the avatar, with loading text in the chat bubble next to it.

Generative AI assistant

What can I do with Amazon S3?

Generating a response

Generating a text response

When generative AI moves to the generation stage in a conversation, start streaming the response if your model supports it. Incrementally return the text response and inline code snippets in the chat bubble to the user. Consider the following factors:

Display the loading avatar to give users an additional signal about the overall state of generative AI. This is especially helpful when streaming slows intermittently, which could otherwise lead users to believe that loading has finished.
If your model does not support streaming, display the loading avatar and loading text for both processing and generation stages of loading.
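The choice of loading state for a text response can be sketched as a small pure function. The names below (`LoadingStage`, `ChatLoadingView`, `textResponseView`) are hypothetical and not part of any component API. The `complete` stage, and hiding the loading text once streamed text appears in the chat bubble, are assumptions added for illustration.

```typescript
// "complete" is an assumed third stage so the sketch can turn indicators off.
type LoadingStage = "processing" | "generation" | "complete";

interface ChatLoadingView {
  avatarLoading: boolean; // avatar shows its loading state
  loadingTextVisible: boolean; // loading text shown in the chat bubble
  streamingText: boolean; // response text is being streamed into the bubble
}

function textResponseView(stage: LoadingStage, supportsStreaming: boolean): ChatLoadingView {
  if (stage === "complete") {
    return { avatarLoading: false, loadingTextVisible: false, streamingText: false };
  }
  if (stage === "generation" && supportsStreaming) {
    // Keep the loading avatar so pauses in the stream do not look like completion.
    return { avatarLoading: true, loadingTextVisible: false, streamingText: true };
  }
  // Processing stage, or generation stage without streaming support.
  return { avatarLoading: true, loadingTextVisible: true, streamingText: false };
}
```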

Generative AI assistant

What can I do with Amazon S3?

Amazon S3 (Simple Storage Service) is a highly scalable, durable, and secure object storage service offered by Amazon Web Services (AWS).

Generating a response that contains other UI elements

When generative AI moves to the generation stage in a conversation and the response includes other UI elements, such as a list of resources, a table, or a code block, display the loading bar with adjacent loading text. Consider the following factors:

Display the loading bar with loading text in the chat bubble where the respective UI element will be rendered after loading is complete.
Display the loading avatar to indicate the overall state of generative AI, and stream text into the first chat bubble.
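One way to express this split between streamed text and loading bars is a per-part plan. The sketch below uses hypothetical names (`ContentType`, `ResponsePart`, `BubblePlan`, `planBubble`), and it assumes the response can be divided into typed parts before rendering.

```typescript
// Assumed content types for the parts of a single response.
type ContentType = "text" | "inlineCode" | "codeBlock" | "table" | "resourceList";

interface ResponsePart {
  type: ContentType;
  artifact: string; // what is being loaded, for example "list of EC2 instances"
}

interface BubblePlan {
  treatment: "stream" | "loadingBar";
  loadingText?: string; // only set when a loading bar is shown
}

function planBubble(part: ResponsePart): BubblePlan {
  // Text and inline code snippets are streamed into the chat bubble.
  if (part.type === "text" || part.type === "inlineCode") {
    return { treatment: "stream" };
  }
  // Other UI elements get a loading bar with adjacent loading text.
  return { treatment: "loadingBar", loadingText: `Loading ${part.artifact}` };
}
```

For the EC2 example in this section, the introductory sentence would be streamed, and the instance list would map to a loading bar with the text "Loading list of EC2 instances".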

Generative AI assistant

Show me all my EC2 instances in us-east-1.

You have 12 instances in Amazon Elastic Compute Cloud. Here are the details about all resources in us-east-1:

Loading list of EC2 instances

General guidelines

Do

Reserve streaming for text and inline code snippets only
Display a loading state for generative AI that complements the type of content it is returning to the user. If the response includes several types of content, such as text, inline code snippets, individual code blocks, and tables, apply streaming to text and inline code snippets only. Show the loading bar for other content types, such as tables and code blocks.

Don't

Avoid displaying a loading state for under one second
Displaying a loading state for under one second can seem jarring to users and can cause flickering in the UI. For example, if the model supports streaming and has a rapid processing stage, don’t display the loading text in the chat bubble. Instead, start streaming the response directly.
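A minimal sketch of this check follows. It assumes the application has latency estimates for its model, for example from past measurements. `LatencyEstimate`, `msUntilFirstContent`, and `shouldShowLoadingText` are hypothetical names.

```typescript
// One second, per the guideline above.
const MIN_LOADING_STATE_MS = 1000;

// Assumed per-model estimates, for example averaged from earlier requests.
interface LatencyEstimate {
  processingMs: number;
  generationMs: number;
}

// With streaming, content appears when processing ends.
// Without streaming, it appears only after generation also ends.
function msUntilFirstContent(estimate: LatencyEstimate, supportsStreaming: boolean): number {
  return supportsStreaming
    ? estimate.processingMs
    : estimate.processingMs + estimate.generationMs;
}

// Skip the loading text when it would be visible for under one second.
function shouldShowLoadingText(estimate: LatencyEstimate, supportsStreaming: boolean): boolean {
  return msUntilFirstContent(estimate, supportsStreaming) >= MIN_LOADING_STATE_MS;
}
```

With an estimate of 300 ms processing and 4000 ms generation, a streaming model skips the loading text and starts streaming directly, while a non-streaming model shows it.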

Writing guidelines

General writing guidelines

Use sentence case, but continue to capitalize proper nouns and brand names correctly in context.
Use end punctuation, except in headers and buttons. Don’t use exclamation points.
Use present-tense verbs and active voice.
Don't use please, thank you, ellipsis (...), ampersand (&), e.g., i.e., or etc. in writing.
Avoid directional language.
- For example: use previous not above, use following not below.
Use device-independent language.
- For example: use choose or select not click.

Component-specific guidelines

Loading message

Use the format: [Generating/Loading] [specific artifact]
Use generating when generative AI is compiling and building something new.
- For example: Generating a response
Use loading or fetching when generative AI is pulling from something that already exists somewhere.
- For example: Loading list of S3 buckets
Avoid end punctuation.
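The message format can be captured in a small helper. This is a sketch with hypothetical names (`ArtifactSource`, `loadingMessage`). It always uses "Loading" rather than "Fetching" for existing content, which is a simplification.

```typescript
// "new": generative AI builds the artifact. "existing": it already exists somewhere.
type ArtifactSource = "new" | "existing";

function loadingMessage(source: ArtifactSource, artifact: string): string {
  const verb = source === "new" ? "Generating" : "Loading";
  // Strip trailing end punctuation and ellipses, which the guidelines disallow.
  const cleaned = artifact.trim().replace(/[.!?\u2026]+$/, "");
  return `${verb} ${cleaned}`;
}

console.log(loadingMessage("new", "a response")); // Generating a response
console.log(loadingMessage("existing", "list of S3 buckets...")); // Loading list of S3 buckets
```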

Accessibility guidelines

General accessibility guidelines

Follow the guidelines on alternative text and Accessible Rich Internet Applications (ARIA) regions for each component.
Make sure to define ARIA labels aligned with the language context of your application.
Don't add unnecessary markup for roles and landmarks. Follow the guidelines for each component.
Provide keyboard functionality to all available content in a logical and predictable order. The flow of information should make sense.

Component-specific guidelines

Follow guidelines for avatar and loading bar.
