Difficulty: easy
Description: Design ChatGPT
ChatGPT is an AI assistant powered by large language models (LLMs) that can understand and generate human-like text. When a user interacts with ChatGPT, they provide a text prompt and receive a generated response. This process involves several key components and concepts.
In this design problem, we are specifically focusing on the inference infrastructure - the system that allows users to interact with an already trained LLM. We are not designing the training infrastructure, which is a separate, computationally intensive process that creates the model.
Training (Not in Scope): Dataset Collection & Curation, LLM Architecture, Training, Model Parameter Optimization
Inference (Our Focus): User Prompt Processing, Token Generation & Streaming, Chat History Management, Deployment
In other words, we are designing ChatGPT, the application that allows users to chat with an AI assistant.
Inference is the process of using a trained model to generate responses: the user's prompt is tokenized, passed through the model, and output tokens are generated one at a time until the response is complete.
The inference process is computationally expensive, requiring specialized hardware like GPUs or TPUs. This is why most applications don't run LLMs locally but instead connect to remote inference servers.
In this problem, we're designing the application infrastructure that processes user prompts, streams generated tokens back to the client, and manages chat history.
We'll assume the existence of an inference server that provides the actual model capabilities, and focus on building the system around it.
The traffic pattern of ChatGPT differs significantly from that of typical consumer applications. In contrast to platforms like Instagram—where content written by one user is read by many others—ChatGPT serves a strictly 1:1 interaction model between each user and the AI assistant.
Additionally, in real-world systems, it's common for traffic to be throttled or dropped when inference servers are overwhelmed. (You've probably seen the “Please try again later” message while chatting with overloaded AI services.)
Unique Load Profile:
This means we don't need to use our master system design template, which is optimized for read-heavy workloads and eventual consistency.
Instead, our focus should be on:
Login and Retrieve Chat History: Users should be able to retrieve and view their chat history upon login.
Send Chat and Store Response: Users send new prompts and receive real-time AI-generated responses.
Retrieve past chat history for the current user session.
Response Body:
{
  chats: [
    {
      id: string,
      user_message: string,
      assistant_response: string,
      timestamp: string
    }
  ]
}
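For illustration, a populated response might look like this (all values are made up; the route name isn't fixed by the spec above):

{
  "chats": [
    {
      "id": "c_001",
      "user_message": "Explain system design interviews.",
      "assistant_response": "A system design interview evaluates how you architect large-scale systems...",
      "timestamp": "2024-05-01T12:00:00Z"
    }
  ]
}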
Submit a new chat prompt and receive a streamed or full response.
Request Body:
{
  message: string,
  stream: boolean (optional, default = false)
}
Response Body:
Non-streaming (stream=false):
{
  message_id: string,
  response: string
}

Streaming (stream=true):
data: {"content": "Hello"}
data: {"content": " world"}
data: [DONE]
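As a sketch, a client could call this endpoint with fetch (the /chat path is an assumption; the spec above only defines the request and response bodies):

// Non-streaming call: send the prompt and wait for the full response.
async function sendChat(message) {
  const res = await fetch('/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message, stream: false }),
  });
  return res.json(); // { message_id, response }
}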
Persist a completed conversation turn (user + assistant) into the database.
Request Body:
{
  message_id: string,
  user_message: string,
  assistant_response: string,
  timestamp: string
}
Response Body:
{ status: "success" }
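For example, a persisted turn might carry values like these (all illustrative):

{
  "message_id": "m_7f3a",
  "user_message": "What is SSE?",
  "assistant_response": "Server-Sent Events let the server push text to the browser over a single HTTP connection.",
  "timestamp": "2024-05-01T12:02:10Z"
}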
Users should be able to retrieve and view their chat history upon login.
When the user logs in, the Client calls the App Server to fetch chat history. The App Server queries the Database using user_id, retrieves chat logs, and sends them back to the client to render.
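A minimal sketch of this lookup on the App Server, assuming Express, session middleware that stores user_id at login, and a hypothetical db.query helper (none of which are prescribed above):

const express = require('express');
const app = express();

// GET /chats — return the chat history for the logged-in user.
// `db.query` stands in for the real database client.
app.get('/chats', async (req, res) => {
  const userId = req.session.user_id; // assumed to be set during login
  const chats = await db.query(
    'SELECT id, user_message, assistant_response, timestamp FROM chats WHERE user_id = ? ORDER BY timestamp',
    [userId]
  );
  res.json({ chats }); // matches the response body shape defined earlier
});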
Users send new prompts and receive real-time AI-generated responses.
The Client sends a new chat message to the App Server. It forwards the prompt to the Inference Server. The response from the LLM is returned to the App Server.
Afterward, the App Server requests the Database to store the full chat exchange (user + assistant message). Once confirmed, the App Server delivers the response to the Client.
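Sketched as an Express handler; the inference server's URL and payload shape, the db.saveChat helper, and the session-based user_id are all assumptions, since the walkthrough above only fixes the order of steps:

const express = require('express');
const app = express();
app.use(express.json());

// POST /chat — forward the prompt, persist the exchange, then reply.
app.post('/chat', async (req, res) => {
  const { message } = req.body;

  // 1. Forward the prompt to the Inference Server (endpoint shape assumed).
  const llmRes = await fetch('http://inference-server/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt: message }),
  });
  const { text } = await llmRes.json();

  // 2. Store the full chat exchange (user + assistant message).
  //    `db.saveChat` is a placeholder for the real database call.
  const messageId = await db.saveChat({
    user_id: req.session.user_id,
    user_message: message,
    assistant_response: text,
    timestamp: new Date().toISOString(),
  });

  // 3. Once the write is confirmed, deliver the response to the client.
  res.json({ message_id: messageId, response: text });
});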
Streaming in ChatGPT is implemented using incremental token generation on the server and real-time rendering on the frontend. This improves perceived latency and mimics human-like typing behavior.
Streaming is enabled by setting the stream=true flag on the API request. Example token stream:
data: {"choices":[{"delta":{"content":"The"}}]}
data: {"choices":[{"delta":{"content":" cat"}}]}
data: [DONE]
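On the server side, one way to produce this stream is Server-Sent Events: write one data: line per token as the inference server yields it, then the [DONE] sentinel. A rough sketch, assuming Express and a placeholder inference.streamTokens client (GET is used because EventSource, shown below, can only issue GET requests):

const express = require('express');
const app = express();

app.get('/chat', async (req, res) => {
  // SSE headers keep the connection open and mark the body as an event stream.
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  // Relay tokens to the client as the inference server generates them.
  for await (const token of inference.streamTokens(req.query.message)) {
    res.write(`data: ${JSON.stringify({ choices: [{ delta: { content: token } }] })}\n\n`);
  }

  res.write('data: [DONE]\n\n'); // tells the UI to stop rendering
  res.end();
});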
On the frontend, the client consumes the stream with EventSource. The [DONE] signal tells the UI to stop rendering. Example code:
const source = new EventSource('/chat?stream=true');
source.onmessage = (e) => {
  if (e.data === '[DONE]') {
    source.close(); // stop rendering once the stream ends
    return;
  }
  const token = JSON.parse(e.data).choices[0].delta.content;
  appendToTextbox(token);
};
This part of the guide will focus on the various components that are often used to construct a system (the building blocks), and the design templates that provide a framework for structuring these blocks.
At the bare minimum, you should know the core building blocks of system design.
With these building blocks, you will be able to apply our template to solve many system design problems. We will dive into the details in the Design Template section. Here's a sneak peek:
Additionally, you will want to understand these concepts.
On top of these, there is ad hoc knowledge you will want to pick up for certain problems: for example, geohashing for designing location-based services like Yelp or Uber, or operational transforms for problems like designing Google Docs. You can learn these on a case-by-case basis. System design interviews are supposed to test your general design skills, not specific knowledge.
Finally, we have a series of practical problems for you to work through. You can find them in /problems. This hands-on practice will not only help you apply the principles learned but will also enhance your understanding of how to use the building blocks to construct effective solutions. The list of questions keeps growing; we are actively adding more.