Building Real-Time NPC Dialogue in Unity with Large Language Models
Introduction
Real-time dialogue systems enhance immersion in games, especially when players can interact with NPCs dynamically. Integrating a large language model (LLM) like Qwen or OpenAI’s GPT into Unity allows for responsive NPCs that generate contextually appropriate replies and trigger in-game events. This blog post explains how to:
- Use Unity to call an LLM API.
- Construct effective prompts.
- Parse model responses into NPC dialogues and game events.
Setting Up the API Connection
Prerequisites
- Unity Engine installed (2021.3 or later).
- Familiarity with C# scripting.
- Access to an LLM API (e.g., Qwen or OpenAI GPT). Obtain an API key and endpoint URL from the provider.
Script Overview
We’ll use Unity’s UnityWebRequest for API calls. The script connects to the Qwen model to generate NPC replies. Here’s a step-by-step breakdown.
The apiKey and endpoint fields should be replaced with the credentials provided by your LLM service. The model field specifies which version of the language model to use.
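As a minimal sketch, the three fields might be declared on a MonoBehaviour like this. The class name NpcDialogueConfig, the endpoint URL, and the model name are placeholders assumed for illustration, not values taken from the original script.

```csharp
using UnityEngine;

// Hypothetical component: the class name and all default values are illustrative only.
public class NpcDialogueConfig : MonoBehaviour
{
    // Placeholder key. Avoid committing a real key to version control.
    [SerializeField] private string apiKey = "YOUR_API_KEY";

    // Placeholder URL: use the chat endpoint your provider documents.
    [SerializeField] private string endpoint = "https://example.com/v1/chat/completions";

    // Placeholder model name: use an identifier your provider lists.
    [SerializeField] private string model = "your-model-name";

    // Read-only accessors for other scripts.
    public string ApiKey => apiKey;
    public string Endpoint => endpoint;
    public string Model => model;
}
```

Exposing the values through [SerializeField] lets you set them per scene in the Inspector without editing code.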
Sending Messages
The SendMessageToYiQianWen function initiates communication with the API. The function:
- Calls GetYiQianWenResponse to retrieve the model’s reply.
- Displays the response on a Unity TextMeshPro (TMP) element.
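The two steps above can be sketched as a coroutine pair. This is one possible shape, not the original listing: the method signatures, the class name NpcDialogueClient, the placeholder endpoint, and the Bearer-token Authorization header are all assumptions, so check your provider’s documentation for the exact authentication scheme.

```csharp
using System.Collections;
using System.Text;
using TMPro;
using UnityEngine;
using UnityEngine.Networking;

// Hypothetical client component; names and signatures are assumed for illustration.
public class NpcDialogueClient : MonoBehaviour
{
    [SerializeField] private string apiKey = "YOUR_API_KEY";
    [SerializeField] private string endpoint = "https://example.com/v1/chat/completions";
    [SerializeField] private TMP_Text responseText;

    // Entry point: starts the request as a coroutine so the main thread is not blocked.
    public void SendMessageToYiQianWen(string requestJson)
    {
        StartCoroutine(GetYiQianWenResponse(requestJson));
    }

    private IEnumerator GetYiQianWenResponse(string requestJson)
    {
        using (var request = new UnityWebRequest(endpoint, UnityWebRequest.kHttpVerbPOST))
        {
            request.uploadHandler = new UploadHandlerRaw(Encoding.UTF8.GetBytes(requestJson));
            request.downloadHandler = new DownloadHandlerBuffer();
            request.SetRequestHeader("Content-Type", "application/json");
            // Assumption: the provider expects a Bearer token.
            request.SetRequestHeader("Authorization", "Bearer " + apiKey);

            yield return request.SendWebRequest();

            if (request.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError("LLM request failed: " + request.error);
                yield break;
            }

            // Shows the raw response body; parsing is covered in "Parsing Model Responses".
            if (responseText != null)
            {
                responseText.text = request.downloadHandler.text;
            }
        }
    }
}
```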
Constructing the API Request
Requests are formatted in JSON, adhering to the LLM’s API schema.
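A minimal sketch of building such a request body follows. It assumes an OpenAI-style chat schema (a model field plus a messages array of role/content pairs) and the Newtonsoft.Json package; ChatRequestBuilder is a hypothetical helper name, and your provider’s schema may differ.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;

// Hypothetical helper. Field names follow the OpenAI-style chat schema (an assumption).
public static class ChatRequestBuilder
{
    public static string Build(string model, string systemPrompt, string playerText)
    {
        var body = new Dictionary<string, object>
        {
            ["model"] = model,
            ["messages"] = new[]
            {
                new Dictionary<string, string> { ["role"] = "system", ["content"] = systemPrompt },
                new Dictionary<string, string> { ["role"] = "user", ["content"] = playerText },
            },
        };

        // Serializing with a JSON library escapes quotes and newlines in player input.
        return JsonConvert.SerializeObject(body);
    }
}
```

Building the body with a serializer rather than string concatenation keeps the request valid even when the player’s text contains quotes or line breaks.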
Handling API Responses
The DisplayResponse function processes and presents model responses. For structured replies, the model should be instructed to return JSON, which can then be parsed into usable components such as NPC replies and game triggers.
Constructing Prompts for Real-Time Dialogue
The prompt guides the LLM’s behavior.
Tips for Effective Prompts:
- Define NPC Roles: Specify the NPC’s personality, knowledge scope, and restrictions.
- Set Context: Include game lore or NPC backstory to keep responses relevant.
- Output Formatting: Clearly specify response structure (e.g., JSON).
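The tips above can be combined into a single system prompt. The sketch below is illustrative only: NpcPrompt, the wording of the instructions, and the JSON field names reply and trigger_clue are all made up for this example.

```csharp
// Hypothetical prompt builder. The wording and JSON field names are illustrative only.
public static class NpcPrompt
{
    public static string Build(string npcName, string persona, string lore)
    {
        return
            // Role: who the NPC is and how it behaves.
            "You are " + npcName + ", a character in a video game. " + persona + "\n" +
            // Context: lore or backstory that keeps replies relevant.
            "Background you know: " + lore + "\n" +
            // Restrictions: keep the NPC inside the game world.
            "Stay in character and do not discuss anything outside the game world.\n" +
            // Output format: a fixed JSON shape that the game can parse.
            "Reply with a single JSON object and nothing else, in this form:\n" +
            "{\"reply\": \"<what you say to the player>\", \"trigger_clue\": \"<clue id or empty string>\"}";
    }
}
```

For example, NpcPrompt.Build("Mara", "You are a wary innkeeper who speaks curtly.", "A merchant vanished from the inn last night.") produces a prompt that fixes the role, the context, and the output shape in one string.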
Parsing Model Responses
The API response often includes nested JSON: the model’s structured reply arrives as a JSON string inside the content field of the outer response.
To extract data:
- Parse the outer response:
  var outerResponse = JsonConvert.DeserializeObject<OuterResponse>(response);
  string innerContent = outerResponse.choices[0].message.content;
- Parse the inner JSON:
  var innerResponse = JsonConvert.DeserializeObject<NPCResponse>(innerContent);
- Use the extracted data to update game state and UI.
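The OuterResponse and NPCResponse types are not shown in this post, so here is one way they might look, with a defensive wrapper around the two parsing steps. The outer classes mirror the choices[0].message.content path used above; the inner fields reply and trigger_clue and the NpcResponseParser helper are assumptions and should match whatever your prompt asks the model to return.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;

// Mirrors the choices[0].message.content path used above.
public class OuterResponse { public List<Choice> choices; }
public class Choice { public Message message; }
public class Message { public string role; public string content; }

// Hypothetical inner schema: match the field names to what your prompt requests.
public class NPCResponse
{
    public string reply;
    public string trigger_clue;
}

public static class NpcResponseParser
{
    // Returns null when the response is not in the expected shape.
    public static NPCResponse Parse(string response)
    {
        try
        {
            var outer = JsonConvert.DeserializeObject<OuterResponse>(response);
            if (outer?.choices == null || outer.choices.Count == 0) return null;

            string inner = outer.choices[0].message?.content;
            if (string.IsNullOrEmpty(inner)) return null;

            return JsonConvert.DeserializeObject<NPCResponse>(inner);
        }
        catch (JsonException)
        {
            // The model may answer in plain prose instead of JSON; treat that as a failure.
            return null;
        }
    }
}
```

Returning null instead of throwing lets the caller fall back to a stock line of dialogue when the model ignores the requested format.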
Triggering In-Game Events
The TriggerClue method handles game actions linked to the response.
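One possible shape for such a method is a lookup from clue ids to game actions. This is a sketch under assumptions: ClueDispatcher, Register, the string clue ids, and the fire-once rule are hypothetical and not taken from the original script.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical dispatcher: maps clue ids returned by the model to game actions.
public class ClueDispatcher
{
    private readonly Dictionary<string, Action> handlers = new Dictionary<string, Action>();
    private readonly HashSet<string> fired = new HashSet<string>();

    public void Register(string clueId, Action handler)
    {
        handlers[clueId] = handler;
    }

    // Returns true if a handler ran. Unknown ids are ignored because model output is untrusted.
    public bool TriggerClue(string clueId)
    {
        if (string.IsNullOrEmpty(clueId)) return false;
        if (!handlers.TryGetValue(clueId, out var handler)) return false;
        if (!fired.Add(clueId)) return false; // each clue fires at most once

        handler();
        return true;
    }
}
```

Restricting triggers to a registered set means the model can only select from actions the game already defines, never invent new ones.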
Real-Time Speech Integration
For immersive dialogue:
- Speech-to-Text (STT): Use a transcriber to convert player speech into text.
  realtimeTranscriber.StartListening();
  bufferText += transcribedText + " ";
- Text-to-Speech (TTS): Convert NPC replies into audio using a TTS system.
  SSTTS.SendTTSRequest(npcReply);
Periodic Updates
Buffering transcriptions and sending them at intervals, rather than calling the API for every transcribed fragment, reduces the number of requests.
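A minimal sketch of this buffering follows. It is engine-independent so the timing logic is easy to follow; TranscriptBuffer, its method names, and the interval you pass in are assumptions for illustration.

```csharp
using System;
using System.Text;

// Hypothetical buffer: collects transcribed fragments and flushes them on a timer.
public class TranscriptBuffer
{
    private readonly StringBuilder buffer = new StringBuilder();
    private readonly float interval;
    private readonly Action<string> onFlush;
    private float elapsed;

    public TranscriptBuffer(float intervalSeconds, Action<string> onFlush)
    {
        interval = intervalSeconds;
        this.onFlush = onFlush;
    }

    // Adds one transcribed fragment, ignoring empty results.
    public void Append(string transcribedText)
    {
        if (!string.IsNullOrWhiteSpace(transcribedText))
        {
            buffer.Append(transcribedText.Trim()).Append(' ');
        }
    }

    // Call once per frame, for example from Update() with Time.deltaTime.
    public void Tick(float deltaTime)
    {
        elapsed += deltaTime;
        if (elapsed < interval) return;

        elapsed = 0f;
        if (buffer.Length == 0) return; // nothing was said during this interval

        string text = buffer.ToString().TrimEnd();
        buffer.Clear();
        onFlush(text);
    }
}
```

In a MonoBehaviour, you would call Append from the transcriber’s result callback and Tick(Time.deltaTime) from Update, passing the method that sends the message as onFlush.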
Conclusion
This setup enables rich, real-time NPC dialogue powered by LLMs in Unity. Well-structured prompts, robust API handling, and tight Unity integration together create a dynamic player experience. As LLMs improve, so does the potential for more interactive storytelling.
About this Post
This post is written by FFFeiya, licensed under CC BY-NC 4.0.