January 8, 2025

Building Real-Time NPC Dialogue in Unity with Large Language Models

Introduction

Real-time dialogue systems enhance immersion in games, especially when players can interact with NPCs dynamically. Integrating a large language model (LLM) like Qwen or OpenAI’s GPT into Unity allows for responsive NPCs that generate contextually appropriate replies and trigger in-game events. This blog post explains how to:

Use Unity to call an LLM API.
Construct effective prompts.
Parse model responses into NPC dialogues and game events.

Setting Up the API Connection

Prerequisites

Unity Engine installed (2021.3 or later).
Familiarity with C# scripting.
Access to an LLM API (e.g., Qwen or OpenAI GPT). Obtain an API key and endpoint URL from the provider.

Script Overview

We’ll use Unity’s UnityWebRequest for API calls. The script connects to the Qwen model to generate NPC replies.

Here’s a step-by-step breakdown:

public string apiKey = "your-api-key";
public string endpoint = "your-api-endpoint";
public string model = "qwen-plus"; // Replace with the specific model name

Replace apiKey and endpoint with the values supplied by your LLM provider. The model field selects which language model handles the request. Avoid committing a real API key to version control.

Sending Messages

The SendMessageToYiQianWen function initiates communication with the API:

public async void SendMessageToYiQianWen(string message)
{
    try
    {
        string response = await GetYiQianWenResponse(message);
        DisplayResponse(response);
    }
    catch (Exception ex)
    {
        Debug.LogError($"Error: {ex.Message}");
        DisplayResponse("Network or API configuration issue.");
    }
}

The function:

Calls GetYiQianWenResponse to retrieve the model’s reply.
Displays the response on a Unity TextMeshPro (TMP) element.

Constructing the API Request

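The request body is JSON and must follow the provider's API schema. The example uses a chat-style messages array: a system message sets the model's behavior, and a user message carries the player's input.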

var requestData = new
{
    model = model,
    messages = new[]
    {
        new { role = "system", content = "You are a helpful assistant." },
        new { role = "user", content = message }
    }
};
string jsonData = JsonConvert.SerializeObject(requestData);
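
The post does not show how the serialized request is sent, so here is a minimal sketch of that step with UnityWebRequest. The helper name PostChatRequestAsync and the class LlmHttp are hypothetical, and the "Authorization: Bearer" header is an assumption that holds for OpenAI-style endpoints; check your provider's documentation for the exact headers it expects.

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using UnityEngine.Networking;

public static class LlmHttp
{
    // Hypothetical helper: POSTs the JSON body and returns the raw response text.
    public static async Task<string> PostChatRequestAsync(string endpoint, string apiKey, string jsonData)
    {
        using (var request = new UnityWebRequest(endpoint, UnityWebRequest.kHttpVerbPOST))
        {
            request.uploadHandler = new UploadHandlerRaw(Encoding.UTF8.GetBytes(jsonData));
            request.downloadHandler = new DownloadHandlerBuffer();
            request.SetRequestHeader("Content-Type", "application/json");
            // Assumption: the provider uses bearer-token authentication.
            request.SetRequestHeader("Authorization", "Bearer " + apiKey);

            // Poll the operation instead of blocking the main thread.
            var operation = request.SendWebRequest();
            while (!operation.isDone)
            {
                await Task.Yield();
            }

            if (request.result != UnityWebRequest.Result.Success)
            {
                throw new Exception(request.error);
            }

            return request.downloadHandler.text;
        }
    }
}
```

Throwing on failure fits the try/catch in SendMessageToYiQianWen, which logs the error and shows a fallback message.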

Handling API Responses

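The DisplayResponse function parses the model's reply and presents it. Because the prompt asks the model to answer in JSON, the reply can be split into an NPC line and a list of game triggers. JsonConvert.DeserializeObject throws on malformed input rather than returning null, so the parse is wrapped in a try/catch. This also lets the plain-text fallback message from SendMessageToYiQianWen reach the screen:

public void DisplayResponse(string response)
{
    NPCResponse parsedResponse = null;
    try
    {
        parsedResponse = JsonConvert.DeserializeObject<NPCResponse>(response);
    }
    catch (JsonException ex)
    {
        // Plain text, such as the fallback error message, is not valid JSON.
        Debug.LogWarning($"Invalid response format: {ex.Message}");
    }

    if (parsedResponse != null)
    {
        responseText.text = parsedResponse.npc_reply;
        // Looping over an empty array does nothing, so only null needs a guard.
        if (parsedResponse.clue_unlocked != null)
        {
            foreach (var clueId in parsedResponse.clue_unlocked)
            {
                TriggerClue(clueId);
            }
        }
    }
    else
    {
        // Show the raw text so the player still sees a message.
        responseText.text = response;
    }
}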

Constructing Prompts for Real-Time Dialogue

The prompt guides the LLM’s behavior. Here’s an example:

You are "Li Xin," a middle-aged laboratory administrator in a key optics lab in Shanghai. Respond realistically to the player’s queries, staying within the game’s context. Format your output as:
{
    "npc_reply": "[NPC's reply]",
    "clue_unlocked": [array of clue IDs, empty if none],
    "context": "[Explanation or additional context]"
}

Tips for Effective Prompts:

Define NPC Roles: Specify the NPC’s personality, knowledge scope, and restrictions.
Set Context: Include game lore or NPC backstory to keep responses relevant.
Output Formatting: Clearly specify response structure (e.g., JSON).
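
As a rough illustration of these tips, the sketch below assembles a system prompt from an NPC profile. The NpcProfile and NpcPromptBuilder types and their field names are hypothetical, not part of the original script; the output format mirrors the prompt shown above.

```csharp
using System.Text;

// Hypothetical container for the per-NPC details the tips call for.
public class NpcProfile
{
    public string Name;
    public string Role;
    public string Backstory;
    public string[] Restrictions;
}

public static class NpcPromptBuilder
{
    public static string BuildSystemPrompt(NpcProfile npc)
    {
        var sb = new StringBuilder();

        // Define the NPC role.
        sb.AppendLine($"You are \"{npc.Name}\", {npc.Role}.");

        // Set context with lore or backstory.
        sb.AppendLine($"Background: {npc.Backstory}");

        // List restrictions, if any were supplied.
        if (npc.Restrictions != null)
        {
            foreach (var rule in npc.Restrictions)
            {
                sb.AppendLine($"Rule: {rule}");
            }
        }

        // Specify the output format explicitly.
        sb.AppendLine("Respond realistically to the player's queries, staying within the game's context.");
        sb.AppendLine("Reply with a single JSON object and nothing else, in this form:");
        sb.AppendLine("{\"npc_reply\": \"<reply>\", \"clue_unlocked\": [<clue IDs, empty if none>], \"context\": \"<explanation>\"}");

        return sb.ToString();
    }
}
```

The returned string would take the place of "You are a helpful assistant." in the system message of the request.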

Parsing Model Responses

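The API wraps the model's reply in its own JSON envelope, and the reply itself arrives as an escaped JSON string in the content field. Reading it therefore takes two deserialization passes. For instance: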

{
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "{\"npc_reply\": \"Welcome to the lab!\", \"clue_unlocked\": [1], \"context\": \"Lab orientation\"}"
            }
        }
    ]
}

To extract data:

Parse the outer response:

var outerResponse = JsonConvert.DeserializeObject<OuterResponse>(response);
string innerContent = outerResponse.choices[0].message.content;

Parse the inner JSON:

var innerResponse = JsonConvert.DeserializeObject<NPCResponse>(innerContent);

Use the extracted data to update game state and UI.
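
The OuterResponse and NPCResponse types are used above but never defined. A minimal sketch of what they might look like follows, with field names taken from the JSON examples in this post. The Choice and Message classes and the NpcResponseParser helper are assumptions added for illustration.

```csharp
using Newtonsoft.Json;

// Shape of the game-specific JSON the prompt asks the model to produce.
public class NPCResponse
{
    public string npc_reply;
    public int[] clue_unlocked;
    public string context;
}

// Minimal shape of the API envelope; fields not listed here are ignored.
public class OuterResponse
{
    public Choice[] choices;
}

public class Choice
{
    public Message message;
}

public class Message
{
    public string role;
    public string content;
}

public static class NpcResponseParser
{
    // Returns null when the envelope is empty or either layer is not valid JSON.
    public static NPCResponse Parse(string rawApiResponse)
    {
        if (string.IsNullOrWhiteSpace(rawApiResponse))
        {
            return null;
        }

        try
        {
            // First pass: the API envelope.
            var outer = JsonConvert.DeserializeObject<OuterResponse>(rawApiResponse);
            if (outer?.choices == null || outer.choices.Length == 0)
            {
                return null;
            }

            string inner = outer.choices[0]?.message?.content;
            if (string.IsNullOrWhiteSpace(inner))
            {
                return null;
            }

            // Second pass: the JSON string the model wrote into content.
            return JsonConvert.DeserializeObject<NPCResponse>(inner);
        }
        catch (JsonException)
        {
            // The model may ignore the format instruction and answer in plain text.
            return null;
        }
    }
}
```

Returning null for anything unreadable lets the caller fall back to a default line instead of crashing mid-dialogue.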

Triggering In-Game Events

The TriggerClue method handles game actions linked to the response:

private void TriggerClue(int clueId)
{
    switch (clueId)
    {
        case 1:
            Debug.Log("Clue 1: Pick up the bottle.");
            targetObject.position = targetPosition;
            break;
        default:
            Debug.LogWarning("Undefined clue ID.");
            break;
    }
}

Real-Time Speech Integration

For immersive dialogue:

Speech-to-Text (STT): Use a transcriber to convert player speech into text.

realtimeTranscriber.StartListening();
bufferText += transcribedText + " ";

Text-to-Speech (TTS): Convert NPC replies into audio using a TTS system.

SSTTS.SendTTSRequest(npcReply);

Periodic Updates

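Buffering transcriptions and sending them at intervals keeps the number of API calls down, since each partial transcript would otherwise trigger its own request. The buffer is copied and cleared before the send starts, so speech transcribed while the request is in flight is not lost:

private IEnumerator SendBufferedTextPeriodically()
{
    while (isDialogueActive)
    {
        if (!string.IsNullOrWhiteSpace(bufferText))
        {
            // Snapshot and clear first; new text may arrive during the send.
            string pending = bufferText.Trim();
            bufferText = "";
            yield return SendBufferedText(pending);
        }
        // Wait five seconds before checking the buffer again.
        yield return new WaitForSeconds(5f);
    }
}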

Conclusion

This setup enables real-time NPC dialogue powered by LLMs in Unity. Well-structured prompts, defensive handling of API responses, and a clear mapping from model output to game events together make for a dynamic player experience. As LLMs improve, the scope for interactive storytelling grows with them.

About this Post

This post is written by FFFeiya, licensed under CC BY-NC 4.0.

#Unity #LLM
