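Building Real-Time NPC Dialogue in Unity with Large Language Models

Introduction

A real-time dialogue system can noticeably deepen immersion: instead of picking from fixed options, players talk to NPCs in natural language. Integrating a large language model (LLM) such as Qwen or OpenAI GPT into Unity lets NPCs generate context-appropriate replies and trigger the corresponding game events. This article explains how to:

Call an LLM API from Unity.
Design an effective prompt.
Parse the model's response into NPC dialogue and game events.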

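Setting Up the API Connection

Prerequisites

Unity installed (2021.3 or later).
Familiarity with basic C# scripting.
An API key and endpoint URL for an LLM API you can use (for example Qwen or OpenAI GPT).

Script Overview

We use Unity's UnityWebRequest to make API requests. The sample script connects to a Qwen model to generate NPC replies.

The core fields are: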

public string apiKey = "your-api-key";
public string endpoint = "your-api-endpoint";
public string model = "qwen-plus"; // Replace with the specific model name

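Replace apiKey and endpoint with the values supplied by your provider; model selects which model version to call.

Sending a Message

The SendMessageToYiQianWen function kicks off one complete request cycle: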

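// async void is fire-and-forget, so every exception must be handled inside this method.
public async void SendMessageToYiQianWen(string message)
{
    try
    {
        string response = await GetYiQianWenResponse(message);
        DisplayResponse(response);
    }
    catch (Exception ex) // requires "using System;"
    {
        Debug.LogError($"Error: {ex.Message}");
        // Show the fallback text directly: it is not JSON, so DisplayResponse could not parse it.
        responseText.text = "Network or API configuration issue.";
    }
}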

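It does two things:

Calls GetYiQianWenResponse to fetch the model's reply;
Hands the reply to DisplayResponse for display and further parsing.

Building the API Request

Requests are usually structured as JSON following the provider's schema. The example below puts a system message and a user message into the messages array (JsonConvert comes from Newtonsoft's Json.NET, which must be added to the project):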

var requestData = new
{
    model = model,
    messages = new[]
    {
        new { role = "system", content = "You are a helpful assistant." },
        new { role = "user", content = message }
    }
};
string jsonData = JsonConvert.SerializeObject(requestData);
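The article does not show the body of GetYiQianWenResponse. The following is a minimal sketch of what it could look like, assuming an OpenAI-compatible chat endpoint that accepts a Bearer token and assuming Newtonsoft's Json.NET is available in the project. The class name NpcChatClient and the polling loop used to await the request are illustrative choices, not part of the original script.

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json;
using UnityEngine;
using UnityEngine.Networking;

// Hypothetical wrapper class; the field names mirror the ones used in this article.
public class NpcChatClient : MonoBehaviour
{
    public string apiKey = "your-api-key";
    public string endpoint = "your-api-endpoint";
    public string model = "qwen-plus";

    public async Task<string> GetYiQianWenResponse(string message)
    {
        var requestData = new
        {
            model = model,
            messages = new[]
            {
                new { role = "system", content = "You are a helpful assistant." },
                new { role = "user", content = message }
            }
        };
        byte[] body = Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(requestData));

        using (var request = new UnityWebRequest(endpoint, UnityWebRequest.kHttpVerbPOST))
        {
            request.uploadHandler = new UploadHandlerRaw(body);
            request.downloadHandler = new DownloadHandlerBuffer();
            request.SetRequestHeader("Content-Type", "application/json");
            // Assumption: the service expects a Bearer token; check your provider's documentation.
            request.SetRequestHeader("Authorization", "Bearer " + apiKey);

            // Poll until the request completes, yielding back to Unity between checks.
            var operation = request.SendWebRequest();
            while (!operation.isDone)
            {
                await Task.Yield();
            }

            if (request.result != UnityWebRequest.Result.Success)
            {
                throw new Exception($"Request failed: {request.error}");
            }
            return request.downloadHandler.text;
        }
    }
}
```

Throwing on failure keeps the error path in one place: the caller's try/catch decides what the player sees when the network or the API key is misconfigured.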

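Handling the API Response

DisplayResponse parses the returned string into structured data. If the string it receives is parseable JSON, you can pull fields such as the NPC reply and unlocked clues straight out of it and use them to drive game logic. Note that the code below expects the NPC JSON itself; if your API wraps it in an outer envelope, unwrap it first as described in the section on parsing the model response later in this article: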

public void DisplayResponse(string response)
{
    NPCResponse parsedResponse;
    try
    {
        // DeserializeObject throws JsonException on malformed JSON rather than returning null.
        parsedResponse = JsonConvert.DeserializeObject<NPCResponse>(response);
    }
    catch (JsonException)
    {
        parsedResponse = null;
    }

    if (parsedResponse != null)
    {
        responseText.text = parsedResponse.npc_reply;
        if (parsedResponse.clue_unlocked?.Length > 0)
        {
            foreach (var clueId in parsedResponse.clue_unlocked)
            {
                TriggerClue(clueId);
            }
        }
    }
    else
    {
        Debug.LogWarning("Invalid response format.");
    }
}

Building Prompts for Real-Time Dialogue

The prompt determines who the model plays, what it knows, and how it formats its output. Here is an example:

You are "Li Xin," a middle-aged laboratory administrator in a key optics lab in Shanghai. Respond realistically to the player’s queries, staying within the game’s context. Format your output as:
{
    "npc_reply": "[NPC's reply]",
    "clue_unlocked": [array of clue IDs, empty if none],
    "context": "[Explanation or additional context]"
}

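Tips for writing prompts:

Define the NPC's role: spell out personality, scope of knowledge, and off-limits topics (for example, information the NPC must not reveal).
Provide context: include the setting, backstory, and current quest state so the model is less likely to drift off topic.
Constrain the output format strictly: explicitly require JSON (or another structure) so that you can parse it reliably and trigger events.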
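The three tips above can be turned into a small helper that assembles the system prompt from separate parts, so the persona, context, and format rules can change independently. This is only a sketch: NpcPromptBuilder, its parameters, and the sample facts in the usage line are hypothetical and do not appear in the original script.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: builds a system prompt from persona, context, and format rules.
public static class NpcPromptBuilder
{
    public static string Build(
        string persona,
        IEnumerable<string> contextFacts,
        IEnumerable<string> forbiddenTopics)
    {
        var sb = new StringBuilder();

        // 1. Role: who the NPC is.
        sb.AppendLine(persona);

        // 2. Context: what the NPC knows about the world and quest state.
        sb.AppendLine("Known facts:");
        foreach (string fact in contextFacts)
        {
            sb.AppendLine("- " + fact);
        }

        // Off-limits topics belong to the role definition as well.
        sb.AppendLine("Never reveal:");
        foreach (string topic in forbiddenTopics)
        {
            sb.AppendLine("- " + topic);
        }

        // 3. Output format: one JSON object with the fields the game parses.
        sb.AppendLine("Respond with a single JSON object and nothing else, in this shape:");
        sb.AppendLine("{\"npc_reply\": \"<string>\", \"clue_unlocked\": [<int>, ...], \"context\": \"<string>\"}");

        return sb.ToString();
    }
}
```

A call might look like NpcPromptBuilder.Build("You are \"Li Xin,\" a laboratory administrator.", new[] { "The player has not found the bottle yet." }, new[] { "the location of the key" }), with the result used as the content of the system message.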

Parsing the Model Response

Many LLM APIs return a nested structure: an outer JSON envelope whose content field is itself a JSON string. For example:

{
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "{\"npc_reply\": \"Welcome to the lab!\", \"clue_unlocked\": [1], \"context\": \"Lab orientation\"}"
            }
        }
    ]
}

Extraction happens in two layers:

First parse the outer layer:

var outerResponse = JsonConvert.DeserializeObject<OuterResponse>(response);
string innerContent = outerResponse.choices[0].message.content;

Then parse the inner JSON:

var innerResponse = JsonConvert.DeserializeObject<NPCResponse>(innerContent);

Finally, use the parsed fields to update the UI and game state.
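The article uses OuterResponse and NPCResponse without defining them. The sketch below infers both shapes from the sample JSON above and wraps the two-layer extraction in a single method that reports failure instead of throwing. The helper classes Choice and Message and the NpcResponseParser wrapper are assumptions made for illustration; it relies on Newtonsoft's Json.NET, as the rest of the article does.

```csharp
using Newtonsoft.Json;

// Assumed shapes, inferred from the sample response shown above.
public class NPCResponse
{
    public string npc_reply;
    public int[] clue_unlocked;
    public string context;
}

public class Message
{
    public string role;
    public string content;
}

public class Choice
{
    public Message message;
}

public class OuterResponse
{
    public Choice[] choices;
}

// Hypothetical wrapper around the two parsing steps.
public static class NpcResponseParser
{
    // Returns true and sets result only when both layers parse and a reply is present.
    public static bool TryParse(string rawResponse, out NPCResponse result)
    {
        result = null;
        if (string.IsNullOrWhiteSpace(rawResponse)) return false;

        try
        {
            // Layer 1: the API envelope.
            var outer = JsonConvert.DeserializeObject<OuterResponse>(rawResponse);
            if (outer?.choices == null || outer.choices.Length == 0) return false;

            string inner = outer.choices[0]?.message?.content;
            if (string.IsNullOrWhiteSpace(inner)) return false;

            // Layer 2: the JSON string the model wrote into "content".
            var parsed = JsonConvert.DeserializeObject<NPCResponse>(inner);
            if (parsed?.npc_reply == null) return false;

            result = parsed;
            return true;
        }
        catch (JsonException)
        {
            // Malformed JSON at either layer: the model may have ignored the format instructions.
            return false;
        }
    }
}
```

When TryParse returns false, the game can fall back to a canned line or show the raw text without triggering any events, which keeps a badly formatted reply from breaking the scene.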

Triggering In-Game Events

TriggerClue maps a clue ID to a concrete in-game action:

private void TriggerClue(int clueId)
{
    switch (clueId)
    {
        case 1:
            Debug.Log("Clue 1: Pick up the bottle.");
            targetObject.position = targetPosition;
            break;
        default:
            Debug.LogWarning("Undefined clue ID.");
            break;
    }
}

You can extend it to open doors, change an NPC's state machine, advance a quest stage, unlock UI, write to the save file, and so on.
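As the number of clues grows, a switch statement becomes hard to maintain. One alternative, sketched here, is a registry that maps clue IDs to handlers and ignores IDs that are unknown or already triggered, so the model cannot fire arbitrary or repeated events. ClueRegistry and its methods are hypothetical names, not part of the original script.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry: only registered clue IDs can trigger, and each one only once.
public class ClueRegistry
{
    private readonly Dictionary<int, Action> handlers = new Dictionary<int, Action>();
    private readonly HashSet<int> triggered = new HashSet<int>();

    // Associates a clue ID with the game action it should run.
    public void Register(int clueId, Action handler)
    {
        handlers[clueId] = handler;
    }

    // Returns true only if a handler ran; unknown or repeated IDs are ignored.
    public bool Trigger(int clueId)
    {
        if (!handlers.TryGetValue(clueId, out Action handler)) return false;
        if (!triggered.Add(clueId)) return false;

        handler();
        return true;
    }
}
```

In a MonoBehaviour you might register handlers in Awake, for example registry.Register(1, () => targetObject.position = targetPosition), and then call registry.Trigger(clueId) from the response handler.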

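Real-Time Voice Integration (Optional)

For more immersion, you can add voice input and output:

Speech-to-text (STT): transcribe the player's speech into a text buffer.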

realtimeTranscriber.StartListening();
bufferText += transcribedText + " ";

Text-to-speech (TTS): play the NPC's reply as audio.

SSTTS.SendTTSRequest(npcReply);

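Periodic Sending (Buffering Strategy)

Stitching real-time transcription fragments into one passage before sending usually reduces request frequency, smooths out jitter, and gives the model more complete context: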

private IEnumerator SendBufferedTextPeriodically()
{
    while (isDialogueActive)
    {
        if (!string.IsNullOrWhiteSpace(bufferText))
        {
            // Snapshot and clear before yielding so text transcribed during the request is not lost.
            string pending = bufferText.Trim();
            bufferText = "";
            yield return SendBufferedText(pending);
        }
        yield return new WaitForSeconds(5f);
    }
}
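With a shared buffer, one detail worth care is text that arrives while a request is still in flight. The sketch below wraps the buffer in a small class whose Flush method returns the accumulated text and clears it in a single step. It assumes transcription callbacks may arrive on a different thread than the coroutine, which depends on the STT library you use; TranscriptBuffer is a hypothetical name.

```csharp
using System.Text;

// Hypothetical buffer for transcription fragments, safe to use from more than one thread.
public class TranscriptBuffer
{
    private readonly StringBuilder buffer = new StringBuilder();
    private readonly object gate = new object();

    // Adds one transcribed fragment, separated from the next by a single space.
    public void Append(string fragment)
    {
        if (string.IsNullOrWhiteSpace(fragment)) return;
        lock (gate)
        {
            buffer.Append(fragment.Trim()).Append(' ');
        }
    }

    // Returns everything collected so far and empties the buffer; null if nothing is pending.
    public string Flush()
    {
        lock (gate)
        {
            if (buffer.Length == 0) return null;
            string text = buffer.ToString().Trim();
            buffer.Clear();
            return text;
        }
    }
}
```

The periodic coroutine can then call Flush once per cycle and send the result only when it is not null.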

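Conclusion

By combining a structured prompt, robust API calls, and a parseable response format, you can build a genuinely usable real-time NPC dialogue system in Unity. The more capable the model, the freer the interaction; but the freer it gets, the more you need engineering measures to keep it within your game's rules: input validation, format constraints, graceful degradation on failure, and strict control over which events can be triggered.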