January 8, 2025

Using iFlytek's Speech Model in Unity for Text-to-Speech (TTS) Playback

Introduction

Integrating iFlytek’s Text-to-Speech (TTS) technology into Unity can elevate game immersion by enabling realistic and dynamic voice responses. This article will detail how to use iFlytek’s TTS in Unity, covering:

How iFlytek TTS works.
Step-by-step integration into Unity.
Handling audio data for playback.
Key considerations and troubleshooting.

How iFlytek TTS Works

iFlytek TTS synthesizes speech by:

Logging into iFlytek’s cloud services.
Sending text to the TTS API for synthesis.
Receiving and processing PCM (Pulse Code Modulation) audio data.
Playing the processed audio in Unity.

Prerequisites

Unity 2021 or later.
An iFlytek developer account and AppID.
The iFlytek SDK (including msc_x64.dll for Windows).

Step-by-Step Integration

1. Setting Up Unity

Create a Unity project and add a GameObject to host the TTS script.
Import the msc_x64.dll file into your project (place it in the Assets/Plugins folder).
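
The script in the next section calls the DLL through a static MSCDLL class and a SynthStatus enum, neither of which is listed in this article. A minimal sketch of what those declarations might look like follows. The signatures and enum values are assumptions based on the C prototypes in the SDK headers (msp_cmn.h and qtts.h), so check them against the headers that ship with your SDK version.

```csharp
using System;
using System.Runtime.InteropServices;

// Synthesis status reported by QTTSAudioGet (values assumed from the SDK headers).
public enum SynthStatus
{
    MSP_TTS_FLAG_STILL_HAVE_DATA = 1,
    MSP_TTS_FLAG_DATA_END = 2,
    MSP_TTS_FLAG_CMD_CANCELED = 4
}

// Hypothetical P/Invoke wrapper; the native library is only loaded on the first call.
public static class MSCDLL
{
    private const string Dll = "msc_x64";

    [DllImport(Dll)]
    public static extern int MSPLogin(string usr, string pwd, string parameters);

    [DllImport(Dll)]
    public static extern int MSPLogout();

    [DllImport(Dll)]
    public static extern IntPtr QTTSSessionBegin(string parameters, ref int errorCode);

    // How a string is encoded when marshaled depends on the runtime;
    // the byte[] overload below makes the UTF-8 encoding explicit.
    [DllImport(Dll)]
    public static extern int QTTSTextPut(IntPtr sessionID, string text, uint textLen, string parameters);

    // Overload for callers that encode the text to UTF-8 bytes themselves.
    [DllImport(Dll)]
    public static extern int QTTSTextPut(IntPtr sessionID, byte[] text, uint textLen, string parameters);

    [DllImport(Dll)]
    public static extern IntPtr QTTSAudioGet(IntPtr sessionID, ref uint audioLen, ref SynthStatus synthStatus, ref int errorCode);

    [DllImport(Dll)]
    public static extern int QTTSSessionEnd(IntPtr sessionID, string hints);
}
```

The session handle is kept as an IntPtr here only because the article's code compares it with IntPtr.Zero; the native side treats it as an opaque session ID.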

2. Implementing the TTS Script

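The sections below walk through the XunfeiTTS class method by method. The snippets assume using directives for System, System.IO, System.Runtime.InteropServices, and UnityEngine, along with class fields named AppID, audioSource, and ttsSession, which are used but not declared in the listings.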

Initialization

The Awake method logs into the iFlytek service and sets up an AudioSource component for playback:

private void Awake()
{
    audioSource = gameObject.AddComponent<AudioSource>();
    Login();
}

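The Login and Logout methods handle authentication with iFlytek’s servers. MSPLogin takes a parameter string as its third argument, so the AppID field is expected to hold a value of the form appid=YOUR_APP_ID rather than the bare ID: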

public void Login()
{
    int res = MSCDLL.MSPLogin(null, null, AppID);
    if (res == 0)
    {
        Debug.Log("iFlytek login successful!");
    }
    else
    {
        Debug.LogError($"iFlytek login failed, error code: {res}");
    }
}

public void Logout()
{
    int res = MSCDLL.MSPLogout();
    if (res == 0)
    {
        Debug.Log("iFlytek logout successful!");
    }
    else
    {
        Debug.LogError($"iFlytek logout failed, error code: {res}");
    }
}

Sending a TTS Request

The SendTTSRequest method initiates a TTS session, sends the text for synthesis, and fetches the resulting audio:

public void SendTTSRequest(string text)
{
    int res = 0;

    // Start TTS session
    ttsSession = MSCDLL.QTTSSessionBegin("engine_type=cloud,voice_name=aisjiuxu,speed=50,pitch=50,text_encoding=utf8,sample_rate=16000", ref res);
    if (res != 0 || ttsSession == IntPtr.Zero)
    {
        Debug.LogError($"TTS session start failed, error code: {res}");
        return;
    }

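    // Send text for synthesis. The length argument is a byte count, which differs
    // from text.Length for non-ASCII text when text_encoding=utf8.
    res = MSCDLL.QTTSTextPut(ttsSession, text, (uint)System.Text.Encoding.UTF8.GetByteCount(text), null);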
    if (res != 0)
    {
        Debug.LogError($"Text synthesis failed, error code: {res}");
        EndSession();
        return;
    }

    // Fetch and play audio
    FetchAndPlayAudio();
}
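
The session parameters above are hard-coded in one long string. If you want to vary the voice or speed per line of dialogue, a small builder can assemble the same string. The sketch below is a hypothetical helper, not part of the SDK; its defaults reproduce the values used in this article, and it does not validate ranges.

```csharp
using System.Globalization;

public static class TtsSessionParams
{
    // Hypothetical helper: assembles the comma-separated key=value string
    // that the article passes to QTTSSessionBegin.
    public static string Build(string voiceName = "aisjiuxu", int speed = 50,
                               int pitch = 50, int sampleRate = 16000)
    {
        return string.Join(",",
            "engine_type=cloud",
            "voice_name=" + voiceName,
            "speed=" + speed.ToString(CultureInfo.InvariantCulture),
            "pitch=" + pitch.ToString(CultureInfo.InvariantCulture),
            "text_encoding=utf8",
            "sample_rate=" + sampleRate.ToString(CultureInfo.InvariantCulture));
    }
}
```

A call such as MSCDLL.QTTSSessionBegin(TtsSessionParams.Build(speed: 60), ref res) would then replace the literal. If you change sampleRate, the value used when creating the AudioClip has to change with it.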

Fetching and Playing Audio

FetchAndPlayAudio retrieves PCM audio data from the TTS service and converts it for playback:

private void FetchAndPlayAudio()
{
    MemoryStream audioStream = new MemoryStream();
    SynthStatus synthStatus = SynthStatus.MSP_TTS_FLAG_STILL_HAVE_DATA;

    int res = 0;
    while (synthStatus == SynthStatus.MSP_TTS_FLAG_STILL_HAVE_DATA)
    {
        uint audioLen = 0;
        IntPtr audioData = MSCDLL.QTTSAudioGet(ttsSession, ref audioLen, ref synthStatus, ref res);

        if (res != 0)
        {
            Debug.LogError($"Audio retrieval failed, error code: {res}");
            EndSession();
            return;
        }

        if (audioData != IntPtr.Zero && audioLen > 0)
        {
            byte[] buffer = new byte[audioLen];
            Marshal.Copy(audioData, buffer, 0, (int)audioLen);
            audioStream.Write(buffer, 0, buffer.Length);
        }
    }

    PlayAudio(audioStream.ToArray());
    EndSession();
}
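
As written, the loop polls QTTSAudioGet back to back on the calling thread, which in Unity is normally the main thread. A sketch of the same accumulation loop with a short pause between empty polls is shown below. It is written against a hypothetical delegate rather than MSCDLL so it can be exercised without the DLL, and the 150 ms default delay is an assumption to tune for your project.

```csharp
using System;
using System.IO;
using System.Threading;

public static class AudioDrain
{
    // Hypothetical stand-in for one QTTSAudioGet call: returns the next chunk
    // (possibly empty) and reports whether more data is expected.
    public delegate byte[] NextChunk(out bool hasMore, out int errorCode);

    // Collects chunks until the source reports completion.
    // Returns null on error so the caller can end the session and bail out.
    public static byte[] Collect(NextChunk next, int pollDelayMs = 150)
    {
        using (var stream = new MemoryStream())
        {
            bool hasMore = true;
            while (hasMore)
            {
                byte[] chunk = next(out hasMore, out int errorCode);
                if (errorCode != 0)
                {
                    return null;
                }

                if (chunk != null && chunk.Length > 0)
                {
                    stream.Write(chunk, 0, chunk.Length);
                }
                else if (hasMore)
                {
                    // Nothing ready yet; wait before polling again.
                    Thread.Sleep(pollDelayMs);
                }
            }
            return stream.ToArray();
        }
    }
}
```

Because Collect still blocks while it waits, running it off the main thread is worth considering. Unity objects such as AudioClip still have to be created on the main thread, so the collected bytes would need to be handed back before playback.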

The PlayAudio method converts PCM data into a Unity AudioClip:

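private void PlayAudio(byte[] pcmData)
{
    // These must match the sample_rate requested in QTTSSessionBegin.
    int sampleRate = 16000;
    int channels = 1;

    // Each sample is 2 bytes (16-bit PCM); skip playback if nothing was synthesized.
    int sampleCount = pcmData.Length / 2;
    if (sampleCount == 0)
    {
        Debug.LogWarning("No audio data received; nothing to play.");
        return;
    }

    float[] audioSamples = new float[sampleCount];
    for (int i = 0; i < audioSamples.Length; i++)
    {
        short sample = BitConverter.ToInt16(pcmData, i * 2);
        audioSamples[i] = sample / 32768f; // Normalize to [-1, 1)
    }

    AudioClip audioClip = AudioClip.Create("TTS_Audio", sampleCount, channels, sampleRate, false);
    audioClip.SetData(audioSamples, 0);

    audioSource.clip = audioClip;
    audioSource.Play(); // Returns immediately; playback continues in the background.
    Debug.Log("Audio playback started.");
}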

Ending the Session

The EndSession method ensures proper resource cleanup:

private void EndSession()
{
    int res = MSCDLL.QTTSSessionEnd(ttsSession, "end");
    if (res != 0)
    {
        Debug.LogError($"TTS session end failed, error code: {res}");
    }
    else
    {
        Debug.Log("TTS session ended successfully.");
    }
}

Key Considerations

Text Encoding

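The session is opened with text_encoding=utf8, so the text passed to QTTSTextPut must be UTF-8 and its length must be given in bytes, not characters. Otherwise non-ASCII text may be truncated or garbled.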
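
The difference between character count and byte count is easy to overlook because the two agree for plain ASCII. A minimal sketch of computing the UTF-8 byte length follows; the helper name is hypothetical.

```csharp
using System;
using System.Text;

public static class TextLength
{
    // Number of bytes the text occupies once encoded as UTF-8.
    public static uint Utf8ByteCount(string text)
    {
        return (uint)Encoding.UTF8.GetByteCount(text);
    }
}
```

For "hello", both Length and Utf8ByteCount give 5. For "你好", Length is 2 while Utf8ByteCount is 6, since each of those characters takes three bytes in UTF-8.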

Audio Format

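With sample_rate=16000, the session used in this article returns raw 16-bit mono PCM at 16 kHz. Unity's AudioClip.SetData expects float samples in the range -1 to 1, so each 16-bit sample must be converted before playback, as PlayAudio does.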
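
To sanity-check the data you receive, it helps to work out the arithmetic by hand. The sketch below uses hypothetical helper names and assumes 16-bit little-endian samples, as the article's conversion code does. It computes the clip duration from the byte count and converts a single sample from its two bytes.

```csharp
public static class PcmMath
{
    // Duration in seconds of raw PCM: bytes / (rate * channels * bytesPerSample).
    public static double DurationSeconds(int byteCount, int sampleRate = 16000,
                                         int channels = 1, int bytesPerSample = 2)
    {
        return (double)byteCount / (sampleRate * channels * bytesPerSample);
    }

    // Converts one little-endian 16-bit sample to the float range Unity expects.
    public static float ToFloat(byte low, byte high)
    {
        short sample = unchecked((short)(low | (high << 8)));
        return sample / 32768f;
    }
}
```

At 16 kHz mono, 32,000 bytes is exactly one second of audio. The byte pair 0x00, 0x40 is the sample 16384, which maps to 0.5, and 0x00, 0x80 is -32768, which maps to -1.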

Error Handling

Check the value returned by every SDK call. In the code above, 0 means success and any other value is an error code, which is the main clue when debugging.
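
The repeated if/else checks in the script can be folded into one helper if you prefer exceptions to early returns. This is a sketch of one possible approach; MscException and Msc.Check are hypothetical names, not part of the SDK.

```csharp
using System;

// Hypothetical exception type carrying the raw SDK error code.
public class MscException : Exception
{
    public int Code { get; }

    public MscException(string operation, int code)
        : base($"{operation} failed, error code: {code}")
    {
        Code = code;
    }
}

public static class Msc
{
    // Throws when an SDK call reports a non-zero code; 0 is treated as success.
    public static void Check(int code, string operation)
    {
        if (code != 0)
        {
            throw new MscException(operation, code);
        }
    }
}
```

A call site would then read Msc.Check(MSCDLL.MSPLogout(), "MSPLogout"), with session calls wrapped in try/finally so that EndSession still runs when a check throws.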

Dependency Management

Include all required SDK files (msc_x64.dll) in the Unity project.

Troubleshooting

No Audio Playback: Verify the AudioSource is correctly configured and the PCM data is properly converted.
SDK Errors: Refer to iFlytek’s SDK documentation for error code definitions.
DLL Not Found: Ensure the DLL is placed in the correct directory (Assets/Plugins).

Conclusion

By integrating iFlytek’s TTS into Unity, developers can create dynamic and engaging voice interactions in their games. This guide provides the foundational steps to harness iFlytek’s robust speech synthesis capabilities, paving the way for richer storytelling and enhanced player immersion.

About this Post

This post is written by FFFeiya, licensed under CC BY-NC 4.0.

#CSharp #Unity #TTS
