Introduction

Over the last few months, I have been working on my own overlay for Twitch. I built it to stand out, I guess, and to have an overlay I actually enjoy. Overall, the first live test went as expected: I did not expect any turnout, and I expected a few errors on the overlay side. Now let us go into the details of the overlay and the live test I did.

The Overlay Objective

The overall goal is a program that handles all Twitch interactions and uses a generative chat AI to control a 3D model, giving the stream a live AI assistant. The assistant will have its own voice and will do call-outs for me. I built it with Unity3D and the ChatGPT API. I tried other chat AIs, but they did not work out as planned. Some features are still missing because I do not need them yet; for example, I cannot get subscribers yet. For the first test, only the bare minimum was required.

The AI was able to reply to me after I spoke and to interact with chat when viewers wrote something. The model's mouth moves with the audio of its voice, and a few animations play when the AI talks. I think the raid code was in, but it did not get tested. The AI has her own mini-screen in the corner, right under the Twitch chat screen, which I think is the best place for it. The subtitle box sits in the lower left, under the game screen.

Issues

The first issue is that DictationRecognizer from UnityEngine.Windows.Speech seems to time out, or something along those lines. DictationRecognizer is what I use for speech-to-text so that I can talk to the AI. I keep a bool so that when the recognizer errors out or completes, it gets restarted. After a while, though, it stops working completely, and I have to restart the whole overlay. I assumed that should not happen, because an error or a completion resets the bool and restarts the DictationRecognizer, so there must be another issue, unless there is some overall timeout. Another issue is that my voice is not captured while the window is minimized. Getting this to work correctly is critical, since it is how I interact with the AI. Below is my DictationRecognizer code, which has the issue.

using UnityEngine;
using UnityEngine.Windows.Speech;

public class DictationScript : MonoBehaviour
{
    public OpenAIInteraction OPENAI; // Reference to the OpenAIInteraction script

    private bool isListening = false; // Flag to track if the DictationRecognizer is actively listening
    private DictationRecognizer m_DictationRecognizer;

    void Start()
    {
        m_DictationRecognizer = new DictationRecognizer();

        m_DictationRecognizer.DictationResult += (text, confidence) =>
        {
            Debug.LogFormat("Dictation result: {0}", text);
            OPENAI.AddUserMessage(text);
        };

        m_DictationRecognizer.DictationComplete += (completionCause) =>
        {
            if (completionCause != DictationCompletionCause.Complete)
                Debug.LogErrorFormat("Dictation completed unsuccessfully: {0}.", completionCause);

            isListening = false; // Set the flag to false when recognition stops.
            StartListening(); // Start listening again to continue listening for audio.
        };

        m_DictationRecognizer.DictationError += (error, hresult) =>
        {
            Debug.LogErrorFormat("Dictation error: {0}; HResult = {1}.", error, hresult);

            isListening = false; // Set the flag to false when an error occurs.
            StartListening(); // Start listening again to continue listening for audio.
        };

        // Start listening for speech recognition.
        StartListening();
    }

    private void StartListening()
    {
        if (!isListening)
        {
            isListening = true;
            m_DictationRecognizer.Start();
        }
    }

    // Be sure to clean up resources when the script is destroyed.
    private void OnDestroy()
    {
        if (m_DictationRecognizer != null)
        {
            m_DictationRecognizer.Stop();
            m_DictationRecognizer.Dispose();
        }
    }
}
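One possible explanation, which has not been confirmed: in the script above, if Start() throws or silently fails inside a callback, isListening stays true and no further events arrive to reset it, so the recognizer never restarts. A minimal sketch of a different approach is to drop the bool and poll the recognizer's own Status property from Update(), restarting after a short delay. The class name DictationWatchdog, the restartDelaySeconds field, and the choice to recreate the recognizer after a failure are all assumptions made for illustration, not something tested on the overlay.

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

// Hypothetical watchdog variant of the script above. The class name and the
// restart delay are illustrative assumptions, not part of the original overlay.
public class DictationWatchdog : MonoBehaviour
{
    public OpenAIInteraction OPENAI; // Same reference as in the script above

    [SerializeField] private float restartDelaySeconds = 1f; // Assumed value; tune as needed

    private DictationRecognizer recognizer;
    private float nextRestartTime;

    void Start()
    {
        CreateRecognizer();
    }

    private void CreateRecognizer()
    {
        recognizer = new DictationRecognizer();
        recognizer.DictationResult += (text, confidence) => OPENAI.AddUserMessage(text);

        // The callbacks only log; Update() decides when to restart.
        recognizer.DictationComplete += cause =>
            Debug.LogFormat("Dictation completed: {0}", cause);
        recognizer.DictationError += (error, hresult) =>
            Debug.LogErrorFormat("Dictation error: {0}; HResult = {1}.", error, hresult);

        recognizer.Start();

        // Give the recognizer time to report Running before the next check.
        nextRestartTime = Time.unscaledTime + restartDelaySeconds;
    }

    void Update()
    {
        if (recognizer.Status == SpeechSystemStatus.Running) return;
        if (Time.unscaledTime < nextRestartTime) return;

        nextRestartTime = Time.unscaledTime + restartDelaySeconds;

        if (recognizer.Status == SpeechSystemStatus.Failed)
        {
            // Assumption: replacing a failed recognizer is safer than reviving it.
            recognizer.Dispose();
            CreateRecognizer();
        }
        else
        {
            recognizer.Start();
        }
    }

    private void OnDestroy()
    {
        if (recognizer == null) return;
        if (recognizer.Status == SpeechSystemStatus.Running) recognizer.Stop();
        recognizer.Dispose();
    }
}
```

Because the restart decision lives in one place, a failed Start() is simply retried on a later frame instead of leaving a flag stuck. This sketch does not address the minimized-window problem.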

The next issue is the delay before the generative chat AI replies, along with the way the message is output. Let us start with the latter. The AI voice audio plays at normal speed, and the subtitle text for the AI cannot keep up with it. I currently output the text letter by letter and also tried word by word, but both are too slow. Because of this, the next reply is delayed. I am not sure why, but a few times the AI also just picked up random lines. I also think DictationRecognizer sometimes picks up no words and the AI replies to that anyway.
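For the case where the recognizer reports no real words, one option is to filter results before they reach the AI. The sketch below is a hypothetical helper: the name DictationFilter, the minLetters threshold, and the decision to drop Low and Rejected confidence results are assumptions that would need tuning against real speech.

```csharp
using UnityEngine.Windows.Speech;

// Hypothetical helper; the name and the thresholds are illustrative assumptions.
public static class DictationFilter
{
    // Returns true when a dictation result looks worth sending to the AI.
    public static bool ShouldForward(string text, ConfidenceLevel confidence, int minLetters = 2)
    {
        // Drop empty or whitespace-only results.
        if (string.IsNullOrWhiteSpace(text)) return false;

        // Assumption: low-confidence results are more likely noise than speech.
        if (confidence == ConfidenceLevel.Low || confidence == ConfidenceLevel.Rejected)
            return false;

        // Require at least a few letters or digits so stray punctuation is ignored.
        int letters = 0;
        foreach (char c in text)
        {
            if (char.IsLetterOrDigit(c)) letters++;
        }
        return letters >= minLetters;
    }
}
```

Inside the DictationResult handler, the call to OPENAI.AddUserMessage(text) would then be wrapped in if (DictationFilter.ShouldForward(text, confidence)). Dropping Low confidence results may also discard some valid speech, so that check is the first thing to relax if too much gets filtered out.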

This is not a major issue, but after interacting with the AI I think the main prompt I use needs more refining. It could also come down to my lack of skill in conversing with the AI or in finding what works best.

The last noticeable issue is that pulling my screen into the program seemed to cause some frame drops. This was 100% expected, as I am pulling one screen into another so that the window can be captured and then streamed. I think the best fix is to run it on another computer.

Fixes

To fix the DictationRecognizer issue, I think I will try to find another method of speech-to-text. Press-to-speak might fix the issue, but I do not want that solution because it would require me to click out of the game I am playing. DictationRecognizer may just have issues over time that I am not seeing, or I may need to try something else in UnityEngine.Windows.Speech.

For the AI speech interaction issue, I will need to change the flow of the code. I may output the whole subtitle at once and scroll it down slowly or quickly, since some of the AI text is very long. This should remove the need to time the text against the audio. I will then add more time between AI interactions and spoken text or chat messages.
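An alternative to scrolling, if the reveal effect is worth keeping, is to drive the visible text from the audio playback position instead of a fixed per-letter delay, so the text finishes when the voice does. This is only a sketch under assumptions: the component name AudioPacedSubtitle and its fields are hypothetical, and it assumes the subtitle box is a UnityEngine.UI.Text and that the AI voice is available as an AudioClip.

```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.UI;

// Hypothetical component; the names are illustrative, and a UnityEngine.UI.Text
// subtitle box plus an AudioClip for the voice are assumed.
public class AudioPacedSubtitle : MonoBehaviour
{
    public AudioSource voice;  // Plays the AI voice clip
    public Text subtitleText;  // Subtitle box

    public void Show(string line, AudioClip clip)
    {
        StopAllCoroutines(); // Cancel any reveal that is still running
        StartCoroutine(Reveal(line, clip));
    }

    private IEnumerator Reveal(string line, AudioClip clip)
    {
        // Without usable audio there is nothing to pace against; show everything.
        if (clip == null || clip.length <= 0f)
        {
            subtitleText.text = line;
            yield break;
        }

        voice.clip = clip;
        voice.Play();

        while (voice.isPlaying)
        {
            // Fraction of the clip played so far, from 0 to 1.
            float progress = Mathf.Clamp01(voice.time / clip.length);
            int visible = Mathf.CeilToInt(progress * line.Length);
            subtitleText.text = line.Substring(0, visible);
            yield return null; // Wait one frame
        }

        subtitleText.text = line; // Make sure the full line is shown at the end
    }
}
```

Since the reveal is tied to the clip length, long replies speed up automatically and short ones slow down, which avoids the per-letter timer falling behind the voice. Very long lines would still need scrolling or paging to fit the box.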

The prompt issue will just require more testing with different types of prompts. For this, I may add more text to flesh out the character it is trying to be.

Takeaways During the Stream

As I was testing, I had a few different conversations between restarts of the program. In some of them I asked the AI to create jokes. Then there was an odd conversation where I tried asking the AI to do the Genshin Impact event. At other points it asked me to involve Twitch chat, but nobody in chat was really talking, so that was a no-go. Lastly, I am not sure how I am going to handle the story parts of games like Genshin Impact with the AI running, because the AI will end up talking over the dialog.

Conclusion

Overall, like I said at the start, it went as expected. There are problems with speech-to-text and with AI response speed and output speed, and the AI prompt needs fine-tuning. The fixes are not hard or complex, but they will take a lot of time. I will explore the takeaways above a bit more in my next major test.

First Test of My Overlay for Twitch

July 29, 2023
