How to Use Oculus Voice SDK

Overview

In this tutorial, we’ll build a simple app that lets users activate Voice commands by gazing at a sphere.

The app we are going to make is composed of two apps that communicate with each other:

1. Server side (Wit app)

2. Client side (Unity app).

Let’s start by creating the Wit app.

Create Wit App

To create the Wit app, sign up for a Wit.ai account. Then, follow these steps:

  1. On the Understanding tab, enter ‘make the cube green’ in the Utterance field.
  2. In the Intent field, enter change_color and click the + Create Intent button.
  3. In the Utterance field, highlight (or double-click on) “cube” and then enter shape in the Entity for “cube” field. Click + Create Entity.

For more information, see Which entity should I use? in the Wit.ai documentation.

  4. Highlight ‘green’ and create a new entity called color. Now you should see something like this:
  5. Click Train and Validate to train your app.
  6. Repeat steps 1 through 5 with other possible utterances a user might say, such as Set the sphere to red, Make the cylinder green, Color the cube orange, and so on.

TIP: After training, the app will start to identify entities on its own. However, it can make mistakes, especially at first. If this happens, train several phrases and correct the NLU’s mistakes along the way: highlight the word that should be matched and set the correct entity, then click the X next to any incorrect entities to remove them.

On the Entities tab, verify that the following entities are present:

Now we are ready to make our Unity app.

Create Unity App

  1. Create a Unity project using the 3D Core template. You can call it Gaze_Tutorial.
  2. Go to File > Build Settings…. In the Platform panel, select Android and click Switch Platform. It might take several minutes for Unity to compile scripts and switch to the new platform.

Connect the Unity App to Your Wit App

  1. Import the Voice SDK into the Unity Editor. The Voice SDK Unity package is included in the Oculus Integration SDK. Download it and import it into your newly created Unity project (Assets > Import Package > Custom Package…). After the import, you should see an Oculus item in the menu bar. We will use this menu later.
  2. Go back to your Wit app on the Wit.ai website. On the Settings tab under Management, copy the Server Access Token.

3. In the Unity Editor, click Oculus > Voice SDK > Settings and paste the Server Access Token into the Wit Configuration box.

Note: If you don’t see the Oculus menu, the Oculus Integration SDK has not been installed for this project. Go here for the installation instructions.

4. Click Link/Relink to link your Unity app with your Wit app.

5. Save a new Wit Configuration for your app by clicking on the Create button. Name the configuration file WitConfig-Gaze.

Test

Let’s test to see whether the Wit configuration file that we created works properly. We are going to send voice commands and expect to receive parsed data back. One easy way to achieve that is to use the Understanding Viewer window.

  1. Select Oculus > Voice SDK > Understanding Viewer.
  2. Make sure the newly created Wit configuration (WitConfig-Gaze) is set.
  3. Enter ‘Set the cylinder to green’ in the utterance field, and then click Send.

Now you should see the structured response (see the figure below). It has these sections:

  1. The text field. It contains the transcription of the utterance you sent to the Wit app. In our case, it’s ‘Set the cylinder to green’.
  2. The entities field. If the utterance was parsed successfully, this field contains the entities in your sentence. We expect to see the ‘color’ and ‘shape’ entities here. If you open them, you should eventually find the values ‘cylinder’ and ‘green’.
  3. The intents field. We expect to see change_color there.
  4. The traits field. We do not expect to see any traits here because we did not define any (nor does our utterance contain any).

If you see all the correct elements in the response, the test has passed and you are ready for the next section.
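For reference, the structured response has roughly this shape. This is an abridged, illustrative example — the exact fields and confidence values in your response will differ:

```json
{
  "text": "Set the cylinder to green",
  "intents": [
    { "name": "change_color", "confidence": 0.99 }
  ],
  "entities": {
    "shape:shape": [
      { "name": "shape", "role": "shape", "value": "cylinder", "confidence": 0.97 }
    ],
    "color:color": [
      { "name": "color", "role": "color", "value": "green", "confidence": 0.98 }
    ]
  },
  "traits": {}
}
```

Note that each entity is keyed by "name:role" and holds an array of candidate matches; we will navigate exactly this structure later when we add a response matcher.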

Setting Up the Scene

  1. In your Unity project, right-click in the Hierarchy window and then select 3D Object > Cube.
  2. Select the added cube, go to the Inspector window, and set Position X to -2.5. This moves the cube over to make room for the other shapes.
  3. Repeat the steps above to add a sphere, a capsule, and a cylinder. Set their Position X values to -0.75, 0.75, and 2.5, respectively.
  4. (Optional) Make all the shapes you’ve created black. One way to do this is to create a black material (Assets > Create > Material, then set its Albedo to black) and drag it onto each shape.
  5. Right-click in the Hierarchy window, select Create Empty, and name the new GameObject Shapes to group the shapes together.
  6. Select the four shapes and drag them into the Shapes GameObject (see the figure below).
  7. While the Shapes GameObject is selected, go to the Inspector window and change:
    • Position to (X = 0, Y = 1.5, Z = 3)
    • Scale to (X = 0.5, Y = 0.5, Z = 0.5)
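If you prefer scripting, the steps above can be sketched in C# with Unity’s GameObject.CreatePrimitive. This is a hypothetical helper, not one of the tutorial’s required scripts — attach it to any GameObject in an empty scene and enter Play mode once:

```csharp
using UnityEngine;

// SceneBuilder.cs - a hypothetical helper that reproduces the manual steps above.
public class SceneBuilder : MonoBehaviour
{
    void Start()
    {
        // The empty parent that groups the shapes (steps 5-7).
        var shapes = new GameObject("Shapes");
        shapes.transform.position = new Vector3(0f, 1.5f, 3f);
        shapes.transform.localScale = new Vector3(0.5f, 0.5f, 0.5f);

        // One primitive per shape, offset along X (steps 1-3).
        CreateShape(PrimitiveType.Cube, -2.5f, shapes.transform);
        CreateShape(PrimitiveType.Sphere, -0.75f, shapes.transform);
        CreateShape(PrimitiveType.Capsule, 0.75f, shapes.transform);
        CreateShape(PrimitiveType.Cylinder, 2.5f, shapes.transform);
    }

    private void CreateShape(PrimitiveType type, float x, Transform parent)
    {
        var go = GameObject.CreatePrimitive(type);
        // SetParent with worldPositionStays = false keeps x as a local offset,
        // which matches the final layout of the manual steps.
        go.transform.SetParent(parent, false);
        go.transform.localPosition = new Vector3(x, 0f, 0f);
        go.GetComponent<Renderer>().material.color = Color.black; // optional step 4
    }
}
```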

Add VR To Scene

  1. Remove the Main Camera from your scene.
  2. Drag the OVRCameraRig prefab into the Hierarchy. (You can search for it in the Project window, or find it in the Oculus\VR\Prefabs folder.)
  3. Go to Edit > Project Settings… > XR Plug-in Management and click the Install XR Plugin Management button. After the installation, you should see an Oculus checkbox. Check it to install the Oculus package. The installation may take several minutes (see the figure below).

Test

Let’s test whether our app compiles without an error and runs in Virtual Reality. We expect to be able to look around in VR and see the shapes we added.

  1. Turn on your Oculus HMD and connect it to your computer.
  2. Put on the HMD. If you see the USB Debugging prompt, click Allow (see the figure below).
  3. In the Unity Editor, go to File > Build Settings….
  4. Make sure Android is selected as the target platform (see the figure below).
  5. Next to Run Device, click Refresh and select Oculus Quest 2.
  6. Click Build and Run.
    Note: If you get an error message saying Android Device Is Not Responding, it might be a USB debugging permission issue. Put on the HMD and click Allow if you see a prompt window.
  7. Put on the HMD. You should be able to see the shapes (cube, sphere, etc.).

Add UI Elements

  1. In Unity, right-click in the Hierarchy window and select UI > Canvas. Call it World-Space Canvas. Change the Render Mode of the canvas to World Space. Another property named Event Camera should now appear below it. Drag the CenterEyeAnchor (under OVRCameraRig > TrackingSpace) into the Event Camera slot.
  2. Set the position of the World-Space Canvas to Pos X = 0.5, Pos Y = 2.5, and Pos Z = 3.5. Set its Scale to X = 0.003, Y = 0.003, Z = 0.003.
  3. Right-click on the World-Space Canvas GameObject and then select UI > Text – TextMeshPro. Name it Instruction Text (TMP). In its Text property, enter the following:
    Look at the white sphere, wait for “Listening…” to appear, and say “make the capsule orange”.
    • Set the Vertex Color to black and make it bold (press B on the Font Style property) for easy reading.
    • Set the Rect Transform Width property to 600 and Height to 200.
    • On the TextMeshPro – Text component, check the Auto Size property.
  4. Create another Canvas. Name it Screen-Space Canvas. Change its Render Mode to Screen Space – Camera. Set the Render Camera property to CenterEyeAnchor.
  5. Right-click on the Screen-Space Canvas and add an Image UI element. Name it Reticle Image. Set its Source Image property to GazeRing by dragging Oculus/VR/Textures/GazeRing.png into the slot.
    Note: If the slot refuses the GazeRing image, GazeRing has not been imported as a Sprite. Click on GazeRing and, in the Inspector, set its Texture Type property to Sprite (2D and UI). You should now be able to drag GazeRing into the Source Image property of Reticle Image.
  6. (Optional) Add a Directional Light to illuminate the objects, pointing its z-axis toward the shapes.

The next important UI element is Gaze. We explain it in the next section.

Add Gaze Capability

In this section, we implement the gaze mechanism.

  1. Add a Sphere GameObject. Name it Gaze. Set its position to:
    Pos X = -1, Pos Y = 2.5, Pos Z = 3.5
  2. Find the scripts InteractionVisualizer.cs and GazeActivator.cs and attach them to the Gaze GameObject. If you cannot find them, you can recreate them using the following code:
// InteractionVisualizer.cs
/************************************************************************************
Licensed under the Oculus SDK Version 3.5 (the "License"); 
you may not use the Oculus SDK except in compliance with the License, 
which is provided at the time of installation or download, or which 
otherwise accompanies this software in either electronic or hard copy form.

You may obtain a copy of the License at

https://developer.oculus.com/licenses/sdk-3.5/

Unless required by applicable law or agreed to in writing, the Oculus SDK 
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
************************************************************************************/
using System.Collections;
using System.Collections.Generic;
using TMPro;
using UnityEngine;
using UnityEngine.UIElements;

namespace Oculus.Voice.Samples.XR.GazeActivation
{
    public class InteractionVisualizer : MonoBehaviour
    {
        [SerializeField] private TextMeshProUGUI text;
        private Material material;
        private bool active;

        // Start is called before the first frame update
        void Start()
        {
            material = GetComponent<Renderer>().material;
        }

        public void SetFocusedColor()
        {
            if (!active)
            {
                material.color = Color.blue;
            }
        }

        public void SetUnfocusedColor()
        {
            if (!active)
            {
                material.color = Color.white;
            }
        }

        public void OnStartedListening()
        {
            active = true;
            material.color = Color.red;
            if (text)
            {
                text.color = Color.green;
                text.text = "Listening...";
            }
        }

        public void OnStoppedListening()
        {
            transform.localScale = Vector3.one;
            active = true;
            material.color = Color.blue;
            if (text)
            {
                text.color = Color.white;
                if (text.text != "Listening...")
                {
                    text.text = "Processing...\nYou said: " + text.text;
                }
                else
                {
                    text.text = "Processing...";
                }
            }
        }

        public void SetInactive()
        {
            active = false;
            material.color = Color.white;
        }

        public void SetScale(float modifier)
        {
            transform.localScale = Vector3.one * (1 + .5f * modifier);
        }

        public void OnError(string type, string message)
        {
            if (text)
            {
                text.color = Color.red;
                text.text = "Error: " + type + "\n" + message;
            }
        }

        public void OnTranscription(string transcription)
        {
            if (text)
            {
                text.color = Color.white;
                text.text = transcription;
            }
        }
    }
}

And here is the code for the GazeActivator.cs script:

// GazeActivator.cs
/************************************************************************************
Licensed under the Oculus SDK Version 3.5 (the "License"); 
you may not use the Oculus SDK except in compliance with the License, 
which is provided at the time of installation or download, or which 
otherwise accompanies this software in either electronic or hard copy form.

You may obtain a copy of the License at

https://developer.oculus.com/licenses/sdk-3.5/

Unless required by applicable law or agreed to in writing, the Oculus SDK 
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
************************************************************************************/
using UnityEngine;
using UnityEngine.Events;

namespace Oculus.Voice.Samples.XR.GazeActivation
{
    public class GazeActivator : MonoBehaviour
    {
        [SerializeField] private float activationTime = 2;

        [SerializeField] private UnityEvent onGazeStart = new UnityEvent();
        [SerializeField] private UnityEvent onGazeEnd = new UnityEvent();

        [SerializeField] private UnityEvent onActivation = new UnityEvent();

        private Camera gazeCamera;
        private bool gazing = false;

        private bool activated;
        private float gazeStart;

        private void Awake()
        {
            gazeCamera = Camera.main;
        }

        private void Update()
        {
            if (Physics.Raycast(gazeCamera.transform.position, gazeCamera.transform.forward,
                out var hit) && hit.collider.gameObject == gameObject)
            {
                if (!gazing)
                {
                    gazeStart = Time.time;
                    onGazeStart.Invoke();
                }

                gazing = true;
            }
            else if (gazing)
            {
                activated = false;
                gazing = false;
                onGazeEnd.Invoke();
            }

            if (gazing && Time.time - gazeStart > activationTime && !activated)
            {
                activated = true;
                onActivation.Invoke();
            }
        }
    }
}
  3. Drag and drop the Instruction Text (TMP) GameObject into the Text slot of the Interaction Visualizer script.
  4. Find the Gaze Activator component. It has several properties to set:
    • Set Activation Time to 2.
    • Set OnGazeStart() to Gaze > InteractionVisualizer.SetFocusedColor.
    • Set OnGazeEnd() to Gaze > InteractionVisualizer.SetUnfocusedColor.
    • Set OnActivation() to AppVoiceExperience > AppVoiceExperience.Activate.

See the figure below:

Add App Voice Experience to the Scene

If you want your Unity app to send commands to the Wit.ai server and receive the results back, you’ll need to add an App Voice Experience GameObject to your scene.

  1. Click Assets > Create > Voice SDK > Add App Voice Experience to Scene, and then select the App Voice Experience GameObject.
  2. Drag the Wit configuration file WitConfig-Gaze into its slot in the App Voice Experience component on the App Voice Experience GameObject.
  3. Click the Events dropdown on the App Voice Experience component.
    1. Set On Response (WitResponseNode) to Gaze > InteractionVisualizer.SetInactive
    2. Set On Error (String, String) to Gaze > InteractionVisualizer.OnError
    3. Set On Mic Level Changed (Single) to Gaze > InteractionVisualizer.SetScale
    4. Set On Start Listening () to Gaze > InteractionVisualizer.OnStartedListening
    5. Set On Stopped Listening () to Gaze > InteractionVisualizer.OnStoppedListening
    6. Set On Partial Transcription (String) to Gaze > InteractionVisualizer.OnTranscription
    7. Set On Full Transcription (String) to Gaze > InteractionVisualizer.OnTranscription

Add a Response Handler for Voice Commands

When a user speaks a command, the Voice SDK sends the utterance to the Wit API for NLU processing. After the processing is complete, Wit sends back a structured response containing the extracted intent, entities, and traits (if any).

One common way to extract the necessary information (e.g., capsule, green) from the Wit response is to use the WitResponseMatcher script. Although you can attach it to any GameObject and configure the fields to extract by hand, there is a way to set up the script automatically. The following steps explain how:

  1. In the Unity Editor, select the App Voice Experience GameObject in the Hierarchy window.
  2. Create an Empty GameObject as a child of App Voice Experience. Name it Color Response Handler.
  3. Click Oculus > Voice SDK > Understanding Viewer.
  4. Set the Wit Configuration field to WitConfig-Gaze (or whatever configuration you created for this project).
  5. Enter “Make the capsule green” in the Utterance field. Click Send. You should get the response shortly.
  6. In the Hierarchy window, make sure the Color Response Handler GameObject is selected.
  7. In the Understanding Viewer window, go to entities > shape:shape > 0 and click value = capsule. In the popup window, select Add response matcher to Color Response Handler (see the figure below).
  8. Verify that Unity has added the Wit Response Matcher component to the Color Response Handler GameObject.
  9. To extract the shape’s color from the response, go to entities > color:color > 0 and click value = green (see the figure below).
  10. In the popup window, we have two options:
    1. Add response matcher to Color Response Handler
      This option adds a new Wit Response Matcher component to the Color Response Handler GameObject. This approach is not desirable because we would end up with two separate response matchers, one for shape and another for color. Since we need both parameters together to set the shape to the requested color, we are better off with a single response matcher that extracts both parameters at once.
    2. Add value matcher to Color Response Handler’s Response Matcher
      This option modifies the current response matcher to extract both the shape and color parameters at once, so we can directly set the shape to the requested color. Therefore, select this option.
  11. Verify that the new value matcher for color has been added to the Wit Response Matcher (see the figure below).
  12. In the Hierarchy window, select the Shapes GameObject. In the Inspector window, click Add Component, select New Script, and name the new script ColorChanger. Add the following code to the script:
// ColorChanger.cs
/************************************************************************************
Licensed under the Oculus SDK Version 3.5 (the "License"); 
you may not use the Oculus SDK except in compliance with the License, 
which is provided at the time of installation or download, or which 
otherwise accompanies this software in either electronic or hard copy form.

You may obtain a copy of the License at

https://developer.oculus.com/licenses/sdk-3.5/

Unless required by applicable law or agreed to in writing, the Oculus SDK 
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
************************************************************************************/
using System;
using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class ColorChanger : MonoBehaviour
{
    private void SetColor(Transform trans, Color color)
    {
        trans.GetComponent<Renderer>().material.color = color;
    }

    public void UpdateColor(string[] values)
    {
        // Note: The 'values' array contains color and shape but their order depends on
        //       the value matchers on the Wit Response Matcher component on the
        //       Color Response Handler GameObject.
        var shapeString = values[0];
        var colorString = values[1];

        if (!ColorUtility.TryParseHtmlString(colorString, out var color)) return;
        if (string.IsNullOrEmpty(shapeString)) return;

        foreach (Transform child in transform)
        {
            if (child.name.IndexOf(shapeString, StringComparison.OrdinalIgnoreCase) != -1)
            {
                SetColor(child, color);
                return;
            }
        }
    }

}
  13. In the Hierarchy window, under App Voice Experience, select the Color Response Handler GameObject. In the Wit Response Matcher (Script) component, click + under On Multi Value Event (String[]), and then drag the Shapes GameObject into its slot. On the function dropdown, select ColorChanger.UpdateColor (see the figure below).
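As an alternative to the response matcher, you can parse the response yourself in a handler wired to the On Response (WitResponseNode) event. The sketch below is a hypothetical example, not part of the SDK: it assumes the SDK’s WitResponseNode JSON accessors (indexing by field name, .Value for strings), and the containing namespace varies by SDK version (older Oculus Integration releases use Facebook.WitAi.Lib rather than Meta.WitAi.Json):

```csharp
using Meta.WitAi.Json; // assumption: adjust to your SDK version's namespace
using UnityEngine;

// ManualResponseHandler.cs - a hypothetical alternative to WitResponseMatcher.
// Wire its OnWitResponse method to the On Response (WitResponseNode) event.
public class ManualResponseHandler : MonoBehaviour
{
    [SerializeField] private ColorChanger colorChanger; // the script on the Shapes object

    public void OnWitResponse(WitResponseNode response)
    {
        // Entities are keyed by "name:role"; each holds an array of candidates,
        // so [0] takes the most likely match.
        var shape = response["entities"]["shape:shape"][0]["value"].Value;
        var color = response["entities"]["color:color"][0]["value"].Value;

        if (!string.IsNullOrEmpty(shape) && !string.IsNullOrEmpty(color))
        {
            // Reuse the tutorial's ColorChanger, which expects { shape, color }.
            colorChanger.UpdateColor(new[] { shape, color });
        }
    }
}
```

The response matcher configured in the steps above does essentially this for you, which is why it is the recommended approach.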

Test the Integration

  1. Run your app by pressing Play.
  2. Click Oculus > Voice SDK > Understanding Viewer to open the viewer.
  3. In the Utterance field, type “Make the cylinder orange”. Click Send.
  4. You should see the cylinder turn orange.
  5. Make sure the Oculus HMD is connected to your computer.
  6. In the Unity Editor, go to File > Build Settings… .
  7. Make sure ‘Android’ is selected as the target platform.
  8. Next to Run Device, click Refresh and select Oculus Quest 2.
  9. Click Build and Run.
    Note: If you get an error message saying Android Device Is Not Responding, it might be a USB debugging permission issue. Put on the HMD and click Allow if you see a prompt window.
  10. Wear the HMD. You should be able to see all the UI elements we added.
  11. Gaze at the white sphere, wait for 2 seconds and say Make the capsule green. The capsule should turn green shortly.

Note: I did not write InteractionVisualizer.cs, GazeActivator.cs, or the other scripts in this tutorial. You can find them in the Oculus Voice SDK samples folder.
