Custom Voice

For customers who wish to bring their own (BYO) voice to UneeQ, integration is supported via a simple voice orchestration service. This works much like the conversation integration, which can be achieved via Synapse or your own custom solution.

As with the conversation integration, you will need to create a customer-hosted endpoint that accepts external requests from the UneeQ platform.

UneeQ will POST the following JSON payload to your BYO service:

JSON


Returning Audio

The BYO integration API expects single-channel raw (no header) PCM audio at 16kHz. A 2XX status code will be treated as a success; the response body will be treated as audio, forwarded to the avatar for rendering, and played to the user. Samples must be returned as 16-bit linearly encoded signed integers with little-endian byte ordering. It’s suggested that content is returned as type application/octet-stream. To summarise (a short sketch of the byte layout follows the list below):

  • API returns a 2XX status code (e.g. 200 OK)
  • 16kHz sample rate
  • Mono (single channel)
  • Samples are little-endian 16-bit signed integers
  • Raw PCM, i.e. no WAV header and no encoding or compression
  • Content-Type: application/octet-stream
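
As an illustration of that byte layout, the helper below packs floating-point samples in the range [-1.0, 1.0] into raw 16-bit little-endian signed PCM. The function name and the float input format are assumptions of this sketch, not something the UneeQ platform requires.

    import struct

    def to_pcm16le(samples):
        """Pack float samples in [-1.0, 1.0] as raw 16-bit little-endian signed PCM."""
        clamped = (max(-1.0, min(1.0, s)) for s in samples)
        # "<h" = little-endian signed 16-bit integer, one sample every 2 bytes
        return b"".join(struct.pack("<h", int(s * 32767)) for s in clamped)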

Returning Feedback

Errors may be returned by setting a non-2XX status code. Those error codes will be counted, but at the time of writing the error body is not captured. Responses with non-2XX status codes are not played out to the user.
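
As a rough sketch of what that looks like inside a handler (Flask and the synthesize helper are assumptions of this sketch, not part of the UneeQ API):

    from flask import Response

    def handle_speech_request(text: str) -> Response:
        """Hypothetical handler body: success returns raw PCM, failure a non-2XX code."""
        try:
            audio = synthesize(text)  # hypothetical call out to your TTS engine
        except Exception as exc:
            # Non-2XX responses are counted as errors and are never played to the
            # user; the body here is illustrative only, since it is not captured.
            return Response(str(exc), status=502, mimetype="text/plain")
        return Response(audio, status=200, mimetype="application/octet-stream")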

Sample Implementation

The listing below implements a BYO TTS application that calls out to the Google Cloud TTS API.

If using the Python sample below, you’ll need to have google-cloud-texttospeech installed. See Google’s documentation for how to obtain Google application credentials. Note that the first line in the sample uses pip to install the required dependencies.

If using the Node.js sample below, you'll need a valid subscription to the Microsoft Azure Cognitive Services API, along with values for the other environment variables referenced in the code. Note that the Node.js sample consists of two separate files: create a new ExpressJS app, then define the routes using the Orchestration Handler sample and the interface to Microsoft's Cognitive Speech service using the Microsoft Services Handler.

We expect that if you are reading this documentation, you are comfortable creating and hosting custom services. If you require assistance, please contact us at [email protected]

Python
Node.js
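
As a rough Python sketch of the Google-backed approach, assuming a Flask app and the google-cloud-texttospeech client; the endpoint path (/speak) and the request field names (text, voiceName, languageCode) are assumptions for illustration and not part of the UneeQ payload specification:

    # pip install flask google-cloud-texttospeech
    #
    # Assumes GOOGLE_APPLICATION_CREDENTIALS is set in the environment (see
    # Google's documentation on application credentials).
    from flask import Flask, Response, request
    from google.cloud import texttospeech

    app = Flask(__name__)
    client = texttospeech.TextToSpeechClient()

    # Google's LINEAR16 output is prefixed with a WAV header (normally 44 bytes),
    # which must be stripped so only raw PCM is returned (see Troubleshooting).
    WAV_HEADER_BYTES = 44


    @app.route("/speak", methods=["POST"])  # the path is an assumption of this sketch
    def speak():
        payload = request.get_json(force=True)
        # Field names are illustrative; use whatever your UneeQ payload provides.
        text = payload.get("text", "")
        voice_name = payload.get("voiceName", "en-US-Wavenet-D")
        language_code = payload.get("languageCode", "en-US")

        try:
            result = client.synthesize_speech(
                input=texttospeech.SynthesisInput(text=text),
                voice=texttospeech.VoiceSelectionParams(
                    language_code=language_code, name=voice_name
                ),
                audio_config=texttospeech.AudioConfig(
                    audio_encoding=texttospeech.AudioEncoding.LINEAR16,
                    sample_rate_hertz=16000,
                ),
            )
        except Exception as exc:
            # A non-2XX response is counted as an error and never played to the user.
            return Response(str(exc), status=502, mimetype="text/plain")

        audio = result.audio_content[WAV_HEADER_BYTES:]
        return Response(audio, status=200, mimetype="application/octet-stream")


    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)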


UneeQ Configuration

Once your service has been deployed and confirmed reachable, please provide the following details to UneeQ:

  • Fully-qualified URL of the service
  • API key for your service (optional)
  • Voice File / Name (common to providers like Google, Amazon and Azure). Your service may not require this detail, or you may choose to hardcode the value within your service

UneeQ will configure your digital human to use the custom voice service that you have created. The next time you start a session with your digital human, you will hear the custom voice!

Troubleshooting

This section lists common problems and errors that may occur during configuration or rollout of a BYO TTS endpoint. If you don’t see your error here, you might consider coming back and adding it once you figure it out.

The digital human is talking really slowly / quickly

This is most likely a sample rate mismatch; check that you are returning 16kHz audio.
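
If your TTS engine only produces another rate (24kHz is common), resample before responding. A minimal sketch using the standard-library audioop module (deprecated since Python 3.11 and removed in 3.13, so a dedicated resampling library may be preferable on newer versions):

    import audioop  # standard library up to Python 3.12

    def resample_to_16k(pcm: bytes, source_rate: int) -> bytes:
        """Resample mono 16-bit PCM from source_rate down to 16kHz."""
        # audioop interprets samples in the machine's native byte order
        # (little-endian on x86 and ARM).
        converted, _state = audioop.ratecv(pcm, 2, 1, source_rate, 16000, None)
        return converted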

I can hear a click at the start of speech

This is likely because there are unexpected bytes at the start of the response body. Previously this has been identified as a WAV file header; if one is present it needs to be stripped before the audio is returned (see the helper below).
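
A minimal helper for that, assuming a standard RIFF/WAVE layout where the samples sit in a data chunk (the function name is just for this sketch):

    def strip_wav_header(audio: bytes) -> bytes:
        """Return only the raw PCM samples, dropping a RIFF/WAVE header if present."""
        if audio[:4] == b"RIFF" and audio[8:12] == b"WAVE":
            data_tag = audio.find(b"data")
            if data_tag != -1:
                # Samples start after the 4-byte "data" tag and the 4-byte chunk size.
                return audio[data_tag + 8:]
        return audio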

All I hear is nasty violent static

You have returned the audio in the wrong format. This can happen simply by swapping the byte order from little to big endian, or it may be something more fundamental, such as returning MP3 or some other encoded format. Our application expects linear PCM; check the “Returning Audio” section above. If your engine emits big-endian samples, a helper like the one below can flip them.
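
A minimal sketch, assuming the only issue is that the 16-bit samples arrive big-endian (the function name is illustrative):

    import array

    def to_little_endian(pcm_be: bytes) -> bytes:
        """Swap each 16-bit sample from big-endian to the expected little-endian layout."""
        samples = array.array("h")
        samples.frombytes(pcm_be)
        samples.byteswap()  # flips the two bytes of every 16-bit sample
        return samples.tobytes()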

Not playing well with NodeJS

When you’re using NodeJS, you might default to res.send(file) from Express, but when streaming a binary file back it’s better to use res.sendFile(file), as this automatically handles a variety of things:

  1. res.sendFile(file):
    • res.sendFile(file) is specifically designed to send files as the response. It automatically sets the appropriate headers, including Content-Type based on the file extension, and handles the streaming of the file efficiently.
    • It takes care of headers, buffering, and sending the file in chunks, which is more memory-efficient for large files.
    • It supports conditional requests (If-Modified-Since, If-None-Match) and ranges (Range, Accept-Ranges) out of the box.
    • It provides better security by preventing the serving of files outside the specified root directory using path normalization.
  2. res.send(file):
    • res.send(file) is a generic method used to send various types of responses, including file contents. However, it doesn't handle files in the same optimized manner as res.sendFile(file).
    • You are responsible for reading the file into memory yourself and for setting any headers beyond Express's defaults (a Buffer is sent as application/octet-stream unless you set Content-Type explicitly); there is no built-in file streaming.
    • Because the entire body must be held in memory before it is sent, it can be memory-intensive for large files and may impact server performance.

In summary, res.sendFile(file) is the recommended approach when streaming files back as application/octet-stream from Express. It provides better performance, memory efficiency, and security, and handles file-related concerns for you.