Inline tagging

Inline tagging is an XML-style method for driving various behaviors during a digital human experience.

These tags are inserted into the transcript in the same way as SSML.

Here's an example:

XML


Inline tagging is supported only with text-to-speech (TTS) services that support Mark tag functionality.

As of October 2023, inline tags are supported only with Amazon Polly and Azure TTS.

There are currently three main uses of inline tags:

  • Influencing emotion
  • Triggering actions
  • Camera changes

Usage

To use inline tags, add the desired tag to the transcript that the digital human will speak. Inline tags work alongside SSML tags.

Here are some specifics:

  • Multiple tags can be used at any position within the transcript, provided the result still complies with SSML requirements (see the SSML section below)
  • Tags are self-closing; a pair of tags is not required
  • Multiple emotions can be used in one response
  • Multiple actions can be used, but depending on timing, a subsequent action is blocked if the prior action has not completed
    • Some actions can be combined because they do not conflict
  • Azure TTS does not support consecutive tags, but Amazon Polly does
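
As an illustration of the self-closing form, here is a minimal Python sketch that assembles a transcript from tag names. The helpers `inline_tag` and `emotion` are hypothetical names introduced for this example and are not part of any UneeQ SDK. The allowed-character check is an assumption based on the tag names shown in this document.

```python
import re

# Assumption: tag names use only letters, digits, and underscores,
# as in the names shown in this document (e.g. "action_understandnod").
_NAME = re.compile(r"[A-Za-z0-9_]+")


def inline_tag(name: str) -> str:
    """Return a self-closing inline tag such as <uneeq:action_understandnod />."""
    if not _NAME.fullmatch(name):
        raise ValueError(f"unexpected tag name: {name!r}")
    return f"<uneeq:{name} />"


def emotion(kind: str, strength: str) -> str:
    """Hypothetical convenience wrapper for emotion_<kind>_<strength> names."""
    return inline_tag(f"emotion_{kind}_{strength}")


# Words sit between the two tags, so this also avoids consecutive tags.
transcript = (
    emotion("joy", "strong")
    + "Absolutely! "
    + inline_tag("action_understandnod")
    + "That's absolutely something I can do."
)
print(transcript)
```

Generating tags through one helper keeps every tag self-closing by construction, which rules out the missing-slash mistake.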

Here are some examples.

<uneeq:emotion_joy_strong />Absolutely! <uneeq:action_understandnod />That's absolutely something I can do.

  • Multiple tags
  • All self closed

<uneeq:emotion_sadness_strong /><uneeq:action_disappointed/>I'm afraid this won't work.

  • Two consecutive tags are not supported by Azure TTS but work with Amazon Polly
  • The workaround for Azure TTS is to separate the tags with one or more words

Certainly<uneeq:action_understandnod>, let me see what I can do for you.

  • Tag is not self-closing; it is missing a /

<uneeq:emotion_anticipation_strong>That sounds very exciting!</uneeq:emotion_anticipation_strong>

  • Opening and closing tags were used instead of a single self-closing tag
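
The rules illustrated by these examples can be checked automatically before a transcript is sent. The following is a rough Python sketch, not an official validator. The function name `find_tag_issues` and the provider labels `"azure"` and `"polly"` are hypothetical. It also assumes that paired tags should be reported, and that tags separated only by whitespace count as consecutive.

```python
import re

# Matches any UneeQ-prefixed tag: opening, closing, or self-closing.
_TAG = re.compile(r"<(/?)uneeq:([A-Za-z0-9_]+)\s*(/?)>")


def find_tag_issues(transcript: str, provider: str) -> list[str]:
    """Return human-readable problems found in a transcript.

    `provider` is a hypothetical label: "azure" or "polly".
    """
    issues = []
    previous_end = None
    for match in _TAG.finditer(transcript):
        closing, name, self_closing = match.groups()
        if closing:
            issues.append(f"closing tag used: {name}")
        elif not self_closing:
            issues.append(f"not self-closing: {name}")
        # Assumption: tags separated only by whitespace are consecutive.
        if (
            provider == "azure"
            and previous_end is not None
            and transcript[previous_end:match.start()].strip() == ""
        ):
            issues.append(f"consecutive tag on Azure TTS: {name}")
        previous_end = match.end()
    return issues


sample = "<uneeq:emotion_sadness_strong /><uneeq:action_disappointed/>I'm afraid this won't work."
print(find_tag_issues(sample, "azure"))
print(find_tag_issues(sample, "polly"))
```

The check only reports problems. It does not rewrite the transcript, because the Azure TTS workaround requires choosing which words go between the tags.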

SSML

SSML continues to be fully supported. The simplest example of using SynAnim & SSML:

XML


A more complex example, using Azure SSML with one of its neural voices, is below. Multiple UneeQ tags can be used, but they must be placed inside the "voice" element and outside any elements nested within it, such as "prosody".
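
The placement rule for Azure SSML can be sketched in Python as follows. This is an illustration under stated assumptions, not UneeQ's reference markup. The function `azure_ssml`, the segment kinds, the voice name `en-US-JennyNeural`, and the `prosody` rate are choices made for this example, and the `speak` attributes follow the usual Azure SSML form.

```python
from xml.sax.saxutils import escape, quoteattr


def azure_ssml(voice: str, segments: list[tuple[str, str]]) -> str:
    """Assemble SSML where inline tags sit directly inside <voice>.

    Each segment is (kind, value): "tag" for a UneeQ tag name,
    "text" for plain speech, "slow" for speech wrapped in <prosody>.
    """
    parts = []
    for kind, value in segments:
        if kind == "tag":
            # Tags are emitted at the <voice> level, never inside <prosody>.
            parts.append(f"<uneeq:{value} />")
        elif kind == "slow":
            parts.append(f'<prosody rate="slow">{escape(value)}</prosody>')
        elif kind == "text":
            parts.append(escape(value))
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
    body = "".join(parts)
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>{body}</voice></speak>"
    )


ssml = azure_ssml(
    "en-US-JennyNeural",  # assumed voice name, for illustration only
    [
        ("tag", "emotion_joy_strong"),
        ("text", "Absolutely! "),
        ("tag", "action_understandnod"),
        ("slow", "That's absolutely something I can do."),
    ],
)
print(ssml)
```

Because tags and prosody-wrapped speech are separate segment kinds, a tag cannot end up nested inside "prosody" by accident.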

As of October 2023, there is a known issue with Azure TTS where using <break /> tags causes the timing of the inline tags to be incorrect. We have raised this bug with the Microsoft Azure team.

Break tags can still be used with Azure TTS, but inline tag timing may be affected until this issue is resolved.

XML