
Dialogflow 17: Improving Speech recognition


In a previous post we spoke about defining an entity and training phrases in order to recognize an ID. The key was to define a custom entity with the expected format of your ID, as we showed in that post:
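As an illustration of such a format, here is a minimal sketch of the kind of check you might pair with the entity, for example in a fulfillment webhook. The eight-digits-plus-one-letter pattern and the function name are my assumptions, not from the post; adapt them to your own ID format:

```python
import re

# Hypothetical ID format: eight digits followed by a single letter,
# e.g. "12345678Z". Adjust the pattern to match your entity definition.
ID_PATTERN = re.compile(r"^\d{8}[A-Za-z]$")

def is_valid_id(candidate: str) -> bool:
    """Check whether a captured parameter matches the expected ID format."""
    return bool(ID_PATTERN.match(candidate.strip()))
```

A check like this lets the webhook reprompt the user when the recognized value does not match the expected shape.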

Once you have that, you would create an intent that uses that entity in the training phrases. But depending on what you are trying to capture, the natural language understanding engine might not be able to properly “understand” your entity, the ID in this case.

We have a powerful tool in Dialogflow called Auto Speech Adaptation that can help us capture this type of information much better. It is disabled by default; once enabled, it makes the speech recognition adapt to the training phrase examples that we can expect at each moment of the conversation.

To manage the conversation flow we use contexts, so intents with a particular input context can only be triggered when that context is set. Suppose we have a training phrase that uses the entity, like we did in the ID example:

It is recommended to make this parameter “required” to help the recognition of the entity.
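Putting the pieces together, such an intent could look roughly like the following sketch in Dialogflow ES v2 resource terms. The names (`capture-id`, `awaiting-id`, `@id-number`) are illustrative, not from the post, and a real intent definition has more fields:

```json
{
  "displayName": "capture-id",
  "inputContextNames": [
    "projects/my-project/agent/sessions/-/contexts/awaiting-id"
  ],
  "trainingPhrases": [
    {
      "type": "EXAMPLE",
      "parts": [
        { "text": "my id is " },
        { "text": "12345678Z", "entityType": "@id-number", "alias": "id" }
      ]
    }
  ],
  "parameters": [
    {
      "displayName": "id",
      "entityTypeDisplayName": "@id-number",
      "value": "$id",
      "mandatory": true
    }
  ]
}
```

Note the `mandatory` flag on the parameter, which corresponds to marking it “required” in the console.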

Once auto speech adaptation is enabled, the speech recognition will improve to try to match the examples given in the training phrases. Here you have a couple of examples with the feature disabled, where it is not able to recognize the letter at the end of the ID:

If we go to the agent configuration, enabling this is a single click: 

You have to wait a few minutes for it to start working, and I’m afraid there is no feedback on when it is ready, but once it is you can see that the recognition is better. In this case the trailing letter is detected, as it is defined in the format of the entity we are expecting:

In a real case you would need to run more detailed tests to measure the improvement. Be careful with the entity definition, as there are some limitations (e.g. punctuation is not allowed).

It is also interesting to guide the user interaction so that they answer directly with the letters and numbers to capture, for example by saying something like “tell me your ID digits and letter”, and if we are not able to recognize the numbers, advise them to say them one by one.
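A simple way to build that kind of reprompt is to spell the expected format out character by character. This is a sketch of my own, not code from the post; the wording and function names are illustrative:

```python
def spell_out(id_value: str) -> str:
    """Spell an ID character by character, e.g. "123Z" -> "1, 2, 3, Z"."""
    return ", ".join(id_value.upper())

def reprompt(example_id: str) -> str:
    """Build a reprompt that asks the user to dictate the ID one
    character at a time, showing an example of the expected format."""
    return ("I did not get that. Please tell me your ID one character "
            "at a time, like this: " + spell_out(example_id))
```

The webhook would send this text back as the fulfillment response when the captured parameter fails validation.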

Obviously this is only useful when Dialogflow is receiving audio requests, not with text agents. Google Assistant is an exception: you might “talk” to the Assistant, but it does the speech-to-text conversion itself and passes text requests to Dialogflow. The same happens if you create an integration that calls the Speech-to-Text API and then sends the text to Dialogflow … a bad idea, since the adaptation cannot help.
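To make the distinction concrete, a Dialogflow ES v2 `detectIntent` request that sends audio, and can therefore benefit from auto speech adaptation, looks roughly like this sketch (values are illustrative). A text request would put a `text` object in `queryInput` instead:

```json
{
  "queryInput": {
    "audioConfig": {
      "audioEncoding": "AUDIO_ENCODING_LINEAR_16",
      "sampleRateHertz": 16000,
      "languageCode": "en-US"
    }
  },
  "inputAudio": "<base64-encoded audio>"
}
```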




I work for Google Cloud, but this post contains personal ideas and opinions.
