In today’s post we are going to see how to send our users answers that contain more than just text, taking advantage of devices with screens. I have to admit that this topic drove me a little bit crazy, as the documentation was not very clear. The key is here, where you can find the following diagram:
In our case we are using Cloud Functions to handle the fulfillment shown on the left of the diagram, so we have to work with the Dialogflow API and not with the Conversation API from Actions on Google. This is not a problem in itself, but many of the rich-response examples you can find use the conversation interface, and I never managed to make them work.
If we look at Dialogflow’s WebhookClient library, we don’t have the conversation primitives, but we can still send rich responses. What I haven’t found anywhere is how to detect which type of surface the user is on, something that is easy with the conversation primitives. I ended up working out a way myself… if anyone knows a better alternative, please let me know.
When a request reaches our fulfillment, the request body is accessible, and in fact you can see it in the Cloud Functions logs if you haven’t removed the console.log line that prints “Dialogflow Request body”. Looking at its content, you can see the following information:
So we need to find out whether “surface” is defined in our payload and whether it lists SCREEN_OUTPUT among its capabilities; we get this information by accessing the object:
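For reference, here is an abridged, illustrative request body written as a JavaScript object. The field path (originalDetectIntentRequest.payload.surface.capabilities) follows the Dialogflow v2 webhook format for requests coming from Actions on Google, but the values are made up and most fields are omitted. The helper name capabilityNames is hypothetical; it only shows how to reach the capability list safely when some level is missing.

```javascript
// Abridged, illustrative Dialogflow v2 webhook request body from Actions on Google.
// Values are invented; many real fields (session, queryResult, user and others) are omitted.
const sampleBody = {
  originalDetectIntentRequest: {
    source: "google",
    payload: {
      surface: {
        capabilities: [
          { name: "actions.capability.AUDIO_OUTPUT" },
          { name: "actions.capability.SCREEN_OUTPUT" },
        ],
      },
    },
  },
};

// Hypothetical helper: returns the capability names, or [] when
// "surface" (or any parent object) is missing from the body.
function capabilityNames(body) {
  const capabilities =
    body?.originalDetectIntentRequest?.payload?.surface?.capabilities ?? [];
  return capabilities.map((capability) => capability.name);
}

console.log(capabilityNames(sampleBody));
console.log(capabilityNames({}));
```

Optional chaining (?.) and ?? need Node.js 14 or later.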
The function I have defined to detect whether we have a screen is this one:
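A minimal sketch of such a detection function, not necessarily identical to the author’s. It assumes the WebhookClient instance exposes the original Actions on Google request as agent.originalRequest (as the dialogflow-fulfillment library does) and looks for the actions.capability.SCREEN_OUTPUT capability. The fake agent objects are stand-ins for a real WebhookClient so the snippet runs on its own.

```javascript
const SCREEN_OUTPUT = "actions.capability.SCREEN_OUTPUT";

// Returns true only when the request carries a "surface" that declares
// the SCREEN_OUTPUT capability. Any missing level yields false, so requests
// that do not come from Actions on Google are treated as "no screen".
function hasScreen(agent) {
  const surface = agent?.originalRequest?.payload?.surface;
  if (!surface || !Array.isArray(surface.capabilities)) {
    return false;
  }
  return surface.capabilities.some((capability) => capability.name === SCREEN_OUTPUT);
}

// Stand-ins for a WebhookClient (assumption: only originalRequest is read).
const phoneAgent = {
  originalRequest: {
    payload: {
      surface: {
        capabilities: [{ name: "actions.capability.AUDIO_OUTPUT" }, { name: SCREEN_OUTPUT }],
      },
    },
  },
};
const speakerAgent = {
  originalRequest: {
    payload: {
      surface: { capabilities: [{ name: "actions.capability.AUDIO_OUTPUT" }] },
    },
  },
};

console.log(hasScreen(phoneAgent));   // true
console.log(hasScreen(speakerAgent)); // false
console.log(hasScreen({}));           // false
```

Returning false by default is a deliberate choice in this sketch: sending plain text to a device that actually has a screen is harmless, while relying on a card the user cannot see is not.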
Once we know that we have a screen, we can generate different text and add rich responses that include cards with images, suggestions, custom payloads, etc. If we send them to a device without a screen it will ignore them, but then the user might not have enough information to continue the conversation, or we might even want to change the conversation flow depending on whether there is a screen.
In our Hoteles Martinez example, the answer when we find a valid hotel is this one:
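The original answer is not reproduced here, but the idea can be sketched as a pure function that decides what to send once a hotel is found. The hotel fields (name, city, imageUrl, bookingUrl), the wording and the URLs are hypothetical placeholders. With the dialogflow-fulfillment library, the speech string would be passed to agent.add(), the card data to agent.add(new Card({...})) and each suggestion to agent.add(new Suggestion(title)).

```javascript
// Hypothetical builder for the "valid hotel found" answer.
// It returns plain data; it does not call any Dialogflow API itself.
function buildHotelAnswer(hotel, screen) {
  if (!screen) {
    // No screen: the speech must carry everything the user needs.
    return {
      speech:
        `I found ${hotel.name} in ${hotel.city}. ` +
        "I will send you the details by email.",
      card: null,
      suggestions: [],
    };
  }
  // Screen available: shorter speech, details go into a card with an image.
  return {
    speech: `I found ${hotel.name}. Take a look:`,
    card: {
      title: hotel.name,
      text: `Hotel in ${hotel.city}`,
      imageUrl: hotel.imageUrl,
      buttonText: "See hotel",
      buttonUrl: hotel.bookingUrl,
    },
    suggestions: ["Book it", "Other hotels"],
  };
}

const hotel = {
  name: "Hotel Example",
  city: "Example City",
  imageUrl: "https://example.com/hotel.jpg",
  bookingUrl: "https://example.com/book",
};

const voiceOnly = buildHotelAnswer(hotel, false);
const withScreen = buildHotelAnswer(hotel, true);
console.log(voiceOnly.speech);
console.log(withScreen.card.title, withScreen.suggestions.length);
```

Keeping the decision in a pure function makes it easy to check both branches without deploying the Cloud Function.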
In the function handling the intent we set a different output context depending on whether we have a screen: with a screen the user can complete the reservation, while without one I decided to send the user an email with the details and end the interaction:
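A sketch of that branching, assuming a WebhookClient-style agent with add() and setContext({ name, lifespan, parameters }) methods, as the dialogflow-fulfillment library provides (newer versions also offer agent.context.set). The context names, the lifespans and the sendDetailsByEmail callback are hypothetical, and a small fake agent records the calls so the snippet runs standalone.

```javascript
// Hypothetical context names: each one enables a different follow-up intent.
const BOOKING_CONTEXT = "awaiting_booking_confirmation";
const EMAIL_CONTEXT = "details_sent_by_email";

// Sets the output context depending on whether the device has a screen.
// With a screen the user can go on to complete the reservation; without one,
// the details are emailed and the user is told so (actually closing the
// conversation is left out of this sketch).
function handleHotelFound(agent, screen, hotel, sendDetailsByEmail) {
  if (screen) {
    agent.setContext({ name: BOOKING_CONTEXT, lifespan: 5, parameters: { hotel: hotel.name } });
    agent.add(`Do you want to book ${hotel.name}?`);
  } else {
    sendDetailsByEmail(hotel);
    agent.setContext({ name: EMAIL_CONTEXT, lifespan: 1, parameters: { hotel: hotel.name } });
    agent.add("I have sent you the details by email. Goodbye!");
  }
}

// Minimal stand-in for a WebhookClient that just records what was set and added.
function makeFakeAgent() {
  return {
    contexts: [],
    messages: [],
    setContext(context) { this.contexts.push(context); },
    add(message) { this.messages.push(message); },
  };
}

const emailed = [];
const screenAgent = makeFakeAgent();
handleHotelFound(screenAgent, true, { name: "Hotel Example" }, (h) => emailed.push(h));
const voiceAgent = makeFakeAgent();
handleHotelFound(voiceAgent, false, { name: "Hotel Example" }, (h) => emailed.push(h));
console.log(screenAgent.contexts[0].name, voiceAgent.contexts[0].name, emailed.length);
```

Passing the screen flag and the email sender in as arguments keeps the handler testable; in a real fulfillment the flag would come from a screen-detection function like the one described above.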
There are many things we could take into account when choosing the right flow for our users, depending on the use case of our chatbot… but that is another story.
I work for Google Cloud, but this post contains my personal ideas and opinions.