Your next app will be an assistant


For some years now, embedded technologies have been easily affordable (both in price and in simplicity of use). Walk into any book store and you'll find a monthly magazine full of fun home-automation projects: "yes, you can control your house lights with a Raspberry Pi". Your house is indeed a good playground for IoT projects, but for most of those kits you have to build everything yourself – from hardware to software.

Since last year, we have started to see "home assistant" boxes like Amazon Echo (Alexa) or Google Home, giving full power and control over any connected device. For developers, the message is quite clear: here is an SDK to extend those assistant technologies. With such power at their fingertips, everyone wants their own "Jarvis" at home (see Mark Zuckerberg's Jarvis).

The next challenge for IT makers is not to build a gorgeous web application, a beautiful mobile app, or to blink your house lights. The real challenge is to bring technology where it can actually solve problems, including places where it is difficult to bring technology! This new way of interacting with systems (voice conversations and commands) gives control in a different way: you no longer need a physical device in your hand… only a voice-enabled device somewhere around you 🙂

Those assistants are built on the latest cloud technologies: deep learning, natural language understanding, speech synthesis and cloud computing. A recent article shows how apps are beginning to understand our daily life: the Google Lens project can now understand your photos: https://techcrunch.com/2017/05/17/google-lens-will-let-smartphone-cameras-understand-what-they-see-and-take-action/

Making our own assistant

We worked on a project with La Région Occitanie Pyrénées-Méditerranée & MoiChef, for managing food products and helping with provisioning. This last process is mostly done by contacting people directly (commercial phone calls …), and thus can hardly be replaced with digital dashboards.

How can we bring daily help, then? We tried to offer a new way of working with this app via an assistant interface: basic voice commands to find items from providers and contact them if something interesting turns up. This way, the product manager doesn't have to keep a laptop at hand while working on a package. But the hardest part was still to come: redesigning the interaction with our application!

We decided to explore Amazon Alexa & Google Actions. Below is how the technologies talk to each other:

Amazon Alexa – Tool chain

Google Assistant – Tool chain

Behind the scenes, both follow a similar flow:

  • speech recognition (speech-to-text)
  • command/conversation understanding (language engine – Alexa Skills Kit & Google Actions/API.AI)
  • processing by the requested IT service (cloud service – AWS & Firebase)
  • voice output (speech synthesis)

Conversation understanding results in a webhook call that triggers a remote service (usually a cloud service), bringing all the data and context to a service that makes decisions about it. The Amazon & Google systems have quite a similar approach: a home box to control devices and applications. But Google has the advantage of billions of devices already out there, ready to spread its Google Assistant app.

Rethinking interactions

With Alexa, we build "skills" to extend Alexa's capabilities, whereas with Google we build "actions" for Google Assistant. The Alexa console & API.AI both contain tools to define everything you need about the language of your assistant. The first thing to do is to define "intents" and "utterances", to help the system understand your expressions. Amazon gives this good advice:

A successful Alexa skill starts with a well-designed voice user interface (VUI). Engaging voice experiences are based on natural language and the fundamentals of human conversation.

Check the following guide to start designing a voice interface: https://developer.amazon.com/designing-for-voice/

Here we’re describing our different expressions in dedicated language tools:

Editing intents and utterances (Alexa at left, API.AI at right)

In both language engines, we defined sentences to query provider items: "look for {component}", "search for {component}", "please contact the provider". The simple thing we are asking here is what kind of component to search for in our database.

You can then capture a kind of variable (here, the component to search for) to be processed later in your cloud service. One important thing you will quickly understand: interactions must be simple and straightforward. At the beginning you write beautiful sentences, and after some tests you will rationalize your intents for easier usage.

The second step to reach quickly is the ability to keep a conversation context between exchanges. You then begin a real conversation with your assistant, carrying data along with each exchange, as sketched below. Once you have defined your intents and given enough examples to match the usual phrases, your system is ready to go!
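As a minimal sketch of such a context, here is how it could be kept in session attributes with the Alexa Node.js SDK (presented further down). The "ItemsWith" intent and its component slot come from our project; the "ContactProvider" intent and the "lastComponent" attribute are hypothetical names used for illustration:

```javascript
// A sketch of keeping a conversation context with session attributes (alexa-sdk v1).
// 'ContactProvider' and 'lastComponent' are hypothetical names.
const handlers = {
  'ItemsWith': function () {
    const component = this.event.request.intent.slots.component.value;
    // Remember what was searched so a follow-up intent can reuse it.
    this.attributes.lastComponent = component;
    // ':ask' keeps the session open and waits for the next exchange.
    this.emit(':ask',
      `I found some providers for ${component}. Should I contact one of them?`,
      'Should I contact a provider?');
  },
  'ContactProvider': function () {
    // The data saved above is still available in the same session.
    this.emit(':tell',
      `Ok, I will contact the provider for ${this.attributes.lastComponent}.`);
  }
};
```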

I encourage you to check out the associated documentation: https://developer.amazon.com/alexa-skills-kit and https://api.ai/docs/getting-started/basics.

A good way to start is with a simple example that you can run. The "fact" application, which simply replies with a fact, is a good way to start small: https://github.com/alexa/skill-sample-nodejs-fact and https://github.com/actions-on-google/apiai-facts-about-google-nodejs

Building the app behind the assistant

Both Alexa & Google Actions offer a Node.js SDK to help you quickly build your app and handle incoming speech requests: https://github.com/alexa/alexa-skills-kit-sdk-for-nodejs & https://github.com/actions-on-google/actions-on-google-nodejs. Each project also has a GitHub organization with sample projects. I must admit that Alexa's ecosystem seems more active (number of stars, sample projects …) and its samples are more helpful than Google's. Amazon Alexa's resources are very complete on the subject.

Each captured sentence is then sent to our cloud service. Here is how I connect my "ItemsWith" intent with Alexa (on AWS Lambda):
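Something along these lines, as a minimal sketch with the alexa-sdk v1 Node.js library; the searchProviders helper is a hypothetical stand-in for the real query against our provider database:

```javascript
// A minimal sketch of an AWS Lambda entry point with the alexa-sdk v1 library.
// searchProviders() is a hypothetical stand-in for the real provider query.
const Alexa = require('alexa-sdk');

function searchProviders(component) {
  return Promise.resolve([{ name: 'Example provider', component: component }]);
}

const handlers = {
  'ItemsWith': function () {
    // The 'component' slot captured from "look for {component}".
    const component = this.event.request.intent.slots.component.value;
    searchProviders(component).then((items) => {
      this.emit(':tell', `I found ${items.length} providers for ${component}.`);
    });
  }
};

// Lambda entry point: build the Alexa handler, register intents, execute.
exports.handler = function (event, context, callback) {
  const alexa = Alexa.handler(event, context, callback);
  alexa.registerHandlers(handlers);
  alexa.execute();
};
```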

Pretty much the same thing with Google Voice Actions (on Firebase Functions):
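Again as a sketch: with the actions-on-google v1 library (its ApiAiApp class, named ApiAiAssistant in earlier releases) running on Firebase Functions, the equivalent looks roughly like this. The "items_with" action name mirrors the Alexa intent and is an assumption:

```javascript
// A minimal sketch of a Firebase Function handling the same intent with the
// actions-on-google v1 library and API.AI. Names are assumptions.
const functions = require('firebase-functions');
const ApiAiApp = require('actions-on-google').ApiAiApp;

function searchProviders(component) {
  return Promise.resolve([{ name: 'Example provider', component: component }]);
}

exports.assistant = functions.https.onRequest((request, response) => {
  const app = new ApiAiApp({ request: request, response: response });

  function itemsWith(app) {
    // The 'component' parameter captured by API.AI from "look for {component}".
    const component = app.getArgument('component');
    return searchProviders(component).then((items) => {
      app.tell(`I found ${items.length} providers for ${component}.`);
    });
  }

  // Map API.AI action names to their handlers.
  const actionMap = new Map();
  actionMap.set('items_with', itemsWith);
  app.handleRequest(actionMap);
});
```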

The two SDKs are very similar; you will find the same concepts in each.

A quick recap of the available APIs (Alexa & Google Actions)

It's then easy to write a common module and plug it into AWS Lambda or Firebase Functions to build responses for Alexa or Google Assistant.
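As an illustration of that shared-module idea (the names here are purely illustrative, not the project's), the business logic can live in one plain Node.js module that both cloud functions import:

```javascript
// search.js — a sketch of a shared module, independent of any assistant SDK.
// Both the Lambda (Alexa) and the Firebase Function (Google) can require it.
function findProviders(component) {
  // The real implementation would query the provisioning database.
  return Promise.resolve([{ name: 'Example provider', component: component }]);
}

function buildAnswer(component, items) {
  return `I found ${items.length} providers for ${component}.`;
}

module.exports = { findProviders: findProviders, buildAnswer: buildAnswer };
```

Each side then only keeps a thin adapter: the Alexa handler ends with `this.emit(':tell', answer)` while the Google action ends with `app.tell(answer)`.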

Note that you are not limited to voice responses only. You can also send UI cards to the companion app, to support your voice interface. This is very useful, because you will quickly see that a VUI is highly constraining, and additional GUI support like cards can help you handle "simple" situations: asking for a list of information, offering a secondary way of making choices, things that are naturally more visual.
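For example, on the Alexa side, a sketch of answering with speech plus a simple card in the companion app, using the ':tellWithCard' helper from alexa-sdk v1 (the provider list here is made up):

```javascript
// Inside an intent handler (alexa-sdk v1): speech plus a card in the Alexa app.
const handlers = {
  'ItemsWith': function () {
    const cardContent = '1. Example provider A\n2. Example provider B\n3. Example provider C';
    // ':tellWithCard' sends the spoken answer and a simple card to the companion app.
    this.emit(':tellWithCard',
      'I found three providers, check the Alexa app for the full list.',
      'Providers found',
      cardContent);
  }
};
```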

Now chat with your assistant

For now, Amazon Alexa & Google Assistant are available in English only. Testing your new capability is therefore not so easy, and the hardware (or compatible devices) is only available in the UK, US and Germany. But you can use the simulator tools to play with your assistant, like a chat bot:

Testing with Alexa Simulator

Testing with Google Assistant Sandbox

Voice, The Next Disruption?

Using your voice to control something isn't new, but the technologies are converging into something great. Almost every smartphone owner has tried Siri or Google Now, and it's very impressive to see how much smarter those little assistant applications are becoming. Disruption will come from the new use cases enabled by such home or custom assistant boxes, extended to online applications.

The technology stack behind them is still moving, but is greatly promising. Speech recognition is efficient, but you must be exhaustive to make sure you cover almost all cases. Deep learning will surely improve the situation and allow us to make more "living" things. Companion apps are also part of the experience, providing additional visual support.

Amazon is surely in the lead position at this time (availability, simplicity of use & build, resources, documentation …). I was disappointed by the limited resources on the Google side, compared to those found at Amazon. It's also interesting to see pure-player alternatives like Snips coming onto the market.

Author: Arnaud Giuliani

French Java software tech, creates and runs #java #server gears (distributed and other under-the-hood stuff). Also likes to make #android apps.
