Building Personal AI Assistants in 2020

While Personal Assistants as an idea are hardly news in 2020, given all the amazing progress made since DARPA's CALO and PAL programs, followed by Siri, Alexa, Microsoft Cortana, and Google Assistant, the dream of a true Personal Assistant still seems to sit somewhere in the future.

We all know how simple, if not outright stupid, these AI assistants are. They can do only very primitive tasks, often misunderstand us, and require too many actions on our side to get anything done. In short, they demand too much effort, even in the Home Automation space that was supposed to unleash an AI assistant’s usefulness.

Why is that so? What is a True Personal AI Assistant? How far are we from building it?

5 Levels of Autonomy across Self-Driving Cars and AI Assistants

Last year I was researching self-driving cars, and of course one of the first things I came across was the now-famous 5 levels of autonomy introduced by SAE International in 2016:

SAE J3016 Levels of Driving Automation (2016)

Of course, autonomous “anything” is an old dream of our technology-driven world, and this model of 5 levels of autonomy was quickly reused by Alan Nichol from Rasa, who published his own vision of 5 levels of autonomy for AI assistants:

5 Levels of Autonomy for AI Assistants (2018)

When I looked at this diagram, I was puzzled by the lack of focus on things like these:

  • understanding of complex, process-based tasks
  • execution of complex, process-based tasks, and broader agency
  • proactivity
  • work with humans (e.g., gather team reports)
  • strategic counsel
  • personal counsel

With these key capabilities, so typical of human Personal Assistants, in mind, I decided to finally start a quest towards identifying what a True Personal AI Assistant should be.

Are Existing Autonomy Levels for AI Assistants Good Enough?

In our quest to identify what the Personal Assistant should be, it’s a good idea to look at the situation critically and ask ourselves: is the model above good enough? As I said, while I agree with some of the ideas in Rasa’s article on the O’Reilly Radar website, to me their model comes from the limitations of current AI Assistants and overlooks the needs of real users.

If there’s anything I’ve learned from my first startup, it’s that you start with your customers and what they need. Who is the customer of personal assistants?

Learning from Personal Assistants, as in Humans

Personal assistants aren’t a new idea. People in power and wealthy people have had personal assistants for centuries. In the modern day, not only do such people have personal assistants, assistants also exist at the team level. For example, when I joined Microsoft Russia in 2007, our team, DPE Russia, had an administrative assistant. Yes, even though we had lots of departments with their standardized processes (finance, HR, etc.), we still had our own administrative assistant. Our admin assistant didn’t do things just for our director, but helped the rest of us, too.

And my first obvious question is this:

What Do We Know About Human Personal Assistants?

While these Personal Assistants are all humans, they too can be categorized by their level of autonomy. Why is this important?

Let’s take a look at one of the existing categorizations, courtesy of an Inc.com article (the picture is mine), inspired by conversations with our potential customers, and supported by numerous articles on PA duties:

4 Levels of Personal Assistants

It’s quite clear that while Rasa’s approach to Levels of Autonomy of AI Assistants is a good step towards understanding the research directions for AI assistants, it isn’t really inspired by the problems of real-world people.

Did anyone analyze Personal Assistants before building an AI assistant?

Cortana

Yes, interestingly enough, Microsoft’s Cortana team did look into PAs and their tasks, and used them as an inspiration for Cortana.

What did they learn?

A few things:

  • The key thing is trust – the user shares very personal information and expects it to be kept private
  • Personality is important to build emotional trust
  • Knowing something about a person is like having a bible on that person – whom they meet, whom they talk and write emails to, what they buy, what they eat, what they wear

What did they make in Cortana based on these findings?

  • They introduced the “Notebook” to show what the AI assistant knows about its user
  • They devised an original approach for the AI assistant to act through 3 triggers: time, place, and people

Cortana Notebook (2014)
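To make the Notebook-plus-triggers idea concrete, here is a minimal sketch in Python. All names and structures are my own illustration, not Cortana’s actual API: a user-inspectable notebook of what the assistant knows, plus rules that fire on the three trigger kinds (time, place, people):

```python
from dataclasses import dataclass, field

# A hypothetical, user-visible "notebook": what the assistant
# knows about its user, open for the user to see and edit.
@dataclass
class Notebook:
    name: str
    places: dict = field(default_factory=dict)    # e.g. {"home": "55 Main St"}
    people: list = field(default_factory=list)    # key contacts
    interests: list = field(default_factory=list)

# A rule that fires on one of the three trigger kinds.
@dataclass
class Trigger:
    kind: str      # "time" | "place" | "people"
    value: str     # e.g. "09:00", "home", "Bob"
    action: str    # what to do when the trigger fires

def fire(triggers, kind, value):
    """Return the actions of all triggers matching an observed event."""
    return [t.action for t in triggers if t.kind == kind and t.value == value]

triggers = [
    Trigger("time", "09:00", "remind: take pills"),
    Trigger("place", "home", "remind: check heating system"),
    Trigger("people", "Bob", "remind: ask Bob about the weekly report"),
]

print(fire(triggers, "place", "home"))  # → ['remind: check heating system']
```

The point isn’t the code itself but the design: the assistant’s knowledge lives in one transparent place, and its proactive behavior is reducible to a small, inspectable set of trigger rules.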

Unfortunately, something happened in 2014 or later that stopped this direction of development in Cortana. In 2014, Cortana had an amazing team including people like Larry Heck, Zig Serafin, and others, but later they all moved on: Zig left the ship in 2016 for Qualtrics, and Larry Heck left in 2014 to lead Google Assistant’s Deep Dialogue effort.

Given my experience with Yandex Alice, I believe it’s quite hard to build a personal AI assistant within an existing corporation – too many powers in play stop you from building something truly useful for the end users. It’s not only that you have to believe in the vision to support a product that isn’t bringing in money just yet; it’s also a huge fight within the company for resources, as well as for either including existing services or making sure they won’t block 3rd-party ones. And if you talked to anyone senior enough at Microsoft today, you’d probably hear that Cortana is no longer of strategic value to the company.

Arguably the most successful cases are Alexa (e.g., you can change your music and home automation providers in it) and Google Assistant, and yet both are still relatively awful in day-to-day usage.

Facebook M Assistant

Facebook made a very curious and courageous trip towards building a general-purpose AI assistant by combining the power of AI and real humans. In the beginning, Facebook worked hard to hide the fact that its AI assistant was, in fact, human-aided:

Facebook M

Facebook’s experiment was fascinating in that it wasn’t limited to things AI assistants could do. Instead, its product leadership’s goal was to make a system that could learn from the unlimited amount of things people want to get done, and to automate them.

Unfortunately, the team never managed to get more than 40% of tasks automated, and the project was cancelled.

What Can We Learn By Comparing What Real-Human Personal Assistants Do and What Current AI Assistants Can Do?

A few things:

  1. Even the most advanced AI Assistants available to the public (e.g., Alexa) are mostly capable of doing only some Level 1 and Level 2 tasks, and can merely emulate Level 3 tasks:

    Level 1: They can do some direct tasks (tell the weather, order an Uber, remind you to take pills); they can perform limited proactive tasks like reminders

    Level 2: They can handle some simple routines (remind you to check the heating system every day at noon, switch off the lights after a room has been unoccupied for 5 minutes); they can even help schedule meetings within some boundaries (x.ai), or transcribe and send meeting notes (Webex, Reason8)

    Level 3: They can emulate a personal therapist (e.g., Replika), or a chit-chat buddy (Microsoft Xiaoice, Yandex Alice)
  2. They can’t do things that need more than one step or require intellect, e.g.:

    Level 2: Prepare trip tickets for team building; conduct routine research
  3. They can’t act on your behalf the way real human assistants do, except for very limited activities:

    Level 2: Accumulate and send you team weekly reports

    Level 3: Create a virtual group, set a goal, find and organize people into it, start discussions towards solving that goal, prepare and send you a report about it; take charge of moving to a new office

    Level 4: Build and maintain relationships with people, teams, and entire organizations (internal and external)
  4. They can’t solve all of the problems on their own:

    Level 3: Handle problems with vendors; fix office or home automation issues by bringing in the necessary support, setting goals, and making sure goals are met and problems are solved; etc.
    Level 4: Resolve miscommunications between departments
  5. They don’t know about us. The most they can know is our locations (home, office) and our names, and that’s about it. While they can keep shopping lists, tasks, and reminders, they have no sense of their contents. They don’t know about our social connections, and they can’t do anything related to other people except direct commands like “send this picture to Bob” or “schedule a meeting with Bill”. They don’t know about our preferences.

    Level 1: send email to my Mom

    Level 2: gather team weekly reports

    Level 3: create a cross-disciplinary committee on office goodwill; discuss with me my daughter’s desire to change her name, in light of my mother’s experience

    Level 4: organize an annual strategic planning
  6. They are mostly not transparent. Even if our assistants learn something about us besides our locations, they don’t share it with us. For example, Yandex Alice could learn about the key people you talk to via email, but it won’t maintain that list for you to see and edit.
  7. They don’t always act in your favor. For example, if you want Yandex Alice to play music, you need a Yandex.Music subscription (Alexa is a good counter-example here, as it allows you to change your music provider; but you can’t do that for Reminders and Tasks, or other things, in Alexa). Corporate-built AI assistants usually limit you to their respective ecosystem’s services. This would be fine if you were all-in on that corporation’s ecosystem, but most of us use a mix of services belonging to different ecosystems.

    Example: I have Alexa, Microsoft Office 365 and Windows 10, Google Android, Samsung Notes, Facebook, Gmail, Yandex Music, Yandex’s KinoPoisk, Ivi.ru, and a few other services. None of the assistants I have lets me control all of these from one place, thus requiring me to remember which assistant to rely upon for which problem.
  8. They can’t be trusted. While regulations like GDPR exist these days, corporate-built AI assistants store information in their clouds, making it available to the company’s employees and official authorities.
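The provider lock-in described in point 7 isn’t a technical necessity. Here is a minimal sketch (all names hypothetical, not any real assistant’s API) of how an assistant could route each capability to whichever service the user configures, so swapping a music provider becomes a settings change rather than an ecosystem switch:

```python
# A hypothetical provider registry: the user, not the vendor,
# decides which service backs each capability.
class ProviderRegistry:
    def __init__(self):
        self._providers = {}

    def register(self, capability, provider):
        """Bind (or rebind) a capability to a provider callable."""
        self._providers[capability] = provider

    def handle(self, capability, request):
        """Route a request to the user's chosen provider."""
        provider = self._providers.get(capability)
        if provider is None:
            raise KeyError(f"no provider configured for {capability!r}")
        return provider(request)

registry = ProviderRegistry()
registry.register("music", lambda req: f"Spotify playing {req}")
registry.register("reminders", lambda req: f"Todoist reminder: {req}")

print(registry.handle("music", "jazz"))   # Spotify playing jazz

# Swapping the music provider is a one-line change, not a new device:
registry.register("music", lambda req: f"Yandex.Music playing {req}")
print(registry.handle("music", "jazz"))   # Yandex.Music playing jazz
```

Alexa does something like this for music and home automation; the complaint above is that the same indirection isn’t offered for reminders, tasks, and the rest.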

Summary & What’s Next

While AI Assistants have been with us for a decade already, and are here to stay, they are still very limited in their capabilities. Rasa’s recent approach to identifying 5 Levels of Autonomy of AI Assistants hides the real complexity of AI assistant tasks. The true focus should be on Trust, Autonomy, and Context.

In one of the next blog posts I’ll dive into Autonomy and Context, which are indispensable to building a True Personal AI Assistant.

Author: Daniel Kornev

CPO at DeepPavlov.ai. Passionate about Conversational AI & Space Exploration. Founded Zet Universe, Inc. Previously worked at Microsoft, Yandex, Google, and Microsoft Research. This is my older blog (circa 2010), the primary one is at https://danielko.medium.com/

Leave a comment