Everyday skills, such as making your bed or even pressing a doorbell, might seem trivial to us, but are actually quite complicated for today’s robots. Think about your performance the first time you tried a sport. Did you seek help from a peer or coach? Did you perform better after that?

You most probably answered yes to both. It turns out this is essentially how humans expand their skill set while growing up: *learning from demonstration (LfD)*, or *imitation*. In the words of Aristotle, "Imitation is natural to man from childhood." It is therefore unnatural to expect robots to come pre-programmed with every skill required to assist us humans in our daily activities. Learning from demonstration addresses this problem: it enables robots to continuously expand their capabilities by letting people teach them new skills by showing what to do instead of programming. However, we don't want robots to be mere copycats. As with humans, we want robots to reproduce a desired skill in many different scenarios.

We have devised a new approach that enables robots to extract the important constraints of a desired skill from multiple human demonstrations. A drawer-opening skill, for example, requires the robot to first reach the drawer (a constraint on the goal position) and then execute a pulling motion (an additional constraint on the direction of motion). Once the skill is learned, our approach allows efficient adaptation of the skill to: 1) changes in the environment, for example avoiding the table in front while opening the drawer, and 2) changes in the skill requirements, for example reaching the drawer from different locations. We call our approach *CLAMP (Combined Learning from demonstration And Motion Planning)*. CLAMP generates trajectories for a desired skill which are *optimal* according to the demonstrations while remaining *feasible* in the reproduction scenario.

## Human demonstrations for skill learning

For a desired skill, a human provides multiple trajectory demonstrations, covering different ways of executing the skill. In the example shown above, a human provides demonstrations for a *box-opening* skill by grabbing and moving the robot (*kinesthetic teaching*). There are different ways of reaching the box, so the demonstrations show more variation initially. However, once the box is reached, opening it requires a highly constrained sliding motion, so all the demonstrations look alike towards the end.

## Skill optimality from demonstrations

To extract the skill constraints, CLAMP learns a linear stochastic dynamical system from the demonstrations, i.e. ẋ(t) = A(t)x(t) + u(t) + F(t)w(t), where w(t) is a white-noise process. Ridge regression on the demonstration data is used to learn the system parameters. Trajectory rollouts from this dynamical system generate a distribution over trajectories p(θ). As shown above, the variance of the distribution for the *box-opening* skill decreases as the box is reached, reflecting the highly constrained motion near the goal. This trajectory distribution acts as the *optimality* criterion.
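As a minimal sketch of this idea (our own discrete-time simplification, not the paper's exact estimator), the transition model x_{t+1} = A x_t + b plus process noise can be fit by ridge regression over all demonstration transitions, then rolled out to propagate the mean and covariance of p(θ):

```python
import numpy as np

def fit_linear_dynamics(demos, reg=1e-6):
    """Fit x_{t+1} = A x_t + b by ridge regression over all demo transitions.
    demos: list of (T, d) state trajectories."""
    X = np.vstack([d[:-1] for d in demos])           # current states
    Y = np.vstack([d[1:] for d in demos])            # next states
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])    # append bias column for b
    W = np.linalg.solve(Xa.T @ Xa + reg * np.eye(Xa.shape[1]), Xa.T @ Y)
    A, b = W[:-1].T, W[-1]
    resid = Y - Xa @ W
    Q = np.cov(resid.T)                              # process-noise covariance
    return A, b, Q

def rollout_distribution(A, b, Q, x0, T):
    """Propagate mean and covariance of the trajectory distribution p(theta)."""
    d = x0.shape[0]
    means, covs = [x0], [np.zeros((d, d))]
    for _ in range(T - 1):
        means.append(A @ means[-1] + b)
        covs.append(A @ covs[-1] @ A.T + Q)
    return np.array(means), np.array(covs)
```

The growing (or shrinking) covariances along the rollout are exactly what encodes where the demonstrations agree and where they vary.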

## Skill feasibility in reproduction scenario

A given reproduction scenario is composed of various conditions, and the *feasibility* of a trajectory is defined by the likelihood of satisfying all of them. For example, trajectories colliding with a new obstacle are less likely to be the desired trajectory than those that stay clear of it. Likewise, if we want to begin the skill from a new position, trajectories starting from that position should be far more likely than those starting from other initial robot positions. We denote this feasibility by the likelihood p(e|θ).
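As a toy illustration (our own construction; the factor definitions, radii, and noise scales here are hypothetical, not the paper's), p(e|θ) can be scored as a product of independent per-condition likelihoods, e.g. a hinge-style obstacle factor and a Gaussian start-position factor:

```python
import numpy as np

def obstacle_likelihood(traj, obstacle, radius, sigma=0.05):
    """Penalize states that enter a safety zone around the obstacle.
    traj: (T, d) array of positions; likelihood is 1.0 when fully clear."""
    dists = np.linalg.norm(traj - obstacle, axis=1)
    cost = np.maximum(0.0, radius - dists)       # zero outside the safety zone
    return np.exp(-0.5 * np.sum(cost ** 2) / sigma ** 2)

def start_likelihood(traj, x_start, sigma=0.01):
    """Gaussian factor anchoring the first state to a desired start position."""
    err = traj[0] - x_start
    return np.exp(-0.5 * (err @ err) / sigma ** 2)

def feasibility(traj, obstacle, radius, x_start):
    """p(e | theta): product of the independent condition likelihoods."""
    return obstacle_likelihood(traj, obstacle, radius) * start_likelihood(traj, x_start)
```

A trajectory that cuts through the obstacle's safety zone, or starts far from the requested position, scores a lower feasibility than one satisfying both conditions.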

## Probabilistic inference for skill reproduction

To reproduce the learned skill in a new scenario, CLAMP selects the most feasible trajectory from the learned distribution of trajectories. To achieve this, CLAMP adopts the probabilistic-inference view of motion planning: a posterior distribution is obtained by conditioning the optimality prior on the feasibility likelihood. The maximum of this posterior is the desired optimal and feasible trajectory.

However, this probabilistic inference procedure can be time-consuming, which is undesirable since in real-world scenarios we want the robot to reproduce the desired skill instantly. To speed it up, CLAMP exploits the Markov structure of the learned prior trajectory distribution: the inference problem can then be posed on a factor graph and solved efficiently.
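In the Gaussian case the conditioning step has a closed form. A dense sketch (our own simplification, assuming a single linear-Gaussian condition e = Hθ + noise; the function name is hypothetical):

```python
import numpy as np

def map_trajectory(mu, Sigma, H, e, R):
    """Condition the Gaussian trajectory prior N(mu, Sigma) on the linear-Gaussian
    condition e = H @ theta + noise, noise ~ N(0, R). For Gaussians the MAP
    trajectory coincides with the posterior mean."""
    S = H @ Sigma @ H.T + R                        # innovation covariance
    gain = Sigma @ H.T @ np.linalg.inv(S)          # Kalman-style gain
    return mu + gain @ (e - H @ mu)                # posterior mean = MAP trajectory
```

This dense solve scales cubically with the length of the stacked trajectory; exploiting the Markov structure on a factor graph brings inference down to linear time in the trajectory length, which is what makes near-instant skill reproduction practical.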

## Conclusion

As shown, CLAMP provides an effective way of generalizing skills in real-world scenarios. Like humans, we expect robots to continuously learn and adapt, and CLAMP is a step towards making this happen. Future extensions of CLAMP will seek to address various other real-world scenarios involving unstructured and dynamic environments.

More details on this method can be found in our research paper.

## Reference

Rana, Muhammad Asif, Mustafa Mukadam, S. Reza Ahmadzadeh, Sonia Chernova, and Byron Boots. “Towards Robust Skill Generalization: Unifying Learning from Demonstration and Motion Planning.” In *Conference on Robot Learning*, pp. 109-118. 2017.

[Written by Rana Asif]