Here’s an interesting video from an IET event held in Bristol in early October. Mike Aldred talks through the story and development behind the 360 Eye, and highlights lots of interesting things surrounding the development of robots from a company perspective.
Mike does a great job of illustrating some of the differences between the conceptual problems that academia addresses and the real challenges faced when applying some of those solutions to real-world problems. Makes me want to point to my last post again. Ask yourself: how can you apply your research to the real world? It is worth thinking about!
Watch the video here.
Our lab got a new companion today in the form of a SociBot, a hand made/assembled 3 DoF robot head with a Retro Projected Face (RPF) sitting on top of a static torso, produced by Engineered Arts Ltd in Cornwall, UK. Setting you back about £10,000, it’s an interesting piece of kit and could well be a milestone in the development toward very convincing, facially expressive robots.
So, how do these things work? Well, it’s actually really simple. You take a projector, locate it in the back of the robot’s head and project a face onto a translucent material which has the profile of a face. This way you as the user can’t see into the head, but you can see the face. It’s cheap, it’s simple and it’s becoming more popular at the moment. Retro-projected faces are not exactly a new idea in the world of social robotics, however (and they have an even longer history in the world of theater, I’ve come to learn). There have been a few universities exploring them in Europe, with a notable early example coming from Fred Delaunay, who did his PhD work at Plymouth a couple of years ago on his LightHead robot (Fred has since gone on to pursue a startup company, Syntheligence, in Paris to commercialise his efforts, as the idea seems to have caught on quite a bit). That said, they do have many useful things to offer the world of robotics.
For example, the face has no moving parts, and thus actuator noise is less of an issue (which isn’t altogether true: projectors get hot and do have a fan whirring away). Faces can also be rendered in real time (thus noiseless eye saccades, blinking and even pupil dilations are possible), and are very flexible to alterations such as blushing. You can put any face you want on SociBot, with animation possibilities where the sky’s the limit. The folks who put together the software for the face system (amusingly called “InYaFace”) have also made a model of Paul Ekman’s influential Facial Action Coding System (FACS), which allows the robot to make all kinds of facial expressions and to tie in very well with research output that uses FACS as a foundation.
The robot itself runs Ubuntu Linux and uses lots of off-the-shelf software, such as OpenNI for the Asus Xtion Pro to do body tracking and gesture recognition, and face/age/emotion detection software using the head-mounted camera as the feed source. TTS is also built in, as is Speech Recognition (I think), but there is no built-in mic, only a mic input port. Also, we (by that I mean Tony) bought the SociBot Mini, but there is also a larger version where you can add a touch screen/kiosk as a joint virtual space that both the robot and user can manipulate.
SociBot can be programmed using Python and has a growing API, which is under constant development and looks promising. You also have the ability to program the robot via a web-based GUI, as Engineered Arts seem keen to have the robots be an “open” platform where you can log in remotely to control them.
Unboxing the robot
Given my recent post about the unveiling of Pepper by Aldebaran and the new Nao Evo that we got a little while ago, I was curious to see what Engineered Arts had put into the robot as a default behaviour. I must admit that I was rather pleasantly surprised. The robot appeared to be doing human tracking and seemed to look at new people when they came into view.
To make a comparison with Aldebaran, I think that what really stands out here is that it took very little effort to start playing with the different aspects of the robot. When you get a Nao, it really doesn’t do that much when turned on for the first time, and there is quite a bit of setup required before you get to see some interesting stuff happen. Through the web interface, however, we were quickly able to play with the different faces and expressions that SociBot has to offer, as well as the different little entertaining programs, such as the robot singing a David Bowie number and blasting out “Singing in the Rain”. Lots of laughs were to be had very quickly and it was good fun exploring what you can do with the robot once it arrives.
From a product design / UX perspective, Engineered Arts have got this spot on. When you open the box of a robot, this is perhaps the most excited you will ever be about that robot, and making it easy to play with the thing leaves a very good impression. Overall, however, some things are a bit rough around the edges, but the fact that this is a hand made robot tells you everything that you need to know about Engineered Arts: they really do like to build robots in house.
Drawbacks to retro-projected faces
As this post is in part an overview of the current state of the art in RPF technology, I should say that I can certainly see room for improvement, as there are some notable shortcomings. Let’s start with some low-hanging fruit, which comes in the form of the volume used. As the face is projected, most of the head is actually empty space in order to avoid having yourself a little shadow puppet show. This (lack of) use of the space is inefficient in my view, as it puts quite some limitations on where you can locate cameras. Both LightHead and SociBot have a camera mounted at the very top of the forehead. This means that you lose out on potential video quality and sensing. Robots like iCub and Romeo have cameras that are mounted in actuated eyes, which allows them to converge (which is interesting from a Developmental Robotics point of view), but also to compensate for the robot’s head and body movements, providing a more stable video feed. Perhaps this is a slightly minor point, as these robots are generally static in the grand scheme of things, but when they start to walk, I can see things changing quickly (in fact this is exactly why Romeo has cameras in the eyes).
Another problem is to do with luminosity and energy dispersion. As the “screen ratio” of the modern off-the-shelf projector isn’t really suitable to cover a full face at the distance required, these systems turn to specialised optics in order to increase the projection field of view. However, this comes at the cost of spreading the energy in the projection over a larger area, which results in a lower luminosity over any given area. This is compounded when the projection hits the translucent material: the light refracts and scatters even more, which means you lose more luminosity and the image loses some sharpness. Of course the obvious solution is to put in a more powerful projector, but this has the drawback that it will likely get hotter, thus needing more fan cooling, and with that fan running in a hollow head, the sound echoes and reverberates.
Personally, I’m still waiting for flexible electronic screen technology to develop, as this will likely overcome most of these issues. If you can produce a screen that takes the shape of a face, you suddenly no longer need a projector at all. You gain back the space lost in the head, lose the noisy fan that echoes, and luminosity becomes less of an issue. Marry this with touch screen technology and perhaps actuation mounted under the screen and I think that you have a very versatile piece of kit indeed!
Ever since reading Cynthia Breazeal’s book, “Designing Sociable Robots”, I’ve had this constant itch to implement her visual attention model on a robot, mainly the Nao, as there are four of them lying around in the lab these days. So, suffice it to say that I’ve finally gotten around to scratching this particular itch, and boy does it feel good! 🙂
So, if you haven’t already read this book (and if you work in social robotics, shame on you), I highly recommend it! It’s full of interesting insights and thoughts, and it is a must-read for any new MSc/PhD students who might be embarking on their research journeys.
To get to the point, in one of the chapters, Breazeal describes the vision system running on Kismet. This is actually something that was developed by Brian Scassellati (whilst working on “Cog”, if I recall), and I must say, I think that it is a little gem (hence why I wanted to see it run on the Nao). The model is intended to make the robot attend to things that it can see in the environment (e.g. things that move, people, objects, colours, etc.) using basic visual features. It is basically a bottom-up approach to visual processing: take lots of basic, simple features, and combine them “upwards” into something that is more complex.
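To make the “combine upwards” idea a bit more concrete, here is a minimal sketch in Python. It is not the code from my implementation; it just assumes that each feature map is a same-sized grid of numbers and that the maps are merged by a weighted sum. The function name, feature names and weights are all made up for illustration.

```python
# Hypothetical sketch: feature names, sizes and weights are illustrative only.

def combine_feature_maps(feature_maps, weights):
    """Return a saliency map as the weighted sum of same-sized feature maps."""
    names = list(feature_maps)
    rows = len(feature_maps[names[0]])
    cols = len(feature_maps[names[0]][0])
    saliency = [[0.0] * cols for _ in range(rows)]
    for name in names:
        w = weights.get(name, 0.0)  # a feature without a weight is ignored
        fmap = feature_maps[name]
        for r in range(rows):
            for c in range(cols):
                saliency[r][c] += w * fmap[r][c]
    return saliency


# Two toy 2x3 feature maps with values in [0, 1].
motion = [[0.0, 0.5, 0.0],
          [0.0, 1.0, 0.0]]
colour = [[1.0, 0.0, 0.0],
          [0.0, 0.5, 0.0]]

saliency = combine_feature_maps(
    {"motion": motion, "colour": colour},
    {"motion": 2.0, "colour": 1.0},
)
```

Raising the weight of one feature makes the robot favour that kind of stimulus, which is where any task-driven control would plug in.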
I’ve finally implemented the model from scratch and made it run using either a desktop webcam or an Aldebaran Nao. This little personal project also holds a more serious utility. I’m now beginning to make an online portfolio of my coding skills, as I have seen some employers request example code recently (and I’m currently on a job hunt). I’ve made two YouTube videos of the model. The first shows it running on my desktop machine in the lab, where I talk through the model and the parameters that drive it. In the second video I show the slightly adapted version running with a Nao. Here are those two videos:
I have to admit that there is certainly room for improvement and fine tuning in the parameter settings, as well as some nice extensions. For example, I had a bit of trouble as there is quite a lot of red in our office and the robot was immediately drawn to this. Either I need to change the method for attention point selection, or I need to take distance into account in some way (but there isn’t an RGBD sensor on the Nao at the moment). Currently, for attention point selection, I am finding all the pixels that share the same max value in the Saliency Map and finding the Center of Mass of the largest connected region of these. Alas, in the videos this was sometimes a background item…
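The selection step described above can be sketched roughly as follows. This is a plain-Python illustration rather than the code from my implementation, and it assumes 4-connectivity between pixels (the choice of connectivity is my assumption here); the function name and the toy map are made up.

```python
from collections import deque


def attention_point(saliency):
    """Centre of mass (row, col) of the largest 4-connected region of max-valued pixels."""
    rows, cols = len(saliency), len(saliency[0])
    peak = max(max(row) for row in saliency)
    is_peak = [[saliency[r][c] == peak for c in range(cols)] for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if not is_peak[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill to collect one connected region.
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and is_peak[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    # Every pixel in the region has the same value, so the centre of mass
    # is simply the mean pixel position.
    mean_row = sum(y for y, _ in best) / len(best)
    mean_col = sum(x for _, x in best) / len(best)
    return mean_row, mean_col


# Toy saliency map: one isolated peak pixel and one 2x2 block of peak pixels.
toy = [
    [9, 0, 0, 0, 0],
    [0, 0, 9, 9, 0],
    [0, 0, 9, 9, 0],
    [0, 0, 0, 0, 1],
]
point = attention_point(toy)
```

The sketch also shows the weakness: a large patch of saturated red in the background forms a big connected region and wins over a smaller, closer target.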
Talking about possible extensions, I certainly see a lot of room to have an adaptive mechanism that provides the “Top Down”, task-orientated control of the feature weights (at least), as was done with Kismet. There is a small set of parameters driving the model, and finding values that work can be a little tricky. Furthermore, I suspect that as soon as you change the setting, you will need to tweak the parameters again.
Coding this system up also made me think about the blog post I wrote about what a robot should do out of the box. I recall that the Nao was doing at least face detection and tracking. I pondered the idea of whether this kind of model would work as an out-of-the-box program. Rather than having fixed weights, the robot could have some pre-set modes (as Kismet did) and just cycle through these at different intervals. Perhaps the biggest problem will be the onboard processing that would need to happen. My program is multi-threaded (each feature map is computed in its own thread, as is the Nao motor control) and isn’t exactly computationally cheap, and so I can see it using quite a bit of the processing resources.
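As a rough illustration of the pre-set modes idea, the sketch below cycles through a couple of weight sets, switching after a fixed number of frames. It is a thought experiment in Python, not something from my implementation: the mode names, feature names, weights and the frame-based interval are all assumptions.

```python
from itertools import cycle

# Hypothetical presets: mode names, feature names and weights are made up.
PRESET_MODES = [
    ("seek_people", {"skin": 2.0, "motion": 1.0, "colour": 0.5}),
    ("seek_toys", {"skin": 0.5, "motion": 1.0, "colour": 2.0}),
]


def mode_schedule(modes, frames_per_mode):
    """Yield (mode_name, weights) once per frame, switching every frames_per_mode frames."""
    for name, weights in cycle(modes):
        for _ in range(frames_per_mode):
            yield name, weights


schedule = mode_schedule(PRESET_MODES, frames_per_mode=2)
first_five = [next(schedule)[0] for _ in range(5)]
```

On a real robot the interval would more sensibly be time-based, and could differ per mode, but the weights handed out per frame would feed into the saliency computation in the same way.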
Anyway, there are lots of possibilities with this model with respect to tweaking it, extending it, and merging it with other “modules” that do other things. As such, I’ve made the code available to download:
Desktop + webcam version (needs Qt SDK, OpenCV libs and ArUco libs): Link
Version for the Nao (needs Qt SDK, OpenCV libs, ArUco libs and NaoQi C++ SDK, v 1.14.5 in my case): Link
Note: the NaoQi SDK isn’t free. You need to be a Developer, and I have access through the Research Projects at Plymouth University. I can’t provide you with the SDK as this would go against the agreement we have with Aldebaran… Sorry… 😦