In a face-to-face conversation, you end up saying more in subtle gestures, tics, and shifts of pose than you do in words. Unfortunately, most of us miss most of these cues. Sure, we may wise up later, replaying the conversation in our heads—X did twiddle his thumbs while answering the question, Y looked away while agreeing with the boss, Z repeatedly promised he would come to the party but kept shifting his weight from one foot to the other, and so on. But how many of us actually catch on while the conversation is happening? If this is the case for us humans, imagine how badly the modestly intelligent machines we have built must be failing at it. They rely on voice, limited vision, text or other narrow modes of communication in their interactions with us, and even with facial recognition steadily improving and helping decipher a handful of human expressions, they are likely to have little clue about a human user's actual state of mind.
AI and intelligent robotics will increasingly become a part of daily living, and as this happens, robots and intelligent machines will need to understand us, and not just our commands and instructions. Robotics researchers at Carnegie Mellon University (CMU) have brought us closer to this. OpenPose, the new technology they have released on GitHub as open source, fills the gap between plain-vanilla interaction and meaningful, somewhat sensitive communication. OpenPose tracks human body language in real time and lets machines study and interpret it. Some robots today can interpret individual body poses, but this isn't enough to understand non-verbal communication, especially in large groups.
Using computer vision and machine learning to process video frames, OpenPose tracks multiple people simultaneously. It tracks barely perceptible movements of a person's head, facial features, torso and limbs, and even individual fingers. The CMU researchers used a dome lined with 500 cameras to capture the body poses of various individuals from a wide range of angles, and built a dataset from the images. The images were passed through a "keypoint detector" that identified and labelled specific body parts. The detected keypoints were then projected in 3D, and body-tracking algorithms were developed to understand how each pose or movement appeared from different perspectives. The system was taught to determine how, for instance, a whole hand looked when it was in a particular position, even if some fingers were not completely visible.
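The core geometric step described above—turning 2D keypoint detections from many cameras into a 3D position—can be illustrated with classic linear triangulation. The sketch below is not CMU's actual code; it is a minimal, self-contained example (using NumPy and the standard direct linear transform method) of how one body keypoint seen by several calibrated cameras can be lifted into 3D. The function name and the synthetic camera setup are illustrative assumptions.

```python
import numpy as np

def triangulate_keypoint(proj_mats, points_2d):
    """Linear (DLT) triangulation of one body keypoint seen by several cameras.

    proj_mats: list of 3x4 camera projection matrices (one per camera)
    points_2d: list of (x, y) pixel observations, one per camera
    Returns the estimated 3D position as a length-3 NumPy array.
    """
    rows = []
    for P, (x, y) in zip(proj_mats, points_2d):
        # Each observation contributes two linear constraints on the
        # homogeneous 3D point X: x * (P[2] @ X) = P[0] @ X, and likewise for y.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # Homogeneous least-squares solution: the right singular vector
    # associated with the smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

# Illustrative usage with two synthetic cameras: one at the origin and one
# shifted one unit along the x-axis (both with identity intrinsics).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.1, 0.2, 2.0])  # a hypothetical fingertip position

observations = []
for P in (P1, P2):
    h = P @ np.append(X_true, 1.0)
    observations.append((h[0] / h[2], h[1] / h[2]))

X_est = triangulate_keypoint([P1, P2], observations)
```

With noise-free observations the recovered point matches the true one; in practice, with hundreds of cameras as in the CMU dome, the same least-squares machinery averages out detection noise across views.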
With the code released to the public via GitHub, there is bound to be a great deal of experimentation, and the dataset could grow enormously. This could vastly improve the quality of human-machine interaction—it would allow machines to even understand moods, and to sense whether a human is amenable to an interruption by a machine. If you are angry that a robot has taken your job, it might actually know whether it should avoid you or offer some well-meaning counselling!