Interact with robots and other devices by gesturing, using wearable muscle and motion sensors
From spaceships to Roombas, robots have the potential to be valuable assistants and to extend our capabilities. Yet it can still be hard to tell them what to do: we would like to interact with a robot as naturally as with another person, but pre-specified voice or touchscreen commands and elaborate sensor setups are often clumsy. Allowing robots to understand nonverbal cues such as gestures, with minimal setup or calibration, can be an important step towards more pervasive human-robot collaboration.
This system, dubbed Conduct-a-Bot, aims to take a step towards these goals by detecting gestures from wearable muscle and motion sensors. A user can make gestures to remotely control a robot by wearing small sensors on their biceps, triceps, and forearm. The current system detects 8 predefined navigational gestures without requiring offline calibration or training data – a new user can simply put on the sensors and start gesturing to remotely pilot a drone.
For more information, check out the virtual presentation below.
By using a small number of wearable sensors and plug-and-play algorithms, the system aims to start reducing the barrier to casual users interacting with robots. It builds an expandable vocabulary for communicating with a robot assistant or other electronic devices in a more natural way. We look forward to extending this vocabulary to additional scenarios and to evaluating it with more users and robots.
Photos by Joseph DelPreto, MIT CSAIL
Sensors: Wearable EMG and IMU
Gestures are detected using wearable muscle and motion sensors. Muscle sensors, called electromyography (EMG) sensors, are worn on the biceps and triceps to detect when the upper arm muscles are tensed. A wireless device with EMG and motion sensors is also worn on the forearm.
In the current experiments, MyoWare processing boards with Covidien electrodes and an NI data acquisition device were used to stream biceps and triceps activity. The Myo Gesture Control Armband was used to monitor forearm activity. Alternative sensors and acquisition devices could be substituted in the future.
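As a rough illustration of how raw EMG can be turned into an activation level, the sketch below rectifies and smooths a single channel to form an envelope that rises when the muscle is tensed. The sampling rate, window length, and simulated burst are illustrative assumptions rather than values used in this project.

```python
import numpy as np

def emg_envelope(samples, fs=1000, window_s=0.05):
    """Rectify raw EMG and smooth it with a moving average.

    samples: 1-D array of raw EMG voltages from one channel
    fs: sampling rate in Hz (assumed value, not the project's setting)
    window_s: smoothing window length in seconds
    """
    rectified = np.abs(samples - np.mean(samples))  # remove DC offset, then rectify
    win = max(1, int(fs * window_s))
    kernel = np.ones(win) / win
    return np.convolve(rectified, kernel, mode="same")

# Example: a simulated burst of biceps activity embedded in baseline noise.
rng = np.random.default_rng(0)
signal = rng.normal(0, 0.02, 2000)
signal[800:1200] += rng.normal(0, 0.3, 400)         # the "tensed" segment
envelope = emg_envelope(signal)
print("peak envelope value:", envelope.max())
```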
EMG sensors monitor biceps, triceps, and forearm muscles. The forearm device also includes an IMU.
Photos by Joseph DelPreto, MIT CSAIL
Gesture Detection: Classification Pipelines
Machine learning pipelines process the muscle and motion signals to classify 8 possible gestures at any time. For most of the gestures, unsupervised classifiers process the muscle and motion data to learn how to separate gestures from other motions in real time; Gaussian Mixture Models (GMMs) are continuously updated to cluster the streaming data and create adaptive thresholds. This lets the system calibrate itself to each person’s signals while they’re making gestures that control the robot. Since it doesn’t need any calibration data ahead of time, this can help users start interacting with the robot quickly.
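The general idea can be sketched with a two-component Gaussian Mixture Model that is periodically refit on a rolling buffer of recent values, with the midpoint between the two component means serving as the boundary between rest and gesture activity. This is a simplified stand-in built on scikit-learn rather than the project's actual online clustering algorithm, and the buffer size, refit interval, and separation heuristic are arbitrary choices.

```python
import numpy as np
from collections import deque
from sklearn.mixture import GaussianMixture

class AdaptiveThreshold:
    """Adaptive rest-vs-gesture boundary learned from unlabeled streaming data.

    A two-component GMM is refit on a rolling buffer of recent values; once the
    two clusters are well separated, the midpoint between their means is used
    as the decision threshold. Simplified illustration, not the paper's method.
    """
    def __init__(self, buffer_len=2000, refit_every=100):
        self.buffer = deque(maxlen=buffer_len)
        self.refit_every = refit_every
        self.count = 0
        self.threshold = None

    def update(self, value):
        """Add one sample; return True if it exceeds the current threshold."""
        self.buffer.append(value)
        self.count += 1
        if self.count % self.refit_every == 0 and len(self.buffer) > 50:
            data = np.array(self.buffer).reshape(-1, 1)
            gmm = GaussianMixture(n_components=2, n_init=2).fit(data)
            lo, hi = sorted(gmm.means_.ravel())
            spread = np.sqrt(gmm.covariances_).ravel().mean()
            if hi - lo > 2 * spread:              # only trust well-separated clusters
                self.threshold = (lo + hi) / 2.0  # adaptive decision boundary
        return self.threshold is not None and value > self.threshold

# Example: stream a synthetic envelope with a burst of activity at the end.
rng = np.random.default_rng(0)
stream = np.concatenate([rng.normal(0.05, 0.01, 1500), rng.normal(0.6, 0.05, 500)])
detector = AdaptiveThreshold()
flags = [detector.update(v) for v in stream]
print("samples flagged as gesture activity:", sum(flags))
```

Each incoming sample both updates the model and is classified against the latest boundary, which is what lets the interface adapt to a new user's signal levels on the fly.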
In parallel with these classification pipelines, a neural network predicts wrist flexion or extension from forearm muscle signals. The network is trained on data from previous users instead of requiring new training data from each user.
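A minimal sketch of this train-once, deploy-to-anyone idea is shown below using a small fully connected classifier on flattened windows of forearm EMG. The channel count, window length, label set, and network size are illustrative assumptions, and the synthetic arrays stand in for recordings gathered from previous users.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical setup: each example is a 50-sample window from 8 forearm EMG
# channels, flattened to a 400-dimensional vector; labels are 0 = neutral,
# 1 = wrist flexion, 2 = wrist extension. Real training data would come from
# previous users, so a new user needs no calibration session.
rng = np.random.default_rng(1)
X_prev_users = rng.normal(size=(600, 8 * 50))
y_prev_users = rng.integers(0, 3, size=600)

# Train once, offline, on the prior users' data.
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300)
clf.fit(X_prev_users, y_prev_users)

# At run time, an incoming window from a new user is classified directly.
new_window = rng.normal(size=(1, 8 * 50))
print(clf.predict(new_window))   # e.g., [1] -> wrist flexion
```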
Machine learning pipelines classify gestures from wearable muscle and motion sensors
The closed-loop system consists of EMG and IMU acquisition, classification pipelines, and robot control. LEDs cue gestures during open-loop blocks.
Muscle and motion signal traces from all subjects for (a) arm stiffening, (b) fist clenching, (c) rotation gestures, (d) clusters of left and right gestures, and (e) clusters of up and down gestures.
Classifiers successfully identified their respective gestures during cued blocks for each subject. Each bar summarizes 40 trials, except wrist activation, which has 80 trials.
Confusion matrices illustrate classification performance during (a) open-loop gesture blocks and (b) closed-loop robot control blocks. Main numbers and coloring represent final outputs, while parenthetical values in (a) represent predictions before a gesture hierarchy is imposed and before wrist flexion and extension are gated by an adaptive threshold.
Images by Joseph DelPreto, MIT CSAIL
Conference Media: Human-Robot Interaction 2020 (HRI ’20)
Virtual Presentation | Demo Video
Publications
- J. DelPreto and D. Rus, “Plug-and-Play Gesture Control Using Muscle and Motion Sensors,” in Proceedings of the 2020 ACM/IEEE International Conference on Human-Robot Interaction (HRI), New York, NY, USA, 2020, pp. 439–448. doi:10.1145/3319502.3374823
Abstract: As the capacity for machines to extend human capabilities continues to grow, the communication channels used must also expand. Allowing machines to interpret nonverbal commands such as gestures can help make interactions more similar to interactions with another person. Yet to be pervasive and effective in realistic scenarios, such interfaces should not require significant sensing infrastructure or per-user setup time. The presented work takes a step towards these goals by using wearable muscle and motion sensors to detect gestures without dedicated calibration or training procedures. An algorithm is presented for clustering unlabeled streaming data in real time, and it is applied to adaptively thresholding muscle and motion signals acquired via electromyography (EMG) and an inertial measurement unit (IMU). This enables plug-and-play online detection of arm stiffening, fist clenching, rotation gestures, and forearm activation. It also augments a neural network pipeline, trained only on strategically chosen training data from previous users, to detect left, right, up, and down gestures. Together, these pipelines offer a plug-and-play gesture vocabulary suitable for remotely controlling a robot. Experiments with 6 subjects evaluate classifier performance and interface efficacy. Classifiers correctly identified 97.6% of 1,200 cued gestures, and a drone correctly responded to 81.6% of 1,535 unstructured gestures as subjects remotely controlled it through target hoops during 119 minutes of total flight time.
@inproceedings{delpreto2020emgImuGesturesDrone,
  author    = {DelPreto, Joseph and Rus, Daniela},
  title     = {Plug-and-Play Gesture Control Using Muscle and Motion Sensors},
  booktitle = {Proceedings of the 2020 ACM/IEEE International Conference on Human-Robot Interaction (HRI)},
  series    = {HRI '20},
  year      = {2020},
  month     = {March},
  isbn      = {9781450367462},
  publisher = {ACM},
  address   = {New York, NY, USA},
  location  = {Cambridge, United Kingdom},
  pages     = {439--448},
  numpages  = {10},
  url       = {https://dl.acm.org/doi/10.1145/3319502.3374823?cid=99658989019},
  doi       = {10.1145/3319502.3374823},
  keywords  = {Robotics, EMG, Wearable Sensors, Human-Robot Interaction, Gestures, Plug-and-Play, Machine Learning, IMU, Teleoperation}
}
In the News
Special thanks to the MIT CSAIL communications team, especially Rachel Gordon and Tom Buehler.