![]() | NatSGLD: A Dataset with Speech, Gestures, Logic, and Demonstrations for Robot Learning in Natural Human-Robot Interaction Snehesh Shrestha, Yantian Zha, Saketh Banagiri, Ge Gao, Yiannis Aloimonos, and Cornelia Fermuller. 2025. NatSGLD: A Dataset with Speech, Gesture, Logic, and Demonstration for Robot Learning in Natural Human-Robot Interaction. In Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction (HRI '25). IEEE Press, 1093–1098. ...we present the NatSGLD dataset, a multimodal dataset with natural human commands (speech and gestures), each paired with a demonstration trajectory and a Linear Temporal Logic (LTL) formula that provides a ground-truth interpretation of the commanded tasks... |
![]() | CrowdHRI: Gamifying HRI Data Collection as a Multiplayer Mixed Reality Game Nhi Tran and Snehesh Shrestha. 2025. CrowdHRI: Gamifying HRI Data Collection as a Multiplayer Mixed Reality Game. In Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction (HRI '25). IEEE Press, 1685–1689. ...This paper introduces CrowdHRI, a novel approach to gamify HRI data collection through a multiplayer mixed reality (MR) game. The proposed system integrates a web server and Unity-based client architecture, enabling users to schedule or join sessions dynamically. Through immersive MR, CrowdHRI offers realistic environments and supports customizable experimental setups, gathering high-fidelity data on human-robot interactions... |
![]() | VioPose: Violin Performance 4D Pose Estimation by Hierarchical Audiovisual Inference Yoo, Seong Jong*, Snehesh Shrestha*, Irina Muresanu, and Cornelia Fermüller. "VioPose: Violin Performance 4D Pose Estimation by Hierarchical Audiovisual Inference." arXiv preprint arXiv:2411.13607 (2024). ...We leverage the direct causal relationship between the music produced and the human motions creating them to address these challenges. We propose VioPose: a novel multimodal network that hierarchically estimates dynamics. High-level features are cascaded to low-level features and integrated into Bayesian updates... |
![]() | Choreographing the Digital Canvas: A Machine Learning Approach to Artistic Performance Peng, Siyuan, Kate Ladenheim, Snehesh Shrestha, and Cornelia Fermüller. "Choreographing the Digital Canvas: A Machine Learning Approach to Artistic Performance." arXiv preprint arXiv:2404.00054 (2024). ... The platform integrates a novel machine-learning (ML) model with an interactive interface to generate and visualize artistic movements. Our approach's core is a cyclic Attribute-Conditioned Variational Autoencoder (AC-VAE) model developed to address the challenge of capturing and generating realistic 3D human body motions from motion capture (MoCap) data. We created a unique dataset focused on the dynamics of falling movements, characterized by a new ontology that divides motion into three distinct phases: Impact, Glitch, and Fall... |
![]() | Preliminary Study of Mixed Reality Interfaces for Collaborative Robot Programming of a Manufacturing Assembly Task Board Medhavi Kamran, Snehesh Shrestha, Arnav Juneja, Shelly Bagchi, Jeremy Marvel, Megan Zimmerman, Vinh Nguyen. "Preliminary Study of Mixed Reality Interfaces for Collaborative Robot Programming of a Manufacturing Assembly Task Board." International Symposium on Technological Advances in Human-Robot Interaction (2024). ... This research aims to evaluate AR and VR interfaces with standardized test methods and metrics for programming collaborative robots in manufacturing settings... We aim to evaluate success rates and cognitive load on operators... |
![]() | Considerations for Minimizing Data Collection Biases for Eliciting Natural Behavior in Human-Robot Interaction Shrestha, Snehesh, Ge Gao, Cornelia Fermüller, and Yiannis Aloimonos. "Considerations for Minimizing Data Collection Biases for Eliciting Natural Behavior in Human-Robot Interaction." HRI Workshop on Emerging Test Methods & Metrics for Accessible HRI (2023). ... Human-Robot Interaction standards informed by empirical data could save us time and effort and provide us with the path toward the robots of the future. To this end, we share some of our pilot studies, lessons learned, and how they affected the outcome of our experiments... |
![]() | hDesigner: Real-Time Haptic Feedback Pattern Designer Shrestha, Snehesh*, Ishan Tamrakar*, Cornelia Fermuller, and Yiannis Aloimonos. "hDesigner: Real-Time Haptic Feedback Pattern Designer." Augmented Humans Workshop for Intelligent Music Interfaces: When Interactive Assistance and Augmentation Meet Musical Instruments (2023). Haptic sensing can provide a new dimension to enhance people's musical and cinematic experiences. However, designing a haptic pattern is neither intuitive nor trivial. Imagined haptic patterns tend to be different from experienced ones... Our simple architecture, wireless connectivity, and easy-to-program communication protocol make it modular and easy to scale... |
![]() | NatSGD: A Dataset with Speech, Gestures, and Demonstrations for Robot Learning in Natural Human-Robot Interaction Shrestha, Snehesh, Yantian Zha, Ge Gao, Cornelia Fermüller, Yiannis Aloimonos. "NatSGD: A Dataset with Speech, Gestures, and Demonstrations for Robot Learning in Natural Human-Robot Interaction." AAAI Workshop on User-Centric Artificial Intelligence for Assistance in At-Home Tasks (2023). ...we introduce NatSGD, a multimodal HRI dataset that contains human commands as speech and gestures, along with robot behavior in the form of synchronized demonstrated robot trajectories. Our data enable HRI with Imitation Learning so that robots can learn to work with humans in challenging, real-life domains such as performing complex tasks in the kitchen... |
![]() | FEVA: Fast Event Video Annotation Tool Shrestha, Snehesh, William Sentosatio, Huiashu Peng, Cornelia Fermuller, and Yiannis Aloimonos. "FEVA: Fast Event Video Annotation Tool." arXiv preprint arXiv:2301.00482 (2023). ... We conducted an extensive survey of over 59 VAT and interviewed interdisciplinary researchers to evaluate the usability of the VAT. Our findings suggest that most current VAT have overwhelming user interfaces, poor interaction techniques, and difficult-to-understand features. These often lead to longer annotation time, label inconsistencies, and user fatigue. We introduce FEVA, a video annotation tool with streamlined interaction techniques and a dynamic interface that makes labeling tasks easy and fast. FEVA focuses on speed, accuracy, and simplicity to make annotation quick, consistent, and straightforward... |
![]() | Deep-Readout Random Recurrent Neural Networks for Real-World Temporal Data Evanusa, Matthew, Snehesh Shrestha, Vaishnavi Patil, Cornelia Fermüller, Michelle Girvan, and Yiannis Aloimonos. "Deep-Readout Random Recurrent Neural Networks for Real-World Temporal Data." SN Computer Science 3, no. 3 (2022): 222. ... we have developed a novel hybrid network, called Parallelized Deep Readout Echo State Network (PDR-ESN) that combines the deep learning readout with a fast random recurrent component, with multiple ESNs computing in parallel. We show the PDR-ESN architecture allows for different configurations of the sub-reservoirs, leading to different variants which we explore... |
![]() | AIMusicGuru: Music Assisted Human Pose Correction Shrestha, Snehesh, Cornelia Fermüller, Tianyu Huang, Pyone Thant Win, Adam Zukerman, Chethan M. Parameshwara, and Yiannis Aloimonos. "AIMusicGuru: Music Assisted Human Pose Correction." arXiv preprint arXiv:2203.12829 (2022). ...We present a method that leverages our understanding of the high degree of a causal relationship between the sound produced and the motion that produces them. We use the audio signature to refine and predict accurate human body pose motion models. We propose MAPnet (Music Assisted Pose network) for generating a fine grain motion model from sparse input pose sequences but continuous audio... |
![]() | When danger strikes: A linguistic tool for tracking America’s collective response to threats Choi, Virginia K., Snehesh Shrestha, Xinyue Pan, and Michele J. Gelfand. "When danger strikes: A linguistic tool for tracking America’s collective response to threats." Proceedings of the National Academy of Sciences 119, no. 4 (2022): e2113891119. ... we developed a threat dictionary, a computationally derived linguistic tool that indexes threat levels from mass communication channels. We demonstrate this measure’s convergent validity with objective threats in American history, including violent conflicts, natural disasters, and pathogen outbreaks such as the COVID-19 pandemic... |
![]() | Hybrid Backpropagation Parallel Reservoir Networks Evanusa, Matthew, Snehesh Shrestha, Michelle Girvan, Cornelia Fermüller, and Yiannis Aloimonos. "Hybrid Backpropagation Parallel Reservoir Networks." arXiv preprint arXiv:2010.14611 (2020). ... we propose a novel hybrid network, which we call Hybrid Backpropagation Parallel Echo State Network (HBP-ESN) which combines the effectiveness of learning random temporal features of reservoirs with the readout power of a deep neural network with batch normalization. We demonstrate the usefulness of our network on two complex real-world multi-dimensional time series datasets: a classification task for gesture recognition using skeleton keypoints from ChaLearn, and a regression task for the DEAP dataset for emotion recognition from EEG measurements... |
Publications