Deep Learning-based Human Action Recognition in a Collaborative Robotics Environment

ROS project for the recognition of human actions from RGB-D input

TO SEE A DEMO, SKIP HERE: Demo

Overview

This ROS project recognizes a set of human actions performed in a manufacturing environment, starting from RGB-D input data. The following image illustrates the pipeline of the project:

Each task is executed by a ROS node, implemented in Python or C++. Specifically, the ROS nodes responsible for the execution of the pipeline are:

  • mask_rcnn.py, for 2D object segmentation;
  • pcd_segmentation.py, for RGBD to point cloud transformation;
  • human_body_detection.py, for 3D human body detection;
  • recognition.cpp, for all the remaining steps.

Input data

The algorithm can take as input any pair of color and depth images, transferred as sensor_msgs/Image ROS messages.

The algorithm was tested on the following ROS topics (a short subscription sketch follows the list):

  • camera/rgb/image_rect_color
  • camera/depth_registered/hw_registered/image_rect_raw
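
A minimal sketch of how a node might subscribe to these two topics and pair color and depth frames with message_filters (node name, queue size, and slop are assumptions, not taken from the repository):

```python
#!/usr/bin/env python
# Minimal sketch (not the repository's actual node) of consuming the two
# input topics together, assuming approximately synchronized streams.
import rospy
import message_filters
from sensor_msgs.msg import Image

def callback(rgb_msg, depth_msg):
    # Here the pipeline would run 2D segmentation on rgb_msg and keep
    # depth_msg for the later point cloud estimation.
    rospy.loginfo("Got RGB-D pair with stamp %s", rgb_msg.header.stamp)

def main():
    rospy.init_node("rgbd_listener")
    rgb_sub = message_filters.Subscriber("camera/rgb/image_rect_color", Image)
    depth_sub = message_filters.Subscriber(
        "camera/depth_registered/hw_registered/image_rect_raw", Image)
    # ApproximateTimeSynchronizer pairs color and depth frames whose
    # timestamps are close enough (here within 50 ms).
    sync = message_filters.ApproximateTimeSynchronizer(
        [rgb_sub, depth_sub], queue_size=10, slop=0.05)
    sync.registerCallback(callback)
    rospy.spin()

if __name__ == "__main__":
    main()
```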

Main ROS message

The information about each object detected in the scene is stored in a custom ROS message, object.msg, containing the following fields:

  • std_msgs/Header header, with the header of the input frame;
  • sensor_msgs/Image mask: an array of pixels with values 0 and 1 representing the segmentation mask of the object;
  • geometry_msgs/Point[] bb_2d: 2 points (xmin,ymin) and (xmax,ymax) of the bounding box predicted by the 2D Object Segmentation stage;
  • float64 score: confidence score of the object detection;
  • sensor_msgs/PointCloud2 pointcloud: 3D point cloud of the object, computed by the node pcd_segmentation.py;
  • label_object label:
    • string class_name: name of the current object class;
    • uint8 id: id of the object in scene (integer number);
    • float32[] color: 3D vector (r, g, b) describing the RGB value for the point cloud of the current object, assigned during the instance segmentation.

The information about all the objects in the scene is stored as a custom ROS message called objects_array:

  • object[] objects

Each ROS node executes a specific step of the pipeline and updates the fields of a ROS message objects_array.
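
Based on the field lists above, the message definitions can be reconstructed roughly as follows (a sketch for reference, not copied from the repository; comments and exact file layout may differ):

```
# object.msg
std_msgs/Header header              # header of the input frame
sensor_msgs/Image mask              # binary (0/1) segmentation mask of the object
geometry_msgs/Point[] bb_2d         # (xmin, ymin) and (xmax, ymax) of the 2D bounding box
float64 score                       # confidence score of the detection
sensor_msgs/PointCloud2 pointcloud  # 3D point cloud computed by pcd_segmentation.py
label_object label                  # class name, id and point cloud color

# label_object.msg
string class_name                   # name of the object class
uint8 id                            # id of the object in the scene
float32[] color                     # (r, g, b) color assigned to the object's point cloud

# objects_array.msg
object[] objects                    # all objects detected in the current frame
```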

2D Object Segmentation

In order to recognize the objects in the scene, the instance segmentation model Mask-RCNN was applied. The model comes from the TensorFlow Object Detection API. All the details about the dataset and the training can be found in another repository on this GitHub profile. The first ROS node, mask_rcnn.py, is responsible for running Mask-RCNN inference on the RGB input from the camera. The output of this processing unit consists of the instance segmentation annotations for each object in the camera frame (2D bounding boxes, segmentation masks, scores and labels). The functions used within this node belong to the OpenCV library.
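
The node itself is not reproduced here; as an illustrative sketch, OpenCV's DNN module can run a TensorFlow-exported Mask-RCNN model along these lines (the model file names, the output tensor names of the standard OpenCV Mask-RCNN sample, and the score threshold are assumptions):

```python
import cv2
import numpy as np

# Hypothetical model files exported from the TensorFlow Object Detection API.
net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb",
                                    "mask_rcnn_config.pbtxt")

def segment(image_bgr, score_threshold=0.5):
    """Return (class_id, score, box, mask) tuples for one BGR frame."""
    h, w = image_bgr.shape[:2]
    net.setInput(cv2.dnn.blobFromImage(image_bgr, swapRB=True, crop=False))
    boxes, masks = net.forward(["detection_out_final", "detection_masks"])

    detections = []
    for i in range(boxes.shape[2]):
        class_id = int(boxes[0, 0, i, 1])
        score = float(boxes[0, 0, i, 2])
        if score < score_threshold:
            continue
        # Box coordinates are normalized; convert them to pixels.
        x1, y1, x2, y2 = (boxes[0, 0, i, 3:7] * [w, h, w, h]).astype(int)
        # The mask is predicted at low resolution for the box region;
        # resize it to the box size and binarize it.
        mask = cv2.resize(masks[i, class_id], (x2 - x1, y2 - y1)) > 0.5
        detections.append((class_id, score, (x1, y1, x2, y2), mask))
    return detections
```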

RGBD to Point Cloud

The ROS node pcd_segmentation.py estimates a 3D point cloud for each object detected in the scene by Mask-RCNN. The picture below summarizes the problem tackled at this step. The point cloud is estimated from the depth information of the camera and the output of Mask-RCNN, using functions from the Open3D library. This unit updates the fields of the main ROS message with the coordinates of the 3D points belonging to each point cloud.
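
A minimal sketch of how a per-object point cloud could be computed with Open3D from the depth image, the camera intrinsics, and one segmentation mask (the function calls are real Open3D APIs, but the depth scale and intrinsic values shown are placeholders, not the project's calibration):

```python
import numpy as np
import open3d as o3d

def object_point_cloud(color_img, depth_img, mask, intrinsic):
    """Back-project only the pixels belonging to one object's mask.

    color_img: HxWx3 uint8, depth_img: HxW depth map, mask: HxW bool,
    intrinsic: o3d.camera.PinholeCameraIntrinsic of the depth camera.
    """
    # Zero out the depth outside the object's mask so that only the
    # object's pixels are back-projected to 3D.
    masked_depth = np.where(mask, depth_img, 0).astype(np.float32)
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        o3d.geometry.Image(color_img),
        o3d.geometry.Image(masked_depth),
        depth_scale=1000.0,            # depends on the sensor (mm -> m here)
        convert_rgb_to_intensity=False)
    return o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, intrinsic)

# Example intrinsics (placeholder values, not the real camera calibration):
intrinsic = o3d.camera.PinholeCameraIntrinsic(640, 480, 525.0, 525.0, 319.5, 239.5)
```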

3D Bounding Box Estimation

This step estimates a 3D bounding box around each point cloud. Specifically, the algorithm computes oriented bounding boxes using the PCL library. The picture below shows an example of the obtained result.
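
In the repository this step is implemented in recognition.cpp with PCL; purely for illustration, and staying in Python for consistency with the other sketches, an equivalent oriented bounding box can be obtained with Open3D:

```python
import open3d as o3d

def oriented_bbox(pcd):
    """Compute an oriented 3D bounding box for one object's point cloud.

    Illustration only: the project computes the boxes in C++ with PCL.
    """
    obb = pcd.get_oriented_bounding_box()
    # Center, rotation matrix and extent fully describe the box.
    return obb.center, obb.R, obb.extent
```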

3D Human body detection

A technology for 3D human pose estimation was integrated into the overall project. This solution was designed by KUKA and is based on the existing OpenPose technology. The algorithm provides the 3D keypoints of the main human body parts.

Analysis of the spatial relations among the objects

Based on all the information obtained in the previous steps, the algorithm can identify the fundamental spatial relations among the objects in the scene. The solution relies on the 3D data coming from the human pose estimation task and on the 3D bounding boxes of the objects. For each frame, the relative positions of the objects are evaluated from a geometric point of view in order to extract relations between each pair of elements. This solution is based on the paper: F. Ziaeetabar, E. E. Aksoy, F. Wörgötter and M. Tamosiunaite, "Semantic analysis of manipulation actions using spatial relations," 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 2017, pp. 4612-4619.
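
As a simplified illustration of the idea (not the relation set, thresholds, or oriented-box geometry actually used in recognition.cpp or in the paper), pairwise relations can be derived from the objects' bounding boxes like this:

```python
import numpy as np

def spatial_relation(center_a, extent_a, center_b, extent_b, touch_margin=0.02):
    """Toy classification of the relation between two axis-aligned boxes.

    Each box is described by its center and extent (full side lengths), in
    meters. The real system works with oriented boxes and a richer relation
    set following Ziaeetabar et al. (ICRA 2017); this is only a sketch.
    """
    center_a, extent_a = np.asarray(center_a), np.asarray(extent_a)
    center_b, extent_b = np.asarray(center_b), np.asarray(extent_b)

    # Signed gap along each axis: negative means the boxes overlap there.
    gap = np.abs(center_a - center_b) - (extent_a + extent_b) / 2.0

    if np.all(gap < 0):
        return "overlapping"            # e.g. object placed inside a box
    if np.all(gap < touch_margin):
        return "touching"               # surfaces within a small margin
    # Otherwise report the dominant direction of separation (z is "up").
    axis = int(np.argmax(gap))
    if axis == 2:
        return "above" if center_a[2] > center_b[2] else "below"
    return "beside"
```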

Human Action Recognition

At this point, once the spatial relations have been computed, the system can classify the action being performed in front of the camera, thanks to a state machine-based solution. Within this state machine, the final state represents the label of the action.

The state machine was manually coded to recognize the following activities:

  1. Placing an object inside a box;
  2. Repairing a box with some tape;
  3. Repairing a camera with a screwdriver / drill / tape;
  4. Working with the computer;
  5. Texting with the phone.

Each activity is characterized by a sequence of states. Transitions between states are dictated by several conditions, such as specific spatial relations among objects (e.g. phone behind keyboard), the user holding an object (e.g. user holding a drill), or the presence of a particular object (e.g. a box present in the scene). The following graph shows the state machine used to detect the actions "placing an object inside a box" and "repairing a box".
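
A minimal sketch of how such a hand-coded state machine could look for the activity "placing an object inside a box" (the state names and trigger conditions here are illustrative, not the ones used in recognition.cpp):

```python
class PlaceInBoxFSM:
    """Toy state machine for "placing an object inside a box".

    Transitions are driven per frame by high-level events extracted from the
    spatial-relation and human-pose analysis; the actual states and triggers
    in recognition.cpp may differ.
    """
    def __init__(self):
        self.state = "idle"

    def update(self, box_in_scene, user_holding_object, object_inside_box):
        if self.state == "idle" and box_in_scene:
            self.state = "box_detected"
        elif self.state == "box_detected" and user_holding_object:
            self.state = "object_grasped"
        elif self.state == "object_grasped" and object_inside_box:
            # Final state: its name is the label of the recognized action.
            self.state = "placing_object_inside_box"
        return self.state
```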

The images below show some steps of the execution of an action.
Placing a mouse inside a box

Results

Overall, the project led to the following results:

  1. The accuracy of the method is in line with state-of-the-art research papers;
  2. The accuracy and robustness of the algorithm are strongly affected by the accuracy of the object detection.

Novelty of the project

This method is characterized by the following novel aspects:

  1. Focus on human activities in an industrial context;
  2. 3D-based activity recognition, leading to a more robust analysis of the geometric relations;
  3. Integration of 3D human body detection with object segmentation.

Demonstration

Here, you can see a video showing the system running:

From left to right: 3D human body tracking, Instance Segmentation with Mask-RCNN, Point Clouds of the objects, frame with the label of the current state
