
Three-dimensional hand tracking has attracted increasing attention and research interest in fields such as virtual reality, HCI, and computer vision. With the advent of depth sensors, a plethora of new applications and methods for tracking the human hand have emerged, many of which were once very difficult with 2D cameras alone. Another milestone came when predictive algorithms entered the picture, supplementing tracking based on images alone. This survey presents recent approaches to 3D hand tracking, with a focus on vision-based tracking algorithms. We first review the conventional methods and then the newer methods built on predictive algorithms.

The human hand is delicate and has a very intricate structure. The various muscles and joints in the hand provide a great range of movement and precision, which allows many different poses and gestures. These gestures are important in communicating the thoughts of an individual and can be useful to interact with devices. Hence, 3D hand tracking is of great importance to HCI, and can also be used for artistic, medical, or scientific purposes, human sign language interpretation, and for many other tasks.

Hand gesture recognition is a subset of hand tracking in which users perform gestures that the system identifies and responds to accordingly [Song et al. 2015]. Such a system observes only the action performed by the user's hand, though it may also estimate the hand's position. Examples of this line of work can be found in the hand gesture recognition literature, such as Cheng et al. [2016].

Parts of hand tracking

Hand tracking has three main parts: data acquisition, recognition, and application. In the first part, data is collected from the user or environment for processing. It is in this part that sensors and prior data, such as public datasets, are used. The sensors capture a video feed (or a depth video) of the human hands, while datasets provide prior data for more accurate recognition and results during processing.

The second part, recognition, involves analysing the data to arrive at the results. Here, the data received from the first part is processed to find the orientation and position of the hand (or hands). The algorithm used depends on the sensor that captured the data, and it must be able to produce results in real time.

The third part, application, is the end goal of the whole process. One of the main goals of 3D tracking is its use in HCI, such as interacting with objects in virtual or augmented reality [Benko et al. 2012]. Another popular use is in gaming platforms, where the user performs actions and gestures with their hands to move through the different levels of a game.
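The three-part structure described above can be sketched as a minimal processing loop. This is an illustrative skeleton under assumed names (the class and methods here are not from any specific library); a real system would wrap a depth sensor SDK in the acquisition step and a trained tracking model in the recognition step.

```python
# Minimal sketch of the three-part hand-tracking pipeline:
# acquisition -> recognition -> application.
# All names and placeholder values are illustrative assumptions.

class HandTrackingPipeline:
    def acquire(self):
        """Acquisition: return one frame, e.g. an RGB-D image pair
        from a depth sensor (placeholder values here)."""
        return {"rgb": None, "depth": None}

    def recognize(self, frame):
        """Recognition: estimate the 3D position and orientation of
        the hand from the frame. A real implementation would run a
        tracking algorithm here; this returns a dummy pose."""
        return {"position": (0.0, 0.0, 0.5),
                "orientation": (0.0, 0.0, 0.0)}

    def apply(self, pose):
        """Application: hand the pose to the consumer,
        e.g. a VR renderer or a gesture-driven UI."""
        return pose

    def step(self):
        """Run one acquisition-recognition-application cycle."""
        return self.apply(self.recognize(self.acquire()))
```

In practice, `step()` would be called once per frame inside the real-time loop, and each stage could be swapped out independently (a different sensor, a different recognition algorithm) without changing the overall structure.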


There are three main motivations for implementing 3D tracking of a complex system like the human hand. They are:
  • Tracking a deformable object with many degrees of freedom, such as the human hand, remains a very challenging problem.
  • Recent improvements in depth sensors open up new ways to design efficient tracking methods and algorithms.
  • 3D hand tracking has a large influence on HCI, and it also has many useful applications from a medical perspective.

Problem Statement

The overall aim of 3D tracking is simple: to track the user's hand in 3D space precisely and accurately. But several secondary goals must also be achieved to obtain the optimum effect. The hand must be tracked such that the rendered output is perceived as a true replication of the human hand. The requirements of hand tracking are the following:
  • The human visual system perceives a sequence of images as continuous motion at roughly 15 frames per second, i.e. about 67 milliseconds per frame; at lower rates the images are seen as separate frames. The system must provide an accurate position and orientation of the hand within this time budget for each frame so that the motion appears continuous.
  • The rendered hand must not jitter in 3D space as this does not conform to the true movement of the hand. The movement must be smooth and continuous.
  • The rendered hand poses must conform to the true poses of the human hand, respecting realistic finger articulation (a single segment of a finger cannot bend 90 degrees forward on its own).
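The three requirements above can be sketched as small checks and filters. This is a minimal illustration under assumed names and thresholds: the smoothing factor and the 0-90 degree joint range are stand-in values, not constraints taken from any cited system.

```python
# Illustrative sketches of the three hand-tracking requirements.
# All constants and function names are assumptions for demonstration.

# Requirement 1: per-frame time budget for perceived continuous motion
# (~15 fps, i.e. about 67 ms per frame, as stated above).
FRAME_BUDGET_S = 1.0 / 15.0

def smooth(prev, new, alpha=0.7):
    """Requirement 2: suppress frame-to-frame jitter with simple
    exponential smoothing of the estimated 3D position.
    alpha closer to 1 trusts the new measurement more."""
    return tuple(alpha * n + (1 - alpha) * p for p, n in zip(prev, new))

def clamp_joint_angle(angle_deg, lo=0.0, hi=90.0):
    """Requirement 3: enforce an anatomically plausible flexion range
    for one finger joint. The 0-90 degree range is an assumed example
    of a per-joint limit, not a measured anatomical constant."""
    return max(lo, min(hi, angle_deg))
```

A tracker would check each frame's processing time against `FRAME_BUDGET_S`, smooth the pose estimate before rendering, and clamp every joint angle to its allowed range so the rendered hand never takes an impossible pose.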

Brief overview

The remainder of this article is organized in the following manner:

Data Acquisition: Explains the various methods to acquire data from the user for processing, along with their advantages and disadvantages.

Recognition: Elucidates the different ways to get the position and orientation of the human hand in 3D space.

Recent Methods: Presents two recent algorithms along with evaluations against other state-of-the-art implementations.

Conclusion and Future Work: Concludes the article by encapsulating the various perspectives and methods derived from the study and suggesting an optimum way to track the human hand in 3D space with accurate results in real time.
