Data Acquisition

In the context of hand tracking, data is provided to the system in two ways:

  • Sensors
  • Datasets


A sensor measures physical properties of the outside world and converts them into digital information that electronic systems can process. For 3D hand tracking, there are three main types of sensor:

  1. Mount-based sensors
  2. Multi-touch sensors
  3. Vision-based sensors

Mount-based sensors
Mount-based sensors are worn (or mounted) on the hand and provide data to the system. Examples are accelerometers and gyroscopes, which can track the orientation and position of each finger and of the hand relative to a reference point [Prisacariu et al. 2012]. The sensors can be placed in different ways, such as on each segment of every finger, or at strategic locations from which the remaining positions are calculated. This approach is highly accurate and can track the hands to sub-millimetre levels, but the sensors can be uncomfortable to wear and prevent users from feeling the actual environment. They are also very expensive, which makes them hard to deploy at a large scale.
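To illustrate how a gyroscope reading turns into an orientation relative to a reference point, here is a minimal sketch in Python. It assumes a single rotation axis (one finger joint), a fixed sampling interval, and no sensor drift. The function name and sample values are hypothetical and are not taken from any of the cited systems.

```python
def integrate_gyro(rates_deg_per_s, dt_s, initial_angle_deg=0.0):
    """Estimate a joint angle over time from angular-rate samples.

    rates_deg_per_s: angular velocity samples from a gyroscope (assumed single axis).
    dt_s: time between samples in seconds (assumed constant).
    initial_angle_deg: the reference orientation the tracking starts from.
    """
    angle = initial_angle_deg
    angles = []
    for rate in rates_deg_per_s:
        # Simple Euler integration: angle change = rate * elapsed time.
        angle += rate * dt_s
        angles.append(angle)
    return angles


# Hypothetical samples: the finger bends for two steps, then partly straightens.
joint_angles = integrate_gyro([10.0, 10.0, -5.0], dt_s=0.1)
```

In practice, errors in the rate samples accumulate during integration, which is one reason such sensors are usually combined with accelerometers or other references.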

Figure 1: An example of a mounted sensor. Image courtesy [Guracak, 2016]

Multi-touch sensors

Figure 2: An example of a multi touch sensor. Image courtesy [Rosenberg, 2016]

Multi-touch screen sensors are commonly used in smartphones. They record the points of contact between the human hand and the device. Common examples include pinching, which is usually used to zoom a selection on screen, and dragging two fingers in parallel to scroll through a document. Although accurate for certain tasks, this type of sensor only tracks the fingertips, or whichever parts of the fingers are in contact with the device; it does not track the position of the hand itself or its orientation in space. Hence this kind of sensor is useful only for gesture-based interactions and not for tracking the whole hand.
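The pinch gesture shows how little information such a sensor needs: two contact points are enough. The following Python sketch computes a zoom factor as the ratio of the distance between the two touch points after and before the gesture. The function name and the coordinate convention (screen pixels as `(x, y)` tuples) are assumptions made for illustration.

```python
import math


def pinch_scale(start_a, start_b, end_a, end_b):
    """Return the zoom factor implied by a two-finger pinch.

    Each argument is an (x, y) contact point in screen pixels (assumed convention).
    A result above 1 means the fingers moved apart (zoom in),
    below 1 means they moved together (zoom out).
    """
    start_distance = math.dist(start_a, start_b)
    end_distance = math.dist(end_a, end_b)
    if start_distance == 0:
        raise ValueError("the two starting contact points must differ")
    return end_distance / start_distance


# Hypothetical gesture: the fingers end twice as far apart as they started.
zoom = pinch_scale((0, 0), (3, 4), (0, 0), (6, 8))
```

Only the contact points enter the calculation, so nothing about the rest of the hand or its orientation can be recovered from this data.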

Vision-based sensors

Vision-based sensors capture images in the form of frames and send them to the system. They are useful for tracking because some types do not require electronic devices to be worn on the hand. They also allow a larger distance between the user and the screen, unlike multi-touch sensors. They can be broadly classified into 2D sensors, such as common webcams, and 3D depth sensors, such as the Kinect and Leap Motion. The former record only the image, while the latter also record the depth of each point in the image. This provides a 3D view of the scene that can be tracked more efficiently. However, the algorithms for these types of sensors are, in general, computationally expensive. To reduce the complexity, coloured gloves that the system can easily recognise are sometimes used for tracking. However, this method shares a disadvantage with mounted sensors: the gloves interfere with the user's comfort and sense of touch.
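To make the difference between a 2D image and a depth image concrete, the sketch below converts one depth pixel into a 3D point using the standard pinhole camera model. The intrinsic parameters (focal lengths and principal point) are made-up values, not those of the Kinect or any other specific device, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Pinhole camera parameters, all in pixels (assumed values, see below)."""
    fx: float  # focal length along x
    fy: float  # focal length along y
    cx: float  # principal point x
    cy: float  # principal point y


def depth_pixel_to_point(u, v, depth_m, cam):
    """Back-project pixel (u, v) with a depth in metres to camera-space (x, y, z)."""
    x = (u - cam.cx) * depth_m / cam.fx
    y = (v - cam.cy) * depth_m / cam.fy
    return (x, y, depth_m)


# Hypothetical intrinsics for a 640x480 depth image.
camera = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
point = depth_pixel_to_point(420, 240, 1.0, camera)
```

A 2D sensor provides only `u` and `v`; without the depth value, every point along the viewing ray through that pixel is an equally valid candidate.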


Figure 3: Examples of vision based sensors from Sharp et al. [2015] and Sun et al. [2014]

Vision-based sensors also have variants that are mounted on the hand or on other parts of the body, such as the head. Sun et al. [2014] made a variant that uses a gaze-directed camera. It is a device worn on the head that follows the focus of the person wearing it. However, such variants suffer from the same disadvantages as mounted sensors regarding comfort and portability.



As shown by Tagliasacchi et al. [2015], relying only on the data provided by the sensors is inaccurate because of noise from the sensor itself or from other external factors. The output can even contain hand poses that a real human hand cannot form, such as a finger bent in the wrong direction. This problem is tackled with datasets. These datasets contain a set of hand poses that are plausible for the human hand, together with a mapping that gives the most likely next pose given the current one. An example is the public dataset created by Schroder et al. [2014], which Tagliasacchi et al. [2015] use in their algorithm. This will be discussed in detail in a later section.
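As a rough illustration of how a dataset of plausible poses can correct noisy sensor output, the sketch below replaces a measured pose with the closest pose in a small reference set. This is a simple nearest-neighbour lookup, not the method of Tagliasacchi et al. or the format of the Schroder et al. dataset. The three-angle pose representation and the sample poses are assumptions made for the example.

```python
import math

# Hypothetical reference set: each pose is three joint angles of one finger, in degrees.
PLAUSIBLE_POSES = [
    (0.0, 0.0, 0.0),      # finger straight
    (45.0, 60.0, 30.0),   # finger half bent
    (90.0, 100.0, 70.0),  # finger fully curled
]


def nearest_plausible_pose(measured_pose, plausible_poses):
    """Return the reference pose with the smallest Euclidean distance to the measurement."""
    if not plausible_poses:
        raise ValueError("the reference set must not be empty")
    return min(plausible_poses, key=lambda pose: math.dist(pose, measured_pose))


# A noisy reading with a negative first angle, i.e. a finger bent backwards.
corrected = nearest_plausible_pose((-20.0, 5.0, 0.0), PLAUSIBLE_POSES)
```

Here the implausible reading is replaced by the straight-finger pose, the closest entry in the reference set. A real system would use far more poses and, as described above, would also take the previous pose into account.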
