Z-ENG: Monocular Depth Estimation
Monocular depth estimation is an active research topic. Algorithms for this task are used, for example, in autonomous vehicles. Depth estimation can be seen as the first step towards inferring the geometry of a scene from a two-dimensional image. Technically speaking, the goal of depth estimation is to assign a depth value to each pixel of an image. The task becomes easier when a stereo camera or a LIDAR device is available; however, such hardware is more costly, and 2D images are available on a much larger scale. An example of this problem can be found at: https://keras.io/examples/vision/depth_estimation/.
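To make the input/output contract concrete, the following is a minimal NumPy sketch of what a depth estimator computes: an H x W x 3 RGB image goes in, an H x W map of per-pixel depth values comes out. The `predict_depth` function is a hypothetical placeholder; a real model (such as the U-Net in the linked Keras example) would learn this mapping from data.

```python
import numpy as np

def predict_depth(image: np.ndarray) -> np.ndarray:
    """Placeholder for a learned monocular depth model: maps an
    H x W x 3 RGB image to an H x W map of per-pixel depth values.
    Here we simply return a constant map; a trained network would
    predict a different depth for every pixel."""
    h, w, _ = image.shape
    return np.full((h, w), 2.0)  # dummy depth of 2 metres everywhere

image = np.zeros((480, 640, 3), dtype=np.uint8)  # a web-camera-sized frame
depth = predict_depth(image)
print(depth.shape)  # (480, 640): one depth value per pixel
```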
In this project, monocular depth estimation is applied to improve gesture identification. Gestures can help in understanding, e.g., the emotions, skills or competencies of a particular person, and the depth coordinate can help to distinguish between similar gestures with higher accuracy. For example, when a person is observed with a web camera, touching the nose and gesturing while explaining something may show the hands in the same image position. If the depth coordinate is known, however, the two gestures can be distinguished easily.
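The nose-touching example can be illustrated with a toy calculation. The landmark coordinates below are invented for illustration: both gestures project the hand to the same image position (x, y), but the depth coordinate z separates them with a simple distance test.

```python
import numpy as np

# Hypothetical 3-D positions (x, y, z) in camera coordinates, in metres.
# In the 2-D image, both gestures place the hand at the same (x, y).
hand_touching_nose = np.array([0.32, 0.28, 0.45])  # hand close to the face
hand_explaining    = np.array([0.32, 0.28, 0.80])  # hand held out in front
nose               = np.array([0.30, 0.25, 0.45])

def classify(hand: np.ndarray, nose: np.ndarray, threshold: float = 0.1) -> str:
    # With depth available, the 3-D hand-to-nose distance separates
    # the two gestures; without z, both hands look identical.
    if np.linalg.norm(hand - nose) < threshold:
        return "touching nose"
    return "explaining"

print(classify(hand_touching_nose, nose))  # touching nose
print(classify(hand_explaining, nose))     # explaining
```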
The goal of this project is to conduct monocular depth estimation in an office environment by analysing recordings of people in front of a web camera. The ground-truth depth values are obtained with the help of a stereo camera. The student will work on the following tasks:
- Collect videos with a stereo camera in order to have a training dataset
- Study the already existing machine learning algorithms in this field
- Apply the existing algorithms to the collected data and, ideally, improve their performance
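Comparing algorithms in the last task requires a way to score predicted depth against the stereo ground truth. A hedged sketch of three metrics commonly used in the monocular depth literature (RMSE, absolute relative error, and the delta-1.25 accuracy) is shown below; the threshold masking of invalid pixels reflects the assumption that stereo depth maps contain holes where matching failed.

```python
import numpy as np

def depth_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Standard monocular-depth metrics against stereo ground truth."""
    valid = gt > 0                       # stereo maps often have holes; mask them out
    pred, gt = pred[valid], gt[valid]
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)       # fraction of pixels within 25% of ground truth
    return {"rmse": rmse, "abs_rel": abs_rel, "delta1": delta1}

gt = np.full((4, 4), 2.0)                # toy ground-truth depth: 2 m everywhere
pred = gt * 1.1                          # toy prediction: uniformly 10% too deep
m = depth_metrics(pred, gt)
print(m)
```

Lower RMSE and absolute relative error are better, while delta-1.25 should approach 1; tracking all three gives a fuller picture than any single number.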