As part of #28, we have discussed how raw data would result on jittery rough data, even if the neural network used is theoretically as precise as a human eye predicting the facial movements of the subject. To compensate for jittery input, we will implement a sort of lag-compensation algorithm.
Background
John Carmack's work with Latency Mitigation for Virtual Reality Devices (source) explains that the physical movement from the user's head up to the eyes is critical to the experience. While the document is designed mainly for virtual reality, one can argue the methodologies used to provide a seamless experience for virtual reality can be applied for a face tracking application, as face tracking like HMDs, are also very demanding "human-in-the-loop" interfaces.
Byeong-Doo Choi, et al.'s work with frame interpolation using a novel algorithm for motion prediction would enhance a target video's temporal resolution, by using Adaptive OBMC. Such frame interpolation techniques according to the paper has been proven to give better results than the current algorithms used for frame interpolation in the market.
Strategy
As stated on the background, there are many strategies we can perform lag compensation for such raw jittery input from prediction data from the neural network, it is limited to these two strategies:
Frame Interpolation by Motion Prediction
Byeong Doo-Choi, et al. achieves frame interpolation by the following:
First, we propose the bilateral motion estimation scheme to obtain the motion field of an interpolated frame without yielding the hole and overlapping problems. Then, we partition a frame into several object regions by clustering motion vectors. We apply the variable-size block MC (VS-BMC) algorithm to object boundaries in order to reconstruct edge information with a higher quality. Finally, we use the adaptive overlapped block MC (OBMC), which adjusts the coefficients of overlapped windows based on the reliabilities of neighboring motion vectors. The adaptive OBMC (AOBMC) can overcome the limitations of the conventional OBMC, such as over-smoothing and poor de-blocking
According to their experiments, such method would produce better image quality for the interpolated frames, which is helpful for prediction in our neural network, however it comes with a cost of having to process the video at runtime, as the experiment is only done on pre-rendered video frames already.
View Bypass/Time Warping
John Carmack's work with reducing input latency for VR HMDs suggests a multitude of methods, one of them is View Bypass - a method achieved by getting a newer sampling of the input.
To achieve this, the input should be sampled once but can be used by both the simulation and the rendering task, thus reducing the latency for such. However, the input and the game thread must run in parallel and the programmer must be careful not to reference the game state otherwise it would cause a race condition.
Another method mentioned by Carmack is Time Warping, which he states that:
After drawing a frame with the best information at your disposal, possibly with bypassed view parameters, instead of displaying it directly, fetch the latest user input, generate updated view parameters, and calculate a transformation that warps the rendered image into a position that approximates where it would be with the updated parameters. Using that transform, warp the rendered image into an updated form on screen that reflects the new input. If there are two dimensional overlays present on the screen that need to remain fixed, they must be drawn or composited in after the warp operation, to prevent them from incorrectly moving as the view parameters change.
There are different methods of warping which is forward warping and reverse warping, and such warping methods can be used along with View Bypassing. However, the increased complexity for lag compensation of doing input with the main loop concurrently is possible as the input loop will be independent of the game state entirely.
Conclusion
Such strategies mentioned would allow us to have smoother experience, however, based on my personal analysis, I found that Carmack's solutions would be more feasible for a project of our scale. We simply don't have the team and the technical resources to do from-camera video interpolation as it would be computationally expensive to be implemented with minimal overhead.
area:documentation proposal priority:high