I am using an Intel RealSense D435 for a project at the moment. The SDK is solid, but fairly light on features. For example, there is no support for human tracking, let alone skeleton tracking. Out of the box it covers connecting/disconnecting, obtaining the color and depth streams, aligning the depth stream to the color stream (correcting the offset between the two cameras), and generating a point cloud from the depth stream.
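For reference, the stream + align + point cloud path looks roughly like this with librealsense2 (a minimal sketch of the SDK calls I mean, not our actual project code):

```cpp
#include <librealsense2/rs.hpp>

int main() {
    rs2::pipeline pipe;
    pipe.start();                                  // default color + depth streams

    rs2::align align_to_color(RS2_STREAM_COLOR);   // corrects the depth/color camera offset
    rs2::pointcloud pc;

    while (true) {
        rs2::frameset frames = pipe.wait_for_frames();
        frames = align_to_color.process(frames);   // depth now registered to the color image

        rs2::video_frame color = frames.get_color_frame();
        rs2::depth_frame depth = frames.get_depth_frame();

        pc.map_to(color);                          // texture-map the cloud with the color frame
        rs2::points points = pc.calculate(depth);  // XYZ vertices from the depth frame
        // points.get_vertices() / points.get_texture_coordinates() from here on
    }
}
```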
The SDK supports multithreaded use through concurrent frame buffers and has been pretty stable for me.
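The concurrent buffer I mean is rs2::frame_queue; the usual pattern is something like this (again just a sketch), grabbing on one thread and processing on another:

```cpp
#include <librealsense2/rs.hpp>
#include <atomic>
#include <thread>

int main() {
    rs2::frame_queue queue(5);          // thread-safe buffer, holds up to 5 framesets
    std::atomic<bool> running{true};

    // Producer: blocks on the camera, pushes framesets into the queue.
    std::thread grabber([&]() {
        rs2::pipeline pipe;
        pipe.start();
        while (running) {
            queue.enqueue(pipe.wait_for_frames());
        }
    });

    // Consumer: e.g. the render/update thread pulls frames when it wants them.
    for (int i = 0; i < 300; ++i) {
        rs2::frame f = queue.wait_for_frame();
        // process the frameset here
    }

    running = false;
    grabber.join();
}
```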
To align multiple cameras, I wrote a tool to adjust each camera’s virtual position, height, and pan/tilt angles. The results are more than adequate for our use case. I also tried Perspective-n-Point calibration, but could not get OpenCV’s solvePnP to produce anything usable. I honestly believe it’s broken.
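Conceptually the tool just builds a rigid transform per camera from those values and applies it to that camera’s point cloud; roughly like this (a sketch using glm, the struct and field names are made up):

```cpp
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <vector>

// Per-camera "virtual pose": where I say the camera sits and how it is aimed.
struct CameraPose {
    glm::vec3 position;   // x/z placement + height (y), in meters
    float     panDeg;     // rotation around the vertical axis
    float     tiltDeg;    // rotation around the camera's horizontal axis
};

glm::mat4 poseToMatrix(const CameraPose& p) {
    glm::mat4 m(1.0f);
    m = glm::translate(m, p.position);
    m = glm::rotate(m, glm::radians(p.panDeg),  glm::vec3(0, 1, 0));
    m = glm::rotate(m, glm::radians(p.tiltDeg), glm::vec3(1, 0, 0));
    return m;
}

// Move one camera's point cloud into the shared world space.
void transformCloud(std::vector<glm::vec3>& cloud, const CameraPose& pose) {
    const glm::mat4 m = poseToMatrix(pose);
    for (auto& v : cloud) {
        v = glm::vec3(m * glm::vec4(v, 1.0f));
    }
}
```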
Real-time alignment is a tough thing to crack. The best approach is definitely something like Paul suggested: measure the real-world positions of your cameras and position them the same way in code. Otherwise, use some kind of real-world calibration, like OpenCV’s checkerboard.
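One common way to do the checkerboard version is to show the same physical board to every camera and recover each camera’s pose relative to it; something like this sketch (the function name is made up, and the board size, square size, and intrinsics are yours to fill in):

```cpp
#include <opencv2/calib3d.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

// Checkerboard extrinsics for one camera: where is the camera relative to the board?
// boardSize = inner corners (e.g. 9x6), squareSize in meters.
// cameraMatrix/distCoeffs can come from the sensor's intrinsics or a prior calibration.
bool findCameraPose(const cv::Mat& grayImage,
                    cv::Size boardSize, float squareSize,
                    const cv::Mat& cameraMatrix, const cv::Mat& distCoeffs,
                    cv::Mat& rvec, cv::Mat& tvec) {
    std::vector<cv::Point2f> corners;
    if (!cv::findChessboardCorners(grayImage, boardSize, corners)) return false;
    cv::cornerSubPix(grayImage, corners, cv::Size(11, 11), cv::Size(-1, -1),
                     cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30, 0.001));

    // The same physical board defines a shared world origin for every camera.
    std::vector<cv::Point3f> objectPoints;
    for (int y = 0; y < boardSize.height; ++y)
        for (int x = 0; x < boardSize.width; ++x)
            objectPoints.emplace_back(x * squareSize, y * squareSize, 0.0f);

    // Board -> camera transform; invert it to place the camera in board space.
    return cv::solvePnP(objectPoints, corners, cameraMatrix, distCoeffs, rvec, tvec);
}
```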
It’s something the guys at http://www.depthkit.tv/ have also done a lot of work on with pretty cool results. They are more focused on volumetric filmmaking, but still super cool.
I don’t know if there are necessarily best practices for it, but at least for Connected Worlds, Theo did a really nice video where he broke down some of the challenges they had while making it: https://vimeo.com/131665883