At Nomagic, we teach robots the real world. It's no secret that the real world is immensely complex, and to perform some tasks efficiently you wish you had an additional pair of eyes. Some of our robots felt exactly the same way, so we decided to help them.
In this post, we share our experience building a system that lets you quickly capture color and depth images from one of up to 16 Intel RealSense cameras with minimal CPU consumption, using the Robot Operating System (ROS1). Source code included!
At first glance, it may appear that you could buy a few USB 3 hubs, connect all the cameras and call it a day. However, getting a sufficient number of USB 3 ports is only a surface-level challenge.
The real challenge is the limited USB 3 bandwidth. There is a great article by Intel explaining the various nuts and bolts of connecting multiple cameras. The main takeaway is that you should try to maximize the number and quality of the USB 3 controllers handling the cameras. Since our PCs have only a single USB controller, we decided to buy this PCIe USB card, which has a separate USB controller for each port: https://www.delock.com/produkte/2072_Type-A/89365/merkmale.html
To achieve the desired number of ports (16) and make sure that each camera has a sufficient power supply, we bought 4 additional powered external USB hubs: https://www.delock.de/produkte/G_64053/merkmale.html
Connecting all the cameras was great fun:
You could hide from many robots, but not from this one 🙂
Most of the time it's sufficient for a robot to use a single camera, and if it uses more, it's usually 2–4 cameras. So there's no doubt that connecting 16 cameras to a single PC sounds like a bold idea. Let's do it then!
We attached and activated the cameras one by one and observed the depth video on screen using ros-realsense and RViz. After a while, we noticed that the FPS started to drop. A quick investigation revealed the root cause:
One of the signs that the cameras are working 🙂
The meaning of the screenshot above is probably very clear to Linux users: full CPU utilization, which is a bad thing, especially if you want to run other programs on the machine. On our PC, a single camera consumed around 1.2 CPU cores. The next step was to investigate the source of the CPU load.
Profiling results showing that CPU is used mostly to perform filtering.
We did a little profiling and discovered that most of the load comes from the postprocessing done by ros-realsense. It turns out that each frame arriving from the camera is filtered to remove noise and fill in missing data using spatial and/or temporal interpolation. Filtering a single frame takes a few milliseconds, so around 10 cameras are enough to saturate even a beefy CPU. Filtering in ros-realsense is optional; however, it greatly improves depth image quality, so we wanted to keep it.
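For context, the filters in question are built from the standard librealsense2 post-processing blocks, which the ROS wrapper runs on every incoming frame. Here is only a rough standalone sketch of such a per-frame filter chain using the plain librealsense2 C++ API (the stream settings are illustrative, not our production configuration), to show where the milliseconds go:

```cpp
#include <librealsense2/rs.hpp>

int main() {
    // Start a depth stream; resolution and framerate here are illustrative.
    rs2::pipeline pipe;
    rs2::config cfg;
    cfg.enable_stream(RS2_STREAM_DEPTH, 848, 480, RS2_FORMAT_Z16, 30);
    pipe.start(cfg);

    // Post-processing blocks of the kind ros-realsense applies to every frame.
    rs2::spatial_filter spatial;    // edge-preserving spatial smoothing
    rs2::temporal_filter temporal;  // smoothing based on previous frames
    rs2::hole_filling_filter holes; // fills missing depth pixels

    while (true) {
        rs2::frameset frames = pipe.wait_for_frames();
        rs2::depth_frame depth = frames.get_depth_frame();

        // Each of these calls costs a few milliseconds of CPU time per frame,
        // which is what adds up across many cameras.
        rs2::frame filtered = spatial.process(depth);
        filtered = temporal.process(filtered);
        filtered = holes.process(filtered);
        // ... publish / use the filtered frame ...
    }
}
```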
Our next observation was that we don't actually need to filter all the frames, since we end up using a single frame every few seconds. Therefore, we forked and modified ros-realsense to take that into account: instead of filtering every incoming frame, we store the most recent frames in a ring buffer and apply the filters only after receiving a request via a ROS service call. Keeping and filtering a number of frames (instead of just one) is necessary because some parts of the filtering depend on the past images.
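A much simplified sketch of the idea looks like this. It is not the exact code from our fork; the class, its names, and the buffer size are illustrative, and in the real fork the filtering is triggered by a ROS service handler rather than a plain method call:

```cpp
#include <deque>
#include <mutex>
#include <librealsense2/rs.hpp>

// Keeps the most recent raw framesets and filters them only on demand.
class LazyFilterBuffer {
public:
    explicit LazyFilterBuffer(size_t capacity) : capacity_(capacity) {}

    // Called from the camera frame callback: cheap, no filtering happens here.
    void push(rs2::frameset frames) {
        frames.keep();  // prevent librealsense from recycling the frame memory
        std::lock_guard<std::mutex> lock(mutex_);
        buffer_.push_back(std::move(frames));
        if (buffer_.size() > capacity_) buffer_.pop_front();
    }

    // Called when a frame is requested (via a ROS service in our fork):
    // replay the buffered history through the filters, so the temporal
    // filter still sees a sequence of frames, and return the newest result.
    rs2::frame getFiltered() {
        std::lock_guard<std::mutex> lock(mutex_);
        rs2::spatial_filter spatial;    // fresh filters per request, so their
        rs2::temporal_filter temporal;  // state comes only from the buffer
        rs2::frame result;
        for (const rs2::frameset& frames : buffer_) {
            rs2::frame depth = frames.get_depth_frame();
            depth = spatial.process(depth);
            depth = temporal.process(depth);
            result = depth;
        }
        return result;
    }

private:
    size_t capacity_;
    std::deque<rs2::frameset> buffer_;
    std::mutex mutex_;
};
```

The important property is that the per-frame callback stays cheap; all the expensive work happens only when somebody actually asks for a frame.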
The aforementioned optimization greatly reduced CPU usage, to around 20% per camera. However, the battle wasn't over yet. The low framerate persisted, but the pattern changed: instead of affecting all the cameras, it was much more localized, causing arbitrary cameras to reduce their performance intermittently. The (quite predictable) culprit was found in the ros-realsense logs, which were full of messages like this one:
13/05 10:16:06,190 WARNING [140247325210368] (backend-v4l2.cpp:1057) Incomplete frame received: Incomplete video frame detected! Size 541696 out of 1843455 bytes (29%)
Even though we were below the theoretical USB throughput, the cameras struggled to deliver complete frames. We lowered the framerate to 6 frames per second and disabled the infrared streams. Fortunately, that was enough to achieve stability.
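In ros-realsense this is configured through launch-file parameters, but to illustrate what the change means in plain librealsense2 terms (resolutions below are only examples), the resulting stream configuration is roughly equivalent to:

```cpp
#include <librealsense2/rs.hpp>

int main() {
    rs2::config cfg;
    // Enable only the streams we actually need, at a reduced framerate.
    cfg.enable_stream(RS2_STREAM_DEPTH, 848, 480, RS2_FORMAT_Z16, 6);
    cfg.enable_stream(RS2_STREAM_COLOR, 1280, 720, RS2_FORMAT_RGB8, 6);
    // The infrared streams are simply never enabled, which frees USB bandwidth.

    rs2::pipeline pipe;
    pipe.start(cfg);
    // ... receive frames as usual ...
    return 0;
}
```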
When the battle was over, we decided to test our solution and check what latencies we could achieve.
Color frames (blue) do not require any processing, so they can be delivered very quickly, usually under 10 milliseconds. Requesting a depth frame (green) requires performing the pending filtering, which takes around 150 ms, depending on luck in thread scheduling. Getting an aligned depth frame (red) requires a little bit of additional work, hence the slight offset from the raw depth frames in the latency distribution above.
Operating the cameras at a low framerate, combined with the long processing times, means that the frames provided by our service are slightly aged. However, if your scene is static most of the time, like ours, an age of 0.5 seconds should be fine.
The histogram above shows the age of the received frames, measured using timestamps created at the beginning of exposure (hence the minimal age equals the inter-frame interval of 166 milliseconds).
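For reference, the age shown in the histogram is just the difference between the time the response arrives and the capture timestamp carried in the frame, something along the lines of the snippet below (this uses the standard ROS image message fields, nothing specific to our fork):

```cpp
#include <ros/ros.h>
#include <sensor_msgs/Image.h>

// Age of a frame: time elapsed since the capture timestamp in its header.
// The stamp is set at the beginning of exposure, so at 6 fps the minimum
// age is roughly one inter-frame interval (about 166 ms).
double frameAgeSeconds(const sensor_msgs::Image& frame) {
    return (ros::Time::now() - frame.header.stamp).toSec();
}
```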
To sum up: our modification can significantly reduce the CPU usage caused by Intel RealSense cameras if you only need to get a single frame once in a while. With this modification and some additional hardware, it is possible to connect and use over a dozen cameras on a single PC.
The source code of the modification can be found here: https://github.com/NoMagicAi/realsense-ros
Piotr Rybicki