Precomputing computer vision for looping video

I’m working on a project where I’m going to have some looping video, and I’m using CV to perform object detection on each frame, getting the centroid and size of each detected object.

It’s performing a bit slower than I’d like, so I thought the best way to optimize it would be to precompute the objects in each video (because I’m looping the same 2-3 videos constantly). This is actually a bit trickier than it looks: most video players don’t index frame-by-frame but rather expose a function like getCurrentTime(), getCurrentSeconds(), or getPositionInSeconds(), which indexes by time rather than frame number. So even if I process the entire video and export a big JSON file, figuring out how to pull out the right precomputed “frame” isn’t as straightforward as I’d like, since it’s more like “find the frame with the timestamp closest to 1.23532” rather than “grab the object data for frame 233”.
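For reference, the lookup I have in mind is basically a nearest-timestamp search over the exported data, roughly like the sketch below (FrameData and findNearestFrame are just placeholder names, not from any real API):

#include <algorithm>
#include <iterator>
#include <vector>

struct FrameData {
    float timestamp;   // seconds into the video
    // centroids / sizes of the detected objects would live here
};

// Returns the precomputed frame whose timestamp is closest to currentTime.
// Assumes frames is non-empty and sorted by timestamp.
const FrameData& findNearestFrame(const std::vector<FrameData>& frames, float currentTime) {
    auto it = std::lower_bound(frames.begin(), frames.end(), currentTime,
        [](const FrameData& f, float t) { return f.timestamp < t; });
    if (it == frames.begin()) return frames.front();
    if (it == frames.end())   return frames.back();
    auto prev = std::prev(it);
    // pick whichever neighbour is closer in time
    return (currentTime - prev->timestamp) <= (it->timestamp - currentTime) ? *prev : *it;
}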

Has anyone tried anything like this before, and perhaps has a clever solution?

This would totally depend on the type of data you’re working with, but you could do a rudimentary time-to-frame conversion and then interpolate between two known frames by the fractional part (fract) of the result. For example, if your currentTime was 1.2, you would use

lerp(getDataAtFrame(secondsToFrame(1.0f)), getDataAtFrame(secondsToFrame(2.0f)), 0.2f);

assuming you had stored frame data at 1-second intervals. Again, this is heavily dependent on the data you’re working with, but if it’s something like blob tracking, this would work fine.
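As a minimal sketch of that idea, assuming the data was sampled at a fixed rate and using placeholder names (ObjectData, getDataAtTime, sampleRate):

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct ObjectData {
    float cx, cy;   // centroid
    float size;
};

static float lerpf(float a, float b, float t) { return a + (b - a) * t; }

// keyframes[i] holds the data sampled at time i / sampleRate
// (e.g. sampleRate = 1.0f for one stored sample per second).
ObjectData getDataAtTime(const std::vector<ObjectData>& keyframes, float sampleRate, float currentTime) {
    float f = currentTime * sampleRate;                              // fractional keyframe index
    std::size_t i0 = std::min<std::size_t>((std::size_t)f, keyframes.size() - 1);
    std::size_t i1 = std::min<std::size_t>(i0 + 1, keyframes.size() - 1);
    float t = f - std::floor(f);                                     // blend factor between the two keyframes
    return { lerpf(keyframes[i0].cx,   keyframes[i1].cx,   t),
             lerpf(keyframes[i0].cy,   keyframes[i1].cy,   t),
             lerpf(keyframes[i0].size, keyframes[i1].size, t) };
}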

Are you using my video player for the video decoding? I’ve seen a method in the underlying engine that might allow stricter frame control, which I could add if that would help.

I’ve used HPV video in the past when frame accuracy was of the utmost importance (syncing across multiple machines, for example), so that might be an option if you have control of the source video and some patience; encoding is slow and the files are large, but that might be OK given your specific problem.

Oh, that’s a really clever idea for handling the interpolation; it’s basically creating keyframes. I like it! The tricky part is then dealing with the creation / destruction of each object, but that shouldn’t be too hard.
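My rough plan for that part is to give each object a persistent id from the tracker, interpolate only the objects that appear in both keyframes, and treat the rest as just-created or about-to-be-destroyed. A sketch with placeholder names (TrackedObject, blendKeyframes):

#include <vector>

struct TrackedObject {
    int   id;       // persistent id assigned by the tracker
    float cx, cy;   // centroid
    float size;
};

// Interpolates objects that exist in both keyframes; objects present in only
// one keyframe are treated as about to disappear and passed through unchanged.
std::vector<TrackedObject> blendKeyframes(const std::vector<TrackedObject>& a,
                                          const std::vector<TrackedObject>& b,
                                          float t) {
    std::vector<TrackedObject> out;
    for (const auto& objA : a) {
        const TrackedObject* match = nullptr;
        for (const auto& objB : b) {
            if (objB.id == objA.id) { match = &objB; break; }
        }
        if (match) {
            out.push_back({ objA.id,
                            objA.cx   + (match->cx   - objA.cx)   * t,
                            objA.cy   + (match->cy   - objA.cy)   * t,
                            objA.size + (match->size - objA.size) * t });
        } else {
            out.push_back(objA);   // disappearing object: keep its last known state
        }
    }
    // objects that only exist in b (newly created) could be appended here as well
    return out;
}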

I am using your video player for the project; it’s been A+ excellent so far. I saw that FrameStep method and wasn’t sure if that was the way to go… but so far, it seems like the object tracking itself is the bottleneck, so I’m going to give precomputing that a go first.

thanks so much!