Efficient Real-Time Video Processing and Analysis

Recent work in video processing and analysis has shifted toward more efficient, real-time, and context-aware solutions. One strong theme is optimizing video transcoding for live streaming, with frameworks that dynamically select presets and bitrates to improve quality within a fixed computational budget. Interest is also growing in online video captioning, where models generate detailed, temporally aligned captions without access to future frames, yielding more comprehensive and more frequent descriptions. A related trend is online episodic memory for wearable devices, which enables real-time object localization and retrieval from past observations, a capability that is crucial for assistive technologies.

Tracking and segmentation are advancing along similar lines. Memory-based visual object tracking now includes distractor-aware models that are more robust when similar-looking objects are present. Universal visual segmentation is being tackled with large language models, aiming for more complex reasoning and fine-grained understanding across image and video tasks. Text-driven video segmentation is also progressing, with models that retain contextual information in streaming scenarios.

Several contributions apply these ideas to geographic and urban data. Self-supervised video instance segmentation is being used on historical maps, reducing the need for manual annotation and improving geographic entity alignment. Other work analyzes bike-sharing systems to address issues such as the self-loop phenomenon, with the goal of improving service equity. Finally, large-scale urban data is being used to automate the verification of large-scale POIs from street view imagery and to price public facilities in order to revalue private properties.

Noteworthy papers include the work on optimal transcoding preset selection for live streaming, which reports significant PSNR gains and BD-rate reductions. The online dense video captioning model is reported to outperform both offline and online methods while using less compute. The paper on online episodic memory visual query localization introduces a framework that is reported to outperform offline methods in real-time object tracking and retrieval.
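For readers unfamiliar with the PSNR metric mentioned above, the following is a minimal sketch of how a PSNR gain between two encodes might be computed. It assumes 8-bit pixel values (peak of 255) and uses short made-up pixel lists in place of real frames; the function name `psnr` and the sample data are illustrative and are not taken from any of the listed papers. BD-rate, which compares whole rate-distortion curves, is not covered here.

```python
import math

def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(distorted):
        raise ValueError("frames must have the same number of pixels")
    if not reference:
        raise ValueError("frames must not be empty")
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: no distortion to measure
    return 10.0 * math.log10(peak ** 2 / mse)

# Hypothetical 8-pixel "frames": one reference and two encodes of it.
reference = [52, 55, 61, 66, 70, 61, 64, 73]
encode_a = [54, 55, 60, 67, 69, 62, 66, 72]  # small errors
encode_b = [58, 50, 66, 60, 75, 55, 70, 68]  # larger errors

psnr_a = psnr(reference, encode_a)
psnr_b = psnr(reference, encode_b)
gain_db = psnr_a - psnr_b  # positive when encode_a is closer to the reference

print(f"PSNR A: {psnr_a:.2f} dB, PSNR B: {psnr_b:.2f} dB, gain: {gain_db:.2f} dB")
```

A higher PSNR means the encode is closer to the reference, so a "PSNR gain" is simply the difference in dB between two encodes of the same content, typically compared at a similar bitrate.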

Sources

Optimal Transcoding Preset Selection for Live Video Streaming

Whats in a Video: Factorized Autoregressive Decoding for Online Dense Video Captioning

Online Episodic Memory Visual Query Localization with Egocentric Streaming Object Memory

Self-supervised Video Instance Segmentation Can Boost Geographic Entity Alignment in Historical Maps

Multiscale spatiotemporal heterogeneity analysis of bike-sharing system's self-loop phenomenon: Evidence from Shanghai

A Distractor-Aware Memory for Visual Object Tracking with SAM2

HyperSeg: Towards Universal Visual Segmentation with Large Language Model

SAMWISE: Infusing wisdom in SAM2 for Text-Driven Video Segmentation

DuMapper: Towards Automatic Verification of Large-Scale POIs with Street Views at Baidu Maps

MONOPOLY: Learning to Price Public Facilities for Revaluing Private Properties with Large-Scale Urban Data

DistinctAD: Distinctive Audio Description Generation in Contexts
