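As a major application scenario of the future Internet 3.0, the metaverse has become a hot topic in many fields, including IT. Starting from basic data concepts and drawing on the progress of a national key research program project led by the speaker, this talk presents the speaker's understanding of and views on the current opportunities in the metaverse and its state of development. It then introduces the technologies and development trends of the industrial metaverse in light of the future application needs of the industrial Internet, and goes on to discuss how intelligent technologies can be deployed in more industrial scenarios.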
Video Moment Retrieval (VMR) aims to retrieve the temporal moment in an untrimmed video that semantically corresponds to a language query. Because it connects computer vision and natural language processing, VMR has drawn significant attention from researchers in both communities. Existing solutions can be roughly divided into two categories according to whether candidate moments are generated: moment-based approaches and clip-based approaches. Each framework has its own shortcoming: moment-based models suffer from heavy computation, while clip-based models generally perform worse than their moment-based counterparts. To this end, we design an intuitive and efficient Dual-Channel Localization Network (DCLN) that balances computational cost and retrieval performance. In addition, despite their effectiveness, moment-based and clip-based methods mostly focus on aligning the query with single-level clip or moment features and ignore the different granularities present in the video itself, such as clip, moment, and video, which results in insufficient cross-modal interaction. To address this, we also propose a Temporal Localization Network with Hierarchical Contrastive Learning (HCLNet) for the VMR task. This talk will describe both works in detail and share our deeper insights.
Sponsor: CCF
Organizers: CCF Technical Committee on Cooperative Computing, Taiyuan University of Science and Technology