Special Issue on Multi-Speaker, Multi-Microphone, and Multi-Modal Distant Speech Recognition

Submission Date: 2024-12-02

Automatic speech recognition (ASR) has made significant progress in the single-speaker scenario, owing to extensive training data, sophisticated deep learning architectures, and abundant computing resources. Building on this success, the research community is now tackling real-world multi-speaker speech recognition, where the number and nature of the sound sources are unknown and change over time. In this scenario, refining core multi-speaker speech processing technologies such as speech separation, speaker diarization, and robust speech recognition is essential, and the effective integration of these advances becomes increasingly important. In addition, emerging approaches such as end-to-end neural networks, speech foundation models, and advanced training methods (e.g., semi-supervised, self-supervised, and unsupervised training), together with multi-microphone and multi-modal information (such as video and accelerometer data), offer promising avenues for addressing these challenges. This special issue gathers recent advances in multi-speaker, multi-microphone, and multi-modal speech processing to advance real-world conversational speech recognition.


Guest editors:


Assoc. Prof. Shinji Watanabe (Executive Guest Editor)

Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America

Email: shinjiw@ieee.org

Areas of Expertise: Speech recognition, speech enhancement, and speaker diarization


Dr. Michael Mandel

Reality Labs, Meta, Menlo Park, California, United States of America

Email: mmandel@meta.com

Areas of Expertise: Source separation, noise robust ASR, electromyography


Dr. Marc Delcroix

NTT Corporation, Chiyoda-ku, Tokyo, Japan

Email: marc.delcroix@ieee.org; marc.delcroix@ntt.com

Areas of Expertise: Robust speech recognition, speech enhancement, source separation and extraction


Dr. Leibny Paola Garcia Perera

Johns Hopkins University, Baltimore, Maryland, United States of America

Email: lgarci27@jhu.edu

Areas of Expertise: Speech recognition, speech enhancement, speaker diarization, and multimodal speech processing


Dr. Katerina Zmolikova

Meta, Menlo Park, California, United States of America

Email: kzmolikova@meta.com

Areas of Expertise: Speech separation and extraction, speech enhancement, robust speech recognition


Dr. Samuele Cornell

Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America

Email: scornell@andrew.cmu.edu

Areas of Expertise: Robust speech recognition, speech separation and enhancement


Special issue information:


Relevant research topics include (but are not limited to):


Speaker identification and diarization

Speaker localization and beamforming

Single- or multi-microphone enhancement and source separation

Robust features and feature transforms

Robust acoustic and language modeling for distant or multi-talker ASR

Traditional or end-to-end robust speech recognition

Training schemes: data simulation and augmentation, semi-supervised, self-supervised, and unsupervised training for distant or multi-talker speech processing

Pre-training and fine-tuning of speech and audio foundation models and their application to distant and multi-talker speech processing

Robust speaker and language recognition

Robust paralinguistics

Cross-environment or cross-dataset performance analysis

Environmental background noise modeling

Multimodal speech processing

Systems, resources, and tools for distant speech recognition


In addition to traditional research papers, the special issue also welcomes descriptions of successful conversational speech recognition systems where the contribution lies more in the implementation than in the techniques themselves, as well as successful applications of conversational speech recognition systems. For example, the recently concluded seventh and eighth CHiME challenges serve as a focal point for this special issue. These challenges addressed conversational speech separation, speech recognition, and speaker diarization in everyday home environments from multi-microphone and multi-modal input. The seventh and eighth CHiME challenges comprise multiple tasks covering 1) distant automatic speech recognition with multiple devices in diverse scenarios, 2) unsupervised domain adaptation for conversational speech enhancement, 3) distant diarization and ASR in natural conferencing environments, and 4) ASR for multimodal conversations recorded with smart glasses. Papers reporting evaluation results on the CHiME-7/8 datasets or on other datasets dealing with real-world conversational speech recognition are equally welcome.
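
For readers less familiar with the multi-talker setting, the following minimal sketch (purely illustrative and not part of the call) shows one common way overlapped-speech training data can be simulated from single-speaker recordings, in the spirit of the data simulation and augmentation topic listed above. The function name, overlap ratio, and SNR values are hypothetical assumptions chosen for illustration, not values prescribed by the CHiME challenges.

    import numpy as np

    def simulate_two_speaker_mixture(speech_a, speech_b, noise,
                                     overlap_ratio=0.5, snr_db=5.0):
        """Create a partially overlapped two-speaker mixture with additive noise.

        All inputs are 1-D float arrays at the same sampling rate; the overlap
        ratio and SNR are illustrative parameters only.
        """
        # Delay speaker B so that only `overlap_ratio` of speaker A is overlapped.
        offset = int(len(speech_a) * (1.0 - overlap_ratio))
        length = max(len(speech_a), offset + len(speech_b))

        mixture = np.zeros(length, dtype=np.float32)
        mixture[:len(speech_a)] += speech_a
        mixture[offset:offset + len(speech_b)] += speech_b

        # Scale the noise so the mixture reaches the requested signal-to-noise ratio.
        noise = np.resize(noise, length).astype(np.float32)
        speech_power = np.mean(mixture ** 2) + 1e-8
        noise_power = np.mean(noise ** 2) + 1e-8
        gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
        return mixture + gain * noise

    # Toy usage with random signals standing in for real 16 kHz recordings.
    rng = np.random.default_rng(0)
    a = rng.standard_normal(16000).astype(np.float32)
    b = rng.standard_normal(16000).astype(np.float32)
    n = rng.standard_normal(16000).astype(np.float32)
    mix = simulate_two_speaker_mixture(a, b, n, overlap_ratio=0.5, snr_db=5.0)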


Manuscript submission information:


Tentative Dates:


Submission Open Date: August 19, 2024

Manuscript Submission Deadline: December 2, 2024

Editorial Acceptance Deadline: September 1, 2025


Contributed full papers must be submitted via the Computer Speech & Language online submission system (Editorial Manager®): https://www.editorialmanager.com/ycsla/default2.aspx. Please select the article type “VSI: Multi-DSR” when submitting the manuscript online.


Please refer to the Guide for Authors to prepare your manuscript: https://www.elsevier.com/journals/computer-speech-and-language/0885-2308/guide-for-authors


For any further information, the authors may contact the Guest Editors.


Keywords:


Speech recognition, speech enhancement/separation, speaker diarization, multi-speaker, multi-microphone, multi-modal, distant speech recognition, CHiME challenge
