Speech Enhancement
Overview of the throat microphone speech enhancement task and its evaluation metrics.
Task Overview
The TAPS dataset addresses the challenge of enhancing speech quality from throat microphone recordings. When speech signals pass through skin and tissue, they are significantly degraded, particularly in their high-frequency components.
Key Challenges
- Loss of high-frequency speech information
- Attenuation of unvoiced sounds
- Filtering effects of skin and tissue layers
- Microphone placement variations
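To make the filtering effect concrete, the sketch below passes a low-frequency and a high-frequency test tone through a simple one-pole low-pass filter and compares how much of each survives. This is only a toy illustration: the one-pole filter, the 1000 Hz cutoff, the 16 kHz sample rate, and the helper names (`one_pole_lowpass`, `gain_at`) are assumptions chosen for the example, not a model of the actual body-conduction path or of the TAPS recordings.

```python
import math


def one_pole_lowpass(signal, cutoff_hz, sample_rate):
    """Apply a one-pole low-pass filter: y[n] = a*y[n-1] + (1-a)*x[n]."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in signal:
        y = a * y + (1.0 - a) * x
        out.append(y)
    return out


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def tone(freq_hz, sample_rate, duration_s=1.0):
    """Generate a unit-amplitude sine tone."""
    n = int(sample_rate * duration_s)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def gain_at(freq_hz, cutoff_hz=1000.0, sample_rate=16000):
    """Ratio of output RMS to input RMS for a tone at freq_hz.

    cutoff_hz and sample_rate are hypothetical values for illustration.
    """
    x = tone(freq_hz, sample_rate)
    y = one_pole_lowpass(x, cutoff_hz, sample_rate)
    return rms(y) / rms(x)


if __name__ == "__main__":
    # The low tone passes almost unchanged; the high tone is strongly attenuated.
    print("gain at 200 Hz :", round(gain_at(200.0), 3))
    print("gain at 4000 Hz:", round(gain_at(4000.0), 3))
```

With these assumed settings the 200 Hz tone keeps a gain close to 1, while the 4000 Hz tone is reduced to a small fraction of its input level. That is the kind of high-frequency loss an enhancement model has to reconstruct.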
Evaluation Framework
Speech Quality
- PESQ (Perceptual Evaluation of Speech Quality)
- STOI (Short-Time Objective Intelligibility)
- CSIG (Signal Distortion)
- CBAK (Background Noise)
- COVL (Overall Quality)
Speech Content
- Character Error Rate (CER)
- Word Error Rate (WER)
- Speech-to-Text Accuracy
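CER and WER are both edit-distance ratios: the number of substitutions, insertions, and deletions needed to turn the recognizer's transcript into the reference, divided by the reference length (in characters or in words). The following is a minimal sketch of that definition, not the project's evaluation script. The function names are hypothetical, and removing spaces before computing CER is an assumption, since conventions differ between toolkits.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate; spaces are ignored (an assumption of this sketch)."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print("WER:", wer(ref, hyp))
    print("CER:", cer(ref, hyp))
```

In a typical setup, the enhanced audio is transcribed by a speech recognizer and the transcript is scored against the ground-truth text. Lower values mean the enhancement preserved more of the spoken content.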
Results Overview
Our baseline models demonstrate significant improvements in both speech quality and content preservation. Detailed results and model implementations can be found in our baselines documentation.
Research Opportunities
The TAPS dataset opens up several research directions:
- Advanced model architectures for speech enhancement
- Multi-task learning approaches
- Novel loss functions for better high-frequency reconstruction
- Real-time enhancement solutions