Speech Enhancement

An overview of the throat microphone speech enhancement task and the metrics used to evaluate it.

Task Overview

The TAPS dataset addresses the challenge of enhancing speech recorded with a throat microphone. Because the speech signal passes through skin and tissue, it is significantly degraded, particularly in its high-frequency components.

Key Challenges

  • Loss of high-frequency speech information
  • Attenuation of unvoiced sounds
  • Filtering effects of skin and tissue layers
  • Microphone placement variations
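
To make the high-frequency loss concrete, the sketch below passes a low tone and a high tone through a simple one-pole low-pass filter and compares how much level each one loses. This is only a toy stand-in: the filter, the 16 kHz sample rate, and the 1 kHz cutoff are assumptions chosen for illustration, not a measured model of skin and tissue and not part of the TAPS pipeline.

```python
import math

SAMPLE_RATE = 16_000  # Hz; assumed for illustration only
CUTOFF_HZ = 1_000     # Hz; hypothetical cutoff, not a measured tissue response


def sine(freq_hz, seconds=1.0):
    """Generate a unit-amplitude sine tone as a list of samples."""
    n = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


def one_pole_lowpass(signal, cutoff_hz=CUTOFF_HZ):
    """Apply y[n] = y[n-1] + alpha * (x[n] - y[n-1]), a basic low-pass filter."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / SAMPLE_RATE)
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def gain_db(freq_hz):
    """Level change in dB that the toy filter applies to a tone at freq_hz."""
    tone = sine(freq_hz)
    return 20.0 * math.log10(rms(one_pole_lowpass(tone)) / rms(tone))


low_db = gain_db(200)     # tone well below the assumed cutoff
high_db = gain_db(4000)   # tone well above the assumed cutoff
print(f"200 Hz: {low_db:.1f} dB, 4000 Hz: {high_db:.1f} dB")
```

In this toy model the 200 Hz tone passes almost unchanged while the 4000 Hz tone is strongly attenuated, which mirrors the kind of loss an enhancement model has to undo. A real throat microphone channel is more complex and also varies with microphone placement.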

Evaluation Framework

Speech Quality

  • PESQ (Perceptual Evaluation of Speech Quality)
  • STOI (Short-Time Objective Intelligibility)
  • CSIG (predicted rating of signal distortion)
  • CBAK (predicted rating of background noise intrusiveness)
  • COVL (predicted rating of overall quality)

Speech Content

  • Character Error Rate (CER)
  • Word Error Rate (WER)
  • Speech-to-Text Accuracy
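
CER and WER are both edit-distance ratios: the number of substitutions, deletions, and insertions needed to turn the recognized text into the reference, divided by the reference length in characters or words. The following is a minimal sketch of that standard definition. The function names are hypothetical, the text normalization (whitespace splitting for WER, raw characters including spaces for CER) is an assumption, and this is not necessarily the scoring script used for the TAPS baselines.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute, insert, delete)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length.

    Assumption: spaces count as characters; other setups strip them first.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# One missing word out of six reference words.
print(wer("the cat sat on the mat", "the cat sat on mat"))
# One substituted character out of four.
print(cer("abcd", "abed"))
```

Because insertions count as errors, both rates can exceed 1.0 when the recognized text is much longer than the reference. In this setting the hypothesis would be the transcript a speech recognizer produces from the enhanced audio, and the reference would be the ground-truth transcript.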

Results Overview

Our baseline models show significant improvements in both speech quality and content preservation, as measured by the metrics listed above. Detailed results and model implementations are available in our baselines documentation.

Research Opportunities

The TAPS dataset opens up several research directions:

  • Advanced model architectures for speech enhancement
  • Multi-task learning approaches
  • Novel loss functions for better high-frequency reconstruction
  • Real-time enhancement solutions