Post Processing
Description of signal processing techniques and data preparation procedures used in the TAPS dataset.
Data Mismatch Correction
One of the key challenges in creating the TAPS dataset was addressing the timing differences between throat microphone and acoustic microphone signals. These mismatches occur due to several factors:
- Variations in speakers' larynx and oral structures
- Differences in phoneme production locations
- Distance between the acoustic microphone and speaker's lips
Technical Details
The "Technical Validation" section of our paper [Link to paper] describes the data mismatch analysis and correction methods in detail, including:
- Mismatch variation analysis
- Impact of microphone distance
- Speaker-dependent variations
- Phoneme-dependent timing differences
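As a rough illustration of what correcting a timing mismatch between two channels can involve, the sketch below estimates a single constant lag by cross-correlation and then shifts one signal to compensate. This is a generic technique shown under stated assumptions, not necessarily the correction method used for TAPS (the paper describes that). The function names `estimate_lag` and `shift_signal` are hypothetical, and the example assumes both signals share the same sample rate and length.

```python
import numpy as np


def estimate_lag(reference, signal):
    """Return the delay (in samples) of `signal` relative to `reference`.

    A positive value means `signal` arrives later than `reference`.
    Assumption: a single constant lag describes the whole recording.
    """
    # Full cross-correlation; index i corresponds to lag i - (len(reference) - 1).
    xcorr = np.correlate(signal, reference, mode="full")
    return int(np.argmax(xcorr)) - (len(reference) - 1)


def shift_signal(signal, lag):
    """Advance `signal` by `lag` samples (delay it if `lag` is negative).

    The output keeps the input length; vacated samples are zero-filled.
    Assumes abs(lag) is smaller than the signal length.
    """
    out = np.zeros_like(signal)
    n = len(signal)
    if lag > 0:
        out[: n - lag] = signal[lag:]
    elif lag < 0:
        out[-lag:] = signal[: n + lag]
    else:
        out[:] = signal
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acoustic = rng.standard_normal(1000)
    # Simulate a second channel that is delayed by 25 samples.
    throat = np.concatenate([np.zeros(25), acoustic])[:1000]

    lag = estimate_lag(acoustic, throat)
    aligned = shift_signal(throat, lag)
    print("estimated lag in samples:", lag)
```

`np.correlate` computes the correlation directly, which becomes slow for long recordings; an FFT-based correlation such as `scipy.signal.correlate` is a common substitute in that case. A single global lag also cannot capture the speaker-dependent and phoneme-dependent variations listed above.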
Background Noise Reduction
To ensure high-quality reference signals, we applied noise reduction to the acoustic microphone recordings using the Demucs speech enhancement model.
Process Details
- Used the pretrained causal version of Demucs
- Applied to acoustic microphone signals only
- Preserved original signal characteristics
- Minimal impact on speech content
Results
- Reduced background noise
- Enhanced signal clarity
- Improved reference quality
- Maintained natural speech characteristics
Note
The noise reduction was applied only to minimize minor background noise in the acoustic microphone recordings. The throat microphone signals were preserved in their original form to maintain the authenticity of the dataset.
Complete Processing Pipeline
- High-pass Filtering: Applied 5th-order Butterworth high-pass filter with 50 Hz cut-off frequency to reduce gravitational acceleration effects
- Timing Alignment: Corrected timing differences between throat and acoustic microphone signals
- Noise Reduction: Applied Demucs-based noise reduction to acoustic microphone recordings
- Trimming: Removed silent segments at the beginning and end of recordings
- Quality Verification: Manual review of evaluation set utterances for accuracy
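The high-pass filtering and trimming steps can be sketched as follows. The filter order (5) and cut-off (50 Hz) come from the description above; everything else is an assumption made for illustration: the function names `highpass` and `trim_silence` are hypothetical, the filter is applied in a single causal pass, and silence is detected with a simple frame-energy threshold (`top_db`, `frame_ms`) that is not taken from the TAPS pipeline.

```python
import numpy as np
from scipy.signal import butter, sosfilt


def highpass(signal, sample_rate, cutoff_hz=50.0, order=5):
    """5th-order Butterworth high-pass filter with a 50 Hz cut-off.

    Assumption: a single causal pass (sosfilt). A zero-phase variant would
    use scipy.signal.sosfiltfilt, which doubles the effective order.
    """
    sos = butter(order, cutoff_hz, btype="highpass", fs=sample_rate, output="sos")
    return sosfilt(sos, signal)


def trim_silence(signal, sample_rate, top_db=40.0, frame_ms=25.0):
    """Drop leading and trailing frames whose RMS is more than `top_db`
    below the loudest frame. Threshold and frame size are illustrative.

    A trailing partial frame (shorter than one frame) is discarded.
    """
    frame = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = len(signal) // frame
    if n_frames == 0:
        return signal
    frames = signal[: n_frames * frame].reshape(n_frames, frame)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    peak = rms.max()
    if peak == 0:
        return signal[:0]  # the whole recording is silent
    active = np.flatnonzero(rms >= peak * 10 ** (-top_db / 20))
    start = active[0] * frame
    end = (active[-1] + 1) * frame
    return signal[start:end]


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    # A 200 Hz tone riding on a constant offset, standing in for a
    # low-frequency component such as gravitational acceleration.
    recording = np.sin(2 * np.pi * 200 * t) + 1.0
    filtered = highpass(recording, sr)
    padded = np.concatenate([np.zeros(sr // 2), filtered, np.zeros(sr // 2)])
    trimmed = trim_silence(padded, sr)
    print(len(padded), "->", len(trimmed), "samples after trimming")
```

Second-order sections (`output="sos"`) are used because they tend to be numerically better behaved than transfer-function coefficients when the cut-off is low relative to the sample rate.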