Software Architecture
An overview of the software infrastructure, data processing pipelines, and utilities used to build the TAPS dataset.
Data Processing Pipeline
Signal Processing
- 5th-order Butterworth high-pass filter (50 Hz cutoff)
- DC offset removal
- Signal normalization
- Timing synchronization between devices
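The first three steps can be sketched as follows. This is a minimal illustration, not the dataset's actual code. It assumes SciPy is available, that DC removal runs before filtering, that the filter is applied causally, and that "normalization" means peak normalization. The source specifies none of these, and the function name `preprocess` is hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfilt


def preprocess(signal, sample_rate, cutoff_hz=50.0, order=5):
    """Remove DC offset, high-pass filter, and peak-normalize a mono signal."""
    x = np.asarray(signal, dtype=np.float64)

    # DC offset removal: subtract the mean of the whole recording.
    x = x - np.mean(x)

    # 5th-order Butterworth high-pass with a 50 Hz cutoff.
    # Second-order sections are used for numerical stability.
    sos = butter(order, cutoff_hz, btype="highpass", fs=sample_rate, output="sos")
    x = sosfilt(sos, x)  # causal filtering (assumption)

    # Peak normalization to [-1, 1] (assumption; the source only says "normalization").
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x
```

A call such as `preprocess(samples, 16000)` would handle an acoustic-microphone recording, and `preprocess(samples, 8000)` a throat-microphone recording.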
Data Formats
- WAV files (32-bit)
- Throat mic: 8 kHz sampling rate
- Acoustic mic: 16 kHz sampling rate
- Standard file naming convention
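A file can be checked against these format properties with Python's standard `wave` module, as in the sketch below. It assumes the files are 32-bit integer PCM. If they are 32-bit float, `wave` may refuse to open them and another reader would be needed. The `check_wav` helper and the `"throat"` / `"acoustic"` keys are hypothetical names.

```python
import wave

# Sampling rates from the list above; the dictionary keys are hypothetical labels.
EXPECTED_RATE_HZ = {"throat": 8000, "acoustic": 16000}


def check_wav(path, channel):
    """Return (ok, details) for a WAV file, assuming 32-bit integer PCM."""
    with wave.open(str(path), "rb") as wf:
        details = {
            "sample_rate": wf.getframerate(),
            "bits": wf.getsampwidth() * 8,  # sample width is reported in bytes
            "channels": wf.getnchannels(),
            "frames": wf.getnframes(),
        }
    ok = (
        details["bits"] == 32
        and details["sample_rate"] == EXPECTED_RATE_HZ[channel]
    )
    return ok, details
```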
Software Tools
Data Collection Software
Custom-built software for simultaneous recording from throat and acoustic microphones.
Features
- Real-time monitoring
- Automatic file naming and organization
- Recording quality validation
- Session management
Data Processing Tools
Suite of tools for data preprocessing, validation, and analysis.
Capabilities
- Automatic signal alignment
- Noise reduction
- Quality metrics calculation
- Batch processing support
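Signal alignment of this kind is commonly done by cross-correlation, and the sketch below illustrates that approach. It is not necessarily the method these tools use. It assumes both signals have already been resampled to a common rate, since the throat and acoustic channels are recorded at different rates. The names `estimate_lag` and `align` are hypothetical.

```python
import numpy as np


def estimate_lag(reference, other):
    """Return the number of samples by which `other` trails `reference`."""
    ref = np.asarray(reference, dtype=np.float64)
    oth = np.asarray(other, dtype=np.float64)
    # Full cross-correlation; index 0 corresponds to a lag of -(len(ref) - 1).
    corr = np.correlate(oth - oth.mean(), ref - ref.mean(), mode="full")
    return int(np.argmax(corr)) - (len(ref) - 1)


def align(reference, other):
    """Trim both signals so that they start at the same instant."""
    lag = estimate_lag(reference, other)
    if lag > 0:
        other = other[lag:]          # `other` starts late: drop its leading samples
    elif lag < 0:
        reference = reference[-lag:]  # `reference` starts late
    n = min(len(reference), len(other))
    return reference[:n], other[:n], lag
```

For long recordings, an FFT-based correlation would be faster than `np.correlate`, which runs in quadratic time.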
Data Structure
File Organization
dataset/
├── train/
│   ├── speaker_001/
│   │   ├── p01_u00_mic.wav
│   │   ├── p01_u00_acc.wav
│   │   └── ...
│   └── ...
├── eval/
│   ├── speaker_051/
│   │   ├── p51_u00_mic.wav
│   │   ├── p51_u00_acc.wav
│   │   └── ...
│   └── ...
└── metadata/
    ├── transcripts.txt
    └── speaker_info.json
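Given this layout, the two recordings of each utterance can be paired by file name, as in the sketch below. The reading of the name fields is an assumption inferred from the example names: `p` followed by a speaker number, `u` followed by an utterance number, and a `mic` or `acc` suffix for the recording channel. The source does not say which suffix corresponds to which microphone, and `pair_recordings` is a hypothetical helper.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed pattern for names like "p01_u00_mic.wav" (inferred, not documented).
NAME_RE = re.compile(
    r"^p(?P<speaker>\d+)_u(?P<utterance>\d+)_(?P<channel>mic|acc)\.wav$"
)


def pair_recordings(split_dir):
    """Map (speaker, utterance) -> {channel: path} for one split directory."""
    pairs = defaultdict(dict)
    for path in sorted(Path(split_dir).glob("speaker_*/*.wav")):
        match = NAME_RE.match(path.name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        key = (int(match["speaker"]), int(match["utterance"]))
        pairs[key][match["channel"]] = path
    return dict(pairs)
```

Calling `pair_recordings("dataset/train")` would return one entry per utterance. An entry that lacks either the `mic` or the `acc` key points to an incomplete pair.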