Overview

The recording environment for the TAPS dataset was designed to capture high-quality, synchronized audio from a throat microphone and an acoustic microphone at the same time.

Figure: TAPS recording setup, showing the throat microphone and acoustic microphone configuration.

Hardware Configuration

Each recording session utilized the following hardware components:

  • Throat Microphone: Accelerometer-based contact microphone attached to the participant's throat area to capture vocal cord vibrations directly.
  • Acoustic Microphone: High-quality condenser microphone positioned at a consistent distance from the participant's mouth to record standard speech audio.
  • Audio Interface: Digital interface for simultaneous capture of both microphone inputs, ensuring synchronized recording.
  • Pop Filter: Used with the acoustic microphone to reduce plosive sounds and breath noise.
  • Monitoring Headphones: Closed-back headphones that let participants verify recording quality in real time.
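Because the audio interface captures both microphones simultaneously, the two channels of a recorded pair should already be aligned. One way to sanity-check this is to find the sample offset that maximizes the normalized cross-correlation between the channels. The sketch below is a minimal illustration under stated assumptions: `estimate_lag` is a hypothetical helper, not part of any published TAPS tooling, and both signals are assumed to share one sample rate. Since the throat microphone picks up a band-limited, differently colored version of speech, correlating amplitude envelopes instead of raw samples may give a clearer peak in practice.

```python
from typing import Sequence


def estimate_lag(reference: Sequence[float], other: Sequence[float], max_lag: int) -> int:
    """Return the shift (in samples) of `other` relative to `reference`
    that maximizes their normalized cross-correlation.

    A positive result means `other` lags behind `reference`.
    Hypothetical helper for illustration only.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Pair reference[n] with other[n + lag] over the overlapping region only.
        start = max(0, -lag)
        stop = min(len(reference), len(other) - lag)
        if stop <= start:
            continue
        num = sum(reference[n] * other[n + lag] for n in range(start, stop))
        e_ref = sum(reference[n] ** 2 for n in range(start, stop))
        e_oth = sum(other[n + lag] ** 2 for n in range(start, stop))
        if e_ref == 0 or e_oth == 0:
            continue  # no energy in the overlap, correlation undefined
        score = num / (e_ref * e_oth) ** 0.5
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Usage: a short pulse and a copy of it delayed by three samples.
ref = [0.0] * 50
ref[10], ref[11], ref[12] = 1.0, 0.5, -0.3
delayed = [0.0] * 3 + ref[:-3]
print(estimate_lag(ref, delayed, max_lag=5))  # 3
```

This brute-force loop costs O(N × max_lag), which is fine for short excerpts; for full-length recordings an FFT-based correlation would be much faster.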

Environment Control

To ensure optimal recording quality, the following environmental conditions were maintained:

  • Sound-treated Room: All recordings were conducted in a sound-treated environment with minimal ambient noise and echo.
  • Consistent Positioning: Fixed distances and angles between each participant and the microphones were maintained across all recording sessions.
  • Temperature and Humidity Control: The recording environment was kept at a comfortable temperature and humidity level to avoid affecting vocal performance.
  • Background Noise Monitoring: Ambient noise levels were monitored continuously to ensure they stayed below an acceptable threshold.
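The noise check can be expressed as an RMS level in dBFS compared against a limit. The source does not state the threshold that was used, so `NOISE_THRESHOLD_DBFS = -60.0` below is a placeholder, and the helper names are hypothetical. Samples are assumed to be 16-bit signed integers, matching the bit depth listed under Technical Specifications.

```python
import math
from typing import Sequence

# Placeholder limit: the TAPS description does not state the threshold used.
NOISE_THRESHOLD_DBFS = -60.0


def rms_dbfs(samples: Sequence[int], bit_depth: int = 16) -> float:
    """RMS level of signed integer PCM samples, in dB relative to the
    largest representable magnitude, 2 ** (bit_depth - 1)."""
    if len(samples) == 0:
        return float("-inf")
    full_scale = 2 ** (bit_depth - 1)  # 32768 for 16-bit audio
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(math.sqrt(mean_square) / full_scale)


def ambient_noise_ok(samples: Sequence[int],
                     threshold_dbfs: float = NOISE_THRESHOLD_DBFS) -> bool:
    """True if a room-tone excerpt is quieter than the threshold."""
    return rms_dbfs(samples) < threshold_dbfs


# Usage: a low-level room-tone excerpt, in 16-bit sample units.
room_tone = [3, -2, 4, -3, 2, -4] * 100
print(round(rms_dbfs(room_tone), 1), ambient_noise_ok(room_tone))
```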

Recording Protocol

Each recording session followed a structured protocol:

  1. Equipment Calibration: All recording equipment was calibrated at the beginning of each session.
  2. Microphone Placement: The throat microphone was carefully attached to the participant's throat, and the acoustic microphone was positioned at the standard distance used across all sessions.
  3. Level Setting: Input levels were adjusted for each participant to prevent clipping or excessive noise.
  4. Practice Readings: Participants were given time to familiarize themselves with the recording setup and material.
  5. Recording Session: Participants read the provided script at a natural pace with regular breaks between segments.
  6. Quality Verification: Each recording was reviewed for quality before proceeding to the next segment.
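Level setting (step 3) and quality verification (step 6) both come down to inspecting sample values. A minimal sketch, assuming 16-bit integer samples: treat any sample at the most positive or most negative code as clipping, then compare the peak level against a headroom ceiling and a floor. The thresholds (-3 dBFS and -30 dBFS) and the function names are illustrative assumptions, not values from the TAPS protocol.

```python
import math
from typing import Sequence

# Illustrative thresholds only; the TAPS protocol does not publish its targets.
PEAK_CEILING_DBFS = -3.0   # keep some headroom below full scale
PEAK_FLOOR_DBFS = -30.0    # quieter takes are likely to be dominated by noise


def count_clipped(samples: Sequence[int], bit_depth: int = 16) -> int:
    """Count samples sitting at the most positive or most negative code."""
    max_code = 2 ** (bit_depth - 1) - 1
    min_code = -(2 ** (bit_depth - 1))
    return sum(1 for s in samples if s >= max_code or s <= min_code)


def peak_dbfs(samples: Sequence[int], bit_depth: int = 16) -> float:
    """Peak absolute sample value in dB relative to full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")
    return 20 * math.log10(peak / 2 ** (bit_depth - 1))


def check_levels(samples: Sequence[int]) -> str:
    """Classify a 16-bit take as 'clipping', 'too hot', 'too quiet', or 'ok'."""
    if count_clipped(samples) > 0:
        return "clipping"
    peak = peak_dbfs(samples)
    if peak > PEAK_CEILING_DBFS:
        return "too hot"
    if peak < PEAK_FLOOR_DBFS:
        return "too quiet"
    return "ok"


print(check_levels([12000, -18000, 9000]))  # ok
```

Checking clipping before the peak level matters: a clipped take can still show a plausible peak reading, so it must be rejected outright.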

Technical Specifications

The recordings maintained the following technical specifications:

  • Sampling Rate: Acoustic microphone recordings at 16 kHz, throat microphone initially at 8 kHz (later upsampled).
  • Bit Depth: 16-bit for all recordings.
  • File Format: Uncompressed WAV format to preserve audio quality.
  • Channel Configuration: Mono recordings for each microphone type.
  • Storage: Original recordings stored with redundant backups to prevent data loss.
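A recorded file can be checked against these specifications with Python's standard `wave` module, which parses uncompressed PCM WAV headers. This is a sketch: `check_wav_spec` is a hypothetical helper, and the expected sample rate must be chosen per file (16 kHz for acoustic audio; 8 kHz for throat audio before upsampling). Files the module cannot parse as PCM WAV make `wave.open` raise `wave.Error`, which the helper reports as a format problem.

```python
import os
import tempfile
import wave
from typing import List

# Targets taken from the Technical Specifications list.
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit
EXPECTED_CHANNELS = 1            # mono


def check_wav_spec(path: str, expected_rate: int) -> List[str]:
    """Return a list of spec violations for one WAV file (empty if it conforms)."""
    try:
        with wave.open(path, "rb") as wf:
            width = wf.getsampwidth()
            channels = wf.getnchannels()
            rate = wf.getframerate()
    except wave.Error as exc:
        # The wave module only parses uncompressed PCM data.
        return [f"not readable as uncompressed PCM WAV: {exc}"]

    problems = []
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"bit depth is {8 * width}, expected 16")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    return problems


# Usage: write a short silent 16 kHz mono file and validate it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_acoustic.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 1600)  # 0.1 s of silence

print(check_wav_spec(demo_path, expected_rate=16000))  # []
```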

Additional Notes

The recording setup was designed to minimize variability between sessions while accommodating individual differences in participant physiology. Throat microphone placement received particular attention because it strongly affects recording quality and consistency. All equipment was selected to meet the project's central requirement: synchronized, paired throat and acoustic recordings suitable for deep learning applications.