American English Full-Duplex Conversational Dataset

Two-speaker American English conversations captured in full-duplex stereo, covering everyday topics with overlapping speech, backchannels, and natural disfluencies preserved.

Overview

Naturalistic, two-speaker American English conversations captured at studio quality in full-duplex stereo. Pairs of native American English speakers from across the continental United States, with balanced coverage of West Coast, Midwest, Southern, and Northeast accents, discuss everyday topics for the full duration of the session, with no read scripts and no scene cuts. Each recording preserves real overlapping speech, backchannels, hesitations, and code-switching, so downstream models train on the way American English actually sounds in the wild. Every clip is collected from paid contributors with explicit consent, scene-level provenance, and metadata for speaker demographics, dialect, and acoustic environment.

Key highlights

  • 01

    West Coast, Midwest, Southern, and Northeast accents balanced across paired sessions, with explicit dialect tags per speaker.

  • 02

    Casual American discourse markers ("like", "you know", "I mean"), filled pauses, and false starts preserved verbatim rather than normalised away.

  • 03

    Sports references, pop culture, work culture, and political discourse drawn from real two-speaker conversations rather than read scripts.

  • 04

    Disfluencies (filled pauses such as uh, um, and hmm; false starts; self-repairs; hesitations) and non-speech events (laughter, sighs, breaths, throat clears) carry utterance-level timestamps, so models can learn from them as a first-class signal or filter them out.
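As a rough illustration of the "filter them out" option, the sketch below drops filled pauses from a verbatim transcript string. It assumes transcripts arrive as plain text with fillers written as ordinary words; the actual delivery format is not specified here, and the function name `strip_filled_pauses` and the filler list are hypothetical.

```python
import string

# Hypothetical filler inventory, taken from the examples named above.
FILLED_PAUSES = {"uh", "um", "hmm"}

def strip_filled_pauses(transcript: str) -> str:
    """Remove filled-pause tokens from a whitespace-tokenised transcript.

    A token counts as a filler when, after trimming surrounding
    punctuation and lowercasing, it appears in FILLED_PAUSES.
    """
    kept = []
    for token in transcript.split():
        core = token.strip(string.punctuation).lower()
        if core in FILLED_PAUSES:
            continue  # drop the filler together with its attached punctuation
        kept.append(token)
    return " ".join(kept)

cleaned = strip_filled_pauses("Um, I mean, uh, we could go")
```

Discourse markers such as "I mean" are left untouched here; whether to keep them depends on the downstream task.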

Technical specifications

Coverage

Hundreds of paired sessions from native American English speakers across the United States. Additional dialects, age groups, and topical targets can be collected on request.

Capture specs

Stereo full-duplex audio at 48 kHz / 24-bit per channel from studio-grade microphones, with per-speaker channel isolation, calibrated noise floor, and continuous capture for the full duration of each session rather than cherry-picked moments.
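To make these figures concrete, the sketch below computes the uncompressed data rate implied by 48 kHz / 24-bit stereo and separates an interleaved buffer into one sample list per speaker. It assumes delivery as uncompressed little-endian PCM with one speaker per channel; the container format is not stated here, and the helper names are hypothetical.

```python
SAMPLE_RATE_HZ = 48_000   # from the capture specs
BYTES_PER_SAMPLE = 3      # 24-bit samples
CHANNELS = 2              # one channel per speaker (assumed layout)

def pcm_bytes_per_second(rate=SAMPLE_RATE_HZ, width=BYTES_PER_SAMPLE,
                         channels=CHANNELS) -> int:
    """Uncompressed PCM data rate in bytes per second."""
    return rate * width * channels

def split_stereo_24bit(frames: bytes):
    """De-interleave little-endian 24-bit stereo PCM into two lists of ints."""
    frame_size = BYTES_PER_SAMPLE * CHANNELS
    if len(frames) % frame_size:
        raise ValueError("buffer does not contain whole stereo frames")
    left, right = [], []
    for i in range(0, len(frames), frame_size):
        left.append(int.from_bytes(frames[i:i + 3], "little", signed=True))
        right.append(int.from_bytes(frames[i + 3:i + 6], "little", signed=True))
    return left, right

bytes_per_hour = pcm_bytes_per_second() * 3600  # 1_036_800_000, about 1.04 GB
```

If the audio does ship as WAV, the interleaved bytes returned by `readframes` in Python's standard `wave` module could be passed to `split_stereo_24bit` directly.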

Annotations

Every session ships with speaker metadata (age, gender, region, dialect, native language, acoustic environment) plus an utterance-level annotation layer:

  • Emotion tags: joy, frustration, neutral, surprise, sadness, anger, amusement, empathy, and more
  • Topic tags spanning everyday domains: work, family, sports, travel, health, finance, technology, food, pop culture, politics, education
  • Intent labels: question, agreement, backchannel, hedge, interruption, repair, opinion
  • Turn-taking markers: overlap onset/offset, gap, hold, yield
  • Prosody cues: pitch contour, stress, laughter, sighs, hesitation, code-switch boundaries

Custom annotation schemas (domain-specific intents, fine-grained emotion taxonomies, named-entity spans, sentiment scoring, or any task-specific labels) are available on request.
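One way to see what the turn-taking markers describe: overlap onset and offset can be derived from per-speaker utterance timestamps. The sketch below is a minimal illustration under an assumed record layout. The `Utterance` class and its field names are hypothetical and do not reflect the dataset's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Utterance:
    speaker: str      # hypothetical field names, for illustration only
    start: float      # seconds from session start
    end: float

def overlap_spans(a_utts, b_utts):
    """Return sorted (onset, offset) pairs where both speakers talk at once."""
    spans = []
    for a in a_utts:
        for b in b_utts:
            onset = max(a.start, b.start)
            offset = min(a.end, b.end)
            if onset < offset:  # strictly positive overlap only
                spans.append((onset, offset))
    return sorted(spans)

speaker_a = [Utterance("A", 0.0, 2.0), Utterance("A", 3.0, 5.0)]
speaker_b = [Utterance("B", 1.5, 3.5)]
spans = overlap_spans(speaker_a, speaker_b)  # [(1.5, 2.0), (3.0, 3.5)]
```

The pairwise loop is quadratic in the number of utterances, which is acceptable for a single session; a sweep over sorted timestamps would scale better for bulk processing.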

Use cases

  • Full-duplex conversational AI training and evaluation
  • Speaker diarization and American English ASR / TTS modelling
  • Turn-taking, backchannel, and overlap-handling research
  • Emotion-aware and intent-aware voice agent fine-tuning
  • Voice agent benchmarks for natural, two-party conversation

Request samples

Share your use case and we'll send sample clips, pricing, and recommended next steps for your pipeline.
