Rethinking classification results based on read speech, or: why improvements do not always transfer to other speaking styles

Barbara Schuppler

doi:10.1007/s10772-017-9436-y

Barbara Schuppler*

*Corresponding author for this work

Institute of Signal Processing and Speech Communication (4420)

Research output: Contribution to journal › Article › peer-review

Abstract

With the growing interest among speech scientists in working with natural conversations, the popularity of articulatory–acoustic features (AFs) as the basic unit has also increased. They have been shown to be more suitable than purely phone-based approaches. Even though the motivation for AF classification is driven by the properties of conversational speech, most new methods continue to be developed on read speech corpora (e.g., TIMIT). In this paper, we show in two studies that the improvements obtained on read speech do not always transfer to conversational speech. The first study compares four different variants of acoustic parameters for AF classification of both read and conversational speech using support vector machines. Our experiments show that the proposed set of acoustic parameters substantially improves AF classification for read speech, but only marginally for conversational speech. The second study investigates whether labeling inaccuracies can be compensated for by a data selection approach. Again, although a substantial improvement was found with the data selection approach for read speech, this was not the case for conversational speech. Overall, these results suggest that we cannot continue to develop methods for one speech style and expect the improvements to transfer to other styles. Instead, the nature of the application data (here: read vs. conversational) should be taken into account already when defining the basic assumptions of a method (here: segmentation into phones), and not only when applying the method to the application data.

Original language	English
Pages (from-to)	699-713
Number of pages	15
Journal	International Journal of Speech Technology
Volume	20
Issue number	3
DOIs	https://doi.org/10.1007/s10772-017-9436-y
Publication status	Published - 1 Sept 2017

Keywords

Articulatory–acoustic features
Conversational speech
Pronunciation variability
Segments

ASJC Scopus subject areas

Software
Language and Linguistics
Human-Computer Interaction
Linguistics and Language
Computer Vision and Pattern Recognition

Access to Document

10.1007/s10772-017-9436-y

Cite this

@article{b00b6da6e1f04da2aed5dd7c7ab18b47,
title = "Rethinking classification results based on read speech, or: why improvements do not always transfer to other speaking styles",
abstract = "With the growing interest among speech scientists in working with natural conversations, the popularity of articulatory–acoustic features (AFs) as the basic unit has also increased. They have been shown to be more suitable than purely phone-based approaches. Even though the motivation for AF classification is driven by the properties of conversational speech, most new methods continue to be developed on read speech corpora (e.g., TIMIT). In this paper, we show in two studies that the improvements obtained on read speech do not always transfer to conversational speech. The first study compares four different variants of acoustic parameters for AF classification of both read and conversational speech using support vector machines. Our experiments show that the proposed set of acoustic parameters substantially improves AF classification for read speech, but only marginally for conversational speech. The second study investigates whether labeling inaccuracies can be compensated for by a data selection approach. Again, although a substantial improvement was found with the data selection approach for read speech, this was not the case for conversational speech. Overall, these results suggest that we cannot continue to develop methods for one speech style and expect the improvements to transfer to other styles. Instead, the nature of the application data (here: read vs. conversational) should be taken into account already when defining the basic assumptions of a method (here: segmentation into phones), and not only when applying the method to the application data.",
keywords = "Articulatory–acoustic features, Conversational speech, Pronunciation variability, Segments",
author = "Barbara Schuppler",
year = "2017",
month = sep,
day = "1",
doi = "10.1007/s10772-017-9436-y",
language = "English",
volume = "20",
pages = "699--713",
journal = "International Journal of Speech Technology",
issn = "1381-2416",
publisher = "Springer Science+Business Media B.V.",
number = "3",
}

TY  - JOUR
T1  - Rethinking classification results based on read speech, or: why improvements do not always transfer to other speaking styles
AU  - Schuppler, Barbara
PY  - 2017/9/1
Y1  - 2017/9/1
AB  - With the growing interest among speech scientists in working with natural conversations, the popularity of articulatory–acoustic features (AFs) as the basic unit has also increased. They have been shown to be more suitable than purely phone-based approaches. Even though the motivation for AF classification is driven by the properties of conversational speech, most new methods continue to be developed on read speech corpora (e.g., TIMIT). In this paper, we show in two studies that the improvements obtained on read speech do not always transfer to conversational speech. The first study compares four different variants of acoustic parameters for AF classification of both read and conversational speech using support vector machines. Our experiments show that the proposed set of acoustic parameters substantially improves AF classification for read speech, but only marginally for conversational speech. The second study investigates whether labeling inaccuracies can be compensated for by a data selection approach. Again, although a substantial improvement was found with the data selection approach for read speech, this was not the case for conversational speech. Overall, these results suggest that we cannot continue to develop methods for one speech style and expect the improvements to transfer to other styles. Instead, the nature of the application data (here: read vs. conversational) should be taken into account already when defining the basic assumptions of a method (here: segmentation into phones), and not only when applying the method to the application data.
KW  - Articulatory–acoustic features
KW  - Conversational speech
KW  - Pronunciation variability
KW  - Segments
UR  - http://www.scopus.com/inward/record.url?scp=85024483236&partnerID=8YFLogxK
U2  - 10.1007/s10772-017-9436-y
DO  - 10.1007/s10772-017-9436-y
M3  - Article
AN  - SCOPUS:85024483236
SN  - 1381-2416
VL  - 20
SP  - 699
EP  - 713
JO  - International Journal of Speech Technology
JF  - International Journal of Speech Technology
IS  - 3
ER  - 

Projects

CLCS - Cross-layer pronunciation modeling for conversational speech/Cross-layer Aussprachemodelle für Spontansprache (FWF Hertha Firnberg Program T572)
