XIII. SPEECH ANALYSIS Prof. M. Halle A. Cohen

advertisement
XIII.
SPEECH ANALYSIS
Prof. M. Halle
A. Cohen
G. W. Hughes
J.-P.
A. Radley
A.
INVESTIGATIONS
1.
Spectral Measurements
OF FRICATIVE CONSONANTS
The aim of the work reported here was to establish criteria by means of which it
would be possible to decide whether a fricative consonant belonged to one of the following
classes:
/f/
or /v/; /s/
or /z/; /f/
or /3 /.
(When talking of these classes in what
follows, we shall designate them by the voiceless member; thus /f/
will mean both /f/
and /v/. )
Available evidence indicates that the desired information is contained in the spectrum
of the sound and that relatively little is contributed by the behavior of the formants in the
adjacent vowels.
(See the experiment reported below.) The measurements were made
on 50-msec gated central portions of fricative
consonants in about 200 English words
spoken by four subjects (2 females and 2 males).
For purposes of filtering we used the set of low- and highpass filters with the very
steep skirts which was described in the Quarterly Progress Report of April 15, 1955.
The particular bands to be measured were selected after an examination of a large
number of detailed energy spectra of fricative
consonants.
To eliminate the voicing
component, all measurements were made with a 720-cps highpass filter.
measurements were actually made.
1.
The difference (in
2.
The difference (in
The following
db) between the total energy in the sound (720 cps to 10, 000
cps) and that in the band from 4200 cps to 10, 000 cps.
db) between the energy in the band from 720 cps to 6500 cps
and the energy in the band from 720 cps to 2170 cps.
3.
The frequency location of the maximum in the spectra between 1500 cps and 4000
4.
The difference
cps.
(in
db) between the energy in a 500-cps band centered about the
maximum (located in measurement 3) and that in the band from 720 cps to 1370 cps.
Measurement
1 indicates the presence or absence of a peak above 4 kc/sec.
the exception of samples from male speaker E,
Several /s/
all /s/
had peaks above 4 kc/sec.
of speaker E had peaks in the region between 3 and 4 kc/sec.
peaks in that region.
Some /f/
also had peaks above 4 kc/sec.
it was large, this would indicate that the sound was either /f/
either /s/
or /f/.
Small values for measurement
66-
or
No
/j/
had
Thus, if measurement
1 was small, this could be taken as an indication that the sound was either /s/
Sounds giving small values for measurement
With
or /f/;
if
/f/.
1 could be uniquely determined as being
2, reflecting a concentration of energy
(XIII.
SPEECH ANALYSIS)
in the frequency region below 2 kc/sec, were always /f/; large values, /s/.
Sounds giving a large value for measurement 1 could be uniquely determined as being
Large positive values for measurement 4, reflecting the presence of
either /f/ or /f /.
a strong maximum in the region between 1500 cps and 4000 cps, were always produced
by /f /; small positive or negative values were produced by /f/.
The difficulty with speaker E resulted from the fact that most of his spectra for /s/
were shifted downward by about 1 kc/sec (if compared with /s/ of other speakers), often
placing them in the region between 3 kc/sec and 4 kc/sec. If this fact is taken into consideration and measurement 1 is adjusted to compare the total energy in the sound with
that in the region above 3 kc/sec (instead of 4 kc/sec as for the remaining speakers),
then the data can be made to fit all the other criteria. That his speech is somewhat
at variance with that of the other speakers can be seen in the results of the perceptual
tests discussed in the following section.
M. Halle,
B.
PERCEPTUAL TEST
1.
Purposes and Procedure
G. W. Hughes
An experiment was set up to determine the identifiability of gated portions of voiceless fricatives and to establish correlations between certain spectral properties of the
sound presented and its identification by a group of listeners.
msec long, of 46 /s/,
32 /f/, and 22
/f/,
Gated portions, each 50
taken from isolated words spoken by 1 female
and 3 male speakers were recorded twice in random order on a test tape. This tape,
containing 200 samples in all, was played through a sound system having a reasonably
flat response up to 10 kc/sec and presented to a group of 10 listeners under conditions
of high S/N ratio, with instructions to identify every sample as one of the three fricatives /f/,
/s/, or
/f /.
The samples were spaced at 6-sec intervals with a 45-sec pause
after each group of 25.
2.
Learning
Since each stimulus was presented twice, the change in agreement among listeners
from the first presentation to the second presentation was taken as a rough indication of
the learning that had gone on in the interim.
In 51 out of 100 cases there was an increase
in agreement; in 23 cases there was a decrease in agreement, while in 26 cases there
was no change; thus we conclude that a certain amount of learning took place.
3.
Identification
The following table shows the degree of agreement between the judgments of the
listeners and the intention of the speaker.
-67-
(XIII.
SPEECH ANALYSIS)
In 26 cases
10 (all) listeners agreed with speaker
22
9
41
8
30
7
37
6
156
6 or more
In the remaining 44 instances there were 16 cases in which 6 or more listeners agreed
on the identity of a sound, but disagreed with the speaker.
We conclude from this that it is possible to obtain correct identifications of fricative
consonants in isolation.
4. Speakers
Table XIII-1 presents the responses of the listeners to the different phonemes uttered
by the speakers.
Table XIII-1
Stimuli
Speaker E
f
f
s
107
23
13
17
143
156
280
s
Responses
Total
Speaker T
Speaker R
Speaker H
f
f
s
112
26
22
7
13
111
11
71
75
160
100
200
f
f
s
184
52
22
6
26
97
14
61
10
180
80
220
Table XIII-i shows that with the exception of /s/
f
f
175
27
13
29
29
109
8
23
89
16
4
99
160
140
220
140
120
uttered by speaker E the great major-
ity of the judgments agreed with the speakers as to the identity of the phonemes.
The
reasons for the deviations in the case of speaker E are advanced in the first part of
this report.
/s/
this report).
is
5.
Identical reasons hold for the relatively high number of
/ f/
judgments for
uttered by speaker R (who was not among the speakers studied in the first part of
The high number of /s/
judgments for /f/
uttered by speaker T (female)
due to the presence of peaks in the region above 4 kc/sec in many of T's /f/.
Physical Properties of Stimuli
In this section we shall discuss the listeners'
properties of the stimuli as' represented
responses in terms of the physical
by the four measurements described in the
first part of this report.
-68-
(XIII.
a.
SPEECH ANALYSIS)
Measurement 1
Responses
Value of measurement 1
/f
/
Total
/f/
/s/
0
102
216
22
340
0.1-1.9
169
343
88
600
2.0-3.9
195
85
100
380
4.0-5.9
67
52
161
280
6.0-7.9
40
45
115
200
>, 8.0
28
42
130
200
It appears from this table that the percentage of
increase in measurement
/f/
responses increases with an
1, whereas the percentage of /s/
responses decreases.
The
responses increases initially, and after reaching a maximum value
The interpretation of the data is quite simple if
between 2. 0 and 3.9, decreases.
we recall that low values of measurement 1 show a concentration of energy in frequencies above 4 kc/sec, while high values show a concentration of energy below
percentage of /f/
4 kc/sec.
b.
Measurement 2
Responses
Value of measurement 2
< 5
/s/
/f//
Total
401
134
185
720
10
102
59
79
240
15
17
87
176
280
> 15 < 20
26
195
139
360
> 20
46
314
40
400
>55
> 10,
Predominance of energy in the low-frequency
measurement
/f/
2; /f/
region is indicated by small values of
was the most frequent response for stimuli where the energy was
concentrated in the area below 2000 cps.
For high values of measurement 2, where,
in other words, there was a great amount of energy in the high-frequency region, the
most frequent response was /s/.
For intermediate values of measurement 2,
the most frequent response.
-69-
///
was
(XIII.
c.
SPEECH ANALYSIS)
Measurement
3
Peak frequency
Responses
/f/
/f/
/s/
Total
> 1000 < 1500
180
44
56
280
> 1500 < 2000
35
19
46
100
> 2000 < 3000
19
30
151
200
> 3000 < 4000
38
117
225
380
As was to be expected from measurement 2, peak frequencies in the low region would
lead to frequent /f/
responses.
responses and peaks in intermediate
The absence of a large number of /s/
regions would lead to /f/
responses results from the fact that
peaks above 4 kc/sec were not considered in this tabulation (see discussion of measurement
d.
1).
Measurement 4
Value of measurement 4
Responses
/f/
< 0
> 0
10
> 10
20
> 20
/s/
/ f/
Total
178
39
23
240
53
32
95
180
15
46
159
220
26
93
201
320
Again we see that the predominance of energy in the low-frequency region favors /f/
judgment. A strong concentration of energy round the peak frequency region, indicated
by high values for this measurement, gives rise to a majority of
/f / judgments.
A.
-70-
Cohen
Download