Lesley Mearns PhD thesis Towards the automatic characterisation of music using high level
features
Comments from Alan Marsden, external examiner
Major comments
1. What evidence do you have for the claim that ‘harmony analysis is highly subjective’? My own
judgement, based on teaching harmony and on analyses I have seen, is that analysts probably
agree at least 95% of the time.
I have re-worded this in the thesis. My own experience of music analysis is that analysts agree
where there is little controversy but that opinions rapidly diverge in situations of ambiguity.
Consider for example the standard analytical fight over the theme of the first movement of
Mozart’s K331 - is the C# more important than the E or vice-versa?
In the thesis my statement is evidenced firstly by Krumhansl’s work on tonal hierarchies which
compares the results of two musical experts in the perception of key strength, and by referring
the reader to the variation in opinion of musical experts about the chord tones present in bar 23
of Prelude No. 1 in C Major BWV 846. I also quote the work of Downie and the SALAMI project:
in the field of MIR, the fact that two people tend not to wholly agree on a single ground truth is
highlighted as a problem for this type of work.
2. What is the function of sections 2.5 and 2.6?
2.5 has been re-worded to explain the reason for including this information.
2.6 has been removed.
3. Literature review 3.1 and 3.2. The items are described too much in isolation. Better to draw out
the issues which underlie this work (common themes, problems, etc.) rather than just describe
them one by one.
I have rephrased/rewritten a lot of this chapter to bring out common problems.
4. ‘However’ is not a conjunction.
I have been through the thesis and re-worded all of the sentences commencing with ‘However’.
************** Chapter 4 has been removed ***********
5. I am not convinced by the argument at the start of chapter 4. For one thing, I think there is more
than the two problems you identify. In fact, I think those are not necessarily separate problems
anyway. In many cases, once we can properly define a feature, we are most of the way to
knowing how to extract it automatically (though we must remember that it is well known that
there are well defined problems which are known to be uncomputable). More importantly, there
is a third problem which is not identified: once we have decided on features and how we are to
extract them, we need to know how to relate the features to composers. Later on you indicate
that you will use WEKA to determine this relation, but there are other possible solutions, and
there is not logically any perfect solution. At one extreme, a composer’s style is defined to consist
of only those pieces which he/she has actually written. At another extreme, if the style is defined
as anything the composer might have written, it is the universal set of music.
6. In the next paragraph of chapter 4, assertions are made about ‘a statistical approach’, but it is
not clear exactly what you mean by this. Furthermore, the assertion is not well supported. What
is a ‘high level characterisation’ of a class in this case? Is it the case that this particular study
does not yield this or is it that no statistical approach can ever yield this? What kind of ‘new
insights’ might come from ‘an approach embedded in music theory’ and why do you believe that
is the case?
7. The test corpus of kern scores has different kinds of pieces for each composer (e.g., chorales for
Bach, violin concertos for Vivaldi) so how will you know you are testing for, e.g., Bach style
rather than chorale style?
8. P.70 The weighted interval vector is not at all clear. Are you measuring simultaneous intervals
within a notegroup or successive intervals within a part? If the latter, are you taking the duration
of the second note or the first note to provide the weight? If the former, you are presumably
actually measuring interval classes, since you end up with a 12-element vector. Ah! Finally at the
bottom of p.71 we hear that this is the vertical interval content.
9. P.71 And do you determine the movement type and dissonance preparation and resolution for
every note pair in a group? And what is ‘other’ movement type? ... we discover on p.74, it is when
one voice is silent, so surely it should not be included at all in the movement type: it indicates the
degree to which all voices are present (if I interpret your method correctly) rather than the type
of contrapuntal movement. Are tied notes treated just like repeated notes in this measurement?
10. P.75. Traditionally, a fourth is only considered dissonant if it involves the bass. Did your method
follow this interpretation?
11. P.77 You acknowledge that the addition of Malmberg ranks is not perfect, but it is actually rather
problematic. A first-inversion major triad has a dissonance score of 5.38, higher than that for an
augmented triad at 4.65! Some evidence that it is OK for the configurations normally found in
your corpus would have been reassuring.
12. P.83. Is it not possible, even likely, that jSymbolic features will give better classification results
because there are more of them?
13. P.84. ‘We consider that descriptors that are musicologically apposite are better able to capture
stylistic similarities between composers and musical works at a deep and more satisfactory level
than statistical approaches based on superficial feature types.’ Maybe you do, but the evidence
you presented (the higher classification rate using the jSymbolic features) suggests that these
give better classifications, if ‘better’ means ‘more accurate’. You say ‘deep and more
satisfactory’, but we need to have a definition of this. Your next point about bringing musicology
and computer science (presumably) closer together is a good one, but if this is what you mean by
‘more satisfactory’, you should say so.
************** Chapter 4 has been removed ***********
14. P.93. Did you try deleting the on-beat notes which do not last the entire duration (and which
might be suspensions or accented passing notes) and take that as the chord if it matches with a
smaller edit distance?
No I didn’t do this.
15. P.94. Who did the hand-annotation?
Me – have stated this in the thesis (abstract, introduction, chapter 5, conclusions)
16. P.98. The Krumhansl results deserve closer investigation. What was the method for establishing
tonal context for this data? Is it possible that her subjects are actually giving ratings for chord
similarity rather than key-fittingness?
I have explained the methods used to derive the data, added the comment about key-fittingness
versus similarity, and credited AM.
17. P.104. More detail is required to explain ‘These values were determined empirically’ for the sum
added to the diagonal of the key transition matrices.
Added a clearer description of how this was done.
18. P.107. It would have been useful to have a test with a key-transition matrix which is like the
neutral one except for having a value of 2.0 on the diagonal. This would enable you to test
whether the improvement is only a result of a preference for remaining in a key rather than the
different values for transition to other keys.
Added comment about expectation that music continues in current key.
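The control condition proposed in comment 18 can be sketched as below. This is a minimal illustration, not the thesis implementation: it assumes 24 keys (12 major, 12 minor), that the "neutral" matrix holds the same value in every cell, and that each row is normalised to sum to 1. The function name and defaults are hypothetical.

```python
def diagonal_only_matrix(n_keys=24, off_diag=1.0, diag=2.0):
    """Neutral key-transition matrix with a boosted diagonal.

    Assumptions (not from the thesis): every off-diagonal cell holds the
    same value, and each row is normalised into a probability distribution.
    """
    matrix = []
    for i in range(n_keys):
        row = [diag if i == j else off_diag for j in range(n_keys)]
        total = sum(row)
        matrix.append([value / total for value in row])
    return matrix


matrix = diagonal_only_matrix()
# Staying in the current key is twice as likely as moving to any single
# other key; all other keys are equally likely.
stay, move = matrix[0][0], matrix[0][1]
```

Comparing a model built on such a matrix with the full key-transition models would separate the effect of key persistence from the effect of the specific transition values.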
19. P.108. How can Conc have a value greater than 1?
It is a % - have made this clearer in the text and it does state this in the description of the fig.
20. P.108. It would be useful to have an indication of the significance of differences in the values in
Table 5.8. You have 12 chorales in your test corpus, so 12 figures for each difference, and so
could do, e.g., a Student’s t-test on the mean differences.
Added in a table of Student’s t-test results comparing transcribed and MIDI data (p69).
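For reference, the paired test suggested in comment 20 could look like the sketch below. It uses only the standard library and computes the t statistic for the per-chorale differences. The accuracy figures are hypothetical placeholders, not values from Table 5.8, and the function name is an assumption.

```python
import math
import statistics


def paired_t_statistic(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a paired Student's t-test."""
    if len(sample_a) != len(sample_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Hypothetical per-chorale accuracies (%), NOT values from Table 5.8.
midi = [71.0, 68.5, 74.2, 66.0, 70.1, 69.3, 72.8, 65.4, 73.0, 67.7, 70.9, 68.2]
transcribed = [69.5, 67.0, 73.9, 66.4, 68.0, 69.9, 71.1, 64.0, 72.2, 66.8, 70.0, 67.5]

t, df = paired_t_statistic(midi, transcribed)
# With df = 11, |t| must exceed about 2.201 for significance at the
# two-tailed 5% level.
significant = abs(t) > 2.201
```

With 12 chorales there are 11 degrees of freedom, so the critical value quoted in the comment applies to any pair of columns compared in this way.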
21. P.110. The preference for major keys in the sevenths model is not explained clearly.
Have added a description of the issue of normalisation in relation to chord numbers
22. P.119. Is it possible that the low difference in accuracy between the MIDI and transcribed data
would disappear if the accuracy were higher, i.e., it might exist because the accuracy is generally
low.
It is not possible to say whether the alignment of results observed is due to the “low” accuracy of
the model or whether the two data types really are comparably useful. (Actually
the model outputs are an improvement on previous models.) I have softened our claim and
suggested that future research could ascertain a better answer to this question.
23. P.119. It would have been useful to make a comparison with other harmonic-analysis software
(Temperley, Maxwell, Winograd).
New experimental work is outside the scope of the corrections
24. P.129. Were the Riemann analyses used systematically to verify your own annotations?
Yes I referred to them for all of the excerpts that were available. There is some divergence due to
the beat segment annotation, stated in thesis, also there may be some differences due to
difficulties understanding Riemann’s notation.
25. P.153 and following. It would be useful to have these definitions expressed in a more formal
manner, to avoid any possible ambiguity or misinterpretation.
I have changed the structure and section titles to show that P153 gives the musical principles,
gleaned from musicology texts, and that the formal implementation of these concepts, or their
translation, computationally, is detailed in a later section.
26. P.166-7. The method seems to me to give the wrong result for the case discussed. The
demisemiquaver B flat at the end of bar 8 should be assigned voice 4 (i.e., it continues to the C in
the following bar after the rest). If you insist on that B flat going to A natural (which is a plausible
voice-assignment), then it should be considered as joining with the voice occupied by the
semibreve B flat. However, your conceptualisation of voices does not allow the possibility of
joining and splitting.
Corrected – it is voice 3 not 4, and improved diagram
27. P.168. Some comment on the results in Table 7.7 is warranted. It is quite striking that the result
for Prelude no. 21 is distinctly lower than for the others. What went wrong in this case? In fact
the voice-assignment seems generally very good, but it goes wrong in particular cases: besides
no. 21, one prelude has a value of about 96% and four others values somewhat over 97%. This
looks like a good case for examining failures in detail.
Additional paragraph of explanation and supporting figure added.
28. P.170. ‘Determined empirically’ again. Be precise.
Corrected and explained more clearly.
29. P.173-5. The interval-counting procedure is difficult to follow. Why not use pseudocode?
Furthermore, I cannot make sense of the interval score vectors in Table 7.9. This method of
identifying passing notes needs motivating. It is not self-evident that this is a good way of going
about it. Why try this method? I suspect a simple improvement would be to weight figures in the
interval vector by duration. Frankly, I would recommend cutting this section.
This is now motivated and linked back to the discussion of harmonically essential/inessential
notes at the beginning of the chapter. I can’t cut this – it’s a core part of
the method, and the results would significantly deteriorate if I did – I know this because it was a
late addition to improve the accuracy of passing note classification.
I have further explained the method and added a table to show how the intervals are calculated,
and how these intervals are then represented in a 0-11 vector. I have also more clearly shown
how the count of triadic intervals is done.
30. P.179-80. What is the basis for the assumption that more ‘important’ notes are more likely to be
harmonic ones?
Added the basis for the hypothesis to this section.
31. P.187. How are the Note-Importance and Tertian-Arrangement scores combined? Is it simple
addition?
Yes it is simple addition – now states this in the thesis.
32. P.191. The account of parameter optimisation is not clear. If a complete ‘grid search’ was not
performed, on what basis did you determine that ‘the best overall average score is arrived at’?
Gave a more detailed account of what was done and of the assumption that the parameters are independent.
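The distinction raised in comment 32 can be illustrated with a small sketch. It contrasts an exhaustive grid search with tuning one parameter at a time, which is only guaranteed to find the best combination when the parameters do not interact. The scoring functions, grids and function names are hypothetical and are not the thesis parameters.

```python
import itertools


def grid_search(score, grids):
    """Exhaustively evaluate every combination of parameter values."""
    return max(itertools.product(*grids), key=lambda params: score(*params))


def one_at_a_time(score, grids, start):
    """Tune each parameter in turn while holding the others fixed.

    This only finds the global optimum if the parameters do not interact.
    """
    current = list(start)
    for index, grid in enumerate(grids):
        def with_value(value):
            trial = list(current)
            trial[index] = value
            return score(*trial)
        current[index] = max(grid, key=with_value)
    return tuple(current)


# Hypothetical scoring functions over two parameters.
def separable(a, b):
    return -(a - 2) ** 2 - (b - 1) ** 2


def interacting(a, b):
    return a * b


grids = [[0, 1, 2, 3], [0, 1, 2, 3]]
```

For the separable score both procedures agree, whereas for the interacting score the one-at-a-time search started at (0, 0) never leaves its starting point. This is why the independence assumption needs to be stated.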
33. P.210. You need to define what ‘closest intersection’ means. Giving an example is not sufficient.
Added definition and improved description of intersection and sym_diff
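As an illustration of the two set operations named in the response to comment 33, the sketch below ranks chord templates against a set of sounding pitch classes. The ranking rule (largest intersection first, then smallest symmetric difference) is an assumption made for the example and is not necessarily the definition of ‘closest intersection’ adopted in the thesis. The templates and function name are likewise hypothetical.

```python
# Pitch classes as integers 0-11 (C = 0). Templates are illustrative only.
TEMPLATES = {
    "C major": frozenset({0, 4, 7}),
    "A minor": frozenset({9, 0, 4}),
    "C major seventh": frozenset({0, 4, 7, 11}),
}


def closest_template(notes, templates=TEMPLATES):
    """Pick the template sharing most pitch classes with `notes`.

    Assumed tie-break (not taken from the thesis): prefer the template
    with the smallest symmetric difference, i.e. the fewest unmatched
    pitch classes on either side.
    """
    notes = frozenset(notes)

    def rank(name):
        template = templates[name]
        shared = len(notes & template)     # intersection
        unmatched = len(notes ^ template)  # symmetric difference
        return (-shared, unmatched, name)

    return min(templates, key=rank)
```

For the input {0, 4, 7}, both "C major" and "C major seventh" share three pitch classes, and the symmetric difference decides in favour of "C major".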
34. P.212. Do you have any guess for the expected best target accuracy for an automatic
chord-recognition system? You have correctly acknowledged all through that human annotators will
recognise different chords in some situations, so it should be possible to establish an upper limit
on the accuracy we can expect from an automatic annotation system, if accuracy is measured by
comparison with a human annotation. Basically, you would have to get lots of human
annotations and measure the level of deviation among them.
I have not put in a guess as that is all it would be – a guess. I have commented that having such
an upper limit of accuracy would be useful and explained that there is no data providing this.
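If several independent human annotations were available, the level of deviation among them mentioned in comment 34 could be estimated roughly as sketched below. The chord labels are hypothetical and the function names are assumptions. Mean pairwise agreement is only one possible measure, used here for simplicity.

```python
from itertools import combinations


def agreement(labels_a, labels_b):
    """Fraction of segments on which two annotators give the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("annotations must cover the same segments")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def mean_pairwise_agreement(annotations):
    """Average agreement over every pair of annotators."""
    pairs = list(combinations(annotations, 2))
    return sum(agreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical chord labels for four segments from three annotators.
annotations = [
    ["C", "G7", "Am", "F"],
    ["C", "G7", "C", "F"],
    ["C", "G", "Am", "F"],
]
ceiling = mean_pairwise_agreement(annotations)
```

A figure of this kind would indicate the accuracy an automatic system could reasonably be expected to approach when scored against a single human annotation.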
35. P.219. You claim that the fact that the results for key-finding from the transcribed data are as
good as the ones from MIDI is an indication that musicological research from audio rather than
score is realistic. As I have indicated above (22), I am not at all convinced by this because I do not
see the results to be as ‘good’ (not a term you use, but your implication) but rather as ‘bad’, and
I do not expect that a mechanism which found key better for the MIDI data would do so as well
for the transcribed data. I think this claim should be removed from here.
I have reworded this to remove this implication, and to suggest that there is hope for the future
that transcription will become a valid and useful data format with which to do musical research.
Minor comments
1. Pagination. It is usual to begin pagination with page 1 at the beginning of chapter 1. Pagination
of front matter then generally uses lower-case Roman numerals (i, ii, ...) excluding the title page.
corrected
2. P.21 para. 3 last line, delete ‘are’.
corrected
3. P.30 middle. All quotations need a page number in the reference.
done
[double check]
4. P.30 bottom quarter. The cantus firmus is actually most commonly in the tenor in real
compositions rather than exercises.
corrected
5. P.31. Should not the rules for third and fourth species mention that resolution of dissonances
must be by step?
corrected
6. P.33 middle. Delete ‘thought’ or ‘said’.
done
7. P.34 line 3. Insert ‘are’.
done
8. P.38 para 2 line 1. Delete ‘the’.
done
9. P.42 beginning. Are there some words missing between pages 41 & 42? At the moment it does
not make sense. Or should ‘of’ at the beginning be ‘to’?
Yes – changed of to to
10. P.42 bottom. Apostrophe and e acute needed in Traité de l’harmonie.
done
11. P.43 line 3. No apostrophe in ‘continue’s’.
done
12. P.43 para 1 last line. No apostrophe in ‘era’s’.
done
13. P.44 bottom. ‘Secondary dominants’ usually has a different and more precise meaning than the
one you give it. A secondary dominant is a chord function not a key. It is always a dominant
function in a key other than the tonic. So ‘IV of IV’ is not a secondary dominant, but it might be
described as a secondary subdominant (a term I have never seen in use).
I’ve cited the pages in Piston that I took my description from.
14. P.48 para 2 line 1. Insert ‘of’.
done
15. P.48 bottom quarter. The normal form of [0,3,2] is actually [0,1,2] because Forte considers
transposition and inversional equivalence also.
done
16. P.49. Page reference required for the quotation from GTTM.
done
17. P.49 middle. Delete ‘have been implemented’
done
18. P.50 middle. ‘Stamford’ should be ‘Stanford’.
done
19. P.54 bottom quarter. ‘MIDI’ or ‘Midi’; be consistent.
done
20. P.55 para 2 line 2. Delete comma after intervals.
done
21. P.55 para 2 last line. Add ‘and’.
done
22. P.60 line 3. Delete ‘one’.
done
23. P.61 last para. Delete ‘Pardo and Birmingham’.
done
24. P.64 penultimate line. ‘Cathe’ or ‘Cathé’?
done
25. P.66 penultimate para penultimate line. Parentheses or commas, not both.
done
26. P.68 para 2 last line. ‘Witten and E.’ should be ‘Witten and Frank’.
n/a
27. P.74. You often have “‘Other”, which makes no orthographic sense. Why an opening inverted
comma and no closing one? Why a capital ‘O’?
n/a
28. P.75 Figure 4.5. Use the names of composer rather than letter codes. The labels at the top could
have the text rotated to be vertical.
n/a
29. P.83 line 1. There seems to be a stray ‘Figure 4.8’ at the end of this line. We could expect to see a
reference to the figure at this point, and a paragraph break before the next sentence.
n/a
30. P.85 Table 4.4. Make the heading of the last column ‘Mean Consonance Value’. ‘Average’ can
mean ‘median’ or ‘mean’ (or ‘mode’, for that matter).
n/a
31. P.86 line -3. Delete ‘of’.
done
32. P.86-119. Several times here the author’s name appears outside the brackets in references.
corrected
33. P.91 one third down. Replace ‘a a’ by ‘in a’.
done
34. P.93 line -5. Replace ‘on’ by ‘one’.
corrected
35. P.94 para 1. Replace ‘the which’ by ‘which’.
done
36. P.112 line 2. What are the ‘four ground truth outputs’? Do you mean instead the output of four
of the HMMs?
corrected
37. P.112 top quarter. How can a G and F be a semitone apart?
done
P.112 top third. Change ‘difference models’ to ‘different models’.
done
38. P.114 middle. Should be ‘tierce de picardie’.
done
39. P.122 middle. There is only one version of equal temperament. In equal temperament the
semitones are all equal (not approximately equal). There are other temperaments which exist in
different versions (well temperaments, mean temperaments, etc.), but they are not equal
temperaments.
done
40. P.123 Figure 6.1. The upper stave in bar 31 should have E and G also, since elsewhere you have
added implied notes (e.g., in the last bar) which are not explicitly specified in the figures.
Actually think the mistake is to include the F – think it should just be B and D to make a triad –
figure corrected
41. P.184 para 2. Please write more precisely. From what you say about metrical scores in 7.3.5, the
score for a note’s metrical position can never be less than 0, so it is inaccurate to say ‘the
negative values assigned to the inessential notes due to their position in the metrical hierarchy’.
If the notes have negative values, it is due to the score from Note Features. The metrical position
only contributes in the sense that it is not great enough to make the sum positive. (In fact, it
never could be, since the maximum metrical-position score is 1.)
Yes this is an error - corrected
42. P.186 last 2 lines. A tritone is 6 semitones, not 8. 8 semitones is a minor sixth or augmented fifth.
If your notes are arranged as a stack of thirds with some possibly missing, an interval of 8
semitones will be an augmented fifth, and on those grounds you would be right to ignore it
because no commonly used higher-order chords have an augmented triad as their basis.
corrected
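The interval arithmetic behind comment 42 can be checked with a short sketch. Only standard semitone sizes are used, and the helper name is hypothetical.

```python
# Interval sizes in semitones within one octave.
MINOR_THIRD, MAJOR_THIRD = 3, 4

INTERVAL_NAMES = {
    6: "tritone (augmented fourth / diminished fifth)",
    7: "perfect fifth",
    8: "augmented fifth / minor sixth",
}


def fifth_of_triad(lower_third, upper_third):
    """Semitones between root and fifth when two thirds are stacked."""
    return lower_third + upper_third


diminished = fifth_of_triad(MINOR_THIRD, MINOR_THIRD)  # 6: tritone
major = fifth_of_triad(MAJOR_THIRD, MINOR_THIRD)       # 7: perfect fifth
augmented = fifth_of_triad(MAJOR_THIRD, MAJOR_THIRD)   # 8: augmented fifth
```

In a stack of thirds, 8 semitones between root and fifth can only arise from two major thirds, i.e. an augmented triad.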
43. P.192 & 194. The graphs in Figures 7.16 and 7.17 would be better if they put the preludes in the
same order, so that we could compare the number of inessential notes recognised to the
accuracy of the harmony-recognition.
Re-ordered graphs by prelude and re-imported figures
44. P.195 middle. Why say ‘20% higher than 50%’ rather than 70%? Or do you mean 60% (50% +
20% of 50%)?
corrected
45. P.195 middle. When you say ‘correlation of 0.6’, do you mean that the Pearson coefficient of
correlation is 0.6? If so, say so.
done
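For clarity about what a Pearson coefficient of correlation measures, a minimal standard-library sketch follows. The function name is an assumption, and no data from the thesis are reproduced.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    covariance = sum(a * b for a, b in zip(dx, dy))
    spread = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return covariance / spread
```

The coefficient ranges from -1 to 1, so a reported value of 0.6 would indicate a moderate positive linear relationship between the two series.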
46. P.200 bottom quarter. The average is not shown in the final column of the graph but, I assume,
is the one labelled ‘t’ in the middle (where we would expect the average to be in a set put in
order!).
done
47. P.205. Figure 7.23 would be better as a scattergraph, and quotation of a coefficient of
correlation, rather than just asserting in the text (p.204) that ‘an unambiguous correlation is not
evident’.
Converted figure to scattergraph
48. P.206 top quarter. The example for G7 suggests that what you describe as ‘tonic weight’ might
be better called ‘root weight’. If I am wrong, put something in to explain better.
corrected
49. P.207 para 2. I cannot understand this example. Where have the pitches classes 5, 3 and 7 come
from?
corrected
50. P.208 last para. Delete ‘using’.
done
51. P.213 bottom of middle para. Say ‘60% of the input sets with non-chord tones’.
done
52. P.213 last para. Delete ‘contour’: this factor was not found to be effective and so is not used.
done
53. P.214 top quarter. Delete one of the repeated ‘overall’s.
done
54. P.214 top third. Delete ‘the’ before ‘either’.
done
55. P.214 bottom quarter. Change ‘drop of’ to ‘drop off’.
done
56. P.224 Figure 8.1. This figure needs to be reproduced larger, with the two blocks of music
notation rearranged to be one above the other. It is unreadable at its current size.
Done
57. References. References to Grove Music Online should state the article title, not just the authors.
corrected
58. Killian & Hoos. ‘IRCAM, 2002’ is not an adequate reference.
corrected