Lesley Mearns PhD thesis
Towards the automatic characterisation of music using high level features
Comments from Alan Marsden, external examiner

Major comments

1. What evidence do you have for the claim that ‘harmony analysis is highly subjective’? My own judgement, based on teaching harmony and on analyses I have seen, is that analysts probably agree at least 95% of the time.
I have re-worded this in the thesis. My own experience of music analysis is that analysts agree where there is little controversy but that opinions rapidly diverge in situations of ambiguity. Consider for example the standard analytical fight over the theme of the first movement of Mozart’s K331: is the C# more important than the E or vice versa? In the thesis my statement is evidenced firstly by Krumhansl’s work on tonal hierarchies, which compares the results of two musical experts in the perception of key strength, and secondly by referring the reader to the variation in opinion of musical experts about the chord tones present in bar 23 of Prelude No. 1 in C Major BWV 846. I also quote the work of Downie and the SALAMI project in the field of MIR, where the fact that two people tend not to wholly agree on a single ground truth is highlighted as a problem for this type of work.

2. What is the function of sections 2.5 and 2.6?
2.5 has been re-worded to explain the reason for including this information. 2.6 has been removed.

3. Literature review 3.1 and 3.2. The items are described too much in isolation. Better to draw out the issues which underlie this work (common themes, problems, etc.) rather than just describe them one by one.
I have rephrased/rewritten a lot of this chapter to bring out common problems.

4. ‘However’ is not a conjunction.
I have been through the thesis and re-worded all of the sentences commencing with ‘However’.

************** Chapter 4 has been removed ***********

5. I am not convinced by the argument at the start of chapter 4.
For one thing, I think there are more than the two problems you identify. In fact, I think those are not necessarily separate problems anyway. In many cases, once we can properly define a feature, we are most of the way to knowing how to extract it automatically (though we must remember that some well-defined problems are known to be uncomputable). More importantly, there is a third problem which is not identified: once we have decided on features and how we are to extract them, we need to know how to relate the features to composers. Later on you indicate that you will use WEKA to determine this relation, but there are other possible solutions, and there is not logically any perfect solution. At one extreme, a composer’s style is defined to consist of only those pieces which he/she has actually written. At another extreme, if the style is defined as anything the composer might have written, it is the universal set of music.

6. In the next paragraph of chapter 4, assertions are made about ‘a statistical approach’, but it is not clear exactly what you mean by this. Furthermore, the assertion is not well supported. What is a ‘high level characterisation’ of a class in this case? Is it the case that this particular study does not yield this or is it that no statistical approach can ever yield this? What kind of ‘new insights’ might come from ‘an approach embedded in music theory’ and why do you believe that is the case?

7. The test corpus of kern scores has different kinds of pieces for each composer (e.g., chorales for Bach, violin concertos for Vivaldi) so how will you know you are testing for, e.g., Bach style rather than chorale style?

8. P.70 The weighted interval vector is not at all clear. Are you measuring simultaneous intervals within a notegroup or successive intervals within a part? If the latter, are you taking the duration of the second note or the first note to provide the weight?
If the former, you are presumably actually measuring interval classes, since you end up with a 12-element vector. Ah! Finally at the bottom of p.71 we hear that this is the vertical interval content.

9. P.71 And do you determine the movement type and dissonance preparation and resolution for every note pair in a group? And what is ‘other’ movement type? ... we discover on p.74, it is when one voice is silent, so surely it should not be included at all in the movement type: it indicates the degree to which all voices are present (if I interpret your method correctly) rather than the type of contrapuntal movement. Are tied notes treated just like repeated notes in this measurement?

10. P.75. Traditionally, a fourth is only considered dissonant if it involves the bass. Did your method follow this interpretation?

11. P.77 You acknowledge that the addition of Malmberg ranks is not perfect, but it is actually rather problematic. A first-inversion major triad has a dissonance score of 5.38, higher than that for an augmented triad at 4.65! Some evidence that it is OK for the configurations normally found in your corpus would have been reassuring.

12. P.83. Is it not possible, even likely, that jSymbolic features will give better classification results because there are more of them?

13. P.84. ‘We consider that descriptors that are musicologically apposite are better able to capture stylistic similarities between composers and musical works at a deep and more satisfactory level than statistical approaches based on superficial feature types.’ Maybe you do, but the evidence you presented (the higher classification rate using the jSymbolic features) suggests that these give better classifications, if ‘better’ means ‘more accurate’. You say ‘deep and more satisfactory’, but we need to have a definition of this.
Your next point about bringing musicology and computer science (presumably) closer together is a good one, but if this is what you mean by ‘more satisfactory’, you should say so.

************** Chapter 4 has been removed ***********

14. P.93. Did you try deleting the on-beat notes which do not last the entire duration (and which might be suspensions or accented passing notes) and take that as the chord if it matches with a smaller edit distance?
No, I didn’t do this.

15. P.94. Who did the hand-annotation?
Me. I have stated this in the thesis (abstract, introduction, chapter 5, conclusions).

16. P.98. The Krumhansl results deserve closer investigation. What was the method for establishing tonal context for this data? Is it possible that her subjects are actually giving ratings for chord similarity rather than key-fittingness?
I have explained the methods used to derive the data, added the comment about key-fittingness versus similarity, and credited AM.

17. P.104. More detail is required to explain ‘These values were determined empirically’ for the sum added to the diagonal of the key transition matrices.
Added a clearer description of how this was done.

18. P.107. It would have been useful to have a test with a key-transition matrix which is like the neutral one except for having a value of 2.0 on the diagonal. This would enable you to test whether the improvement is only a result of a preference for remaining in a key rather than the different values for transition to other keys.
Added comment about expectation that music continues in current key.

19. P.108. How can Conc have a value greater than 1?
It is a percentage. I have made this clearer in the text, and it does state this in the description of the figure.

20. P.108. It would be useful to have an indication of the significance of differences in the values in Table 5.8. You have 12 chorales in your test corpus, so 12 figures for each difference, and so could do, e.g., a Student’s t-test on the mean differences.
Added in a table of Student’s t-test results comparing transcribed and MIDI data (p.69).

21. P.110. The preference for major keys in the sevenths model is not explained clearly.
Have added a description of the issue of normalisation in relation to chord numbers.

22. P.119. Is it possible that the low difference in accuracy between the MIDI and transcribed data would disappear if the accuracy were higher, i.e., it might exist because the accuracy is generally low?
It is not possible to say whether the alignment of results observed is due to the “low” accuracy of the model or whether the two data types really are comparable in usefulness. (Actually the model outputs are an improvement on previous models.) I have softened our claim and suggested that future research could ascertain a better answer to this question.

23. P.119. It would have been useful to make a comparison with other harmonic-analysis software (Temperley, Maxwell, Winograd).
New experimental work is outside the scope of the corrections.

24. P.129. Were the Riemann analyses used systematically to verify your own annotations?
Yes, I referred to them for all of the excerpts that were available. There is some divergence due to the beat segment annotation, as stated in the thesis; there may also be some differences due to difficulties understanding Riemann’s notation.

25. P.153 and following. It would be useful to have these definitions expressed in a more formal manner, to avoid any possible ambiguity or misinterpretation.
I have changed the structure and section titles to show that P.153 gives the musical principles, gleaned from musicology texts, and that the formal, computational implementation of these concepts is detailed in a later section.

26. P.166-7. The method seems to me to give the wrong result for the case discussed. The demisemiquaver B flat at the end of bar 8 should be assigned voice 4 (i.e., it continues to the C in the following bar after the rest).
If you insist on that B flat going to A natural (which is a plausible voice-assignment), then it should be considered as joining with the voice occupied by the semibreve B flat. However, your conceptualisation of voices does not allow the possibility of joining and splitting.
Corrected: it is voice 3 not 4. Diagram improved.

27. P.168. Some comment on the results in Table 7.7 is warranted. It is quite striking that the result for Prelude no. 21 is distinctly lower than for the others. What went wrong in this case? In fact the voice-assignment seems generally very good, but it goes wrong in particular cases: besides no. 21, one prelude has a value of about 96% and four others values somewhat over 97%. This looks like a good case for examining failures in detail.
Additional paragraph of explanation and supporting figure added.

28. P.170. ‘Determined empirically’ again. Be precise.
Corrected and explained more clearly.

29. P.173-5. The interval-counting procedure is difficult to follow. Why not use pseudocode? Furthermore, I cannot make sense of the interval score vectors in Table 7.9. This method of identifying passing notes needs motivating. It is not self-evident that this is a good way of going about it. Why try this method? I suspect a simple improvement would be to weight figures in the interval vector by duration. Frankly, I would recommend cutting this section.
This is now motivated and linked back to the discussion of harmonically essential and inessential notes at the beginning of the chapter. I can’t cut this: it’s a core part of the method, and the results would significantly deteriorate if I did. I know this because it was a late addition to improve the accuracy of passing note classification. I have further explained the method and added a table to show how the intervals are calculated, and how these intervals are then represented in a 0-11 vector. I have also shown more clearly how the count of triadic intervals is done.

30. P.179-80.
What is the basis for the assumption that more ‘important’ notes are more likely to be harmonic ones?
Added the basis for the hypothesis to this section.

31. P.187. How are the Note-Importance and Tertian-Arrangement scores combined? Is it simple addition?
Yes, it is simple addition; the thesis now states this.

32. P.191. The account of parameter optimisation is not clear. If a complete ‘grid search’ was not performed, on what basis did you determine that ‘the best overall average score is arrived at’?
Gave a more detailed account of what was done and of the assumption that the parameters are independent.

33. P.210. You need to define what ‘closest intersection’ means. Giving an example is not sufficient.
Added definition and improved description of intersection and sym_diff.

34. P.212. Do you have any guess for the expected best target accuracy for an automatic chord-recognition system? You have correctly acknowledged all through that human annotators will recognise different chords in some situations, so it should be possible to establish an upper limit on the accuracy we can expect from an automatic annotation system, if accuracy is measured by comparison with a human annotation. Basically, you would have to get lots of human annotations and measure the level of deviation among them.
I have not put in a guess, as that is all it would be: a guess. I have commented that having such an upper limit of accuracy would be useful and explained that there is no data providing this.

35. P.219. You claim that the fact that the results for key-finding from the transcribed data are as good as the ones from MIDI is an indication that musicological research from audio rather than score is realistic.
As I have indicated above (22), I am not at all convinced by this because I do not see the results to be as ‘good’ (not a term you use, but your implication) but rather as ‘bad’, and I do not expect that a mechanism which found key better for the MIDI data would do so as well for the transcribed data. I think this claim should be removed from here.
I have reworded this to remove this implication, and to suggest that there is hope for the future that transcription will become a valid and useful data format with which to do musical research.

Minor comments

1. Pagination. It is usual to begin pagination with page 1 at the beginning of chapter 1. Pagination of front matter then generally uses lower-case Roman numerals (i, ii, ...) excluding the title page.
corrected

2. P.21 para. 3 last line, delete ‘are’.
corrected

3. P.30 middle. All quotations need a page number in the reference.
done [double check]

4. P.30 bottom quarter. The cantus firmus is actually most commonly in the tenor in real compositions rather than exercises.
corrected

5. P.31. Should not the rules for third and fourth species mention that resolution of dissonances must be by step?
corrected

6. P.33 middle. Delete ‘thought’ or ‘said’.
done

7. P.34 line 3. Insert ‘are’.
done

8. P.38 para 2 line 1. Delete ‘the’.
done

9. P.42 beginning. Are there some words missing between pages 41 & 42? At the moment it does not make sense. Or should ‘of’ at the beginning be ‘to’?
Yes: changed ‘of’ to ‘to’.

10. P.42 bottom. Apostrophe and e acute needed in Traité de l’harmonie.
done

11. P.43 line 3. No apostrophe in ‘continue’s’.
done

12. P.43 para 1 last line. No apostrophe in ‘era’s’.
done

13. P.44 bottom. ‘Secondary dominants’ usually has a different and more precise meaning than the one you give it. A secondary dominant is a chord function not a key. It is always a dominant function in a key other than the tonic.
So ‘IV of IV’ is not a secondary dominant, but it might be described as a secondary subdominant (a term I have never seen in use).
I’ve cited the pages in Piston that I took my description from.

14. P.48 para 2 line 1. Insert ‘of’.
done

15. P.48 bottom quarter. The normal form of [0,3,2] is actually [0,1,2] because Forte considers transpositional and inversional equivalence also.
done

16. P.49. Page reference required for the quotation from GTTM.
done

17. P.49 middle. Delete ‘have been implemented’.
done

18. P.50 middle. ‘Stamford’ should be ‘Stanford’.
done

19. P.54 bottom quarter. ‘MIDI’ or ‘Midi’; be consistent.
done

20. P.55 para 2 line 2. Delete comma after intervals.
done

21. P.55 para 2 last line. Add ‘and’.
done

22. P.60 line 3. Delete ‘one’.
done

23. P.61 last para. Delete ‘Pardo and Birmingham’.
done

24. P.64 penultimate line. ‘Cathe’ or ‘Cathé’?
done

25. P.66 penultimate para penultimate line. Parentheses or commas, not both.
done

26. P.68 para 2 last line. ‘Witten and E.’ should be ‘Witten and Frank’.
n/a

27. P.74. You often have “‘Other”, which makes no orthographic sense. Why an opening inverted comma and no closing one? Why a capital ‘O’?
n/a

28. P.75 Figure 4.5. Use the names of composers rather than letter codes. The labels at the top could have the text rotated to be vertical.
n/a

29. P.83 line 1. There seems to be a stray ‘Figure 4.8’ at the end of this line. We could expect to see a reference to the figure at this point, and a paragraph break before the next sentence.
n/a

30. P.85 Table 4.4. Make the heading of the last column ‘Mean Consonance Value’. ‘Average’ can mean ‘median’ or ‘mean’ (or ‘mode’, for that matter).
n/a

31. P.86 line -3. Delete ‘of’.
done

32. P.86-119. Several times here the author’s name appears outside the brackets in references.
corrected

33. P.91 one third down. Replace ‘a a’ by ‘in a’.
done

34. P.93 line -5. Replace ‘on’ by ‘one’.
corrected

35. P.94 para 1. Replace ‘the which’ by ‘which’.
done

36. P.112 line 2.
What are the ‘four ground truth outputs’? Do you mean instead the output of four of the HMMs?
corrected

37. P.112 top quarter. How can a G and F be a semitone apart?
done

P.112 top third. Change ‘difference models’ to ‘different models’.
done

38. P.114 middle. Should be ‘tierce de picardie’.
done

39. P.122 middle. There is only one version of equal temperament. In equal temperament the semitones are all equal (not approximately equal). There are other temperaments which exist in different versions (well temperaments, mean temperaments, etc.), but they are not equal temperaments.
done

40. P.123 Figure 6.1. The upper stave in bar 31 should have E and G also, since elsewhere you have added implied notes (e.g., in the last bar) which are not explicitly specified in the figures.
Actually I think the mistake is to include the F; I think it should just be B and D to make a triad. Figure corrected.

41. P.184 para 2. Please write more precisely. From what you say about metrical scores in 7.3.5, the score for a note’s metrical position can never be less than 0, so it is inaccurate to say ‘the negative values assigned to the inessential notes due to their position in the metrical hierarchy’. If the notes have negative values, it is due to the score from Note Features. The metrical position only contributes in the sense that it is not great enough to make the sum positive. (In fact, it never could be, since the maximum metrical-position score is 1.)
Yes, this is an error. Corrected.

42. P.186 last 2 lines. A tritone is 6 semitones, not 8. 8 semitones is a minor sixth or augmented fifth. If your notes are arranged as a stack of thirds with some possibly missing, an interval of 8 semitones will be an augmented fifth, and on those grounds you would be right to ignore it because no commonly used higher-order chords have an augmented triad as their basis.
corrected

43. P.192 & 194.
The graphs in Figures 7.16 and 7.17 would be better if they put the preludes in the same order, so that we could compare the number of inessential notes recognised to the accuracy of the harmony-recognition.
Re-ordered graphs by prelude and re-imported figures

44. P.195 middle. Why say ‘20% higher than 50%’ rather than 70%? Or do you mean 60% (50% + 20% of 50%)?
corrected

45. P.195 middle. When you say ‘correlation of 0.6’, do you mean that the Pearson coefficient of correlation is 0.6? If so, say so.
done

46. P.200 bottom quarter. The average is not shown in the final column of the graph but, I assume, is the one labelled ‘t’ in the middle (where we would expect the average to be in a set put in order!).
done

47. P.205. Figure 7.23 would be better as a scattergraph, and quotation of a coefficient of correlation, rather than just asserting in the text (p.204) that ‘an unambiguous correlation is not evident’.
Converted figure to scattergraph

48. P.206 top quarter. The example for G7 suggests that what you describe as ‘tonic weight’ might be better called ‘root weight’. If I am wrong, put something in to explain better.
corrected

49. P.207 para 2. I cannot understand this example. Where have the pitches classes 5, 3 and 7 come from?
corrected

50. P.208 last para. Delete ‘using’.
done

51. P.213 bottom of middle para. Say ‘60% of the input sets with non-chord tones’.
done

52. P.213 last para. Delete ‘contour’: this factor was not found to be effective and so is not used.
done

53. P.214 top quarter. Delete one of the repeated ‘overall’s.
done

54. P.214 top third. Delete ‘the’ before ‘either’.
done

55. P.214 bottom quarter. Change ‘drop of’ to ‘drop off’.
done

56. P.224 Figure 8.1. This figure needs to be reproduced larger, with the two blocks of music notation rearranged to be one above the other. It is unreadable at its current size.
Done

57. References. References to Grove Music Online should state the article title, not just the authors.
corrected

58. Killian & Hoos.
‘IRCAM, 2002’ is not an adequate reference.
corrected
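
Editorial illustration for minor comments 45 and 47, which ask for a Pearson coefficient of correlation to be named and quoted: the following is a minimal sketch of how such a coefficient could be computed for two per-prelude series. The function name `pearson_r` and the sample values are hypothetical placeholders, not figures or code from the thesis.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations of each value from its series mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    covariance_sum = sum(a * b for a, b in zip(dx, dy))
    var_x_sum = sum(a * a for a in dx)
    var_y_sum = sum(b * b for b in dy)
    if var_x_sum == 0 or var_y_sum == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance_sum / math.sqrt(var_x_sum * var_y_sum)


# Hypothetical placeholder values, NOT taken from the thesis:
# one score per prelude for each of the two measures being compared.
inessential_note_accuracy = [0.62, 0.71, 0.55, 0.80, 0.67]
harmony_accuracy = [0.58, 0.69, 0.60, 0.77, 0.63]

r = pearson_r(inessential_note_accuracy, harmony_accuracy)
print(f"Pearson r = {r:.2f}")
```

Quoting the value of r alongside a scattergraph lets the reader judge the strength of the relationship directly, which is the point both comments make.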