The MUSINET Project: A computer program that identifies musical style

Christopher D. Green & Matthew Schmider

Department of Psychology

York University

Toronto, Ontario

Paper presented at the 1997 convention of the American Psychological Association, Chicago, Illinois.

Summary

Connectionist models have been built to simulate a wide array of cognitive processes. Connectionist modeling of aesthetic cognition, however, has been relatively rare. Because aesthetic cognition is one of the most complex and least well-defined of cognitive activities, it would seem to be an excellent testing ground for connectionist aspirations in cognitive science.

As one example of human aesthetic abilities, people seem able to learn to discriminate between musical styles through mere exposure to examples. In relatively short order, the differences between, say, the works of Bach, Mozart, and Beethoven become easy to recognize, and these styles can even be correctly identified in works that have never been heard before.

The present research was an attempt to model this cognitive capacity in a connectionist system trained with the error-backpropagation technique. The focus was on Western music in sonata form from the Baroque, Classical, and Romantic periods. For each of the three periods, ten pieces by a variety of composers were selected. Each work was reduced to a fragment consisting of just the melody and bass line of the first 16 bars, and all fragments were transposed into a common key. Dynamics, ornamentation, tempo, instrumentation, and other musical features were not encoded.
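The reduction step can be sketched in Python. This is not the authors' procedure: the note format (a list of MIDI pitch, onset, duration tuples timed in sixteenth notes) and the helper names `first_bars` and `transpose_to_c` are hypothetical, the sketch assumes 4/4 time, and it takes "a common key" to mean a common tonic of C, which the paper does not specify.

```python
from typing import List, Tuple

# Hypothetical note format: (midi_pitch, onset, duration), with onset and duration
# measured in sixteenth notes. This representation is an assumption for illustration.
Note = Tuple[int, int, int]

SIXTEENTHS_PER_BAR = 16  # assumes 4/4 time; other meters would need another value


def first_bars(notes: List[Note], n_bars: int = 16) -> List[Note]:
    """Keep notes that start within the first n_bars, clipping any that run past the end."""
    end = n_bars * SIXTEENTHS_PER_BAR
    kept = []
    for pitch, onset, dur in notes:
        if onset < end:
            kept.append((pitch, onset, min(dur, end - onset)))
    return kept


def transpose_to_c(notes: List[Note], tonic_pitch_class: int) -> List[Note]:
    """Shift every pitch so the tonic (pitch class 0-11) becomes C, by the smaller interval."""
    shift = -tonic_pitch_class % 12
    if shift > 6:
        shift -= 12  # move down rather than up by more than a tritone
    return [(pitch + shift, onset, dur) for pitch, onset, dur in notes]


# Example: a G-major melody fragment (tonic pitch class 7) moved to C.
melody = [(67, 0, 4), (71, 4, 4), (74, 8, 8), (79, 250, 16), (60, 256, 4)]
print(transpose_to_c(first_bars(melody), tonic_pitch_class=7))
# → [(72, 0, 4), (76, 4, 4), (79, 8, 8), (84, 250, 6)]
```

The bass line would be reduced the same way, using the same tonic, so that melody and bass stay aligned.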

A four-layer backpropagation network was used. The input layer consisted of four banks of 256 units each (1,024 input units in all), which encoded, respectively, the melody pitch, melody rhythm, bass-line pitch, and bass-line rhythm of the work being processed. The first hidden layer consisted of 40 units and the second of 20 units. The output layer consisted of three units, corresponding to the Baroque, Classical, and Romantic periods. There was complete upward connectivity among units. The activation function was the hyperbolic tangent of the sum of the weighted activations of the incoming units, which compresses each unit's activity level into the range -1.0 to +1.0. The learning function was the generalized delta rule. The aims of the project were (1) to train the network to respond with the correct output unit when each of the works was presented to the input layer, and (2) after the training phase, to have the network generalize its knowledge by correctly identifying the period of origin of works to which it had never been exposed.
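A minimal NumPy sketch of a network with the layer sizes given above (1,024 inputs, 40 and 20 hidden units, 3 outputs) follows. It is not the authors' code: the class name `TanhNet`, the bias terms, the weight initialization, the learning rate, and the ±0.9 target coding are all assumptions, and it connects adjacent layers only, which is one reading of "complete upward connectivity". Each call to `update` applies one step of the generalized delta rule, that is, gradient descent on squared error using the tanh derivative 1 - a**2.

```python
import numpy as np

# Layer sizes from the paper: four 256-unit input banks (melody pitch, melody rhythm,
# bass pitch, bass rhythm), hidden layers of 40 and 20 units, and 3 output units.
SIZES = [4 * 256, 40, 20, 3]
PERIODS = ["Baroque", "Classical", "Romantic"]


class TanhNet:
    """Feed-forward tanh network trained one example at a time with the generalized delta rule."""

    def __init__(self, sizes=SIZES, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        # Assumed: small uniform random weights, zero biases, fixed learning rate.
        self.W = [rng.uniform(-0.1, 0.1, size=(n_out, n_in))
                  for n_in, n_out in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(n_out) for n_out in sizes[1:]]
        self.lr = lr

    def forward(self, x):
        """Return the activations of every layer, input layer first."""
        acts = [np.asarray(x, dtype=float)]
        for W, b in zip(self.W, self.b):
            # Each unit: tanh of the weighted sum of the units below, squashed into (-1, 1).
            acts.append(np.tanh(W @ acts[-1] + b))
        return acts

    def predict(self, x):
        """Index of the most active output unit (0 = Baroque, 1 = Classical, 2 = Romantic)."""
        return int(np.argmax(self.forward(x)[-1]))

    def update(self, x, target_index):
        """Apply one generalized-delta-rule step for a single piece; return its squared error."""
        acts = self.forward(x)
        # Assumed target coding: +0.9 for the correct period, -0.9 for the others.
        target = np.full(acts[-1].shape, -0.9)
        target[target_index] = 0.9
        error = target - acts[-1]
        # Output-layer deltas: error times the tanh derivative 1 - a**2.
        delta = error * (1.0 - acts[-1] ** 2)
        for layer in reversed(range(len(self.W))):
            below = acts[layer]
            # Back-propagate through the current, not yet updated, weights
            # (the delta computed for the input layer itself is never used).
            delta_below = (self.W[layer].T @ delta) * (1.0 - below ** 2)
            self.W[layer] += self.lr * np.outer(delta, below)
            self.b[layer] += self.lr * delta
            delta = delta_below
        return float(np.sum(error ** 2))
```

Taking the most active output unit as the network's answer is likewise an assumed decision rule; the paper says only that the network should respond with the correct unit.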

Eight of the ten pieces from each period were chosen at random for the training phase. The network was exposed to all 24 of these pieces in rotation, with the weights adjusted after each exposure according to the learning function, until it correctly identified the periods of all of the pieces in the set. The remaining six pieces (two from each period) were held back for the generalization phase.
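The training and generalization protocol can be sketched as plain-Python helpers. `split_per_period` draws eight of the ten pieces per period at random and holds back the rest; `train_until_correct` presents the training set in a fixed rotation, calls the model's `update` after every piece, and stops after the first full cycle in which every training piece is classified correctly. The helper names, the fixed presentation order, and the `update`/`predict` interface are hypothetical; any model exposing those two functions, such as a backpropagation network, could be plugged in.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Example = Tuple[object, int]  # (encoded piece, period index 0..2); hypothetical format


def split_per_period(pieces_by_period: Dict[int, Sequence[object]],
                     n_train: int = 8, seed: int = 0) -> Tuple[List[Example], List[Example]]:
    """Randomly pick n_train pieces per period for training; hold back the rest."""
    rng = random.Random(seed)
    train: List[Example] = []
    held_out: List[Example] = []
    for period, pieces in pieces_by_period.items():
        shuffled = list(pieces)
        rng.shuffle(shuffled)
        train.extend((p, period) for p in shuffled[:n_train])
        held_out.extend((p, period) for p in shuffled[n_train:])
    return train, held_out


def train_until_correct(train: List[Example],
                        update: Callable[[object, int], object],
                        predict: Callable[[object], int],
                        max_cycles: int = 1000) -> Optional[int]:
    """Present all training pieces in a fixed rotation, updating after each one.

    Returns the number of the first full cycle after which every training piece is
    classified correctly, or None if max_cycles is reached first.
    """
    for cycle in range(1, max_cycles + 1):
        for x, period in train:
            update(x, period)  # weights change after every single exposure
        if all(predict(x) == period for x, period in train):
            return cycle
    return None


def accuracy(examples: List[Example], predict: Callable[[object], int]) -> float:
    """Fraction of examples whose period is predicted correctly."""
    return sum(predict(x) == period for x, period in examples) / len(examples)
```

For example, a trivial stand-in model that merely memorizes the labels it has seen reaches perfect training accuracy after one cycle but scores zero on the held-back pieces, which is exactly the kind of failure to generalize that the held-back set is meant to expose.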

The network learned to correctly identify the periods of the 24 training pieces in 65 cycles. It then correctly generalized its knowledge to all six of the pieces to which it had not been exposed during training. One interesting feature of its performance was discovered only post hoc: no piece in a minor key was included in the training set, but one was included in the generalization set, and the network correctly identified the period in which that piece was written. Even with such crucial musical features as tempo and dynamics missing, the backpropagation network learned the stylistic differences between the Baroque, Classical, and Romantic periods and applied this learning to works that were not included in the training phase. It would seem, then, that this connectionist representation could be a psychologically valid model of music cognition, since it performs much as human listeners do.

Research is continuing into whether this network can make finer discriminations between musical styles. We are currently examining the Baroque period to determine whether the network can distinguish among French, Italian, and German composers. We will also soon present the trained network with works on the borders between musical styles (such as Rococo) to see whether the network displays some "confusion" about the style, just as humans do.