Automatic Music Transcription

What is automatic music transcription?

In short, automatic music transcription is the mathematical analysis of an audio recording (usually in WAV or MP3 format) and its conversion into musical notation (usually in MIDI format). It is a very hard artificial intelligence problem. For comparison, recognition of scanned text (OCR, Optical Character Recognition) is solved with 95% accuracy, which is the average recognition accuracy for programs of that class. Speech recognition programs already work with 80% accuracy, whereas music transcription systems work with 70% accuracy, and only for a single-voice melody (one note at a time). For polyphonic music the accuracy is even lower.

To create a MIDI sequence for a melody recorded in an audio format (WAV, MP3, etc.), a musician must determine the pitch, velocity and duration of each note played and record these parameters as a sequence of MIDI events. Music transcription software must do the same. Even for a single-instrument composition this is not a simple task, because an audio recording contains only a sampled waveform and no music-specific data.
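As a rough illustration of the pitch part of this task, the sketch below estimates the fundamental frequency of one short, single-voice frame by autocorrelation and maps it to the nearest MIDI note number. The function names, the 80 to 1000 Hz search range and the choice of autocorrelation are assumptions made for this example; it is not a description of how Akoff Music Composer works, and it does not cover velocity or duration.

```python
import math


def freq_to_midi(freq_hz):
    """Nearest MIDI note number for a frequency (A above middle C = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def estimate_pitch(samples, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) of a monophonic frame.

    Uses a plain (biased) autocorrelation: the lag with the highest score is
    taken as the period. Longer lags sum fewer terms, which favors the
    shortest matching period over its multiples.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    if len(samples) <= max_lag:
        raise ValueError("frame is too short for the requested fmin")

    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def sine_frame(freq_hz, sample_rate, length):
    """Hypothetical test signal: a pure sine tone."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(length)]


if __name__ == "__main__":
    frame = sine_frame(440.0, 8000, 1024)
    pitch = estimate_pitch(frame, 8000)
    print(pitch, freq_to_midi(pitch))
```

The estimate is limited to whole-sample lags, so it is only approximate, and a real recording with several instruments or noise would defeat this simple approach.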

In the general case, the variety of musical timbres, harmonic structures and transitions makes it impossible to create a mathematical algorithm that precisely reconstructs a music score from audio sources. Audio that contains many instruments, drums and percussion, clipped signals, sounds of unstable pitch or background noise is hard to transcribe. However, in many cases Akoff Music Composer will produce MIDI notes that represent the melody line and basic chords of the analyzed music.

The differences between audio and MIDI formats.

The difference between audio formats (WAV, MP3, OGG, etc.) and the MIDI format lies in how they represent sound and music. An audio format is a digital recording, or sampling, of any sound (including speech), whereas the MIDI format is essentially a sequence of notes, or MIDI events. The relationship is roughly the same as that between spoken speech and printed text.

Audio formats.

An audio file (WAV, MP3, OGG, etc.) is a recording of a sound wave. It is the mix of all the sounds (instruments, voices, background noise) that could be heard at the moment of recording. So you can record, for example, a human voice in MP3 format, but you cannot edit an individual note or change an instrument in music recorded in an audio file. The standard Windows PCM WAVE format contains only Pulse Code Modulation data, without compression. Uncompressed PCM keeps every sample of the digitized wave, so nothing is lost to compression, whereas lossy formats such as MP3 discard part of the signal.
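To make the idea of PCM data concrete, the following sketch generates a sine tone as 16-bit samples and stores it in a WAVE container using Python's standard `wave` module. The helper names `sine_pcm16` and `write_wav`, the mono layout and the default sample rate are assumptions chosen for the example.

```python
import io
import math
import struct
import wave


def sine_pcm16(freq_hz, seconds, sample_rate=44100, amplitude=0.5):
    """Return a sine tone as little-endian signed 16-bit PCM bytes (mono)."""
    frame_count = int(seconds * sample_rate)
    frames = bytearray()
    for i in range(frame_count):
        value = amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        frames += struct.pack("<h", int(value * 32767))
    return bytes(frames)


def write_wav(target, pcm_bytes, sample_rate=44100):
    """Write mono 16-bit PCM to a file name or a binary file-like object."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)


if __name__ == "__main__":
    pcm = sine_pcm16(440.0, 0.5, sample_rate=8000)
    buffer = io.BytesIO()
    write_wav(buffer, pcm, sample_rate=8000)
    print(len(pcm), "bytes of sample data")
```

The file holds nothing but the sample values and a short header. There is no record of which note or instrument produced them, which is why individual notes cannot be edited afterwards.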

There are many other formats for audio recording. They differ mainly in their compression algorithms and can be treated as a single group. Converting from one format to another is simple, and many sound editors can do it.

MIDI format.

MIDI (Musical Instrument Digital Interface) format is a sequence of commands that control one or more pieces of musical hardware or software, such as synthesizers or sequencers. These commands are not sounds; they are instructions to do something (mostly to generate sound). For example: select Instrument #1 (Acoustic Grand Piano), play Note #60 (middle C, labeled C5 in some octave-numbering conventions and C4 in others) with Velocity 127. So you cannot represent, for example, human speech in MIDI format, but you can edit any note or change any instrument in music recorded in a MIDI file.
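The example commands above can be written out as raw MIDI channel messages. This is a minimal sketch: the helper names are made up for illustration, channel 0 is an arbitrary choice, and timing information (the delta times a MIDI file stores between events) is left out.

```python
def _check(value, low, high, name):
    """Reject values outside the range a MIDI data field can hold."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be in {low}..{high}, got {value}")


def program_change(channel, instrument_number):
    """Select an instrument. Instrument lists count from 1, the wire value from 0."""
    _check(channel, 0, 15, "channel")
    _check(instrument_number, 1, 128, "instrument_number")
    return bytes([0xC0 | channel, instrument_number - 1])


def note_on(channel, note, velocity):
    """Start a note: status byte 0x90 plus channel, then note and velocity."""
    _check(channel, 0, 15, "channel")
    _check(note, 0, 127, "note")
    _check(velocity, 0, 127, "velocity")
    return bytes([0x90 | channel, note, velocity])


def note_off(channel, note, velocity=0):
    """Stop a note: status byte 0x80 plus channel, then note and velocity."""
    _check(channel, 0, 15, "channel")
    _check(note, 0, 127, "note")
    _check(velocity, 0, 127, "velocity")
    return bytes([0x80 | channel, note, velocity])


if __name__ == "__main__":
    # Instrument #1 (Acoustic Grand Piano), Note #60, Velocity 127
    events = [program_change(0, 1), note_on(0, 60, 127), note_off(0, 60)]
    print([event.hex() for event in events])
```

Each message is only two or three bytes long and says what to play, not what it sounds like. Changing the note or the instrument means changing a single byte.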

MIDI to audio conversion.

Music recorded in MIDI format can easily be converted to an audio format. You can play a MIDI file in a suitable player and record the reproduced music in a sound editor. The audio file will be larger than the same music represented in MIDI format. The quality of the music is determined by the MIDI capabilities of your sound card and the skill of the musician who created the source MIDI file. There are also programs that convert MIDI files into audio recordings using only their own MIDI instrument timbres (wavetable synthesis).
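The direction from notes to audio can be sketched in a few lines. The example below renders a list of notes into samples with a plain sine oscillator, which stands in for a real instrument timbre; it is not wavetable synthesis. The tuple layout `(note, start_seconds, duration_seconds, velocity)` and the function names are assumptions made for the example.

```python
import math


def midi_to_freq(note):
    """Frequency in Hz of a MIDI note number (note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


def render(notes, sample_rate=8000):
    """Render (note, start_s, duration_s, velocity) tuples to float samples.

    Each note becomes a sine tone scaled by velocity / 127. The result is
    normalized so that every sample lies in the range -1.0 to 1.0.
    """
    if not notes:
        return []
    total_seconds = max(start + duration for _, start, duration, _ in notes)
    out = [0.0] * int(total_seconds * sample_rate)
    for note, start, duration, velocity in notes:
        freq = midi_to_freq(note)
        gain = velocity / 127
        first = int(start * sample_rate)
        for i in range(int(duration * sample_rate)):
            if first + i < len(out):
                out[first + i] += gain * math.sin(
                    2 * math.pi * freq * i / sample_rate)
    peak = max(1.0, max(abs(sample) for sample in out))
    return [sample / peak for sample in out]


if __name__ == "__main__":
    melody = [(60, 0.0, 0.5, 127), (64, 0.5, 0.5, 100)]
    samples = render(melody)
    print(len(samples), "samples for", len(melody), "notes")
```

Two notes described by a handful of numbers expand into thousands of samples, which shows why the audio file is larger than the MIDI file for the same music.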

Akoff Sound Labs