Title: Piano Music Transcription Systems


1
Piano Music Transcription Systems
  • Presented by Greg Eustace

2
Overview
  • Introduction to polyphonic transcription systems
    including pioneering work, basic architecture,
    piano transcription systems and current problems.
  • Discussion of Marolt's paper, A Connectionist
    Approach to Automatic Transcription of Piano
    Music, including adaptive oscillator networks,
    neural networks and the SONIC system for piano
    transcription.

3
Polyphonic transcription
  • Transcription: the extraction of symbolic
    (notational) information from music, including
    pitches, dynamic levels, onset times and
    durations of notes.
  • For automatic transcription systems the input is
    an audio file and the output could be a MIDI or
    score file.
  • Polyphonic transcription systems have been in use
    since the early 1970s beginning with the work of
    Moorer whose system assumed only two voices,
    separated in frequency range, having different
    timbres and restricted intervallic relationships.

4
Basic architecture of transcription systems
Front-end processing
  • The audio signal is converted to a time-frequency
    representation such as that provided by the
    short-time Fourier transform (STFT).
  • Partials present in a frame are identified and
    their frequency and amplitude information is
    extracted. Peak picking is commonly achieved by
    setting an amplitude threshold.
  • Partial tracks are formed by connecting
    partials across frames based on their amplitude
    and frequency relationships.
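
The first two front-end steps can be sketched as follows. This is a minimal illustration, not code from any of the systems discussed: the frame length, hop size, test tone and the relative threshold of 0.1 are all assumed values, and the linking of peaks into partial tracks is left out.

```python
import numpy as np

def stft_magnitude(signal, frame_len=2048, hop=512):
    """Magnitude spectrogram from Hann-windowed frames (frames x bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def pick_peaks(magnitudes, threshold):
    """Bin indices of local maxima in one frame that exceed the threshold."""
    m = magnitudes
    is_peak = (m[1:-1] > m[:-2]) & (m[1:-1] >= m[2:]) & (m[1:-1] > threshold)
    return np.nonzero(is_peak)[0] + 1

sample_rate = 16000
frame_len = 2048
t = np.arange(sample_rate) / sample_rate
# Assumed test tone: two partials at 440 Hz and 880 Hz.
tone = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

spec = stft_magnitude(tone, frame_len=frame_len)
threshold = 0.1 * spec.max()          # assumed amplitude threshold
bin_hz = sample_rate / frame_len      # frequency resolution of one bin
peak_freqs = [pick_peaks(frame, threshold) * bin_hz for frame in spec]
```

Each entry of `peak_freqs` holds the peak frequencies of one frame, accurate to within one bin. A tracker would then connect peaks in neighbouring frames whose frequencies and amplitudes are close.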

5
Basic architecture: Blackboard systems for note
identification
  • So-called blackboard systems use various criteria
    to group partials together in order to identify
    notes.
  • Each criterion is represented by a knowledge
    source. These could include information from
    physics, psychoacoustics or music theory.
  • Blackboard systems often unify top-down and
    bottom-up approaches.

6
Basic architecture: Machine learning for note
identification
  • Other systems use machine learning for pattern
    recognition, such as hidden Markov models and
    neural networks. This necessarily involves a
    training stage in which input-output pairs from a
    data set are introduced to the network.

7
Assumptions of transcription systems
  • Many transcription systems assume a specific
    instrument as input. Problems that do not arise
    with that instrument therefore need not be
    handled by the system.
  • Piano transcription systems do not need to deal
    with notes that modulate in frequency. Piano
    notes also have pronounced attacks, making onset
    detection easier.

8
Current problems
  • Types of error: missed and spurious notes
  • Octave error (due to ambiguity between partials)
  • High polyphony
  • High and low notes
  • Repeated notes
  • Short note durations
  • Masking of low amplitude notes
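
The octave ambiguity can be made concrete with a short sketch. It assumes ideal harmonic partials (real piano strings are slightly inharmonic), and the note choice and partial counts are illustrative only.

```python
def partials(fundamental_hz, count=10):
    """Ideal harmonic partial frequencies of a note, as a set."""
    return {fundamental_hz * k for k in range(1, count + 1)}

a3 = partials(220.0)            # first ten partials of A3
a4 = partials(440.0, count=5)   # first five partials of A4
shared = a3 & a4                # partials the two notes have in common
```

Every one of the A4 partials listed here also belongs to A3, so spectral evidence for A4 alone looks much like evidence for A3 with weak odd partials.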

9
A Connectionist Approach to Automatic
Transcription of Piano Music
  • Matija Marolt

10
Auditory model: time-frequency analysis
  • The input audio is passed through a series of
    logarithmically spaced gammatone filters, with
    center frequencies between 70 and 6000 Hz.
  • The output from each filter is processed using
    Meddis' model of hair cell transduction,
    involving half-wave rectification, saturation and
    reduction. This results in a quasi-periodic
    impulsive signal that represents the firing
    patterns of hair cells. Dynamic compression is
    also inherent, meaning that low-amplitude
    partials become easier to detect.
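
Two ideas from this slide are sketched below: logarithmic spacing of the centre frequencies, and the way rectification plus compression narrows the gap between loud and quiet partials. The filter count of 200 and the power-law exponent are assumptions, and the function is a crude stand-in for a hair cell stage, not an implementation of Meddis' model.

```python
import numpy as np

n_filters = 200  # assumed; the slides do not state the number of filters
center_freqs = np.geomspace(70.0, 6000.0, n_filters)

def simplified_hair_cell(band_signal, exponent=0.3):
    """Half-wave rectify a filter output, then compress it with a power law.

    A simplified illustration only; exponent=0.3 is an assumed value.
    """
    rectified = np.maximum(band_signal, 0.0)
    return rectified ** exponent

loud = simplified_hair_cell(np.array([1.0]))[0]
quiet = simplified_hair_cell(np.array([0.01]))[0]
ratio_after = quiet / loud  # amplitude ratio was 0.01 before compression
```

After compression the quiet partial sits much closer to the loud one than the original 1:100 amplitude ratio, which is the sense in which low-amplitude partials become easier to detect.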

11
Auditory model: time-frequency analysis
  • Figure 1: Analysis of three partials of piano
    note F3 with the Auditory Model (Marolt, 2004).

12
Partial tracking using adaptive oscillators
  • Partial tracking is achieved using Large-Kolen
    adaptive oscillators which synchronize to the
    frequency and phase of the driving signal (i.e.
    the output of the auditory model). A partial is
    identified when synchronization occurs.
  • Synchronization operates according to a
    modified gradient descent rule, minimizing an
    error function that describes the difference
    between input events and beginnings of
    oscillation cycles.
  • The initial frequency of the oscillator is set to
    the center frequency of the corresponding
    filter.
  • The oscillator attempts synchronization at the
    beginning of every cycle so lower frequencies are
    slower to synchronize.
  • The author has shown that adaptive oscillators
    can successfully track frequency-modulated
    partials.
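
The idea of an oscillator pulling its phase and period towards incoming events can be sketched with a much simpler update rule than the Large-Kolen oscillator. The version below is an assumption-laden toy: it corrects the cycle start and the period in proportion to the timing error once per input event, and the two gain values are arbitrary.

```python
def track_period(event_times, initial_period, phase_gain=0.5, period_gain=0.1):
    """Adapt an oscillator's phase and period to a train of input events.

    Simplified illustration, not the Large-Kolen update rule.
    """
    period = initial_period
    cycle_start = event_times[0]
    for t in event_times:
        # Jump to the predicted cycle start that lies closest to this event.
        cycle_start += round((t - cycle_start) / period) * period
        # Timing error between the input event and the start of the cycle.
        error = t - cycle_start
        cycle_start += phase_gain * error   # phase correction
        period += period_gain * error       # period (frequency) correction
    return period

true_freq = 100.0                                   # assumed partial frequency in Hz
events = [k / true_freq for k in range(200)]        # one event per cycle
estimated_freq = 1.0 / track_period(events, initial_period=1.0 / 95.0)
```

Starting from 95 Hz, the oscillator settles on the 100 Hz event train. Because a correction is applied only once per event, a low-frequency input provides fewer corrections per second, which matches the remark that lower frequencies are slower to synchronize.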

13
Partial tracking using adaptive oscillators
  • Figure 2: Partial tracking with adaptive
    oscillators (Marolt, 2004).

14
Partial tracking: Adaptive oscillator networks
  • Partial groups are tracked using 88 networks of
    adaptive oscillators, each containing up to ten
    oscillators. Smaller networks are used for
    higher notes because partial frequencies are
    bounded above at 6000 Hz. The frequency of each
    oscillator in a network is initially set to an
    integer multiple of the fundamental.
  • An excitatory relationship between the
    oscillators in a network allows synchronized
    oscillators to change the frequency of
    non-synchronized oscillators (based on harmonic
    relationships), thus achieving faster
    synchronization rates.
  • The output of a network is a weighted sum of the
    outputs of its oscillators. Oscillators are
    weighted according to their closeness to ideal
    frequencies. An oscillator that deviates strongly
    from the ideal contributes less to the output of
    the network.
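
A rough sketch of the network sizing and the weighted output follows. The 6000 Hz bound and the limit of ten oscillators come from the slide; the Gaussian weighting and its tolerance are assumptions, since the slides do not give the actual weighting function.

```python
import numpy as np

MAX_OSCILLATORS = 10
MAX_PARTIAL_HZ = 6000.0

def num_oscillators(fundamental_hz):
    """Oscillators in a note's network: harmonics up to 6000 Hz, at most ten."""
    return min(MAX_OSCILLATORS, int(MAX_PARTIAL_HZ // fundamental_hz))

def network_output(fundamental_hz, osc_freqs, osc_outputs, tolerance=0.03):
    """Weighted sum of oscillator outputs.

    Weights fall off with relative deviation from the ideal harmonic
    frequency; the Gaussian shape and tolerance are assumed.
    """
    osc_freqs = np.asarray(osc_freqs, dtype=float)
    osc_outputs = np.asarray(osc_outputs, dtype=float)
    ideal = fundamental_hz * np.arange(1, len(osc_freqs) + 1)
    relative_dev = (osc_freqs - ideal) / ideal
    weights = np.exp(-0.5 * (relative_dev / tolerance) ** 2)
    return float(np.sum(weights * osc_outputs))

in_tune = network_output(220.0, [220.0, 440.0, 660.0], [1.0, 1.0, 1.0])
detuned = network_output(220.0, [220.0, 440.0, 700.0], [1.0, 1.0, 1.0])
```

With these assumptions the lowest piano note (27.5 Hz) gets the full ten oscillators while the highest (about 4186 Hz) gets one, and the detuned third oscillator contributes less to `detuned` than to `in_tune`.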

15
Artificial Neural Networks
  • Artificial Neural Network (ANN): models the
    neural structure of the brain. In simple terms,
    the ANN receives information from various sources
    through input neurons, combines or transforms
    the information in some way (handled by neurons
    in hidden layers) and outputs that information
    via the firing of output neurons.
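
The input, hidden and output stages can be illustrated with a single forward pass through a tiny network. The layer sizes, activation functions and random weights below are placeholders chosen for the example; a real network would obtain its weights from training.

```python
import numpy as np

def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: inputs -> hidden layer (tanh) -> output neuron (sigmoid)."""
    hidden = np.tanh(w_hidden @ inputs + b_hidden)
    return 1.0 / (1.0 + np.exp(-(w_out @ hidden + b_out)))

rng = np.random.default_rng(0)
n_in, n_hidden = 4, 3                       # assumed layer sizes
w_hidden = rng.normal(size=(n_hidden, n_in))
b_hidden = np.zeros(n_hidden)
w_out = rng.normal(size=(1, n_hidden))
b_out = np.zeros(1)

activation = forward(np.array([0.2, 0.0, 0.9, 0.1]),
                     w_hidden, b_hidden, w_out, b_out)
```

The sigmoid keeps the single output between 0 and 1, so it can be read as how strongly the output neuron fires.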

16
Note detection using Neural Networks
  • The system uses 76 neural networks.
  • Each network is trained to recognize a particular
    note (A1 to C8).
  • The input to each network is accepted from a
    partial tracking module.
  • The output of each network is a single neuron. A
    neuron with a high value represents the presence
    of the target note.
  • The data set for testing consisted of
    synthesized piano pieces and piano chords, thus
    allowing for input-output patterns (300,000 in
    total) to be presented to the network.
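
The count of 76 networks follows from the note range A1 to C8, which the sketch below checks using MIDI note numbers (A1 = 33, C8 = 108). The `detect_notes` helper and its threshold of 0.5 are hypothetical, showing only how one output per network could be turned into a list of note names.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def midi_to_name(midi):
    """Scientific pitch name, using the convention MIDI 60 = C4."""
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"

def midi_to_hz(midi, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note number."""
    return a4_hz * 2.0 ** ((midi - 69) / 12)

A1, C8 = 33, 108
target_notes = list(range(A1, C8 + 1))   # one detector network per note

def detect_notes(outputs, threshold=0.5):
    """Names of notes whose network output exceeds an assumed threshold."""
    return [midi_to_name(m) for m, out in zip(target_notes, outputs)
            if out > threshold]

outputs = [0.0] * len(target_notes)
outputs[target_notes.index(69)] = 0.9    # pretend the A4 network fires
active = detect_notes(outputs)
```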

17
Note detection using Neural Networks
  • Several different types of neural networks were
    tested, with time-delay NNs providing the best
    results.
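
What sets a time-delay network apart is that each unit sees a short window of consecutive input frames and reuses the same weights at every time step. A single such layer can be sketched as below; the shapes, the tanh activation and the random weights are assumptions for illustration and say nothing about the configuration used in the paper.

```python
import numpy as np

def time_delay_layer(frames, weights, bias):
    """Apply shared weights to every window of `delay` consecutive frames.

    frames:  (n_frames, n_features)
    weights: (n_units, delay, n_features)
    returns: (n_frames - delay + 1, n_units)
    """
    n_units, delay, _ = weights.shape
    n_out = frames.shape[0] - delay + 1
    windows = np.stack([frames[i : i + delay] for i in range(n_out)])
    return np.tanh(np.einsum("tdf,udf->tu", windows, weights) + bias)

rng = np.random.default_rng(1)
frames = rng.normal(size=(10, 4))        # 10 time frames, 4 features each
weights = rng.normal(size=(3, 3, 4))     # 3 units looking at 3 frames
bias = np.zeros(3)
layer_out = time_delay_layer(frames, weights, bias)
```

Because the weights are shared over time, delaying the input by one frame simply delays the output by one frame.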

18
Note detection using neural networks
  • The author compared the results of using the
    partial tracking method as input for the
    time-delay NNs with those of a time-frequency
    transform (similar to the constant-Q transform).
    The partial tracking method performed better.
  • Table 2: Average performance of systems with and
    without partial tracking (Marolt, 2004).

19
SONIC: A system for transcription of piano music
  • The partial tracking and note detection systems
    were incorporated into the SONIC system for
    piano transcription. SONIC is capable of
    detecting note onsets, lengths and loudness, as
    well as a particular piano's tuning and the
    presence of repeated notes.
  • SONIC is available for download at
    http://lgm.fri.uni-lj.si/SONIC.html

20
SONIC: A system for transcription of piano music
  • Figure 4: Structure of SONIC (Marolt, 2004).

21
SONIC: Onset detection
  • The onset detector splits the audio into 22
    frequency bands. The band outputs are filtered
    to give a positive value when the signal rises
    and a negative value otherwise. The filter
    outputs control the activation of neurons, which
    send impulses that indicate onsets. Multilayer
    perceptrons (MLPs) are used to determine whether
    an impulse represents an onset or some other
    type of amplitude disturbance.
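
The "positive when the signal rises" filter can be approximated, for a single band, by a smoothed first difference of the amplitude envelope. This is an assumed stand-in for SONIC's actual filter; the smoothing length and the synthetic envelope are illustrative, and the neuron and MLP stages are not modelled.

```python
import numpy as np

def rise_signal(envelope, smooth_len=5):
    """Smoothed first difference of an amplitude envelope.

    Positive while the envelope rises, negative while it falls.
    """
    kernel = np.ones(smooth_len) / smooth_len
    smoothed = np.convolve(envelope, kernel, mode="same")
    return np.diff(smoothed, prepend=smoothed[0])

# Envelope of one band: silence, a sharp attack at frame 50, then decay.
frame_idx = np.arange(200)
envelope = np.where(frame_idx < 50, 0.0, np.exp(-(frame_idx - 50) / 60.0))

rise = rise_signal(envelope)
onset_frame = int(np.argmax(rise))   # strongest rise marks the candidate onset
```

The strongest rise lands within a couple of frames of the attack (the smoothing window blurs the exact position), and the output is negative during the decay. A full detector would run this on all 22 bands and pass the candidates to a classifier.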

22
SONIC: Repeated notes
  • Detecting repeated notes poses a problem if notes
    which share the same partials are present in a
    chord containing a repeated note. SONIC uses MLPs
    for tracking repeated notes.

23
SONIC: Tuning, note lengths and dynamics
  • The piano's tuning needs to be detected prior to
    transcription. Adaptive oscillators are used to
    detect partials. Tuning is then calculated as a
    weighted sum of the deviations of the partials
    from ideal frequencies.
  • Note terminations (and therefore lengths) are
    indicated when the outputs of the note
    activation networks fall below a threshold.
  • Dynamics are calculated using the amplitude
    envelope of the note's first harmonic.
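
The tuning and note-length steps can be sketched as below. Several details are assumptions: deviations are measured in cents, the weighted sum is normalized by the total weight, and the termination threshold is set to 0.5. The slides do not specify any of these.

```python
import math

def cents(freq_hz, ideal_hz):
    """Deviation of a measured frequency from its ideal value, in cents."""
    return 1200.0 * math.log2(freq_hz / ideal_hz)

def tuning_offset_cents(partials):
    """Normalized weighted sum of partial deviations.

    partials: list of (measured_hz, ideal_hz, weight) tuples.
    """
    total_weight = sum(w for _, _, w in partials)
    return sum(w * cents(f, ideal) for f, ideal, w in partials) / total_weight

def note_length(activations, onset_frame, threshold=0.5):
    """Frames from onset until the activation first drops below threshold."""
    for frame in range(onset_frame, len(activations)):
        if activations[frame] < threshold:
            return frame - onset_frame
    return len(activations) - onset_frame

# A piano tuned slightly sharp: both partials deviate by the same ratio.
offset = tuning_offset_cents([(441.0, 440.0, 1.0), (882.0, 880.0, 0.5)])
length = note_length([0.9, 0.8, 0.7, 0.2, 0.1], onset_frame=0)
```

Here `offset` is a little under 4 cents sharp, and the note lasts three frames before its activation falls below the threshold.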

24
Performance Statistics
  • Table 3: Performance statistics of transcriptions
    of 3 synthesized and 3 real piano recordings
    (Marolt, 2004).
  • Transcription results available at
    http://lgm.fri.uni-lj.si/SONIC.html

25
Error Discussion
  • Most of the errors encountered involved octave
    errors and repeated notes. Additional sources
    of error include fast passages (such as
    arpeggios or trills), masking of low-amplitude
    notes, missed onsets, high polyphony and very
    low-pitched notes.