From the same archive

Mettre en temps une structure musicale : l'activité de composition de Voi(rex) par Philippe Leroux - Nicolas Donin, Jacques Theureau

April 14, 2005 01 h 01 min

Mettre en temps une structure musicale : l'activité de composition de Voi(rex) par Philippe Leroux - Nicolas Donin, Jacques Theureau

April 14, 2005 24 min

L'estimation de fréquences fondamentales multiples

May 12, 2005 52 min

La harpe électroacoustique

February 4, 2005 01 h 18 min

Utilisation de Modalys pour le projet VoxStruments, lutherie numérique intuitive et expressive - Nicholas Ellis, Joël Bensoam

October 17, 2007 49 min

Présentation des travaux l'équipe PdS dans le cadre du projet européen CLOSED : "Closing the Loop of Sound Evaluation and Design" - Olivier Houix

June 27, 2007 01 h 12 min

Sparse overcomplete methods, matching pursuit and basis pursuit - Bob L. Sturm

July 11, 2007 48 min

Transformations de type et de nature de la voix - Snorre Farner, Axel Roebel, Xavier Rodet

September 12, 2007 01 h 07 min

Segmentations et reconnaissances automatiques de phonèmes de la voix, temps différé, temps réel - Pierre Lanchantin, Julien Bloit, Xavier Rodet

September 19, 2007 01 h 13 min

Synthèse de la parole à partir du texte et construction d'une base de données d'unités de la voix - Christophe Veaux, Grégory Beller, Xavier Rodet

September 26, 2007 01 h 00 min

Projet ECOUTE - Jerome Barthelemy, Nicolas Donin, Geoffroy Peeters, Samuel Goldszmidt

October 3, 2007 01 h 12 min

Projet MusicDiscover - David Fenech Saint Genieys

October 10, 2007 01 h 10 min

Projet CASPAR - Jerome Barthelemy, Alain Bonardi

October 24, 2007 50 min

Projet CONSONNES 1ère partie - René Caussé, Vincent Freour, David Roze

November 21, 2007 57 min

Pitch-Gesture Modeling Using Subband Autocorrelation Change Detection


Malcolm Slaney, invited by the Représentation Musicale team,
presents in English:

"Pitch-Gesture Modeling Using Subband Autocorrelation Change Detection"

Calculating speaker pitch (or f0) is typically the first computational step in modeling tone and intonation for spoken language understanding. Usually pitch is treated as a fixed, single-valued quantity. The inherent ambiguity in judging the octave of a pitch, together with spurious values, leads to errors in modeling pitch gestures that propagate through a computational pipeline.
We present an alternative that instead measures changes in the harmonic structure using a subband autocorrelation change detector (SACD).
This approach builds upon new machine-learning ideas for how to integrate autocorrelation information across subbands. Importantly, however, for modeling gestures, we preserve multiple hypotheses and integrate information from all harmonics over time. The benefits of SACD over standard pitch approaches include robustness to noise and to the amount of voicing. This is important for real-world data in terms of both acoustic conditions and speaking style.
We discuss applications in tone and intonation modeling, and demonstrate the efficacy of the approach in a Mandarin Chinese tone-classification experiment. Results suggest that SACD could replace conventional pitch-based methods for modeling gestures in selected spoken-language processing tasks.
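
The octave ambiguity mentioned in the abstract can be illustrated with a small sketch. This is not the SACD method presented in the talk; it is plain full-band autocorrelation on a synthetic harmonic signal, and the function names, sample rate, frame length, and threshold are all assumptions chosen for illustration. It shows why keeping several lag hypotheses can be preferable to committing to a single pitch value: the lag of one period and the lag of two periods (an octave lower) both score highly.

```python
import numpy as np

def normalized_autocorrelation(frame):
    """Autocorrelation for lags >= 0, scaled so that lag 0 equals 1."""
    frame = frame - frame.mean()
    full = np.correlate(frame, frame, mode="full")
    acf = full[len(frame) - 1:]
    return acf / acf[0]

def candidate_lags(acf, min_lag, max_lag, threshold):
    """Return every local maximum above `threshold`, not only the best one."""
    return [
        lag for lag in range(min_lag, max_lag)
        if acf[lag] >= threshold
        and acf[lag] > acf[lag - 1]
        and acf[lag] >= acf[lag + 1]
    ]

# Synthetic voiced frame (assumed values): 200 Hz fundamental, 5 harmonics.
sample_rate = 16000
f0 = 200.0
t = np.arange(2048) / sample_rate
frame = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 6))

acf = normalized_autocorrelation(frame)
period = int(round(sample_rate / f0))  # lag of one pitch period, in samples

# Keep all strong hypotheses in the searched lag range (80 Hz to 800 Hz here).
candidates = candidate_lags(acf, min_lag=20, max_lag=200, threshold=0.8)
for lag in candidates:
    print(f"lag {lag:4d} samples -> {sample_rate / lag:7.1f} Hz, score {acf[lag]:.3f}")
```

A single-valued pitch tracker has to pick one of these candidates per frame, and a wrong pick shows up as an octave jump in the pitch contour.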


Biography

Malcolm Slaney (Fellow, IEEE) is a Principal Scientist in Microsoft Research's Conversational Systems Research Center in Mountain View, CA.
Before that he held the same title at Yahoo! Research, where he worked on multimedia analysis and music- and image-retrieval algorithms in databases with billions of items. He is also a (consulting) Professor at Stanford University's Center for Computer Research in Music and Acoustics (CCRMA), Stanford, CA, where he has led the Hearing Seminar for the last 20 years.
Before Yahoo!, he worked at Bell Laboratory, Schlumberger Palo Alto Research, Apple Computer, Interval Research, and IBM's Almaden Research Center. For the last several years he has helped lead the auditory and attention groups at the NSF-sponsored Telluride Neuromorphic Cognition Workshop. He is a coauthor, with A. C. Kak, of the IEEE book Principles of Computerized Tomographic Imaging, which was republished by SIAM in their Classics in Applied Mathematics series. He is coeditor, with S. Greenberg, of the book Computational Models of Auditory Function.
Prof. Slaney has served as an Associate Editor of the IEEE TRANSACTIONS ON AUDIO, SPEECH, AND SIGNAL PROCESSING, IEEE MULTIMEDIA MAGAZINE, the PROCEEDINGS OF THE IEEE, and the ACM Transactions on Multimedia Computing, Communications, and Applications.

information

Type
Scientific and/or technical lecture
performance location
Ircam, Salle Igor-Stravinsky (Paris)
duration
01 h 00 min
date
September 4, 2013

