Center for Language and Speech Processing

Slides: 32
Provided by: Sanje7
Learn more at: https://www.cs.jhu.edu
Transcript and Presenter's Notes

1
Language and Speech Processing at Johns Hopkins
University
  • March 5, 2010

2
The JHU Center for Language and Speech Processing
CLSP was established in 1992 with outside support
to promote research and education in the science
and technology of speech and language.
Electrical and Computer Engineering
Cognitive Science
Computer Science
CLSP
Applied Math & Statistics
Biomedical Engineering
Human Language Technology Center of Excellence

3
Speech and Language Faculty at JHU (does not list
senior research staff, postdocs, students, ...)
  • Electrical & Computer Eng.
  • Andreas Andreou
  • Mounya Elhilali
  • Hynek Hermansky
  • Frederick Jelinek (Director)
  • Damianos Karakos
  • Sanjeev Khudanpur
  • Computer Science
  • Chris Callison-Burch
  • Jason Eisner
  • David Yarowsky
  • Applied Math & Statistics
  • Carey Priebe
  • Cognitive Science / Psychology
  • Justin Halberda
  • Geraldine Legendre
  • Kyle Rawlins
  • Paul Smolensky (Asst Dir)
  • Colin Wilson
  • Biomedical Engineering
  • Eric Young
  • Applied Physics Laboratory
  • James Mayfield
  • Christine Piatko
  • HLT Center of Excellence
  • Kenneth Church
  • Mark Dredze
  • Aren Jansen
  • Ben Van Durme

4
CLSP Vision Statement
  • Understand how human language is used to
    communicate ideas/thoughts/information.
  • Develop technology for machine analysis,
    translation, and transformation of multilingual
    speech and text.

5
CLSP Mission Statement
  • Research
  • Advance state of the art in our interdisciplinary
    field
  • Focus on developing key algorithms and
    statistical models
  • Focus on strategic languages, including
    low-resource languages
  • Education
  • Attract the best students and train them to be
    leaders
  • Offer full spectrum of courses
  • Conduct annual international summer school at JHU
  • Outreach
  • Be responsive to government and industry problems
  • Serve as a hub for the HLT community
  • Organize international summer research workshop
    at JHU
  • Welcome short- and long-term visitors

6
Research: Primary Areas
  • Natural Language Processing
  • Morphological analysis
  • Syntactic analysis (parsing)
  • Information extraction
  • Co-reference resolution
  • Machine Translation
  • Low-resource languages
  • Arabic and Chinese
  • Knowledge-Base Population
  • Automatic content extraction
  • Inference and learning
  • Machine Learning
  • Small-sample learning
  • Structured prediction
  • Minimally supervised learning
  • Speech Recognition
  • Acoustic processing
  • Acoustic-phonetic modeling
  • Pronunciation modeling
  • Language modeling
  • Speech Applications
  • Keyword spotting
  • Spoken term detection
  • Speaker verification
  • Language identification
  • Speech Science
  • Auditory physiology
  • Neuromorphic signal processing

7
Sponsored Research in Speech & Language in WSE is
2.5M/year
Principal Investigator | Project Title (Granting Agency) | Period | Amount
Jelinek | Investigation of Meaning Representation in Language Understanding (NSF) | 10/05-01/10 | 2.5 M
Jelinek | Cross Cutting Research Workshops in Intelligent Information Systems (NSF) | 09/07-08/11 | 830 K
Eisner | Finite-State Machine Learning on Strings and Sequences (NSF) | 02/04-01/10 | 500 K
Khudanpur | Rosetta: An Analyst Co-pilot (DARPA/IBM) | 10/05-04/11 | 3.4 M
Eisner | Learned Dynamic Prioritization (NSF) | 09/10-08/14 | 1.2 M
Hager | Gesture Induction for Manipulative and Interactive Tasks (NSF) | 02/06-01/10 | 490 K
Smolensky | Unifying the Science of Language (NSF IGERT) | 05/06-04/11 | 3.0 M
Vice Provost | Human Language Technology Center of Excellence (MPO) | 01/07-01/17 | 50 M
Andreou | Energy Efficient Organic Semiconductor Circuits (DOE) | 05/07-04/10 | 660 K
Yarowsky | Multi-Level Modeling of Language and Translation (NSF) | 06/07-06/10 | 400 K
Karakos | Novel Approaches to Unsupervised Classification via ISPDTs (NSF) | 09/07-08/10 | 300 K
Callison-Burch | DARPA Computer Science Study Group (DARPA) | 02/08-02/09 | 93 K
Jelinek | Research Workshops in Intelligent Information Systems (Google) | 06/08-05/10 | 270 K
Khudanpur | Self-Supervised Discriminative Training of Statistical Language Models (NSF) | 09/08-08/09 | 137 K
Khudanpur | Self-Training for ASR in Low Resource Languages (BBN) | 09/09-08/10 | 101 K

8
CLSP Mission Statement
  • Research
  • Advance state of the art in our interdisciplinary
    field
  • Focus on developing key algorithms and
    statistical models
  • Focus on strategic languages, including
    low-resource languages
  • Education
  • Attract the best students and train them to be
    leaders
  • Offer full spectrum of courses
  • Conduct annual international summer school at JHU
  • Outreach
  • Be responsive to government and industry problems
  • Serve as a hub for the HLT community
  • Organize international summer research workshop
    at JHU
  • Welcome short- and long-term visitors

9
Education: Interdisciplinary Environment
  • Who and where
  • PhD, MSE, and BS students from multiple depts.
  • Shared interdisciplinary offices in CSEB
  • Shared technical perspective and computing
    infrastructure
  • Coursework
  • Interdisciplinary core curriculum (extends dept.
    requirements)
  • Variety of other relevant courses (growing list,
    new plans)
  • International 2-week summer school
  • Research
  • Students do research from the start
  • Students work with faculty from multiple
    departments and HLTCOE
  • Other learning
  • Distinguished outside speaker every week
  • Student speaker and town meeting every week
  • Reading groups and conference travel

10
Sample Courses for an MSE in Human Language
Technology
Course Number | Course Title | Instructor
CS 600.465 | Natural Language Processing | Eisner
CS 600.466 | Information Retrieval and Web Agents | Yarowsky
CS 600.425 | Declarative Methods | Eisner
AMS 550.732 | Pattern Recognition | Priebe
COG 050.320 | Introduction to the Syntax of Natural Language | Legendre
COG 050.325 | Sound Structure in Natural Language | Burzio
COG 050.825 | Optimality Theory | Smolensky
ECE 520.445 | Introduction to Speech and Audio Processing | Elhilali
ECE 520.447 | Introduction to Information Theory and Coding | Jelinek
ECE 520.651 | Random Signal Analysis | Khudanpur
ECE 520.666 | Information Extraction from Speech and Text | Jelinek
ECE 520.674 | Information Theoretic Methods in Statistics | Khudanpur
ECE 520.682 | Computational Systems Neuroscience | Elhilali
ECE 520.735 | Sensory Information Processing | Andreou

11
Education: Track Record
  • CLSP PhDs presently hold senior
    technical/research positions at
  • Apptek
  • BBN
  • Convergys
  • e-Scription
  • Fair Isaac
  • Google (several)
  • Microsoft (several)
  • MITRE
  • IBM (several)
  • NSA (several)
  • Nuance (several)
  • SRI International
  • WSE has the 2nd largest university group in the
    U.S. working on Human Language Technology
  • 38 PhDs awarded, many more MSEs
  • CLSP PhDs presently hold research/faculty
    positions at
  • Carnegie Mellon University
  • U. of Massachusetts, Amherst
  • Swarthmore College
  • Michigan State University
  • Hong Kong Polytechnic Univ.
  • Bogazici University (Turkey)
  • U. of Karlsruhe (Germany)
  • Saarland University (Germany)

12
CLSP Mission Statement
  • Research
  • Advance state of the art in our interdisciplinary
    field
  • Focus on developing key algorithms and
    statistical models
  • Focus on strategic languages, including
    low-resource languages
  • Education
  • Attract the best students and train them to be
    leaders
  • Offer full spectrum of courses
  • Conduct annual international summer school at JHU
  • Outreach
  • Serve as a hub for the HLT community
  • Be responsive to government and industry problems
  • Organize international summer research workshop
    at JHU
  • Welcome short- and long-term visitors

13
JHU Summer Workshops in HLT: Integrating Research
and Education
  • Organized by JHU on behalf of the Human Language
    Technology field
  • 3 teams per summer (since 1995)
  • selected and refined from ? 25 proposals by
    interactive peer review
  • each team comes to JHU for 8 weeks of intense
    collaborative research
  • Mixed teams of senior and student researchers
  • Team: 3 academics, 1 industry, 1 govt, 2-3 grad
    students, 2 undergrads
  • 30 participants × 8 weeks × 15 years
  • More than 160 star students trained in HLT
    research (1998-2007)
  • Outcomes
  • Numerous research breakthroughs
  • New, long-term collaborations, tangible knowledge
    transfer
  • Diverse expertise, research infrastructure, data
    resources

14
JHU Summer Workshops in HLT: Integrating Research
and Education
  • 2010 Topics
  • Speech Recognition with Segmental CRFs
  • Spatial and Temporal Localization of Objects and
    Actions in Videos Using Text and Video Analysis
  • Models of Synchronous Grammar Induction for
    Statistical Machine Translation
  • 2009 Topics
  • Parsing the Web: Large-Scale Syntactic Processing
  • Speech Recognition for New Languages and Domains
  • Unsupervised Acquisition of Lexical Knowledge
    from n-Grams

15
A Few of Many Workshop Accomplishments
  • A small sample of research results and their
    wider impact
  • Statistical Machine Translation (1999)
  • GIZA is extensively used to build SMT systems
    even today
  • MEAD: Multilingual Multi-document Summarization
    (2001)
  • 100s of worldwide users, active developers in the
    community
  • SuperSID: High-level information for Speaker-ID
    (2002)
  • Major breakthrough in speaker recognition
    technology
  • Factored Language Models (2002)
  • Improved ASR technology for conversational Arabic
  • Moses Machine Translation Repository (2006)
  • The de facto standard in statistical machine
    translation
  • More than 100 refereed publications
  • Detailed technical reports also available on CLSP
    web-site

16
Human Language Technology Center of Excellence at
JHU
  • Long-term research mission: automatically analyze
    a wide range of speech, text, and document images
    in multiple languages.
  • Founded with government support in 2007
  • Has brought many new researchers and research
    challenges into the CLSP community
  • Aggressively hiring the top new Ph.D.s nationally

17
Human Language Technology Center of Excellence at
JHU
Sponsor R&D Leadership
JHU Provost
Whiting School of Engineering
Executive Director
Administrative Staff
Sponsor Technical Board
Security Staff
Director of Research
Researchers
Sponsor Researchers
Center for Language and Speech Processing
18
Prof. Andreas G. Andreou: Sensory Information
Processing in Natural and Synthetic Systems
  • Applications
  • Algorithms for robust ASR
  • Robust acoustic feature representation and
    dimensionality reduction
  • Algorithms and architecture optimization for Chip
    Multi Processors (CMP) in Exascale systems
  • Multimodal scene analysis
  • Active and passive processing for scene analysis
    (visual & auditory)
  • Acoustic and EM micro-Doppler imaging
  • Bio-inspired systems
  • Energy efficient microsystems for processing "what"
    and "where" in natural environments.
  • Research
  • Principles of sensory information processing in
    biology.
  • Sensory communication.
  • Algorithms and processor architecture design for
    energy efficient acoustic, speech and vision
    processing.
  • Physics of sensing and computation.

19
Prof. Chris Callison-Burch: Statistical Machine
Translation
  • Research
  • Statistical machine translation
  • Syntactic translation models
  • Low resource languages
  • Data-driven paraphrasing
  • Evaluation measures, creation of shared data
    resources

20
Prof. Kenneth Church: Human Language Technology
(HLT) at Scale
  • Applications
  • Web search
  • Cloud computing
  • Language modeling
  • Text analysis
  • Spelling correction
  • Word-sense disambiguation
  • Terminology
  • Translation
  • Lexicography
  • Compression
  • Speech recognition and synthesis
  • OCR
  • Research
  • Speech Processing at Scale
  • Language Processing at Scale
  • Web Search at Scale
  • Mining Speech/Language with Zero Linguistic
    Resources

21
Prof. Mark Dredze: Applications of Machine
Learning to Real-World Text Processing
  • Applications
  • Domain adaptation
  • Extending NLP models to new datasets
  • Cross-domain learning
  • Applying NLP techniques to languages with few
    resources
  • Knowledge base population
  • Building large high precision knowledge bases
    from text
  • Intelligent email
  • Improved email clients by aiding the user with
    artificial intelligence
  • Research
  • Adaptation of machine learning algorithms between
    text domains
  • Large scale information processing and learning
  • Intelligent user interfaces for information
    management

22
Prof. Jason Eisner: Algorithms and Models for
Language Processing
  • Applications
  • Parsing sentence structure
  • Faster and more accurate algorithms
  • Unsupervised or cross-lingual learning
  • Machine translation
  • Model syntax, structure, word order
  • Combinatorial methods for translation and for
    training models
  • Morphology / phonology
  • Word spelling and pronunciation
  • Variant word forms (conjugation, transliteration,
    misspelling, ...)
  • Information integration
  • Truth maintenance
  • Deductive databases
  • Reasoning from facts in text
  • Research
  • Novel algorithms for NLP
  • Bayesian statistical models of linguistic
    structure
  • Machine learning (structured prediction, novel
    training objectives)
  • Declarative formalisms for grammars and
    algorithms

23
Prof. Mounya Elhilali: Reverse Engineering the
Neurobiology of Speech and Audio Processing
  • Applications
  • Speech intelligibility in noise and distortions
  • Auditory scene analysis and speaker segregation
  • Speech enhancement
  • Hearing prostheses
  • Adaptive audio systems
  • Robotics and autonomous systems
  • Object tracking in sensor networks
  • Communication channels
  • Microphone Design
  • Research Goals
  • Information representation and computational
    strategies employed by the brain
  • Sound perception in distorted or complex acoustic
    environments

24
Prof. Hynek Hermansky: Robust Acoustic Speech
Processing
  • Applications
  • Speech recognition
  • what has been said?
  • Speaker identification
  • who is speaking?
  • Speaker verification
  • is the talker the one claimed to be?
  • Language identification
  • which language is being used?
  • Speech and audio coding
  • how to store/transmit the signal efficiently?
  • Enhancement of degraded speech
  • how to make noisy or reverberant speech easier
    to listen to?
  • Technology
  • Proprietary techniques based on temporal cues in
    the signal and on artificial neural net
    post-processing
  • Emulations of auditory processing in biology

25
Prof. Aren Jansen: Knowledge-based Approaches to
Speech Processing
  • Applications
  • Noise-Robust Speech Recognition
  • Invariance and efficiency through sparsity
  • Low-Resource Speech Recognition
  • What can be done with little or no transcribed
    training data?
  • Spoken Term Detection and Discovery
  • Google for speech documents
  • Query-by-example vs. text queries
  • Large-Scale Speech Processing
  • Scaling speech technology to massive problem sizes
  • Research
  • Pursuit of more invariant representations of
    speech
  • Unsupervised/semi-supervised learning of speech
    units
  • Sparse representations and models
  • Computational models of human speech perception

26
Prof. Frederick Jelinek: Statistical Speech
Recognition and Machine Translation
  • Research
  • Statistical aspects of Automatic Speech
    Recognition (ASR)
  • Language Modeling
  • Predicting next word given the past
  • Reconstruction of ASR output
  • Create a grammatical sentence preserving the
    speaker's intended meaning
  • Rescoring of ASR output alternatives
  • Search algorithms for ASR and Machine Translation
  • Interests
  • Statistical grammar and parsing
  • Signals and systems
  • ASR treatment of out-of-vocabulary words and
    phrases
  • Machine translation

27
Prof. Damianos Karakos: Statistical Aspects of
Speech and Language
  • Applications
  • Speech recognition
  • Adaptation to the speech topic
  • Error corrective techniques
  • Machine translation
  • System combination
  • Language modeling
  • Document categorization
  • Automatic clustering into meaningful categories
  • Detection of topics of interest
  • Technology
  • Data fusion and dimensionality reduction for
    improved inference in text classification.
  • Novel language modeling techniques for speech
    recognition.

28
Prof. Sanjeev Khudanpur: Statistical Modeling for
Information Processing
  • Applications
  • Automatic speech recognition
  • Domain and genre adaptation
  • Pronunciation variability modeling
  • Machine translation (text & speech)
  • Output language word ordering
  • Context dependent translation
  • Multimedia search and retrieval
  • Searching large speech archives
  • Content-based image/video search
  • Robotic minimally invasive surgery
  • Automated skill assessment
  • Automated surgical training
  • Basic Research
  • Stochastic Modeling of Signals and Systems
  • Parameter Estimation
  • Model Structure Estimation
  • Information Theory and Statistics

29
Prof. Benjamin Van Durme: Computational
Semantics and Large-Scale Text Processing
  • Applications
  • Knowledge Acquisition
  • Enable everyday reasoning
  • Formal interpretation of generic sentences (e.g.,
    dictionary definitions)
  • Deep Information Extraction
  • Infer implicit relations
  • Semantic language modeling
  • Recognize higher order modification of factoids
  • Organizing Social Media
  • Dynamic clustering of authors, documents, feeds
  • Research
  • Application of theoretical semantics to problems
    in language technology
  • Streaming algorithms for efficient processing of
    large text collections

30
Prof. David Yarowsky: Minimally Supervised
Learning for Low-Resource Languages
  • Applications
  • Machine Translation
  • Translation discovery without aligned bilingual
    text
  • Exploiting language universals and language
    family relationships
  • Natural Language Processing
  • Word sense disambiguation
  • Inflectional and derivational morphology
  • Information Extraction
  • Biographic fact extraction
  • Characterizing communicants
  • Informal genres
  • Basic Research
  • Cross-language information projection
  • Cross-domain knowledge transfer
  • Co-training
  • Active learning and human computation
  • Creative bootstrapping from multiple knowledge
    sources

31
Linguistics and Human Language Processing
Prof. Paul Smolensky: Architecture of Universal
Grammar
Prof. Colin Wilson: Theoretical, Experimental, &
Computational Phonology
Prof. Geraldine Legendre: Syntax, Morphology,
Acquisition
Prof. Kyle Rawlins: Formal & Computational
Semantics
Prof. Justin Halberda: Word Learning in Children
A new professor: Human Sentence Processing

32
Lots of Great Ph.D. Students: The Next Big Things!
Welcome!