CLARIN: Common Language Resources and Technology Infrastructure for the Social Sciences and Humanities


1
CLARIN: Common Language Resources and Technology
Infrastructure for the Social Sciences and
Humanities
  • Steven Krauwer
  • Utrecht Institute of Linguistics UiL-OTS (NL)
  • INFuture, Zagreb
  • Nov 7 2007

2
Overview
  • Problem and mission
  • Some why-questions
  • Approach
  • How we work and who we are
  • Why this talk
  • Summing up

3
The problem
  • Much of the data in digital archives is language-based
  • Many archives only known to local insiders and
    mostly unconnected
  • Every archive has its own standards for storage
    and access, normally only simple retrieval of
    files (text, audio or video documents)
  • Social sciences and humanities researchers are
    often not aware of the potential benefits of
    using language and speech technology tools, and
    these tools are hard to use for non-specialists

4
The CLARIN Mission
  • What
    - Create an infrastructure that makes language
      resources and technology (LRT) available to
      scholars of all disciplines, especially the social
      sciences and humanities (SSH)
  • How
    - Unite existing digital archives into a federation
      of connected archives with unified web access
    - Provide language and speech technology tools as
      web services operating on language data in
      archives

5
Why a European infrastructure?
  • too much fragmentation
  • lack of coordination
  • lack of visibility
  • lack of interoperability
  • lack of sustainability
  • expertise exists but not in all countries
  • language-independent tools can be shared
  • language-dependent tools can often be ported
  • most countries cannot bear the cost alone

6
Why now?
  • Exponential growth of digital data
  • Maturity of language and speech technology
    - allows for high-speed processing
    - allows for large volumes
    - allows for new research questions
  • Growing interest at EU level in research
    infrastructures (RIs) for the European Research
    Area (ERA)
    - ESFRI RI Roadmap published in 2006 includes 34
      proposals for RIs
    - all of them will get EC funding for a
      preparatory phase of 1 to 3 years

7
Overall plan for CLARIN
  • Preparatory phase 2008–2010
    - Put everything in place to get started for real
    - Build a prototype
    - Budget in preparatory phase
      - 4.1 M from the EC
      - ??? M from participating countries
  • Construction phase 2011–2015
    - Build and populate with tools and resources
  • Exploitation phase 2016 onwards
    - CLARIN in full service
  • Overall budget 2008–2020: ca. 200 M

8
4-dimensional approach for the prep phase
  • The technical dimension
  • The language dimension
  • The user dimension
  • The governance and legal dimension

9
Technical
  • Technical specification of the infrastructure
  • Construction of a prototype
  • Validation on a rich variety of
    - languages (>20)
    - resources
    - services
  • based on existing resources and tools (i.e., not a
    digitization or tool-creation project)
  • Strong focus on interoperability standards
    - Conversion of existing resources
    - Encapsulation of existing tools

10
  • Strong sustainable centers

11
Languages
  • Intention to cover all languages spoken or
    studied in participating countries
  • Representational and descriptive standards should
    be adequate and validated for all languages
  • The same minimal coverage of basic resources and
    tools is to be defined for all languages (and
    implemented if additional funds are available)

12
Language activities
  • Survey of resources and tools, including
    - encoding and annotation data
    - quality indicators
  • agreeing on taxonomies and ontologies
  • agreeing on common standards
  • Focus on
    - integration of tools
    - interoperability
    - usage scenarios
  • if possible, creation of missing essential
    resources
  • validating specifications and prototype

13
User
  • Users are SSH scholars
  • Do WE know what they need?
  • Do THEY know what they need?
  • Actions
    - analyze past and ongoing SSH projects
    - user consultation
    - launch typical example projects to show potential
    - create expertise centers
    - awareness actions

14
Governance, funding and legal issues
  • Agree on, e.g.:
    - Who is going to pay for the construction and
      exploitation of the infrastructure?
    - How will the costs be shared?
    - How will it be managed?
    - How will it be coordinated with national policies?
  • Actions
    - Analyse best practice in funding and management
      of transnational projects
    - Prepare an agreement between (now) 22 countries
      about long-term joint funding of CLARIN
    - Set up an IPR framework

15
How we work
  • Most tasks executed in Working Groups
  • WGs consist of project partners and other experts
    (CLARIN is open to contributions by others!)
  • Some WGs do the actual work (e.g., building the
    prototype), others build consensus
  • Participation by others is essential since, e.g.,
    standards cannot be imposed by a small group
  • Unfortunately, no funding is available for WG
    participation by others: only influence!

16
Who we are
  • The CLARIN consortium has 32 partners from 22 EU
    and associated countries, including Croatia
    (FFZG)
  • The CLARIN community has 92 members in 32
    countries (Nov 2007)
  • Leading partners are
    - Utrecht University (Steven Krauwer, coordinator)
    - Max Planck Institute Nijmegen (Peter Wittenburg)
    - Hungarian Academy of Sciences (Tamas Varadi)

17
National vs EC funding
  • EC funds, managed by the consortium, will pay for
    - generic tasks (e.g., research, prototyping,
      coordination, dissemination)
    - participation by a single national coordination
      point in every country (in HR: FFZG Zagreb)
  • National funds, to be managed nationally, will pay
    for
    - participation by other sites in the country
    - taking care of own language and priorities
      (standards, validation, adaptation of tools and
      resources)
    - carrying out example humanities projects
    - (hopefully) participating in Working Groups

18
Why this talk?
  • Invitation to join CLARIN
  • We need user involvement
  • We need archives willing to join the federation
  • We need experts for our centers of expertise
  • We need example humanities projects for the
    preparatory phase

19
Summing up (1)
  • CLARIN is about to embark on its 3-year
    Preparatory Phase project aimed at designing and
    building an LRT infrastructure for the SSH
  • It can only work with support from the whole SSH
    community, both inside and outside the EU
  • Please join us if you feel you can and want to
    contribute. We don't pay you, but we don't charge
    you either: it's free!
  • Contact
    - http://www.clarin.eu, steven.krauwer@let.uu.nl
    - or your national contact point

20
Summing up (2)
  • One day any SSH scholar should be able to ask,
    without any difficulty:
    - List all uses of "enthusiasm" in 19th-century
      English novels written by women
    - Find all video clips of Tony Blair on the BBC in
      2007
    - Summarize Le Monde of October 7th, 2007 in
      Croatian