Helsinki Institute for Information Technology Scientific Advisory Board Meeting November 15-17, 2004 - PowerPoint PPT Presentation

Provided by: martt153


1
Helsinki Institute for Information Technology
Scientific Advisory Board Meeting
November 15-17, 2004
2
Participants
  • Prof. Alberto Apostolico
  • Prof. Christos Faloutsos
  • Prof. Bengt Jonsson
  • Prof. Randy Katz
  • Prof. Martin Kersten
  • (Prof. Kari-Jouko Räihä)
  • Prof. Mart Saarma
  • Prof. John Shawe-Taylor
  • Prof. Jukka Paakki
  • Prof. Olli Simula
  • Dr. Patrik Floréen
  • Dr. Aapo Hyvärinen
  • Prof. Heikki Mannila
  • Prof. Petri Myllymäki
  • Prof. Martti Mäntylä
  • Prof. Kimmo Raatikainen
  • Prof. Hannu Toivonen
  • Prof. Esko Ukkonen
  • Prof. Eero Hyvönen
  • Dr. Marko Turpeinen
  • Dr. Giulio Jacucci
  • Dr. Greger Lindén

3
Goals of the meeting
  • Second meeting of the Scientific Advisory Board
  • Obtain feedback from the Scientific Advisory
    Board on the relevance, quality, and impact of
    the current research
  • Obtain feedback and suggestions on plans for the
    research themes, applications, collaborations,
    etc.
  • A written evaluation of each group, covering:
  • Scientific quality, innovativeness, productivity
    and impact
  • Quality and quantity of industrial and societal
    impact
  • Feasibility and innovativeness of future plans
  • Competence and expertise of the team
  • Main strengths and weaknesses
  • Overall evaluation and suggestions from the SAB

4
Agenda for Monday 15 Nov 04
  • 14.00 Welcome, introductions, overview of HIIT
    (Martti Mäntylä and Esko Ukkonen)
  • Basic Research Unit activities
  • 16.00 Data Mining General (Heikki Mannila)
  • 16.30 Data Mining Applications (Hannu Toivonen)
  • 17.00 Neuroinformatics (Aapo Hyvärinen)
  • 17.30 Adaptive Computing Systems (Patrik Floréen)
  • 18.00-18.30 SAB internal discussions
  • 20.00 Dinner at Restaurant Kappeli,
    Eteläesplanadi 1

5
Agenda for Tuesday 16 Nov 04
  • Basic Research Unit, at Kumpula campus
  • 9.30 Demonstrations and discussion with
    researchers
  • 11.30 Lunch at Kumpula campus, Chemicum
  • 12.30 Transportation to HTC

6
Agenda for Tuesday 16 Nov 04
  • Advanced Research Unit Activities, HTC
  • 13.00 Mobile Computing (Kimmo Raatikainen)
  • 13.30 Semantic Computing (Eero Hyvönen)
  • 14.00 User Experience Research (Martti Mäntylä,
    Giulio Jacucci)
  • 14.30 Break and refreshments
  • 14.45 Complex Systems Computation Group (Petri
    Myllymäki)
  • 15.15 Digital Contents Communities Group (Marko
    Turpeinen)
  • 15.45 Digital Economy (Jukka Kemppinen)
  • 16.15 Break and refreshments
  • 16.30 Demonstrations and discussion with
    researchers
  • 20.30 Dinner at Restaurant George, Kalevankatu
    17, Helsinki

7
Agenda for Wednesday 17 Nov 04
  • At HTC
  • 9.30 A la carte (SAB may request discussions,
    interviews, further demonstrations)
  • 10.30 SAB internal discussions
  • 12.00 Lunch at Aqua restaurant in High Tech
    Center
  • 13.00 Feedback from SAB and discussion

8
Helsinki Institute for Information Technology
  • Joint research institute of University of
    Helsinki and Helsinki University of Technology
  • Goals: strategic research in information
    technology and related topics, aiming at high
    scientific, industrial, and societal impact
  • Main themes: mobile computing, user experience,
    intelligent systems, semantic Internet, societal
    media, digital economy, adaptive computation,
    data mining, bioinformatics, computational
    neuroscience

9
Organization: two parts
  • Advanced Research Unit (ARU), since 1999
  • 2-3 year industry co-funded strategic research
    projects, CEC research, basic research
  • Located primarily in HTC in Ruoholahti
  • Martti Mäntylä, Research Director
  • Basic Research Unit (BRU), since 2002
  • Long-term research in areas relevant to other
    sciences and to industry
  • Located in the premises of the departments of
    computer science of the University of Helsinki
    and Helsinki University of Technology
  • Esko Ukkonen, Research Director

10
UH Comp. Sci.
TKK CS
BRU
ARU
11
Organization of HIIT
  • Joint board, scientific advisory board, and
    industrial advisory board
  • The senior researchers typically have positions
    also in one of the departments of computer
    science
  • No permanent positions

12
[Diagram: HIIT BRU, HIIT ARU, other sciences, and companies placed by time horizon (0-2 years, 2-5 years, over 5 years) and level of risk; labelled activities: basic research and core projects, industry co-funded research, advanced research, advanced development, industrial R&D]
13
Basic Research Unit (BRU)
  • established 2002
  • basic funding from UH
  • main location in the premises of CS Dept of UH
    (new Exactum building at Kumpula Campus)
  • activities also in Otaniemi campus of TKK, CS
    Dept
  • infrastructure of CS Dept
  • Directors: Heikki Mannila until 9/2004, Esko
    Ukkonen from 9/2004

14
Mode of operation
  • high-quality basic research of computer science
    on areas that have application potential in other
    sciences or in industry
  • collaboration between universities
  • close co-operation with CS departments
  • participation in teaching
  • international networking, international recruiting

15
Personnel profile (2003)
  • senior researchers 5
  • prof. H. Mannila, prof. H. Toivonen, prof. J.
    Hollmen, doc. P. Floréen, doc. A. Hyvärinen
  • PhDs 8
  • PhD students 17
  • students 12
  • adm 1
  • current total about 50 (TKK 10)
  • from abroad 6 - 8

16
Funding profile (2004)
  • basic funding from UH (33%) 660 kE
  • Academy of Finland (32%) 645 kE; includes
    research grants, an academy professor position,
    a senior researcher position, and 3 postdoc
    positions
  • industry projects (incl. TEKES) 226 kE
  • graduate schools (Ministry of Education) 259 kE
  • European Union 116 kE
  • TKK 70 kE
  • TOTAL 1976 kE
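The percentages in the funding list are each source's share of the total. A minimal Python check using only the kE figures from the slide; truncating to whole percent is an assumption, but it reproduces the 33% and 32% shown:

```python
# BRU funding in 2004, in kE, as listed on the slide.
funding_keur = {
    "UH basic funding": 660,
    "Academy of Finland": 645,
    "Industry projects (incl. TEKES)": 226,
    "Graduate schools (Ministry of Education)": 259,
    "European Union": 116,
    "TKK": 70,
}

total = sum(funding_keur.values())
# Whole-percent share of each source, truncated by integer division.
shares = {source: 100 * amount // total for source, amount in funding_keur.items()}

for source, amount in funding_keur.items():
    print(f"{source}: {amount} kE ({shares[source]}%)")
print(f"TOTAL: {total} kE")
```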

17
Research programme
  • Theory and applications of data mining (Heikki
    Mannila, Hannu Toivonen)
  • Neuroinformatics (Aapo Hyvärinen)
  • Adaptive computing (Patrik Floréen)

18
Goals for BRU for 2005-2006
  • Expanding and strengthening of the network of
    collaboration in Finland and internationally
  • Strong recruiting from abroad
  • More emphasis on software distribution
  • Emphasis on high-quality international research
  • Possibly opening a new research theme

19
Advanced Research Unit (ARU)
  • Established in 1999
  • About 80 researchers and staff
  • Main themes:
  • Future Internet
  • Intelligent Systems
  • Network Society
  • Funding (2004): National Technology Agency (59%),
    companies (15%), European Union (6%), Academy of
    Finland (9%), universities (9%); total some
    3.8 M€
  • Director Prof. Martti Mäntylä
  • Primarily located in High Tech Center, Ruoholahti

20
Mode of Operation
  • Focusing on a few industrially relevant,
    strategic, long-term research areas with high
    potential impact
  • Core and basic research projects with long-term
    vision (over 5 years)
  • Medium term (3-5 years) projects focusing on new
    products, services, and technologies
  • Complementary impact-related projects (1-2 years)
  • Multidisciplinary and cross-disciplinary research
    founded on computer science competence
  • Networking with complementary research groups
  • Strong liaison with ICT and media companies
  • International cooperation

21
Personnel
  • 5 principal scientists
  • Prof. Martti Mäntylä, Prof. Kimmo Raatikainen,
    Prof. Petri Myllymäki, Prof. Jukka Kemppinen,
    Prof. Eero Hyvönen
  • 11 senior researchers
  • Dr. Pekka Nikander, Dr. Ken Rimey, Dr. Timo
    Saari, Prof. Henry Tirri, Dr. Wray Buntine, Dr.
    Jorma Rissanen, Dr. Patrik Floréen, Dr. Pekka
    Himanen, Dr. Marko Turpeinen, Dr. Timo Saari, Dr.
    Markku Stenborg
  • 2 post docs
  • Dr. Andrei Gurtov, Dr. Giulio Jacucci
  • 45 Ph.D. students
  • 10 M.Sc. students
  • 7 staff

22
Goals for 2005-2006
  • Maintain, upgrade, and expand competences of
    research groups (post-docs, senior researchers)
  • Increase further co-operation between research
    groups
  • Build new alliances with Finnish research units
  • Strengthen further existing international
    partnerships (UCB, Tsinghua), launch new ones
    (MLE, Waseda, KTH, ...)
  • Expand CEC funded research
  • Commence work on one or two new thematic areas
  • Caring for researchers and their careers
  • Operational excellence
  • Approximately 85-90 researchers

23
Challenges
  • Bottlenecks limiting the impact
  • Load of senior researchers
  • Inadequate processes and instruments for
    end-to-end research
  • Weak link with interesting users and user
    communities
  • Slow reaction speed, low risk tolerance
  • Red tape in recruiting foreign researchers
  • Funding and funding instruments
  • Less than 10% basic funding is not sustainable
  • Inadequate instruments for research testbeds

24
HIIT 2008 strategy draft
  • the first HIIT contract between UH and TKK for
    the period 1999-2004
  • new contract planned from 1 Aug 2005
  • strategy paper presented to the Board of HIIT in
    Sept 2004

25
Main principles
  • highest international level in research
  • one organization
  • requires balanced basic funding from UH and TKK
  • selection of research programmes (4-6) in
    co-operation between the Board, SAB, Industrial
    advisory board, and the research groups working
    in HIIT; regular evaluation
  • co-operation between UH and TKK
  • collaboration with the CS Departments at UH and
    TKK, with industry, and with research
    institutions in Finland and abroad

26
Main principles (cont.)
  • internationalisation, international recruiting
  • participation in teaching
  • caring for researchers and their careers
  • no permanent positions
  • location in Kumpula (UH) and in Otaniemi (TKK)
  • long-term funding, competition based

27
Actions
  • Only modest growth, but HIIT should remain large
    enough for multicomponent projects.
  • Participation in teaching and other collaboration
    with mother departments will grow.
  • Stronger role for the SAB and industrial advisory
    board in choosing research themes

28
Actions (cont.)
  • Reforming the organization by combining ARU and
    BRU
  • ARU will relocate to Kumpula and Otaniemi,
    schedule depending on the availability of
    suitable locations. Kumpula V finished in autumn
    2007.
  • Financing relies too much on short-term external
    funding. More long-term basic funding is needed,
    for example for post-doc positions

29
Research programmes
  • HIIT starts and maintains certain research
    programmes
  • a programme is started by decision of the Board
  • For each programme, HIIT will fund
  • research director/principal scientist (can be
    part-time)
  • senior researcher/postdoc positions (0-2)
  • some seed money for other positions
  • the programme will seek additional funding from
    other sources (Academy, TEKES, EU, industry, ...)
  • a programme has several groups and links to
    partners
  • selection of senior researchers/groups using
    competition and evaluation

30
Data mining general
  • Heikki Mannila
  • Academy professor
  • HIIT Basic Research Unit
  • Helsinki University of Technology and University
    of Helsinki

31
Data mining
  • Data analysis is becoming more important in other
    sciences and in industry
  • New measurement methods
  • Ability to store data
  • High-dimensional large data sets
  • Non-traditional forms (e.g., strings, trees,
    graphs)
  • Data analysis lags behind

32
Data mining
  • Has emerged as a major research area in the
    interface of computer science and statistics
  • Machine learning, databases, algorithms
  • Data analysis questions are increasingly visible
    in database and algorithms research
  • Theory and practice interact
  • Fits very well within the overall mission of HIIT
  • Basic research in computer science
  • Fast applicability, possibility of impact

33
Goals
  • Develop novel data analysis techniques for the
    use of other sciences and industry
  • How?
  • Look at data analysis problems arising in
    practice
  • Abstract new computational concepts from them
  • Analyse the concepts and develop new
    computational methods
  • Take the results into practice
  • Theoretical work in algorithms and foundations of
    data analysis can have fast impact in the
    application areas
  • The applications feed interesting novel questions
    to theoretical research

34
Data mining research in HIIT
  • Three senior researchers (Mannila, Toivonen,
    Hollmén)
  • Operates on the two campuses (UH Kumpula, HUT
    Otaniemi)
  • Research groups with no strict borders, lots of
    interaction
  • Interaction with the adaptive computing systems,
    neuroinformatics, and complex systems computation
    groups

35
Data mining groups in HIIT
  • Mannila (UH, HUT) (10-13 persons)
  • theory of data mining, discrete methods,
    segmentation etc.
  • genome structure, paleontology, linguistics
  • Toivonen (UH) (6-8 persons)
  • pattern discovery, algorithms
  • gene mapping, haplotyping, context awareness,
    paleoecology etc.
  • Hollmén (HUT) (5-9 persons)
  • mixture modeling, pattern discovery, bootstrap
    methods, sparse regression
  • gene expression, environmental modeling

36
Events in 2004
  • Very good success in obtaining funds from the
    Academy of Finland
  • Two postdoc positions
  • Two projects in the SysBio program
  • Academy professorship and a largish grant
  • Good success in international recruiting
  • Panayiotis Tsaparas, Alexander Hinneburg,
    (Aristides Gionis in 2003)
  • EU funding (April II, MobiLife)
  • Industrial projects
  • Gene mapping (Tekes)
  • Phenotype clustering (direct industrial funding)
  • Visiting students

37
Researchers
  • Heikki Mannila, Hannu Toivonen, Jaakko Hollmén,
    Aristides Gionis, Panayiotis Tsaparas, Ella
    Bingham, Alexander Hinneburg, Marko Salmenkivi,
    Mikko Koivisto, Saara Hyvönen, Petteri Sevon
  • Postdoc education!
  • 12 Ph.D. students
  • Good international visibility

38
Ph.D. theses since last SAB
  • Ella Bingham
  • Mikko Koivisto
  • Petteri Sevon
  • Kari Vasko
  • Matti Kääriäinen
  • Real soon now
  • Taneli Mielikäinen
  • Jouni Seppänen

39
Publications in major conferences in 2004
  • SIGMOD
  • PODS
  • VLDB
  • ISC
  • KDD
  • PSB
  • ICDE
  • ICDM
  • PKDD
  • EDBT
  • ...

40
Choice of research areas
  • Foundational interest and applicability
  • Relevance of the methods
  • Relevance of the application areas
  • Impact for the cooperating groups in other
    sciences
  • Methods that will be used (by collaborators)
  • Methods influencing the research agendas and
    data gathering practices of the partners
  • Novel questions
  • Industrial impact: relevant problems, useful
    solutions

41
Major themes in methods
  • Pattern discovery
  • Pattern discovery and probabilistic modelling
  • Methods for sequence decomposition
  • Similarity of complex objects
  • High-dimensional spatial data
  • Decomposition of discrete data

42
Application areas
  • Genome structure
  • Gene mapping
  • Gene expression data analysis
  • Ubiquitous computing (adaptive computing)
  • Palaeontology, ecology, paleoecology
  • Climate studies
  • Linguistic applications
  • Onomastics, study of variation in language,
    dialect studies

43
Examples of current work on theory of data mining
  • Random walks on databases (Geerts, Terzi,
    Mannila)
  • Distance measures between data sets (Tatti)
  • Clustering aggregation (Tsaparas, Gionis,
    Mannila)
  • Approximating a collection of frequent sets
    (Afrati, Gionis, Mannila)
  • (k,h)-segmentation (Gionis, Mannila, Haiminen,
    Terzi)
  • Vocabularies from sequences (Gionis, Tsaparas,
    Wexler, Mannila)
  • Subspace discovery (Seppänen, Gionis, Tsaparas,
    Hinneburg)
  • Segmentation distances (Terzi)
  • Tiles from 0/1 data (Seppänen, Gionis, Mannila)
  • Condensed representations (Mielikäinen, Toivonen)
  • Metric labeling and spatial data (Salmenkivi,
    Tsaparas, Papadimitriou, Leino, Gionis, Mannila,
    Terzi)

44
Example: random walks on databases
  • How to generalize HITS etc. to work on databases
    (and not just graphs)
  • Given a class of queries
  • State space: partial tuples from the database
  • Transitions t → u, if there is a query Q such that
    u belongs to Q(t)
  • Use to rank query answers etc.
  • Quite nice results
  • Geerts, Mannila, Terzi, VLDB 2004
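A toy Python sketch of walking along query answers and ranking states by how often the walk visits them. Everything concrete is hypothetical: the AUTHORED relation, the two queries standing in for "a class of queries", single attribute values used as states instead of partial tuples, and PageRank-style scoring with uniform restarts (damping 0.85) in place of HITS. It illustrates the idea only and is not the formulation of Geerts, Mannila and Terzi.

```python
from collections import defaultdict

# Hypothetical toy database: an "authored" relation between authors and papers.
AUTHORED = [("alice", "p1"), ("alice", "p2"), ("bob", "p2"), ("bob", "p3"), ("carol", "p3")]

def papers_of(value):
    """Query Q1: papers written by the given author (empty for non-authors)."""
    return [p for a, p in AUTHORED if a == value]

def authors_of(value):
    """Query Q2: authors of the given paper (empty for non-papers)."""
    return [a for a, p in AUTHORED if p == value]

def build_transitions(states, queries):
    """Edge t -> u whenever u belongs to Q(t) for some query Q in the class."""
    out = defaultdict(set)
    for t in states:
        for q in queries:
            out[t].update(q(t))
    return out

def walk_scores(states, out, damping=0.85, iters=100):
    """Stationary scores of a random walk with uniform restarts (PageRank-style)."""
    n = len(states)
    score = {s: 1.0 / n for s in states}
    for _ in range(iters):
        new = {s: (1.0 - damping) / n for s in states}
        for t in states:
            targets = out[t]
            # A state with no outgoing edges spreads its mass uniformly.
            spread = targets if targets else states
            share = damping * score[t] / len(spread)
            for u in spread:
                new[u] += share
        score = new
    return score

states = sorted({a for a, _ in AUTHORED} | {p for _, p in AUTHORED})
scores = walk_scores(states, build_transitions(states, [papers_of, authors_of]))
ranked = sorted(states, key=scores.get, reverse=True)
```

Paper p2, written by two authors, receives walk mass from both of them and so outranks p1, which only alice links to.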

45
Example: clustering aggregation
  • Given k clusterings C1, C2, ..., Ck, find a
    clustering C that minimizes the sum of the numbers
    of disagreements between each clustering Ci and C
  • Robustness of clustering algorithms
  • Clustering categorical data
  • Detecting outliers
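The objective counts, for each input clustering, the object pairs that it and C treat differently (together in one, apart in the other). A minimal Python sketch of that disagreement count, plus a simple baseline that returns whichever input clustering has the smallest total disagreement with all inputs. The baseline is chosen for illustration only; the group's algorithms and their guarantees are in the ICDE 2005 paper. Clusterings are assumed to be label lists over the same objects.

```python
from itertools import combinations

def disagreements(c1, c2):
    """Number of object pairs that one clustering puts together and the other separates."""
    return sum(
        (c1[u] == c1[v]) != (c2[u] == c2[v])
        for u, v in combinations(range(len(c1)), 2)
    )

def total_disagreement(c, clusterings):
    """The aggregation objective: summed disagreements between c and every input."""
    return sum(disagreements(c, ci) for ci in clusterings)

def best_input_clustering(clusterings):
    """Baseline: pick the input clustering that is closest to all the others."""
    return min(clusterings, key=lambda c: total_disagreement(c, clusterings))

C1 = [0, 0, 1, 1]
C2 = [0, 0, 1, 2]
C3 = [0, 1, 1, 1]
aggregate = best_input_clustering([C1, C2, C3])
```

Here C1 has total disagreement 4 (0 + 1 + 3), against 5 for C2 and 7 for C3, so the baseline returns C1.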

46
Correlation clustering
  • Given distances Xuv between objects in V
  • Find a partition C minimizing
  • Algorithms and approximation guarantees
  • Gionis, Mannila, Tsaparas, ICDE 2005
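The minimized expression did not survive in this transcript. A common form of the correlation clustering cost, assuming X_uv lies in [0, 1] (in clustering aggregation, the fraction of input clusterings that separate u and v), charges X_uv for every pair placed in the same cluster and 1 - X_uv for every pair split apart. The Python sketch below computes that cost and runs a randomized pivot heuristic in the style of Ailon, Charikar and Newman. The heuristic is swapped in for illustration and is not claimed to be the slide's algorithm.

```python
import random
from itertools import combinations

def separation_fractions(clusterings):
    """X[u][v]: fraction of the input clusterings that place u and v in different clusters."""
    n, m = len(clusterings[0]), len(clusterings)
    X = [[0.0] * n for _ in range(n)]
    for u, v in combinations(range(n), 2):
        X[u][v] = X[v][u] = sum(c[u] != c[v] for c in clusterings) / m
    return X

def correlation_cost(labels, X):
    """Pay X[u][v] for pairs kept together and 1 - X[u][v] for pairs split apart."""
    cost = 0.0
    for u, v in combinations(range(len(labels)), 2):
        cost += X[u][v] if labels[u] == labels[v] else 1.0 - X[u][v]
    return cost

def pivot_clustering(X, seed=0):
    """A random pivot takes every unclustered object closer than 1/2; repeat on the rest."""
    rng = random.Random(seed)
    remaining = list(range(len(X)))
    rng.shuffle(remaining)
    labels = [None] * len(X)
    cluster_id = 0
    while remaining:
        pivot = remaining[0]
        for v in remaining:
            if v == pivot or X[pivot][v] < 0.5:
                labels[v] = cluster_id
        remaining = [v for v in remaining if labels[v] is None]
        cluster_id += 1
    return labels

X = separation_fractions([[0, 0, 1, 1], [0, 0, 1, 2], [0, 1, 1, 1]])
labels = pivot_clustering(X)
```

With X built from input clusterings, the cost of a partition equals its total disagreement with the inputs divided by their number, which ties this objective back to clustering aggregation.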

47
Example results
48
Example: metric labeling
  • High-dimensional observation vectors located at
    points in a 1- or 2-dimensional space
  • How to take into account the underlying topology
    of the observational points?
  • Minimize
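The objective on this slide was not preserved in the transcript. The sketch below assumes a standard metric labeling form: each point p gets a label f(p), paying an assignment cost c(p, f(p)) plus lam · d(f(p), f(q)) for each pair of neighbouring points, where d is a metric on labels. On a 1-d chain of points this can be minimized exactly by dynamic programming, as in this NumPy sketch. The label centres, squared-distance assignment cost and Euclidean label metric are all assumptions. On 2-d grids exact minimization is hard in general, which is where approximation algorithms come in.

```python
import numpy as np

def chain_metric_labeling(obs, centers, lam):
    """Exact metric labeling on a 1-d chain of observation points.

    obs: (n, dim) observation vectors, ordered along the line
    centers: (k, dim) candidate labels, each represented by a centre vector
    lam: weight of the separation cost between neighbouring points
    """
    obs = np.asarray(obs, dtype=float)
    centers = np.asarray(centers, dtype=float)
    n, k = len(obs), len(centers)
    # Assignment cost c(p, a): squared distance from observation p to centre a.
    assign = ((obs[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Separation cost d(a, b): Euclidean distance between label centres (a metric).
    sep = np.sqrt(((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))
    best = assign[0].copy()
    back = np.zeros((n, k), dtype=int)
    for p in range(1, n):
        # total[a, b]: best cost with label a at point p-1 and label b at point p.
        total = best[:, None] + lam * sep
        back[p] = total.argmin(axis=0)
        best = total.min(axis=0) + assign[p]
    labels = [int(best.argmin())]
    for p in range(n - 1, 0, -1):
        labels.append(int(back[p, labels[-1]]))
    return labels[::-1], float(best.min())

obs = [[0.0], [0.1], [5.0], [0.2], [0.0]]
centers = [[0.0], [5.0]]
labels_free, cost_free = chain_metric_labeling(obs, centers, lam=0.0)
labels_smooth, cost_smooth = chain_metric_labeling(obs, centers, lam=10.0)
```

Without the topology term the outlier at the middle point gets its own label; with lam = 10 the neighbourhood pulls it back to label 0.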

49
Example: distances between data sets
  • Nikolaj Tatti, Heikki Mannila
  • Given two datasets D1 and D2 over the same set of
    variables
  • What is their distance?
  • Given a collection of statistics f1 = s(D1) and
    f2 = s(D2)
  • E.g., certain marginal frequencies
  • What is the distance between D1 and D2 from the
    viewpoint of these statistics?

50
Distances between data sets
  • w1: the distribution having statistics f1 and
    maximal entropy
  • w2: the distribution having statistics f2 and
    maximal entropy
  • Distance I: $\mathrm{KL}(w_1, w_2)$
  • Difficult to compute
  • 2nd order approximations
  • Distance II: $(f_1 - f_2)^T \, \mathrm{cov}^{-1}(s) \, (f_1 - f_2)$
  • Under certain assumptions the only choice
  • Very promising initial results!
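Distance I needs the maximum entropy distributions. Distance II needs only the statistics and their covariance. A minimal NumPy sketch of Distance II, assuming s(x) is a vector of itemset indicators over 0/1 data, with the itemsets chosen by the caller, and that cov(s) is estimated from the two datasets pooled together. The slide does not say which distribution the covariance is taken under.

```python
import numpy as np

def itemset_indicators(data, itemsets):
    """s(x): one 0/1 feature per itemset, equal to 1 when all its items are present in row x."""
    data = np.asarray(data, dtype=bool)
    cols = [data[:, list(items)].all(axis=1) for items in itemsets]
    return np.column_stack(cols).astype(float)

def distance_ii(d1, d2, itemsets):
    """(f1 - f2)^T cov^{-1}(s) (f1 - f2), where f_i is the mean of s over dataset D_i."""
    s1 = itemset_indicators(d1, itemsets)
    s2 = itemset_indicators(d2, itemsets)
    f1, f2 = s1.mean(axis=0), s2.mean(axis=0)
    # Assumption: cov(s) is estimated from the two datasets pooled together.
    cov = np.atleast_2d(np.cov(np.vstack([s1, s2]), rowvar=False))
    diff = f1 - f2
    # Pseudo-inverse in case some statistics are linearly dependent.
    return float(diff @ np.linalg.pinv(cov) @ diff)

itemsets = [(0,), (1,), (0, 1)]
D1 = [[1, 0], [1, 1], [0, 1], [0, 0]]
D2 = [[1, 1], [1, 1], [0, 0], [0, 0]]
```

D1 and D2 have identical single-item frequencies and differ only in the pair frequency (0.25 vs. 0.5), so any positive distance here comes from that one statistic.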

51
Example: approximating a collection of frequent
sets
  • Existing frequent set mining algorithms output
    too many sets
  • many of the sets look quite similar
  • difficult to obtain a global understanding
  • Goal: describe a transaction database using few
    sets
  • necessarily resort to approximations
  • which is OK since support threshold is arbitrary

52
The main idea
53
Theoretical results
  • Afrati, Gionis, Mannila, KDD 2004
  • Formalize notion of approximation
  • Distinguish concrete problem variants
  • Establish NP-hardness and develop algorithms

54
Experimental results
  • Course data set
  • Collection: 1637 sets, border: 268 sets,
    support threshold 25
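The gap between the collection and its border comes from the fact that every subset of a frequent set is frequent, so the maximal frequent sets determine the whole collection. A small Python sketch of a levelwise (Apriori-style) search followed by extraction of the maximal sets. It assumes that "border" on the slide means the maximal frequent sets and that the support threshold is an absolute count. This is only the lossless baseline. The approximation work on the previous slides goes further by allowing a few sets to cover the collection approximately.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Levelwise (Apriori-style) search; min_support is an absolute count."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})

    def support(s):
        return sum(s <= t for t in transactions)

    frequent = set()
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    while level:
        frequent.update(level)
        size = len(level[0]) + 1
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Keep candidates whose subsets are all frequent and whose own support suffices.
        level = [c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, size - 1))
                 and support(c) >= min_support]
    return frequent

def border(frequent):
    """Maximal frequent sets: those with no frequent proper superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}

transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"c", "d"}]
collection = frequent_itemsets(transactions, 2)
maximal = border(collection)
```

In this toy database the collection has 7 frequent sets, but the single maximal set {a, b, c} already implies all of them.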

55
Future work
  • Theory and practice interact a lot!
  • (Almost) all theoretical directions are motivated
    by practical issues
  • The combination of continuous and combinatorial
    methods
  • Application areas selected by theoretical
    interest and potential impact (industrial and
    scientific)
  • Use by collaborators vs. general distribution
  • Publications, collaborations, software releases

56
Future work
  • I: Concepts and algorithms for describing
    structure of sequences
  • Segment structure, vocabularies, inference of
    order
  • II: Methods for pattern discovery in and modelling
    of spatiotemporal data
  • Metric labeling, spatial rules
  • Similarity of complex objects
  • Foundational issues in pattern discovery (e.g.,
    logical form of patterns and the difficulty in
    discovering them)
  • Mixture modelling and pattern discovery

57
Multilevel description of discrete sequences
  • Discrete sequences
  • Haplotypes, genomes, telecommunication alarms,
    words in documents, ...
  • Such sequences typically have a block structure
  • Underlying process has several different states,
    each with different characteristics
  • Structure can also be hierarchical
  • Describe the sequence in a useful way
  • For prediction, clustering, rule discovery,
    description, ...

58
Multilevel description of discrete sequences
  • Three linked main parts of the research program
  • Segment structure of sequences
  • Vocabulary of a sequence
  • Order from unordered sets
[Diagram: related topics: rule discovery in sequences; time-series similarity; Bayesian methods for piecewise constant approximation of event sequences; (k,h)-segmentation; block and mosaic structure in haplotypes; clustering segmentations; vocabulary of a sequence; fragments of order; inference of partial orders; gene expression and chromosomal location]

59
Segment structure of sequences
  • Segment structure of sequences
  • (k,h)-segmentation
  • Grammar inference
  • Hierarchical analysis
  • Applications: genome, several genomes,
    haplotypes, telecom

Methods: dynamic programming, approximations
(Aristides Gionis et al.)
[Figure: the sequence ACTAACGACG ACAATCCGCT TATACCAGAT CCAAATCAAC described by a (k,h)-segmentation into segment types B, E, F, by grammar inference (rules G → (E F), E → E B E), and as a hierarchical description]
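Plain k-segmentation is the dynamic programming building block behind the methods listed above. A minimal Python sketch that splits a real-valued sequence into k contiguous segments, minimizing squared error around each segment's mean. Using a numeric sequence is an assumption. For DNA one might segment a derived signal such as windowed GC content. (k,h)-segmentation additionally restricts the k segments to h < k shared representatives, which is harder and is where the approximations mentioned on the slide come in. A simple heuristic is to cluster the segment means afterwards.

```python
def k_segmentation(seq, k):
    """Optimal split of seq into k contiguous segments, minimizing squared error to segment means.

    Classic O(k n^2) dynamic programming; prefix sums give each segment's error in O(1).
    Requires 1 <= k <= len(seq).
    """
    n = len(seq)
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(seq):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def err(i, j):
        """Squared error of seq[i:j] around its mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    INF = float("inf")
    cost = [[INF] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for p in range(1, k + 1):
        for j in range(p, n + 1):
            for i in range(p - 1, j):
                c = cost[p - 1][i] + err(i, j)
                if c < cost[p][j]:
                    cost[p][j], split[p][j] = c, i
    # Recover the segment boundaries by walking the split table backwards.
    bounds, j = [], n
    for p in range(k, 0, -1):
        i = split[p][j]
        bounds.append((i, j))
        j = i
    return bounds[::-1], cost[k][n]

bounds, cost = k_segmentation([1, 1, 1, 5, 5, 5, 2, 2], 3)
```

The three constant blocks are recovered exactly, with zero error.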
60
Vocabulary of a sequence
  • Find a good set of recurrent words
  • Motif discovery
  • Segment structure on the basis of vocabularies

Methods: greedy algorithm on submodular functions, string algorithms (Panayiotis Tsaparas et al.)
[Figure: example text "Call me Ishmael. Some years ago never mind how ..." and recurring words found in it, e.g. "inthesea", "harpoon", ...]
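The slide pairs vocabularies with greedy algorithms on submodular functions. A minimal Python sketch of that pattern, under assumptions of our own: candidate words are recurring substrings within a length range, and a vocabulary is scored by how many text positions its occurrences cover. That coverage score is a monotone submodular function, so picking the word with the largest marginal gain at each step is the standard greedy algorithm for it. The scoring and the parameter names are illustrative, not Tsaparas et al.'s formulation.

```python
def occurrences(text, word):
    """Start positions of all (possibly overlapping) occurrences of word in text."""
    starts, i = [], text.find(word)
    while i != -1:
        starts.append(i)
        i = text.find(word, i + 1)
    return starts

def covered(text, word):
    """Text positions covered by some occurrence of word."""
    return {p for s in occurrences(text, word) for p in range(s, s + len(word))}

def greedy_vocabulary(text, size, min_len=3, max_len=8, min_count=2):
    """Greedily pick up to `size` recurring substrings that maximize covered positions."""
    candidates = set()
    for length in range(min_len, max_len + 1):
        for i in range(len(text) - length + 1):
            w = text[i:i + length]
            if len(occurrences(text, w)) >= min_count:
                candidates.add(w)
    cover = {w: covered(text, w) for w in candidates}
    chosen, done = [], set()
    for _ in range(size):
        # Largest marginal gain; ties broken by length, then alphabetically, for determinism.
        best = max(candidates - set(chosen),
                   key=lambda w: (len(cover[w] - done), len(w), w), default=None)
        if best is None or not cover[best] - done:
            break
        chosen.append(best)
        done |= cover[best]
    return chosen, len(done)

words, n_covered = greedy_vocabulary("abcxabcyabcz", 2, min_len=2, max_len=3)
```

In the toy string, "abc" alone covers 9 of 12 positions. The shorter recurring words "ab" and "bc" add nothing after it, so the greedy loop stops early.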
61
Order from unordered data
  • Matrix reordering
  • Fragments of order
  • Inference of partial orders

Methods: frequent patterns, spectral methods, mixture modeling on combinatorial objects (Heikki Mannila et al.)
[Figure: the sequences ABCD, ABCD, ACBD, ACBD and a partial order on A, B, C, D summarizing them]
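The simplest version of inferring a partial order keeps only the precedences that hold in every sequence. A minimal Python sketch, assuming noise-free sequences over the same item set. The methods named on the slide (frequent patterns, spectral methods, mixture modeling) are aimed at the realistic case where no single order fits all the data.

```python
from itertools import permutations

def consistent_order(sequences):
    """Pairs (x, y) such that x precedes y in every sequence (all over the same items)."""
    items = set(sequences[0])
    pos = [{x: i for i, x in enumerate(seq)} for seq in sequences]
    return {(x, y) for x, y in permutations(items, 2) if all(p[x] < p[y] for p in pos)}

def transitive_reduction(order):
    """Drop (x, y) when some z gives x < z < y; what remains is the Hasse diagram."""
    items = {x for pair in order for x in pair}
    return {(x, y) for x, y in order
            if not any((x, z) in order and (z, y) in order for z in items)}

order = consistent_order(["ABCD", "ABCD", "ACBD", "ACBD"])
hasse = transitive_reduction(order)
```

For the four sequences in the figure, A comes first, D comes last, and B and C stay unordered.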
62
Methods for pattern discovery in and modelling of
spatiotemporal data
  • Data where observations have a location
  • Biodiversity data: grid cells (2-d)
  • Place names, dialect usage (2-d)
  • Genome: location in one dimension (1-d, pieces)
  • Telecom alarms: location in the network graph
  • Lots of interest from the application areas
  • How to take the underlying topology into account?
  • Since 2003

63
Links on a fragmented 1-d space: comparative
genomics
  • Orthologous genes in different species

64
Methods for spatial data
  • Rule discovery
  • spatial association rules
  • Mixture models, MCMC
  • piecewise constant models
  • Clustering, metric labeling
  • algorithms, approximations
  • Spatial statistics: interaction with the math
    department

Antti Leino et al., Marko Salmenkivi et al., Aristides Gionis et al.
65
Summary of future plans
  • Theory and practice!
  • Structure of sequences
  • Spatiotemporal data
  • Foundations of pattern discovery, similarity, ...
  • Applications

66
Data Mining Applications
  • Hannu Toivonen
  • Professor

67
Application areas
  • Bioinformatics, especially medical genetics
  • gene mapping, haplotyping
  • discovery of genome structure
  • gene expression data analysis
  • Paleontology, paleoecology
  • Linguistics
  • Ubiquitous computing

68
Collaborations in applications
  • Computational methods for genome structure
  • Leena Peltonen (KTL), Juha Kere (Karolinska),
    Anu Jalanko (KTL)
  • Gene mapping
  • Leena Peltonen, Juha Kere
  • Jurilab Ltd.
  • Geneos Ltd.
  • Phenotype clustering
  • Orion Pharma

69
Collaborations (cont.)
  • Linguistics
  • R.-L. Pitkänen (Research center for the Languages
    of Finland)
  • Terttu Nevalainen (Department of English)
  • Paleontology
  • Mikael Fortelius (Dept. of Geology), Jukka
    Jernvall (Institute of Biotechnology)
  • Paleoecology
  • Atte Korhola (Dept. of Ecology)
  • Environmental studies
  • Markku Kulmala (Dept. of Physics)

70
Researchers
  • Profs. Hollmén, Mannila, Toivonen
  • Postdocs
  • Ella Bingham: genome structure, paleontology
  • Aristides Gionis: genome structure, paleontology,
    spatial data
  • Alexander Hinneburg: spatial data (linguistics)
  • Saara Hyvönen: environmental modeling
  • Mikko Koivisto: phenotype clustering, genome
    structure
  • Päivi Onkamo: genetics
  • Marko Salmenkivi: linguistics, spatial data
  • Petteri Sevon: genetics, bioinformatics
  • Panayiotis Tsaparas: web graphs, spatial data
  • 10 PhD students

71
Medical genetics
  • Important applications
  • locating disease predisposing genes is essential
    for understanding the aetiology of complex common
    diseases, such as heart disease or asthma
  • Focus on selected topics where
  • we can have a significant impact
  • we can combine our own expertise with the unique
    research on medical genetics in Finland
  • Collaboration with leading groups in medical
    genetics
  • Prof. Leena Palotie (Public Health Institute)
  • Prof. Juha Kere (Karolinska Institutet)

72
Gene mapping
[Figure: haplotypes (chromosomes) of cases and controls; each column is a marker locus, each entry an allele]
case     1 4 8 2 2 1 2 6 2
case     2 4 3 7 3 2 8 4 2
case     4 5 2 4 5 5 2 6 4
case     7 2 3 7 5 4 5 2 2
case     5 2 4 6 2 4 2 6 1
case     3 4 3 7 3 1 3 3 4
case     1 2 1 5 2 5 2 6 2
case     5 3 3 7 3 2 1 4 3
control  2 4 7 1 3 4 1 4 8
control  7 3 7 7 5 7 8 6 6
control  3 4 3 2 5 3 2 3 2
control  2 5 2 4 3 1 3 6 2
control  3 3 1 2 4 2 1 4 2
control  1 6 4 5 5 5 9 1 3
control  4 2 8 4 2 3 5 2 5
control  2 2 4 9 5 4 4 2 4
73
Gene mapping
[Figure: the same case and control haplotypes, with two patterns highlighted]
pattern 1: (3)(4) 3 7 (3)(2)
pattern 2: (5) 2 6 (2)
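A simplified Python sketch of pattern-based gene mapping on the case and control haplotypes from the gene mapping slide, as read from the transcript. It scores contiguous allele windows by a 2×2 chi-square statistic of case/control association. It then counts, for each marker, how many associated patterns overlap it, in the spirit of the marker scoring described for HPM. The window-only patterns and the 3.84 threshold are assumptions. HPM itself also allows gaps in patterns, and with counts this small the chi-square approximation is only illustrative.

```python
# Haplotypes from the gene mapping slide: 8 cases and 8 controls, 9 markers each.
CASES = [
    [1, 4, 8, 2, 2, 1, 2, 6, 2], [2, 4, 3, 7, 3, 2, 8, 4, 2],
    [4, 5, 2, 4, 5, 5, 2, 6, 4], [7, 2, 3, 7, 5, 4, 5, 2, 2],
    [5, 2, 4, 6, 2, 4, 2, 6, 1], [3, 4, 3, 7, 3, 1, 3, 3, 4],
    [1, 2, 1, 5, 2, 5, 2, 6, 2], [5, 3, 3, 7, 3, 2, 1, 4, 3],
]
CONTROLS = [
    [2, 4, 7, 1, 3, 4, 1, 4, 8], [7, 3, 7, 7, 5, 7, 8, 6, 6],
    [3, 4, 3, 2, 5, 3, 2, 3, 2], [2, 5, 2, 4, 3, 1, 3, 6, 2],
    [3, 3, 1, 2, 4, 2, 1, 4, 2], [1, 6, 4, 5, 5, 5, 9, 1, 3],
    [4, 2, 8, 4, 2, 3, 5, 2, 5], [2, 2, 4, 9, 5, 4, 4, 2, 4],
]

def chi2_2x2(a, b, c, d):
    """Pearson chi-square for [[a, b], [c, d]]: cases with/without, controls with/without."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def window_patterns(cases, controls, length, threshold):
    """Contiguous allele windows seen in cases whose association score reaches threshold."""
    found = []
    for start in range(len(cases[0]) - length + 1):
        alleles = {tuple(h[start:start + length]) for h in cases}
        for pat in sorted(alleles):
            a = sum(tuple(h[start:start + length]) == pat for h in cases)
            c = sum(tuple(h[start:start + length]) == pat for h in controls)
            score = chi2_2x2(a, len(cases) - a, c, len(controls) - c)
            if score >= threshold:
                found.append((start, pat, a, c, score))
    return found

def marker_scores(patterns, n_markers, length):
    """Count, for each marker, the associated patterns that overlap it."""
    counts = [0] * n_markers
    for start, *_ in patterns:
        for m in range(start, start + length):
            counts[m] += 1
    return counts

associated = window_patterns(CASES, CONTROLS, length=2, threshold=3.84)
scores = marker_scores(associated, n_markers=9, length=2)
```

Markers are 0-indexed in the code. The alleles 3 7 at markers 3-4 and 2 6 at markers 7-8 (1-indexed, as on the slide) each occur in 4 of the 8 cases and in none of the controls.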
74
Highlights: gene mapping
  • Formulation of gene mapping as a mixture of
    pattern discovery and classification
  • Concepts and methods from computer science
  • Haplotype Pattern Mining (HPM)
  • haplotype patterns
  • efficient algorithms for finding relevant
    patterns
  • a number of variants as follow-up
  • successful in gene mapping
  • patents, licensing

75
Highlights: gene mapping
  • Tree Disequilibrium Test (TreeDT)
  • looks for tree structured haplotype patterns
  • patterns reflect possible recombination histories
  • gene localization based on the pattern that best
    explains the disease status
  • efficient algorithms
  • new solutions to multiple, nested permutation
    tests

[Figure: haplotypes A-F and their alleles at markers M1-M9]
    M1 M2 M3 M4 M5 M6 M7 M8 M9
A    2  3  1  2  2  1  2  2  1
B    3  1  1  2  1  1  2  1  1
C    4  1  2  1  2  1  4  3  3
D    1  2  1  2  3  1  4  1  4
E    2  2  3  1  3  2  4  3  2
F    1  2  1  3  1  2  1  4  2
76
Highlights: haplotyping
  • Find the highest probability strings (haplotypes)
    explaining sequences of pairs (genotypes)
  • Example: genotype {1,2} {1,1} {1,2} → haplotype
    pairs 111/212, 112/211, 211/112, 212/111
  • The number of candidate pairs is exponential in
    the number of heterozygous markers of each
    genotype
  • HaploRec: Markovian models, efficient algorithms
  • variable length Markov chains:
    $P(H) = P(H_1) \prod_{i>1} P(H_i \mid H_{s_i,i-1})$,
    where $s_i = \min\{s : H_{s,i} \text{ is statistically relevant}\}$
  • probabilities $P(H_{s,i})$ are estimated with EM
  • unique scalability
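A Python sketch of the two ingredients on this slide: enumerating the haplotype pairs that explain a genotype, and scoring them with a Markov model. For simplicity it uses a position-specific first-order chain with hand-set, hypothetical probabilities in place of HaploRec's variable-length contexts estimated by EM. It also treats the two haplotypes of a pair as independent, which is an assumption.

```python
from itertools import product

def haplotype_pairs(genotype):
    """All unordered haplotype pairs that explain a genotype.

    genotype: one unordered allele pair per marker, e.g. [(1, 2), (1, 1), (1, 2)].
    With h >= 1 heterozygous markers there are 2 ** (h - 1) distinct pairs.
    """
    options = [((a, b), (b, a)) if a != b else ((a, a),) for a, b in genotype]
    pairs = set()
    for choice in product(*options):
        h1 = tuple(x for x, _ in choice)
        h2 = tuple(y for _, y in choice)
        pairs.add(tuple(sorted((h1, h2))))
    return pairs

def chain_prob(h, start, trans):
    """P(H) = P(H_1) * prod_i P(H_i | H_{i-1}) under a position-specific first-order chain.

    start: {allele: prob} for the first marker
    trans: {(marker_index, previous_allele, allele): prob}
    """
    p = start.get(h[0], 0.0)
    for i in range(1, len(h)):
        p *= trans.get((i, h[i - 1], h[i]), 0.0)
    return p

def best_pair(genotype, start, trans):
    """Pair maximizing P(H1) * P(H2), treating the two haplotypes as independent."""
    return max(sorted(haplotype_pairs(genotype)),
               key=lambda pr: chain_prob(pr[0], start, trans) * chain_prob(pr[1], start, trans))

# Hypothetical parameters for two markers that strongly favour repeating the previous allele.
START = {1: 0.5, 2: 0.5}
TRANS = {(1, 1, 1): 0.9, (1, 1, 2): 0.1, (1, 2, 2): 0.9, (1, 2, 1): 0.1}
pairs = haplotype_pairs([(1, 2), (1, 1), (1, 2)])
best = best_pair([(1, 2), (1, 2)], START, TRANS)
```

For the slide's genotype {1,2} {1,1} {1,2}, a first-order chain gives both candidate pairs exactly the same score, whatever its parameters. Phase information cannot pass through the homozygous middle marker. Longer, variable-length contexts such as those in the formula above can make that distinction.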
77
Highlights: mosaic structure of haplotypes
  • Defining and utilising haplotype block structure
    of the human genome describing and finding the
    possible mosaic-like structure of haplotypes (and
    genotypes)

78
Highlights: simulation studies
  • Population and marker simulation tools
  • Large simulation studies to test mapping
    methodologies
  • an unexpected result: population-based haplotypes
    are as powerful as true haplotypes

79
Other applications
  • Medical genetics
  • analysis of phenotypic datasets: finding robust
    clusters that have genetic explanations
  • Paleontology and paleoecology
  • finding good estimates of the ages of fossil
    sites; finding matrix reorderings that
    approximate the consecutive ones property
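A common way to find a reordering close to the consecutive ones property is spectral ordering: sort the rows by the Fiedler vector (the eigenvector of the second-smallest eigenvalue) of a Laplacian built from row similarities. A minimal NumPy sketch, assuming rows are fossil sites, columns are taxa, similarity is the number of shared taxa, and the resulting similarity graph is connected. It shows the general technique, not necessarily the exact method behind the group's spectral ordering work.

```python
import numpy as np

def spectral_order(matrix):
    """Order the rows of a 0/1 matrix by the Fiedler vector of a row-similarity Laplacian."""
    m = np.asarray(matrix, dtype=float)
    sim = m @ m.T                       # number of shared ones between pairs of rows
    np.fill_diagonal(sim, 0.0)
    lap = np.diag(sim.sum(axis=1)) - sim
    _, vecs = np.linalg.eigh(lap)       # eigenvalues come back in ascending order
    fiedler = vecs[:, 1]                # eigenvector of the second-smallest eigenvalue
    return [int(i) for i in np.argsort(fiedler)]

# Toy site-by-taxon matrix with consecutive ones: taxon j occurs at sites j and j + 1.
SITES = np.zeros((5, 4))
for j in range(4):
    SITES[j, j] = SITES[j + 1, j] = 1.0

perm = [3, 0, 4, 1, 2]                  # scramble the sites
order = spectral_order(SITES[perm])
recovered = [perm[i] for i in order]
```

The Fiedler vector fixes the order only up to reversal, so the recovered site order is either 0, 1, 2, 3, 4 or its mirror image.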

80
Other applications
  • Linguistics
  • finding spatial structure of the distribution of
    place names and words; high-dimensional
    clustering
  • preliminary results on pattern discovery and
    mixture modelling techniques for large onomastic
    data sets
  • Ubiquitous computing
  • learning to recognize typical device contexts:
    on-line clustering of stream data
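On-line clustering processes each observation once, which suits a stream of device context readings. A minimal Python sketch of sequential k-means under assumptions of our own: a fixed number of clusters k, numeric context feature vectors, and the first k readings used as initial centres. It is one standard on-line method, not necessarily the one used in the group's work.

```python
def online_kmeans(stream, k):
    """Sequential k-means over a stream of feature vectors.

    The first k points seed the centres; each later point is assigned to its
    nearest centre, which then moves to the running mean of its points.
    Returns the cluster label of every point and the final centres.
    """
    centres, counts, labels = [], [], []
    for point in stream:
        x = [float(v) for v in point]
        if len(centres) < k:
            centres.append(x)
            counts.append(1)
            labels.append(len(centres) - 1)
            continue
        j = min(range(k), key=lambda i: sum((c - v) ** 2 for c, v in zip(centres[i], x)))
        counts[j] += 1
        step = 1.0 / counts[j]          # running-mean step size
        centres[j] = [c + step * (v - c) for c, v in zip(centres[j], x)]
        labels.append(j)
    return labels, centres

# Hypothetical two-feature context readings forming two well-separated groups.
labels, centres = online_kmeans([(0, 0), (10, 10), (0, 1), (10, 11), (1, 0)], 2)
```

Memory use is constant in the length of the stream: only the k centres and their counts are kept.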

81
Future directions
  • Genome structure Computational tools for
    describing the variation between individuals and
    between species
  • haplotype blocks and mosaics
  • identification of rearrangements, duplications,
    and other large-scale variations
  • comparative genomics several species!
  • segment structure, reversal distances etc.
  • vocabulary of the genome
  • function and structure

82
Future directions (continued)
  • Mining biological databases
  • analysis of the rich, heterogeneous public
    databases
  • how to find patterns in complex irregular
    structures
  • discovery of similarities and analogies
  • producing plausible biological relationships and
    hypothesis

83
Future directions (continued)
  • Spatial and temporal variation in language
  • Spatiotemporal issues in paleontology and ecology
  • methods to detect patterns of variance in species
    abundances
  • methods to correlate paleoecological time series
    data, to find features in such data
  • Recognition of contexts in mobile applications

84
Summary
  • Application problem → new computational concepts
    → novel methods → practical applications
  • Important data analysis problems
  • Successfully fielded applications
  • A wide network of excellent applied collaborators
  • Development of novel techniques
  • combinations of discrete and probabilistic
    approaches
  • HIIT mode of working: collaboration between
    groups, universities, other disciplines, and
    industry

85
Posters and demos
  • Genetic mapping studies: Asthma and allergy.
    Päivi Onkamo
  • Mining Atmospheric Data. Saara Hyvönen
  • Geometric and combinatorial tiles in 0-1 data.
    (DEMO) Aristides Gionis, Heikki Mannila, Jouni
    Seppänen
  • Dimension induced clustering. (DEMO) Aristides
    Gionis, Alexander Hinneburg, Spiros
    Papadimitriou, Panayiotis Tsaparas.
  • Spatial Analysis of Area Data. Case: Finnish lake
    names. Marko Salmenkivi, Saara Hyvönen, Antti
    Leino.
  • What was the Finnish hiisi? A case study on place
    name data. Marko Salmenkivi, Antti Leino, Saara
    Hyvönen.
  • Clustering aggregation. Aristides Gionis,
    Panayiotis Tsaparas, Heikki Mannila.
  • Spatially coherent clustering. Aristides Gionis,
    Heikki Mannila, Spiros Papadimitriou, Panayiotis
    Tsaparas
  • Spectral ordering. Aristides Gionis, Heikki
    Mannila, Mikael Fortelius, Jukka Jernvall
  • Genome puzzle. (DEMO) Mikko Koivisto, Teemu
    Kivioja, Heikki Mannila, Pasi Rastas, Esko
    Ukkonen.
  • Segmentation-based analysis of genomic
    sequences. (DEMO) Niina Haiminen, Evimaria Terzi,
    Aristides Gionis, Heikki Mannila.
  • Mining non-redundant association rules. (DEMO)
    Juho Muhonen
  • HaploRec: Population-based reconstruction of
    haplotypes. (DEMO) Lauri Eronen
  • TreeDT: Gene mapping by tree disequilibrium test.
    Petteri Sevon, Hannu Toivonen
  • An efficient method for association mapping in
    phase-unknown genotype data. Petteri Sevon, Päivi
    Onkamo, Hannu Toivonen
  • Techniques for simulating populations and marker
    data. Petteri Hintsanen, Petteri Sevon
  • Efficient population-based reconstruction of
    haplotypes. Lauri Eronen
  • Integrating the tools: Power simulations for gene
    mapping studies. Petteri Hintsanen, Petteri
    Sevon, Päivi Onkamo.
  • Mapping susceptibility genes for familial glioma.
    Päivi Onkamo

86
Neuroinformatics
  • Dr. Aapo Hyvärinen

87
Scope of Neuroinformatics
  • Interface of brain research and information
    technology
  • Functional models of brain
  • Signal processing methods
  • Databases
  • Our specialization: multivariate statistical
    models
  • Principal component analysis
  • Independent component analysis
  • Extensions of ICA
  • New models (see later)
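Independent component analysis is central to the group's specialization. A compact NumPy sketch of one standard way to estimate it: FastICA, a fixed-point algorithm, here in its symmetric variant with the tanh nonlinearity, applied after PCA whitening. The iteration count, seed and test signals are assumptions. This is a generic sketch, not the group's software.

```python
import numpy as np

def whiten(x):
    """Center x (signals in rows) and make its covariance the identity via PCA."""
    x = x - x.mean(axis=1, keepdims=True)
    d, e = np.linalg.eigh(np.cov(x))
    return (e @ np.diag(1.0 / np.sqrt(d)) @ e.T) @ x

def fastica(x, n_iter=200, seed=0):
    """Symmetric FastICA with g = tanh; returns the estimated sources as rows."""
    z = whiten(x)
    n, t = z.shape
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((n, n))
    for _ in range(n_iter):
        g = np.tanh(w @ z)
        g_prime = 1.0 - g ** 2
        # Fixed-point update for every row: E{z g(w^T z)} - E{g'(w^T z)} w
        w_new = (g @ z.T) / t - np.diag(g_prime.mean(axis=1)) @ w
        # Symmetric decorrelation: W <- (W W^T)^(-1/2) W
        s, u = np.linalg.eigh(w_new @ w_new.T)
        w = u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ w_new
    return w @ z

# Two independent, non-Gaussian test sources, linearly mixed.
time = np.linspace(0, 8, 2000)
rng = np.random.default_rng(1)
S = np.vstack([np.sign(np.sin(3 * time)), rng.uniform(-1, 1, time.size)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])
Y = fastica(A @ S)
```

ICA recovers the sources only up to order, sign and scale, so a fair check is that each true source correlates strongly, in absolute value, with some estimated component.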

88
Researchers in Neuroinformatics
  • Aapo Hyvärinen, leader
  • Two post-docs
  • Patrik Hoyer
  • Jarmo Hurri
  • Five PhD students (some part-time)
  • Funding from the Academy of Finland, the
    University of Helsinki and foreign foundations

89
Research goals
  • Models of sensory processing in the brain, based
    on statistical analysis of natural stimuli
  • New biologically inspired data analysis methods
  • Advanced statistical analysis of neuroscientific
    data
  • Common theme: multivariate data analysis

90
Reliability analysis of ICA
(NeuroImage, 2004)
  • ICA can find underlying factors that are
    independent and nongaussian
  • Results contain statistical and computational
    errors: which components are reliable?
  • A software package is available on the Web
  • The same approach works for comparing
    individuals (NeuroImage, in press)

In cooperation with the Universities of Naples and
Maastricht
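One way to ask which components are good, assumed here for illustration, is to rerun the estimation several times (different initializations or bootstrap resamples) and check how consistently each component reappears. The scoring rule below, the mean best-match absolute cosine similarity against the first run, is a hypothetical simplification and not the index computed by the group's software package; `stability_scores` and the example runs are invented.

```python
import math


def abs_cosine(u, v):
    """|cos| between two vectors; the sign is ignored because ICA cannot fix it."""
    dot = sum(a * b for a, b in zip(u, v))
    return abs(dot) / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


def stability_scores(runs):
    """runs[k] is the list of component vectors estimated in run k.
    For each component of runs[0], return the mean over the other runs of its
    best match in that run (1.0 means it reappeared exactly in every run)."""
    reference, others = runs[0], runs[1:]
    scores = []
    for comp in reference:
        matches = [max(abs_cosine(comp, c) for c in run) for run in others]
        scores.append(sum(matches) / len(matches))
    return scores


# Three hypothetical runs, each returning two components.
runs = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
    [[-0.99, 0.05, 0.0], [0.0, 1.0, -1.0]],
    [[1.0, 0.02, 0.01], [1.0, 1.0, 0.0]],
]
print(stability_scores(runs))  # first score close to 1, second much lower
```

Components that score near 1 reappear in every run; low scores flag estimates that may be artifacts of the optimization or of sampling.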
91
Learning high-level features
  • ICA gives linear features in images
  • Extensions of ICA give features in a second layer
  • We estimate a third layer by ICA of the outputs
    of the second layer (submitted manuscript)

92
Learning segmentation (1)
  • A new principle for multivariate data analysis
  • Given a very high-dimensional random vector,
    can we partition the variables in each
    observation into segments?
  • Based on the statistical structure
  • With no prior knowledge of the segments
  • The visual system has learned to do this for its
    input

93
Learning segmentation (2)
  • Use correlations to find out which variables
    belong together (submitted manuscript)
  • Basic idea: within each segment, the observed
    variables should follow the typical correlation
    structure
  • Then we can segment even data that has an
    unusual correlation structure
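The idea that correlated variables belong together can be sketched very crudely: link every pair of variables whose absolute correlation over the observations exceeds a threshold, and take the connected components as segments. This is a deliberately simplified global grouping under an assumed threshold of 0.5, not the per-observation segmentation method of the submitted manuscript; `group_variables` is a hypothetical name.

```python
import math


def correlation(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def group_variables(columns, threshold=0.5):
    """Partition variables (one list of observations per variable) into groups
    linked by |correlation| >= threshold, using union-find."""
    parent = list(range(len(columns)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if abs(correlation(columns[i], columns[j])) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(columns)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Transitive linking means a chain of moderately correlated variables ends up in one group, which is one reason the slide's approach models the typical correlation structure instead of thresholding pairs.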

94
New kinds of feature extraction
  • We can try to find features that characterize
    whole images (Proc. Int. Conf. Pattern Recogn.
    2004)
  • Compute histograms of low-level features
  • Analyze these histograms, e.g. by ICA
  • Features from natural language (text) data
    (Proc. Int. Joint Conf. Neural Netw. 2004)
  • Compute the context histograms of words (which
    words typically occur together)
  • Apply, e.g., ICA to these histograms

In cooperation with HUT and FDK
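For the text case, a minimal sketch of building word context histograms, assuming whitespace tokenization and a symmetric window of `window` words on each side (both assumptions). The resulting word-by-context count matrix is the kind of input the slide proposes to analyze with ICA; that step is not shown, and the function names are hypothetical.

```python
from collections import Counter


def context_histograms(tokens, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    hists = {}
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        ctx = hists.setdefault(word, Counter())
        for j in range(lo, hi):
            if j != i:
                ctx[tokens[j]] += 1
    return hists


def histogram_matrix(hists, vocab):
    """Rows are words, columns are context words (missing counts are 0)."""
    return [[hists.get(w, Counter())[c] for c in vocab] for w in vocab]


tokens = "the cat sat on the mat".split()
hists = context_histograms(tokens, window=1)
vocab = sorted(set(tokens))
print(vocab)                               # ['cat', 'mat', 'on', 'sat', 'the']
print(histogram_matrix(hists, vocab)[-1])  # row for 'the': [1, 1, 1, 0, 0]
```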
95
Exploration of causality
(Proc. Factor Analysis Cent. Symp. 2004)
  • Classic methods can tell that x and y are
    correlated, but which causes which?
  • Using nongaussianity, we can find the causal
    ordering
  • Closely related to ICA estimation; an example of
    a post-processing method for ICA

In cooperation with the University of Osaka
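A crude sketch of why nongaussianity reveals direction, assuming a linear model with uniform (nongaussian) disturbances: in the true direction the regression residual is independent of the regressor, while in the wrong direction it is only uncorrelated with it. The dependence measure used here, the correlation between squared regressor and squared residual, is an assumption chosen for brevity and is not the estimation method of the cited paper; all function names are hypothetical.

```python
import math
import random


def _center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def _corr_centered(u, v):
    """Pearson correlation of two already-centered sequences."""
    return sum(a * b for a, b in zip(u, v)) / math.sqrt(
        sum(a * a for a in u) * sum(b * b for b in v))


def residual_dependence(cause, effect):
    """Least-squares regress effect on cause and return |corr(cause^2, residual^2)|,
    a crude dependence measure that is near 0 when the residual is independent."""
    x, y = _center(cause), _center(effect)
    b = sum(a * c for a, c in zip(x, y)) / sum(a * a for a in x)
    r = [c - b * a for a, c in zip(x, y)]
    return abs(_corr_centered(_center([a * a for a in x]), _center([e * e for e in r])))


def causal_direction(x, y):
    """Prefer the direction whose regression residual looks more independent."""
    return "x -> y" if residual_dependence(x, y) < residual_dependence(y, x) else "y -> x"


# Usage: x causes y, and both disturbances are uniform (nongaussian).
random.seed(42)
x = [random.uniform(-1, 1) for _ in range(20000)]
y = [a + random.uniform(-1, 1) for a in x]
print(causal_direction(x, y))
```

On data like this the forward residual dependence should be near zero and the backward one clearly larger. For Gaussian variables both residuals are independent of their regressors, so this heuristic, like any method in this family, cannot tell the two directions apart.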
96
Blind source separation
  • Separation of underlying sources, e.g. in brain
    activity
  • Basic ICA:
  • sources are independent
  • no time structure is used
  • We developed methods which
  • are able to separate dependent sources (Signal
    Processing, 2004)
  • utilize time structure (Signal Processing, in
    press)
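One classic way to use time structure is AMUSE-style second-order separation: whiten the mixtures, then find the rotation that diagonalizes a time-lagged covariance matrix. A minimal two-source sketch, assuming a square noise-free mixture and sources with clearly different lag-1 autocorrelations. This textbook method is shown only for illustration and is not the algorithm of the cited paper; the function names are hypothetical.

```python
import math


def whiten_2d(x1, x2):
    """Center two signals and decorrelate them to unit variance (Cholesky whitening)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    c11 = sum(v * v for v in a) / n
    c12 = sum(u * v for u, v in zip(a, b)) / n
    c22 = sum(v * v for v in b) / n
    l11 = math.sqrt(c11)              # covariance = L L^T with L lower-triangular
    l21 = c12 / l11
    l22 = math.sqrt(c22 - l21 * l21)
    z1 = [u / l11 for u in a]
    z2 = [(v - l21 * u) / l22 for u, v in zip(z1, b)]
    return z1, z2


def lagged_cov(z1, z2, lag):
    """Entries (a, b, d) of the symmetrized covariance between z(t) and z(t + lag)."""
    n = len(z1) - lag
    a = sum(z1[t] * z1[t + lag] for t in range(n)) / n
    d = sum(z2[t] * z2[t + lag] for t in range(n)) / n
    b = sum(z1[t] * z2[t + lag] + z2[t] * z1[t + lag] for t in range(n)) / (2 * n)
    return a, b, d


def separate_by_lag(x1, x2, lag=1):
    """Whiten, then rotate so the lagged covariance [[a, b], [b, d]] becomes diagonal."""
    z1, z2 = whiten_2d(x1, x2)
    a, b, d = lagged_cov(z1, z2, lag)
    theta = 0.5 * math.atan2(2 * b, a - d)  # Jacobi rotation angle
    c, s = math.cos(theta), math.sin(theta)
    y1 = [c * u + s * v for u, v in zip(z1, z2)]
    y2 = [-s * u + c * v for u, v in zip(z1, z2)]
    return y1, y2


# Usage: a slow and a fast sinusoid, linearly mixed.
n = 3000
s1 = [math.sin(0.05 * t) for t in range(n)]
s2 = [math.sin(0.9 * t + 0.3) for t in range(n)]
x1 = [p + 0.5 * q for p, q in zip(s1, s2)]
x2 = [0.7 * p - q for p, q in zip(s1, s2)]
y1, y2 = separate_by_lag(x1, x2)
```

Because it relies on autocorrelation rather than nongaussianity, this approach can in principle separate Gaussian sources too, provided their autocorrelations differ at the chosen lag.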

97
Classification images
  • Estimation of templates used in the human visual
    system by linear regression
  • We attempt to develop nonlinear versions
  • Basic approach: model changes in the linear
    templates as a function of context

In cooperation with Dept of Psychology, UH
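A minimal simulation of the linear-regression idea, assuming white Gaussian noise stimuli and a hypothetical observer whose yes/no answer depends linearly on an invented 8-pixel template plus internal noise. With white unit-variance noise the least-squares slope is approximately the response-weighted mean of the noise fields, so the estimate should recover the template's shape. The nonlinear, context-dependent versions mentioned on the slide are not attempted, and all names are hypothetical.

```python
import random


def simulate_observer(template, trials, internal_noise=1.0, seed=0):
    """Hypothetical observer: responds +1 when the template's dot product with
    the noise field, plus Gaussian internal noise, is positive; otherwise -1."""
    rng = random.Random(seed)
    data = []
    for _ in range(trials):
        noise = [rng.gauss(0.0, 1.0) for _ in template]
        drive = sum(t * x for t, x in zip(template, noise)) + rng.gauss(0.0, internal_noise)
        data.append((noise, 1.0 if drive > 0 else -1.0))
    return data


def classification_image(data):
    """Estimate the template by regressing responses on the noise fields.
    For white unit-variance noise, X^T X is close to n * I, so the least-squares
    slope reduces to the response-weighted average of the noise fields."""
    dim = len(data[0][0])
    estimate = [0.0] * dim
    for noise, response in data:
        for k in range(dim):
            estimate[k] += response * noise[k]
    return [v / len(data) for v in estimate]


template = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]  # invented "true" template
estimate = classification_image(simulate_observer(template, trials=20000))
```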
98
Non-negative sparse representations
  • Non-negative matrix factorization (NMF) is
    claimed to give local, parts-based representations
  • Locality is much better achieved when NMF is
    combined with sparseness (J. Mach. Learn. Res.,
    in press)
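Sparseness needs a number before it can be controlled. A minimal sketch, assuming the normalized L1/L2 ratio commonly used with sparseness-constrained NMF: it is 0 for a vector whose entries all have the same magnitude and 1 for a vector with a single nonzero entry. The projection steps that would enforce a target sparseness during factorization are not shown, and `sparseness` is a hypothetical name.

```python
import math


def sparseness(x):
    """(sqrt(n) - L1/L2) / (sqrt(n) - 1): 0 for equal magnitudes, 1 for one nonzero."""
    n = len(x)
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    if n < 2 or l2 == 0:
        raise ValueError("need at least two entries, not all zero")
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)


print(sparseness([1, 0, 0, 0]))  # 1.0
print(sparseness([1, 1, 1, 1]))  # 0.0
```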

99
Future questions
  • Connection between segmentation and independent
    components
  • Further post-processing methods for ICA
  • Classification methods and ICA
  • General multilayer models for natural images
  • Nonlinear ICA
  • Estimation of complex statistical models

100
Adaptive Computing
  • Dr. Patrik Floréen

101
Premises of the research
  • Adaptive computing refers to solutions that adapt
    to their environment
  • Linked to the ubiquitous / pervasive / proactive
    computing vision
  • We focus on some central topics to realise this
    vision
  • Context-awareness and adaptation are central to
    user-friendly ubiquitous applications
  • Ad hoc networking (incl. sensor networks) may in
    the future provide the infrastructure for many
    ubiquitous applications

102
Our Research Environment
  • Draws on existing competence in data mining,
    probabilistic reasoning, algorithmics and
    language technology
  • At the intersection of many HIIT research groups,
    many of which deal with context-awareness,
    personalisation and adaptation
  • This presentation is about the AC groups at BRU
  • Group of Prof. Hannu Toivonen (Kari Laasonen,
    Renaud Petit, Mika Raento)
  • Group of Doc. Patrik Floréen (Greger Lindén,
    Jukka Kohonen, Yevgeniya Kulikova, Petteri Nurmi,
    Michael Przybilski, Jukka Suomela)
  • Small groups, short history (2003-)

103
Present Research Issues (1/2)
  • Context analysis
  • Analysis of context information and its use in
    proactive adaptivity on mobile devices, in
    particular recognising and predicting locations
    under limited resources (CONTEXT: Toivonen,
    Laasonen, Petit, Raento)
  • Reasoning about (mobile) context using data
    mining and machine learning techniques, e.g.
    segmentation and time series analysis (MobiLife:
    Floréen, Nurmi, Przybilski, Raento, Suomela)
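To make location prediction under limited resources concrete, a minimal sketch, assuming the device sees a stream of location identifiers (GSM cell IDs, for example, an assumption here) and stores only transition counts. The class `NextLocationPredictor` is hypothetical and is not the open-source CONTEXT software.

```python
from collections import Counter, defaultdict


class NextLocationPredictor:
    """First-order Markov predictor over location identifiers.
    Memory grows only with the number of distinct transitions observed."""

    def __init__(self):
        self.transitions = defaultdict(Counter)
        self.current = None

    def observe(self, location):
        # Staying in the same place is not counted as a transition.
        if self.current is not None and location != self.current:
            self.transitions[self.current][location] += 1
        self.current = location

    def predict(self):
        """Most frequent successor of the current location, or None if unseen."""
        successors = self.transitions.get(self.current)
        if not successors:
            return None
        return successors.most_common(1)[0][0]


p = NextLocationPredictor()
for cell in ["home", "bus", "work", "bus", "home", "bus", "work", "bus"]:
    p.observe(cell)
print(p.predict())  # work
```

A first-order model is about the cheapest predictor one could run on a phone; richer context (time of day, longer histories) would trade memory for accuracy.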

104
Present Research Issues (2/2)
  • Architectural issues for context-aware systems
  • Context-aware selection of software components on
    mobile terminals, using an architecture based on a
    blackboard approach (Space4U: Floréen,
    Przybilski)
  • Context Management Framework: a software
    architecture for future context-aware mobile
    systems (MobiLife: Floréen, Nurmi, Przybilski)
  • Ad hoc and sensor networks
  • Topology control and routing problems under
    energy constraints
  • Self-organisation of ad hoc networks using a
    game-theoretical approach
  • NAPS: Floréen, Kohonen, Nurmi
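As background on topology control (a textbook fact, not one of the NAPS results): if all nodes share one transmission range, the smallest range that keeps the network connected equals the longest edge of a Euclidean minimum spanning tree over the node positions. Under the common assumed model that transmission energy grows with distance, this bounds the power every node must spend. `critical_range` is a hypothetical helper using Prim's algorithm.

```python
import math


def critical_range(points):
    """Smallest common range r such that linking all node pairs within distance r
    gives a connected graph: the longest edge of a Euclidean MST (Prim, O(n^2))."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each node
    best[0] = 0.0
    longest = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        longest = max(longest, best[u])
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return longest


nodes = [(0, 0), (1, 0), (1, 2), (5, 2)]
print(critical_range(nodes))  # 4.0
```

A single common range is wasteful when one node is isolated; letting each node choose its own power level, as in the NAPS work on transmission power, is what makes the problem hard.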

105
Summary of Ongoing Projects
  • Group of Toivonen
  • CONTEXT: Academy of Finland, 11/02-12/05, with
    ARU
  • Group of Floréen
  • NAPS: Academy of Finland, 01/03-12/05, with HUT
  • Space4U: EUREKA/ITEA, Nokia subcontract,
    07/03-06/05, also HUT
  • MobiLife: EU IST IP, Nokia coordinator,
    09/04-12/06, also ARU
  • In addition:
  • PROACT coordination: Academy of Finland,
    01/02-05/06; Programme Director Heikki Mannila,
    Programme Coordinator Greger Lindén

106
Highlights of Recent Achievements: CONTEXT
  • The software developed for location recognition
    and prediction has been published as open source
    and is used by other research groups, for a
    presence service and for annotating photographs

107
Annotation of Photographs
108
Highlights of Recent Achievements: NAPS
  • NP-hardness results and algorithms for maximizing
    multicast lifetime under energy constraints, by
    dynamically choosing transmission power levels
  • Modelling of routing in ad hoc networks using
    dynamic Bayesian games (Nurmi)
  • Balanced data gathering in sensor networks,
    including an approximation algorithm based on the
    Garg-Könemann fractional packing approximation
    algorithm (FOCS 1998)
  • Energy-limited sensor nodes
  • Utility function $F_\lambda = (1-\lambda)\,\operatorname{avg}_{i \in S} q_i + \lambda \min_{i \in S} q_i$
  • 36 randomly placed sensors in the figures that follow

109
No balancing ($\lambda = 0$)
Maximizing $F_0 = \operatorname{avg}_{i \in S} q_i$
110
Strict balancing ($\lambda = 1$)
Maximizing $F_1 = \min_{i \in S} q_i$
111
Moderate balancing ($\lambda = 0.5$)
Maximizing $F_{0.5} = 0.5\,\operatorname{avg}_{i \in S} q_i + 0.5 \min_{i \in S} q_i$
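The balancing utility $F_\lambda = (1-\lambda)\,\operatorname{avg}_{i} q_i + \lambda \min_{i} q_i$ interpolates between maximizing the average and maximizing the worst case. A minimal sketch that evaluates it for two invented allocations; reading $q_i$ as the amount of data delivered by sensor $i$ is an assumption, since the slides do not define it, and `balanced_utility` is a hypothetical name.

```python
def balanced_utility(q, lam):
    """F_lam = (1 - lam) * avg(q) + lam * min(q) over per-sensor values q."""
    if not q:
        raise ValueError("q must contain at least one sensor value")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * sum(q) / len(q) + lam * min(q)


# Two hypothetical allocations over three sensors.
unbalanced = [10, 10, 1]
balanced = [5, 5, 5]
for lam in (0.0, 0.5, 1.0):
    print(lam, balanced_utility(unbalanced, lam), balanced_utility(balanced, lam))
# 0.0 7.0 5.0
# 0.5 4.0 5.0
# 1.0 1.0 5.0
```

With $\lambda = 0$ the unbalanced allocation wins on average throughput, while $\lambda = 0.5$ and $\lambda = 1$ already prefer the balanced one, which mirrors the three figures above.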
112
Future Directions
  • Emphasis on continuing present topics rather
    than on expanding into new areas
  • Successful present principles to be continued:
  • Diverse funding sources
  • Theory and practical implementation together
  • There is potential for developing the activities
    through:
  • even more collaboration with other groups (inside
    and outside of HIIT)
  • TEKES projects
  • attention to recruitment of postdocs

113
Future Research Issues
  • Context reasoning and the use of context
  • Combining context-awareness and component
    architectures
  • Trust and privacy issues of the users in
    context-aware applications
  • Modelling and algorithms for topology control and
    routing in ad hoc networks and data gathering in
    sensor networks
  • Application of game theory to problems in ad hoc
    networking and context-aware computing