1
Using R as an enterprise-wide data analysis
platform
Zivan Karaman
2
Limagrain
  • Our profession: improvement and valorization of
    plants
  • Our mission: innovate in order to create
    varieties that meet the expectations of farmers,
    market gardeners, industrialists and consumers

FIELD SEEDS
VEGETABLE SEEDS AND GARDEN PRODUCTS
CEREAL INGREDIENTS AND BAKERY PRODUCTS
3
Limagrain research
Annual budget: 102 million
12% of professional sales
1,200 researchers
73 research centers
Europe: 41 centers
Asia Pacific: 10 centers
4
Context
  • Plant breeding aims at creating new varieties
    (stable forms with desirable agronomic
    properties) from the existing genetic diversity.
    It is a long and resource-consuming activity.
  • Many field trials and laboratory experiments are
    needed to evaluate the tested plant material
  • Huge amounts of data must be analysed by users
    who are not specialists in statistics or
    computing
  • ... and it must be done quickly!

5
Needs
  • Data to be analysed must be retrieved from the
    operational databases and quickly processed
  • Most end users are geographically dispersed with
    no local support for data analysis
  • Some types of analysis require long and complex
    computations
  • This calls for a client/server architecture with
    computations done on the server side (to minimise
    WAN traffic) and a Web interface to routine
    analyses
  • But some users need (much) more flexibility
  • ... and we all want to use the same tool

6
Users
  • End users: occasional routine analyses; ease of
    use/GUI (Web interface)
  • Power users: regular, more flexible, interactive
    analyses; ease of use/GUI (desktop application)
  • Developers: develop tools for the users; need
    software engineering tools (IDE, source code
    management)
  • Expert users (statisticians): develop and test
    new statistical methodology; require a flexible
    programming language

7
Requirements
  • Rich function set for statistical data analysis
    and flexible graphics
  • Possibility to extend the built-in functions
  • Database connectivity and access to file system
  • Integration with other software
  • Handling large problems (upsizing)
  • Capacity to build user-friendly interfaces (GUI)
  • Capacity to be used over the Web (server)
  • Standard software development tools
  • Ease of deployment

8
Rich function set and extensibility
  • The R programming environment is an invitation to
    explore the data and create your own functions,
    the only limit being the user's imagination
  • R provides a rich set of functions for
    statistical data analysis and extremely flexible
    graphics capabilities
  • Limited built-in support for interactive graphics
    (linked views): is Rggobi the way to go?
  • Graphlets: a useful S-PLUS feature that we miss

9
Database connectivity and file system
  • Database access: RODBC provides a wide range of
    possibilities, including access to Excel files,
    but it cannot handle multiple result set queries
    (returning a list of data frames), which would be
    helpful
  • File system access: excellent set of functions
    for accessing the local file system and even
    files over the internet; can handle zip files,
    but full support for zip-file management (create,
    list contents, add/remove files, etc.) would be
    nice
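
One way to approximate a multiple result set query is to run the queries one at a time and collect the results in a list. The sketch below assumes RODBC's `sqlQuery()`; the helper name `fetch_result_sets`, its `run` parameter, the DSN and the table names are all hypothetical and not part of the original slides.

```r
# Run each query separately and return a named list of results.
# `run` defaults to RODBC::sqlQuery; it is a parameter (an assumption of
# this sketch) so the helper can be tried without a live database.
fetch_result_sets <- function(channel, queries, run = RODBC::sqlQuery) {
  # lapply() keeps the names of `queries`, so each result is labelled.
  lapply(queries, function(sql) run(channel, sql))
}

# Typical use with a real connection (hypothetical DSN and tables):
# channel <- RODBC::odbcConnect("trials_dsn")
# results <- fetch_result_sets(channel,
#                              c(trials = "SELECT * FROM trials",
#                                plots  = "SELECT * FROM plots"))
# RODBC::odbcClose(channel)
```

Each list element is whatever `sqlQuery()` returns for that query, normally a data frame. This costs one round trip per query, so it is a workaround rather than a true multi-result-set call.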

10
Integration with other software
  • R provides excellent built-in support for
    integrating existing Fortran or C code
  • Communication protocols exist for directly
    integrating R with Java and other software, both
    as client and server
  • On Windows, any COM-compliant software can
    be used to drive R (GUI front-end, for example)
  • Finally, through the rich set of functions for
    accessing operating system files and the
    possibility to invoke the system shell, any
    program that can read and write text files in
    batch mode can be easily interfaced with R
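
The text-file route in the last bullet can be sketched as follows. The external program is an assumption: a tiny script run through `Rscript` stands in for any batch executable that reads one text file and writes another, and the data values and file names are made up for illustration.

```r
# Input data that the external program will read (made-up values).
input_file  <- tempfile(fileext = ".csv")
output_file <- tempfile(fileext = ".csv")
write.csv(data.frame(plot = 1:3, yield_t = c(5.2, 6.1, 4.8)),
          input_file, row.names = FALSE)

# Stand-in for an external batch program: a small script run by Rscript.
# In practice this would be any executable that reads and writes text files.
tool_script <- tempfile(fileext = ".R")
writeLines(c(
  "args <- commandArgs(trailingOnly = TRUE)",
  "d <- read.csv(args[1])",
  "d$yield_kg <- d$yield_t * 1000",
  "write.csv(d, args[2], row.names = FALSE)"
), tool_script)

# Invoke the program through the system shell and check its exit status.
rscript <- file.path(R.home("bin"), "Rscript")
status <- system2(rscript, shQuote(c(tool_script, input_file, output_file)))
if (status != 0) stop("external program failed with status ", status)

# Read the program's output back into R.
result <- read.csv(output_file)
```

The pattern is always the same three steps: write the input file, call the program with `system2()`, read the output file. Only the command and its arguments change from one external tool to another.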

11
Upsizing
  • Microsoft Windows is our common platform
  • Some problems require more than the 4 GB of
    memory that standard Windows can manage
  • We hope to be able to handle them on 64-bit
    Linux
  • R code can be painlessly moved from 32-bit
    Windows to 64-bit Linux (can it?), providing a
    straightforward way for upsizing
  • Long-running simulations: several R packages
    provide support for parallel computing
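
As an illustration of the parallel computing point, here is a minimal sketch using the `parallel` package that ships with current R. The slides do not say which packages were meant, so this choice is an assumption, and `simulate_trial` is a hypothetical stand-in for one long-running simulation replicate.

```r
library(parallel)

# Hypothetical stand-in for one long-running simulation replicate.
# Seeding inside the function keeps each replicate reproducible.
simulate_trial <- function(seed) {
  set.seed(seed)
  mean(rnorm(1000, mean = 100, sd = 15))
}

# Start two worker processes, spread eight replicates over them,
# then shut the workers down again.
cl <- makeCluster(2)
results <- parLapply(cl, 1:8, simulate_trial)
stopCluster(cl)

# One simulated mean per replicate.
trial_means <- unlist(results)
```

Because `parLapply()` has the same calling shape as `lapply()`, existing sequential code can often be moved over with few changes.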

12
User-friendly interfaces
  • Several GUI toolkits are available as add-on
    packages
  • Providing a standard set of tools for building
    user interfaces as a part of the core
    distribution would be very helpful
  • Common data analysis functions could be
    implemented through this standard GUI toolkit
    (like in GenStat or S-PLUS)
  • Another way is to use the excellent integration
    capabilities of R to develop the user interface
    in Java, VB, or another tool, but this requires
    resorting to another, completely different
    programming language

13
Web server
  • Several implementations of R Web servers are
    available
  • They use different technologies, and offer
    different sets of functionalities
  • We have an in-house Web portal and distributed
    computing platform that currently uses
    S-PLUS Server from Insightful
  • We plan to integrate R using the R/DCOM interface
  • Having a feature like Insightful Graphlets would
    allow us to implement some user interaction in
    the Web application

14
Software development tools
  • IDE: Tinn-R on Windows; StatET Eclipse plug-in;
    why not provide a standard IDE (probably
    Eclipse-based) as a part of the core
    distribution?
  • Debugger, profiler: good tools are available;
    integration with IDE (graphical debugging)
  • Source code management: Subversion; integration
    with IDE

15
Deployment
  • Keeping users' computers up to date with current
    versions of software is a system administrator's
    nightmare
  • The R package installation/update system provides
    everything one would ever need to keep R-based
    software up and running!

16
Conclusions
  • R provides an excellent platform for delivering
    data analytical functions enterprise-wide
  • broad range of statistical methods included
  • highly flexible graphics
  • ease of extending existing code
  • great database and file system connectivity
  • built-in facilities for package updates
  • Possible improvements: include a standard,
    multi-platform IDE and (at least) some form of
    GUI toolkit in the core distribution

17
Thank you for your attention