Grid computing performance prediction based in historical information

1
Grid computing performance prediction based in
historical information
WP6 Resource Management and Scheduling Task 8
Performance Prediction
  • Francesc Guim
  • Computer Architecture Department
  • Universitat Politècnica de Catalunya
  • fguim_at_ac.upc.edu
  • Ariel Goyeneche
  • Centre for Parallel Computers
  • University of Westminster
  • goyenea_at_wmin.ac.uk
  • CoreGRID Integration Workshop, Pisa, 28th-30th
    November 2005.
  • http://www.coregrid.net

2
Performance in Grid Computing
  • Grid Computing: coordinated resource sharing and
    problem solving in dynamic, multi-institutional
    virtual organizations. (Sharing is not primarily
    file exchange, but rather access to computers,
    software, data, and other resources, as is
    required by a range of collaborative
    problem-solving strategies.)
  • Problems in mapping traditional performance
    analysis to Grid computing
  • Machine specific
  • Source code
  • Extremely slow and memory-intensive
  • Non-dynamic analysis, unsuited to heterogeneous
    and unreliable environments
  • Social aspects (time of day, weekdays, holidays, etc.)
  • Intrusive and security aspects
  • Proposed solution
  • We suggest that the historical data related to
    applications, resources and users can provide an
    adequate amount of information for modeling and
    predicting the behavior of Grid components.

3
Challenges to get through: 1 - Models built from
semantically diverse data
  • Problems
  • The internal structure and query interfaces
    differ across VOs and centers.
  • The provided data is semantically diverse and
    heterogeneous.
  • Solutions
  • Use a query-centric system for knowledge
    acquisition from distributed, semantically
    heterogeneous data sources (which may employ
    ontologies; see INDUS) to present a collection
    of such data sources as though they were a
    collection of tables structured according to
    the rules supplied.
  • Issues to address
  • Definition of such a set of types, properties, and
    relationship types.
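
The idea of presenting heterogeneous sources as one collection of tables can be sketched as follows. This is only an illustrative sketch, not the INDUS system: the source names (`center_a`, `center_b`), their field names, and the common schema are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical common schema: user, executable, processes, run_time_s.
CommonRecord = Dict[str, object]


def from_center_a(row: dict) -> CommonRecord:
    # Assumed layout of center A: wall-clock time already in seconds.
    return {
        "user": row["uid"],
        "executable": row["cmd"],
        "processes": int(row["ncpus"]),
        "run_time_s": float(row["wall_secs"]),
    }


def from_center_b(row: dict) -> CommonRecord:
    # Assumed layout of center B: wall-clock time in minutes.
    return {
        "user": row["owner"],
        "executable": row["binary"],
        "processes": int(row["tasks"]),
        "run_time_s": float(row["wall_minutes"]) * 60.0,
    }


# One mapping rule per data source (the "rules supplied").
MAPPERS: Dict[str, Callable[[dict], CommonRecord]] = {
    "center_a": from_center_a,
    "center_b": from_center_b,
}


def unified_view(sources: Dict[str, List[dict]]) -> List[CommonRecord]:
    """Return all rows from all sources, expressed in the common schema."""
    table: List[CommonRecord] = []
    for name, rows in sources.items():
        mapper = MAPPERS[name]
        table.extend(mapper(row) for row in rows)
    return table


example = unified_view({
    "center_a": [{"uid": "ana", "cmd": "sim", "ncpus": "4", "wall_secs": "90"}],
    "center_b": [{"owner": "bob", "binary": "sim", "tasks": 8, "wall_minutes": 2}],
})
```

The hard part named on the slide, agreeing on the set of types, properties and relationships, corresponds here to choosing the common schema and writing one mapper per source.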

4
Challenges to get through: 2 - Data learning
models reflecting stochastic values
  • Problems
  • If the solution uses conventional point-valued
    performance parameters and prediction models, it
    may be inaccurate, since a point value can only
    represent one point in a range of possible
    behaviours.
  • Solutions
  • Try to characterize applications and systems
    through a set of possible values and their
    probabilities (stochastic values) (see
    Performance prediction in production
    environments)
  • Issues to address
  • Calculating stochastic values corrected by
    comparing predicted performance against real
    measurements: is there a trade-off in finding the
    best probability distribution for a given value?
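
A stochastic value, as opposed to a point value, can be illustrated with a simple empirical distribution. This is a minimal sketch under our own assumptions (fixed-width bins, hypothetical function names), not the method of the cited work.

```python
from collections import Counter
from typing import Dict, Sequence


def empirical_distribution(samples: Sequence[int], bin_width: int) -> Dict[int, float]:
    """Map each bin start to the fraction of samples falling in that bin."""
    if not samples:
        raise ValueError("at least one sample is required")
    counts = Counter((s // bin_width) * bin_width for s in samples)
    total = len(samples)
    return {start: count / total for start, count in sorted(counts.items())}


def probability_within(dist: Dict[int, float], low: int, high: int) -> float:
    """Probability mass of the bins whose start lies in [low, high)."""
    return sum(p for start, p in dist.items() if low <= start < high)


# Hypothetical run times in seconds for one application.
run_times = [100, 110, 120, 300]
dist = empirical_distribution(run_times, bin_width=50)
```

For these samples the point-valued mean is 157.5 s, a run time that was never observed, whereas the distribution records that three quarters of the runs fell in the 100-150 s bin and one quarter in the 300-350 s bin.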

5
Challenges to get through: 3 - Prediction
mechanisms
  • Problems
  • How to use the normalized and weighted data to
    predict performance behaviours?
  • Solutions
  • Intuition and previous work (see references in the
    paper) indicate that similar applications are more
    likely to have similar run times than applications
    that have nothing in common
  • Issues to address
  • How do we define similar?
  • Hints
  • Define good templates for particular workloads
    (ANL)
  • Transform those templates into more
    sophisticated dynamic service and resource
    models
  • Add potential social aspects.
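
One way to make "similar" concrete is a list of templates, each naming the job attributes that must match. The sketch below is an assumption-laden illustration of that idea: the templates, the attribute names, and the `min_matches` threshold are hypothetical and not taken from the slides.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

# Hypothetical templates, ordered from most to least specific.
TEMPLATES: List[Tuple[str, ...]] = [
    ("user", "executable", "processes"),
    ("user", "executable"),
    ("executable",),
]


def predict_run_time(
    job: dict,
    history: Sequence[dict],
    templates: Sequence[Tuple[str, ...]] = TEMPLATES,
    min_matches: int = 2,
) -> Tuple[Optional[float], Optional[Tuple[str, ...]]]:
    """Use the first template with enough matching past jobs; return (mean, template)."""
    for template in templates:
        matches = [
            past["run_time_s"]
            for past in history
            if all(past[attr] == job[attr] for attr in template)
        ]
        if len(matches) >= min_matches:
            return mean(matches), template
    return None, None  # no template had enough similar jobs


history = [
    {"user": "ana", "executable": "sim", "processes": 4, "run_time_s": 100.0},
    {"user": "ana", "executable": "sim", "processes": 8, "run_time_s": 60.0},
    {"user": "bob", "executable": "sim", "processes": 4, "run_time_s": 140.0},
]
prediction, used = predict_run_time(
    {"user": "ana", "executable": "sim", "processes": 4}, history
)
```

Here the most specific template finds only one matching job, so the predictor falls back to (user, executable) and averages two runs. Social aspects such as weekday or hour of submission could be added as further template attributes.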

6
The big picture
[Figure: three virtual organizations (VO1, VO2, VO3), each containing several nodes labelled GS]

7
First approach
  • We focused on the three challenges explained
    before in the simplest scenario: one VO
  • We studied the workloads of the UPC system
  • It was already unified (challenge 1)
  • We tried to represent the workload variables with
    statistical descriptors and study their
    characteristics (challenge 2)
  • We implemented simple historical predictors based
    on previous work (challenge 3)
  • Using statistical estimators
  • Mean and standard deviation, median and
    interquartile range, and trimmed mean
  • Using last value.
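
The estimators listed above can be computed with the Python standard library. This is a minimal sketch: the 10% trimming proportion and the sample run times are assumptions, since the slides do not give the parameters actually used.

```python
from statistics import mean, median, quantiles, stdev
from typing import Dict, Sequence


def trimmed_mean(values: Sequence[float], proportion: float = 0.1) -> float:
    """Mean after dropping the given proportion of values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)


def summarize(run_times: Sequence[float]) -> Dict[str, float]:
    """Statistical descriptors of a history (needs at least two values)."""
    q1, _, q3 = quantiles(run_times, n=4)  # quartiles, default "exclusive" method
    return {
        "mean": mean(run_times),
        "stdev": stdev(run_times),
        "median": median(run_times),
        "iqr": q3 - q1,
        "trimmed_mean": trimmed_mean(run_times),
        "last": run_times[-1],  # "last value" predictor
    }


# Hypothetical run times in seconds, in submission order, with one outlier.
summary = summarize([10, 12, 11, 13, 100, 12, 11, 12, 13, 11])
```

With the single outlier of 100 s, the mean is pulled up to 20.5 s while the median (12 s) and the trimmed mean (11.875 s) stay close to the typical run, which is the usual reason for considering robust estimators alongside the mean.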

8
Our predictor example
  • The input predictor variables are
  • User, group, number of requested processes,
    submitted executable.
  • The output prediction variables are
  • Total time, user time, system time, memory used
  • Predictors base their predictions on
  • The historical information for the given user,
    executable and number of tasks
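
A predictor of this shape can be sketched as a history keyed by (user, executable, number of tasks). The class name, the field names, and the choice of the median as estimator are assumptions made for illustration; this is not the authors' implementation.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Optional, Tuple

# Output variables named on the slide (identifiers are hypothetical).
OUTPUTS = ("total_time", "user_time", "system_time", "memory_used")

Key = Tuple[str, str, int]


class HistoricalPredictor:
    """Predict each output variable from past runs with the same key."""

    def __init__(self) -> None:
        self._history: Dict[Key, List[Dict[str, float]]] = defaultdict(list)

    def record(self, user: str, executable: str, tasks: int,
               measurements: Dict[str, float]) -> None:
        self._history[(user, executable, tasks)].append(measurements)

    def predict(self, user: str, executable: str,
                tasks: int) -> Optional[Dict[str, float]]:
        past = self._history.get((user, executable, tasks))
        if not past:
            return None  # no history for this combination
        return {name: median(m[name] for m in past) for name in OUTPUTS}


predictor = HistoricalPredictor()
for total in (100.0, 120.0, 400.0):
    predictor.record("ana", "sim", 4, {
        "total_time": total,
        "user_time": total * 0.5,
        "system_time": 5.0,
        "memory_used": 256.0,
    })
result = predictor.predict("ana", "sim", 4)
```

The median could be swapped for any of the other estimators (mean, trimmed mean, last value) without changing the interface.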

9
Result
  • We obtained good results
  • 35% of all the applications can be predicted in
    90-100% of their executions.
  • 30% of all the applications can be predicted in
    40-60% of their executions.
  • The rest of the applications
  • Have neither clear patterns nor relations.
  • However, the predictors still have to be improved

10
Next steps
  • Studying information coming from
  • National Grid Service (NGS) UK
  • UPC
  • In order to solve problems related to
  • Heterogeneous internal structures and semantically
    diverse data
  • Identifying relevant, suitable workload variables
    to be used in prediction.
  • New predictors based on previous information.

11
Questions?
  • You can contact us on
  • Francesc Guim, fguim_at_ac.upc.edu,
    http://francesc.guim.net
  • Ariel Goyeneche, goyenea_at_wmin.ac.uk,
    http://www.cscs.wmin.ac.uk/goyenea/