Roberto Pereira, Miguel Erazo - PowerPoint PPT Presentation

1
PRIME/GreenLight project Progress Report
  • Roberto Pereira, Miguel Erazo
  • Florida International University

December 2009
2
Outline
  • Motivation and Objectives
  • PRIME overview
  • Installation
  • Methodology
  • Future work

3
Motivation and Objectives
4
Motivation
  • The information technology industry consumes as
    much energy and has roughly the same carbon
    footprint as the airline industry
  • Every dollar spent on power for IT equipment
    requires that another dollar be spent on cooling

5
Objectives
  • Provide the scientific community with useful
    guidelines regarding the energy consumption of
    distributed simulations/emulations of network
    models
  • Develop a large-scale Grid application
    performance evaluation platform based on PRIME

6
PRIME overview
7
The PRIME network simulator
  • Simulator/emulator of computer networks based
    on the SSF specification
  • Able to simulate from tens of thousands to
    millions of nodes
  • Emulation is supported via OpenVPN
  • Distributed simulation/emulation supported
    through MPI

8
The PRIME network simulator
Network model
Emulation infrastructure
Distributed simulation
9
A specific deployment
The network model: topology, traffic, and
applications
Define alignments, partition the network and map
to physical machines
10
Installation
11
Platform
  • PRIME was installed on Lincoln, Abe, and QueenBee
    in TeraGrid
  • Simple network models were run using the PBS
    scheduler
  • A number of useful tools were used and tested,
    e.g. Perfsuite

12
Perfsuite
  • Collection of tools, utilities, and libraries for
    software performance analysis
  • Uses the Performance Application Programming
    Interface (PAPI)
  • Installed in Abe and QueenBee

13
Utilities
  • psrun is used to gather hardware performance
    information
  • psprocess is used to post-process the results of
    a performance analysis experiment

14
Methodology
15
The approach
  • Measure the time that an application, e.g. PRIME,
    uses each computing resource, and then derive the
    energy consumption from the power signature of
    each of these resources, as given in its
    specifications
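
The approach above amounts to multiplying each resource's measured time by its power draw and summing the results. A minimal sketch, assuming one constant power value per resource; the function name, the resource names, and all numbers are hypothetical and are not taken from the slides:

```python
# Hypothetical power signatures in watts. Real values would come from the
# hardware specifications of the machine under study.
POWER_WATTS = {"cpu": 80.0, "memory": 5.0, "disk": 10.0}


def energy_joules(busy_seconds, power_watts):
    """Sum time * power over every resource that was measured."""
    return sum(busy_seconds[name] * power_watts[name] for name in busy_seconds)


# Made-up per-resource times in seconds, standing in for real measurements.
measured = {"cpu": 120.0, "memory": 30.0, "disk": 4.0}
total = energy_joules(measured, POWER_WATTS)
print(f"Estimated energy: {total:.1f} J")
```

The slides that follow describe how the time for each resource (CPU, memory, disk) is obtained.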

16
CPU
  • We use Perfsuite for measuring CPU time
  • We consider two states for the CPU
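
The slide does not name the two CPU states. The sketch below assumes they are "active" and "idle", with the active time coming from the measured CPU time and the idle time being the rest of the wall-clock time. The power values are hypothetical.

```python
def cpu_energy_joules(wall_seconds, cpu_seconds, active_watts, idle_watts):
    """Two-state CPU model: active while computing, idle otherwise.

    The active/idle split is an assumption; the slides only say that
    two states are considered.
    """
    if cpu_seconds > wall_seconds:
        raise ValueError("CPU time cannot exceed wall-clock time on one core")
    idle_seconds = wall_seconds - cpu_seconds
    return cpu_seconds * active_watts + idle_seconds * idle_watts


# Hypothetical run: 100 s wall-clock time, 60 s of measured CPU time.
cpu_joules = cpu_energy_joules(100.0, 60.0, active_watts=95.0, idle_watts=40.0)
print(f"CPU energy: {cpu_joules:.1f} J")
```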

17
Memory
(Figure: basic block diagram of a CPU)
18
Memory
  • When there is a cache miss, two things happen:
  • 1) The data requested by the CPU is fetched.
  • 2) There is also a prefetch.

19
Memory
  • If data/instructions are not found in caches, the
    main memory is accessed.
  • The PAPI event PAPI_PRF_DM (Data prefetch cache
    misses) is not available in the infrastructure
    provided by Abe in TeraGrid
  • We therefore compute the memory time taking into
    account only the number of accesses due to L2
    cache misses
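
A minimal sketch of the memory-time estimate described above, assuming every L2 cache miss costs one main-memory access of fixed duration. The miss count would come from a hardware counter (PAPI_L2_TCM is one standard PAPI preset, but the slides do not say which event was used). The function name and the numbers are hypothetical.

```python
def memory_time_seconds(l2_misses, access_time_ns):
    """Memory time = number of L2 misses * time per main-memory access."""
    return l2_misses * access_time_ns * 1e-9


# Hypothetical values: 2 million L2 misses at 50 ns per access.
mem_seconds = memory_time_seconds(2_000_000, 50.0)
print(f"Memory time: {mem_seconds:.3f} s")
```

Prefetch traffic is left out of this estimate, which matches the limitation stated on the slide.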

20
Memory
  • We will be using synchronous DDR2 DRAM at 667 MHz
    with internal array cells of 8 bits.

21
Memory
  • DDR2 is the second generation of DDR; its
    improvement is in bus width.

22
Memory
  • Array cells of 8 bits.
  • Double data rate: transmits twice per clock
    cycle (clock multiplier of 2).
  • Second generation: bus width of 4.
  • Data per access = (bits) × (bus width) × (clock
    multiplier).
  • 8 × 4 × 2 = 64 bits in our case.
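
The arithmetic on this slide can be checked with a few lines. The three inputs are the figures given above (8-bit array cells, bus width of 4, two transfers per clock cycle); the function and constant names are only illustrative.

```python
ARRAY_CELL_BITS = 8    # internal array cells of 8 bits
BUS_WIDTH = 4          # second-generation DDR
CLOCK_MULTIPLIER = 2   # double data rate: two transfers per cycle


def bits_per_access(cell_bits, bus_width, clock_multiplier):
    """Data per access = bits * bus width * clock multiplier."""
    return cell_bits * bus_width * clock_multiplier


data_bits = bits_per_access(ARRAY_CELL_BITS, BUS_WIDTH, CLOCK_MULTIPLIER)
print(data_bits, "bits per access")
```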

23
Memory
(Figure: DRAM access sequence, with steps 1 to 5
marked on the diagram)
1) The correct row is activated.
2) Delay between row activation and column
activation (tRCD).
3) The correct column is activated.
4) The data is retrieved from the array (CL).
5) The data is sent to the memory controller
(tDPD).
24
Memory
  • The manufacturer's bandwidth figure assumes the
    best case, so we need a more accurate
    approximation.
  • We use the total access time: the sum of the
    address transport time, the data access time,
    and the data transport time.
  • The memory is synchronous, so the address
    transport time equals one clock cycle.

25
Memory
  • tRCD is the row-to-column access delay.
  • CL is the column access time (clock cycles).
  • tAC is the minimum access time.
  • tDPD is the data propagation delay.
  • BMM is the number of subsequent accesses in
    burst mode.
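
The transcript does not preserve the exact formula that combines these parameters, so the following is only a sketch under stated assumptions. It adds the three components named earlier (address transport, data access, data transport). The address transport time is one clock cycle, as the slides state. The data access time is assumed to be tRCD + CL cycles, and the data transport time is assumed to equal tDPD. tAC and BMM are left out because it is not clear from the slides how they enter the calculation. The timing values in the example are hypothetical.

```python
def total_access_time_ns(clock_mhz, trcd_cycles, cl_cycles, tdpd_ns):
    """Total access time = address transport + data access + data transport."""
    cycle_ns = 1000.0 / clock_mhz
    # Synchronous memory: the address takes one clock cycle (from the slides).
    address_transport_ns = cycle_ns
    # Assumption: row-to-column delay plus column access, both in cycles.
    data_access_ns = (trcd_cycles + cl_cycles) * cycle_ns
    # Assumption: data transport is the data propagation delay.
    data_transport_ns = tdpd_ns
    return address_transport_ns + data_access_ns + data_transport_ns


# 667 MHz is the figure used in the slides; the other values are made up.
access_ns = total_access_time_ns(667.0, trcd_cycles=5, cl_cycles=5, tdpd_ns=1.0)
print(f"Total access time: {access_ns:.2f} ns")
```

Multiplying this per-access time by the number of L2 cache misses gives the memory time used in the energy estimate.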

26
Disk
  • For the hard disk drive we will use the Internal
    Sustained Transfer Rate (ISTR).
  • ISTR depends on the track on which the files are
    located.
  • The transfer is slower if the files are
    fragmented.

27
Disk
  • Outer tracks have more sectors per track.
  • We will approximate an average position.
  • ISTR is optimal for files in adjacent tracks and
    sectors.

28
Disk
  • We will use the command pidstat from SYSSTAT.
  • The measured disk activity includes accesses
    caused by page faults and cache misses as well
    as direct accesses.
  • With the total number of bytes read/written and
    the Internal Sustained Transfer Rate we can
    calculate the total time.
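
The last step is a single division. A minimal sketch, assuming the byte totals have already been collected (for example by accumulating the per-process read and write figures that pidstat reports) and that one average ISTR value applies to the whole run. The function name and the numbers are hypothetical.

```python
def disk_time_seconds(bytes_read, bytes_written, istr_bytes_per_second):
    """Disk time = total bytes transferred / internal sustained transfer rate."""
    if istr_bytes_per_second <= 0:
        raise ValueError("ISTR must be positive")
    return (bytes_read + bytes_written) / istr_bytes_per_second


# Hypothetical run: 600 MB read, 200 MB written, average ISTR of 80 MB/s.
disk_seconds = disk_time_seconds(600_000_000, 200_000_000, 80_000_000)
print(f"Disk time: {disk_seconds:.1f} s")
```

Because fragmentation and track position lower the real rate, a time computed this way is better read as an approximation than as an exact figure.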

29
Future work
30
Future activities
  • Find a suitable methodology for approximating the
    energy consumption of the network
  • Pick a network model to be used for the
    experiments
  • Run the experiments on TeraGrid

31
Future activities cont.
  • Process results
  • Compose the paper

32
Timeline