Roberto Pereira, Miguel Erazo presentation

1
PRIME/GreenLight project Progress Report

Roberto Pereira, Miguel Erazo
Florida International University

December 2009
2
Outline

Motivation and Objectives
PRIME overview
Installation
Methodology
Future work

3
Motivation and Objectives
4
Motivation

The information technology industry consumes as
much energy and has roughly the same carbon
footprint as the airline industry
Every dollar spent on power for IT equipment
requires that another dollar be spent on cooling

5
Objectives

Provide the scientific community useful
guidelines regarding the energy consumption of
distributed simulations/emulations of network
models
Develop a large-scale Grid application
performance evaluation platform based on PRIME

6
PRIME overview
7
The PRIME network simulator

Simulator/emulator of computer networks based
on the SSF specification

Able to simulate from tens of thousands to
millions of nodes

Emulation is supported via OpenVPN

Distributed simulation/emulation supported
through MPI

8
The PRIME network simulator
Network model
Emulation infrastructure
Distributed simulation
9
A specific deployment
The network model: topology, traffic, and
applications
Define alignments, partition the network, and map
the partitions to physical machines
10
Installation
11
Platform

PRIME is installed on Lincoln, Abe, and QueenBee
in Teragrid
Simple network models were run using the PBS
scheduler
A number of useful tools were used and tested,
e.g. Perfsuite

12
Perfsuite

Collection of tools, utilities, and libraries for
software performance analysis
Uses the Performance Application Programming
Interface (PAPI)
Installed in Abe and QueenBee

13
Utilities

psrun is used to gather hardware performance
information
psprocess is used to post-process the results of
a performance analysis experiment

14
Methodology
15
The approach

Measure the time that an application, e.g. PRIME,
uses each computing resource, and then derive the
energy consumption by extracting the power
signature of each of these resources from its
specifications
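
The approach above can be sketched as a sum of time multiplied by power over the resources. This is a minimal illustration, not the authors' tooling: the names `ResourceUsage` and `total_energy_joules` are hypothetical, and the times and wattages are placeholder values rather than measurements or data-sheet figures.

```python
from dataclasses import dataclass


@dataclass
class ResourceUsage:
    name: str
    busy_seconds: float  # measured time the application used the resource
    power_watts: float   # power signature taken from the specification (placeholder here)


def total_energy_joules(usages):
    """Energy = sum over resources of (time used) x (power drawn)."""
    return sum(u.busy_seconds * u.power_watts for u in usages)


# Placeholder numbers for illustration only.
example_usages = [
    ResourceUsage("cpu", 120.0, 80.0),
    ResourceUsage("memory", 15.0, 5.0),
    ResourceUsage("disk", 4.0, 8.0),
]
example_energy = total_energy_joules(example_usages)  # joules
```

Keeping one record per resource makes it easy to add the network later, once a method for estimating its time and power is chosen.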

16
CPU

We use Perfsuite for measuring CPU time
We consider two states for the CPU
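
A two-state CPU model could look like the sketch below. The slide does not name the two states, so treating them as "active" and "idle" is an assumption, as are the function name and the wattages in the example. It models a single core.

```python
def cpu_energy_joules(wall_seconds, cpu_seconds, p_active_watts, p_idle_watts):
    """Two-state model: the CPU is either active (running the application)
    or idle for the remainder of the wall-clock time."""
    if not 0 <= cpu_seconds <= wall_seconds:
        raise ValueError("cpu_seconds must lie between 0 and wall_seconds")
    idle_seconds = wall_seconds - cpu_seconds
    return cpu_seconds * p_active_watts + idle_seconds * p_idle_watts


# Placeholder values: 100 s run, 60 s of CPU time, 95 W active, 20 W idle.
example_cpu_energy = cpu_energy_joules(100.0, 60.0, 95.0, 20.0)
```

Here `cpu_seconds` stands for the CPU time reported by the measurement tool, and `wall_seconds` for the elapsed time of the run.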

17
Memory
Figure: basic block diagram of a CPU
18
Memory

When there is a cache miss, two things happen:
1) The data requested by the CPU is fetched.
2) There is also a pre-fetch.

19
Memory

If data/instructions are not found in caches, the
main memory is accessed.
The PAPI event PAPI_PRF_DM (Data prefetch cache
misses) is not available in the infrastructure
provided by Abe in Teragrid
We compute the memory time taking into account
the number of accesses due to L2 cache misses
only
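
A minimal sketch of this estimate: memory time is the number of L2 cache misses multiplied by the time of one main-memory access. The function name is hypothetical, and which counter supplies the miss count (for example a PAPI preset such as PAPI_L2_TCM) is an assumption, not something the slides state.

```python
def memory_time_seconds(l2_cache_misses, access_time_ns):
    """Each L2 miss is counted as one main-memory access.
    Prefetch traffic is ignored because its counter is unavailable."""
    if l2_cache_misses < 0 or access_time_ns < 0:
        raise ValueError("inputs must be non-negative")
    return l2_cache_misses * access_time_ns * 1e-9


# Placeholder values: 2 million misses at 60 ns per access.
example_memory_time = memory_time_seconds(2_000_000, 60.0)
```

Because prefetches are not counted, the result is best read as a lower bound on the real memory time.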

20
Memory

We will be using synchronous DDR2 DRAM at 667 MHz
with internal array cells of 8 bits.

21
Memory

DDR2 is the second generation of DDR; the
improvement is in bus width.

22
Memory

Array cells of 8 bits.
Double Data Rate: transmits twice per cycle.
Second generation: bus width of 4.
Data per access = (bits per cell) × (bus width) ×
(clock multiplier).
8 × 4 × 2 = 64 bits in our case.
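
The arithmetic on this slide can be checked with a short sketch. The function name is hypothetical; the three factors are the ones listed above (8-bit cells, bus width of 4, two transfers per cycle).

```python
def bits_per_access(cell_bits, bus_width, clock_multiplier):
    """Data moved by one memory access, in bits."""
    return cell_bits * bus_width * clock_multiplier


example_bits = bits_per_access(cell_bits=8, bus_width=4, clock_multiplier=2)
example_bytes = example_bits // 8  # same quantity expressed in bytes
```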

23
Memory
1) The correct row is activated.
2) Delay between row activation and column
activation (tRCD).
3) The correct column is activated.
4) The data is retrieved from the array (CL).
5) The data is sent to the memory controller
(tDPD).
24
Memory

The manufacturer's bandwidth figure assumes the
best case, so we need a more accurate
approximation.
We use the total access time: the sum of the
address transport time, the data access time, and
the data transport time.
The memory is synchronous, so the address
transport time equals one clock cycle.

25
Memory

tRCD is the row-to-column access delay.
CL is the column access time (CAS latency), in
clock cycles.
tAC is the minimum access time.

tDPD is the data propagation delay.
BMM is the number of subsequent accesses in burst
mode.
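
One way to combine these parameters into a total access time is sketched below. The exact formula from the original slide is not in this transcript, so the composition here is an assumption: address transport takes one clock cycle, data access takes tRCD + CL cycles, and data transport takes tDPD. tAC and burst mode (BMM) are left out. All numeric values are placeholders, not figures from the module's data sheet.

```python
def total_access_time_ns(cycle_ns, trcd_cycles, cl_cycles, tdpd_ns):
    """Assumed composition of one (non-burst) main-memory access."""
    address_transport_ns = 1 * cycle_ns                     # synchronous: one cycle
    data_access_ns = (trcd_cycles + cl_cycles) * cycle_ns   # row-to-column delay + CAS latency
    data_transport_ns = tdpd_ns                             # propagation to the controller
    return address_transport_ns + data_access_ns + data_transport_ns


# Placeholder timings: 3.0 ns cycle, tRCD = 5, CL = 5, tDPD = 0.5 ns.
example_access_ns = total_access_time_ns(3.0, 5, 5, 0.5)
```

The result can be used as the per-access time when converting a cache-miss count into memory time.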

26
Disk

For the Hard disk drive we will use the Internal
Sustained Transfer Rate (ISTR).
ISTR depends on the track where the files are
located.
The transfer is slower if the files are
fragmented.

27
Disk

Outer tracks have more
sectors per track.
We will approximate an average position.
ISTR is optimal for files in
adjacent tracks and sectors.

28
Disk

We will use the pidstat command from SYSSTAT.
Its counts include page faults, cache misses, and
direct accesses.
With the total number of bytes read/written and
the Internal Sustained Transfer Rate we can
calculate the total time.
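
A minimal sketch of that calculation, assuming the input is a series of per-interval read and write rates in kB/s, as reported by `pidstat -d` (kB_rd/s and kB_wr/s). The function name, the sample values, and the ISTR figure are placeholders.

```python
def disk_time_seconds(samples_kb_per_s, interval_seconds, istr_bytes_per_s):
    """samples_kb_per_s: list of (read_rate, write_rate) pairs in kB/s,
    one pair per sampling interval. Returns the estimated transfer time."""
    if istr_bytes_per_s <= 0:
        raise ValueError("ISTR must be positive")
    total_kb = sum((rd + wr) * interval_seconds for rd, wr in samples_kb_per_s)
    total_bytes = total_kb * 1024
    return total_bytes / istr_bytes_per_s


# Placeholder data: two 1-second samples and an ISTR of 50 MB/s.
example_samples = [(1024.0, 0.0), (2048.0, 512.0)]
example_disk_time = disk_time_seconds(example_samples, 1.0, 50e6)
```

Since ISTR is a sustained, best-case rate, fragmented files would take longer than this estimate.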

29
Future work
30
Future activities cont.

Find a suitable methodology for approximating the
energy consumption of the network
Pick a network model to be used for the
experiments
Run the experiments on Teragrid

31
Future activities cont.

Process results
Compose the paper

32
Timeline
