An Overview of X10 2.0 David Grove, Igor Peshansky, Vijay Saraswat IBM Research http://x10-lang.org

1
An Overview of X10 2.0
David Grove, Igor Peshansky, Vijay Saraswat
IBM Research
http://x10-lang.org

SC 2009 PGAS Languages Tutorial

Based on material from previous X10 Tutorials by Christoph von Praun, Vivek Sarkar, Nate Nystrom.
This material is based upon work supported in part by the Defense Advanced Research Projects Agency under its Agreement No. HR0011-07-9-0002.
Please see x10-lang.org for the most up-to-date version of these slides and sample programs.
2
X10 Tutorial Overview

Why X10?
X10 By Example
X10 2.0 in a Nutshell
X10 Implementation/Tool Chain
Core Sequential Language
Concurrency
Distribution
Arrays
X10DT Overview/Demo
Extended example: variations on a 2D-stencil
Conclusions

3
What is X10?

X10 is a new language developed in the IBM PERCS
project as part of the DARPA program on High
Productivity Computing Systems (HPCS)
X10 is an instance of the APGAS framework in the
Java family
X10
Is more productive than current models
Can support high levels of abstraction
Can exploit multiple levels of parallelism and
non-uniform data access
Is suitable for multiple architectures, and
multiple workloads.

4
Language goals

Simple
Start with a well-accepted programming model,
build on strong technical foundations, add few
core constructs
Safe
Eliminate possibility of errors by design, and
through static checking
Powerful
Permit easy expression of high-level idioms
And permit expression of high-performance programs

Scalable
Support high-end computing with millions of
concurrent tasks
Universal
Present one core programming model to abstract
from the current plethora of architectures.

5
(No Transcript)
6
(No Transcript)
7
Parallel HelloWorld
(1)
import x10.io.Console;

class HelloWorldPar {
  public static def main(args:Rail[String]):void {
    finish ateach (p in Dist.makeUnique()) {
      Console.OUT.println("Hello World from Place" + p);
    }
  }
}
(2)
x10c++ -o HelloWorldPar -O HelloWorldPar.x10
(3)
mpirun -n 4 HelloWorldPar
Hello World from Place(0)
Hello World from Place(2)
Hello World from Place(3)
Hello World from Place(1)
8
Integration via Gaussian Quadrature
class Integrate {
  const epsilon = 1.0e-12;
  val fun:(double)=>double;
  static final class resHolder { var value:double; }
  def computeArea(left:double, right:double) {
    return recEval(left, fun(left), right, fun(right), 0);
  }
  def recEval(l:double, fl:double, r:double, fr:double, a:double) {
    val h = (r - l) / 2;
    val hh = h / 2;
    val c = l + h;
    val fc = fun(c);
    val al = (fl + fc) * hh;
    val ar = (fr + fc) * hh;
    val alr = al + ar;
    if (Math.abs(alr - a) < epsilon) return alr;
    val resHolder = new resHolder();
    var expr2:double = 0;
    finish {
      async { resHolder.value = recEval(c, fc, r, fr, ar); }
      expr2 = recEval(l, fl, c, fc, al);
    }
    return resHolder.value + expr2;
  }
}
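
The recursive evaluation on this slide maps naturally onto a fork/join framework. The following is a minimal sketch in Java rather than X10, assuming a hypothetical class name IntegrateSketch and a looser tolerance of 1.0e-9 than the slide uses; fork() stands in for async, and join() for the end of the enclosing finish.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.DoubleUnaryOperator;

// Hypothetical Java analogue of the Integrate class; all names are illustrative.
class IntegrateSketch {
    static final double EPSILON = 1.0e-9; // assumed tolerance, not the slide's 1.0e-12

    static final class RecEval extends RecursiveTask<Double> {
        final DoubleUnaryOperator fun;
        final double l, fl, r, fr, a;

        RecEval(DoubleUnaryOperator fun, double l, double fl, double r, double fr, double a) {
            this.fun = fun; this.l = l; this.fl = fl; this.r = r; this.fr = fr; this.a = a;
        }

        @Override
        protected Double compute() {
            double h = (r - l) / 2;
            double hh = h / 2;
            double c = l + h;
            double fc = fun.applyAsDouble(c);
            double al = (fl + fc) * hh;   // trapezoid over the left half
            double ar = (fr + fc) * hh;   // trapezoid over the right half
            double alr = al + ar;
            if (Math.abs(alr - a) < EPSILON) return alr;
            RecEval right = new RecEval(fun, c, fc, r, fr, ar);
            right.fork();                                              // like: async
            double left = new RecEval(fun, l, fl, c, fc, al).compute();
            return left + right.join();                                // like: end of finish
        }
    }

    static double computeArea(DoubleUnaryOperator fun, double left, double right) {
        return ForkJoinPool.commonPool().invoke(
            new RecEval(fun, left, fun.applyAsDouble(left), right, fun.applyAsDouble(right), 0));
    }
}
```

A call such as IntegrateSketch.computeArea(x -> x * x, 0.0, 1.0) should return a value close to 1/3.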
9
X10 Tutorial Overview

Why X10?
X10 By Example
X10 2.0 in a Nutshell
X10 Implementation/Tool Chain
Core Sequential Language
Concurrency
Distribution
Arrays
X10DT Overview/Demo
Extended example: variations on a 2D-stencil
Conclusions

10
(No Transcript)
11
X10 Project Status

X10 is an open source project (Eclipse Public License)
Documentation, releases, mailing lists, code, etc. all publicly available via http://x10-lang.org
XRX: X10 Runtime in X10 (14 kloc and growing)
X10 1.7.x releases throughout 2009 (Java + C++)
X10 2.0 released November 6, 2009
Java: any platform with Java 5; single process (all places in 1 JVM)
C++: multi-process (1 place per process)
aix, linux, cygwin, solaris
x86, x86_64, PowerPC, Sparc
x10rt: APGAS runtime (binary only) or MPI (open source)

12
Overview of Features

Many sequential features of Java inherited
unchanged
Classes (w/ single inheritance)
Interfaces, (w/ multiple inheritance)
Instance and static fields
Constructors, (static) initializers
Overloaded, over-rideable methods
Garbage collection
Structs
Closures
Points, Regions, Distributions, Arrays

Substantial extensions to the type system
Dependent types
Generic types
Function types
Type definitions, inference
Concurrency
Fine-grained concurrency: async (p,l) S
Atomicity: atomic (s)
Ordering: L: finish S
Data-dependent synchronization: when (c) S

13
Classes

Classes
Single inheritance, multiple interfaces
May have mutable instance fields
Values of class types may be null
Heap allocated
Distributed Object Model
Remote references with global identity
Rooted state lives in place where object was
created
Global state: a programmer-specified subset of the immutable state, serialized with the object and available anywhere that has a remote reference
Methods may be global as well (they access only global state)

Global/Rooted new in X10 2.0
14
Structs

User defined primitives
No inheritance
May implement interfaces
All fields are final
All methods are final
Allocated inline in containing
object/array/variable
Headerless
Instances of structs may be freely copied from
place to place

struct Complex {
  val real:double;
  val img:double;
  def this(r:double, i:double) { real = r; img = i; }
  def operator + (that:Complex) {
    return Complex(real + that.real, img + that.img);
  }
  ....
}
val x = Array[Complex](Dist).make
New in X10 2.0
15
Points and Regions

A point is an element of an n-dimensional Cartesian space (n >= 1) with integer-valued coordinates, e.g., [5], [1, 2], ...
A point variable can hold values of different ranks, e.g.,
var p: Point = [1]; p = [2,3]; ...
Operations
p1.rank
returns rank of point p1
p1(i)
returns element (i mod p1.rank) if i < 0 or i >= p1.rank
p1 < p2, p1 <= p2, p1 > p2, p1 >= p2
returns true iff p1 is lexicographically <, <=, >, or >= p2
only defined when p1.rank and p2.rank are equal

Regions are collections of points of the same dimension
Rectangular regions have a simple representation, e.g. [1..10, 3..40]
Rich algebra over regions is provided
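
The point operations described on this slide (rank, wrap-around indexing, lexicographic comparison) can be sketched in a few lines. This is a Java stand-in rather than the X10 Point class; the name PointSketch and its methods are hypothetical.

```java
// Hypothetical Java stand-in for an X10 point; not the X10 library class.
final class PointSketch implements Comparable<PointSketch> {
    private final int[] coords;

    PointSketch(int... coords) { this.coords = coords.clone(); }

    int rank() { return coords.length; }

    // Index is taken modulo the rank, so get(-1) is the last coordinate.
    int get(int i) { return coords[Math.floorMod(i, coords.length)]; }

    // Lexicographic order; only defined for points of equal rank.
    @Override
    public int compareTo(PointSketch other) {
        if (rank() != other.rank()) {
            throw new IllegalArgumentException("ranks differ");
        }
        for (int i = 0; i < coords.length; i++) {
            int c = Integer.compare(coords[i], other.coords[i]);
            if (c != 0) return c;
        }
        return 0;
    }
}
```

Math.floorMod is used instead of % because Java's % yields a negative result for a negative index.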

16
Distributions and Arrays

Distributions specify mapping of points in a
region to places
E.g. Dist.makeBlock(R)
E.g. Dist.makeUnique()
Arrays are defined over a distribution and a base type
A: Array[T]
A: Array[T](d)
Arrays are created through initializers
Array.make[T](d, init)
Arrays are mutable (considering immutable arrays)

Array operations
A.rank: number of dimensions in array
A.region: index region (domain) of array
A.dist: distribution of array A
A(p): element at point p, where p belongs to A.region
A(R): restriction of array onto region R
Useful for extracting subarrays

17
Generic classes

Classes and interfaces may have type parameters
class Rail[T]
Defines a type constructor Rail
and a family of types Rail[int], Rail[String], Rail[Object], Rail[C], ...
Rail[C]: as if the Rail class is copied and C substituted for T
Can instantiate on any type, including primitives (e.g., int)

public abstract value class Rail[T] (length: int)
    implements Indexable[int,T], Settable[int,T] {
  private native def this(n: int): Rail[T]{length==n};
  public native def get(i: int): T;
  public native def apply(i: int): T;
  public native def set(v: T, i: int): void;
}
18
Dependent Types

Classes have properties
public final instance fields
class Region(rank: int, zeroBased: boolean, rect: boolean) { ... }
Can constrain properties with a boolean expression
Region{rank==3}
type of all regions with rank 3
Array[int]{region==R}
type of all arrays defined over region R
R must be a constant or a final variable in scope at the type

Dependent types are checked statically.
Dependent types are used to statically check locality properties (place types)
Dependent type system is extensible
See OOPSLA '08 paper.

19
Function Types

(T1, T2, ..., Tn) => U
type of functions that take arguments Ti and return U
If f: (T) => U and x: T
then invoke with f(x): U
Function types can be used as an interface
Define apply method with the appropriate signature
def apply(x:T): U

Closures
First-class functions
(x: T): U => e
used in array initializers
Array.make[int](0..4, (p: Point) => p(0)*p(0))
the array [0, 1, 4, 9, 16]
Operators
int.+, boolean.&, ...
sum = a.reduce(int.+, 0)
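
The two idioms on this slide, a closure as an array initializer and an operator passed to reduce, have close counterparts in Java streams. The sketch below is Java rather than X10; ClosureSketch, make and reduce are hypothetical helper names, and only one-dimensional index ranges starting at 0 are covered.

```java
import java.util.function.IntBinaryOperator;
import java.util.function.IntUnaryOperator;
import java.util.stream.IntStream;

// Hypothetical helpers mirroring Array.make(region, init) and a.reduce(op, unit).
class ClosureSketch {
    // Build an array over indices 0..n-1 from an initializer closure.
    static int[] make(int n, IntUnaryOperator init) {
        return IntStream.range(0, n).map(init).toArray();
    }

    // Fold the array with a binary operator, starting from its unit element.
    static int reduce(int[] a, IntBinaryOperator op, int unit) {
        return IntStream.of(a).reduce(unit, op);
    }
}
```

With these helpers, make(5, p -> p * p) produces the squares 0, 1, 4, 9, 16, and reduce(a, Integer::sum, 0) plays the role of a.reduce applied to the addition operator.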

20
Type inference

Field, local variable types inferred from initializer type
val x = 1;
x has type int{self==1}
val y = 1..2;
y has type Region{rank==1}
Method return types inferred from method body
def m() { ... return true; ... return false; ... }
m has return type boolean

Loop index types inferred from region
R: Region{rank==2}
for (p in R) ...
p has type Point{rank==2}
Place type inference implemented in X10 2.0

21
async
Stmt ::= async(p,l) Stmt

async S
Creates a new child activity that executes
statement S
Returns immediately
S may reference final variables in enclosing
blocks
Activities cannot be named
Activity cannot be aborted or cancelled

cf. Cilk's spawn
// Compute the Fibonacci
// sequence in parallel.
def run() {
  if (r < 2) return;
  val f1 = new Fib(r-1),
      f2 = new Fib(r-2);
  finish {
    async f1.run();
    f2.run();
  }
  r = f1.r + f2.r;
}
22
finish
Stmt ::= finish Stmt

L: finish S
Execute S, but wait until all (transitively)
spawned asyncs have terminated.
Rooted exception model
Trap all exceptions thrown by spawned activities.
Throw an (aggregate) exception if any spawned
async terminates abruptly.
implicit finish at main activity
finish is useful for expressing
synchronous operations on
(local or) remote data.

cf. Cilk's sync
// Compute the Fibonacci
// sequence in parallel.
def run() {
  if (r < 2) return;
  val f1 = new Fib(r-1),
      f2 = new Fib(r-2);
  finish {
    async f1.run();
    f2.run();
  }
  r = f1.r + f2.r;
}
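
The async/finish pairing in the Fibonacci example corresponds closely to fork and join in Java's fork/join framework. A minimal Java sketch of the same computation follows (not X10); the class name FibSketch and the helper fib are hypothetical.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical Java analogue of the Fib class on the slide.
class FibSketch extends RecursiveAction {
    long r; // holds n on entry and fib(n) once compute() has finished

    FibSketch(long r) { this.r = r; }

    @Override
    protected void compute() {
        if (r < 2) return;
        FibSketch f1 = new FibSketch(r - 1);
        FibSketch f2 = new FibSketch(r - 2);
        f1.fork();      // like: async f1.run()
        f2.compute();   // like: f2.run() in the parent activity
        f1.join();      // like: leaving the finish block
        r = f1.r + f2.r;
    }

    static long fib(int n) {
        FibSketch f = new FibSketch(n);
        ForkJoinPool.commonPool().invoke(f);
        return f.r;
    }
}
```

One difference worth noting: join() waits only for the named task, whereas finish waits for every activity spawned transitively inside its body.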
23
at
Stmt ::= at(p) Stmt

at(p) S
Execute statement S at place p
Current activity is blocked until S completes

// Copy field f from a to b
def copyRemoteFields(a, b) {
  at (b.loc) b.f = at (a.loc) a.f;
}
// Increment field f of obj
def incField(obj, inc) {
  at (obj.loc) obj.f += inc;
}
// Invoke method m on obj
def invoke(obj, arg) {
  at (obj.loc) obj.m(arg);
}
24
atomic
Stmt ::= atomic Statement
MethodModifier ::= atomic

atomic S
Execute statement S atomically
Atomic blocks are conceptually executed in a single step while other activities are suspended: isolation and atomicity.
An atomic block body (S) ...
must be nonblocking
must not create concurrent activities
(sequential)
must not access remote data (local)

// target defined in lexically
// enclosing scope.
atomic def CAS(old:Object, n:Object) {
  if (target.equals(old)) {
    target = n;
    return true;
  }
  return false;
}

// push data onto concurrent
// list-stack
val node = new Node(data);
atomic {
  node.next = head;
  head = node;
}
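
Java has no atomic block, but the list-stack push can be approximated with a lock. The sketch below is Java, not X10, and swaps in a synchronized block for atomic; that gives mutual exclusion only among code using the same lock, which is weaker than the isolation the slide describes. AtomicSketch and pushConcurrently are hypothetical names.

```java
// Hypothetical lock-based approximation of the atomic push on the slide.
class AtomicSketch<T> {
    private static final class Node<E> {
        final E data;
        Node<E> next;
        Node(E data) { this.data = data; }
    }

    private final Object lock = new Object();
    private Node<T> head;

    void push(T data) {
        Node<T> node = new Node<>(data);
        synchronized (lock) {          // stands in for: atomic { ... }
            node.next = head;
            head = node;
        }
    }

    int size() {
        synchronized (lock) {
            int n = 0;
            for (Node<T> p = head; p != null; p = p.next) n++;
            return n;
        }
    }

    // Push from several threads at once and report how many nodes arrived.
    static int pushConcurrently(int threads, int perThread) {
        AtomicSketch<Integer> stack = new AtomicSketch<>();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) stack.push(i);
            });
            ts[t].start();
        }
        for (Thread th : ts) {
            try {
                th.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return stack.size();
    }
}
```

Without the synchronized block, two concurrent pushes could read the same head and one node would be lost.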
25
when
Stmt ::= WhenStmt
WhenStmt ::= when ( Expr ) Stmt
           | WhenStmt or (Expr) Stmt

when (E) S
Activity suspends until a state in which the guard E is true.
In that state, S is executed atomically and in isolation.
Guard E is a boolean expression
must be nonblocking
must not create concurrent activities
(sequential)
must not access remote data (local)
must not have side-effects (const)
await (E)
syntactic shortcut for when (E)

class OneBuffer {
  var datum:Object = null;
  var filled:Boolean = false;
  def send(v:Object) {
    when ( !filled ) {
      datum = v;
      filled = true;
    }
  }
  def receive():Object {
    when ( filled ) {
      val v = datum;
      datum = null;
      filled = false;
      return v;
    }
  }
}
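
The guarded blocks in OneBuffer correspond to the classic monitor pattern: wait in a loop until the guard holds, then run the body under the lock. A minimal Java sketch follows (not X10); OneBufferSketch and roundTrip are hypothetical names, and the explicit notifyAll() calls are something the X10 runtime would handle implicitly.

```java
// Hypothetical Java monitor version of the OneBuffer class on the slide.
class OneBufferSketch {
    private Object datum = null;
    private boolean filled = false;

    synchronized void send(Object v) throws InterruptedException {
        while (filled) wait();        // like: when ( !filled )
        datum = v;
        filled = true;
        notifyAll();                  // wake activities waiting on the other guard
    }

    synchronized Object receive() throws InterruptedException {
        while (!filled) wait();       // like: when ( filled )
        Object v = datum;
        datum = null;
        filled = false;
        notifyAll();
        return v;
    }

    // Send 1..count from a producer thread and sum what the caller receives.
    static int roundTrip(int count) {
        OneBufferSketch buf = new OneBufferSketch();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= count; i++) buf.send(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < count; i++) sum += (Integer) buf.receive();
            producer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return sum;
    }
}
```

The guard is rechecked in a while loop because a woken thread may find that another thread has already changed the state.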
26
(No Transcript)
27
Clocks: Main operations

var c = Clock.make();
Allocate a clock, register current activity with it. Phase 0 of c starts.
async (...) clocked (c1, c2, ...) S
ateach (...) clocked (c1, c2, ...) S
foreach (...) clocked (c1, c2, ...) S
Create async activities registered on clocks c1, c2, ...

c.resume();
Nonblocking operation that signals completion of work by the current activity for this phase of clock c
next;
Barrier: suspend until all clocks that the current activity is registered with can advance. c.resume() is first performed for each such clock, if needed.
next can be viewed like a finish of all computations under way in the current phase of the clock
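
The closest Java counterpart to a clock is java.util.concurrent.Phaser, where arriveAndAwaitAdvance() behaves much like next. The following Java sketch (not X10) has each worker write a value, wait at the barrier, and then read its neighbour's value from the same phase. ClockSketch and run are hypothetical names, and the workers are registered up front rather than through a clocked clause.

```java
import java.util.concurrent.Phaser;

// Hypothetical phased computation; the Phaser stands in for an X10 clock.
class ClockSketch {
    static int[] run(int workers, int phases) {
        int[] cur = new int[workers];    // value written by each worker this phase
        int[] seen = new int[workers];   // neighbour's value observed by each worker
        Phaser clock = new Phaser(workers);   // all workers registered up front
        Thread[] ts = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int id = w;
            ts[w] = new Thread(() -> {
                for (int ph = 1; ph <= phases; ph++) {
                    cur[id] = ph * 10 + id;
                    clock.arriveAndAwaitAdvance();        // like: next
                    seen[id] = cur[(id + 1) % workers];   // safe: all writes are done
                    clock.arriveAndAwaitAdvance();        // like: next
                }
            });
            ts[w].start();
        }
        for (Thread th : ts) {
            try {
                th.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return seen;
    }
}
```

The second barrier in each iteration keeps a fast worker from overwriting its slot before its neighbour has read it.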

28
Fundamental X10 Property

Programs written using async, finish, at, atomic,
clock cannot deadlock
Intuition: there cannot be a cycle in the waits-for graph

29
X10 Tutorial Overview

Why X10?
X10 By Example
X10 2.0 in a Nutshell
X10 Implementation/Tool Chain
Core Sequential Language
Concurrency
Distribution
Arrays
X10DT Overview/Demo
Extended example: variations on a 2D-stencil
Conclusions

30
X10DT Overview

More information at http://x10-lang.org

31
2D Heat Conduction Problem

Based on the 2D partial differential equation (1), the 2D heat conduction problem is similar to a 4-point stencil operation, as seen in (2).
Because of the time steps, typically two grids are used.
[Equation (1) and stencil diagram (2), with axes x and y, are not transcribed.]
32
(No Transcript)
33
Heat transfer in X10

X10 permits smooth variation between multiple
concurrency styles
High-level ZPL-style (operations on global
arrays)
Chapel global view style
Expressible, but relies on compiler magic for
performance
OpenMP style
Chunking within a single place
MPI-style
SPMD computation with explicit all-to-all
reduction
Uses clocks
OpenMP within MPI style
For hierarchical parallelism
Fairly easy to derive from ZPL-style program.

34
Heat Transfer in X10: ZPL style
class Stencil2D {
  static type Real=Double;
  const n = 6, epsilon = 1.0e-5;
  const BigD = Dist.makeBlock([0..n+1, 0..n+1], 0),
        D = BigD | ([1..n, 1..n] as Region),
        LastRow = [0..0, 1..n] as Region;
  const A = Array.make[Real](BigD, (p:Point)=>(LastRow.contains(p)?1:0));
  const Temp = Array.make[Real](BigD);
  def run() {
    var delta:Real;
    do {
      finish ateach (p in D)
        Temp(p) = A(p.stencil(1)).reduce(Double.+, 0.0)/4;
      delta = (A(D)-Temp(D)).lift(Math.abs).reduce(Math.max, 0.0);
      A(D) = Temp(D);
    } while (delta > epsilon);
  }
}
35
Heat Transfer in X10: ZPL style

Cast in fork-join style rather than SPMD style
Compiler needs to transform into SPMD style
Compiler needs to chunk iterations per place
Fine grained iteration has too much overhead

Compiler needs to generate code for distributed
array operations
Create temporary global arrays, hoist them out of
loop, etc.
Uses implicit syntax to access remote locations.

Simple to write; tough to implement efficiently
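
For reference, the computation that the ZPL-style program expresses can be written as a plain sequential loop nest. The Java sketch below (not X10) assumes that p.stencil(1) denotes the four nearest neighbours of p and that the LastRow boundary is held at 1 while the other boundaries stay at 0; StencilSketch and run are hypothetical names.

```java
// Hypothetical sequential version of the 2D heat-transfer iteration.
class StencilSketch {
    static double[][] run(int n, double epsilon) {
        double[][] a = new double[n + 2][n + 2];      // grid with a boundary ring
        double[][] temp = new double[n + 2][n + 2];
        for (int j = 1; j <= n; j++) a[0][j] = 1.0;   // assumed hot boundary row
        double delta;
        do {
            delta = 0.0;
            for (int i = 1; i <= n; i++) {
                for (int j = 1; j <= n; j++) {
                    // average of the four neighbours, read from the old grid
                    temp[i][j] = (a[i - 1][j] + a[i + 1][j]
                                + a[i][j - 1] + a[i][j + 1]) / 4;
                    delta = Math.max(delta, Math.abs(a[i][j] - temp[i][j]));
                }
            }
            for (int i = 1; i <= n; i++) {
                for (int j = 1; j <= n; j++) a[i][j] = temp[i][j];   // A(D) = Temp(D)
            }
        } while (delta > epsilon);
        return a;
    }
}
```

Two grids are used so that every update in a sweep reads only values from the previous time step.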
36
Heat Transfer in X10 II
def run() {
  val D_Base = Dist.makeUnique(D.places());
  var delta:Real;
  do {
    finish ateach (z in D_Base)
      for (p in D | here)
        Temp(p) = A(p.stencil(1)).reduce(Double.+, 0.0)/4;
    delta = (A(D) - Temp(D)).lift(Math.abs).reduce(Math.max, 0.0);
    A(D) = Temp(D);
  } while (delta > epsilon);
}

Flat parallelism: assume one activity per place is desired.
D.places() returns a ValRail of the places in D.
Dist.makeUnique(D.places()) returns a unique distribution (one point per place) over the given ValRail of places.
D | x returns the sub-region of D at place x.

Explicit Loop Chunking
37
Heat Transfer in X10 III
def run() {
  val D_Base = Dist.makeUnique(D.places());
  val blocks = DistUtil.block(D, P);
  var delta:Real;
  do {
    finish ateach (z in D_Base)
      foreach (q in 1..P)
        for (p in blocks(here,q))
          Temp(p) = A(p.stencil(1)).reduce(Double.+, 0.0)/4;
    delta = (A(D)-Temp(D)).lift(Math.abs).reduce(Math.max, 0.0);
    A(D) = Temp(D);
  } while (delta > epsilon);
}

Hierarchical parallelism: P activities at place x.
Easy to change above code so P can vary with x.
DistUtil.block(D,P)(x,q) is the region allocated to the qth activity in place x. (Block-block division.)

Explicit Loop Chunking with Hierarchical
Parallelism
38
(No Transcript)
39
Heat Transfer in X10 V
def run() {
  finish async {
    val c = Clock.make();
    val D_Base = Dist.makeUnique(D.places());
    val diff = Array.make[Real](D_Base),
        scratch = Array.make[Real](D_Base);
    ateach (z in D_Base) clocked(c)
      foreach (q in 1..P) clocked(c) {
        var myDiff:Real = 0;
        do {
          if (q==1) diff(z) = 0.0;
          myDiff = 0;
          for (p in blocks(here,q)) {
            Temp(p) = A(p.stencil(1)).reduce(Double.+, 0.0)/4;
            myDiff = Math.max(myDiff, Math.abs(A(p) - Temp(p)));
          }
          atomic diff(z) = Math.max(myDiff, diff(z));
          next;
          A(blocks(here,q)) = Temp(blocks(here,q));
          if (q==1) reduceMax(z, diff, scratch);
          next;
          myDiff = diff(z);
          next;
        } while (myDiff > epsilon);
      }
  }
}
OpenMP within MPI style
40
Heat Transfer in X10 VI