An Overview of X10 2.0. David Grove, Igor Peshansky, Vijay Saraswat, IBM Research. http://x10-lang.org

1
An Overview of X10 2.0
David Grove, Igor Peshansky, Vijay Saraswat
IBM Research
http://x10-lang.org
  • SC 2009 PGAS Languages Tutorial

Based on material from previous X10 Tutorials by
Christoph von Praun, Vivek Sarkar, Nate Nystrom.
This material is based upon work
supported in part by the Defense Advanced
Research Projects Agency under its Agreement No.
HR0011-07-9-0002. Please see x10-lang.org for
the most up-to-date version of these slides and
sample programs.
2
X10 Tutorial Overview
  • Why X10?
  • X10 By Example
  • X10 2.0 in a Nutshell
  • X10 Implementation/Tool Chain
  • Core Sequential Language
  • Concurrency
  • Distribution
  • Arrays
  • X10DT Overview/Demo
  • Extended example: variations on a 2D-stencil
  • Conclusions

3
What is X10?
  • X10 is a new language developed in the IBM PERCS
    project as part of the DARPA program on High
    Productivity Computing Systems (HPCS)
  • X10 is an instance of the APGAS framework in the
    Java family
  • X10:
      • Is more productive than current models
      • Can support high levels of abstraction
      • Can exploit multiple levels of parallelism and
        non-uniform data access
      • Is suitable for multiple architectures, and
        multiple workloads.

4
Language goals
  • Simple
      • Start with a well-accepted programming model,
        build on strong technical foundations, add few
        core constructs
  • Safe
      • Eliminate possibility of errors by design, and
        through static checking
  • Powerful
      • Permit easy expression of high-level idioms
      • And permit expression of high-performance programs
  • Scalable
      • Support high-end computing with millions of
        concurrent tasks
  • Universal
      • Present one core programming model to abstract
        from the current plethora of architectures.

5
(No Transcript)
6
(No Transcript)
7
Parallel HelloWorld
import x10.io.Console;

class HelloWorldPar {
    public static def main(args:Rail[String]):void {
        finish ateach (p in Dist.makeUnique()) {
            Console.OUT.println("Hello World from Place" + p);
        }
    }
}
(1)

x10c++ -o HelloWorldPar -O HelloWorldPar.x10
(2)

mpirun -n 4 HelloWorldPar
Hello World from Place(0)
Hello World from Place(2)
Hello World from Place(3)
Hello World from Place(1)
(3)
8
Integration via Gaussian Quadrature
class Integrate {
    const epsilon = 1.0e-12;
    val fun:(double)=>double;

    static final class resHolder { var value:double; }

    def computeArea(left:double, right:double) {
        return recEval(left, fun(left), right, fun(right), 0);
    }

    def recEval(l:double, fl:double, r:double, fr:double, a:double) {
        val h = (r - l) / 2;
        val hh = h / 2;
        val c = l + h;
        val fc = fun(c);
        val al = (fl + fc) * hh;
        val ar = (fr + fc) * hh;
        val alr = al + ar;
        if (Math.abs(alr - a) < epsilon) return alr;
        val resHolder = new resHolder();
        var expr2:double = 0;
        finish {
            async { resHolder.value = recEval(c, fc, r, fr, ar); }
            expr2 = recEval(l, fl, c, fc, al);
        }
        return resHolder.value + expr2;
    }
}
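
The recursion in recEval can be followed more easily without the concurrency. The sketch below is a sequential Python analogue, not X10: it refines a trapezoid estimate on each half of the interval until two successive estimates agree to within a tolerance. The names rec_eval and compute_area and the default tolerance are assumptions made for this illustration; the slide evaluates the right half in an async under a finish, whereas here the two halves simply run one after the other.

```python
def rec_eval(f, l, fl, r, fr, a, eps):
    """Refine the estimate `a` of the integral of f over [l, r]."""
    h = (r - l) / 2
    hh = h / 2
    c = l + h                      # midpoint of the interval
    fc = f(c)
    al = (fl + fc) * hh            # trapezoid area on [l, c]
    ar = (fr + fc) * hh            # trapezoid area on [c, r]
    alr = al + ar
    if abs(alr - a) < eps:         # refinement no longer changes the estimate
        return alr
    # The slide spawns the right half with async; here both run in sequence.
    right = rec_eval(f, c, fc, r, fr, ar, eps)
    left = rec_eval(f, l, fl, c, fc, al, eps)
    return right + left


def compute_area(f, left, right, eps=1.0e-9):
    """Integrate f over [left, right], starting from an estimate of 0."""
    return rec_eval(f, left, f(left), right, f(right), 0.0, eps)
```

For example, compute_area(lambda x: x * x, 0.0, 1.0) should come out close to 1/3.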
9
X10 Tutorial Overview
  • Why X10?
  • X10 By Example
  • X10 2.0 in a Nutshell
  • X10 Implementation/Tool Chain
  • Core Sequential Language
  • Concurrency
  • Distribution
  • Arrays
  • X10DT Overview/Demo
  • Extended example: variations on a 2D-stencil
  • Conclusions

10
(No Transcript)
11
X10 Project Status
  • X10 is an open source project (Eclipse Public
    License)
  • Documentation, releases, mailing lists, code,
    etc. all publicly available via
    http://x10-lang.org
  • XRX: X10 Runtime in X10 (14kloc and growing)
  • X10 1.7.x releases throughout 2009 (Java, C++)
  • X10 2.0 released November 6, 2009
      • Java: any platform with Java 5. Single process
        (all places in 1 JVM)
      • C++: Multi-process (1 place per process)
          • aix, linux, cygwin, solaris
          • x86, x86_64, PowerPC, Sparc
          • x10rt: APGAS runtime (binary only) or MPI (open
            source)

12
Overview of Features
  • Many sequential features of Java inherited
    unchanged
      • Classes (w/ single inheritance)
      • Interfaces (w/ multiple inheritance)
      • Instance and static fields
      • Constructors, (static) initializers
      • Overloaded, over-rideable methods
      • Garbage collection
  • Structs
  • Closures
  • Points, Regions, Distributions, Arrays
  • Substantial extensions to the type system
      • Dependent types
      • Generic types
      • Function types
      • Type definitions, inference
  • Concurrency
      • Fine-grained concurrency
          • async (p,l) S
      • Atomicity
          • atomic (s)
      • Ordering
          • L: finish S
      • Data-dependent synchronization
          • when (c) S

13
Classes
  • Classes
      • Single inheritance, multiple interfaces
      • May have mutable instance fields
      • Values of class types may be null
      • Heap allocated
  • Distributed Object Model
      • Remote references with global identity
      • Rooted state lives in place where object was
        created
      • Global state
          • programmer-specified subset of immutable state
          • serialized with object; available anywhere that
            has remote ref
          • methods may be global as well (access only global
            state)

Global/Rooted: new in X10 2.0
14
Structs
  • User defined primitives
  • No inheritance
  • May implement interfaces
  • All fields are final
  • All methods are final
  • Allocated inline in containing
    object/array/variable
  • Headerless
  • Instances of structs may be freely copied from
    place to place

struct Complex {
    val real:double;
    val img:double;
    def this(r:double, i:double) {
        real = r; img = i;
    }
    def operator + (that:Complex) {
        return Complex(real + that.real, img + that.img);
    }
    ....
}
val x = Array[Complex](Dist).make

New in X10 2.0
15
Points and Regions
  • A point is an element of an n-dimensional
    Cartesian space (n>=1) with integer-valued
    coordinates, e.g., [5], [1, 2], ...
  • A point variable can hold values of different
    ranks, e.g.,
      • var p: Point = [1]; p = [2,3]; ...
  • Operations
      • p1.rank
          • returns rank of point p1
      • p1(i)
          • returns element (i mod p1.rank) if i < 0 or
            i >= p1.rank
      • p1 < p2, p1 <= p2, p1 > p2, p1 >= p2
          • returns true iff p1 is lexicographically <, <=,
            >, or >= p2
          • only defined when p1.rank and p2.rank are equal
  • Regions are collections of points of the same
    dimension
  • Rectangular regions have a simple representation,
    e.g. [1..10, 3..40]
  • Rich algebra over regions is provided
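
The point operations listed on this slide can be mimicked with plain tuples. The sketch below is a Python analogue, not the X10 Point class; the helper names rank, coord and less_than are made up for this illustration. It shows the rank query, the modulo wrap-around for out-of-range coordinate indices, and the lexicographic comparison that is only defined for points of equal rank.

```python
from typing import Tuple

Point = Tuple[int, ...]


def rank(p: Point) -> int:
    """Number of coordinates in the point."""
    return len(p)


def coord(p: Point, i: int) -> int:
    """Coordinate i; out-of-range indices wrap modulo the rank."""
    return p[i % len(p)]


def less_than(p1: Point, p2: Point) -> bool:
    """Lexicographic comparison, defined only for points of equal rank."""
    if len(p1) != len(p2):
        raise ValueError("points must have the same rank")
    return p1 < p2  # Python tuples already compare lexicographically
```

With these helpers, coord((1, 2), 3) yields the same element as coord((1, 2), 1), and less_than((1, 9), (2, 0)) holds because the first coordinates already decide the order.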

16
Distributions and Arrays
  • Distributions specify mapping of points in a
    region to places
      • E.g. Dist.makeBlock(R)
      • E.g. Dist.makeUnique()
  • Arrays are defined over a distribution and a base
    type
      • A: Array[T]
      • A: Array[T](d)
  • Arrays are created through initializers
      • Array.make[T](d, init)
  • Arrays are mutable (considering immutable arrays)
  • Array operations
      • A.rank: number of dimensions in array
      • A.region: index region (domain) of array
      • A.dist: distribution of array A
      • A(p): element at point p, where p belongs to
        A.region
      • A(R): restriction of array onto region R
          • Useful for extracting subarrays
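
To make the idea of a distribution concrete, the sketch below models one as a dictionary from points to place numbers, and an array as a dictionary from points to values. This is a Python analogue under stated assumptions, not the X10 Dist or Array API: the function names are hypothetical, and the row-blocking rule used in make_block is just one plausible way to split a region into contiguous blocks.

```python
def make_block(rows: int, cols: int, num_places: int) -> dict:
    """Map each point (i, j) to a place, keeping contiguous rows together."""
    return {(i, j): i * num_places // rows
            for i in range(rows) for j in range(cols)}


def make_unique(num_places: int) -> dict:
    """One point per place: point (p,) lives at place p."""
    return {(p,): p for p in range(num_places)}


def make_array(dist: dict, init) -> dict:
    """An array over a distribution: one value per point, from an initializer."""
    return {point: init(point) for point in dist}
```

For a 4 x 2 region over two places, rows 0 and 1 land on place 0 and rows 2 and 3 on place 1; make_array(d, lambda p: p[0] + p[1]) then fills every point from its coordinates.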

17
Generic classes
  • Classes and interfaces may have type parameters
  • class Rail[T]
      • Defines a type constructor Rail
      • and a family of types Rail[int], Rail[String],
        Rail[Object], Rail[C], ...
  • Rail[C]: as if Rail class is copied and C
    substituted for T
  • Can instantiate on any type, including primitives
    (e.g., int)

public abstract value class Rail[T] (length: int)
        implements Indexable[int,T], Settable[int,T] {
    private native def this(n: int): Rail[T]{length==n};
    public native def get(i: int): T;
    public native def apply(i: int): T;
    public native def set(v: T, i: int): void;
}
18
Dependent Types
  • Classes have properties
      • public final instance fields
      • class Region(rank: int, zeroBased: boolean, rect:
        boolean) { ... }
  • Can constrain properties with a boolean
    expression
      • Region{rank==3}
          • type of all regions with rank 3
      • Array[int]{region==R}
          • type of all arrays defined over region R
          • R must be a constant or a final variable in scope
            at the type
  • Dependent types are checked statically.
  • Dependent types used to statically check locality
    properties (place types)
  • Dependent type system is extensible
  • See OOPSLA '08 paper.

19
Function Types
  • (T1, T2, ..., Tn) => U
      • type of functions that take arguments Ti and
        return U
  • If f: (T) => U and x: T
      • then invoke with f(x): U
  • Function types can be used as an interface
      • Define apply method with the appropriate
        signature
          • def apply(x:T): U
  • Closures
      • First-class functions
          • (x: T): U => e
      • used in array initializers
          • Array.make[int](0..4, (p: Point) => p(0)*p(0))
          • the array [0, 1, 4, 9, 16]
  • Operators
      • int.+, boolean.&, ...
      • sum = a.reduce(int.+, 0)
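
The closure-as-initializer and operator-as-function ideas on this slide have close counterparts in most languages. The sketch below is a Python analogue, not X10: make_array is a hypothetical helper standing in for an array factory that takes an initializer, and operator.add plays the part of an addition operator passed to a reduction.

```python
from functools import reduce
import operator


def make_array(region: range, init) -> list:
    """Build a list by applying the initializer closure to each index."""
    return [init(p) for p in region]


# Indices 0..4 inclusive, each initialised to its own square.
squares = make_array(range(0, 5), lambda p: p * p)

# Reduce with the addition operator and identity 0.
total = reduce(operator.add, squares, 0)
```

Here squares is the list [0, 1, 4, 9, 16], matching the array described on the slide, and total is their sum.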

20
Type inference
  • Field, local variable types inferred from
    initializer type
      • val x = 1;
          • x has type int{self==1}
      • val y = 1..2;
          • y has type Region{rank==1}
  • Method return types inferred from method body
      • def m() { ... return true ... return false ... }
          • m has return type boolean
  • Loop index types inferred from region
      • R: Region{rank==2}
      • for (p in R) { ... }
          • p has type Point{rank==2}
  • Place type inference implemented in X10 2.0

21
async
Stmt ::= async(p,l) Stmt
  • async S
      • Creates a new child activity that executes
        statement S
      • Returns immediately
      • S may reference final variables in enclosing
        blocks
      • Activities cannot be named
      • Activity cannot be aborted or cancelled

cf. Cilk's spawn

// Compute the Fibonacci
// sequence in parallel.
def run() {
    if (r < 2) return;
    val f1 = new Fib(r-1),
        f2 = new Fib(r-2);
    finish {
        async f1.run();
        f2.run();
    }
    r = f1.r + f2.r;
}
22
finish
Stmt ::= finish Stmt
  • L: finish S
      • Execute S, but wait until all (transitively)
        spawned asyncs have terminated.
  • Rooted exception model
      • Trap all exceptions thrown by spawned activities.
      • Throw an (aggregate) exception if any spawned
        async terminates abruptly.
  • Implicit finish at main activity
  • finish is useful for expressing synchronous
    operations on (local or) remote data.

cf. Cilk's sync

// Compute the Fibonacci
// sequence in parallel.
def run() {
    if (r < 2) return;
    val f1 = new Fib(r-1),
        f2 = new Fib(r-2);
    finish {
        async f1.run();
        f2.run();
    }
    r = f1.r + f2.r;
}
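
The spawn-then-wait shape of the Fib example can be approximated with ordinary threads. The sketch below is a rough Python analogue, not X10 semantics: starting a thread stands in for async, and joining it stands in for the enclosing finish. A real finish waits for all transitively spawned activities; here that effect only arises because every level joins its own child before returning.

```python
import threading


class Fib:
    def __init__(self, r: int):
        self.r = r  # holds n on entry and fib(n) once run() has returned

    def run(self) -> None:
        if self.r < 2:
            return
        f1, f2 = Fib(self.r - 1), Fib(self.r - 2)
        child = threading.Thread(target=f1.run)  # plays the role of "async"
        child.start()
        f2.run()                                 # parent keeps working meanwhile
        child.join()                             # plays the role of "finish"
        self.r = f1.r + f2.r
```

Usage: create Fib(10), call run(), then read the result from the r field. Spawning one thread per call is far too costly for real use; the point is only the control structure.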
23
at
Stmt ::= at(p) Stmt
  • at(p) S
      • Execute statement S at place p
      • Current activity is blocked until S completes

// Copy field f from a to b
def copyRemoteFields(a, b) {
    at (b.loc) b.f = at (a.loc) a.f;
}

// Increment field f of obj
def incField(obj, inc) {
    at (obj.loc) obj.f += inc;
}

// Invoke method m on obj
def invoke(obj, arg) {
    at (obj.loc) obj.m(arg);
}
24
atomic
Stmt ::= atomic Statement
MethodModifier ::= atomic
  • atomic S
      • Execute statement S atomically
      • Atomic blocks are conceptually executed in a
        single step while other activities are suspended:
        isolation and atomicity.
  • An atomic block body (S) ...
      • must be nonblocking
      • must not create concurrent activities
        (sequential)
      • must not access remote data (local)

// target defined in lexically
// enclosing scope.
atomic def CAS(old:Object, n:Object) {
    if (target.equals(old)) {
        target = n;
        return true;
    }
    return false;
}

// push data onto concurrent
// list-stack
val node = new Node(data);
atomic {
    node.next = head;
    head = node;
}
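
One simple way to picture an atomic block is as a critical section guarded by a single lock. The sketch below is a Python analogue of the list-stack push, not X10: the global lock, the Stack class and the size method are assumptions for this illustration, and a lock is only one possible realisation of the "conceptually a single step" behaviour described above.

```python
import threading

_atomic = threading.Lock()  # one global lock stands in for "atomic"


class Node:
    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node


class Stack:
    def __init__(self):
        self.head = None

    def push(self, data) -> None:
        node = Node(data)
        with _atomic:  # the two updates below happen as one indivisible step
            node.next = self.head
            self.head = node

    def size(self) -> int:
        """Count nodes; intended for use once all pushes have finished."""
        n, cur = 0, self.head
        while cur is not None:
            n, cur = n + 1, cur.next
        return n
```

If several threads push concurrently, no node is lost, because linking the new node and updating head cannot interleave with another push.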
25
when
Stmt ::= WhenStmt
WhenStmt ::= when ( Expr ) Stmt
           | WhenStmt or (Expr) Stmt
  • when (E) S
      • Activity suspends until a state in which the
        guard E is true.
      • In that state, S is executed atomically and in
        isolation.
  • Guard E is a boolean expression
      • must be nonblocking
      • must not create concurrent activities
        (sequential)
      • must not access remote data (local)
      • must not have side-effects (const)
  • await (E)
      • syntactic shortcut for when (E)

class OneBuffer {
    var datum:Object = null;
    var filled:Boolean = false;

    def send(v:Object) {
        when ( !filled ) {
            datum = v;
            filled = true;
        }
    }

    def receive():Object {
        when ( filled ) {
            val v = datum;
            datum = null;
            filled = false;
            return v;
        }
    }
}
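
The guarded-wait behaviour of when maps fairly naturally onto a condition variable. The sketch below is a Python analogue of the OneBuffer class, not X10: threading.Condition.wait_for blocks until the guard holds, and the body then runs while the lock is held, which approximates "executed atomically and in isolation" with respect to other users of the same buffer.

```python
import threading


class OneBuffer:
    def __init__(self):
        self._cond = threading.Condition()
        self._datum = None
        self._filled = False

    def send(self, v) -> None:
        with self._cond:
            self._cond.wait_for(lambda: not self._filled)  # guard: !filled
            self._datum, self._filled = v, True
            self._cond.notify_all()  # wake any receiver waiting on "filled"

    def receive(self):
        with self._cond:
            self._cond.wait_for(lambda: self._filled)      # guard: filled
            v, self._datum, self._filled = self._datum, None, False
            self._cond.notify_all()  # wake any sender waiting on "!filled"
            return v
```

A producer thread calling send repeatedly and a consumer calling receive will hand values over one at a time, in order, since the single slot must be emptied before it can be refilled.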
26
(No Transcript)
27
Clocks: Main operations
  • var c = Clock.make();
      • Allocate a clock, register current activity with
        it. Phase 0 of c starts.
  • async(...) clocked (c1,c2,...) S
  • ateach(...) clocked (c1,c2,...) S
  • foreach(...) clocked (c1,c2,...) S
      • Create async activities registered on clocks c1,
        c2, ...
  • c.resume();
      • Nonblocking operation that signals completion of
        work by current activity for this phase of clock
        c
  • next;
      • Barrier: suspend until all clocks that the
        current activity is registered with can advance.
        c.resume() is first performed for each such
        clock, if needed.
  • next can be viewed like a finish of all
    computations under way in the current phase of
    the clock
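
The phase-by-phase behaviour of next resembles a reusable barrier. The sketch below is a Python analogue, not X10 clocks: threading.Barrier stands in for a single clock with a fixed set of registered activities, and barrier.wait() stands in for next. Dynamic registration and the split-phase c.resume() operation are not modelled; run_phases is a hypothetical helper.

```python
import threading


def run_phases(num_workers: int, num_phases: int) -> list:
    """Each worker logs (phase, worker) and then waits at the barrier."""
    barrier = threading.Barrier(num_workers)  # stands in for a clock
    log = []
    log_lock = threading.Lock()

    def worker(w: int) -> None:
        for phase in range(num_phases):
            with log_lock:
                log.append((phase, w))
            barrier.wait()                    # stands in for "next"

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because no worker can begin phase k+1 until every worker has reached the barrier in phase k, all log entries for one phase appear before any entry for the next.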

28
Fundamental X10 Property
  • Programs written using async, finish, at, atomic,
    clock cannot deadlock
  • Intuition: there cannot be a cycle in the
    waits-for graph

29
X10 Tutorial Overview
  • Why X10?
  • X10 By Example
  • X10 2.0 in a Nutshell
  • X10 Implementation/Tool Chain
  • Core Sequential Language
  • Concurrency
  • Distribution
  • Arrays
  • X10DT Overview/Demo
  • Extended example: variations on a 2D-stencil
  • Conclusions

30
X10DT Overview
  • More information at http://x10-lang.org

31
2D Heat Conduction Problem
  • Based on the 2D partial differential equation
    (1), the 2D heat conduction problem is similar to
    a 4-point stencil operation, as seen in (2)

[Figure (1): the 2D partial differential equation.
Figure (2): the 4-point stencil on an x-y grid.]
Because of the time steps, typically two grids
are used.
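
The two-grid, 4-point stencil scheme described here can be sketched sequentially. The code below is a minimal Python illustration under stated assumptions, not the tutorial's X10 program: the grid is a list of lists whose outer ring holds fixed boundary values, each interior cell is replaced by the mean of its four neighbours, and iteration stops once the largest per-cell change is at most epsilon. The function names and the stopping rule's default tolerance are chosen for this example.

```python
def jacobi_step(a: list) -> tuple:
    """One sweep: each interior cell becomes the mean of its 4 neighbours."""
    rows, cols = len(a), len(a[0])
    temp = [row[:] for row in a]  # second grid, so old values stay readable
    delta = 0.0
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            temp[i][j] = (a[i - 1][j] + a[i + 1][j]
                          + a[i][j - 1] + a[i][j + 1]) / 4
            delta = max(delta, abs(temp[i][j] - a[i][j]))
    return temp, delta


def solve(a: list, epsilon: float = 1.0e-5) -> list:
    """Repeat sweeps until the largest per-cell change is at most epsilon."""
    while True:
        a, delta = jacobi_step(a)
        if delta <= epsilon:
            return a
```

Writing into a separate grid is what makes every cell in a sweep depend only on values from the previous time step.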
32
(No Transcript)
33
Heat transfer in X10
  • X10 permits smooth variation between multiple
    concurrency styles
      • High-level ZPL-style (operations on global
        arrays)
          • Chapel global view style
          • Expressible, but relies on compiler magic for
            performance
      • OpenMP style
          • Chunking within a single place
      • MPI-style
          • SPMD computation with explicit all-to-all
            reduction
          • Uses clocks
      • OpenMP within MPI style
          • For hierarchical parallelism
          • Fairly easy to derive from ZPL-style program.

34
Heat Transfer in X10 ZPL style
class Stencil2D {
    static type Real=Double;
    const n = 6, epsilon = 1.0e-5;

    const BigD = Dist.makeBlock([0..n+1, 0..n+1], 0),
          D = BigD | [1..n, 1..n],
          LastRow = [0..0, 1..n] as Region;
    const A = Array.make[Real](BigD, (p:Point)=>(LastRow.contains(p)?1:0));
    const Temp = Array.make[Real](BigD);

    def run() {
        var delta:Real;
        do {
            finish ateach (p in D)
                Temp(p) = A(p.stencil(1)).reduce(Double.+, 0.0)/4;

            delta = (A(D)-Temp(D)).lift(Math.abs).reduce(Math.max, 0.0);
            A(D) = Temp(D);
        } while (delta > epsilon);
    }
}
35
Heat Transfer in X10 ZPL style
  • Cast in fork-join style rather than SPMD style
      • Compiler needs to transform into SPMD style
  • Compiler needs to chunk iterations per place
      • Fine grained iteration has too much overhead
  • Compiler needs to generate code for distributed
    array operations
      • Create temporary global arrays, hoist them out of
        loop, etc.
  • Uses implicit syntax to access remote locations.

Simple to write; tough to implement efficiently
36
Heat Transfer in X10 II
def run() {
    val D_Base = Dist.makeUnique(D.places());
    var delta:Real;
    do {
        finish ateach (z in D_Base)
            for (p in D | here)
                Temp(p) = A(p.stencil(1)).reduce(Double.+, 0.0)/4;

        delta = (A(D) - Temp(D)).lift(Math.abs).reduce(Math.max, 0.0);
        A(D) = Temp(D);
    } while (delta > epsilon);
}
  • Flat parallelism: Assume one activity per place
    is desired.
  • D.places() returns ValRail of places in D.
  • Dist.makeUnique(D.places()) returns a unique
    distribution (one point per place) over the given
    ValRail of places
  • D | x returns sub-region of D at place x.

Explicit Loop Chunking
37
Heat Transfer in X10 III
def run() {
    val D_Base = Dist.makeUnique(D.places());
    val blocks = DistUtil.block(D, P);
    var delta:Real;
    do {
        finish ateach (z in D_Base)
            foreach (q in 1..P)
                for (p in blocks(here,q))
                    Temp(p) = A(p.stencil(1)).reduce(Double.+, 0.0)/4;

        delta = (A(D)-Temp(D)).lift(Math.abs).reduce(Math.max, 0.0);
        A(D) = Temp(D);
    } while (delta > epsilon);
}
  • Hierarchical parallelism: P activities at place
    x.
  • Easy to change above code so P can vary with x.
  • DistUtil.block(D,P)(x,q) is the region allocated
    to the qth activity in place x. (Block-block
    division.)

Explicit Loop Chunking with Hierarchical
Parallelism
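
The block-block division mentioned for DistUtil.block(D,P)(x,q) can be pictured as two rounds of contiguous splitting: first across places, then across the activities within each place. The sketch below is a Python illustration over a flat list of indices, not the DistUtil implementation; split_blocks and block_block are hypothetical names, and the rule that leftover elements go to the earliest blocks is an assumption.

```python
def split_blocks(indices: list, parts: int) -> list:
    """Split into `parts` contiguous blocks whose sizes differ by at most 1."""
    base, extra = divmod(len(indices), parts)
    blocks, start = [], 0
    for q in range(parts):
        size = base + (1 if q < extra else 0)  # earliest blocks take leftovers
        blocks.append(indices[start:start + size])
        start += size
    return blocks


def block_block(indices: list, num_places: int, per_place: int) -> dict:
    """Map (place, q) to the indices for the q-th activity at that place."""
    result = {}
    for place, owned in enumerate(split_blocks(indices, num_places)):
        for q, chunk in enumerate(split_blocks(owned, per_place), start=1):
            result[(place, q)] = chunk
    return result
```

Letting per_place vary with the place, as the slide suggests, would only require passing a per-place count into the inner split.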
38
(No Transcript)
39
Heat Transfer in X10 V
def run() {
    finish async {
        val c = Clock.make();
        val D_Base = Dist.makeUnique(D.places());
        val diff = Array.make[Real](D_Base),
            scratch = Array.make[Real](D_Base);
        ateach (z in D_Base) clocked(c)
            foreach (q in 1..P) clocked(c) {
                var myDiff:Real = 0;
                do {
                    if (q==1) diff(z) = 0.0;
                    myDiff = 0;
                    for (p in blocks(here,q)) {
                        Temp(p) = A(p.stencil(1)).reduce(Double.+, 0.0)/4;
                        myDiff = Math.max(myDiff, Math.abs(A(p) - Temp(p)));
                    }
                    atomic diff(z) = Math.max(myDiff, diff(z));
                    next;
                    A(blocks(here,q)) = Temp(blocks(here,q));
                    if (q==1) reduceMax(z, diff, scratch);
                    next;
                    myDiff = diff(z);
                    next;
                } while (myDiff > epsilon);
            }
    }
}

OpenMP within MPI style
40
Heat Transfer in X10 VI
  • All previous versions permit fine-grained remote
    access
      • Used to access boundary elements
  • Much more efficient to transfer boundary elements
    in bulk between clock phases.
  • May be done by allocating extra ghost boundary
    at each place
      • API extension: Dist.makeBlock(D, P, f)
          • D: distribution, P: processor grid, f:
            region→region transformer
  • reduceMax() phase overlapped with ghost
    distribution phase

41
Conclusions
  • Want to try it out?
  • Download from http://x10-lang.org
  • Hands-on section later today...
  • Questions?