Title: Formalization of Generics for the .NET Common Language Runtime
1Formalization of Generics for the .NET Common
Language Runtime
- Dachuan Yu (Yale University); Andrew Kennedy, Don Syme (Microsoft Research Cambridge)
2Introduction
- Upcoming revision of Microsoft .NET platform includes support for parametric polymorphism (generics) in
  - Programming languages: C#, Visual Basic, Managed C++
  - Common Language Runtime (the virtual machine)
  - Visual Studio (Integrated Development Environment)
  - Libraries
- Previous work (PLDI'01) described implementation techniques used in the CLR
- Now we formalize the polymorphic intermediate language and aspects of the implementation
3CLR: The big picture
[Figure: pipeline from source programs (C#, Visual Basic, SML.NET) through their compilers to IL and then, inside the Common Language Runtime, to machine code. Boxes shown: Native binary, Loader, JIT front-end, JIT IL, JIT code-gen, Native interop, Remoting, Garbage collector, Security, Threads, Exception handling.]
5High-level design of generics
- Type parameterization for all declarations
  - classes, e.g. class Set<T>
  - interfaces, e.g. interface IComparable<T>
  - structs, e.g. struct HashBucket<K,D>
  - methods, e.g. static void Reverse<T>(T[] arr)
  - delegates (first-class methods), e.g. delegate void Action<T>(T arg)
6Good design => Tricky Implementation
- Unrestricted instantiation
    List<string> ls = new List<string>();   // reference types
    List<double> ld;                        // primitive types
    List<Pair<string,double>> lsd;          // struct types
- Full support for run-time types
    if (x is Set<string>) ...               // type-test
    y = (List<T>) z;                        // checked cast
- Recursion in instantiations
    class List<T> : ICloneable<List<T>>     // finite
    class C<T> { C<C<T>> fld; }             // infinite
7Why formalize?
- In previous work (POPL'01, Gordon & Syme) the aim was a type soundness proof for a subset of IL (Baby IL)
- Our aims are different
  - Implementation techniques used in the CLR product are subtle and difficult to get right (=> bugs, perhaps security holes)
    - We'd like to validate those techniques
  - Current JIT- and pre-compilers are not type-preserving
    - Our formalization provides a basis for typed compiler intermediate languages for more capable and robust compilers
  - It's also difficult to express and apply optimizations
    - Formalization makes this easier
- By-product is a generic variant of Baby IL
8Formalization: the big picture
- Source: BILG classes and methods
  - BILG = Baby IL with Generics
  - A tiny subset of MS-IL
- Translation steps
  - Specialize generic classes and methods
  - Share instantiations w.r.t. data representation
  - Introduce types-as-values
  - Optimize use of types-as-values
- Target: BILC classes and methods
  - BILC = Baby IL with Constrained generics
  - A typed intermediate language more suitable for code-generation
9Illustrative example, in C#
    class ArrayUtils {
      static List<T> ArrayToList<T>(T[] arr) {
        ... new List<T>() ...
      }
    }
    class List<T> {
      virtual List<T> Append(object obj) {
        ... (List<T>) obj ... new ListCell<T> ...
      }
    }
- Want to share generated code for ArrayToList over different instantiations of T
  - Pass type parameters at runtime?
  - Look up type representations at runtime?
- Want to share generated code for List over different instantiations of T
  - Look up type representations at runtime? How do we know what T is?
10Source Language: BILG
- Baby IL with Generics
- Purely functional, à la Featherweight Java (Igarashi, Pierce, Wadler)
- Primitive types + generic classes
- Inheritance-based subtyping
- Generic methods (static and virtual)
- Type-case operation (isinst) inspects run-time type of object
- No overloading, no interfaces, no abstract methods, no structs (value classes), no delegates, no boxing, no null values, no heap, no bounded polymorphism
- Just enough to demonstrate most of the implementation techniques!
- Typing rules + big-step semantics in paper
  - Easier to work with big-step
  - $\neg\exists v.\ e \Downarrow v$ taken as definition of divergence
11Source language: BILG
- (type)        T,U ::= X | int32 | int64 | I
- (inst type)   I   ::= C<T1,...,Tn>
- (class def)   cd  ::= class C<X1,...,Xn> : I { T1 f1; ... Tm fm; md1 ... mdk }
- (method def)  md  ::= static T m<X1,...,Xn>(T1,...,Tm) { e }
                      | virtual T m<X1,...,Xn>(T1,...,Tm) { e }
- (method ref)  M   ::= I::m<T1,...,Tn>
- (expr)        e   ::= ldc.i4 i4 | ldc.i8 i8 | ldarg x
                      | e1 ... en newobj I
                      | e ldfld I::f
                      | e1 ... en call M
                      | e e1 ... en callvirt M
                      | e isinst I or e
12BILG typing and evaluation for isinst
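- Typing
$$\frac{E \vdash e : I' \qquad E \vdash e' : I}{E \vdash e\ \mathtt{isinst}\ I\ \mathtt{or}\ e' : I}$$
- Evaluation, test succeeds
$$\frac{fr \vdash e \Downarrow I'(f_1 \mapsto v_1,\ldots,f_n \mapsto v_n) \qquad I' <: I}{fr \vdash e\ \mathtt{isinst}\ I\ \mathtt{or}\ e' \Downarrow I'(f_1 \mapsto v_1,\ldots,f_n \mapsto v_n)}$$
- Evaluation, test fails
$$\frac{fr \vdash e \Downarrow I'(f_1 \mapsto v_1,\ldots,f_n \mapsto v_n) \qquad \neg(I' <: I) \qquad fr \vdash e' \Downarrow v}{fr \vdash e\ \mathtt{isinst}\ I\ \mathtt{or}\ e' \Downarrow v}$$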
13BILG typing and evaluation for isinst
- Same typing and evaluation rules as on the previous slide
- Observe: Types affect evaluation. They cannot be erased. They serve static and dynamic purposes.
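The observation that run-time types drive evaluation can be illustrated with a small Python sketch of the two isinst evaluation cases. Everything here (the `Inst` and `Obj` classes, the `PARENT` table, the `is_subtype` and `isinst` helpers) is a hypothetical model made up for illustration, not the paper's definitions, and inheritance is simplified to non-generic superclasses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inst:
    """An instantiated type such as List<string> (hypothetical model)."""
    name: str
    args: tuple = ()


@dataclass(frozen=True)
class Obj:
    """A BILG-style value: an exact run-time type plus field values."""
    rtype: Inst
    fields: tuple = ()


# Assumed class table: each class name maps to its superclass name.
# Simplification: superclasses here are non-generic.
PARENT = {"List": "Object", "Set": "Object", "Object": None}


def is_subtype(sub: Inst, sup: Inst) -> bool:
    """Inheritance-based subtyping: walk the superclass chain of sub."""
    current = sub
    while current is not None:
        if current == sup:
            return True
        parent = PARENT[current.name]
        current = Inst(parent) if parent is not None else None
    return False


def isinst(value: Obj, target: Inst, fallback):
    """e isinst I or e': keep the value if its run-time type is a subtype
    of I, otherwise evaluate the fallback (passed as a thunk)."""
    if is_subtype(value.rtype, target):
        return value
    return fallback()


STRING = Inst("string")
INT32 = Inst("int32")
xs = Obj(Inst("List", (STRING,)))
default = Obj(Inst("List", (INT32,)))

same = isinst(xs, Inst("List", (STRING,)), lambda: default)   # test succeeds
other = isinst(xs, Inst("List", (INT32,)), lambda: default)   # test fails
up = isinst(xs, Inst("Object"), lambda: default)              # supertype
```

The result depends on the type stored in `xs`, which is why a direct implementation of BILG cannot discard instantiation information at run time.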
14Target Language: BILC
- Similar to BILG, but adds
  - Representation constraints on type parameters
    - ref: must be a reference type
    - i4: must be a 32-bit integer
    - i8: must be a 64-bit integer
  - Types-as-values
    - R_T is a value representing closed type T
    - The value R_T has singleton type Rep(T), interpreted as "is a value representing the type T"
    - Construct reps for open types: mkrep_{C<T1,...,Tn>}(e1,...,en) creates a type-rep for C<T1,...,Tn> given type-reps for T1,...,Tn
- Semantics given by small-step reduction relation
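A minimal Python sketch of types-as-values follows. It assumes type-reps are interned, so that two reps for the same closed type are the same object. The names `TypeRep`, `closed_rep` and the intern table are illustrative assumptions. Only `mkrep` corresponds to a construct named on the slide, and the look-up-or-create strategy is one possible implementation, not something the slide prescribes.

```python
class TypeRep:
    """Run-time value representing one closed type (R_T in the slides)."""

    def __init__(self, name, args):
        self.name = name
        self.args = args  # tuple of TypeRep values for the type arguments

    def __repr__(self):
        if not self.args:
            return self.name
        return f"{self.name}<{','.join(map(repr, self.args))}>"


# Assumed intern table: one TypeRep object per distinct closed type.
# Argument reps are compared by identity, which is sound because they
# are themselves interned.
_TABLE = {}


def mkrep(name, *arg_reps):
    """Look up or create the rep for name<args>, given reps for the args."""
    key = (name, arg_reps)
    rep = _TABLE.get(key)
    if rep is None:
        rep = TypeRep(name, arg_reps)
        _TABLE[key] = rep
    return rep


def closed_rep(name):
    """Rep for a type without parameters, e.g. int32 or string."""
    return mkrep(name)


R_string = closed_rep("string")
R_list_string = mkrep("List", R_string)
R_nested = mkrep("Pair", R_string, mkrep("List", R_string))
```

With interning, comparing two reps for equality is a pointer comparison, but every `mkrep` call still pays for a table lookup.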
15Target language: BILC (subset)
- (type)            T,U ::= X | int32 | int64 | I
- (inst type)       I   ::= C<T1,...,Tn>
- (extended types)  τ   ::= T | Rep(T)
- (constraint)      s   ::= ref | i4 | i8
- (class def)       cd  ::= class C<X1 s1,...,Xn sn> : I { T1 f1; ... Tm fm; md1 ... mdk }
- (method def)      md  ::= static T m<X1 s1,...,Xn sn>(τ1,...,τk) { e }
                          | virtual T m<X1 s1,...,Xn sn>(τ1,...,τk) { e }
- (method ref)      M   ::= I::m<T1,...,Tn>
- (expr)            e   ::= i4 | i8 | x
                          | I(e,e1,...,en)
                          | e ldfld I::f
                          | e1 ... en call M
                          | e e1 ... en callvirt M
                          | e isinst_I e or e
                          | R_T
                          | mkrep_{C<T1,...,Tn>}(e1,...,en)
16Some typing and reduction rules
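- Typing for mkrep
$$\frac{E \vdash C\langle T_1,\ldots,T_n\rangle\ \mathrm{ok} \qquad E \vdash e_1 : \mathrm{Rep}(T_1) \quad \cdots \quad E \vdash e_n : \mathrm{Rep}(T_n)}{E \vdash \mathtt{mkrep}_{C\langle T_1,\ldots,T_n\rangle}(e_1,\ldots,e_n) : \mathrm{Rep}(C\langle T_1,\ldots,T_n\rangle)}$$
- Typing for isinst
$$\frac{E \vdash e : I' \qquad E \vdash e' : \mathrm{Rep}(I) \qquad E \vdash e'' : I}{E \vdash e\ \mathtt{isinst}_I\ e'\ \mathtt{or}\ e'' : I}$$
- Reflected subtyping: $R_{I'} \prec R_I$ iff $I' <: I$
- Reduction for isinst
$$\frac{v = I'(w', v_1,\ldots,v_n) \qquad w' \prec w}{(v\ \mathtt{isinst}_I\ w\ \mathtt{or}\ v') \to v}$$
$$\frac{v = I'(w', v_1,\ldots,v_n) \qquad w' \not\prec w}{(v\ \mathtt{isinst}_I\ w\ \mathtt{or}\ v') \to v'}$$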
17Some typing and reduction rules
- Same typing and reduction rules as on the previous slide
- Observe: Types do not affect evaluation. They can be erased. They serve only static purposes.
18Example
- Static generic method in BILG
    static List<T> Conv<T>(object a) { a isinst List<T> }
- Translated to BILC
    static List_i Conv_i(object a) { a isinst_{List_i} R_{List_i} }    // specialized code for T = int32
    static List_l Conv_l(object a) { a isinst_{List_l} R_{List_l} }    // specialized code for T = int64
    static List_r<T> Conv_r<T ref>(Rep(T) r, object a) { a isinst_{List_r<T>} (mkrep_{List_r<T>}(r)) }    // code shared for reference types
  - Extra parameter r representing T
  - Look up/create type rep at runtime (mkrep)
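The translation above can be mimicked in Python as a rough sketch. It assumes that objects carry their type-rep as a first component and that reps are interned tuples. `mkrep`, `R_LIST_I`, `R_LIST_L` and the `conv_*` functions are hypothetical stand-ins for the BILC code on the slide. The type test is an exact match that ignores subtyping, and a failed test returns `None` instead of evaluating an `or` branch.

```python
# Assumed intern table so that equal types share one rep object.
_REPS = {}


def mkrep(name, *arg_reps):
    """Look up or create the rep for name<args> at run time."""
    key = (name,) + arg_reps
    return _REPS.setdefault(key, key)


R_INT32 = mkrep("int32")
R_INT64 = mkrep("int64")
R_STRING = mkrep("string")
# Reps for the specialized instantiations are closed, hence constants.
R_LIST_I = mkrep("List", R_INT32)
R_LIST_L = mkrep("List", R_INT64)


def isinst(obj, rep):
    """Return obj if its stored rep is the requested one, else None.
    Simplification: exact match only, no subtyping."""
    stored_rep = obj[0]
    return obj if stored_rep is rep else None


def conv_i(a):
    """Specialized for T = int32: the rep is a constant."""
    return isinst(a, R_LIST_I)


def conv_l(a):
    """Specialized for T = int64: the rep is a constant."""
    return isinst(a, R_LIST_L)


def conv_r(r, a):
    """Shared by all reference instantiations: r represents T, and the rep
    for List<T> is built (or looked up) on every call."""
    return isinst(a, mkrep("List", r))


strings = (mkrep("List", R_STRING), "a", "b")  # a List<string> object
ints = (R_LIST_I, 1, 2)                        # a List<int32> object
```

The two specialized functions never call `mkrep`, while the shared one calls it on every invocation.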
19We need more
- So far
  - specialization, sharing, and separation of run-time types from static types
  - but mkrep is a costly operation, requiring type-rep creation at runtime
- Idea: instead of passing representations for type parameters, pass representations of types that we actually need
    static List_r<T> Conv_r<T ref>(Rep(List_r<T>) r, object a) { a isinst_{List_r<T>} (r) }
  - Extra parameter r representing List<T>
20We need more
- In general, we need many type-reps in a single method body
- So we pass around dictionaries of type-reps
- What type does a dictionary of type-reps have?
  - At its simplest, it is just a tuple, e.g. Rep(List<X>) × Rep(Vec<Vec<X>>) is the type of a two-slot dictionary containing type-reps for List<X> and Vec<Vec<X>>
  - In general, dictionaries may contain cycles (e.g. for mutually recursive methods), so we need recursive values and their types
  - Worse still, polymorphic recursion requires infinite dictionaries
- Simpler: use name-based types for dictionaries
  - reps for methods: Rep(M), R_M, mkrep_M(e1,...,en)
  - statically, each Rep-type determines a particular tuple of other Rep-types
  - dynamically, each type-rep R_T or method-rep R_M determines a tuple of type-rep/method-rep values
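One way to picture name-based dictionaries is the Python sketch below. Each rep determines a tuple of slots, and each slot is filled on first use, so cyclic or unbounded dictionaries never have to be built up front. The `Rep` class, the `DICT_LAYOUT` table and `dict_slot` are assumptions made for illustration. Lazy filling in particular is an assumed strategy, not something these slides specify.

```python
class Rep:
    """A type-rep whose name determines a tuple of further reps."""

    _interned = {}

    def __init__(self, name, args):
        self.name, self.args = name, args
        self._slots = {}  # slot index -> Rep, filled on demand

    @classmethod
    def of(cls, name, *args):
        """Interning constructor: one Rep object per closed type."""
        key = (name, args)
        if key not in cls._interned:
            cls._interned[key] = cls(name, args)
        return cls._interned[key]

    def __repr__(self):
        inner = ",".join(map(repr, self.args))
        return f"{self.name}<{inner}>" if self.args else self.name


# Assumed layout: for each class name, how to compute each dictionary slot
# from the rep's own type arguments.
DICT_LAYOUT = {
    # List<X> needs reps for List<X> itself and for Vec<Vec<X>>.
    "List": [
        lambda x: Rep.of("List", x),
        lambda x: Rep.of("Vec", Rep.of("Vec", x)),
    ],
    # C<X> needs C<C<X>>: polymorphic recursion, unbounded if built eagerly.
    "C": [lambda x: Rep.of("C", Rep.of("C", x))],
}


def dict_slot(rep, i):
    """Slot i of the dictionary determined by rep, created on first access."""
    if i not in rep._slots:
        rep._slots[i] = DICT_LAYOUT[rep.name][i](*rep.args)
    return rep._slots[i]


R_string = Rep.of("string")
R_list = Rep.of("List", R_string)
R_c = Rep.of("C", R_string)
```

Slot 0 of `R_list` points back to `R_list` itself, which gives a cyclic dictionary without any special construction, and the chain of slots reachable from `R_c` grows only as far as it is actually followed.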
21Target language: BILC (full)
- (type)        T,U ::= X | int32 | int64 | I
- (inst type)   I   ::= C<T1,...,Tn>
- (ext type)    τ   ::= T | Rep(T) | Rep(M)
- (constraint)  s   ::= ref | i4 | i8
- (class def)   cd  ::= class C<X1 s1,...,Xn sn> : I { T1 f1; ... Tm fm; md1 ... mdk } with τ1,...,τp
- (method def)  md  ::= static T m<X1 s1,...,Xn sn>(τ1,...,τk) { e } with τ1,...,τp
                      | virtual T m<X1 s1,...,Xn sn>(τ1,...,τk) { e }
- (method ref)  M   ::= I::m<T1,...,Tn>
- (expr)        e   ::= i4 | i8 | x
                      | I(e,e1,...,en)
                      | e ldfld I::f
                      | e1 ... en call M
                      | e e1 ... en callvirt M
                      | e isinst_I e or e
                      | R_T | R_M
                      | mkrep_{C<T1,...,Tn>}(e1,...,en)
                      | mkrep_{C<T1,...,Tn>::m<U1,...,Uk>}(e1,...,en,e'1,...,e'k)
                      | objdict_i e
                      | mdict_i e
22Translation scheme
- Static generic methods
  - Extra dictionary parameter associated with method
  - Accessed using mdict_i(e)
- Virtual methods in generic classes
  - Obtain dictionary through type of object
  - Accessed using objdict_i(e)
- Generic virtual methods
  - Dictionary type not known statically (body could be overridden)
  - So pass reps for type parameters and construct type-reps at runtime using mkrep
23In the paper
- Complete formalization of BILG, BILC, and a translation
- Theorems
  - Translation preserves types
  - Translation preserves behaviour
- And in forthcoming technical report
  - Full proofs
  - Type erasure theorem: types in BILC do not affect evaluation
24Future work
- Extend BILG and the translation to cover more features
- Value classes (structs)
  - Would satisfy representation constraint of form s1,...,sn where s1,...,sn are constraints on the fields' representations
  - Now have unbounded number of specializations
  - All methods on generic structs whose code is shared take a dictionary parameter
  - Need treatment of boxing
- Flexible specialization policies
  - Less sharing, e.g. full specialization of selected types
  - More sharing, e.g. share all instantiations of C<T> by boxing and unboxing appropriately (cf. ML)
25Future work: structural typing
- Flexible specialization interacts badly with run-time types based on name-equivalence
- Instead, describe dictionaries using structural typing
  - Products: Rep(List<X>) × Rep(X) is a two-slot dictionary with type-reps for List<X> and X
  - Circular dictionaries => Recursive types, e.g. μD. Rep(Vec<X>) × (Rep(Set<X>) × D)
  - Polymorphic recursion in code => Higher-kinded recursive types, e.g. (μD. λX. Rep(Vec<X>) × D(Set<X>)) string
26Related work
- Rep(T)
  - Crary, Weirich, Morrisett: Intensional polymorphism in type-erasure semantics
- Dictionary-passing for polymorphism implementation
  - Saha and Shao (ML)
  - Viroli and Natali (Java)