Title: Phoenix: a framework for Code Generation, Optimization and Program Analysis
- Andrew Pardoe, Phoenix team
- Andrew.Pardoe_at_Microsoft.com
2. What is Phoenix?
- Phoenix is Microsoft's next-generation, state-of-the-art infrastructure for program analysis and transformation
- We wanted to
  - Develop an industry-leading compilation and tools framework
  - Foster a rich ecosystem for
    - Academia
    - Research
    - Industry
  - With an infrastructure that is robust, retargetable, extensible, configurable and scalable
- Phoenix is built on C++/CLI and compiles either as managed or native code
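3. Building a program with C++/CLI
- Microsoft C++ compiler
  - Input: program source code
  - Output: COFF object file
- COFF files are linked with system libraries into PE (Portable Executable) images
(Diagram: the driver (CL) passes the source to the front end (C1); the back end (C2) takes the front end's output and writes the .obj file.)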
4. Roles of C1 (C1xx) and C2
- C1 (front end)
  - Preprocessing
  - Tokenization
  - Parsing
  - Semantic processing
  - CIL emission
  - Types and symbol debug info
  - Metadata for managed code
- C2 (back end)
  - CIL reading
  - Program analysis
  - Optimization
  - Lowering to target
  - COFF emission
  - Source-level debug info
5. Why we built Phoenix
- Code generation technology now appears in many different forms
  - Large-scale optimizers (PreJIT or C++'s LTCG)
  - Fast code generation (.NET's JIT, C++ debug mode, C)
  - Custom code generators (fast conditional breakpoints, SQL expression optimizers)
- Code generators at Microsoft target many different computer architectures
  - PC platforms (x86, x64, IA64)
  - Game consoles (x86, PPC)
  - Handheld devices (ARM)
6. Another set of reasons
- Microsoft builds sophisticated analysis tools
  - VS 2005's C++ compiler has an /analyze switch that performs static analysis for code defects
  - FxCop enforces the .NET coding guidelines
  - We have tools for defect, security and race detection
- These tools are often developed to work for one specific product. This limits
  - Retargeting the tool for other applications
  - The ability to adopt best-of-breed technology
  - The ability to move forward as technology changes
7. Why the rest of the world needs Phoenix
- Research
  - Researchers often spend too much time on routine work instead of exploring the novel ideas that inspired the research
  - If research doesn't build on a world-class framework, it often cannot handle real-world problems
- Industry
  - Much effort is spent deciphering poorly documented formats and interfaces (Microsoft's CIL or PE file formats)
  - Working without specifications or promises of future compatibility is inherently fragile
  - Industry mistakes end up costing Microsoft as well
- Academia
  - Attempts to provide common infrastructures have had limited success in the past
  - By using Phoenix, educators can start with big problems and leave the routine work to us
8. (Diagram: Phoenix Infrastructure at the center, surrounded by the areas it serves)
- AST Tools
  - Static analysis tools
  - Next-gen front ends
  - R/W global program views
- .NET CodeGen
  - Runtime JITs
  - Pre-JIT
  - OO and .NET optimizations
- MSR Adv Lang
  - Language research
  - Direct transfer to Phoenix
  - Research insulated from code generation
- Native CodeGen
  - Advanced C++/OO optimizations
  - FP optimizations
  - OpenMP
- MSR Partner Tools
  - Built on Phoenix APIs
  - Both HL and LL APIs
  - Managed APIs
  - Program analysis
  - Program rewrite
- Retargetable
  - Machine models
  - 3 months -Od
  - 3 months -O2
- Academic RDK
  - Managed APIs
  - IP as DLLs
  - Docs
  - Full sources (future)
- Chip Vendor CDK
  - 6-month ports
  - Sample port docs
  - Key ports (XScale) done at Microsoft
9. Key features of Phoenix
- Written in C++ but usable from any .NET language
- Samples provided in C# and C++/CLI
- Phase and plug-in model for third-party extensions to
  - The C++ compiler back end, JIT/PreJIT
  - Static analysis tools, binary analysis and manipulation
  - Plug-ins and extensions to the Phoenix architecture
- A single, strongly typed IR with explicit data flow and control flow, used throughout all phases of the framework
- The IR and type system can process both native and managed code
- Strong inter-phase consistency checking
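10. (Diagram: the Phoenix stack. At the top, compilers and tools such as a browser, visualizer, lint, formatter, obfuscator, refactoring tool, translator (Xlator), profiler and security checker, with HL opts, LL opts and code generation stages, all built on the Phx APIs. The Phoenix core provides AST, IR, symbols, types, control flow graphs, SSA, dataflow, alias and EH support, with readers and writers for native images, C IL, .NET assemblies, CAST, profiles and Phx AST. Front ends feeding it include C-family languages, PreFast, Lex/Yacc, VB, Delphi, Cobol, Eiffel and Tiger.)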
11. Dynamic tools (Diagram: the same stack extended with dynamic tools such as locality optimizations.)
12. Phoenix architecture
- Core set of extensible classes to represent
  - IR (intermediate representation of the code stream)
  - Symbols, Types, Function units, Basic blocks, Graphs, Trees, Aliasing information
- Layered set of analysis and transformation components
  - Data flow analysis, loop analysis, alias analysis, dead code removal, redundant code detection
  - Global optimizations built on reusable analysis lattices
- Common input/output library for binary formats
  - PE, LIB, OBJ, CIL, MSIL, PDB
  - Phoenix both reads and writes these binary formats
13. Simple example
    #include <stdio.h>

    void main (int argc, char** argv)
    {
        char* message;
        if (argc > 1)
            message = "Hello, world!\n";
        else
            message = "Goodbye, world!\n";
        printf (message);
    }
14. Resulting Phoenix IR
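15. View inside a Phoenix-based C2
(Diagram: C1 turns SOURCE into CIL; the Phoenix-based C2 carries it through the IR levels HIR, MIR, LIR and EIR to the OBJECT file. The diagram also shows an AST level and these phases: CIL Reader, Type Checker, MIR Lower, SSA Const, SSA Dest, Canon, Addr Modes, Lower, Reg Alloc, EH Lower, Stack Alloc, Frame Gen, Switch Lower, Block Layout, Flow Opts, Encode, Lister.)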
16. Types of IR
- High-level IR (HIR): architecture and runtime independent. Object model instructions, array indices, full aliasing
- Mid-level IR (MIR): architecture independent, runtime dependent. Lowered to calls and address arithmetic
- Low-level IR (LIR): architecture and runtime dependent. Lowered to machine instructions
- Encoded IR (EIR): binary format. Lowered to encoded data instructions
- At each level, the IR contains instructions and operands of various types
17. IR states during compilation
- Phases transform IR either within a state or from one state to a contiguous state
- For example, the lower phase transforms MIR into LIR. Optimizations usually work within a single state.
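(Diagram: the IR states arranged from abstract to concrete; lowering moves the IR toward concrete, raising moves it back toward abstract.)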
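To make the "contiguous state" rule concrete, here is a minimal standard C++ sketch, not Phoenix code. The IrState enum, the Phase struct and isValidPipeline are hypothetical names, and the HIR/MIR/LIR/EIR ordering follows the list of IR types above. A phase may stay in its state (an optimization), lower by one step or raise by one step; any bigger jump is rejected.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical model of the four IR states, ordered from abstract to concrete.
enum class IrState { HIR = 0, MIR = 1, LIR = 2, EIR = 3 };

// A phase reads the IR in state `from` and leaves it in state `to`.
struct Phase {
    std::string name;
    IrState from;
    IrState to;
};

// Returns true if each phase starts in the state the previous phase left
// the IR in, and no phase moves more than one state (lowering is +1,
// raising is -1, a within-state optimization is 0).
bool isValidPipeline(const std::vector<Phase>& phases, IrState start) {
    IrState current = start;
    for (const Phase& p : phases) {
        if (p.from != current) return false;
        int step = static_cast<int>(p.to) - static_cast<int>(p.from);
        if (std::abs(step) > 1) return false;
        current = p.to;
    }
    return true;
}
```

For example, a list that lowers HIR to MIR, optimizes within MIR and then lowers MIR to LIR passes the check, while a single phase that jumps from HIR straight to LIR does not. Which real C2 phase belongs to which state is not specified by this sketch.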
18. Extending a Phoenix-based compiler
- The VC optimizer is just a Phoenix client
- All Phoenix clients can host plug-ins
- Plug-ins can
  - Add new components
  - Extend existing components
  - Reconfigure clients
- Extensibility relies upon
  - Reflection
  - Events and delegates
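Reconfiguring a client can be pictured as editing an ordered phase list. The sketch below is plain standard C++, not the Phoenix plug-in API: PhaseEntry, PhaseList, insertAfter and replace are hypothetical names, and std::function stands in for .NET delegates. It shows two of the plug-in abilities listed above: adding a new component after an existing one and swapping one component for another.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical phase: a name plus the work it performs.
struct PhaseEntry {
    std::string name;
    std::function<void()> run;
};

// Hypothetical ordered phase list owned by a client such as a compiler back end.
class PhaseList {
public:
    void append(PhaseEntry phase) { phases_.push_back(std::move(phase)); }

    // Adds a new component right after an existing phase.
    bool insertAfter(const std::string& anchor, PhaseEntry phase) {
        auto it = findPhase(anchor);
        if (it == phases_.end()) return false;
        phases_.insert(it + 1, std::move(phase));
        return true;
    }

    // Reconfigures the client by swapping one phase for another.
    bool replace(const std::string& name, PhaseEntry phase) {
        auto it = findPhase(name);
        if (it == phases_.end()) return false;
        *it = std::move(phase);
        return true;
    }

    std::vector<std::string> names() const {
        std::vector<std::string> result;
        for (const PhaseEntry& p : phases_) result.push_back(p.name);
        return result;
    }

    // Runs every phase in order.
    void runAll() const {
        for (const PhaseEntry& p : phases_) p.run();
    }

private:
    std::vector<PhaseEntry>::iterator findPhase(const std::string& name) {
        return std::find_if(phases_.begin(), phases_.end(),
                            [&](const PhaseEntry& p) { return p.name == name; });
    }

    std::vector<PhaseEntry> phases_;
};
```

A plug-in that adds an uninitialized-local check would call insertAfter with the name of whichever phase it should follow; one that replaces the register allocator would call replace. The phase names used with this sketch are illustrative only.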
19. Component extensibility
- Most objects in the system support observers by deriving from the Phoenix class Extensible Object
- Observer classes can register delegates so that they are notified when the host object undergoes certain events. For example, if the host object is copied, it notifies its registered delegates
- Phoenix provides a standard plug-in discovery and registration mechanism
- Plug-ins can reconfigure the client, for example by replacing the register allocator
- Plug-ins can also use Phoenix's analyses to perform their own analyses and transformations
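A minimal sketch of the observer mechanism in standard C++, assuming std::function callbacks as a stand-in for .NET delegates. The class, its onCopy and copy methods and the tag field are hypothetical, and only the copy event described above is modeled.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical analogue of an extensible host object. Observers register
// callbacks (standing in for .NET delegates) that fire on host events.
class ExtensibleObject {
public:
    using CopyHandler =
        std::function<void(const ExtensibleObject& original, ExtensibleObject& duplicate)>;

    // An observer registers interest in the copy event.
    void onCopy(CopyHandler handler) { copyHandlers_.push_back(std::move(handler)); }

    // Copying the host notifies every observer registered on the original.
    ExtensibleObject copy() const {
        ExtensibleObject result;
        for (const CopyHandler& handler : copyHandlers_) handler(*this, result);
        return result;
    }

    int tag = 0;  // hypothetical payload that an observer might carry over

private:
    std::vector<CopyHandler> copyHandlers_;
};
```

The host never needs to know what its observers do: one observer might count copies while another transfers data to the new object.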
20. Extensibility example: birth tracking
    // Attach a note to each instruction with the birth
    // phase for reference later
    public ref class InstructionBirthExtensionObject :
        public Phx::IR::InstructionExtensionObject
    {
    public:
        property Phx::Phases::Phase ^ BirthPhase;
        property System::String ^ BirthPhaseText
        {
            System::String ^ get ()
            {
                if (BirthPhase != nullptr)
                {
                    return BirthPhase->NameString;
                }
                // ... (remainder not shown on the slide)
            }
        }
    };

    // Called from Instruction ctor
    void
    PlugInNewInstructionEventHandler
    (
        Phx::IR::Instruction ^ instruction
    )
    {
        InstructionBirthExtensionObject ^ extensionObject =
            gcnew InstructionBirthExtensionObject();
        extensionObject->BirthPhase =
            instruction->FunctionUnit->Phase;
        instruction->AddExtensionObject(extensionObject);
    }

    // Called from Instruction dtor
    void
    PlugInDeleteInstructionEventHandler
    (
        Phx::IR::Instruction ^ instruction
    )
    {
        // ... (body not shown on the slide)
    }
21. Plug-In VS Integration
- Plug-ins can be created with Visual Studio wizards
- The RDK is downloadable and works with the free VS Express Editions (though you probably want VS Team System Edition for your work)
22. Example: uninitialized local detection
- We would like to warn the user that x is not initialized before use
- To do this we need to perform dataflow analysis
- We'll use a plug-in to add this phase to the existing Phoenix-based C2
23. May and must examples
- message may be used before it is defined:
    void main()
    {
        char* message;
        if (...)
            message = "Hello";
        printf(message);
    }
- message must be used before it is defined:
    void main()
    {
        char* message;
        char* other;
        if (...)
            other = "Hello";
        printf(message);
    }
24. IR for detecting uninitialized locals
25. Detecting an uninitialized use
- For each local variable v
  - Examine all paths from the entry of the method to each use of v
  - If v is not initialized before the use on every path, v must be used before it is defined
  - If there is some path on which v is not initialized before the use, v may be used before it is defined
- The classic solution is to build a control flow graph and solve the data flow problem
  - The state is unknown at the start of each block. Transfer states between blocks and combine them as you traverse the control flow graph
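Here is a small, self-contained C++ sketch of this classic approach (not Phoenix code). Everything in it is hypothetical: Stmt, Block, Finding and findUninitializedUses are made-up names, locals are bit indices in a 32-bit mask (so at most 32 locals), block 0 is the entry and every block is assumed reachable. It tracks two facts per block entry: locals initialized on every path (combined by intersection) and locals initialized on some path (combined by union). A use of a local that no path initializes is a must; one that some path leaves uninitialized is a may.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy IR: a statement either defines or uses one local.
struct Stmt {
    bool isDef;
    int var;  // bit index, 0..31
};

struct Block {
    std::vector<Stmt> stmts;
    std::vector<int> preds;  // indices of predecessor blocks
};

enum class Verdict { May, Must };

struct Finding {
    int block;
    int stmt;
    int var;
    Verdict verdict;
};

std::vector<Finding> findUninitializedUses(const std::vector<Block>& cfg) {
    const std::uint32_t all = ~0u;
    const std::size_t n = cfg.size();
    // "every path" facts start at all (intersection identity), "some path" facts at 0.
    std::vector<std::uint32_t> everyIn(n, 0), everyOut(n, all), someIn(n, 0), someOut(n, 0);

    // Transfer function: a definition initializes its local.
    auto transfer = [&cfg](std::size_t b, std::uint32_t state) {
        for (const Stmt& s : cfg[b].stmts)
            if (s.isDef) state |= (1u << s.var);
        return state;
    };

    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t b = 0; b < n; ++b) {
            // Meet over predecessors; nothing is initialized at the entry block.
            std::uint32_t every = (b == 0) ? 0u : all;
            std::uint32_t some = 0;
            for (int p : cfg[b].preds) {
                every &= everyOut[p];
                some |= someOut[p];
            }
            everyIn[b] = every;
            someIn[b] = some;
            std::uint32_t newEvery = transfer(b, every);
            std::uint32_t newSome = transfer(b, some);
            if (newEvery != everyOut[b] || newSome != someOut[b]) {
                everyOut[b] = newEvery;
                someOut[b] = newSome;
                changed = true;
            }
        }
    }

    // Walk each block once more with the converged entry states.
    std::vector<Finding> findings;
    for (std::size_t b = 0; b < n; ++b) {
        std::uint32_t every = everyIn[b], some = someIn[b];
        for (std::size_t i = 0; i < cfg[b].stmts.size(); ++i) {
            const Stmt& s = cfg[b].stmts[i];
            const std::uint32_t bit = 1u << s.var;
            if (s.isDef) {
                every |= bit;
                some |= bit;
            } else if (!(some & bit)) {
                findings.push_back({static_cast<int>(b), static_cast<int>(i), s.var, Verdict::Must});
            } else if (!(every & bit)) {
                findings.push_back({static_cast<int>(b), static_cast<int>(i), s.var, Verdict::May});
            }
        }
    }
    return findings;
}
```

Modeling the may example as three blocks (entry, a branch that assigns message, and a join that uses it) yields one May finding. In the must example, where only other is assigned on the branch, the use of message is reported as Must.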
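26. Code sketch using classic dataflow
    bool changed = true;
    while (changed)
    {
        changed = false;
        for each (Phx::Graphs::BasicBlock ^ block in function)
        {
            // Update input state: meet the output states of all predecessors
            STATE ^ inState = inStates[block];
            bool firstPred = true;
            for each (Phx::Graphs::BasicBlock ^ predecessorBlock in block->Predecessors)
            {
                STATE ^ predecessorState = outStates[predecessorBlock];
                inState = firstPred ? predecessorState : meet(inState, predecessorState);
                firstPred = false;
            }
            inStates[block] = inState;

            // Compute output state by applying the block's transfer function
            STATE ^ newOutState = gcnew STATE(inState);
            // ... (transfer function not shown on the slide)

            // Check for convergence: set changed = true if newOutState
            // differs from outStates[block], then store it
            // ...
        }
    }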
27. Can we make this easier?
- The dataflow solution computes the state for the entire graph, even at places where v is never referenced
- An alternative model, Static Single Assignment form (SSA), connects definitions and uses directly
- Phoenix uses SSA and builds flow graphs when necessary
- We can rewrite this code and let Phoenix do most of the routine work
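A sketch of the SSA view in standard C++, under stated assumptions: each local gets an implicit "undefined" definition at method entry, phi definitions merge values at join points, and Def, Use, Report and classifyUses are hypothetical names rather than Phoenix APIs. A use whose SSA value is the undefined entry definition (or a phi of only such values) is a must; a use that reaches it through some phi argument is a may. No per-block state is needed; facts are attached to SSA values only.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical toy SSA: every value has exactly one definition.
enum class DefKind { Undefined, Assign, Phi };

struct Def {
    DefKind kind;
    std::vector<int> phiArgs;  // value ids merged by a phi (empty otherwise)
};

// A non-phi use of SSA value `value`, which is a version of local `var`.
struct Use {
    int value;
    int var;
};

struct Report {
    std::set<int> must;  // locals used where no path initializes them
    std::set<int> may;   // locals used where some path leaves them uninitialized
};

Report classifyUses(const std::vector<Def>& defs, const std::vector<Use>& uses) {
    const std::size_t n = defs.size();
    // mayUndef: least fixpoint, starts true only for entry "undefined" defs.
    // mustUndef: greatest fixpoint, starts true for undefined defs and phis.
    std::vector<bool> mayUndef(n), mustUndef(n);
    for (std::size_t v = 0; v < n; ++v) {
        mayUndef[v] = defs[v].kind == DefKind::Undefined;
        mustUndef[v] = defs[v].kind != DefKind::Assign;
    }
    // Propagate through phis until nothing changes (phis may form loops).
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t v = 0; v < n; ++v) {
            if (defs[v].kind != DefKind::Phi) continue;
            bool anyMay = false, allMust = true;
            for (int a : defs[v].phiArgs) {
                anyMay = anyMay || mayUndef[a];
                allMust = allMust && mustUndef[a];
            }
            if (anyMay != mayUndef[v] || allMust != mustUndef[v]) {
                mayUndef[v] = anyMay;
                mustUndef[v] = allMust;
                changed = true;
            }
        }
    }
    Report report;
    for (const Use& u : uses) {
        if (mustUndef[u.value]) report.must.insert(u.var);
        else if (mayUndef[u.value]) report.may.insert(u.var);
    }
    return report;
}
```

In the may example, message's use sees phi(undefined, "Hello") and is reported as a may. In the must example, message is never assigned, so no phi is needed and its use points straight at the undefined entry definition, which is reported as a must.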
28. Code sketch using Phoenix
    // Collect symbols whose non-phi uses are reached directly
    // by a definition on firstInstruction
    for each (Phx::IR::Operand ^ destinationOperand in
        Phx::IR::OperandIterator::Destinations(firstInstruction))
    {
        if (destinationOperand->IsMemoryModificationReference)
        {
            for each (Phx::IR::Operand ^ useOperand in
                Phx::IR::OperandIterator::Use(destinationOperand))
            {
                if (useOperand->Instruction->Opcode != Phx::Common::Opcode::Phi
                    && useOperand->IsVariableOpnd)
                {
                    Phx::Symbols::Symbol ^ symbolUse =
                        useOperand->AsVariableOpnd->Symbol;
                    if (symbolUse != nullptr && !mustList.Contains(symbolUse))
                    {
                        mustList.Add(symbolUse);
                    }
                }
            }
        }
    }
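29. Uninitialized local plug-in
- The plug-in is loaded at runtime by the Phoenix-based C2
(Diagram: UninitializedLocal.cpp is compiled with C++/CLI into UninitializedLocal.dll; Test.cpp passes through C1 and then Phx-C2, which loads the plug-in and writes Test.obj.)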
30. Phoenix C2 with our plug-in added
- This complete plug-in is provided as a sample in the Research Development Kit
- It takes only 400 lines of code to add a key warning to the C2 compiler
- Other types of checking can be added just as easily
- A demonstration of the warnings being emitted
31. Phoenix PE reading
- Phoenix can read and write PE files directly
  - You can implement your own compiler or linker
  - You can create post-link tools for analysis, instrumentation or optimization
- Binaries can be read in, raised into IR, changed and rewritten as new, working binaries
- Phoenix Explorer is only 800 lines of code on top of the Phoenix binary reading/writing library
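Phoenix's own reader API is not shown on these slides, so the following is only a standard C++ sketch of the first step any PE reader performs, based on the published PE/COFF layout: check the "MZ" stub, follow the e_lfanew offset stored at 0x3C to the "PE\0\0" signature, and read the COFF Machine field that follows it (0x014C for x86, 0x8664 for x64). The function names readU32, looksLikePe and machineType are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a little-endian 32-bit value starting at `offset`.
std::uint32_t readU32(const std::vector<std::uint8_t>& b, std::size_t offset) {
    return static_cast<std::uint32_t>(b[offset]) |
           (static_cast<std::uint32_t>(b[offset + 1]) << 8) |
           (static_cast<std::uint32_t>(b[offset + 2]) << 16) |
           (static_cast<std::uint32_t>(b[offset + 3]) << 24);
}

// True if the buffer has an "MZ" stub whose e_lfanew field (offset 0x3C)
// points at a "PE\0\0" signature followed by a complete Machine field.
bool looksLikePe(const std::vector<std::uint8_t>& bytes) {
    if (bytes.size() < 0x40 || bytes[0] != 'M' || bytes[1] != 'Z') return false;
    const std::size_t pe = readU32(bytes, 0x3C);
    if (pe + 6 > bytes.size()) return false;
    return bytes[pe] == 'P' && bytes[pe + 1] == 'E' && bytes[pe + 2] == 0 && bytes[pe + 3] == 0;
}

// Returns the COFF Machine field that follows the signature,
// or 0 if the buffer is not a PE image.
std::uint16_t machineType(const std::vector<std::uint8_t>& bytes) {
    if (!looksLikePe(bytes)) return 0;
    const std::size_t pe = readU32(bytes, 0x3C);
    return static_cast<std::uint16_t>(bytes[pe + 4] | (bytes[pe + 5] << 8));
}
```

A real reader goes on to the optional header, section table and metadata; raising the code into IR, as the slide describes, happens well after this point.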
32. Phoenix Explorer is like ILDasm for IR
33. Binary rewriting with Phoenix
- The mtrace utility injects tracing code into managed applications
- You don't need the source code to do this (you do need the PDB)
- mtrace shows functions being entered and exited
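The kind of rewrite mtrace performs can be pictured on a toy IR. This is a hypothetical sketch, not mtrace's implementation: Instr, Function, injectTracing and the TraceEnter/TraceExit call targets are made-up names. It inserts a call on entry and another before every return, so the rewritten function reports being entered and exited.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR: an opcode plus one text operand.
struct Instr {
    std::string opcode;
    std::string operand;
};

struct Function {
    std::string name;
    std::vector<Instr> body;
};

// Inserts a trace call at function entry and before every return.
void injectTracing(Function& f) {
    std::vector<Instr> rewritten;
    rewritten.push_back({"call", "TraceEnter(" + f.name + ")"});
    for (const Instr& instr : f.body) {
        if (instr.opcode == "ret")
            rewritten.push_back({"call", "TraceExit(" + f.name + ")"});
        rewritten.push_back(instr);
    }
    f.body = std::move(rewritten);
}
```

Placing the exit call before each return, rather than once at the end of the body, covers functions with several return points.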
34. How do I get Phoenix?
- Early access RDKs are available to selected universities
  - Sample projects include aspect-oriented programming, code obfuscation and profiling
  - Contact phxap_at_microsoft.com for academic early access requests
- An early access CDK is available to selected industry partners
  - Contact phxcp_at_microsoft.com for commercial early access requests
- Phoenix RDKs/CDKs are released about every 6 months
- Phoenix will be the next MS compiler back end
- We build the next-generation Windows every night
35. More information
- http://research.microsoft.com/phoenix