Phoenix: a framework for Code Generation, Optimization and Program Analysis

1
Phoenix: a framework for Code Generation,
Optimization and Program Analysis
  • Andrew Pardoe Phoenix team
  • Andrew.Pardoe_at_Microsoft.com

2
What is Phoenix?
  • Phoenix is Microsoft's next-generation, state-of-the-art
    infrastructure for program analysis and transformation
  • We wanted to:
      • Develop an industry-leading compilation and tools
        framework
      • Foster a rich ecosystem for:
          • Academic
          • Research
          • Industry
      • With an infrastructure that is robust,
        retargetable, extensible, configurable and
        scalable
  • Phoenix is built on C++/CLI and compiles either
    as managed or native code

3
Building a program with C++/CLI
  • Microsoft C++ compiler
      • Input: program source code
      • Output: COFF object file
  • COFF files are linked with system libraries into
    PEs

Diagram: the driver (CL) passes C++ Source through the Frontend (C1) and the Backend (C2) to produce an Obj File
4
Roles of C1 (C1xx) and C2
  • C1 or C1xx
      • Preprocessing
      • Tokenization
      • Parsing
      • Semantic processing
      • CIL emission
      • Types and symbol debug info
      • Metadata for managed code
  • C2
      • CIL reading
      • Program analysis
      • Optimization
      • Lowering to target
      • COFF emission
      • Source level debug info

5
Why we built Phoenix
  • Code generation technology now appears in many
    different forms
  • Large-scale optimizers (PreJIT or C++'s LTCG)
  • Fast code generation (.NET's JIT, C++ debug mode,
    C#)
  • Custom code generators (fast conditional
    breakpoints, SQL expression optimizers)
  • Code generators in Microsoft target many
    different computer architectures
  • PC platforms (x86, x64, IA64)
  • Game consoles (x86, PPC)
  • Handheld devices (ARM)

6
And another set of reasons
  • Microsoft builds sophisticated analysis tools
  • VS 2005's C++ compiler contains an /analyze
    switch to perform static analysis for code
    defects
  • The .NET coding guidelines are enforced by FxCop
  • We have tools for defect, security and race
    detection
  • These tools are often developed in a way that
    works for only one specific product. This limits:
      • Retargeting the tool for other applications
      • Ability to adopt the best-of-breed technology
      • Ability to move forward as technology changes

7
Why the rest of the world needs Phoenix
  • Research
      • Research often spends too much time handling
        routine work instead of exploring the novel ideas
        that inspired the research
      • If research doesn't build on a world-class
        framework it often cannot handle real-world
        problems
  • Industry
      • Much effort is spent on deciphering poorly
        documented formats and interfaces (Microsoft's
        CIL or PE file formats)
      • There is an inherent fragility in working without
        specifications or promises of future
        compatibility
      • Industry mistakes end up costing Microsoft as
        well
  • Academic
      • Attempts to provide common infrastructures have
        had limited success in the past
      • By using Phoenix, educators can start with big
        problems and leave the routine work to us

8
AST Tools
  • Static Analysis Tools
  • Next Gen Front-Ends
  • R/W Global Program Views

.Net CodeGen
MSR Adv Lang
  • Language Research
  • Direct xfer to Phoenix
  • Research Insulated from code generation
  • Runtime JITs
  • Pre-JIT
  • OO and .Net optimizations

Phoenix Infrastructure
Native CodeGen
MSR Partner Tools
  • Advanced C++/OO Optimizations
  • FP optimizations
  • OpenMP
  • Built on Phoenix APIs
  • Both HL and LL APIs
  • Managed APIs
  • Program Analysis
  • Program Rewrite

Retargetable
Academic RDK
  • Machine Models
  • 3 months -Od
  • 3 months -O2
  • Full sources (future)
  • Managed APIs
  • IP as DLLs
  • Docs

Chip Vendor CDK
  • 6 month ports
  • Sample port docs
  • Key ports (Xscale) done at msft

9
Key features of Phoenix
  • Written in C++ but usable by any .NET language
  • Samples provided in C# and C++/CLI
  • Phase and Plug-In model for third-party
    extensions to:
      • C++ compiler backend, JIT/PreJIT
      • Static analysis tools, binary analysis and
        manipulation
      • Plug-Ins and extensions to the Phoenix
        architecture
  • Single, strongly-typed, explicit dataflow/control
    flow IR used throughout all phases of the
    framework
  • IR and Type system are capable of processing
    native and managed code
  • Strong inter-phase consistency checking

10
Diagram: Phoenix platform overview
  • Clients: Compilers (HL Opts, LL Opts, Code Gen) and Tools
    (Browser, Visualizer, Lint, Formatter, Obfuscator, Refactor,
    Xlator, Profiler, SecurityChecker)
  • Phx APIs
  • Phoenix Core: AST, IR, Syms, Types, CFGraph, SSA, Dataflow,
    Alias, EH
  • Readers and Writers: Native Image, CIL, .NET assembly, CAST,
    Profile, Phx AST
  • Front ends and languages: PreFast, Lex/Yacc, C-family
    languages, VB, Delphi, Cobol, Eiffel, Tiger
11
Dynamic Tools
Locality opts
12
Phoenix Architecture
  • Core set of extensible classes to represent:
      • IR (intermediate representation of code stream)
      • Symbols, Types, Function units, Basic blocks,
        Graphs, Trees, Aliasing information
  • Layered set of analysis and transformation
    components:
      • Data flow analysis, Loop analysis, Alias
        analysis, Dead code removal, Redundant code
        detection
      • Global optimizations built on reusable analysis
        lattices
  • Common input/output library for binary formats:
      • PE, LIB, OBJ, CIL, MSIL, PDB
      • Phoenix both reads and writes binary formats

13
Simple example
    #include <stdio.h>

    int main (int argc, char** argv)
    {
        const char* message;
        if (argc > 1)
            message = "Hello, world!\n";
        else
            message = "Goodbye, world!\n";
        printf (message);
        return 0;
    }

14
Resulting Phoenix IR
15
View inside a Phoenix-based C2
Diagram: C1 turns SOURCE into CIL; C2 turns CIL into OBJECT
Representations shown: AST, HIR, MIR, LIR, EIR
Phases shown inside C2: CIL Reader, Type Checker, MIR Lower,
SSA Const, SSA Dest, Canon, Addr Modes, Lower, Reg Alloc,
EH Lower, Stack Alloc, Frame Gen, Switch Lower, Block Layout,
Flow Opts, Encode, Lister
16
Types of IR
  • High-level IR: architecture and runtime
    independent. Object model instructions, array
    indices, full aliasing
  • Mid-level IR: architecture independent, runtime
    dependent. Lowered to calls and address
    arithmetic
  • Low-level IR: architecture and runtime dependent.
    Lowered to machine instructions
  • Encoded IR: binary format. Lowered to encoded
    data instructions
  • IRs contain Instructions and Operands of various
    types at each IR level

17
IR states during compilation
  • Phases transform IR either within a state or from
    one state to an adjacent state
  • For example, the lower phase transforms MIR into LIR.
    Optimizations usually work within a single state.

Diagram: IR states range from Abstract to Concrete; Lowering moves
toward Concrete and Raising moves toward Abstract
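
The rule on this slide (a phase stays within one IR state or moves to an adjacent one) can be sketched in plain C++. This is only an illustration and is not Phoenix code: `IrState`, `Phase`, `isValidPhase` and `runPipeline` are hypothetical names invented for the sketch, and the ordering HIR, MIR, LIR, EIR is taken from the "Types of IR" slide.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// The four IR states named in the deck, ordered from abstract to concrete.
enum class IrState { HIR = 0, MIR = 1, LIR = 2, EIR = 3 };

// Hypothetical phase descriptor: the state a phase expects and the state it leaves.
struct Phase {
    std::string name;
    IrState from;
    IrState to;
};

// A phase is well formed if it stays within one state or moves to an adjacent one
// (lowering goes toward EIR, raising goes toward HIR).
bool isValidPhase(const Phase& p) {
    int delta = static_cast<int>(p.to) - static_cast<int>(p.from);
    return std::abs(delta) <= 1;
}

// Runs the phases in order. Returns false if a phase is malformed or does not
// start from the state that the previous phase produced.
bool runPipeline(const std::vector<Phase>& phases, IrState start, IrState& end) {
    IrState current = start;
    for (const Phase& p : phases) {
        if (!isValidPhase(p) || p.from != current) return false;
        current = p.to;
    }
    end = current;
    return true;
}
```

Under these assumptions, a pipeline that lowers HIR to MIR, optimizes within MIR, lowers to LIR and then encodes to EIR is accepted, while a single phase that jumps straight from HIR to LIR is rejected.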
18
Extending a Phoenix-based compiler
  • The VC++ optimizer is just a Phoenix client
  • All Phoenix clients can host Plug-Ins
  • Plug-Ins can:
      • Add new components
      • Extend existing components
      • Reconfigure clients
  • Extensibility relies upon:
      • Reflection
      • Events and delegates

19
Component extensibility
  • Most objects in the system support observers by
    deriving from the Phoenix class Extensible Object
  • Observer classes can register delegates so that
    they are notified when the host object undergoes
    certain events. For example, if the host object
    is copied it will notify registered delegates
  • Phoenix provides a standard plug-in discovery and
    registration mechanism
  • Plug-ins can reconfigure the client, such as
    replacing the register allocator
  • Plug-ins can also use Phoenix's analyses to do
    their own analyses and transformations
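
The observer idea described above (a host object notifies registered delegates when it is copied) can be sketched in standard C++. This is a minimal stand-in, not the Phoenix API: the class below only borrows the name `ExtensibleObject`, and `onCopy`, `setNote`, `note` and `copy` are hypothetical members made up for the illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host object that plug-ins can observe and annotate.
class ExtensibleObject {
public:
    using CopyDelegate =
        std::function<void(const ExtensibleObject& source, ExtensibleObject& copy)>;

    // An observer registers a delegate to be called whenever this host is copied.
    void onCopy(CopyDelegate delegate) { copyDelegates_.push_back(std::move(delegate)); }

    // Plug-in data attached to the host, keyed by name.
    void setNote(const std::string& key, const std::string& value) { notes_[key] = value; }
    std::string note(const std::string& key) const {
        auto it = notes_.find(key);
        return it == notes_.end() ? std::string() : it->second;
    }

    // Copying does not carry notes by itself; each delegate decides what to propagate.
    ExtensibleObject copy() const {
        ExtensibleObject result;
        result.copyDelegates_ = copyDelegates_;  // the copy keeps the same observers
        for (const CopyDelegate& delegate : copyDelegates_) delegate(*this, result);
        return result;
    }

private:
    std::vector<CopyDelegate> copyDelegates_;
    std::map<std::string, std::string> notes_;
};
```

A plug-in that wants its annotation to survive copies would register a delegate that reads the note from the source and writes it to the copy. The host never needs to know what the note means.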

20
Extensibility example: birth tracking

    // Called from Instruction ctor
    void
    PlugIn::NewInstructionEventHandler
    (
       Phx::IR::Instruction ^ instruction
    )
    {
       InstructionBirthExtensionObject ^ extensionObject =
          gcnew InstructionBirthExtensionObject();
       extensionObject->BirthPhase =
          instruction->FunctionUnit->Phase;
       instruction->AddExtensionObject(extensionObject);
    }

    // Called from Instruction dtor
    void
    PlugIn::DeleteInstructionEventHandler
    (
       Phx::IR::Instruction ^ instruction
    )
    {
       // body not shown on the slide
    }

    // Attach a note to each instruction with the birth
    // phase for reference later
    public ref class InstructionBirthExtensionObject :
       public Phx::IR::InstructionExtensionObject
    {
    public:
       property Phx::Phases::Phase ^ BirthPhase;
       property System::String ^ BirthPhaseText
       {
          System::String ^ get ()
          {
             if (BirthPhase != nullptr)
                return BirthPhase->NameString;
             // remainder of the getter is not shown on the slide
          }
       }
    };

21
Plug-In VS Integration
  • Plug-Ins can be created via Visual Studio Wizards
  • The RDK is downloadable and works with the free VS
    Express Editions (though you probably want the VS
    Team System Edition for your work)

22
Example: Uninitialized local detection
  • We would like to warn the user that x is not
    initialized before use
  • To do this we need to perform dataflow analysis
  • We'll use a plug-in to add this phase to the
    existing Phoenix-based C2

    int foo()
    {
        int x;
        return x;
    }

23
May and Must examples
  • message may be used before it is defined

    void main()
    {
        char* message;
        if (/* condition */)
            message = "Hello";
        printf(message);
    }

  • message must be used before it is defined

    void main()
    {
        char* message;
        char* other;
        if (/* condition */)
            other = "Hello";
        printf(message);
    }

24
IR for detecting uninitialized locals
25
Detecting an uninitialized use
  • For each local variable v:
      • Examine all paths from the entry of the method to
        each use of v
      • If on every path v is not initialized before the
        use: v must be used before it is defined
      • If there is some path where v is not initialized
        before the use: v may be used before it is defined
  • The classic solution is to build a control flow graph
    and solve the data flow problem
  • State is unknown at the start of each block.
    Transfer states between blocks and combine them
    as you traverse the control flow graph

26
Code sketch using classic dataflow
    bool changed = true;
    while (changed)
    {
       for each (Phx::Graphs::BasicBlock ^ block in function)
       {
          // Update input state
          STATE ^ inState = inStates[block];
          bool firstPred = true;
          for each (Phx::Graphs::BasicBlock ^ predecessorBlock
                    in block->Predecessors)
          {
             STATE ^ predecessorState =
                outStates[predecessorBlock];
             inState = meet(inState, predecessorState);
          }
          inStates[id] = inState;

          // Compute output state
          STATE ^ newOutState = gcnew STATE(inState);

          // Check for convergence
          // (the rest of the loop is not shown on the slide)
       }
    }
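
The same three steps (update input state, compute output state, check for convergence) can be written as a small self-contained C++ program that does not depend on Phoenix. This is a sketch under stated assumptions, not the plug-in's code: the tiny CFG model (`Block`, `Stmt`), the `State` layout and the `analyze` function are all hypothetical, and block 0 is assumed to be the entry block.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

enum class Kind { Def, Use };
enum class Verdict { Ok = 0, May = 1, Must = 2 };

struct Stmt { Kind kind; std::string var; };
struct Block { std::vector<Stmt> stmts; std::vector<int> preds; };

// Dataflow state at a program point.
struct State {
    bool reached = false;              // has any path reached this point yet?
    std::set<std::string> definitely;  // defined on every path seen so far
    std::set<std::string> possibly;    // defined on at least one path
    bool operator==(const State& o) const {
        return reached == o.reached && definitely == o.definitely && possibly == o.possibly;
    }
};

// Combine the states arriving from two predecessors.
State meet(const State& a, const State& b) {
    if (!a.reached) return b;
    if (!b.reached) return a;
    State r;
    r.reached = true;
    for (const std::string& v : a.definitely)
        if (b.definitely.count(v)) r.definitely.insert(v);   // intersection
    r.possibly = a.possibly;
    r.possibly.insert(b.possibly.begin(), b.possibly.end()); // union
    return r;
}

// A definition makes the variable defined on every path through this point.
void apply(State& s, const Stmt& st) {
    if (st.kind == Kind::Def) { s.definitely.insert(st.var); s.possibly.insert(st.var); }
}

// Returns the worst verdict found for each variable that is used.
std::map<std::string, Verdict> analyze(const std::vector<Block>& cfg) {
    std::vector<State> in(cfg.size()), out(cfg.size());
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t i = 0; i < cfg.size(); ++i) {
            State s;                       // update input state
            s.reached = (i == 0);          // nothing is defined at the entry
            for (int p : cfg[i].preds) s = meet(s, out[p]);
            in[i] = s;
            for (const Stmt& st : cfg[i].stmts) apply(s, st);  // compute output state
            if (!(s == out[i])) { out[i] = s; changed = true; } // check for convergence
        }
    }
    std::map<std::string, Verdict> result;
    for (std::size_t i = 0; i < cfg.size(); ++i) {
        if (!in[i].reached) continue;      // skip unreachable blocks
        State s = in[i];
        for (const Stmt& st : cfg[i].stmts) {
            if (st.kind == Kind::Use) {
                Verdict v = Verdict::Ok;
                if (!s.possibly.count(st.var)) v = Verdict::Must;
                else if (!s.definitely.count(st.var)) v = Verdict::May;
                Verdict& slot = result[st.var];
                if (v > slot) slot = v;
            }
            apply(s, st);
        }
    }
    return result;
}
```

Modeling the two examples from the "May and Must" slide as three blocks each (entry, the body of the `if`, and the block containing `printf`) gives a "may" verdict when `message` is assigned inside the `if`, and a "must" verdict when only `other` is assigned.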
27
Can we make this easier?
  • Dataflow solution computes the state for the
    entire graph, even at places where v is never
    referenced
  • An alternate model is known as Static Single
    Assignment form, or SSA. It directly connects
    definitions and uses.
  • Phoenix uses SSA and builds flow graphs when
    necessary
  • We can rewrite this code letting Phoenix do most
    of the routine work

28
Code sketch using Phoenix

    for each (Phx::IR::Operand ^ destinationOperand in
              Phx::IR::OperandIterator::Destinations(firstInstruction))
    {
       if (destinationOperand->IsMemoryModificationReference)
       {
          for each (Phx::IR::Operand ^ useOperand in
                    Phx::IR::OperandIterator::Use(destinationOperand))
          {
             if (useOperand->Instruction->Opcode
                    != Phx::Common::Opcode::Phi
                 && useOperand->IsVariableOpnd)
             {
                Phx::Symbols::Symbol ^ symbolUse =
                   useOperand->AsVariableOpnd->Symbol;
                if (symbolUse != nullptr
                    && !mustList.Contains(symbolUse))
                {
                   mustList.Add(symbolUse);
                }
             }
          }
       }
    }
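
To show why def-use links make the problem easier, here is a Phoenix-independent C++ sketch of the same idea. It assumes a simplified SSA model invented for this illustration (`SsaUse`, `SsaGraph`, `classifyUses` are hypothetical names): every variable has an "undefined at entry" definition, a non-phi use linked directly to that definition goes on the must list, and, as an extension the slide does not show, a use reached only through phi instructions goes on a may list.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <set>
#include <string>
#include <vector>

// One use of an SSA definition.
struct SsaUse {
    std::string symbol;  // variable read by the using instruction
    bool isPhi;          // true if the using instruction is a phi
    int phiDef;          // definition produced by that phi, or -1 if not a phi
};

// usesOfDef[d] lists every use of definition d (the def-use chains).
struct SsaGraph {
    std::vector<std::vector<SsaUse>> usesOfDef;
};

// Follows def-use chains starting at the "undefined at entry" definition.
void classifyUses(const SsaGraph& graph, int entryDef,
                  std::set<std::string>& mustList,
                  std::set<std::string>& mayList) {
    std::vector<bool> seen(graph.usesOfDef.size(), false);
    std::queue<int> work;
    work.push(entryDef);
    seen[static_cast<std::size_t>(entryDef)] = true;
    while (!work.empty()) {
        int def = work.front();
        work.pop();
        for (const SsaUse& use : graph.usesOfDef[static_cast<std::size_t>(def)]) {
            if (use.isPhi) {
                // A phi merges the undefined value with others; keep following it.
                std::size_t next = static_cast<std::size_t>(use.phiDef);
                if (use.phiDef >= 0 && !seen[next]) {
                    seen[next] = true;
                    work.push(use.phiDef);
                }
            } else if (def == entryDef) {
                mustList.insert(use.symbol);  // only the undefined value reaches here
            } else {
                mayList.insert(use.symbol);   // undefined value reaches here via a phi
            }
        }
    }
    // A symbol already known to be a "must" is not reported again as a "may".
    for (const std::string& symbol : mustList) mayList.erase(symbol);
}
```

Only the definitions and uses connected to the entry definition are visited, so variables that are never read cost nothing, which is the point the previous slide makes about SSA.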

29
Uninitialized local plug-in
  • Plug-in is loaded at runtime by Phoenix-based C2

Diagram: UninitializedLocal.cpp is built with C++/CLI into
UninitializedLocal.dll; Test.cpp goes through C1 and Phx-C2, which
loads the plug-in, to produce Test.obj
30
Phoenix C2 with our plug-in added
  • This complete plug-in is provided as a sample in
    the Research Development Kit
  • It is only 400 lines of code to add a key
    warning to the C2 compiler
  • Other types of checking can be added just as
    easily
  • A demonstration of the warnings being emitted

31
Phoenix PE Reading
  • Phoenix can read and write PE files directly
  • You can implement your own compiler or linker
  • You can create post-link tools for analysis,
    instrumentation or optimization
  • Binaries can be read in, raised into IR, changed
    and rewritten as new, working binaries
  • Phoenix Explorer is only 800 lines of code on
    top of the Phoenix binary reading-writing library

32
Phoenix Explorer is like ILDasm for IR
33
Binary rewriting with Phoenix
  • mtrace utility injects tracing code into managed
    applications
  • You don't need the source code to do this (you do
    need the PDB)
  • mtrace shows functions being entered and exited

34
How do I get Phoenix?
  • Early access RDKs are available to selected
    universities
  • Sample projects include aspect oriented
    programming, code obfuscation, profiling
  • Contact phxap_at_microsoft.com for Academic early
    access requests
  • Early access CDK is available to selected
    industry partners
  • Contact phxcp_at_microsoft.com for commercial early
    access requests
  • Phoenix RDK/CDKs release about every 6 months
  • Phoenix will be the next MS compiler backend
  • We build the next-generation Windows every night

35
More information
  • http://research.microsoft.com/phoenix