On Applying ELAN Strategies in Simulating Processors over Simple Architectures
R. M. Neto (1), M. Ayala-Rincón (1), R. P. Jacobi (1), C. Llanos (1), R. Hartenstein (2)
(1) Instituto de Ciências Exatas, Universidade de Brasília, Brasília D. F., Brasil
(2) Fachbereich Informatik, Universität Kaiserslautern, Kaiserslautern, Germany
WRS02, Copenhagen, 21-07-2002
2. Overview
- Applying rewriting techniques in hardware design
  - Arvind et al.: specification of correct processors, formulation of simple logical digital circuits, cache protocols over memory systems
- Correct specification of new features of processors
  - Reorder buffers (ROB)
  - Speculative execution
3. Overview
- In Arvind's approach, rewriting is applied neither for simulation nor for verification. Proposal: translate to Verilog!
- Simulation and performance estimation using a rewriting-logic environment
- Logic plus rewriting allows for
  - Discrimination of architectural components
  - Execution of assembly program descriptions
4. Overview
- Once correctness is proved, simulations allow for a statistical analysis of different implementations.
5. Rewriting
- Rewrite rules
  - l => r if C
- Operational semantics
  - A rule is applied to a term when its left-hand side matches a sub-term; the matched sub-term is replaced with the corresponding right-hand side of the rule, provided that the premise C of the rule holds.
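6. Rewriting
Rule: l => r if C
One-step rewriting relation $\to$: $s \to t$ whenever there are a position $p$ in $s$ and a substitution $\sigma$ such that $s|_p = \sigma(l)$, $\sigma(C)$ holds, and $t = s[\sigma(r)]_p$.
Important rewriting properties: termination and confluence.
Notation:
- $\to^*$: zero or more steps of the rewriting relation
- $s\downarrow$: normal or canonical form of $s$
- $\to^!$: rewriting to a normal form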
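The rewriting relation and the rewriting to normal form described above can be illustrated with a small Python sketch. It is not ELAN code and is not taken from the talk: the term encoding (nested tuples), the two Peano addition rules and the leftmost-outermost traversal are all assumptions chosen only to make the idea concrete.

```python
# Terms are nested tuples: ("z",), ("s", t), ("add", t1, t2).
# Each rule returns the rewritten term, or None when it does not match.

def add_zero(t):
    # add(z, y) => y
    if t[0] == "add" and t[1] == ("z",):
        return t[2]
    return None

def add_succ(t):
    # add(s(x), y) => s(add(x, y))
    if t[0] == "add" and t[1][0] == "s":
        return ("s", ("add", t[1][1], t[2]))
    return None

RULES = [add_zero, add_succ]

def rewrite_once(t):
    """One rewriting step: try the rules at the root, then inside sub-terms."""
    for rule in RULES:
        result = rule(t)
        if result is not None:
            return result
    for i, sub in enumerate(t[1:], start=1):
        result = rewrite_once(sub)
        if result is not None:
            return t[:i] + (result,) + t[i + 1:]
    return None  # no rule applies anywhere: t is in normal form

def normalize(t):
    """Rewrite until no rule applies (terminates for these two rules)."""
    while True:
        nxt = rewrite_once(t)
        if nxt is None:
            return t
        t = nxt

ZERO = ("z",)
ONE = ("s", ZERO)
TWO = ("s", ONE)
THREE = ("s", TWO)
```

With these definitions, `normalize(("add", TWO, ONE))` rewrites the term step by step until it reaches the normal form `THREE`.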
7. Rewriting
- In the context of processor specification:
  - terms represent states, and
  - rewrite rules represent transformations between states, according to the instruction set of the processor.
- Beginning from an initial state and applying these rules, we simulate the behavior of processors.
8. Specifying Processors
- AX architecture
  - Instruction set:
    - r := Loadc(v)      r := Loadpc
    - r := Op(r1,r2)     Jz(r1,r2)
    - r := Load(r1)      Store(r1,r2)
- Basic processor
  - Single cycle, non-pipelined, in-order execution
  - SYS = Sys(MEM, PROC)
  - PROC = Proc(ia, rf, prog)
9. Specifying Processors: Basic Processor
[Figure: block diagram of the basic processor, SYS(mem, Proc) with PROC(ia, rf, prog). Labeled components: PC, +1, Int Mem, Register File, ALU, Data Mem.]
10. Specifying Processors
- Jz(r1,r2)
- Defining instructions as rules
- Jz:
    Sys(m,Proc(ia,rf,prog)) => Sys(m,Proc(nia,rf,prog))
      where instIa := () selectinst(prog,ia)
      if isinstJz(instIa)
      where r1 := () reg1ofJz(instIa)
      where r2 := () reg2ofJz(instIa)
      choose
        try where nia := () ia+1
            if valueofReg(r1,rf) != 0
        try where nia := () valueofReg(r2,rf)
            if valueofReg(r1,rf) == 0
      end
    end
11. Specifying Processors
- Set of rewrite rules RB
- Jz:
    Sys(m,Proc(ia,rf,prog)) => Sys(m,Proc(nia,rf,prog))
      where instIa := () selectinst(prog,ia)
      if isinstJz(instIa)
      where r1 := () reg1ofJz(instIa)
      where r2 := () reg2ofJz(instIa)
      choose
        try where nia := () ia+1
            if valueofReg(r1,rf) != 0
        try where nia := () valueofReg(r2,rf)
            if valueofReg(r1,rf) == 0
      end
    end
- Load:
    Sys(m,Proc(ia,rf,prog)) => Sys(m,Proc(ia+1,insertRF(rf,r0,v0),prog))
      where inst := () selectinst(prog,ia)
      if isinstLoad(inst)
      where r0 := () nameofLoad(inst)
      where v0 := () getMem(inst,rf,m)
    end
- Loadc:
    Sys(m,Proc(ia,rf,prog)) => Sys(m,Proc(ia+1,insertRF(rf,r,v),prog))
      where instIa := () selectinst(prog,ia)
      if isinstLoadc(instIa)
      where r := () nameofLoadc(instIa)
      where v := () valueofLoadc(instIa)
    end
- Loadpc:
    Sys(m,Proc(ia,rf,prog)) => Sys(m,Proc(ia+1,insertRF(rf,r,ia),prog))
      where instIa := () selectinst(prog,ia)
      if isinstLoadpc(instIa)
      where r := () nameofLoadpc(instIa)
    end
- Op:
    Sys(m,Proc(ia,rf,prog)) => Sys(m,Proc(ia+1,insertRF(rf,r,v),prog))
      where instIa := () selectinst(prog,ia)
      if isinstOp(instIa)
      where r1 := () reg1ofOp(instIa)
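To make the rule set RB more tangible, here is a minimal Python sketch of the basic processor as a step function over states (mem, ia, rf, prog). It is an illustration under stated assumptions, not the authors' ELAN code: the tuple encoding of instructions is invented for the sketch, Op is assumed to be addition, and Load and Store are assumed to address memory through the value of register r1, which the slides do not spell out.

```python
def step(state):
    """Apply the single matching rule of RB, or return None (normal form)."""
    mem, ia, rf, prog = state
    inst = prog.get(ia)
    if inst is None:
        return None                      # no instruction at ia: no rule applies
    kind = inst[0]
    if kind == "loadc":                  # r := Loadc(v)
        _, r, v = inst
        return (mem, ia + 1, {**rf, r: v}, prog)
    if kind == "loadpc":                 # r := Loadpc
        _, r = inst
        return (mem, ia + 1, {**rf, r: ia}, prog)
    if kind == "op":                     # r := Op(r1,r2); Op assumed to be +
        _, r, r1, r2 = inst
        return (mem, ia + 1, {**rf, r: rf[r1] + rf[r2]}, prog)
    if kind == "jz":                     # Jz(r1,r2): jump to rf[r2] if rf[r1] == 0
        _, r1, r2 = inst
        nia = rf[r2] if rf[r1] == 0 else ia + 1
        return (mem, nia, rf, prog)
    if kind == "load":                   # r := Load(r1); address assumed rf[r1]
        _, r, r1 = inst
        return (mem, ia + 1, {**rf, r: mem[rf[r1]]}, prog)
    if kind == "store":                  # Store(r1,r2); mem[rf[r1]] := rf[r2] assumed
        _, r1, r2 = inst
        return ({**mem, rf[r1]: rf[r2]}, ia + 1, rf, prog)
    raise ValueError(f"unknown instruction: {inst!r}")

def run(state, max_steps=10_000):
    """Rewrite from an initial state until no rule applies."""
    for _ in range(max_steps):
        nxt = step(state)
        if nxt is None:
            return state
        state = nxt
    raise RuntimeError("no normal form reached within max_steps")

# Hypothetical program: add two constants, store the sum, load it back.
PROG = {
    1: ("loadc", "r1", 5),
    2: ("loadc", "r2", 7),
    3: ("op", "r3", "r1", "r2"),
    4: ("loadc", "r4", 100),
    5: ("store", "r4", "r3"),
    6: ("load", "r5", "r4"),
}
final_mem, final_ia, final_rf, _ = run(({}, 1, {}, PROG))
```

Each branch of `step` plays the role of one rewrite rule: the test on `kind` corresponds to the `if isinst...` condition, and the returned tuple corresponds to the right-hand side of the rule.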
12. Specifying Processors: Speculative Processor
- To avoid wasted cycles and empty pipeline stages:
  - Reorder Buffer (ROB)
    - Holds partially executed instructions
    - Renaming: tags/register correspondence
  - Speculative execution
    - Branch Target Buffer (BTB)
13. Specifying Processors: Speculative Processor
[Figure: block diagram of the speculative processor, SYS(mem, Proc) with PROC(ia, rf, itb, btb, prog). Labeled components: PC, Int Mem, BTB, Fetch/Decode/Rename, Reorder Buffer (ROB), Execute, ALUs, branch, Commit, Register File, Kill, Kill/Update BTB, pmb, mpb, Data Mem.]
14. Specifying Processors: Speculative Processor
- Basic processor
  - Single cycle, non-pipelined, in-order execution
  - SYS = Sys(MEM, PROC)
  - PROC = Proc(ia, rf, prog)
- Speculative processor
  - Pipelined, out-of-order execution
  - SYS = Sys(MEM, PROC)
  - PROC = Proc(ia, rf, itb, btb, prog)
15. Specifying Processors: Speculative Processor
- Set of rewrite rules RS
- Arithmetic operation and value propagation rules
  - PsOp
  - PsValueForward
  - PsValueCommit
- Branch completion rules
  - PsJumpCorrectSpec
  - PsJumpWrongSpec
  - PsNoJumpCorrectSpec
  - PsNoJumpWrongSpec
- Instruction issue rules
  - PsLoadcIssue
  - PsLoadpcIssue
  - PsOpIssue
  - PsJzIssue
  - PsLoadIssue
  - PsStoreIssue
- Memory access rules
  - PsLoad
  - PsStore
16. Specifying Processors: Speculative Processor
- PsOp:
    Sys(m,Proc(ia,rf,ITB(ia1,k,t(k)<-Op(v,v1),wf,sf).itbs2,btb,prog)) =>
    Sys(m,Proc(ia,rf,ITB(ia1,k,t(k)<-execOponval(v,v1),wf,sf).itbs2,btb,prog))
    end
- PsJzIssue:
    Sys(m,Proc(ia,rf,itbs,btb,prog)) =>
    Sys(m,Proc(pia,rf,insEndITBs(ITB(ia,k,Jz(k0,k1),NoWreg,Spec(pia)),itbs),btb,prog))
      where instIa := () selectinst(prog,ia)
      if isinstJz(instIa)
      where r1 := () reg1ofJz(instIa)
      where r2 := () reg2ofJz(instIa)
      where k := () lengthof(itbs)+1
      where k0 := () searchforLastTag(r1,rf,itbs)
      where k1 := () searchforLastTag(r2,rf,itbs)
      where pia := () getbtb(ia,btb)
    end
17. Specifying Processors: Speculative Processor
- PsJumpCorrectSpec:
    Sys(m,Proc(ia,rf,ITB(ia1,k,Jz(0,nia),wf,Spec(pia)).itbs,btb,prog)) =>
    Sys(m,Proc(ia,rf,itbs,btb,prog))
      if pia == nia
    end
18. Specifying Processors: Reorder Buffer
[Figure: instruction issue and execution in the reorder buffer.
 Program: ... ro := Op(r1,r2); r3 := Load(r4); r5 := Op(r3,r1) ...
 Issue rules, e.g. Ps_Op_Issue: Proc(ia,rf,itbs,btb,...(ia, r5 := Op(r3,r1))...) => Proc(ia+1,rf,itbs.ITB(ia, t3 := Op(t2,t1)),btb,prog)
 Buffer contents: t0 = v; t1 := Op(v,v); t2 := Load(t); t3 := Op(t2,t1)
 Execution in the buffer, e.g. Ps_Value_Commit: Proc(ia,rf,ITB(t = v).itbs,...) => Proc(ia,rf,itbs,...)
 Committed values go to the memory and the register file.]
19. Specifying Processors: Reorder Buffer
[Figure: wrong speculation in the reorder buffer.
 Program: ... r3 := Load(r4); r5 := Op(r3,r1) ...
 Ps_Jump_WrongSpec: Proc(ia,rf,itbs1.ITB(ia,Jz(0,nia),Wreg,Spec(pia)).itbs2,btb,prog) => Proc(nia,rf,itbs1,btb,prog)
 Buffer contents: t0 = v; t2 := Load(t); t3 := Op(t2,t1); Jz(0,nia), Spec(pia)
 Committed values go to the memory and the register file.]
20. Correctness of the specifications
The speculative processor simulates the basic one: in fact, a basic processor term can be upgraded to a speculative processor term simply by adding an empty ITB and an arbitrary BTB to the processor.
If, at some point during execution on a speculative processor, no further instruction is issued, then the ITB will eventually become empty, since only instruction issue rules can expand the ITB. Thus, we can define another rewriting system, RITBF, which consists of all rules in RS except the instruction issue rules.
21. Correctness of the specifications
- RB simulates RS:
  - Terms of the speculative processor: $s \to^* t$ (with RS)
  - $s$ and $t$ are each rewritten to normal form ($\to^!$) with RITBF
  - Terms of the basic processor: $ITBF(s) \to^* ITBF(t)$ (with RB)
- Notation: ITBF(s) is the result of eliminating the empty ITB and the BTB
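The two mappings between processor terms used in the correctness argument can be sketched in Python as follows. This is only an illustration: the record types `Basic` and `Spec` and the function names `upgrade` and `itbf` are hypothetical, chosen to mirror Proc(ia, rf, prog) and Proc(ia, rf, itb, btb, prog).

```python
from typing import NamedTuple

class Basic(NamedTuple):
    """Basic processor term Proc(ia, rf, prog)."""
    ia: int
    rf: dict
    prog: dict

class Spec(NamedTuple):
    """Speculative processor term Proc(ia, rf, itb, btb, prog)."""
    ia: int
    rf: dict
    itb: tuple
    btb: dict
    prog: dict

def upgrade(b, btb):
    """Turn a basic term into a speculative one: empty ITB, arbitrary BTB."""
    return Spec(b.ia, b.rf, (), btb, b.prog)

def itbf(s):
    """Drop the (empty) ITB and the BTB of a speculative term."""
    if s.itb:
        raise ValueError("ITB is not empty; normalize with RITBF first")
    return Basic(s.ia, s.rf, s.prog)
```

Under these assumptions, `itbf(upgrade(b, btb))` gives back `b` for any BTB, which reflects the remark that the choice of BTB is arbitrary.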
22. Implementation in ELAN
- Philosophy of rewriting logic:
  - combination of the possibilities of rewriting and of logic: strategies for controlling the application of rewrite rules.
  - Also, rewriting logic plus meta-logic.
- Well-known programming environments are available, such as:
  - Maude (J. Meseguer, SRI Int. CSL, Menlo Park, CA)
  - ELAN (C. Kirchner, LORIA/INRIA, Nancy, France)
  - Cafe-OBJ (JAIST, Japan)
23. Implementation in ELAN
[Figure: structure of the ELAN environment.
 Query: initial state (assembly code with current memory state).
 Computational system: rewrite engine.
 Rewrite-based specification (programmer): transformations (instructions, predictions).
 Logic and strategies (super user): control of buffers.
 Result: final state (processor state after execution).]
24. Implementation in ELAN
- Rewrite rules
  - used for specifying the instruction set
  - used for specifying the method of branch prediction
25. Implementation in ELAN
[Figure: state diagram of the 1-bit prediction method, with two states (Taken, Not taken) and transitions labeled Branch Taken and Branch Not Taken.]
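The 1-bit prediction method can be sketched in a few lines of Python. This follows the usual reading of a 1-bit predictor (predict whatever the branch did last time); the function name and the choice of "not taken" as initial state are assumptions of the sketch, not details given in the talk.

```python
def count_mispredictions(outcomes, predict_taken=False):
    """Run a 1-bit predictor over a sequence of branch outcomes.

    outcomes: iterable of booleans, True meaning the branch was taken.
    predict_taken: initial state of the predictor (assumed "not taken").
    Returns the number of wrong predictions.
    """
    misses = 0
    for taken in outcomes:
        if predict_taken != taken:
            misses += 1
        # 1-bit state: remember only the most recent outcome.
        predict_taken = taken
    return misses
```

For a loop-closing branch that is taken several times and then falls through, this scheme mispredicts on the exit and again on the first iteration of the next run of the loop.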
26. Implementation in ELAN
- Branch prediction
  - BTB: (1,2).(2,3). ... .(n,m). ...
  - n-th instruction: Jz(r1,r2)
- PsJumpWrongSpec:
    Sys(m,Proc(ia,rf,ITB(ia1,k,Jz(0,nia),wf,Spec(pia)).itbs,btb,prog)) =>
    Sys(m,Proc(nia,rf,nilitb,btb1,prog))
      if pia != nia
      where btb1 := () changebtb(ia1,nia,btb)
    end
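The effect of the two jump-completion rules PsJumpCorrectSpec and PsJumpWrongSpec can be sketched in Python as below. The sketch assumes the BTB is a mapping from instruction address to predicted next address and that the resolved Jz entry sits at the head of the ITB, as in the rules shown; the class `JzEntry` and the function `resolve_head_jump` are hypothetical names, and the PsNoJump rules are not covered because the slides do not show them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class JzEntry:
    addr: int       # ia1: address of the Jz instruction
    cond: int       # resolved value of the condition register
    target: int     # nia: resolved jump target
    predicted: int  # pia: address predicted at issue time

def resolve_head_jump(pc, itb, btb):
    """Resolve a taken Jz at the head of the ITB.

    Returns (new_pc, new_itb, new_btb, rule_name).
    """
    head, rest = itb[0], itb[1:]
    if head.cond != 0:
        raise ValueError("only Jz(0, nia) entries are handled in this sketch")
    if head.predicted == head.target:
        # PsJumpCorrectSpec: drop the entry, keep everything else.
        return pc, rest, btb, "PsJumpCorrectSpec"
    # PsJumpWrongSpec: restart at nia, empty the ITB, update the BTB.
    new_btb = {**btb, head.addr: head.target}
    return head.target, [], new_btb, "PsJumpWrongSpec"
```

A usage example: with `btb = {3: 4}` and a head entry `JzEntry(addr=3, cond=0, target=9, predicted=4)`, the function returns program counter 9, an empty ITB and a BTB in which address 3 now maps to 9.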
27. Implementation in ELAN
- Rewriting logic/strategies
  - Control how rules are applied.
  - Aspects such as the size of the ROBs and the way of working with them may be determined by rewrite strategies.
28. Implementation in ELAN
- Size control of the ROB by strategies:
  - select one( issue rules )
  - select one( issue rules or id ), repeated n-1 times
  - normalize( select one( non-issue rules ) ), i.e. normalization with RITBF
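The idea of bounding the ROB size with a strategy can be sketched with a few Python combinators. These are not ELAN's strategy operators: `select_one`, `identity`, `normalize` and `bounded_cycle` are hypothetical helpers that mimic the strategy above, and the toy `issue` and `commit` rules stand in for the real issue and non-issue rules.

```python
def select_one(rules):
    """Apply the first rule that succeeds; fail (None) if none does."""
    def strategy(state):
        for rule in rules:
            result = rule(state)
            if result is not None:
                return result
        return None
    return strategy

def identity(state):
    """The 'id' strategy: always succeeds and changes nothing."""
    return state

def normalize(strategy):
    """Apply a strategy repeatedly until it fails."""
    def run(state):
        while True:
            result = strategy(state)
            if result is None:
                return state
            state = result
    return run

def bounded_cycle(n, issue_rules, non_issue_rules):
    """Issue at least 1 and at most n instructions, then drain the buffer."""
    issue_one = select_one(issue_rules)
    issue_or_id = select_one(issue_rules + [identity])
    drain = normalize(select_one(non_issue_rules))
    def run(state):
        state = issue_one(state)
        if state is None:
            return None            # nothing left to issue
        for _ in range(n - 1):
            state = issue_or_id(state)
        return drain(state)
    return run

# Toy rules over states (pending, itb, done), each a tuple of instructions.
def issue(state):
    pending, itb, done = state
    if not pending:
        return None
    return (pending[1:], itb + pending[:1], done)

def commit(state):
    pending, itb, done = state
    if not itb:
        return None
    return (pending, itb[1:], done + itb[:1])

def simulate(n, program):
    """Run bounded cycles until no instruction can be issued."""
    cycle = bounded_cycle(n, [issue], [commit])
    state, cycles = (tuple(program), (), ()), 0
    while True:
        nxt = cycle(state)
        if nxt is None:
            return state, cycles
        state, cycles = nxt, cycles + 1
```

Because at most n issue steps happen before the buffer is drained, the toy ITB never holds more than n entries; changing n changes the simulated buffer size without touching the rules themselves.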
29. Implementation in ELAN
[Figure: the issue rules fill ITB entries 1 to 4; the buffer is then normalized with the Arithmetic Op, Memory Access, Value Forward and Value Commit rules.]
30. Estimating Processors Performance
- ELAN statistics: number of applied rules
- Analyzing the eventual performance of processor implementations.
- Example:
  - Number of correct and wrong predictions when running the same processor with different methods of speculation,
  - obtained by counting the number of applications of the rules PsJumpCorrectSpec, PsNoJumpCorrectSpec, PsJumpWrongSpec and PsNoJumpWrongSpec.
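Counting rule applications to estimate prediction quality can be sketched in Python as follows. The input is assumed to be a list of rule names in the order they were applied; how such a trace is obtained from ELAN's statistics is not shown here, and the function name `prediction_stats` is hypothetical.

```python
from collections import Counter

CORRECT_RULES = {"PsJumpCorrectSpec", "PsNoJumpCorrectSpec"}
WRONG_RULES = {"PsJumpWrongSpec", "PsNoJumpWrongSpec"}

def prediction_stats(trace):
    """Summarize branch prediction results from a trace of rule names."""
    counts = Counter(trace)
    correct = sum(counts[name] for name in CORRECT_RULES)
    wrong = sum(counts[name] for name in WRONG_RULES)
    total = correct + wrong
    return {
        "correct": correct,
        "wrong": wrong,
        # Accuracy is undefined when no branch was resolved.
        "accuracy": correct / total if total else None,
    }
```

Running the same program under two prediction methods and comparing the two summaries gives the kind of comparison described above.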
31. Estimating Processors Performance: Quicksort
    Inst(1,4<-Loadc(1)).
    Inst(2,5<-Op(4,0)).
    Inst(3,Store(5,97)).
    Inst(4,6<-Op(4,1)).
    Inst(5,99<-Op(92,100)).
    Inst(6,Store(6,99)).
    Inst(7,2<-Load(5)).
    Inst(8,3<-Load(6)).
    Inst(9,4<-Op(4,98)).
    Inst(10,5<-Op(0,4)).
    Inst(11,6<-Op(1,4)).
    Inst(12,99<-OpE(2,2,3)).
    Inst(13,101<-Loadc(61)).
    Inst(14,Jz(99,101)).
    Inst(15,96<-Op(2,100)).
    Inst(16,95<-Op(3,100)).
    Inst(17,95<-Op(95,97)).
    Inst(18,94<-Load(96)).
    Inst(19,95<-Op(95,98)).
    Inst(20,99<-Load(95)).
    Inst(21,99<-OpE(0,99,94)).
    Inst(22,101<-Loadc(19)).
    Inst(23,Jz(99,101)).
    Inst(24,96<-Op(96,97)).
    Inst(25,99<-Load(96)).
    Inst(26,99<-OpE(1,99,94)).
    Inst(27,101<-Loadc(50)).
    Inst(28,Jz(99,101)).
    Inst(29,99<-OpE(1,96,95)).
    Inst(30,101<-Loadc(55)).
    Inst(31,Jz(99,101)).
    Inst(32,90<-Load(2)).
    Inst(33,91<-Load(95)).
    Inst(34,Store(2,91)).
    Inst(35,Store(95,90)).
    Inst(36,4<-Op(4,97)).
    Inst(37,5<-Op(4,0)).
    Inst(38,6<-Op(4,1)).
    Inst(39,99<-Op(95,97)).
    Inst(40,Store(5,99)).
    Inst(41,Store(6,3)).
    Inst(42,4<-Op(4,97)).
    Inst(43,5<-Op(4,0)).
    Inst(44,6<-Op(4,1)).
    Inst(45,Store(5,2)).
    Inst(46,99<-Op(95,98)).
    Inst(47,Store(6,99)).
    Inst(48,101<-Loadc(7)).
    Inst(49,Jz(100,101)).
    Inst(50,99<-OpE(1,96,3)).
    Inst(51,101<-Loadc(24)).
    Inst(52,Jz(99,101)).
    Inst(53,101<-Loadc(29)).
    Inst(54,Jz(100,101)).
    Inst(55,90<-Load(96)).
    Inst(56,91<-Load(95)).
    Inst(57,Store(96,91)).
    Inst(58,Store(95,90)).
    Inst(59,101<-Loadc(19)).
    Inst(60,Jz(100,101)).
    Inst(61,99<-OpE(0,4,100)).
    Inst(62,101<-Loadc(7)).
    Inst(63,Jz(99,101)).
    Inst(64,1000<-Loadc(999)).
    nilp
    END
32. Estimating Processors Performance
- One-Bit vs. Two-Bit Speculative Prediction
33. Conclusions and Future Work
- Rewriting logic is useful for correctly specifying processors, and also for testing their performance.
- Going down more levels:
  - Execution stages (fetch-decode-execute), by atomizing the currently implemented rules, thereby describing processor behavior more accurately.
- Going down more levels:
  - Logic circuit layout design.
34. Conclusions and Future Work
- Important natural aspects of rewriting theory, which are nowadays hard to resolve and implement in programming and proof-assistant environments, may be useful for accurately testing newly proposed technologies,
  - such as real non-deterministic out-of-order execution.
35. Schedule for Completion
36. Further Reading
- M. Ayala-Rincón, R. Hartenstein, R. P. Jacobi and C. Llanos, Designing Arithmetic Digital Circuits via Rewriting-Logic, http://www.mat.unb.br/ayala/publications.html
- M. Ayala-Rincón, R. Maya Neto, R. P. Jacobi, C. Llanos and R. Hartenstein, Architectural Specification and Simulation Through Rewriting-Logic, http://www.mat.unb.br/ayala/publications.html
- Prototypes: http://www.mat.unb.br/ayala/Tcgroup
- Talk: http://www.mat.unb.br/ayala/publications.html