Title: Towards Phase Change Memory as a Secure Main Memory
1 Towards Phase Change Memory as a Secure Main Memory
2 Phase Change Memories: the technology promises
- Non-volatile RAM
- More scalable than DRAM (up to 4X)
- No leakage
- Read access time in the same range as DRAM
- or at least close
- But limited write endurance
- 10 Mwrites ? 100 Mwrites ? 1Gwrites ?
3 ISCA 2009 (June)
- 3 papers on using PCM memories as main memory
- Concentrate on showing that simple mechanisms would allow a PCM main memory to accommodate conventional applications for the computer lifetime
- Did not even notice the security breach
- Overwrite attack
- can just physically destroy the memory
- can be run by any user without any privilege
- just want my machine to be replaced before the end of the 3-year guarantee
Main memory should resist overwrite attacks for YEARS
5 Start-Gap scheme, Micro 2009 (Dec)
- Still targeting normal user applications
- Physical address to PCM address translation is dynamically changed at runtime
- Randomization to avoid hot write cells associated with spatial locality
- Security as a by-product of randomization
- First study to consider a possible malicious attack
- Region-based Start-Gap scheme
6 PCM address is invisible
7 Courtesy of Moinuddin Qureshi
Start-Gap Wear Leveling
Two registers (Start & Gap) + 1 line (GapLine) to support movement. Move GapLine every G writes to memory.
[Figure: memory lines 0 to 4; lines 0-3 hold A, B, C, D, line 4 is the spare; START points at line 0]
PCMAddr = (Start + Addr); (PCMAddr >= Gap) ? PCMAddr + 1 : PCMAddr
Storage overhead: less than 8 bytes (GapLine taken from spares).
Write overhead: one extra write every G writes → 1% (G = 100).
Randomized address space to avoid hot region
and predictability
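The Start-Gap mechanism on this slide can be sketched as a toy model. This is a minimal sketch, not the Micro 2009 implementation: the class name `StartGap`, the `% n` wrap-around in `translate`, and the list standing in for PCM lines are assumptions made for illustration, and the randomization layer mentioned above is left out.

```python
class StartGap:
    """Toy Start-Gap remapper: N logical lines stored in N + 1 physical lines."""

    def __init__(self, n_lines, gap_interval):
        self.n = n_lines
        self.g = gap_interval          # move the gap every G writes
        self.start = 0                 # Start register
        self.gap = n_lines             # Gap register: the spare line starts at the end
        self.writes = 0
        self.mem = [None] * (n_lines + 1)

    def translate(self, addr):
        # Assumed wrap-around (mod N); lines at or after the gap are shifted by one.
        pcm = (self.start + addr) % self.n
        return pcm + 1 if pcm >= self.gap else pcm

    def read(self, addr):
        return self.mem[self.translate(addr)]

    def write(self, addr, value):
        self.mem[self.translate(addr)] = value
        self.writes += 1
        if self.writes % self.g == 0:
            self._move_gap()           # the one extra write every G writes

    def _move_gap(self):
        if self.gap == 0:
            # Gap reached the top: the last line wraps to slot 0, Start advances.
            self.mem[0] = self.mem[self.n]
            self.mem[self.n] = None
            self.gap = self.n
            self.start = (self.start + 1) % self.n
        else:
            # Copy the line just before the gap into the gap, then move the gap up.
            self.mem[self.gap] = self.mem[self.gap - 1]
            self.mem[self.gap - 1] = None
            self.gap -= 1
```

With `StartGap(8, 3)`, every third write moves one line, so the logical-to-physical mapping of each line drifts over time while reads keep returning the last value written.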
8 The security of RBSG
- W: the write endurance
- On a given region of S blocks, the PA-to-PCMA address translation of one block is changed every Gap writes; each change induces an extra PCM block write
- For a given physical block, the PA-to-PCMA translation is guaranteed to change every Gap × S writes
- For a given physical block, the PA-to-PCMA translation is periodic with period -
- Gap × S < W
9 RBSG (Micro 2009)
- W = 32M
- S = 256K blocks, Gap = 100
- 4 GHz, write access time 4K cycles: 1M writes/s
- Basing security on low write bandwidth (256 Mbytes/s)?
- Resists overwriting the same physical block for 4 months
- (77 days by my count!)
10 Birthday Paradox Attack (BPA)
- In a group of 24 persons it is likely (p > 1/2) that at least two persons have the same birthday.
- In a sequence of 9645 randomly selected elements in a set of 64M memory blocks, it is likely to have the same element twice.
Micro 2009 - RBSG hypothesis: 4 GB/s write bandwidth, should resist 4 years at full bandwidth
Interleaving 16 sequences of 32M writes on 16 different addresses: 4 1/2 hours of write endurance (first failure)
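The two birthday figures on this slide can be checked with a short calculation. This is a sketch using the standard birthday bound; the function names are made up for illustration, and 64M is assumed to mean 2^26 blocks.

```python
import math

def birthday_threshold_exact(n_items):
    """Smallest number of uniform draws whose collision probability exceeds 1/2."""
    p_no_collision = 1.0
    draws = 0
    while p_no_collision >= 0.5:
        p_no_collision *= (n_items - draws) / n_items
        draws += 1
    return draws

def birthday_threshold_approx(n_items):
    """Usual approximation sqrt(2 ln 2 * N), roughly 1.177 * sqrt(N)."""
    return math.sqrt(2 * math.log(2) * n_items)

days = birthday_threshold_exact(365)               # classic birthday case
blocks = birthday_threshold_approx(64 * 2**20)     # 64M blocks, assumed 2**26
```

The exact loop is cheap for both cases; the approximation shows why the threshold grows only as the square root of the number of blocks, which is what makes the attack practical.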
11 Sandbagging RBSG against BPA
- Reduce region size S, reduce Gap
- S × Gap << W
- S = 128K, Gap = 64
- Optimized BPA: 11.5 days
- RAA: 48 days
- S = 64K, Gap = 64
- Optimized BPA: 97 days
- RAA: 24 days
- BUT ...
12 Combined BPA-RAA
- 1/16th of the bandwidth for RAA, 15/16th for BPA
- S = 64K, Gap = 64
- 14.25 days
- S = 256K, Gap = 8
- 61 days, but 10% write overhead
- But no page mode?
13 RBSG page mode
- The PA-to-PCMA translation granularity is a page
- 4 KB pages: write overhead of 16 blocks
- Gap = 128 (12.5% write overhead), S = 32K pages
- 4 1/2 days
14 And spare lines?
- Main memories are implemented with spare blocks to get some permanent fault tolerance.
- Any spare line can replace any memory line
- Gap = 100, 64K spares, no page mode
- RAA-BPA: 51 days
15 Spare lines, page mode
- Gap = 128
- 1K spares: 7.75 days, S = 32K pages
- 64K spares: 16 days, S = 64K pages
- Endurance: 128M writes
- 1K spares: 65 days, S = 128K pages
- 64K spares: 110 days, S = 128K pages
16 Still want to use PCM main memory and guarantee the hardware for 3 years?
17 Or
18 S-PCM memory
- Security as a first-class citizen
- Should resist attacks for a sizeable fraction of the expected lifetime
19 Principles for a secure PCM main memory
- Invisible PA-to-PCMA translation
- Malicious user cannot figure out the PA-to-PCMA translation
- Complete randomization of the PA-to-PCMA translation changes
- Any physical block could be mapped onto any PCM block
- Defeat RAA
- Frequent changes of the PA-to-PCMA translation
- Defeat BPA
- Experimentally, the translation change frequency must be much higher than 1/W to reach 50% of the expected memory lifetime (256/W in practice)
20 Implementation principles
- Use of a PA-to-PCMA translation table
- One entry for a region of R blocks
- A physical region is mapped onto a PCM region
- A block can be mapped onto any block in the target region
- PA-to-PCMA translation change
- Only on writes
- Randomly triggered with frequency F
- No counter, only a random number generator
- Swap two PA-to-PCMA translations
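The counter-free trigger listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual S-PCM hardware: the `RegionTable` class is hypothetical, the swap partner is assumed to be a uniformly chosen region, and the movement of the region contents and the per-region displacement are left out.

```python
import random

class RegionTable:
    """Toy PA-to-PCMA region table with randomly triggered translation swaps."""

    def __init__(self, n_regions, swap_probability, seed=None):
        self.rng = random.Random(seed)
        self.f = swap_probability            # F: probability of a swap on each write
        self.table = list(range(n_regions))
        self.rng.shuffle(self.table)         # random one-to-one initial mapping
        self.swaps = 0

    def translate(self, region):
        return self.table[region]

    def on_write(self, region):
        """Handle one write; returns the PCM region that receives it."""
        target = self.table[region]
        # No write counter: a single random draw decides whether to remap.
        if self.rng.random() < self.f:
            other = self.rng.randrange(len(self.table))   # assumed: uniform partner
            self.table[region], self.table[other] = (
                self.table[other],
                self.table[region],
            )
            self.swaps += 1
        return target
```

Because the trigger is a random draw rather than a counter, an attacker cannot predict which write causes a remap, and the table remains a permutation after every swap.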
21 Some implementation constraints
- A region must be larger than a page
- 16 GB memory, 4 KB pages: 4M pages ...
- Regions should be large
- 256 KB → 64K entries
- 4 MB → 4K entries
- A PA-to-PCMA translation change induces 2 × R memory block reads and 2 × R memory block writes
- To limit write overhead, the frequency F should be limited
22 Dealing with the constraints
- W = 32M, 16 GB memory, 256-byte blocks
- 1 extra write per 8 writes
- F = 256/W → ≈50% total write endurance
- extra write bandwidth: 2 × S × F = 1/8
- S = 8K blocks → 8K 26-bit translation table entries
- 26 Kbytes, not a huge table!
- ≈52% total write endurance
- 4 GB/s: 2 years of endurance against BPA or RAA
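The arithmetic behind these figures can be reproduced with a short sketch. The function `spcm_costs` is hypothetical; it assumes binary units (32M = 32 × 2^20, 16 GB = 16 × 2^30) and assumes that each table entry holds a region number plus a displacement inside the region, which is one reading of the 26-bit entry size.

```python
def spcm_costs(mem_bytes, block_bytes, region_blocks, endurance_w, freq_factor=256):
    """Translation table size and extra write bandwidth for one configuration."""
    n_blocks = mem_bytes // block_bytes
    n_regions = n_blocks // region_blocks
    region_bits = (n_regions - 1).bit_length()     # bits to name a PCM region
    disp_bits = (region_blocks - 1).bit_length()   # assumed: displacement in region
    entry_bits = region_bits + disp_bits
    f = freq_factor / endurance_w                  # F = 256 / W
    return {
        "n_regions": n_regions,
        "entry_bits": entry_bits,
        "table_bytes": n_regions * entry_bits / 8,
        "write_overhead": 2 * region_blocks * f,   # 2 x S x F
    }

# Configuration from the slide: 16 GB, 256-byte blocks, S = 8K blocks, W = 32M.
costs = spcm_costs(16 * 2**30, 256, 8 * 1024, 32 * 2**20)
```

Under the same assumptions, changing `region_blocks` to 4K or 64K changes only the table size and the write overhead, since the entry width stays at 26 bits for a 16 GB memory with 256-byte blocks.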
23 Initializing the translation table
- The translation table has to set up a one-to-one mapping
- Boot-time initialization? With a random mapping?
24 Physical memory address space
Initialized with zeros at boot-time
Initialized at boot-time
25 Swapping two translation blocks
- T(A).addr = oldT(B).addr?B?A
- T(A).disp = oldT(A).addr?RAND
- T(B).disp = oldT(B).addr?RAND
- Randomizing the displacement is needed to avoid
attacks on a fixed position in the region
26 Managing region swaps
- Large regions have to be swapped on PA-to-PCMA translation changes
- Normal reads and writes should not be stopped
- Randomly triggered PA-to-PCMA translation changes
- The memory controller must interleave normal access flows with region swapping
- In practice, a random priority biased toward the normal access flow limits the buffer of regions to be swapped.
27 Endurance of the secure PCM memory
- 16 GB memory, 256 B blocks, 4K-block regions
- 52 Kbytes translation table

Expected lifetime under attack, by write overhead (rows) and endurance (columns):

Write overhead    32M    64M    128M    256M
3.125%            42     53     66      74
12.5%             62     69     74      79
28 Endurance of the secure PCM memory
- 16 GB memory, 256 B blocks, 64K-block regions
- 3.25 Kbytes translation table

Expected lifetime under attack, by write overhead (rows) and endurance (columns):

Write overhead    32M               64M    128M    256M
3.125%            3 min             0.4    7.4     19
12.5%             7.4 (3 months)    19     38      51 (2 years)
29 And normal applications?
- Region swap after 1/F writes (on average)
- In a swap interval
- Malicious attacks
- One block: 1/F writes; the other blocks: no writes
- Normal applications
- A total of 1/F writes on different blocks in the same region
- For a single PCM block, the swap frequency is much higher than F
- Endurance is very close to theoretical
30
- S-PCM
- Years of endurance
- Address translation
- Table read + XOR
- Hardware logic for region swapping
- RBSG
- Days of endurance
- Address translation
- 1st logic + table read + 2nd logic
- Simple logic for page moving
31 Conclusion
- If PCM technology delivers, then a secure PCM main memory will be possible
- Wear leveling comes for free with security
- Main overhead costs
- Hardware logic to interleave region swapping with normal access flow
- Random number generator
- Will fix write overhead to less than 1% for normal workloads (just adapt ideas from Moinuddin)
- No need for monstrous cell endurance
32 Disclaimer
- There might be other forms of attacks
- Probably not on the scheme by itself
- randomization is quite a good defense
- Side-channel attacks against specific hardware implementations
- E.g. concentrate the attack on a single bank
33 An attack against Moinuddin's new scheme
34 repeat A (x N), Random (x M)
With Moinuddin's parameters: N = 84, M = 1792, Gap = min(128, d), LRU stack of 4 entries
Same block written 22M times before the PA-to-PCMA translation change
BPA: 7 days, and that is it!
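The write pattern named on this slide, N writes to one block A followed by M random writes, can be written as a small address generator. This is only a sketch of the sequence itself: the function name is made up, and "Random" is assumed to mean uniformly drawn block addresses, which the slide does not specify.

```python
import itertools
import random

def attack_stream(target, n_repeat, m_random, n_blocks, seed=None):
    """Yield block addresses: target x n_repeat, then m_random random blocks, forever."""
    rng = random.Random(seed)
    while True:
        for _ in range(n_repeat):
            yield target
        for _ in range(m_random):
            yield rng.randrange(n_blocks)    # assumed: uniform over all blocks

# One full round with the slide's parameters N = 84, M = 1792.
stream = attack_stream(target=7, n_repeat=84, m_random=1792, n_blocks=1 << 20, seed=0)
one_round = list(itertools.islice(stream, 84 + 1792))
```

In each round only 84 of the 1876 writes hit block A, which is the property the slide relies on.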
35 But that might be corrected
- decrease the gap factor
- Gap = min(128, d/32): 3.5M consecutive writes
- decrease the region size
- Gap = min(128, d), 512K regions: 2.75M consecutive writes
36 Concern
- Each new attack generates new countermeasure
- Extra hardware complexity
- New opportunity for new attacks
- Possibility of snowball effects
37 New attack opportunities
- decrease the gap factor
- Gap = min(128, d/32): 3.5M consecutive writes
- Combined with a RAA: 4 months
- decrease the region size
- Gap = min(128, d), 512K-block regions: 2.75M consecutive writes
- RAA is improved by an 8x factor