Order Preserving Encryption for Numeric Data Rakesh Agrawal Jerry Kiernan Ramakrishnan Srikant Yirong Xu IBM Almaden Research Center

1
Order Preserving Encryption for Numeric Data
Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, Yirong Xu
IBM Almaden Research Center
2
Outline

Motivation and Introduction
OPES encryption
Modeling the distribution
Experimental evaluation

3
Motivation

Encryption is rapidly becoming a requirement in a
myriad of business settings (e.g., health care,
financial, retail, government), driven by
legislation (e.g., SB1386, HIPAA)
Encrypting databases unleashes a host of
problems
Performance slowdown
Incompatibility with standard database features
E.g. comparison predicates and the use of indexes
Changes to applications for encryption
Encryption functions now appear in queries

4
Order Preserving Encryption Function
E is an order preserving encryption function,
p1 and p2 are two plaintext values, and
c1 = E(p1), c2 = E(p2):
if (p1 < p2) then (c1 < c2)
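
The definition above can be checked mechanically. The following is a minimal sketch; `is_order_preserving`, `toy_encrypt`, and `not_ope` are hypothetical names introduced here for illustration, and neither toy function is the OPES scheme.

```python
from itertools import combinations

def is_order_preserving(encrypt, plaintexts):
    """Return True if p1 < p2 implies encrypt(p1) < encrypt(p2) on the given values."""
    values = sorted(set(plaintexts))
    # combinations() over a sorted list yields pairs (p1, p2) with p1 < p2.
    return all(encrypt(p1) < encrypt(p2) for p1, p2 in combinations(values, 2))

# Hypothetical toy functions, for illustration only (neither is OPES).
def toy_encrypt(p):
    return 3 * p + 7          # strictly increasing, so order is preserved

def not_ope(p):
    return (p * 37) % 101     # modular scrambling breaks the order
```

Calling `is_order_preserving(toy_encrypt, [5, 1, 9, 3])` holds, while the modular function fails the check on `[1, 2, 3, 4]`.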
5
Threat Model

The storage system used by the DBMS is untrusted,
i.e. vulnerable to compromise
The DBMS software is trusted
Ciphertext only attack
The adversary has access to all (but only)
encrypted values
Guard against percentile exposure
An adversary should not be able to get even an
estimate of true values

6
Design Goals

Query results from OPES will be sound and
complete
Comparison operations will be performed without
decrypting the operands
Standard database indexes can be used over
encrypted data
Tolerate updates

7
Integration of Encryption and Query Processing
Users have a plaintext view of an encrypted
database
We hereafter strictly focus on the OPES algorithms
Comparison operators are directly applied over
encrypted columns
Queries
Plaintext queries are translated into equivalent
queries over encrypted data
Select name from Emp where sal > 100000
Translation layer
Select decrypt (xsxx) from cwlxss where
xescs > OPESencrypt(100000)
DBMS
Tables are encrypted using standard as well as
order preserving encryption
Encrypted data and metadata
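
To illustrate how a comparison predicate can run directly over an encrypted column, here is a small sketch. It assumes a hypothetical strictly increasing function `toy_ope_encrypt` in place of the real OPES encryption, and a sorted Python list in place of a database index; the salary values are made up.

```python
import bisect

def toy_ope_encrypt(p):
    # Hypothetical stand-in for the order preserving encryption function:
    # any strictly increasing function preserves comparisons.
    return 3 * p + 7

salaries = [28000, 35000, 120000, 150000, 99000]   # made-up plaintext column

# The "index" holds ciphertexts only, kept in sorted order.
encrypted_index = sorted(toy_ope_encrypt(s) for s in salaries)

def count_greater_than(index, plaintext_bound):
    """Evaluate `sal > bound` by encrypting the constant and comparing ciphertexts."""
    c = toy_ope_encrypt(plaintext_bound)
    # No stored value is decrypted; the sorted order of ciphertexts is enough.
    return len(index) - bisect.bisect_right(index, c)
```

Only the query constant is encrypted; the stored operands are compared as ciphertexts, which is why an ordinary sorted index remains usable.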
8
Outline

Motivation and Introduction
OPES encryption
Modeling the distribution
Experimental evaluation

9
Approach

Plaintext data has unknown distribution
User selects the target (ciphertext) distribution
Ciphertext values exhibit the target distribution

10
Effect of OPES Encryption on Plaintext
Distributions
Input Gaussian, Target Zipf
Input Uniform, Target Zipf
11
OPES Key Generation
Inputs:
Sample of source values from the plaintext
distribution
Sample of target values from the ciphertext
distribution
Output:
OPES Key
12
OPES Keys
Source to uniform: Source -> Uniform
Target to uniform: Target -> Uniform
13
Two Step Encryption

Source (plaintext) to uniform
Uniform to target (ciphertext)
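
The two steps can be sketched with piecewise-linear interpolation over sorted samples. This is an illustration under simplifying assumptions, not the OPES model itself: it uses the empirical quantiles of each sample rather than bucketed splines, and `interpolate` and `make_two_step` are hypothetical helper names.

```python
import bisect

def interpolate(xs, ys, x):
    """Piecewise-linear map through (xs[i], ys[i]); xs and ys strictly increasing."""
    i = min(max(bisect.bisect_right(xs, x) - 1, 0), len(xs) - 2)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])

def make_two_step(source_sample, target_sample):
    """Build encrypt/decrypt from a source sample and a target sample."""
    src = sorted(set(source_sample))
    tgt = sorted(set(target_sample))
    # Evenly spaced positions in [0, 1] play the role of the uniform space.
    u_src = [i / (len(src) - 1) for i in range(len(src))]
    u_tgt = [i / (len(tgt) - 1) for i in range(len(tgt))]

    def encrypt(p):
        u = interpolate(src, u_src, p)       # Step I: source -> uniform
        return interpolate(u_tgt, tgt, u)    # Step II: uniform -> target

    def decrypt(c):
        u = interpolate(tgt, u_tgt, c)       # undo Step II: target -> uniform
        return interpolate(u_src, src, u)    # undo Step I: uniform -> source

    return encrypt, decrypt
```

Because both steps are strictly increasing, their composition is order preserving, and decryption applies the inverse steps in reverse order.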

14
OPES Encryption
Encrypt: Step I (Source to Uniform), then Step II (Uniform to Target)
Decrypt: Step II reversed (Target to Uniform), then Step I reversed (Uniform to Source)
15
Outline

Motivation and Introduction
OPES encryption
Modeling the distribution
Experimental evaluation

16
Modeling the Distribution

Histograms
Equi-depth, equi-width, wavelets
Number of buckets required unreasonably large
Over fitting the model
Parametric
Poor estimation for irregular distributions
Hybrid [Konig and Weikum 99]
Query result size estimation
Approach
Partition the data into buckets
Model the distribution within a bucket as a
spline
Fixed number of buckets

17
Our Approach

Hybrid [Konig and Weikum 99]
Partition the data into buckets
Model the distribution within each bucket as a
linear spline
The number of buckets is not fixed
We use MDL to determine the number of bucket
boundaries

18
MDL (Minimum Description Length)

The best model for encoding data minimizes the
sum of the cost of
Describing the model
Describing data in terms of the model

19
Model Costs

Data Cost
Using a mapping M from $[p_l, p_h)$ to $[f_l, f_h)$, the
cost of encoding $p_i$ is
$C(p_i) = \log(f_i - E(i))$
$DC(p_l, p_h) = C(p_l) + C(p_{l+1}) + \dots + C(p_{h-1})$
Incremental Model Cost
Fixed cost for each additional bucket
Boundary value
Boundary parameters
Slope
Scale factor

20
Computing Boundaries

Growth phase
$[p_l, p_h)$ with $h-l-1$ sorted points
$p_{l+1}, p_{l+2}, \dots, p_{h-1}$
Compute spline for $[p_l, p_h)$
Compute $[f_l, f_h)$ using the spline
Find further split point $p_s$ with $f_s$ having the
maximum deviation from the expected value
Prune phase
$LB(p_l, p_h) = DC(p_l, p_h) - DC(p_l, p_s) - DC(p_s, p_h) - IMC$
$GB(p_l, p_h) = LB(p_l, p_h) + GB(p_l, p_s) + GB(p_s, p_h)$
if ($GB > 0$), the split at $p_s$ is retained
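
The prune-phase recurrence can be sketched as a short recursive function. The sketch assumes the data cost DC is supplied as a callable, that the growth phase has recorded one candidate split per bucket in a dictionary, and that an unsplit bucket has GB = 0; `global_benefit`, `split_of`, and the toy cost table are hypothetical.

```python
def global_benefit(dc, imc, lo, hi, split_of):
    """GB for bucket [lo, hi), following GB = LB + GB(left) + GB(right).

    dc(lo, hi) -> data cost of the bucket (assumed to be given).
    imc        -> incremental model cost of one additional bucket.
    split_of   -> {(lo, hi): split point} recorded during the growth phase.
    """
    s = split_of.get((lo, hi))
    if s is None:
        return 0.0  # assumption: a bucket with no candidate split contributes nothing
    # Local benefit: data cost saved by splitting, minus the cost of one more bucket.
    lb = dc(lo, hi) - dc(lo, s) - dc(s, hi) - imc
    return lb + global_benefit(dc, imc, lo, s, split_of) \
              + global_benefit(dc, imc, s, hi, split_of)

# Made-up data costs for a toy bucket tree.
toy_costs = {(0, 10): 20.0, (0, 4): 3.0, (4, 10): 5.0, (4, 7): 2.0, (7, 10): 2.5}
toy_splits = {(0, 10): 4, (4, 10): 7}

def toy_dc(lo, hi):
    return toy_costs[(lo, hi)]
```

With these made-up numbers and `imc = 2.0`, the split of `[4, 10)` has negative benefit and would be pruned, while the split of `[0, 10)` at 4 has positive benefit and is retained.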

21
Scaling
Number of values in a bucket may be
disproportionate to the size of the bucket
Figure: Source values (x) in buckets b-1, b, and b+1, mapped to the Uniform space
22
Updates

The scale factor ensures that each distinct
plaintext value maps to a distinct ciphertext
value
Encrypted values need not be recomputed unless
the distribution of plaintext values changes

23
Quality of Encryption

Kolmogorov-Smirnov (KS) Statistical Test
Can we disprove, to a certain required level of
significance, the null hypothesis that two data
sets are drawn from the same distribution
function?
If not, then the ciphertext distribution cannot
be distinguished from the specified target
distribution
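
As a rough illustration of the test, the two-sample KS statistic is the largest vertical gap between the empirical distribution functions of the two data sets. The sketch below computes only that statistic in plain Python; `ks_statistic` is a hypothetical name, and converting the statistic into a significance level is left to a library routine such as `scipy.stats.ks_2samp`.

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap
```

A small statistic between the ciphertext values and a sample drawn from the target distribution means the test has no grounds to reject the null hypothesis.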

24
Duplicates

Assumptions
A large number of duplicates may leak information
about the distribution of values
Alternatively,
Map duplicates to distinct values
if $f = M(p)$ and $f' = M(p+1)$,
then $p$ is mapped to a value in $[f, f')$
Equality expressed as a range
Equi-joins can no longer be expressed
However, many numeric attributes (e.g., salary)
may rarely be used in joins
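
The duplicate-handling idea can be sketched as follows. The mapping `M` here is a hypothetical strictly increasing function over integer plaintexts standing in for the OPES mapping, and choosing the value inside the interval at random is one possible policy assumed for the example.

```python
import random

def M(p):
    # Hypothetical strictly increasing mapping (not the OPES mapping).
    return 10 * p

def encrypt_with_duplicates(p, rng=random):
    """Map p to some value in [M(p), M(p + 1)), so duplicates spread out."""
    f, f_next = M(p), M(p + 1)
    return rng.randrange(f, f_next)

def equality_as_range(p):
    """The predicate `col = p` becomes `f <= ciphertext < f_next`."""
    return M(p), M(p + 1)
```

Every ciphertext of `p` falls in the half-open interval returned by `equality_as_range(p)`, and the intervals of different plaintexts do not overlap, so order is still preserved.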

25
Outline

Motivation and Introduction
OPES encryption
Modeling the distribution
Experimental evaluation

26
Experimental Evaluation

Percentile exposure
Updatability
Key size
Time overhead

27
Datasets

Census
UCI KDD archive, PUMS census data (30,000 records)
Gaussian
Zipf
Uniform

Default: Source Gaussian, Target Zipf
28
Percentile Exposure
Source distribution   Target distribution   Average change in percentile
Census                Gaussian              37
Census                Zipf                  7
Census                Uniform               38
Gaussian              Zipf                  45
Gaussian              Uniform               17
Zipf                  Uniform               44
29
Time to Build the Model
30
Insertion Overhead
31
Cost of Additional Insertion
32
Retrieval Overhead
33
Retrieval Time
34
Related Work

Polynomial functions
Ignores the distribution of plaintext/ciphertext
values
Database as a service
Requires post processing of query results
Privacy homomorphisms
Comparison operations not investigated
Keyword searches on encrypted data
Designed for keyword retrieval
Range queries not supported
Smartcard-based schemes
Infeasible for large ranges
Order-preserving hashing
Protecting the hash values from cryptanalysis is
not a concern, nor is deciphering plaintext
values from hash values
Designed for static collections

35
Closing Remarks

Ensuring safety without impeding the flow of
information is a hard problem
Current choices
Plaintext database
Encrypted databases with loss of functionality or
performance
Our approach focused on the trade-off between
security and efficiency
We developed an algorithm which could easily be
integrated with current systems

36
Backup
37
Encode
$\mathrm{Encode}(p) = z(sp^2 + p)$
$p \in [0, p_h)$, $s = q/(2r)$, $z > 0$
The distribution has density function $qp + r$
$p$ is the source (target) value
$s$ is the quadratic coefficient
$z$ is the scale factor
38
Decode
$\mathrm{Decode}(f) = \dfrac{-z \pm \sqrt{z^2 + 4zsf}}{2zs}$
$f \in [0, f_h)$, $s = q/(2r)$, $z > 0$
$f$ is the flattened value
$s$ is the quadratic coefficient
$z$ is the scale factor
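
The encode and decode formulas can be checked against each other with a short round trip. This sketch assumes the mapping is increasing on the bucket (that is, 1 + 2sp > 0), in which case the "+" root of the quadratic recovers p; the special case s = 0 is handled separately because the quadratic formula would divide by zero.

```python
import math

def encode(p, s, z):
    """Encode(p) = z * (s * p**2 + p)."""
    return z * (s * p * p + p)

def decode(f, s, z):
    """Invert encode by solving z*s*p**2 + z*p - f = 0 for p."""
    if s == 0:
        return f / z  # the mapping is linear when the quadratic coefficient is zero
    # With 1 + 2*s*p > 0 the square root equals z * (1 + 2*s*p),
    # so the '+' root gives back p for both positive and negative s.
    return (-z + math.sqrt(z * z + 4 * z * s * f)) / (2 * z * s)
```

For example, with s = 0.01 and z = 2.5, the value p = 12 encodes to about 33.6 and decodes back to 12 up to floating-point error.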
39
Order Preserving Encryption
Example table columns: No, Name, Position, Salary, Location

Ciphertext is the index value

Effectively hides the distribution of plaintext
values
The key size is proportional to the number of
distinct attribute values
Any updates require recomputing the key and
ciphertext values

Compute distinct attribute values in ascending
order
Ciphertext   Plaintext
1            28000
2            35000
...          ...
Cn           Pn
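
This index-based scheme is simple enough to sketch directly. The function names below are hypothetical; the sketch also shows why the key grows with the number of distinct values and why an unseen value forces the key to be rebuilt.

```python
import bisect

def build_key(values):
    """The key is the sorted list of distinct plaintext values."""
    return sorted(set(values))

def encrypt(key, p):
    """Ciphertext is the 1-based index of p among the distinct values."""
    i = bisect.bisect_left(key, p)
    if i == len(key) or key[i] != p:
        # A new value has no index: the key and ciphertexts must be recomputed.
        raise KeyError(f"{p} is not in the key")
    return i + 1

def decrypt(key, c):
    """Look the plaintext up by its index."""
    return key[c - 1]
```

Inserting a value that falls between two existing plaintexts shifts every later index, which is the update problem noted above.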
40
Target Distribution Requirement

Why isn't the source-to-uniform transformation
sufficient for order preserving encryption?
It is, but
The target distribution may cause an adversary to
make incorrect assumptions about the source
distribution
The organization of the source distribution
cannot be inferred from the target

41
Quadratic Coefficient
Figure: values (x) in two buckets b1 and b2, with indices i1, j1, i2, j2 and values v, used to derive q and the quadratic coefficient s
42
Scale Factor Constraints
for all $p \in [0, w)$: $M(p+1) - M(p) \ge 2$
Ensures that there is a distinct mapped value for
each input value
$w_f = Kn$
The width of a bucket in the mapped space is a
function of the number of elements $n$ in the
bucket. $K$ is the minimum width needed across
buckets
43
Scale Factor
The scale factor will stretch short buckets to
the width of the largest bucket, further
increasing the dimension of a bucket by a factor
of the number of elements in the bucket
$z = \dfrac{Kn}{sw^2 + w}$
$K = \max_i \, x\,(s w_i^2 + w_i)$, $i = 1, \dots, m$
$x = 2$ if $s \ge 0$; $x = 2/(1 + s(2w - 1))$ if $s < 0$
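
The scale factor formulas can be exercised numerically. The sketch below assumes each bucket is described by a hypothetical triple (s, w, n) of quadratic coefficient, width, and number of elements, with sw^2 + w > 0 and n >= 1, and uses the mapping M(p) = z(sp^2 + p) from the Encode slide; the bucket values are made up.

```python
def min_scale(s, w):
    """x: smallest z for which M(p + 1) - M(p) >= 2 holds on [0, w)."""
    return 2.0 if s >= 0 else 2.0 / (1.0 + s * (2 * w - 1))

def scale_factors(buckets):
    """buckets: list of (s, w, n). Returns K and per-bucket z = K*n / (s*w**2 + w)."""
    K = max(min_scale(s, w) * (s * w * w + w) for s, w, n in buckets)
    return K, [K * n / (s * w * w + w) for s, w, n in buckets]

def mapped(p, s, z):
    """M(p) = z * (s * p**2 + p)."""
    return z * (s * p * p + p)

toy_buckets = [(0.0, 10, 5), (-0.01, 20, 3), (0.02, 5, 8)]  # made-up values
K, zs = scale_factors(toy_buckets)
```

With these inputs, every bucket ends up with mapped width K*n, and consecutive plaintext values are at least 2 apart in the mapped space.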
44
Slope
The values within a single bucket are unevenly
distributed within the bucket
Figure: values within adjacent buckets b-1 and b
