Auditing Compliance with a Hippocratic Database Rakesh Agrawal Roberto Bayardo Christos Faloutsos Jerry Kiernan Ralf Rantzau Ramakrishnan Srikant Intelligent Information Systems Research IBM Almaden Research Center - PowerPoint PPT Presentation

About This Presentation
Title:

Auditing Compliance with a Hippocratic Database Rakesh Agrawal Roberto Bayardo Christos Faloutsos Jerry Kiernan Ralf Rantzau Ramakrishnan Srikant Intelligent Information Systems Research IBM Almaden Research Center

Description:

... her medical information with pharmaceutical companies for marketing purposes ... promotional literature from a pharmaceutical company, proposing over the counter ... – PowerPoint PPT presentation

Number of Views:36
Avg rating:3.0/5.0

less

Transcript and Presenter's Notes

Title: Auditing Compliance with a Hippocratic Database Rakesh Agrawal Roberto Bayardo Christos Faloutsos Jerry Kiernan Ralf Rantzau Ramakrishnan Srikant Intelligent Information Systems Research IBM Almaden Research Center


1
Auditing Compliance with a Hippocratic
DatabaseRakesh AgrawalRoberto BayardoChristos
FaloutsosJerry KiernanRalf RantzauRamakrishnan
SrikantIntelligent Information Systems
ResearchIBM Almaden Research Center
2
Outline
  • Introduction and motivation
  • Problem statement
  • Foundations
  • System organization and algorithms
  • Performance
  • Summary

3
Motivation
  • Hippocratic databases advocate policy directed
    data management for privacy sensitive data
  • Need reinforced by legislations and regulations
  • Health Insurance Portability Accountability Act
  • Gramm-Leach Bliley Act Consumer Privacy Rule
  • Goal
  • Build a system to assist with auditing compliance
    with the stated policy
  • Event driven - privacy complaint
  • Periodic - monitor exposure to privacy violation

4
Audit Scenario
The doctor must now review disclosures of Janes
information in order to understand the
circumstances of the disclosure, and take
appropriate action
Sometime later, Jane receives promotional
literature from a pharmaceutical company,
proposing over the counter diabetes tests
The doctor uncovers that Janes blood sugar level
is high and suspects diabetes
Jane complains to the department of Health and
Human Services saying that she had opted out of
the doctor sharing her medical information with
pharmaceutical companies for marketing purposes
Jane has not been feeling well and decides to
consult her doctor
5
Audit Expression
Who has accessed Janes disease information?
audit T.disease from Customer C, Treatment
T where C.cidT.pcid and C.name Jane
6
Outline
  • Introduction and motivation
  • Problem statement
  • Foundations
  • System organization and algorithms
  • Performance
  • Summary

7
Problem Statement
  • Given
  • A log of queries executed over a database
  • An audit expression specifying sensitive data
  • Precisely identify
  • Those queries that accessed the data specified by
    the audit expression

8
Suspicious Queries
A query Qi has accessed information contained in
the Customer table
The audit expression A specifies the data to the
audited
Customer table
cid name address zip
1 Jane 1234 95120




If query Qi accesses all the cells specified by
the audit expression A for any row, Qi is
suspicious
9
Issues
  • Convenient language
  • Audit expression (essentially SPJ query)
  • Fast and precise on audits
  • Non disruptive
  • Minimal performance impact on normal database
    operation
  • Fine grained

10
Assumptions
  • Disclosures stemming from multiple query
    executions is not considered
  • No use of outside knowledge to deduce information
    without detection
  • Queries considered include
  • Joins and aggregation, but not nested subqueries
  • Note that existential subqueries can be converted
    into joins SIGMOD92

11
Outline
  • Introduction and motivation
  • Problem statement
  • Foundations
  • System organization and algorithms
  • Performance
  • Summary

12
Informal Definitions
  • Candidate query
  • Logged query that accesses all columns specified
    by the audit expression
  • Indispensable tuple (for a query)
  • A tuple whose omission makes a difference to the
    result of a query
  • Suspicious query
  • A candidate query that shares an indispensable
    tuple with the audit expression

13
Indispensable Tuple
Output columns in Q
Predicates in Q
Duplicate preserving projection operator
Tables common to Q and A
The SPJ query Q and the audit expression A are of
the form
Columns appearing anywhere in Q
14
Candidate Query
Definition 6 - Q is a candidate query with
respect to A if
Only candidate queries can be suspicous queries
15
Suspicious Query
For example, Query Q Addresses of people with
diabetes Audit A Janes diagnosis Janes tuple
is indispensable for both hence query Q is
suspicious with respect to A
16
Outline
  • Introduction and motivation
  • Problem statement
  • Foundations
  • System organization and algorithms
  • Performance
  • Summary

17
System Overview
Query with purpose, recipient
Audit expression
IDs of log queries having accessed data specified
by the audit query
Updates, inserts, delete
Generate audit query
Database Layer
Audit
Database triggers track updates to base tables
Static analysis
Database Layer
Data Tables
Backlog
Audit query
Query Log
ID Timestamp Query User Purpose Recipient
1 2004-02 Select James Current Ours
2 2004-02 Select John Telemarketing public
18
Static Analysis
Query Log
ID Timestamp Query User Purpose Recipient
1 2004-02 Select James Current Ours
2 2004-02 Select John Telemarketing public
Audit expression
Eliminates queries that could not possibly have
violated the audit expression Insures that
Accomplished by examining only the queries
themselves (i.e., without running the queries)
Filter Queries
Candidate queries
19
Audit Query Generation
  • Goal
  • Build a query which, when run, returns the ids
    of suspicious queries with respect to an audit
    expression A

20
Generating the Audit Query
Audit Expression
Replace each table with its backlog to restore
the version of the table to the time of each query
Combine the audit expression with individual
candidate queries to identify suspicious queries
Combine individual candidate queries and the
audit expression into a single query graph
Candidate Query 1
Candidate Query 2
Lines represent input/output relationships
between operators
QGM is a graphical representation of a query
Boxes represent operators, such as select
Boxes with no inputs are tables
21
Suspicious SPJ Query
The candidate SPJ query Q and the audit
expression A are of the form
Proof of correctness is based upon Definition 7
(suspicious query) and given in the paper
QGM rewrites, shown in previous slide, transform
Q and A into
Theorem 2 - A candidate SPJ query Q is
suspicious with respect to an audit expression A
if and only if
22
Suspicious Aggregate Query (Including Having)
  • Solution in the paper

23
Example
Janes audit
24
Audit Expression
Who has accessed Janes disease information?
audit T.disease from Customer C, Treatment
T where C.cidT.pcid and C.name Jane
25
Query Log
Query 1 was executed at time T3
ID Query TS User Purpose Recipient
1 select name, address, zip from Customer, Treatment where disease diabetes and cidpcid T3 james marketing others
2 select name, address from Customer where zip95112 T3 john contact others
26
Backlog Table (Time Stamp)
Operation on a tuple among Insert, Update and
Delete
Janes record was inserted at time T2 and updated
at time T4. The backlog table records both
versions of her information
Timestamp of the operation
C. S. Jensen, L. Mark, and N. Roussopoulos TKDE
1991
Name Address OPR TS
Jane 1234 I T2
Jane 1234 U T4
Alice I T1
Attributes also in the source table
Attributes only in the backlog table
27
Merge Logged Queries and Audit Expression
Merge logged queries and audit expression into a
single query graph
T.s
C.n, C.a, C.z
Select T.sdiabetes and T.pC.c
audit expression T.pC.c and C.n Jane
C
T
T
C
c, n, , t
p, r, , t
Customer
Treatment
28
Transform Query Graph into an Audit Query
Q1
The audit expression now ranges over the logged
query. If the logged query is suspicious, the
audit query will output the id of the logged query
audit expression X.n Jane
X
C.n
Select T.sdiabetes and C.cT.p
C
T
View of Customer (Treatment) is a temporal view
at the time of the query was executed
c, n, , t
p, r, ..., t
Customer
Treatment
29
Scenario Outcome
  • The audit uncovers that Query 1 in the query log
    accessed Janes information

30
Outline
  • Introduction and motivation
  • Problem statement
  • Foundations
  • System organization and algorithms
  • Performance
  • Summary

31
Empirical Evaluation Goals
  • Cost of maintaining backlog tables
  • Understand the impact of maintaining backlog
    tables on ongoing database operations
  • Cost of running audits
  • Understand whether audits can run in reasonable
    time

32
Experimental Setup
  • IBM M Pro 6868 Intellistation
  • 800 MHz Pentium III processor
  • 512 MB of memory
  • 16.9 GB disk drive
  • Windows 2000 Version 5, SP 4
  • DB2 v7 with default settings
  • TPC-H database
  • Supplier table
  • 100,000 tuples

33
System Structures
  • Indexing
  • Eager indexing
  • Maintain an index over the backlog table
  • Maintained during ongoing database operations
  • Lazy indexing
  • No index over the backlog table
  • Create indices at the time of audit
  • Choice of index
  • Simple index
  • Primary key of source table
  • Composite index
  • Primary key of source table
  • Time stamp

34
Impact on Ongoing Operations
  • Queries
  • Additionally log the query string
  • Already performed in many application
    environments
  • Updates
  • For each updated tuple,
  • Insert a tuple to the backlog table
  • Inserts and deletes are handled similarly
  • In a majority of environments, queries are much
    more frequent than updates

35
Update Performance
  • 100,000 tuples in Supplier table
  • Update statement updates all tuples
  • Each update statement fires triggers which
    inserts an additional 100,000 tuples in backlog
  • Evaluate impact of multiple versions on
    performance

36
Overhead on Updates
Simple wins over Composite
Number of version of each tuple in the Supplier
backlog table
7x if all tuples are updates 3x if a single
tuple is updated
Eager indexing doesnt add much cost
37
Audit Query Performance
Audit query select Q from Supplier where skey
k Experiment Evaluate the impact of the
number of versions of tuples in the backlog
table on performance
38
Audit Query Execution Time
Composite wins over simple if initial version is
selected
Simple wins over composite if the current version
is selected
39
Takeaways
  • The composite index
  • Enhances the performance of audits, but
  • Additionally burdens updates when using eager
    indexing
  • The system supports
  • Efficient auditing
  • Without substantially burdening normal query
    processing

40
Related Work
  • Oracle Privacy Security Auditing
  • Facility for logging queries with timestamp
  • Flash-back queries
  • Restores the version of the data at the time of
    the query
  • No support for automated auditing
  • User manually selects queries from the log and
    runs them
  • The user to decide if the query is suspicious
  • G. Miklau D. Suciu SIGMOD 2004
  • Formal analysis of information disclosure in data
    exchange
  • Is information about a secret query S revealed by
    views V1,,Vn
  • Considers all possible instances of a database
    schema
  • Assumes tuple independence
  • Were interested in given instances (temporal
    versions)
  • Nonetheless, it will be interesting to explore
    the connection between the two works
  • Active enforcement of policies by limiting
    disclosure VLDB04
  • Literature on multi-query optimization

41
Summary
  • In light of new privacy legislation
  • The problem of auditing usage of information
    represents an important opportunity for database
    research
  • Formalized the problem through the fundamental
    concepts of indispensable tuple and suspicious
    queries
  • Achieved our design goals

42
Design Goals
  • Convenient language
  • Fast and precise on audits
  • Non disruptive
  • Minimal performance impact on normal database
    operation
  • Fine grained

43
Backup
44
Multiple Candidate Queries
Union
Q1
Q2
audit expression C.n Jane
audit expression C.n Jane
45
Aggregate Queries with Having
The join on aggregate columns ensures that the
group being tracked by the audit has not been
eliminated by the having clause
c1, , ci
Qh
select
c1, , ci, agg1, , aggn
Qg
group c1, , ci
Qs
46
Dynamic Temporal Views
View of Customer table at time t
c id n name a address h phone z zip o
contact t marketing ts ts op opr
C1
Time stamp of the logged query
c, n, a, h, z, o, t
Select ts lt t andop ltgt delete andnot(C5)
C5

Exists C4.ts lt t andC3.c C4.c andC4.ts gt
C3.ts
C3
C4
c, n, a, h, z, o, t, ts, op
Customer_backlog
47
Cost of Building Indices over Backlog Tables
Write a Comment
User Comments (0)
About PowerShow.com