Title: Semantically Enhanced and Efficient Enforcement of Mobile Consumers' Privacy Preferences
1Semantically Enhanced and Efficient Enforcement of Mobile Consumers' Privacy Preferences
Nabil R. Adam, Mahmoud Youssef and Vijay Atluri
Center for Information Management Integration and Connectivity (CIMIC), Rutgers University
Presentation at SAP Research Labs, Palo Alto, 4/5/2005
2Outline
- Introduction
- Research Problem
- Part I: A Solution with Focus on Efficiency
- Controlling Information Flow
- Access Control Model
- Evaluation and Enforcement Mechanism
- Comments on the System Design
- Performance Evaluation Study
- Summary (Part I)
- Part II: A Solution with Focus on Expressiveness
- Another Privacy Criterion
- Quick Look at Description Logics, OWL, and RDF
- Preferences Ontology
- Query Processing
- Implementation
- Performance Evaluation Study
- Summary (Part II)
3Scenario: Location-based Advertising
- With the availability of positioning and tracking technology, it is possible to
- Track different entities, e.g., vehicles, containers, and individuals
- The Location Service (LS) aggregates location information
- Location information is managed as a Moving Objects Database (MOD)
- Merchants customize offers based on consumer profile and location.
4The Moving Objects Problem
- Moving objects need special data modeling due to
- The rate of update
- Too many customers sending continuous updates.
- Traditional databases are not designed for intensive updates
- The same problem exists in the RFID domain.
- Queries usually need to address the future
- Type of queries
- Queries submitted by the consumers are also moving
[Figure: R-tree over moving objects 1-8 with bounding rectangles R1-R4, shown at two points in time, panels (a) and (b)]
The structure of the R-tree has to change drastically due to the movement of the objects
5The Moving Objects Spatio-temporal Model (MOST)
- In the MOST [Sistla et al., 1997]
- A database attribute that is continuously changing is considered a dynamic attribute
- That requires fewer updates
- Linear change is assumed.
- MO indexing schemes index objects in the projections, in the d-dimensional space, or in a transformed space.
- We use projected trajectories in our computations
- How good is the linearity assumption?
[Figure: trajectory of object O1; axes X, Y, and T (Time), with tnow marked]
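The dynamic-attribute idea can be sketched as follows, assuming the linear model mentioned above: the database stores a value, the time of the last update, and a rate of change, and a later position is computed rather than stored. The class and field names are illustrative assumptions and are not taken from MOST.

```python
from dataclasses import dataclass


@dataclass
class DynamicAttribute:
    """A continuously changing attribute stored as (value, update time, rate)."""
    value_at_update: float  # attribute value when it was last written
    update_time: float      # time of that write
    rate: float             # assumed constant rate of change (linear motion)

    def at(self, t: float) -> float:
        """Extrapolate the attribute linearly to time t."""
        return self.value_at_update + self.rate * (t - self.update_time)


# Hypothetical object O1: x = 10 at t = 0, moving 2 units per time step.
x = DynamicAttribute(value_at_update=10.0, update_time=0.0, rate=2.0)
future_x = x.at(5.0)  # position expected five time units after the update
```

An explicit update is needed only when the object's speed or direction changes, which is why this representation needs fewer writes than storing every position.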
6The Tradeoff Between Personalization and Privacy
- Personalization involves
- Collection of profile and location information, which raises privacy concerns
- Studies by Chellappa et al., Harn et al., and Spiekermann et al. show
- Consumers do not opt in to online services when they do not trust merchants with their profiles.
- Consumers are willing to trade off their information with trusted vendors for convenience.
- Even among the privacy-concerned consumers.
7The Tradeoff Between Personalization and Privacy (Contd)
- Analysis of over 120 surveys shows [Westin, 2002]
- Change in the attitude of 3/4 of American consumers towards privacy from a modest to a high intensity matter
- Three segments of consumers: fundamentalists, unconcerned, and pragmatic.
- The size of the pragmatic group is 125 million.
- The challenge to businesses is to address the needs of this group
- Provide convenience
- Protect privacy
8One Trusted Third Party
A Proposed Solution: The consumer trusts only one third party
The Problem: The consumer has to trust too many merchants with her profile
It stands to reason then that the LS assumes that role
9Basic Approaches to Privacy Protection
- Device-based Approach (e.g., Schilit et al., 2003)
- Advantage: consumer does not have to trust anyone
- Limitations
- Consumer receives all the messages. Who is being charged for transmission?
- Too much load on the network.
- Requires powerful devices.
- Trusted Third Party
- Anonymity Approaches: The Anonymizer Project
- Do not support identity-based analysis (e.g., purchase history)
- Consumers still have to trust the anonymizer.
- Our Proposed Approach (Access Control with Controlled Information Flow)
10The Environment: The Players
- The players in this environment are
- The LS: maintains consumer information, enforces their privacy policies, and provides answers to queries
- Information Requester: a merchant or a marketing intermediary
- Location Information Providers: e.g., the wireless networks
- Information Owners: the consumers
11The Research Problem: Policy Requirements
- Preventing unauthorized sharing of consumer information among information requesters
- Consider the spam problem
- Preventing misuse of permitted access to consumer information
- If access policies are based only on merchant identity, merchants can violate consumer preferences in terms of time and location.
- Access policies need to have spatio-temporal constraints
12The Research Problem: User Interface Requirements
- Consumers need a user-friendly approach to defining policy rules
- Access rules should be defined at different granularities.
- However, such representation will create granularity conflicts
- Example
- R1: (Hilton, c1_info, read, -) <EssexCounty, all_time>
- R2: (Hotels, c1_info, read, +) <NJ, week_days>
13The Research Problem: Policy Enforcement Requirements
- Two capabilities are required
- Addressing the impact of consumer motion and its interaction with the spatio-temporal constraints.
- Spatio-temporal conflict: The location query may intersect with the spatio-temporal constraints of more than one access rule, e.g.,
- During the time interval of the query the customer will pass by two locations (Hudson County and NYC) for which she has different permissions.
- Translating between geospatial coordinates, as expressed in the MOD, and civil names, as expressed in the constraints, e.g.,
- MOD: Current Location of Customer C1 (74.32145, 40.75321)
- Access Rule: (Hotels, No Access) <New York City, All times>
14The Research Problem: Scalability and Efficiency Requirements
- The system has to accommodate growth in the number of consumers and merchants
- Yet without adversely impacting the overall performance of query processing.
15Summary of the Challenges
- How to prevent the illegal sharing of consumer information?
- How to efficiently resolve spatio-temporal and granularity conflicts?
- How to efficiently compute the interaction among the spatio-temporal constraints and the location information?
- How to translate between geospatial coordinates and civil names?
16Part I: A Solution with Focus on Efficiency
17Overview of the Proposed Solutions
- 1. Control information flow to merchants
- 2. Develop an access control model that allows
- Specification of spatio-temporal policies
- Example: merchant Hilton has access to my information when I am outside New Jersey on weekdays.
- Representation of merchants, location, and time at different levels of granularity.
- 3. Efficient enforcement of access control
- Turn the problem into a string search problem
181. Controlling Information Flow
- Solution
- Merchants send information related to a specific offer along with a query to the LS
- The LS runs the query, producing a list of IDs of the consumers who satisfy the merchant criteria
- The LS enforces the access control, which filters the IDs
- The filtered IDs are then forwarded with the advertisement to the wireless networks, which deliver it to the consumers
- The wireless network sends the offers to the consumer devices and reports to the LS
- The LS then sends pseudonyms to the merchants.
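The flow described on this slide can be sketched as a small pipeline. This is only an illustration under simplifying assumptions: the function name, the callable parameters, and the pseudonym table are hypothetical, and delivery through the wireless network is reduced to a comment.

```python
from typing import Callable, Dict, List, Set


def deliver_offer(
    merchant_query: Callable[[str], bool],
    is_permitted: Callable[[str], bool],
    consumers: List[str],
    pseudonyms: Dict[str, str],
) -> Set[str]:
    """Run one offer through the LS and return what the merchant gets back."""
    matched = [c for c in consumers if merchant_query(c)]  # LS runs the query
    allowed = [c for c in matched if is_permitted(c)]      # access control filter
    # The offer would be handed to the wireless network for delivery here.
    return {pseudonyms[c] for c in allowed}                # merchant sees pseudonyms only


# Hypothetical data: three consumers, two match the query, one permits access.
consumers = ["c1", "c2", "c3"]
pseudonyms = {"c1": "p-17", "c2": "p-42", "c3": "p-08"}
reported = deliver_offer(lambda c: c != "c3", lambda c: c == "c1", consumers, pseudonyms)
```

The merchant never receives real consumer IDs, which is what keeps merchants from pooling consumer information among themselves.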
192. The Proposed Access Control Model
- An access rule consists of an authorization triple and a constraint
- (s, o, +/-), <stc>
- Where
- s ∈ S is a subject, i.e., a merchant at some granularity.
- o ∈ O is an object, i.e., a consumer ID → l, p, where l and p are location and profile information.
- +/- is a flag, i.e., grant/deny.
- stc is a spatio-temporal constraint consisting of a civil location and a time interval.
- Spatial and temporal constraints are generalized to stc
- The only access mode is read.
- No need to represent it in the model.
- Generic access rule
- (s, ID→lp, +/-), <stc>
202.1 Model Components Representation
- All components are represented as hierarchies (except the ID and the flag)
- These hierarchies hold several properties
- At every level in a hierarchy, the nodes are an exact decomposition of their parents (i.e., the parent is the union of its children and the children are disjoint). Thus
- The root always represents All Members
- The leaves are the members at their most specific representation.
- No multiple inheritance.
[Figure: Subject Hierarchy]
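The exact-decomposition property can be checked mechanically. The sketch below treats each node as the set of leaf members it covers; the function name and the sample merchants are illustrative assumptions.

```python
from typing import List, Set


def is_exact_decomposition(parent: Set[str], children: List[Set[str]]) -> bool:
    """True if the children are pairwise disjoint and their union is the parent."""
    union: Set[str] = set().union(*children)
    disjoint = sum(len(c) for c in children) == len(union)
    return disjoint and union == parent


# Hypothetical subject hierarchy: AllMerchants splits into Hotels and Restaurants.
all_merchants = {"Hilton", "Marriott", "Subway"}
valid = is_exact_decomposition(all_merchants, [{"Hilton", "Marriott"}, {"Subway"}])
# Overlapping children break the property (a member would have two parents).
invalid = is_exact_decomposition(all_merchants, [{"Hilton", "Marriott"}, {"Marriott", "Subway"}])
```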
212.2 Order of Hierarchies and Precedence
- We adopt the following order
- ID → Object → Subject → Location → Time → Flag
- The order among hierarchies implies precedence
- Precedence has no impact on the model behavior as long as the same order is followed in the specification and evaluation of access rules
- However, it has an impact on the notion of relative specificity, as we will see later
222.3 The System State
- The system state includes partial instantiations of the hierarchies.
- Each instance of a hierarchy includes only the nodes that have access rules defined on them.
- The instances belonging to the same consumer can be seen as a tree
232.4 Conflict Resolution
- For spatio-temporal conflict → denial precedes grants
- I.e., being conservative.
- For granularity conflict → inheritance with overriding
- Nodes not in the instance
- Are assumed to virtually exist, and
- Inherit permissions from the nearest existing ancestor
- Nodes in the instance
- More specific rules override less specific ones
- The semantics must be conveyed to the consumer
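Inheritance with overriding over a partial instance can be sketched as a walk towards the root. The hierarchy, the rule table, and the function name below are hypothetical and only illustrate the lookup order.

```python
from typing import Dict, Optional

# Hypothetical subject hierarchy: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "AllMerchants": None,
    "Hotels": "AllMerchants",
    "Hilton": "Hotels",
    "Marriott": "Hotels",
}

# Partial instance: only nodes that carry a rule are stored.
RULES: Dict[str, str] = {"Hotels": "+", "Hilton": "-"}


def effective_flag(node: Optional[str]) -> Optional[str]:
    """Return the flag of the node, or of its nearest ancestor that has a rule."""
    while node is not None:
        if node in RULES:
            return RULES[node]
        node = PARENT[node]
    return None  # no rule anywhere on the path to the root


hilton = effective_flag("Hilton")      # its own rule overrides the Hotels rule
marriott = effective_flag("Marriott")  # no rule of its own, inherits from Hotels
```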
242.4.1 Relative Specificity Among Rules
- For two rules R1 and R2 for the same consumer, R1 is more specific than R2 if
- R1 has a more specific object than R2, i.e., R2 has lp and R1 has l or p, or
- R1 and R2 have the same object AND R1 has a more specific subject, or
- R1 and R2 have the same object and subject AND R1 has a more specific location, or
- R1 and R2 have the same object, subject, and location AND R1 has a more specific time.
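The definition above amounts to a component-by-component comparison in precedence order. The sketch below represents each component as a path from the root of its hierarchy; that representation, the type aliases, and the sample rules are assumptions made for illustration.

```python
from typing import Tuple

Path = Tuple[str, ...]                # path from a hierarchy root down to a node
Rule = Tuple[Path, Path, Path, Path]  # (object, subject, location, time)


def is_descendant(a: Path, b: Path) -> bool:
    """True if node a lies strictly below node b in the same hierarchy."""
    return len(a) > len(b) and a[: len(b)] == b


def more_specific(r1: Rule, r2: Rule) -> bool:
    """Compare component by component in precedence order."""
    for c1, c2 in zip(r1, r2):
        if c1 != c2:
            # The first differing component decides the outcome.
            return is_descendant(c1, c2)
    return False  # identical rules: neither is more specific


# Hypothetical rules for one consumer.
r_hotels = (("lp",), ("All", "Hotels"), ("US", "NJ"), ("AllTime", "Weekdays"))
r_hilton = (("lp",), ("All", "Hotels", "Hilton"), ("US", "NJ", "Essex"), ("AllTime",))
hilton_wins = more_specific(r_hilton, r_hotels)  # subject decides; time is never reached
```

Because the subject comes before location and time in the order, the Hilton rule counts as more specific even though its time component is more general.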
252.5 Advantages of the Model
- Support for more efficient search.
- Overriding means the search should start from the most specific representation.
- We exploit that by adaptively searching for the most specific rule that matches some search key.
- The system size is kept small
- Since instances are partial instantiations.
- Component representation is granular.
- This streamlines the user interface,
- Provides support for aggregate queries.
263. Evaluation of Access Control
- Evaluation involves
- Composing search keys
- Matching them against the access rules.
- Each consumer in the query result can generate multiple search keys
- Based on the intersection between the query and the consumer's motion line.
- Granular representation is another source of search keys.
- Definition: A spatio-temporal window is a combination of a time leaf and a location leaf.
273.1 The Evaluation Procedure
- The evaluation proceeds as follows
- For each consumer and for each spatio-temporal window that the consumer passes through, a search key is created.
- For each of the created keys, an adaptive search operation is performed and a flag is retrieved.
- The flags that belong to the same consumer are combined using the denial-precedes-grants rule.
See YAA05 for detailed computations.
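The three steps can be sketched as follows. The adaptive search is replaced by a caller-supplied lookup function, and an empty set of windows is treated as a denial; both are assumptions of this sketch, not details from YAA05.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Window = Tuple[str, str]  # (location leaf, time leaf)


def combine(flags: Iterable[str]) -> str:
    """Denial precedes grants: a single '-' makes the overall answer '-'."""
    collected = list(flags)
    return "+" if collected and all(f == "+" for f in collected) else "-"


def evaluate(
    windows_by_consumer: Dict[str, List[Window]],
    lookup: Callable[[str, Window], str],
) -> Dict[str, str]:
    """Build one search key per (consumer, window) and combine the flags."""
    return {
        consumer: combine(lookup(consumer, w) for w in windows)
        for consumer, windows in windows_by_consumer.items()
    }


# Hypothetical flags: c1 crosses two windows with different permissions.
flags = {
    ("c1", ("HudsonCounty", "Weekday")): "+",
    ("c1", ("NYC", "Weekday")): "-",
    ("c2", ("HudsonCounty", "Weekday")): "+",
}
result = evaluate(
    {
        "c1": [("HudsonCounty", "Weekday"), ("NYC", "Weekday")],
        "c2": [("HudsonCounty", "Weekday")],
    },
    lambda consumer, window: flags[(consumer, window)],
)
```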
283.2 Components of the Evaluation and Enforcement Mechanism
- A spatio-temporal module
- Provides computations for interaction between moving objects and consumer location information.
- Translates geospatial coordinates to civil names.
- Built on top of Oracle Spatial using Oracle Pre-compiler (Pro*C/C++ and PL/SQL)
- An encoder
- Encodes both access rules and search keys into equal-length alphabetical strings.
- The ASM-trie (the Adaptive Search Multiway-trie)
- Performs the adaptive search on specially encoded strings.
293.2.1 The Encoder
- In the access rule (search key), each hierarchy substring is drawn from a table that encodes that hierarchy.
- Depending on the maximum cardinality of children, one or more letters are used for each level, e.g., one letter for region and two letters for state.
- There is no one-to-one relationship between nodes in an access rule and the nodes in the ASM-trie.
- Adaptive search is not just backtracking
30The Encoder's Support for the Adaptive Search
- Letter "a" is used as padding to give equal length to all substrings
- This way, it also represents the parent node in the access rule.
- Letter "a" is never used in encoding a child.
- The ID substring is encoded in uppercase to indicate that adaptive search is not supported.
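The padding scheme can be illustrated with a small encoder for a single hierarchy. The location table, the fixed depth of three levels, and the use of one letter per level are assumptions of this sketch; the actual encoding tables are not given on the slides.

```python
from typing import Dict, List, Tuple

# Hypothetical location hierarchy, three levels below the root, one letter per level.
# Child codes start at 'b' because 'a' is reserved for padding.
CODES: Dict[Tuple[str, ...], str] = {
    ("Northeast",): "b",
    ("Northeast", "NJ"): "b",
    ("Northeast", "NY"): "c",
    ("Northeast", "NJ", "Essex"): "b",
    ("Northeast", "NJ", "Hudson"): "c",
}
DEPTH = 3


def encode(path: List[str]) -> str:
    """Encode a path (root excluded) as a fixed-length substring."""
    letters = [CODES[tuple(path[: i + 1])] for i in range(len(path))]
    return "".join(letters).ljust(DEPTH, "a")  # pad the unspecified levels


hudson = encode(["Northeast", "NJ", "Hudson"])  # a leaf: no padding needed
nj = encode(["Northeast", "NJ"])                # a parent: last level padded
everywhere = encode([])                         # the root: padding only
```

Replacing the trailing letters of a leaf code with "a" yields the code of its ancestor, which is what lets a search move from a specific node to its parent.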
313.2.2 The ASM-trie
- The ASM-trie is a main-memory structure that supports adaptive search.
- In the ASM-trie, the node includes
- 27 pointers-to-node to represent the alphabet and a null character
- Letters are implied by their order (radix) (e.g., 0 = null, 1 = a, 2 = b, ...)
- A pointer to its parent for adaptive search,
- A pointer to the previous letter for backward traversal, and
- A Boolean variable to indicate whether adaptive search is supported at this level.
- For the Insert and Search algorithms, see YAA05
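The sketch below shows one plausible reading of adaptive search over padded strings; it is not the Insert and Search algorithms of YAA05. It assumes lowercase keys only, uses recursion in place of the parent and previous-letter pointers, and omits the uppercase ID substring and the per-level Boolean. All class and method names are hypothetical.

```python
from typing import List, Optional

SLOTS = 27  # slot 0 = null, 1 = 'a', 2 = 'b', ...


def slot(ch: str) -> int:
    """Radix position of a lowercase letter."""
    return ord(ch) - ord("a") + 1


class Node:
    def __init__(self) -> None:
        self.children: List[Optional["Node"]] = [None] * SLOTS
        self.flag: Optional[str] = None  # '+' or '-' where a rule ends


class AdaptiveTrie:
    def __init__(self, widths: List[int]) -> None:
        self.root = Node()
        self.starts = set()  # positions where a hierarchy substring begins
        offset = 0
        for w in widths:
            self.starts.add(offset)
            offset += w

    def insert(self, rule: str, flag: str) -> None:
        node = self.root
        for ch in rule:
            i = slot(ch)
            if node.children[i] is None:
                node.children[i] = Node()
            node = node.children[i]
        node.flag = flag

    def search(self, key: str) -> Optional[str]:
        return self._search(self.root, key, 0, False)

    def _search(self, node: Node, key: str, i: int, padded: bool) -> Optional[str]:
        if i == len(key):
            return node.flag
        if i in self.starts:
            padded = False  # a new substring may be specific again
        # Try the exact letter first, then generalise to the padding letter.
        options = ["a"] if padded else list(dict.fromkeys([key[i], "a"]))
        for ch in options:
            child = node.children[slot(ch)]
            if child is not None:
                found = self._search(child, key, i + 1, padded or ch == "a")
                if found is not None:
                    return found
        return None


# Two-letter location substring followed by a one-letter time substring.
trie = AdaptiveTrie(widths=[2, 1])
trie.insert("baa", "+")  # state b, any county, all times
trie.insert("bcb", "-")  # state b, county c, time b
exact = trie.search("bcb")      # matches the specific rule
inherited = trie.search("bdb")  # county d has no rule, falls back to the state rule
```

Trying the exact letter before the padding letter at every position makes the search return the most specific matching rule under the precedence order of the substrings.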
32Performance Evaluation Study
- ASM-trie vs. main-memory trie with linear scan vs. Oracle linear scan.
- Machine: Xeon 2.4 GHz with 2 GB RAM.
- 100 search key sets and 30 data replicas
- The ASM-trie had a constant search time, around 32000 keys/sec.
- The ASM-trie exhibited linear space utilization, around 1200 access rules per MB.
- The difference between the ASM-trie and the regular trie can be attributed to the adaptive search.
[Figure legend: ASM-trie, Main Memory Linear scan, Oracle Linear scan]
33Comments on the Design
- The choice of a memory-resident approach.
- The limit on main memory size should not affect the scalability of the LS for several reasons
- The LS is implemented as a distributed system where every node is responsible for a specific service area
- 64-bit processors are becoming commonplace.
- New directions in implementing large-scale services, e.g., Google
- Rely on multiple cheap servers
- All the data is indexed in memory.
- This year, a first conference on data management on new hardware
34Summary (Part I)
- Contribution
- An access control model for moving objects and consumer profiles that supports granular representation.
- An efficient enforcement mechanism that utilizes a new data structure, the ASM-trie.
- A design of information flow that prevents merchants from sharing consumer information.
- Future work
- Disk-based ASM-trie
35Part II: A Solution with Focus on Expressiveness
36Another Criterion for Privacy
- Why do customers accept receiving advertisements?
- 1. Convenience of timely and location-based offers
- 2. Related to their interests.
- 3. Have incentives.
- The current privacy policy considers 1 and 2, but not 3.
- Can we add incentives to the privacy criteria?
- Yes, but this type of domain is difficult to model with some data structures like the hierarchies in Part I.
- In general, it is difficult to model exceptions in such hierarchies.
- Consider NYC (a city that is composed of five counties). It violates the hierarchy's structural properties.
37KR Techniques
- Knowledge Representation (KR) Techniques
- Modeling approaches based on KR techniques are more expressive
- KR techniques can be broadly classified into
- Logic-based and non-logic-based.
- Description Logics (DLs) are a class of logic-based KR that has been used recently as a basis for designing the Web Ontology Language (OWL)
- We propose a solution based on
- Modeling incentives and the other preferences as an ontology and
- Enforcing these preferences using DL reasoning techniques.
38Overview
- 1. A brief overview of DLs
- 2. Preferences Ontology
- 3. Query Processing
- 4. Implementation
- 5. Performance Evaluation Study
391. DLs: A Brief Overview
- The basic building block of KR in DLs is
- The concept, defined as a set of individuals
- Concepts and the IS-A relationship are used to build hierarchical terminologies (taxonomies).
- Terminologies are the intensional knowledge
- The extensional knowledge comes from assertions about individuals
- In addition to IS-A, DLs can represent other types of relationships
- Roles
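401.1 A Minimal DL Language and Its Interpretation
DLs have a well-defined model-theoretic interpretation. The following is the interpretation of the AL language.
\[
\begin{aligned}
\top^{\mathcal{I}} &= \Delta^{\mathcal{I}} && \text{(the universal concept, thing)} \\
\bot^{\mathcal{I}} &= \emptyset && \text{(bottom concept, nothing)} \\
(\neg A)^{\mathcal{I}} &= \Delta^{\mathcal{I}} \setminus A^{\mathcal{I}} && \text{(atomic negation)} \\
(C \sqcap D)^{\mathcal{I}} &= C^{\mathcal{I}} \cap D^{\mathcal{I}} && \text{(conjunction)} \\
(C \sqcup D)^{\mathcal{I}} &= C^{\mathcal{I}} \cup D^{\mathcal{I}} && \text{(disjunction)} \\
(\forall R.C)^{\mathcal{I}} &= \{ a \in \Delta^{\mathcal{I}} \mid \forall b.\ (a, b) \in R^{\mathcal{I}} \rightarrow b \in C^{\mathcal{I}} \} && \text{(value restriction)} \\
(\exists R.\top)^{\mathcal{I}} &= \{ a \in \Delta^{\mathcal{I}} \mid \exists b.\ (a, b) \in R^{\mathcal{I}} \} && \text{(limited existential quantification)}
\end{aligned}
\]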
411.2 SHIQ(D) and OWL
- SHIQ(D) is equivalent to AL plus full concept negation, transitive roles, qualified cardinality restrictions, role hierarchies, inverse roles, and datatypes.
- SHIQ(D) has a good balance between expressiveness and computational efficiency (computability and decidability)
- SHIQ(D) is almost equivalent to OWL
- For an excellent reference on DLs, see the DLHB.
421.3 Important Features of DLs
- Two types of terminology axioms: inclusion (of the form C ⊑ D) and equality (of the form C ≡ D).
- A definition is an equality with an atomic left side.
- A finite set of definitions T is called a terminology or TBox.
- A finite set of assertions about individuals is called an ABox.
- The open-world semantics
- The unique name assumption
431.4 Reasoning in DLs
- Assuming a knowledge base K, concepts C and D, and an individual a
- TBox reasoning includes
- Class subsumption queries: determine if C is a subclass of D with respect to K.
- Class hierarchy queries: given a class C, return all or the most-specific (most-general) superclasses (subclasses) of C in K.
- Class satisfiability queries: given a class C, determine if C is satisfiable (consistent) with respect to K.
- ABox reasoning includes
- Ground: determine whether a given individual a is an instance of C.
- Open: determine all the individuals in K that are instances of C.
- All-classes: given an individual a, determine all the classes in K that have a as an element.
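The query types above can be illustrated on a toy knowledge base. This sketch only follows explicitly stated subclass links, whereas a DL reasoner derives subsumption from concept definitions; the class names, individuals, and function names are hypothetical.

```python
from typing import Dict, List, Set

# Toy knowledge base: told subclass links (TBox) and typed individuals (ABox).
SUBCLASS_OF: Dict[str, str] = {"Hilton": "Hotels", "Hotels": "Merchants"}
TYPE_OF: Dict[str, str] = {"hilton_newark": "Hilton", "acme_diner": "Merchants"}


def superclasses(c: str) -> List[str]:
    """Class hierarchy query: all superclasses of c, most specific first."""
    chain = []
    while c in SUBCLASS_OF:
        c = SUBCLASS_OF[c]
        chain.append(c)
    return chain


def is_subclass(c: str, d: str) -> bool:
    """Class subsumption query: is c a subclass of d?"""
    return c == d or d in superclasses(c)


def is_instance(a: str, c: str) -> bool:
    """Ground ABox query: is individual a an instance of class c?"""
    return is_subclass(TYPE_OF[a], c)


def instances_of(c: str) -> Set[str]:
    """Open ABox query: all individuals that are instances of class c."""
    return {a for a in TYPE_OF if is_instance(a, c)}


hotel_instances = instances_of("Hotels")
```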
442. Preferences Ontology
- The ontology includes six taxonomies
- IncentiveType,
- IncentiveValue,
- Location,
- Time,
- Products,
- Merchants.
- Both consumer preferences and merchant queries are subsumed by a class called CPP (Consumer Privacy Preferences).
45Modeling Promotions
- Promotion Techniques
- Price reduction,
- Happy hour (i.e., price reduction for a short time),
- No payment for a specific period,
- Payment in installments,
- More items for free,
- Bundle (which could be homogeneous: reduced price for second item, or heterogeneous: another product at a reduced price),
- Premium (i.e., a free non-related product or service, e.g., free miles),
- Prize,
- Contest (i.e., based on a skill),
- Sweepstakes (i.e., based on chance), and
- Rebates or refund (i.e., cash refund, coupon refund, or escalating refund).
- We analyzed these techniques and found that a promotion includes
- Incentive type → IncentiveType Taxonomy
- Incentive value → IncentiveValue Taxonomy
- Conditions → Property Restrictions on the IncentiveType Taxonomy
462.1 Incentive Type Taxonomy
- IncentiveType ⊑ ⊤
- Monetary ⊑ IncentiveType
- Coupon ⊑ IncentiveType
- TimeSlack ⊑ IncentiveType
- ExtraItems ⊑ IncentiveType
- PayOnInstallments ⊑ IncentiveType
- InstantRefund ⊑ Monetary
- DelayedRefund ⊑ Monetary
- For each of these subclasses, an object property is defined.
- Promotion conditions are expressed as property restrictions on the class IncentiveType and its subclasses
- Example: product condition
- Property: hasProduct
- Range: Products_Services
- Restriction: allValuesFrom AllProduct, with cardinality 1.
472.2 Incentive Value Taxonomy
- The IncentiveValue taxonomy includes five
subclasses - PercentageReduction,
- ScalarReduction,
- Price,
- TimeSlack, and
- NumberOfInstallments
- The taxonomy also includes five datatype properties: hasPercentageValue, hasScalarValue, etc.
- The range for these properties is the XML integer data type.
- Example
- ≥20 IncentiveValue.hasPercentageValue
482.3 Location Taxonomy
- The main class in the location taxonomy is AllLocations, whose semantics is the set of all cities.
- Since we are using class subsumption for reasoning, we represented cities as primitive classes instead of individuals
- Example
492.4 Products and Services Taxonomy
- Used the United Nations Standard Products and Services Code (UNSPSC).
- UNSPSC provides a five-level taxonomy: Segment, Family, Class, Commodity, and Business Function.
- Imported from XML to OWL.
502.5 Time Taxonomy
- The main class in the time taxonomy is AllTimes, whose semantics is the set of all hours (in one year)
- You can express things like Labor Day even though it does not have a specific date.
512.6 Merchants Taxonomy
- Merchant taxonomy is similar to the industry hierarchy in Part I.
- Compatible with the Census Bureau's Classification of Industries
- Related to the Products and Services Taxonomy with two properties: hasProduct and hasService
522.7 Consumer Preferences
- ConsPref ≡ CPP ⊓
- (∀hasLocation.L1 ⊓
- ∀hasTime.T1 ⊓
- ∀hasProduct.P1 ⊓
- ∀hasMerchant.M1 ⊓
- ∀hasIncentiveType.IT1 ⊓
- ∀hasIncentiveValue.IV1)
- Naming: ConsPref_<Consumer ID>_<Preference number>
- ConsPref_12345_1 ≡ CPP ⊓
- (∀hasLocation.USA_CA_Cities ⊓
- ∀hasTime.WeekEnd ⊓
- ∀hasService.Lodging ⊓
- ∀hasMerchant.Hotels ⊓
- ∀hasIncentiveType.Percentage ⊓
- ∀ ≥20 hasIncentiveValue)
532.8 Merchant Queries
- MerchQuery ≡ CPP ⊓
- (∀hasLocation.L2 ⊓
- ∀hasTime.T2 ⊓
- ∀hasProduct.P2 ⊓
- ∀hasMerchant.M2 ⊓
- ∀hasIncentiveType.IT2 ⊓
- ∀hasIncentiveValue.IV2)
- Naming: MerchQuery_<Merchant ID>_<Query number>
- MerchQuery_Hilton_1 ≡ CPP ⊓
- (∀hasLocation.SanJose ⊓
- ∀hasTime.Saturday ⊓
- ∀hasService.Lodging ⊓
- ∀hasMerchant.Hilton ⊓
- ∀hasIncentiveType.Percentage ⊓
- ∀ ≥25 hasIncentiveValue)
543. Query Processing
- Queries are processed by computing the intersection between ConsPref_CID_n (P) and MerchQuery_MID_m (Q)
- Exact match: P ≡ Q → Permission: Grant
- Disjoint: P ⊓ Q ≡ ⊥ → Permission: Deny
- Subsumes: Q ⊑ P → Permission: Grant
- Subsumed: P ⊑ Q → Permission is undetermined
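The four cases can be illustrated with finite sets standing in for concept extensions. This is a toy model, not a DL reasoner: the function name and the city sets are hypothetical, and partial overlap, which the slide does not cover, is reported as unspecified.

```python
from typing import FrozenSet


def decide(p: FrozenSet[str], q: FrozenSet[str]) -> str:
    """Classify a consumer preference P against a merchant query Q."""
    if p == q:
        return "grant"         # exact match
    if not (p & q):
        return "deny"          # disjoint
    if q <= p:
        return "grant"         # P subsumes Q: the query is inside the preference
    if p <= q:
        return "undetermined"  # P is subsumed by Q: the query is broader
    return "unspecified"       # partial overlap, not covered by the slide


# Hypothetical location components of a preference and of three queries.
california = frozenset({"SanJose", "SanFrancisco", "LosAngeles"})
san_jose = frozenset({"SanJose"})
west_coast = california | frozenset({"Seattle"})
new_york = frozenset({"NYC"})

narrow_query = decide(california, san_jose)
broad_query = decide(california, west_coast)
unrelated_query = decide(california, new_york)
```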
553.1 Query Processing Algorithm
564.1 Semantic Enforcement Mechanism
574.2 Tools Used
- Protégé: Excellent KR editor with OWL plugin
- Uses Jena for persistent storage
- Uses DIG interface to communicate with reasoners
- Has visualization tools, query tools, etc.
- Racer (or FaCT): DL reasoner that supports DIG interface
- OWL API: converts OWL ontologies to DIG format
- The DIG interface does not support datatype properties.
- How to get around it?
584.3 Getting Around DIG Problems
- In the IncentiveValue taxonomy, we modeled the values as a set of recursive subclasses that represent value levels.
- For example, for the percentage subclass, the first class is GreaterThan5, the second is GreaterThan10, and so forth.
- We used the complement of the incentive value submitted by the merchant
- For the scalar subclass (which is infinite), we introduced an artificial ceiling and used unequal steps (e.g., 50, 100, 200, 500).
- We followed a similar approach with the other taxonomies.
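The level-based workaround can be sketched by mapping a numeric value to the most specific GreaterThanN class and comparing classes by threshold. The step lists are assumptions (the slide names only GreaterThan5 and GreaterThan10 and the scalar steps 50, 100, 200, 500), the function names are hypothetical, and the use of the complement of the merchant's value is not modeled here.

```python
from bisect import bisect_left
from typing import List, Optional

PERCENT_STEPS = list(range(5, 100, 5))  # GreaterThan5, GreaterThan10, ... (assumed step of 5)
SCALAR_STEPS = [50, 100, 200, 500]      # unequal steps up to an artificial ceiling

PREFIX = "GreaterThan"


def level_class(value: float, steps: List[int]) -> Optional[str]:
    """Most specific GreaterThanN class that contains the value, if any."""
    i = bisect_left(steps, value)  # number of thresholds strictly below the value
    return f"{PREFIX}{steps[i - 1]}" if i else None


def is_subclass(sub: str, sup: str) -> bool:
    """GreaterThanM is a subclass of GreaterThanN exactly when M >= N."""
    return int(sub[len(PREFIX):]) >= int(sup[len(PREFIX):])


offer = level_class(25, PERCENT_STEPS)        # a 25 percent offer
satisfies = is_subclass("GreaterThan20", "GreaterThan20") and offer == "GreaterThan20"
large_scalar = level_class(1000, SCALAR_STEPS)  # values above the ceiling share one class
```

With numeric values turned into named classes, comparing an offer with a preference becomes an ordinary class subsumption test, which is the kind of query the DIG interface can carry.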
595. Performance of matching using RACER for query
sizes from 15 to 90
60Summary (Part II)
- Performance
- Query size is more significant than ontology size for the reasoner
- Contribution
- Promotion Model
- Preference Ontology
- Enforcement Mechanism
- Algorithm for semantic-based query processing
- Future work
- Combining the two approaches in one solution that provides efficiency and expressiveness.
- Examining other query processing approaches, e.g., using an instance store, and asserting multiple queries before processing them.