Title: A comprehensive model for Data Quality, Value of Data, and User Interface Design
Slide 1: A comprehensive model for Data Quality, Value of Data, and User Interface Design
- Andrew U. Frank
- Geoinformation
- TU Vienna
- frank_at_geoinfo.tuwien.ac.at
Slide 2: What are the most important problems hindering the wide use of GIS today?
- Gueting said: support for temporal data.
- Spaccapietra said: semantics.
Slide 3: What are the most important practical problems for the GI industry?
- Consider that the market for GI in Europe is only 1/10 of the comparable market in the USA (with approximately the same population).
- Impediments for business:
- User Interface
- Value of Data
- Data Quality
Slide 4: A comprehensive model of GI use
- Different GIS applications operate with very different concepts of what the GIS produces:
- Produce maps (for decision makers)
- Analyze situations
- Explore data
- Each time, a different user interface must be
learned, which is a high cost and a large
impediment.
Slide 5: Economic value of information
- (Geographic) information can only be used to improve decisions.
- This is the only situation in which data can produce economic value.
- Read:
- Shapiro and Varian, Information Rules: A Strategic Guide to the Network Economy
Slide 6: Model of rational decision making
- A rational man (a.k.a. homo economicus) chooses among actions so that his well-being is optimized.
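Written as a formula (our notation, not the slides'): with A the set of candidate actions and U(a) the utility, i.e. the resulting well-being, of action a, the rational decision maker picks the action

```latex
a^{*} = \arg\max_{a \in A} \, U(a)
```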
Slide 7: Multiple critiques
- Not just economic (monetary) optimization, but general well-being.
- Bounded rationality: neither the information nor the inference resources needed to make the optimal decision are available.
Slide 8: The model of rational decision making is (only) a model
- Descriptive model: it is often used when we rationalize our behavior after the fact.
- We explain our actions in terms of optimizing our utility.
- Prescriptive model: for administrative decisions, the model is used to justify a decision and to communicate the arguments to others.
Slide 9: Core model of rational decision making
- Produce all candidate actions.
- Exclude actions by non-compensatory criteria.
- Evaluate the utility of the remaining candidate actions using compensatory criteria and weights.
- Select the best action (i.e., the action with the highest utility).
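The four steps can be sketched in Python as follows, assuming property values have already been normalized and that each non-compensatory criterion is a minimum value. The `Candidate` class, the `choose` function, and the hotel data are hypothetical illustrations, not part of the original model.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A candidate action, e.g. booking a particular hotel (hypothetical)."""
    name: str
    properties: dict = field(default_factory=dict)  # normalized values, e.g. 0..10


def choose(candidates, knockout, weights):
    """Core model of rational decision making (sketch).

    knockout: property -> minimal acceptable value (non-compensatory criteria)
    weights:  property -> utility per unit of the property (compensatory criteria)
    Returns the name of the best candidate (or None) and the scored survivors.
    """
    # Step 1: `candidates` is assumed to already contain all candidate actions.
    # Step 2: exclude actions that fail any non-compensatory criterion;
    # a missing property counts as failing.
    acceptable = [
        c for c in candidates
        if all(c.properties.get(p, float("-inf")) >= t for p, t in knockout.items())
    ]
    # Step 3: utility of each remaining action as a weighted sum.
    scored = [
        (sum(w * c.properties.get(p, 0.0) for p, w in weights.items()), c.name)
        for c in acceptable
    ]
    # Step 4: select the action with the highest utility.
    if not scored:
        return None, scored
    return max(scored)[1], scored


hotels = [
    Candidate("Hotel A", {"beach": 8, "classification": 6, "garden": 2}),
    Candidate("Hotel B", {"beach": 3, "classification": 9, "garden": 7}),
    Candidate("Hotel C", {"beach": 9, "classification": 4, "garden": 9}),
]
best, scored = choose(hotels, knockout={"classification": 5},
                      weights={"beach": 2.0, "garden": 1.0})
```

Hotel C scores highest on beach and garden, but the classification K.O. criterion excludes it, so Hotel A (utility 18.0) beats Hotel B (13.0). Properties where more is worse, such as price or noise, would get a negative weight.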
Slide 10: Actions change the state of the world
Slide 11: Candidate hotels for a weekend
Slide 19: My criteria
- Distance to beach
- Classification of hotel
- Restaurant
- Garden
- Trail access
- Noise
- Price
Slide 20: Collection of data for these criteria
Slide 21: Normalize data
- Data is collected on different measurement scales (cf. Stevens's paper in Science, 1946).
- Make the data comparable by normalizing it, for example to a scale of 0..10 (or 0..1), while allowing both positive and negative utility.
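A sketch of one possible normalization, assuming linear min-max scaling onto 0..10 (the slides do not prescribe a method). Min-max scaling is only meaningful for interval or ratio data in Stevens's sense; ordinal data such as the hotel classification would first need an explicit mapping to scores. Here negative utility is left to the sign of the weight (e.g. a negative weight for distance); that is our assumption.

```python
def normalize(values, lo=0.0, hi=10.0):
    """Linearly rescale raw measurements onto lo..hi (min-max scaling)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # The property does not discriminate between candidates; any constant
        # would do, since it cannot change the ranking.
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


# Hypothetical distances to the beach in metres for three hotels.
beach_scores = normalize([100, 400, 1600])  # [0.0, 2.0, 10.0]
```

Because a larger distance is worse, this property would then receive a negative weight.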
Slide 22: Non-compensatory criteria
- Non-compensatory criteria (a.k.a. K.O. criteria) must be fulfilled for a candidate to be acceptable.
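As a sketch, assuming each K.O. criterion is given as an optional minimum and maximum on one property (the names and data are hypothetical):

```python
def passes_knockout(properties, criteria):
    """True if every non-compensatory (K.O.) criterion is fulfilled.

    criteria: property -> (minimum, maximum); None means no bound on that side.
    A missing property counts as not fulfilled.
    """
    for prop, (minimum, maximum) in criteria.items():
        value = properties.get(prop)
        if value is None:
            return False
        if minimum is not None and value < minimum:
            return False
        if maximum is not None and value > maximum:
            return False
    return True


criteria = {"price": (None, 120), "classification": (3, None)}
hotels = {
    "Hotel A": {"price": 95, "classification": 3},
    "Hotel B": {"price": 150, "classification": 5},
    "Hotel C": {"price": 60, "classification": 2},
}
acceptable = [name for name, props in hotels.items() if passes_knockout(props, criteria)]
```

Hotel B's high classification cannot compensate for its price, and Hotel C's low price cannot compensate for its classification; only Hotel A remains. This is what separates K.O. criteria from compensatory ones.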
Slide 23: Compensatory criteria
- These criteria list the contributions of the properties of the candidate actions.
- Weights indicate the contribution to utility per unit of the property.
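Read literally, "contribution to utility per unit of the property" gives a weighted sum. A minimal sketch, assuming normalized property values and hypothetical weights:

```python
def utility(properties, weights):
    """Compensatory utility: the sum of weight * value over the weighted properties.

    Each weight is the utility contributed per unit of its property; a negative
    weight marks a property that reduces utility (e.g. noise).
    """
    return sum(w * properties[p] for p, w in weights.items())


weights = {"beach": 2.0, "garden": 1.0, "noise": -1.5}
hotel_a = {"beach": 8, "garden": 2, "noise": 4}
hotel_b = {"beach": 5, "garden": 9, "noise": 2}
u_a = utility(hotel_a, weights)  # 16 + 2 - 6 = 12.0
u_b = utility(hotel_b, weights)  # 10 + 9 - 3 = 16.0
```

Hotel B is farther from the beach but compensates with its garden and lower noise.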
Slide 24: Unifying criteria
Slide 25: Interaction with the spreadsheet
- The weights are not well determined; this is one of the major critiques of the method.
- Too many non-compensatory criteria: no candidates are left.
- Reduce the non-compensatory criteria.
- Many similar solutions: reduce the weight of the common criterion.
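For the case of too many non-compensatory criteria, a spreadsheet could count how many candidates each criterion excludes, so the user knows which one to relax first. This is a sketch under our own assumptions (minimum thresholds only, hypothetical data, and a heuristic that is ours, not the slides'); it does not cover the second case of many similar solutions.

```python
def survivors(candidates, criteria):
    """Candidates that reach the minimum of every K.O. criterion."""
    return [c for c in candidates
            if all(c.get(p, float("-inf")) >= m for p, m in criteria.items())]


def exclusion_counts(candidates, criteria):
    """For each K.O. criterion, the number of candidates it excludes on its own."""
    return {p: sum(1 for c in candidates if c.get(p, float("-inf")) < m)
            for p, m in criteria.items()}


hotels = [
    {"name": "A", "beach": 9, "garden": 2, "classification": 3},
    {"name": "B", "beach": 4, "garden": 8, "classification": 4},
    {"name": "C", "beach": 7, "garden": 4, "classification": 2},
]
criteria = {"beach": 6, "garden": 5, "classification": 3}

left = survivors(hotels, criteria)            # empty: every hotel fails a criterion
counts = exclusion_counts(hotels, criteria)   # garden excludes 2, the others 1 each
worst = max(counts, key=counts.get)           # the criterion to relax first
relaxed = {p: m for p, m in criteria.items() if p != worst}
remaining = [h["name"] for h in survivors(hotels, relaxed)]
```

Dropping the garden criterion (or turning it into a weight) leaves hotel A.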
Slide 26: User interaction style
- The user interface must be based on direct manipulation and not require a rational analysis,
- but it must give a feeling for the connections between the criteria and the optimal selection.
Slide 27: User interface considerations
- Shneiderman has pointed out that the only interface style that works consistently is direct manipulation. Such interfaces exploit human abilities that are based not on verbal (rational) understanding but on the connection between actions and reactions.
- Direct manipulation:
- The user has some controls, and the result reacts immediately to changes.
Slide 28: Emotional aspects
- Experience shows that users play with the weights until the solution feels right.
- This means that the solution is emotionally acceptable.
- Modern neurophysiology has observed that actual decision making in the human brain is not rational but emotionally controlled.
- Insert a property "likable" and assess each candidate on it. The weight then given to this property indicates the emotional influence.
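One way to read off the emotional influence, as a sketch: once the user has settled on weights, take the weight of the `likable` property relative to all weights. Expressing it as a share of the total absolute weight is our assumption, and it only makes sense if all properties are normalized to the same scale; the weights are hypothetical.

```python
def emotional_share(weights, emotional_property="likable"):
    """Share of the total absolute weight carried by the 'likable' property."""
    total = sum(abs(w) for w in weights.values())
    if total == 0:
        return 0.0
    return abs(weights.get(emotional_property, 0.0)) / total


# Hypothetical weights after the user has played with them until it "feels right".
weights = {"beach": 3.0, "price": -2.0, "garden": 1.0, "likable": 4.0}
share = emotional_share(weights)  # 4 / 10 = 0.4
```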
Slide 29: What are the controls in the rational decision model?
- Non-compensatory criteria:
- Threshold for fulfillment.
- Compensatory criteria:
- Weight.
- Which data is considered: the properties for which either a threshold or a weight is set.
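The controls could be represented as one record per considered property, holding either a threshold or a weight. A sketch with hypothetical names (`Control`, `prop`, `threshold`, `weight`), assuming exactly one of the two is set, as the slide states:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Control:
    """User control for one property: a threshold (K.O.) or a weight (compensatory)."""
    prop: str
    threshold: Optional[float] = None
    weight: Optional[float] = None

    def __post_init__(self):
        # A considered property has exactly one control: a threshold or a weight.
        if (self.threshold is None) == (self.weight is None):
            raise ValueError(f"{self.prop}: set either a threshold or a weight")


# Properties without a control (here: restaurant, noise) are simply not considered.
controls = [
    Control("classification", threshold=3),
    Control("beach", weight=2.0),
    Control("price", weight=-1.0),
]
thresholds = {c.prop: c.threshold for c in controls if c.threshold is not None}
weights = {c.prop: c.weight for c in controls if c.weight is not None}
```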
Slide 30: A first sketch of an interface
- A very simple interface.
- The interface is completely in the user's language.
Slide 31: A general user interface, because the model is general
- The rational decision model is general: EVERY decision is modeled this way.
- Users have to learn only one conceptual model, not many different ones.
Slide 32: The decision model links directly to the user's task
- Intermediate elements are excluded, which simplifies the conceptualization (less is better!).
- Compare with the standard approach: the GIS produces a map, which is used as input to the decision process.
- Many details of the map's form must be fixed that are not relevant for the decision process.
- The user interface must have controls for these.
Slide 33: Value of a decision
- In the model of rational decision making, the value of data can be estimated.
- The value of the data is the improvement of the decision compared to a decision without information.
- For decisions on actions that have a cost, the difference between the highest and the lowest cost can be used as an estimate for the value of the decision.
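The estimate on this slide is a one-liner. A sketch with hypothetical prices; the second function, which treats the uninformed choice as a uniformly random pick and therefore gives a smaller estimate, is our own variant added for comparison, not part of the slides.

```python
def value_estimate_spread(costs):
    """Slide's estimate: difference between the highest and the lowest cost.

    One reading: the gain from picking the cheapest action instead of,
    in the worst case, the most expensive one.
    """
    return max(costs) - min(costs)


def value_estimate_random(costs):
    """Variant (assumption): gain over an uninformed, uniformly random choice."""
    return sum(costs) / len(costs) - min(costs)


weekend_costs = [240, 310, 180]  # hypothetical prices for three hotels
spread = value_estimate_spread(weekend_costs)       # 130
random_gain = value_estimate_random(weekend_costs)  # about 63.33
```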
Slide 34: Value of data
- Properties that have more weight contribute more to the decision. The value of the decision can be distributed to the data according to the weights.
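A sketch of this distribution, assuming shares proportional to the absolute weights, so that a property with a negative weight (such as price) still counts as informative; the numbers are hypothetical.

```python
def distribute_value(decision_value, weights):
    """Split the value of a decision over the data, proportionally to |weight|."""
    total = sum(abs(w) for w in weights.values())
    return {p: decision_value * abs(w) / total for p, w in weights.items()}


shares = distribute_value(120, {"beach": 3.0, "price": -2.0, "garden": 1.0})
# {"beach": 60.0, "price": 40.0, "garden": 20.0}
```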
Slide 35: Price of data
- The value of the data is not the price at which it can be sold.
- Deduct the cost of obtaining and using it.
- The price must be set for many users; the value is specific to a decision.
- Opportunities for specialized user interfaces, connections to data collections, and thus BUSINESS.
Slide 36: Data quality
- The quality of data is typically measured from the perspective of the data producer. Metadata standards codify this approach.
- Observations indicate that users do not use metadata. How should a user decide on the usability of data from metadata?
Slide 37: Data quality from a user perspective
- Data is good if it leads to the best decision. It is bad if it makes me take the wrong decision.
- Data quality is the risk of my making the wrong decision.
Slide 38: Can we translate a producer's assessment of data quality into the risk of the user making the wrong decision?
- Example: precision
- The producer of the data states that the distance to the beach is 100 m ± 50 m (one standard deviation; about 68% of all values lie between 50 and 150 m).
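One way to make the translation concrete, assuming the stated error is normally distributed and the user has a K.O. criterion such as "at most 150 m to the beach" (a hypothetical threshold): the risk is the probability that the true distance violates the threshold although the recorded value fulfils it.

```python
import math


def prob_exceeds(mean, sd, threshold):
    """P(true value > threshold) for a normally distributed measurement.

    Uses the complementary error function: P(X > t) = erfc(z / sqrt(2)) / 2,
    with z = (t - mean) / sd.
    """
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Producer: distance to the beach 100 m with a standard deviation of 50 m.
risk_150 = prob_exceeds(100, 50, 150)  # about 0.159 (z = 1)
risk_200 = prob_exceeds(100, 50, 200)  # about 0.023 (z = 2)
```

The same producer statement thus means a different risk for users with different requirements.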
Slide 39: Translation of completeness into risk
- Incomplete data can make us miss the best solution. The risk is comparable to the amount of missing data.
Slide 40: Example
- 50% of the data are missing (realistic when selecting hotels by browsing the web).
- Reduce the value of the data proportionally to the risk.
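A minimal sketch of this rule, assuming the best candidate is as likely to be missing as any other one, so that the risk of missing it equals the fraction of missing data; the value of 130 is hypothetical.

```python
def value_with_missing_data(value, missing_fraction):
    """Reduce the value of the data proportionally to the risk of missing the best candidate."""
    if not 0.0 <= missing_fraction <= 1.0:
        raise ValueError("missing_fraction must be between 0 and 1")
    # If candidates are missing at random, the best one is missing with
    # probability equal to the missing fraction.
    risk = missing_fraction
    return value * (1.0 - risk)


reduced = value_with_missing_data(130, 0.5)  # 65.0
```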
Slide 41: Temporal currency
- Temporal currency is a standard data quality element.
- Temporal currency is not separable from the other criteria.
Slide 42: Effects of temporal currency
- The time passed since collection reduces:
- Precision.
- Completeness (omissions, commissions).
Slide 43: Data does not change, but its quality is diluted with time
Slide 44: Estimate movement per period and reduce precision proportionally
- Estimate the appearance/disappearance of objects.
- Reduce completeness proportionally.
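A sketch of this conversion, assuming the standard deviation grows linearly with the estimated movement per year and completeness drops linearly with the estimated rate of appearing/disappearing objects; the function name, parameters, and numbers are hypothetical.

```python
def aged_quality(sd0, completeness0, years, movement_per_year, change_rate_per_year):
    """Dilute data quality with the time passed since collection (linear sketch).

    sd0: standard deviation of positions at collection time (e.g. metres)
    completeness0: fraction of real objects recorded at collection (0..1)
    movement_per_year: estimated movement of objects per year (unit of sd0)
    change_rate_per_year: estimated fraction of objects appearing or disappearing per year
    """
    # Precision: uncertainty grows proportionally with the elapsed time.
    sd = sd0 + movement_per_year * years
    # Completeness: omissions and commissions accumulate proportionally.
    completeness = max(0.0, completeness0 - change_rate_per_year * years)
    return sd, completeness


sd, completeness = aged_quality(5.0, 0.95, years=3,
                                movement_per_year=2.0, change_rate_per_year=0.05)
# sd == 11.0, completeness about 0.80
```

The aged precision and completeness can then be translated into risk as on slides 38 to 40.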
Slide 45: The decision model translates data quality into risk
- The decision model translates:
- data quality into risk, and
- risk into a reduction in the value of the data.
Slide 46: Conclusion
- The model of rational decision making gives a single conceptual framework in which three important practical problems of today's use of Geographic Information can be discussed.
Slide 47: User interface
- Decisions can be modeled as the selection of the action that optimizes utility, given some conditions.
- The user must select:
- the elements that influence the decision (selection of data layers, themes, etc.),
- the candidate actions,
- the minimal requirement for each property,
- his preferences, translated into weights for each property.
- This is the same for many (all?) decision situations.
Slide 48: Value of data
- The value of the data lies in the improvement of the decision. The contribution of each data element corresponds to the weight of its property.
Slide 49: Data quality from a user perspective
- Better data reduces the risk of making a wrong decision.
- Precision and completeness can be translated directly into the risk of making a wrong decision, which reduces the value of the data.
- Temporal currency is first converted into reduced precision and completeness (this should be done by the data provider).
Slide 50: Closed-loop semantics
- My answer to the problem of semantics:
- Link the observation semantics in the database to the action semantics in the decision.
Slide 51: My choice