Title: A method to propagate permissions in biomedical data using a semantic web framework
1. A method to propagate permissions in biomedical data using a semantic web framework
- Helena F. Deus and Jonas S. Almeida
- hdeus_at_mathbiol.org
- The University of Texas M. D. Anderson Cancer Center
2. History of the web
Web 1.0: Links -> Documents
Web 2.0: Links -> Data Structures -> Web Services
Web 3.0: Links -> Web Services -> Links -> Web Services -> Links -> Web Services ...
3. Evolution of data representation
Nature Biotechnology. 2005 Vol 23 Nr 29
4. Data management in the life sciences
Clinical/Medical data
Electronic Health Records
RDBMS
Life is good!
5. Heterogeneous data management
Core facilities data
Clinical/Medical data
DNA Sequencing
Microarrays
RDBMS
Protein Arrays
Data everywhere!
Pulse Field Gel Electrophoresis
6. Semantic web of data: a set of best practices
7. A data pyramid
Wisdom
Knowledge
W3C
OWL, OBO
RDF
SPARQL
Information
XML
TEXT
Data
Files
8. S3DB Core Model
9. Snapshots of interfaces using S3DB's API (Application Programming
Interface). These applications exemplify why semantic web designs can
be particularly effective at enabling generic tools to assist users in
exploring data that document very specific and very complex
relationships. Snapshot A was taken from S3DB's web interface, which is
included in the downloadable package. This interface was developed to
assist in managing the database model and is therefore centered on the
visualization and manipulation of the domain of discourse: its
Collections of Items and the Rules defining the documentation of their
relations. The application depicted in snapshots B-D is a document
management tool, S3DBdoc, freely available as a Bioinformatics Station
module (see Figure 6). Navigation starts from the Project (C), moves to
the Collection (B), and ends with the editing of the Statements about
an Item (D). Snapshot B illustrates an intermediate step in the
navigation, where the list of Items (in this case samples assayed by
tissue arrays, for which there is clinical information about the donor)
is being trimmed according to a property of a distant entity, Age at
Diagnosis, which belongs to the Clinical Information Collection
associated with the sample that originated the array results. This
interaction would have been difficult and computationally intensive to
manage using a relational architecture. The RDF-formatted query result
produced by the API was also visualized using a commercial tool,
Sentient Knowledge Explorer (IO-Informatics Inc), shown in snapshot E,
and using Welkin, shown in snapshot F, developed by the digital
interoperability SIMILE project at the Massachusetts Institute of
Technology. See text for discussion of the graphic representations
produced by these tools. To protect patient confidentiality, some
values in snapshots B and D are scrambled, and numeric sample and
patient identifiers elsewhere are altered.
PLoS ONE. 2008 Aug 13;3(8):e2946
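The trimming step described for snapshot B (filtering Items by a property of a distant entity) can be pictured as a short walk over subject-predicate-object triples. The following is a minimal sketch only, not S3DB's API: the identifiers `sample1`, `clin1`, `hasClinicalInfo`, and `ageAtDiagnosis`, and the ages themselves, are made-up placeholders chosen for illustration.

```python
# Hypothetical in-memory triple store: (subject, predicate, object).
# All identifiers and values are illustrative assumptions, not S3DB data.
triples = [
    ("sample1", "hasClinicalInfo", "clin1"),
    ("sample2", "hasClinicalInfo", "clin2"),
    ("clin1", "ageAtDiagnosis", 47),
    ("clin2", "ageAtDiagnosis", 63),
]


def objects(subject, predicate):
    """Return every object attached to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def samples_by_age(min_age):
    """Keep samples whose linked clinical record has an age >= min_age."""
    hits = []
    for s, p, o in triples:
        if p != "hasClinicalInfo":
            continue
        # Follow the link to the distant entity and test its property.
        if any(age >= min_age for age in objects(o, "ageAtDiagnosis")):
            hits.append(s)
    return hits


print(samples_by_age(50))
```

Because every relation is a triple, following one more link to reach a distant property is the same operation as reading a direct one, which is the point the caption makes about generic tools.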
10. Example TCGA data structure
http://tcga.s3db.org
11. S3DB Rule
http://tcga.s3db.org/R247
Patient
Sample
??
blood
tumor
S3DB Statement
http://tcga.s3db.org/S234
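One possible reading of this slide is that a Rule relates two Collections in the domain of discourse, and a Statement instantiates that Rule for a particular Item with a concrete value. The sketch below encodes that reading under stated assumptions: the class layout, the `make_statement` helper, the Item URI, and the choice of "tumor" as the value are all hypothetical, and the verb is left as a placeholder because the slide does not specify it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    uri: str
    subject: str  # name of the subject Collection
    verb: str     # left unspecified on the slide
    object: str   # name of the object Collection


@dataclass(frozen=True)
class Item:
    uri: str
    collection: str


@dataclass(frozen=True)
class Statement:
    uri: str
    item: Item
    rule: Rule
    value: str


def make_statement(uri, item, rule, value):
    """Hypothetical helper: only accept Items from the Rule's subject Collection."""
    if item.collection != rule.subject:
        raise ValueError(f"{item.uri} is not in Collection {rule.subject}")
    return Statement(uri, item, rule, value)


# The Rule and Statement URIs come from the slide; the verb is a placeholder.
rule = Rule("http://tcga.s3db.org/R247", "Patient", "UNSPECIFIED_VERB", "Sample")
# The Item URI below is an assumption made for illustration only.
patient = Item("http://tcga.s3db.org/I1", "Patient")
stmt = make_statement("http://tcga.s3db.org/S234", patient, rule, "tumor")
print(stmt.rule.subject, stmt.rule.verb, stmt.rule.object, "=", stmt.value)
```

Keeping the Rule separate from the Statement mirrors the distinction between the domain of discourse and its instantiation.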
12. TCGA domain - instance
PLoS ONE. 2008 Dec;3(12):e4076
13. SPARQL
14. Code portability and distributed data
API
API
SPARQL
15. Permission management
Markov Model
16. Permission propagation
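The slide gives only a title, so the following is a generic illustration of propagation rather than the method presented in the talk: a permission set explicitly on an entity wins, and otherwise the entity inherits from its nearest ancestor. The Project, Collection, Item, Statement chain follows the navigation order described earlier; the permission labels and the `effective_permission` function are assumptions, and the Markov Model mentioned on the previous slide is not implemented here.

```python
# Assumed containment chain: each entity points to its parent.
parent = {
    "collection": "project",
    "item": "collection",
    "statement": "item",
}

# Hypothetical explicit permissions; entities not listed inherit from above.
explicit = {
    "project": "view+edit",
    "item": "view",
}


def effective_permission(entity):
    """Walk up the chain until an explicitly assigned permission is found."""
    while entity is not None:
        if entity in explicit:
            return explicit[entity]
        entity = parent.get(entity)  # None once the top of the chain is passed
    return None  # nothing assigned anywhere on the path


for name in ("project", "collection", "item", "statement"):
    print(name, "->", effective_permission(name))
```

Here the Statement inherits the more restrictive value set on its Item, while the Collection falls back to the Project, which shows why some explicit granularity in the data is needed before permissions can be attached at all.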
17. Experimental evolving ontologies
MGED and others
Current entry level for computation
Experimental, evolving Data Models
Proposed entry level for computation
Raw data
19. S3DB.ORG
What is S3DB?
- It is a web service that manages semantic web content, distinguishing the domain of discourse from its instantiation. It was configured specifically for the needs of Biomedical Informatics projects where:
  - Those who submit the data keep fine-tuned control over its access and use.
  - The data model is deployed over a core ontology that allows its editing.
  - It has a distributed deployment designed to deal with heterogeneous environments.
What is S3DB not?
- It is not a client application.
- It is not a work in progress: a SPARQL endpoint ensures that experimental data is not kept outside of the Linked Data Web until it matures.
20. In Conclusion
- Dissolution of boundaries between data structures is a good thing, but doing it without losing the role of each data element is even better.
- Some level of explicit granularity in the data is necessary to implement a permission model.
21. Acknowledgements
Jonas S. Almeida, Kadir Akdemir, Miriã Coelho, Cintia Palú, Pablo Freire
The Integrative Bioinformatics Lab at the University of Texas MD Anderson Cancer Center (Houston, TX)
Instituto de Tecnologia Quimica e Biologica, Universidade Nova de Lisboa (Lisbon, Portugal)
http://s3db.org