Title: Exploiting Connector Knowledge to Efficiently Disseminate Highly Voluminous Data Sets
1. Exploiting Connector Knowledge to Efficiently Disseminate Highly Voluminous Data Sets
- Chris A. Mattmann
- David Woollard
- Nenad Medvidovic
- 3rd Workshop on SHaring and reusing ARchitectural Knowledge
- 30th Intl Conference on Software Engineering (ICSE08)
- Leipzig, Germany
- Tuesday, April 15, 2008
2. Outline
- Problem Space
- Connector Selection
- Connector Knowledge
- Insight
- Observation
- DISCO: A Framework for Connector Selection
- Results
- Conclusion
3. Data Distribution Scenarios
- A Backup Site periodically connects across the WAN to the Digital Movie Repository to back up its entire catalog and archive of over 20 terabytes of movie data and metadata.
- A medium-sized volume of data, e.g., on the order of a gigabyte, needs to be delivered across a LAN to a single user, using multiple delivery intervals of 10 megabytes each.
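The two scenarios can be written down as structured records, which makes the implied arithmetic explicit. The sketch below is only an illustration: the class name `DistributionScenario`, its field names, and the use of decimal units (1 GB = 10^9 bytes) are assumptions, not part of the talk. The backup scenario does not state an interval size, so that field is left unset, and its single recipient is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Decimal units are an assumption; the slides only say "megabytes", "gigabyte", "terabytes".
MB = 10**6
GB = 10**9
TB = 10**12


@dataclass(frozen=True)
class DistributionScenario:
    """Hypothetical record of one data distribution scenario."""
    name: str
    total_volume_bytes: int
    network: str                          # "WAN" or "LAN"
    num_users: int
    interval_bytes: Optional[int] = None  # None when the interval size is not stated

    def num_intervals(self) -> Optional[int]:
        """Number of delivery intervals needed, rounded up; None if unknown."""
        if self.interval_bytes is None:
            return None
        # Integer ceiling division avoids floating-point rounding.
        return -(-self.total_volume_bytes // self.interval_bytes)


# "Over 20 terabytes" is recorded here as a lower bound of 20 TB.
backup = DistributionScenario("backup site", 20 * TB, "WAN", num_users=1)
lan_delivery = DistributionScenario(
    "LAN delivery", 1 * GB, "LAN", num_users=1, interval_bytes=10 * MB
)

print(lan_delivery.num_intervals())  # 100
```

Under these assumptions, the LAN scenario works out to 100 delivery intervals of 10 MB each.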
4. Challenges of Selecting the Right Connector Technology
- Which one is the best one?
- Given our current architecture?
- Given our distribution scenarios and requirements?
- Candidate connectors: XML-RPC, UFTP, Siena, GridFTP, bbFTP, FTP, Aspera, CORBA, RMI, SOAP, BitTorrent, SFTP, HTTP/REST, JXTA, SCP, GLIDE/PRISM-MW
5. This is an Architectural Decision
- Architectural decisions (such as connector selection) impact functional and non-functional properties of the overall data distribution system architecture
- It does matter what connector you select
- Functional (performance)
- Efficiency, consistency, scalability, dependability of the data transfer
- Non-functional (e.g., interoperability, security)
- We assert that this process has largely remained an art form and forces organizations to rely on organizational gurus whose knowledge is never encoded or understood
6. The Role of Architectural Knowledge
- Connector selection is so difficult because there is no reproducible way to make the connector selection that a guru would make
- Why?
- Lack of an audit trail
- Lack of Architectural Knowledge
- Our work defines and captures this knowledge about connectors!
7. Two Types of Knowledge
- Insight
- This is the inherent knowledge about the architectural properties of a connector that make it suitable for a particular distribution scenario
- Example: Because connector A has Cache-based data access, it is inherently more scalable (than a connector B with Session-based access) and ultimately more applicable for larger volume scenarios
- Key connector architectural property: data access transient availability, which can have values of Peer, Cache, or Session-based.
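An insight of this kind can be encoded as a small rule over connector metadata. The following is a minimal sketch, not DISCO's actual representation: the names `TransientAvailability`, `Connector`, `SCALABILITY_INSIGHTS`, and `more_scalable` are hypothetical. It encodes only the one ordering stated in the example (Cache-based over Session-based) and deliberately returns no answer for pairs the slide says nothing about, such as Peer-based access.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TransientAvailability(Enum):
    """Values of the data access transient availability property."""
    PEER = "Peer"
    CACHE = "Cache"
    SESSION = "Session"


@dataclass(frozen=True)
class Connector:
    """Hypothetical connector record carrying one architectural property."""
    name: str
    data_access: TransientAvailability


# Each pair (x, y) reads "x-based access is more scalable than y-based access".
# Only the ordering given in the slide's example is recorded.
SCALABILITY_INSIGHTS = {
    (TransientAvailability.CACHE, TransientAvailability.SESSION),
}


def more_scalable(a: Connector, b: Connector) -> Optional[Connector]:
    """Return the connector the insights favor, or None if they are silent."""
    if (a.data_access, b.data_access) in SCALABILITY_INSIGHTS:
        return a
    if (b.data_access, a.data_access) in SCALABILITY_INSIGHTS:
        return b
    return None


connector_a = Connector("A", TransientAvailability.CACHE)
connector_b = Connector("B", TransientAvailability.SESSION)
print(more_scalable(connector_a, connector_b).name)  # A
```

Keeping the insights as explicit data rather than hard-coded branches means new insights can be added without changing the comparison logic.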
8. Two Types of Knowledge
- Observation
- The observed characteristics of a connector that, based on past experience using it in a single data distribution scenario (or a family of them), make it either applicable or inapplicable for a scenario
- Example: Because connector A successfully delivered 100 MB of data while maintaining a transfer rate near 10 MB/sec, it is more efficient and scalable than connector B, which had 3 drops in connection and a transfer rate varying from 3 MB/sec to 8 MB/sec while delivering around 90 MB of the same data as connector A.
- Key observable properties: efficiency, scalability
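The observation example can be sketched as a pair of records and a simple comparison. This is an illustrative sketch only: the `Observation` fields and the `dominates` rule are assumptions about how such measurements might be compared, not the method used in the talk. Connector A's rate is taken as a steady 10 MB/sec and its drop count as 0, which the slide implies ("successfully delivered") but does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical record of one observed transfer."""
    connector: str
    delivered_mb: float
    min_rate_mb_s: float
    max_rate_mb_s: float
    connection_drops: int


def dominates(x: Observation, y: Observation) -> bool:
    """True if x delivered at least as much data as y, never ran slower
    than y's fastest rate, and dropped the connection no more often."""
    return (
        x.delivered_mb >= y.delivered_mb
        and x.min_rate_mb_s >= y.max_rate_mb_s
        and x.connection_drops <= y.connection_drops
    )


# Values from the slide's example; A's steady rate and zero drops are assumed.
obs_a = Observation("A", delivered_mb=100, min_rate_mb_s=10, max_rate_mb_s=10,
                    connection_drops=0)
obs_b = Observation("B", delivered_mb=90, min_rate_mb_s=3, max_rate_mb_s=8,
                    connection_drops=3)

print(dominates(obs_a, obs_b))  # True
print(dominates(obs_b, obs_a))  # False
```

A strict dominance check like this is conservative: it only ranks two connectors when one is at least as good on every recorded measure.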
9. An Example
- 2 Postulated Connectors
- Architectural Metadata captured
- Connector model based on ICSE00 Mehta et al. Taxonomy of Connectors
- Sample distribution scenario: Need to send 1 or more terabytes of data across a WAN from the US to Europe to exactly 1 user in a single delivery interval.
- Insight and Observation
11. DISCO: A Framework for Connector Selection
12. Experimental Results
- 30 real data distribution scenarios from JPL and NCI EDRN projects
- Run DISCO connector selection using architectural knowledge
- Low Knowledge (10 insights, 8 observations)
- Medium Knowledge (50 insights, 16 observations)
- High Knowledge (100 insights, 24 observations)
- Compare against expert selection answer key
- 80% accuracy
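Comparing selections against an expert answer key amounts to counting matches. The sketch below assumes one expert-chosen connector per scenario and exact-match scoring; the function name `selection_accuracy` and the five toy scenarios are hypothetical and are not data from the experiments.

```python
from typing import Mapping


def selection_accuracy(selected: Mapping[str, str],
                       answer_key: Mapping[str, str]) -> float:
    """Fraction of scenarios where the selected connector matches the
    expert's choice. Scenarios missing from `selected` count as wrong."""
    if not answer_key:
        raise ValueError("answer key is empty")
    correct = sum(
        1 for scenario, expert_choice in answer_key.items()
        if selected.get(scenario) == expert_choice
    )
    return correct / len(answer_key)


# Toy data for illustration only (not the JPL or NCI EDRN scenarios).
answer_key = {"s1": "GridFTP", "s2": "bbFTP", "s3": "FTP", "s4": "SCP", "s5": "UFTP"}
selected = {"s1": "GridFTP", "s2": "bbFTP", "s3": "FTP", "s4": "SCP", "s5": "HTTP/REST"}

print(selection_accuracy(selected, answer_key))  # 0.8
```

With 30 scenarios, an accuracy of 80% under this scoring would correspond to 24 matching selections.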
13. Conclusions
- Architectural Insight and Observation
- Positive Impact: framework for connector selection that's over 80% accurate
- Standards for architectural knowledge description
- Didn't show it, but using standard XML files and schemas to describe connectors, capture distribution scenarios, and record observation and insight
- Needed: first known step in capturing architectural knowledge about connectors in a standard form
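Since the XML descriptions were not shown in the talk, the following is purely a hypothetical illustration of what recording insight and observation in XML could look like. The element and attribute names (`connector`, `insight`, `observation`, `property`, and so on) are invented for this sketch and are not DISCO's schema; only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical connector description; NOT the actual DISCO XML format.
CONNECTOR_XML = """
<connector name="A">
  <insight property="data_access_transient_availability" value="Cache"/>
  <observation scenario="example" delivered_mb="100" rate_mb_s="10"/>
</connector>
"""


def load_connector(xml_text: str) -> dict:
    """Parse one connector description into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.get("name"),
        # Map each insight's property name to its value.
        "insights": {
            node.get("property"): node.get("value")
            for node in root.findall("insight")
        },
        # Keep each observation's attributes as a dictionary of strings.
        "observations": [dict(node.attrib) for node in root.findall("observation")],
    }


connector = load_connector(CONNECTOR_XML)
print(connector["name"])      # A
print(connector["insights"])  # {'data_access_transient_availability': 'Cache'}
```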
14. Questions?