Tactical Storage: Simple, Secure, and Semantic Access to Remote Data - PowerPoint PPT Presentation

About This Presentation
Title:

Tactical Storage: Simple, Secure, and Semantic Access to Remote Data

Description:

Title: Cooperative Computing Author: Douglas Thain Last modified by: Douglas Thain Created Date: 9/6/2004 11:44:06 PM Document presentation format – PowerPoint PPT presentation

Learn more at: https://www3.nd.edu

Transcript and Presenter's Notes


1
Tactical Storage: Simple, Secure, and
Semantic Access to Remote Data
  • Prof. Douglas Thain
  • University of Notre Dame
  • http://www.cse.nd.edu/~dthain

4
Plentiful Computing Power
http://www.cs.wisc.edu/condor/map
  • As of 25 April 2006...
  • Condor Worldwide
    • 56,682 CPUs / ??? TB / 1758 sites
  • Teragrid
    • 15,328 CPUs / 220 TB / 6 sites
  • Open Science Grid
    • 21,156 CPUs / 83 TB / 61 sites
  • EGEE Grid
    • Lots???

5
Complex Ecology of Storage
[Diagram labels: private disk, shared disk, Independent Cluster Disks; protocols: HTTP, FTP, RFIO, gLite, SRB, SCP, RSYNC, ...]
6
Problems Accessing Data
  • Large Burden on the User
    • User may not be able or willing to state files in
      advance.
    • Different services and protocols are available at
      different sites.
    • Programs are not modified to take advantage of
      services.
  • Different access modes for different purposes.
    • File transfer: preparing the system for intended use.
    • File system: access to data for running jobs.
  • Resources go unused.
    • Disks on each node of a cluster.
    • Unorganized resources in a department/lab.
    • Would like to combine disks into larger
      structures.
  • A global file system can't satisfy everyone!
    • ("Global" means different things to different
      people.)
    • Both a technical and social problem.

7
What's the Problem?
  • We often assume that the site administrator is
    responsible for making the site comfortable for
    the user. (Not possible on the grid!)
  • Rather, the user should be able to bring along a
    mechanism to access multiple independent
    (remote?) data sources.
  • Of course, we have to make it easy!

8
Tactical Storage Systems (TSS)
  • A TSS allows any node to serve as a file server
    or as a file system client.
  • All components can be deployed without special
    privileges, yet securely.
  • Users can build up complex structures.
    • Filesystems, databases, caches, ...
    • Admins need not know/care about larger
      structures.
  • Two Independent Concepts
    • Resources: the raw storage to be used.
    • Abstractions: the organization of storage.

9
[Diagram labels: App, Parrot, ???, file system (x7)]
10
Key Properties
  • Tactical Storage is Simple
    • Appears as an ordinary filesystem.
    • Applies to unmodified applications and data without
      code changes, relinking, kernel modules, etc.
  • Tactical Storage is Secure
    • Authentication with standard GSI or Kerberos.
    • Rich distributed access control system.
  • Tactical Storage is Semantic
    • Name data by meaning, not by location.
    • Supports external name resolution mechanisms.

17
Access Control in File Servers
  • Unix Security is not Sufficient
    • No global user database is possible or desirable.
    • Mapping external credentials to Unix gets messy.
  • Instead, Make External Names First-Class
    • Perform access control on remote, not local,
      names.
    • Types: Globus, Kerberos, Unix, Hostname, Address
  • Each directory has an ACL:
    • globus:/O=NotreDame/CN=DThain   RWLA
    • kerberos:dthain@nd.edu          RWL
    • hostname:*.cs.nd.edu            RL
    • address:192.168.1.*             RWLA
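
The per-directory ACL idea can be illustrated with a minimal Python sketch. It is not the file server's actual implementation. It assumes that subjects are written as "type:name", that "*" acts as a wildcard, and that a subject receives the union of the rights from every matching entry. The function names rights_for and is_allowed are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical ACL for one directory: (subject pattern, rights string).
# The "type:name" form and the "*" wildcard are assumptions of this sketch.
ACL = [
    ("globus:/O=NotreDame/CN=DThain", "RWLA"),
    ("kerberos:dthain@nd.edu", "RWL"),
    ("hostname:*.cs.nd.edu", "RL"),
    ("address:192.168.1.*", "RWLA"),
]


def rights_for(subject, acl=ACL):
    """Return the union of rights granted to `subject` by all matching entries."""
    granted = set()
    for pattern, rights in acl:
        if fnmatchcase(subject, pattern):
            granted |= set(rights)
    return granted


def is_allowed(subject, right, acl=ACL):
    """True if `subject` holds the single-letter `right` (e.g. "R" or "W")."""
    return right in rights_for(subject, acl)
```

Because the check works on the external name itself, no mapping to a local Unix account is needed: a client authenticated as hostname:laptop.cs.nd.edu may read and list, but not write.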

18
Distributed Group ACLs
[Diagram labels: file server (x7), file system (x7), UNIX (x7)]
19
Semantic Data Access
Appl
Parrot
  /usr/local   /chirp/host5.nd.edu/software
  /tmp         /chirp/host9.nd.edu/scratch
  /data        /gsiftp/ftp.nd.edu/mydata
  /db          resolver:find_db
find_db
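
One way to picture the name mapping on this slide is a longest-prefix mount table, sketched below in Python. This is an illustration only, not Parrot's code. The resolve function is hypothetical, and find_db is a stand-in for the external resolver named on the slide; the location it returns (host7.example.org) is invented.

```python
def find_db(rest):
    # Stand-in for an external resolver program; the returned location is made up.
    return "/chirp/host7.example.org/db" + rest


# Logical prefix -> physical prefix, or a callable that resolves the remainder.
MOUNTS = {
    "/usr/local": "/chirp/host5.nd.edu/software",
    "/tmp": "/chirp/host9.nd.edu/scratch",
    "/data": "/gsiftp/ftp.nd.edu/mydata",
    "/db": find_db,
}


def resolve(path, mounts=MOUNTS):
    """Rewrite `path` using the longest logical prefix that matches it."""
    best = None
    for prefix in mounts:
        # Match whole path components only, so "/tmpfile" does not match "/tmp".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        return path  # no mount applies; the local path is used unchanged
    target = mounts[best]
    rest = path[len(best):]
    return target(rest) if callable(target) else target + rest
```

For example, resolve("/usr/local/bin/gcc") yields "/chirp/host5.nd.edu/software/bin/gcc", while a path under /db is handed to the resolver.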
21
Remote Database Access
Credit: Sander Klous @ NIKHEF
  • HEP Simulation Needs Direct DB Access
    • App linked against Objectivity DB.
    • Objectivity accesses filesystem directly.
    • How to distribute application securely?
  • Solution: Remote Root Mount via Parrot
    • parrot -M /=/chirp/fileserver/rootdir
    • DB code can read/write/lock files
      directly.

[Diagram labels: script, sim.exe, libdb.so, Parrot, GSI, GSI Auth, WAN, file server, Simple FS, file system, DB data]
22
Remote Application Loading
Credit: Igor Sfiligoi @ Fermi National Lab
  • Modular Simulation Needs Many Libraries
    • Devel. on workstations, then ported to grid.
    • Selection of library depends on analysis tech.
  • Constraint: Must use HTTP for file access.
  • Solution: Dynamic Link with TSS+HTTP
    • /home/cdfsoft -> /http/dcaf.fnal.gov/cdfsoft

[Diagram labels: appl, liba.so, libb.so, libc.so, Parrot, proxy (x2), HTTP, HTTP server, file system; caption: select several MB from 60 GB of libraries]
23
Technical Problem
  • HTTP is not a filesystem! (No directories)
  • Advantages: firewalls, caches, admins.

[Diagram labels: Appl, Parrot, HTTP Module, HTTP Server; directory tree: root, etc, home, bin, alice, cms, babar]
24
Technical Problem
  • Solution: Turn the directories into files.
  • Can be cached in ordinary proxies!
  • Hierarchical SHA1 integrity check.

[Diagram labels: Appl, Parrot, HTTP Module, HTTP Server, make httpfs; directory tree: root, etc, home, bin, alice, cms, babar]
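
The idea of turning directories into files with a hierarchical SHA1 check can be sketched as follows. This is a Python illustration under stated assumptions, not the real output format of "make httpfs". The listing layout ("<type> <sha1> <name>" per line) and the function names build_listing and verify are hypothetical, and the directory tree is held in memory to keep the example self-contained.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def build_listing(tree: dict) -> str:
    """Render a directory (dict of name -> bytes or nested dict) as a listing file.

    Each line is "<type> <sha1> <name>". A subdirectory's checksum is the
    SHA-1 of its own listing, so the root listing covers the whole tree.
    """
    lines = []
    for name in sorted(tree):
        entry = tree[name]
        if isinstance(entry, dict):
            digest = sha1_hex(build_listing(entry).encode())
            lines.append(f"D {digest} {name}")
        else:
            lines.append(f"F {sha1_hex(entry)} {name}")
    return "\n".join(lines) + "\n"


def verify(name: str, listing: str, data: bytes) -> bool:
    """Check fetched `data` against the checksum recorded for `name` in `listing`."""
    for line in listing.splitlines():
        if not line:
            continue  # an empty directory produces a single blank line
        _kind, digest, entry_name = line.split(" ", 2)
        if entry_name == name:
            return sha1_hex(data) == digest
    return False
```

A client that trusts the root listing can fetch any listing or file through an ordinary caching proxy and still detect a stale or tampered copy, since every object is checked against the checksum stored one level above it.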
25
Logical Access to Bio Data
  • Many databases of biological data in different
    formats around the world
    • Archives: Swiss-Prot, TrEMBL, NCBI, etc.
    • Replicas: Public, Shared, Private, ???
  • Users and applications want to refer to data
    objects by logical name, not location!
    • Access the nearest copy of the non-redundant
      protein database; don't care where it is.
  • Solution: EGEE data management system maps
    logical names (LFNs) to physical names (SFNs).

Credit: Christophe Blanchet, Bioinformatics
Center of Lyon, CNRS IBCP, France,
http://gbio.ibcp.fr/cblanchet, Christophe.Blanchet@ibcp.fr
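
The mapping from a logical name to the nearest physical copy can be sketched as below. This Python illustration does not use the EGEE API. The catalog contents, the example.org hosts, and the "cost" figures are invented, and nearest_replica is a hypothetical helper.

```python
# Hypothetical replica catalog: logical file name (LFN) -> physical replicas (SFNs).
# Hosts and cost values are invented for illustration.
CATALOG = {
    "lfn:/bio/nr.data": [
        {"sfn": "rfio://se1.example.org/bio/nr.data", "cost": 40},
        {"sfn": "ftp://ftp.example.org/pub/nr.data", "cost": 15},
        {"sfn": "http://mirror.example.org/nr.data", "cost": 90},
    ],
}


def nearest_replica(lfn, catalog=CATALOG):
    """Return the SFN of the lowest-cost replica registered for `lfn`."""
    replicas = catalog.get(lfn)
    if not replicas:
        raise KeyError(f"no replica registered for {lfn}")
    return min(replicas, key=lambda r: r["cost"])["sfn"]
```

The application only ever names lfn:/bio/nr.data; which server and protocol end up serving the bytes is decided at resolution time.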
26
Logical Access to Bio Data
[Diagram labels: BLAST, Parrot, EGEE File Location Service, gLite Server, Chirp Server, FTP Server, nr.data (x3); protocols: RFIO, gLite, HTTP, FTP]
27
Performance of Bio Apps on EGEE
28
Expandable Filesystem for Experimental Data
Credit: John Poirier @ Notre Dame Astrophysics
Dept.
Project GRAND: http://www.nd.edu/~grand
29
Expandable Filesystem for Experimental Data
Credit: John Poirier @ Notre Dame Astrophysics
Dept.
Project GRAND: http://www.nd.edu/~grand
file server
30
Current Work
  • Now that we can easily use any storage...
    • Much easier to arrange data/jobs arbitrarily.
    • Idea: combine cluster storage and cluster computing!
    • Goal: keep jobs close to the data that they need.
    • PINS: Processing in Storage
  • Example: GEMS Distributed Databank
    • Facility for creating, storing, and analyzing
      molecular dynamics data in a cluster.
    • Goal: Be able to easily scale both CPU and
      storage capacity by adding commodity nodes.

Credit: Jesus Izaguirre and Aaron Striegel @
Notre Dame
31
[Diagram labels: meta-data database; Query: (Mol=CH4) (T>300K); Compute F(D1); file system (x7); F; F(D1)]
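
The "keep jobs close to data" goal can be illustrated with a small placement sketch in Python. It is not the GEMS implementation. The metadata records, node names, and the query and plan helpers are all hypothetical; the query mirrors the one on the slide (molecule CH4, temperature above 300 K).

```python
# Hypothetical metadata records: each dataset and the node that stores it.
METADATA = [
    {"id": "D1", "mol": "CH4", "temp_k": 350, "node": "node3"},
    {"id": "D2", "mol": "CH4", "temp_k": 280, "node": "node1"},
    {"id": "D3", "mol": "H2O", "temp_k": 400, "node": "node3"},
    {"id": "D4", "mol": "CH4", "temp_k": 310, "node": "node5"},
]


def query(records, mol, min_temp_k):
    """Select datasets for one molecule above a temperature threshold."""
    return [r for r in records if r["mol"] == mol and r["temp_k"] > min_temp_k]


def plan(records, mol, min_temp_k):
    """Group matching dataset ids by storing node, so that F runs where the data is."""
    placement = {}
    for r in query(records, mol, min_temp_k):
        placement.setdefault(r["node"], []).append(r["id"])
    return placement
```

Here plan(METADATA, "CH4", 300) assigns D1 to node3 and D4 to node5, so each computation F(D) would be started on the node that already holds D rather than moving the data.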
32
More Open Problems
  • Resource Management
    • How to prevent overcommitment -> badput?
  • Security
    • How to easily express complex policies for
      sharing and controlling combined CPU/disk?
  • Reliability
    • How to deal with disconnection, erasure,
      rejection, unexpected performance, etc.?
  • Garbage Collection
    • What's to prevent me from filling every disk
      everywhere with computations that I might need?
  • Debugging
    • How do we dig the state relevant to a complex
      workflow out of numerous, noisy, distributed logs?

33
Conclusion
  • Tactical storage allows end users to build large
    structures out of simple building blocks without
    getting stuck on the ugly details.

34
Acknowledgments
  • Science Collaborators
    • Christophe Blanchet
    • Patrick Flynn
    • Sander Klous
    • Peter Kunszt
    • Erwin Laure
    • John Poirier
    • Igor Sfiligoi
  • CS Collaborators
    • Jesus Izaguirre
    • Aaron Striegel
  • CS Students
    • Paul Brenner
    • James Fitzgerald
    • Jeff Hemmes
    • Paul Madrid
    • Chris Moretti
    • Gerhard Niederwieser
    • Phil Snowberger
    • Justin Wozniak

35
For more information...
  • Cooperative Computing Lab
    • http://www.cse.nd.edu/~ccl
  • Cooperative Computing Tools
    • http://www.cctools.org
  • Douglas Thain
    • dthain@cse.nd.edu
    • http://www.cse.nd.edu/~dthain

37
Problem: Shared Namespace
file server
globus:/O=NotreDame/* RWLAX
38
Solution: Reservation (V) Right
file server
mkdir only!
O=NotreDame/CN=* V(RWLA)
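
A minimal Python sketch of how a reservation right could behave is given below. It assumes that an entry of the form V(RWLA) permits only mkdir in the shared directory, and that the new directory starts with an ACL granting the creator the rights inside the parentheses. The mkdir function here is hypothetical and models only the ACL bookkeeping, not a real file server call.

```python
from fnmatch import fnmatchcase


def mkdir(parent_acl, subject):
    """Return the initial ACL of a directory created by `subject`.

    `parent_acl` is a list of (subject pattern, rights) pairs. Only an entry
    whose rights have the form "V(...)" lets the subject create a directory;
    the rights inside the parentheses go to the creator alone.
    """
    for pattern, rights in parent_acl:
        if (fnmatchcase(subject, pattern)
                and rights.startswith("V(") and rights.endswith(")")):
            return [(subject, rights[2:-1])]
    raise PermissionError(f"{subject} holds no reservation right here")
```

With a parent ACL of [("globus:/O=NotreDame/CN=*", "V(RWLA)")], any Notre Dame user can carve out a private directory in the shared namespace, yet none of them can read or modify a directory created by someone else.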