Title: Storage Research in the UCSC Storage Systems Research Center (SSRC)
- Scott A. Brandt
- (scott_at_cs.ucsc.edu)
- Computer Science Department
- Storage Systems Research Center
- Jack Baskin School of Engineering
- University of California, Santa Cruz
SSRC Overview
- Systems-oriented storage research center
- Supported by low-level research in materials, devices, and interconnects
- Funded by DOE, NSF, and industry sponsors
- 10 faculty members
- Others involved as associates
- External researchers as affiliates
- Faculty growth in databases and security
- Significant educational component
- Undergraduate and graduate courses in storage and related systems
- 20-25 graduate and undergraduate students
- Close cooperation with industry research sponsors
- HP, IBM, Microsoft, Intel, Agile/OnStor
- Actively pursuing others
Primary SSRC Faculty
- Darrell Long, Director
- High-performance storage systems, distributed systems
- Patrick Mantey
- Sensor networks, data acquisition, and multimedia systems
- Scott Brandt
- High-performance storage systems, distributed systems, soft real-time systems
- Ethan Miller
- Scalable security in file systems, next-generation file system design
Other SSRC Faculty
- Alexandre Brandwajn (Performance Analysis)
- Performance modeling and analysis of computer systems
- Katia Obraczka (Networking)
- Network architectures for large-scale storage
- Hamid Sadjadpour (Coding Theory)
- Storage system reliability and availability
- Raymie Stata (Web Archaeology)
- Mechanisms to preserve and mine multi-terabyte data sets
- Claire Gu (Optical Storage)
- Phokion Kolaitis (Logic and Databases)
SSRC Storage Research Challenges
- Huge Capacity and Scalability
- Internet Archive
- Genomic databases
- Performance
- Ever-increasing gap between CPU and secondary storage
- Security
- Networked storage introduces many new security issues
- Portability
- Power management, disconnected operation, new devices
- New storage technologies
- MEMS, MRAM, Flash, etc.
- Large-scale information management
SSRC Research Thrusts
- Object-based Storage
- Scalable high-performance distributed storage
- Archival Storage
- Large-scale on-line disk-based storage systems
- New Storage Technologies
- Systems-level research in new storage technologies
- Predictive/Adaptive Techniques
- Machine-learning-based techniques for increasing performance and reducing I/O latency and traffic
- Secure Storage
- Secure file system techniques and systems
1. Object-based Storage
- Scalable High-Performance Object-based Storage from Commodity Components
- 2 Petabytes
- 100 GB/sec aggregate throughput
- Parallel accesses from up to 10,000 clients
- Possibly to the same file
- File sizes from bytes to terabytes
- 1 to 10,000 files per directory
- 50 msec access times
- Mid-performance local access by visualization workstations
- Wide-area access
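As a rough sanity check on the targets above, the arithmetic below spreads the aggregate bandwidth evenly over all clients and computes how long one full pass over the data would take. It assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes) and a perfectly even load, which real workloads will not have; the function names are illustrative.

```python
# Back-of-the-envelope check of the object-storage targets.
# Assumption: decimal units and bandwidth shared evenly by all clients.
CAPACITY_BYTES = 2 * 10**15      # 2 petabytes
AGGREGATE_BPS = 100 * 10**9      # 100 GB/sec aggregate throughput
CLIENTS = 10_000                 # up to 10,000 parallel clients


def per_client_bandwidth(aggregate_bps: int, clients: int) -> float:
    """Average bytes/sec per client when every client is active at once."""
    return aggregate_bps / clients


def full_scan_seconds(capacity_bytes: int, aggregate_bps: int) -> float:
    """Seconds needed to read every stored byte once at full throughput."""
    return capacity_bytes / aggregate_bps


if __name__ == "__main__":
    mb = per_client_bandwidth(AGGREGATE_BPS, CLIENTS) / 10**6
    hours = full_scan_seconds(CAPACITY_BYTES, AGGREGATE_BPS) / 3600
    print(f"{mb:.0f} MB/sec per client")       # 10 MB/sec per client
    print(f"{hours:.1f} hours to read 2 PB")   # 5.6 hours to read 2 PB
```

Even at the full target rate, each of 10,000 concurrent clients averages about 10 MB/sec, and a single pass over all 2 PB takes more than five hours.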
1. Object-based Storage: OBFS, an Object-based Storage Manager
- Design Principles
- Flat object name space
- Variable block size with fixed maximum (common)
OBFS outperforms Ext2/3 and meets or exceeds the
performance of XFS with 1/25 the code
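The two design principles above, a flat object namespace and a variable block size with a fixed maximum, can be illustrated with a small in-memory sketch. `FlatObjectStore`, `ObjectRecord`, and the 512 KiB `MAX_BLOCK` value are hypothetical choices for illustration, not OBFS's actual data structures; a real object store would also round blocks to the device's sector size and track on-disk locations.

```python
from dataclasses import dataclass, field

# Hypothetical maximum block size, chosen only for illustration.
MAX_BLOCK = 512 * 1024


@dataclass
class ObjectRecord:
    """Per-object metadata in a flat namespace (no directories)."""
    object_id: int
    extents: list[int] = field(default_factory=list)  # block sizes in bytes


class FlatObjectStore:
    def __init__(self, max_block: int = MAX_BLOCK) -> None:
        self.max_block = max_block
        self.objects: dict[int, ObjectRecord] = {}  # object ID -> record

    def layout(self, size: int) -> list[int]:
        """Split `size` bytes into blocks no larger than max_block.

        Every block is full-sized except the last, which holds only the
        remaining bytes, so a small object occupies one small block.
        """
        full, rest = divmod(size, self.max_block)
        blocks = [self.max_block] * full
        if rest:
            blocks.append(rest)
        return blocks

    def create(self, object_id: int, size: int) -> ObjectRecord:
        """Allocate a new object directly by ID; there is no path lookup."""
        if object_id in self.objects:
            raise ValueError(f"object {object_id} already exists")
        rec = ObjectRecord(object_id, self.layout(size))
        self.objects[object_id] = rec
        return rec
```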
1. Object-based Storage: Lazy Hybrid Metadata Management
- Efficient, flexible, scalable metadata cluster management
- Filename hashing
- Efficient
- Avoids hot spots
- Directory hierarchy
- Provides standard hierarchical directory semantics
- Lazy policies
- Efficient metadata operations
- Dual-entry Access Control List (DACL)
- Server-side permission caching
- Update Logging
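The filename-hashing and lazy-policy ideas can be sketched together: each file's metadata lives on the server chosen by hashing its full pathname, and a directory rename is only logged, with affected metadata migrated to its new home the first time it is looked up under the new name. `mds_for`, `LazyHybridSketch`, SHA-1 as the hash, and the four-server cluster are illustrative assumptions; the sketch omits the Dual-entry ACL and does not handle a new file reusing a renamed path before migration.

```python
import hashlib

NUM_MDS = 4  # hypothetical number of metadata servers


def mds_for(path: str, num_mds: int = NUM_MDS) -> int:
    """Choose a metadata server by hashing the file's full pathname.

    Files in the same directory hash to different servers, spreading load
    and avoiding hot spots on popular directories.
    """
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_mds


class LazyHybridSketch:
    def __init__(self, num_mds: int = NUM_MDS) -> None:
        self.num_mds = num_mds
        self.servers = [{} for _ in range(num_mds)]  # per server: path -> metadata
        self.renames = []  # logged (old_prefix, new_prefix) pairs, oldest first

    def _server(self, path: str) -> dict:
        return self.servers[mds_for(path, self.num_mds)]

    def create(self, path: str, meta: dict) -> None:
        self._server(path)[path] = meta

    def rename_dir(self, old: str, new: str) -> None:
        """Log the rename; no metadata moves yet (the lazy part)."""
        self.renames.append((old.rstrip("/") + "/", new.rstrip("/") + "/"))

    def lookup(self, path: str) -> dict:
        home = self._server(path)
        if path in home:
            return home[path]
        # Undo logged renames, newest first, until we find the name the
        # metadata was last stored under; then migrate it to its new home.
        stale = path
        for old, new in reversed(self.renames):
            if stale.startswith(new):
                stale = old + stale[len(new):]
                source = self._server(stale)
                if stale in source:
                    home[path] = source.pop(stale)
                    return home[path]
        raise KeyError(path)
```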
1. Object-based Storage: Reliability in Large-Scale Storage
- More disks → reliability problem
- Disk failures
- Non-recoverable bit errors
- 1 in 10^13 to 10^15 bits
- Large disks → long rebuild time
- Capacity outpaces data transfer rate
- RAID alone cannot solve the problem
- Solutions for disk failures
- Configuration for a redundancy set
- 2-way or 3-way mirroring
- RAID 5+1
- Fast Mirroring Copy
- Lazy Parity Backup
- Solution for bit errors
- Signature scheme
Mean-Time-To-Data-Loss (MTTDL) of a 2 PB storage system using three configurations and fast recovery mechanisms. The upper lines are for a system built from disks with a 10^6-hour MTTF, and the lower lines are for a system built from disks with a 10^5-hour MTTF.
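The effect of mirroring degree and rebuild speed on mean time to data loss (MTTDL) can be estimated with the standard Markov-model approximation for a group of k replicas, MTTDL ≈ MTTF^k / (k! · MTTR^(k-1)), which holds when repairs are much faster than failures; a system of G independent groups loses data about G times as often. This is a textbook approximation, not the model behind the figure above: it ignores RAID 5+1, Lazy Parity Backup, and correlated failures, and the 500 GB disk size and rebuild rates in the usage lines are hypothetical.

```python
import math


def group_mttdl(mttf_h: float, mttr_h: float, k: int) -> float:
    """Approximate MTTDL (hours) of one k-way mirrored group.

    Data is lost only if all k replicas fail before a repair completes.
    Assumes independent exponential failures and MTTR << MTTF.
    """
    return mttf_h ** k / (math.factorial(k) * mttr_h ** (k - 1))


def system_mttdl(user_bytes: int, disk_bytes: int,
                 mttf_h: float, mttr_h: float, k: int) -> float:
    """MTTDL of a system of independent groups, each holding one disk's data."""
    groups = math.ceil(user_bytes / disk_bytes)
    return group_mttdl(mttf_h, mttr_h, k) / groups


def rebuild_hours(disk_bytes: int, rebuild_bps: float) -> float:
    """Hours to re-create one disk's contents at the given rebuild rate."""
    return disk_bytes / rebuild_bps / 3600


if __name__ == "__main__":
    PB, GB, MB = 10**15, 10**9, 10**6
    disk = 500 * GB                              # hypothetical disk size
    slow = rebuild_hours(disk, 50 * MB)          # one disk copies to one spare
    fast = rebuild_hours(disk, 50 * 50 * MB)     # copy spread over 50 disks (assumed)
    for k in (2, 3):
        for label, mttr in (("slow", slow), ("fast", fast)):
            years = system_mttdl(2 * PB, disk, 1e6, mttr, k) / (24 * 365)
            print(f"{k}-way mirroring, {label} rebuild: {years:,.0f} years")
```

Under this approximation MTTDL scales as 1/MTTR for 2-way mirroring and 1/MTTR^2 for 3-way mirroring, which is why shortening rebuilds, as Fast Mirroring Copy aims to do, pays off so directly.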
1. Object-based Storage: Robust Data Distribution
- OBSDs are added to the system in groups
- Allocation/Reallocation
- Objects are placed in the new group with a probability equal to the fraction of the total number of OBSDs that are in the new group
- Lookup
- If an object isn't found in the newest group, the next newest group is checked as if the newest group does not exist
- In-memory → very fast
This figure shows placement of an object into a
system with three groups. In this case, the
object didn't get placed in Group 3 or Group 2,
so it will be placed in Group 1.
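The group-by-group placement rule can be sketched with a deterministic hash standing in for the random choice, so that placement and lookup are the same computation and need only the list of group sizes in memory. The helper names, the SHA-256-based hash, and the way a disk is chosen within a group are assumptions made for illustration; the sketch assumes every group has at least one OBSD.

```python
import hashlib


def _hash64(text: str) -> int:
    """Deterministic 64-bit hash of a string."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def _uniform(object_id: int, group: int) -> float:
    """Deterministic pseudo-random value in [0, 1) for (object, group)."""
    return (_hash64(f"u:{object_id}:{group}") >> 11) / 2**53


def place(object_id: int, group_sizes: list[int]) -> tuple[int, int]:
    """Return (group, disk within group) for an object.

    group_sizes[i] is the number of OBSDs in the i-th group added, so the
    last entry is the newest group. Lookup uses the same computation.
    """
    total = sum(group_sizes)
    for g in range(len(group_sizes) - 1, -1, -1):
        # Newest remaining group wins with probability size / remaining total;
        # the oldest group (g == 0) takes whatever is left.
        if g == 0 or _uniform(object_id, g) < group_sizes[g] / total:
            return g, _hash64(f"d:{object_id}:{g}") % group_sizes[g]
        total -= group_sizes[g]  # continue as if group g did not exist
    raise ValueError("at least one group is required")
```

Because an object that is not placed in the newest group follows exactly the path it would have taken before that group existed, adding a group only moves objects into the new group; nothing is reshuffled among the older groups.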
2. Archival Storage: Deep Store
- Efficient On-line Deep Store
- Differentially Compress Data (1001)
- Disk-based for on-line performance
- Search for similar files
- Compress against existing data
- Organize similar files using data clusters
- Scale to billions of files
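The steps "search for similar files" and "compress against existing data" can be illustrated with Python's standard library. Deep Store's own method is differential (delta) compression; this sketch swaps in two simpler stand-ins: Jaccard similarity over byte shingles to pick a base file, and zlib's preset-dictionary feature (`zdict`) to compress the new file against that base. The function names, shingle length, and brute-force comparison with every archived file are illustrative assumptions; scaling to billions of files would need compact similarity sketches and an index instead.

```python
import zlib

WINDOW = 32 * 1024  # deflate can only refer back 32 KiB, so use the base's tail


def similarity(a: bytes, b: bytes, k: int = 8) -> float:
    """Jaccard similarity of the sets of k-byte substrings (shingles)."""
    sa = {a[i:i + k] for i in range(len(a) - k + 1)}
    sb = {b[i:i + k] for i in range(len(b) - k + 1)}
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def compress_against(data: bytes, base: bytes) -> bytes:
    """Compress data using an already-stored file as a preset dictionary."""
    c = zlib.compressobj(level=9, zdict=base[-WINDOW:])
    return c.compress(data) + c.flush()


def decompress_against(blob: bytes, base: bytes) -> bytes:
    """Reverse compress_against; needs the same base file."""
    d = zlib.decompressobj(zdict=base[-WINDOW:])
    return d.decompress(blob) + d.flush()


def archive_file(data: bytes, archive: dict[str, bytes]):
    """Return (base name or None, compressed bytes) for a new file."""
    if not archive:
        return None, zlib.compress(data, 9)
    base = max(archive, key=lambda name: similarity(data, archive[name]))
    return base, compress_against(data, archive[base])
```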
3. New Storage Technologies: MEMS-based Storage
- MEMS-based storage: very dense, non-rotating, orthogonal magnetic or physical recording
- 2D seeks
- Large number of active read/write tips
- Device Modeling
- Power management
- Aggressive spin-down, sequential request merging, subsector accesses
- 50% lower power consumption with no performance penalty
- Storage Subsystem Architectures
- MEMS metadata storage and MEMS disk write buffer
- Performance ≈ MEMS alone
- Request scheduling
- Zone-based Shortest Positioning Time First (SPTF)
- SPTF-like response times and C-SCAN-like variability
- Storage Allocation
- Zone-based allocation (in progress)
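The 2D-seek and SPTF ideas can be sketched as follows: because a MEMS sled moves in X and Y at the same time, positioning time is governed by the slower of the two axes, and SPTF greedily serves whichever pending request is cheapest to reach next. The square-root seek model, the settle-time constant, and the `Request` coordinates are simplifying assumptions, not a calibrated device model, and the zone-based variant on the slide is not shown.

```python
import math
from dataclasses import dataclass

SETTLE_MS = 0.2  # hypothetical extra settle time after any X movement


@dataclass(frozen=True)
class Request:
    x: int  # sled position along X (assumed unit: tip-track index)
    y: int  # sled position along Y (assumed unit: bit index within a track)


def positioning_ms(cur: Request, req: Request) -> float:
    """Time to move the sled from cur to req under a toy 2D seek model.

    X and Y move in parallel, so the slower axis dominates. sqrt(distance)
    approximates constant-acceleration motion (an assumption).
    """
    tx = math.sqrt(abs(req.x - cur.x))
    if req.x != cur.x:
        tx += SETTLE_MS
    ty = math.sqrt(abs(req.y - cur.y))
    return max(tx, ty)


def sptf_order(start: Request, pending: list[Request]) -> list[Request]:
    """Shortest Positioning Time First: always serve the closest request next."""
    order, cur, left = [], start, list(pending)
    while left:
        nxt = min(left, key=lambda r: positioning_ms(cur, r))
        left.remove(nxt)
        order.append(nxt)
        cur = nxt
    return order
```

Plain SPTF like this minimizes the cost of each step but can leave distant requests waiting a long time; that response-time variability is what the zone-based variant on the slide is meant to reduce.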
3. New Storage Technologies: HeRMES MRAM-based Storage
- Magnetic RAM: fast, non-volatile, DRAM-like storage
- How to use MRAM in a file system?
- Combined disk/MRAM file system
- File systems for mobile devices
- MRAM metadata storage and data caching
- Online metadata and file compression
4. Predictive/Adaptive Techniques
- Dynamic techniques improve application and system performance
- Much better than static parameter tuning
- Formal Machine Learning-based approach
- Profs. Manfred Warmuth and David Helmbold
- Problems identified/examined
- File access pattern prediction for prefetching and grouping
- Cache management algorithm selection
- Disk spin-down timeout selection
- File lifetime prediction
- Network congestion control
4. Predictive/Adaptive Techniques: Predictive Prefetching/Data Grouping
- Recency-based models track file access patterns
- Successor information maintained in metadata
- Prefetch related files or groups
- Reduces storage latency and increases cache effectiveness
- Predictors
- Finite Multi-Order Context (Kroeger)
- Noah aggregating cache (Amer)
- Program-based Successor (Yeh)
- Current research
- Hoarding for mobility
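The idea of keeping successor information per file and prefetching likely successors can be illustrated with the simplest such model: count, for each file, which file was opened immediately after it, and prefetch the most frequent successors. This first-order counter is a deliberately simplified stand-in, not Finite Multi-Order Context, Noah, or Program-based Successor, each of which refines the idea; the class name and the `max_prefetch` parameter are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional


class SuccessorPredictor:
    """First-order successor model over a stream of file opens."""

    def __init__(self, max_prefetch: int = 2) -> None:
        # successors[f][g] = number of times g was opened right after f
        self.successors: defaultdict = defaultdict(Counter)
        self.prev: Optional[str] = None
        self.max_prefetch = max_prefetch

    def record_open(self, name: str) -> None:
        """Update the successor counts of the previously opened file."""
        if self.prev is not None:
            self.successors[self.prev][name] += 1
        self.prev = name

    def predict(self, name: str) -> list[str]:
        """Files worth prefetching after `name`, most likely first."""
        counts = self.successors.get(name, Counter())
        return [f for f, _ in counts.most_common(self.max_prefetch)]
```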
4. Predictive/Adaptive Techniques: Adaptive Caching
- Best cache management policy changes over time
- Workloads change
- Filtering occurs
- Cache relationships change
- Solution: dynamically choose the best policy
- Machine learning: Fixed-Share to Uniform Past
- Refetching helps
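The adaptive approach can be sketched by running each candidate policy in its own virtual cache, treating each miss as a loss of 1, and keeping a weight per policy. After each request the weights are updated multiplicatively and then mixed with the Fixed-Share to Uniform Past rule: most weight stays on the new posterior, and a small share α is spread evenly over all earlier posteriors, so a policy that did well in the past can regain weight quickly after the workload changes. Using exactly two experts (LRU and LFU) and the `eta` and `alpha` values shown are assumptions for illustration; the sketch only tracks which policy to trust and does not manage a real cache.

```python
import math
from collections import Counter, OrderedDict


class LRUCache:
    """Virtual cache that evicts the least recently used key."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.items: OrderedDict = OrderedDict()

    def access(self, key) -> bool:
        """Return True on a hit; on a miss, insert the key (evicting if full)."""
        if key in self.items:
            self.items.move_to_end(key)
            return True
        if len(self.items) >= self.size:
            self.items.popitem(last=False)
        self.items[key] = None
        return False


class LFUCache:
    """Virtual cache that evicts the cached key requested least often so far."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.freq: Counter = Counter()  # request counts, including evicted keys
        self.items: set = set()

    def access(self, key) -> bool:
        self.freq[key] += 1
        if key in self.items:
            return True
        if len(self.items) >= self.size:
            self.items.remove(min(self.items, key=lambda k: self.freq[k]))
        self.items.add(key)
        return False


class FixedShareUniformPast:
    """Weights over expert policies, updated after every request."""

    def __init__(self, experts, eta: float = 0.5, alpha: float = 0.05) -> None:
        n = len(experts)
        self.experts, self.eta, self.alpha = experts, eta, alpha
        self.w = [1.0 / n] * n
        self.past_sum = list(self.w)  # sum of posteriors v_0..v_{t-1}; v_0 = start
        self.t = 1                    # how many posteriors are in past_sum

    def best(self) -> int:
        """Index of the policy the real cache should follow right now."""
        return max(range(len(self.w)), key=self.w.__getitem__)

    def request(self, key) -> None:
        losses = [0.0 if e.access(key) else 1.0 for e in self.experts]
        v = [w * math.exp(-self.eta * loss) for w, loss in zip(self.w, losses)]
        total = sum(v)
        v = [x / total for x in v]
        # Fixed-Share to Uniform Past: keep (1 - alpha) on the new posterior
        # and share alpha evenly among all earlier posteriors.
        self.w = [(1 - self.alpha) * x + self.alpha * s / self.t
                  for x, s in zip(v, self.past_sum)]
        self.past_sum = [s + x for s, x in zip(self.past_sum, v)]
        self.t += 1
```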
4. Predictive/Adaptive Techniques: Adaptive Caching Results
28% fewer cache misses than LRU, 8% fewer than BestFixed, and a 4-24% reduction in I/O traffic
5. Secure Storage: Secure Network-Attached Storage
- For each file block on disk, keep sufficient information to
- Decode the data (at the client)
- Validate the sender of the data
- Ensure data integrity
- Use encryption to keep data secret on disk and in transit
- Decryption occurs at the client
- Information to decrypt available only at client!
- Prevent compromise of data
- Impossible to protect against denial of service
- Loss of data may occur → make sure it's noticed!
- Three similar security schemes
- Trade off resistance to intrusion for speed
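A minimal per-block sketch of the scheme above: the client encrypts and authenticates each block before sending it and verifies the block before decrypting it, so the server only ever stores opaque bytes. Python's standard library has no block cipher, so this sketch uses HMAC-SHA256 both as a counter-mode keystream and as the integrity tag (encrypt-then-MAC). It is a teaching illustration, not one of the three schemes on the slide, and a real system should use a vetted authenticated cipher such as AES-GCM. The shared MAC key only shows that some key holder wrote the block, not which client did, and replaying an older version of the same block is not detected.

```python
import hashlib
import hmac
import os

NONCE_LEN = 16
TAG_LEN = 32  # HMAC-SHA256 output size


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Pseudo-random keystream: HMAC-SHA256(key, nonce || counter) blocks."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        msg = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def _tag(mac_key: bytes, block_no: int, nonce: bytes, ct: bytes) -> bytes:
    # Binding the block number stops a server from swapping blocks around.
    msg = block_no.to_bytes(8, "big") + nonce + ct
    return hmac.new(mac_key, msg, hashlib.sha256).digest()


def seal_block(enc_key: bytes, mac_key: bytes, block_no: int, data: bytes) -> bytes:
    """Client side: encrypt, then MAC. The result is what the server stores."""
    nonce = os.urandom(NONCE_LEN)
    ks = _keystream(enc_key, nonce, len(data))
    ct = bytes(a ^ b for a, b in zip(data, ks))
    return nonce + ct + _tag(mac_key, block_no, nonce, ct)


def open_block(enc_key: bytes, mac_key: bytes, block_no: int, stored: bytes) -> bytes:
    """Client side: verify integrity first, then decrypt."""
    nonce, ct, tag = stored[:NONCE_LEN], stored[NONCE_LEN:-TAG_LEN], stored[-TAG_LEN:]
    if not hmac.compare_digest(tag, _tag(mac_key, block_no, nonce, ct)):
        raise ValueError(f"block {block_no}: integrity check failed")
    ks = _keystream(enc_key, nonce, len(ct))
    return bytes(a ^ b for a, b in zip(ct, ks))
```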
5. Secure Storage: Intra-file Security (IFS)
- IFS: end-to-end file system encryption technology
- Encrypts independent file extents
- Flexible encryption region size
- Files may contain one or more isolated or overlapping secure regions
- Transparent to the user
- Supports strong encryption
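Because IFS encrypts independent extents rather than whole files, a client reading or writing a byte range first has to find which secure regions that range touches. A minimal sketch of that lookup follows, assuming each region is a half-open byte range tagged with a hypothetical key identifier; how IFS actually stores region metadata, and how overlapping regions are combined during encryption, are not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecureRegion:
    start: int    # first byte offset covered (inclusive)
    end: int      # end offset (exclusive)
    key_id: str   # hypothetical name of the key protecting this region


def regions_for(regions: list[SecureRegion], offset: int,
                length: int) -> list[SecureRegion]:
    """Secure regions overlapping bytes [offset, offset + length), in the
    order they were defined. Bytes outside every region are assumed to be
    stored unencrypted."""
    stop = offset + length
    return [r for r in regions if r.start < stop and offset < r.end]
```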
Summary
- The UCSC Storage Systems Research Center is becoming a nationally recognized storage systems research group
- Darrell Long recently founded the Conference on File and Storage Technologies (FAST), already the premier storage systems research conference
- We are actively recruiting faculty and students to participate in SSRC research activities
- We are actively soliciting corporate sponsorships and research relationships