les robertson cernit0899 1 - PowerPoint PPT Presentation


Slides: 72
Provided by: Rober877



The Data Storage Challenge for LHC
  • CERN School of Computing
  • Stare Jablonki - September 1999
  • Les Robertson
  • CERN - IT Division
  • les.robertson_at_cern.ch

Part I - The technology
  • today's workhorses
  • magnetic hard disk
  • magneto-optics
  • magnetic tape systems
  • optical disks
  • exotic storage technologies
  • holography
  • atomic force microscopy
  • robotics for handling mass storage

disk storage
  • state of the art
  • technology limits - the super-paramagnetic limit
  • heads
  • access performance and caches
  • magneto optics OAW, Terastor

units
  • very small sizes are expressed in micrometres, denoted μ
  • almost everything else is in
  • inches - in
  • square inches - in²
  • feet - 1 foot = 12 inches
  • Gigabit = 10⁹ bits - Gb
  • Gigabit per square inch - Gb/in²
  • Gigabyte = 10⁹ bytes - GB

disk storage - state of the art
  • platters
  • sputtered magnetic and protective layers
  • protective layer has textured landing area for
    the head - to avoid stiction on take-off
  • head flies at around 50 nanometres
  • current product - 3-4 Gb/in2
  • lab demonstrations - >20 Gb/in2

super-paramagnetic limit
  • bit size - decreases in proportion as the areal
    density increases
  • width X length
  • 1 Gbpi2 - 3.5μ X 0.18μ
  • 10 Gbpi2 - 1μ X 0.06μ
  • 40 Gbpi2 - 0.5μ X 0.03μ
  • 80 Gbpi2 - 0.4μ X 0.02μ
  • fewer particles in a bit, smaller separation
    between bits
  • increased tendency for domains spontaneously
    to change polarisation

super-paramagnetic limit
  • super-paramagnetic limit
  • point where the fluctuations in thermodynamic
    energy at operating temperatures have a moderate
    probability of causing magnetic state changes
  • in current disks, the magnetic energy barrier is
    about 40 times the thermodynamic range
  • it is expected that new materials, recording
    techniques will push the barrier to at least
    100 Gbpi2

  • inductive read heads
  • signal current varies as rate of flux change
  • MR read heads
  • NiFe conductor -- resistance changes with flux
  • independent of velocity
  • signal strength proportional to sense current
  • increased sensitivity in high density, high
    bandwidth recording
  • a transverse bias field is applied to
    discriminate between positive and negative
    recording polarisations

inductive read head
magneto-resistive read head
ΔR ∝ H    ΔV ∝ I ΔR
inductive write head, MR read head
picture IBM Research - Almaden
Giant Magneto-Resistive Effect - the Spin Valve
  • Giant Magneto-Resistive
  • Multi-layer head
  • magneto-resistive layer (NiFe)
  • conducting layer (e.g. Ag, Cu)
  • pinned layer (e.g. Co) - fixed magnetic orientation
  • exchange layer - ferro-magnetic material which
    maintains the pinned layer orientation
  • GMR exploits the different behaviour of
    conduction electrons with spin parallel to or
    opposed to the magnetic orientation of the MR and
    pinned layers - hence the term Spin Valve

GMR layers
exchange layer - magnetised
pinned layer (Co)
conducting layer (Cu)
MR layer (NiFe)
Spin Valve
picture IBM Research - Almaden
merged head
Seagate Cheetah 36
  • 36 GB capacity
  • half height 3.5"
  • 12 platters, 24 heads
  • 5.7 ms average seek
  • 10,000 rpm - 2.99 ms latency
  • 1 MB cache
  • 18-28 MB/sec internal transfer rate
photo - Seagate Technology, Inc.
Data transfer speed
  • Data transfer speed increases with
  • the linear density (√ of the areal density - i.e.
    about 26% per year)
  • the rotation speed - which has only increased by
    about 50% in the past 5-6 years
  • The actual data transfer speed is faster on outer
    tracks than on inner tracks - so be careful when
    reading specifications to discriminate between
    average and maximum transfer speed.

assumes recent evolution maintained - 60% per year
increase in areal density, rotational speed
increasing 50% in 5 years (1999 = 1)
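
The projection described in this caption can be sketched numerically. This is an illustration only, assuming that track density and linear density grow at the same rate (so linear density grows as the square root of areal density) and that transfer speed is proportional to linear density times rotation speed; the function names are hypothetical.

```python
import math

AREAL_GROWTH = 0.60    # per-year increase in areal density, from the slide
RPM_GROWTH_5Y = 0.50   # increase in rotation speed over five years, from the slide


def linear_density_growth(areal_growth: float = AREAL_GROWTH) -> float:
    """Per-year growth of linear density.

    Assumption: track density and linear density grow equally, so each
    grows as the square root of the areal density.
    """
    return math.sqrt(1 + areal_growth) - 1


def transfer_rate_factor(years: float) -> float:
    """Transfer speed relative to 1999 (= 1) after the given number of years."""
    linear = (1 + linear_density_growth()) ** years
    rotation = (1 + RPM_GROWTH_5Y) ** (years / 5)
    return linear * rotation


for year in (1999, 2002, 2005):
    print(year, round(transfer_rate_factor(year - 1999), 2))
```

Under these assumptions the linear density term alone gives about 26% per year, matching the figure quoted for the linear density.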
The importance of the cache
  • Access time depends on
  • the seek time - which has hardly improved by 50%
    in ten years
  • the latency - half a turn of the platter
  • Without a cache, this would lead to
    very unimpressive performance for small transfers
  • The cache helps to get back to the nominal data
    transfer rate - no more than that!
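
To see why uncached small transfers are so unimpressive, the access time can be put next to the transfer time. A rough sketch follows, using the Cheetah 36 figures quoted earlier (5.7 ms average seek, 10,000 rpm, 18 MB/sec at the low end of the internal transfer range); the function names and the decimal units (1 MB = 1000 KB) are assumptions made for the illustration.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average latency: half a turn of the platter, in milliseconds."""
    return 0.5 * 60_000.0 / rpm


def effective_rate_mb_s(size_kb: float, seek_ms: float, rpm: float,
                        media_rate_mb_s: float) -> float:
    """Delivered rate for one uncached random transfer (1 MB = 1000 KB)."""
    size_mb = size_kb / 1000.0
    access_ms = seek_ms + rotational_latency_ms(rpm)
    transfer_ms = size_mb / media_rate_mb_s * 1000.0
    return size_mb / ((access_ms + transfer_ms) / 1000.0)


for size_kb in (8, 64, 1_000, 100_000):
    rate = effective_rate_mb_s(size_kb, seek_ms=5.7, rpm=10_000, media_rate_mb_s=18)
    print(f"{size_kb:>7} KB transfer: {rate:.2f} MB/sec")
```

With these numbers an 8 KB random read is delivered at well under 1 MB/sec, while very large transfers approach the nominal rate, which is the most a cache can recover.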

Future possibilities
  • continuing developments of GMR - with the
    formidable research capability of IBM
  • current interest in the use of rare-earth/transition
    metal composites, evolved for MO recording
  • low Curie point
  • stable magnetisation at normal operating temperatures
  • stable magnetic domains demonstrated at a density
    of 250 Gb/in²
  • Longer term --
  • holography
  • atomic force microscopy
  • ….

Optically Assisted Winchester (OAW)
  • Developed by a Seagate subsidiary - Quinta
  • Magnetic layer uses a composition of rare earth
    transition metals
  • Write
  • laser heats material beyond Curie point
  • induction coil changes magnetic orientation
  • magnetisation stable at normal temperatures
  • Read
  • rotation of polarisation of reflected light
    (Kerr effect)
  • Technology
  • laser delivery fibres
  • micro mirror (head of a pin)
  • micro-optics
  • Potential 100 Gb/in2 ?
  • limited by the resolution of the optics

The Solid Immersion Lens Near Field Recording
Terastor Corporation
  • Solid Immersion Lens
  • laser is focussed internally in a material with a
    very high refractive index
  • with a red laser can get the spot diameter down
    to 0.2μ (the bit width for 160 Gb/in2)

where λ is the wavelength, n the refractive index
and na the numerical aperture
  • Near field recording
  • principle of the scanning near-field optical
    microscope
  • the oscillating dipoles of the radiating surface
    produce an evanescent field which decays in
    about one wavelength
  • .. but activate other dipoles within this range

Developments in Magneto Optics
  • The recorded area of the disk cannot be narrower
    than the spot (or at least the high temperature
    area of the spot)
  • But when recording, spots can be overlapped to
    increase linear density
  • This is not possible on conventional MO disk,
    which has a thick transparent substrate over the
    recording layer, which required a high field
    coil, with a high inductance and so low
    modulation frequency
  • Surface recording reduces the separation of the
    head and recording layer, making crescent
    recording possible, and also enabling the use of
    high numerical aperture lenses - producing
    smaller spots
  • But it is a challenge for the designer of
    removable media

disk rotation
Magnetic Super Resolution - MSR
  • Easy to see how the crescents are recorded, but
    how are they read back?
  • Three layers
  • 1) recording layer
  • 2) intermediate masking layer - temperature
    sensitive magnetic orientation
  • low temperature parallel to plane
  • intermediate temperature perpendicular
  • high temperature loses orientation
  • couples the recording layer to the read-out
    layer only at intermediate temperatures
  • 3) read-out layer magnetised (erased) during

magnetic tape
  • why use magnetic tapes?
  • basics
  • linear
  • helical scan
  • state of the art drive - the StorageTek 9840
  • current trends

Why use magnetic tape?
  • Why use a sequential access medium with a history
    of relatively poor reliability?
  • historically the answer has been --
  • cost - 10-100 times cheaper per Byte than disk
  • volumetric storage density
  • removable, transportable medium
  • backup
  • archive
  • data exchange
  • robotic storage - automated access to enormous
    amounts of data
  • but there is considerable competition from
  • hard disks - cost, storage density
  • optical storage - archive longevity, data exchange

Volumetric Storage Density
  • Assumes shelf storage of
  • raw tape, DVD cartridge
  • disk without enclosure, power supply, fan
  • no compression on tape
[Chart: Storage Capacity and Density - native cartridge capacity (GB) and density (TB/m³) by device type - IBM 3590, STK 9840, STK Redwood, DVD-RAM (2-side), Quantum DLT 8000, LTO Ultrium (future), Seagate Cheetah (3.5" disk)]

basic characteristics
  • medium
  • flexible substrate - 10μ thick polyethylene
  • recording layer - 0.1-0.2μ
  • Metal Particle
  • Metal Evaporated
  • stored in cartridge (1 reel) or cassette (2 reel)
  • tape extracted and loaded on drive
  • recording technology spin-off from magnetic
    disk developments
  • MR, GMR heads
  • track following servo systems
  • media

sequential access
  • basically a sequential medium
  • no delete/update
  • new data written at end
  • open - and read from start of file
  • usually a directory at the beginning of the tape
  • so open(file) can use servo information for
    a fast skip to the start of the data

logical data format
  • The tape is organised logically as a set of
    files, separated by labels and tape marks.
  • In early drives, the drive could seek rapidly to
    the next tape mark, which was recorded with a
    very special pattern. Modern drives use a
    directory and information on servo tracks to seek
    to the logical tape mark

[Diagram: logical tape layout - volume labels, file labels, file data, tape marks, end of volume]
physical data format
  • The data is recorded in blocks, each with a
    cyclic redundancy check (CRC) to detect errors
  • The logical block is recorded in a series of
    physical blocks, spread across the parallel
    recording channels
  • each channel corresponds to a set of physical
    head elements
  • Substantial recording capacity is reserved for
    error correction data
  • The 4-channel DLT format is shown - newer tape
    systems have even more complex patterns to
    support recovery from more severe tape damage
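
The error-detection step can be illustrated in a few lines. This is a sketch only: it uses the CRC-32 from Python's `zlib` module as a stand-in, since the slides do not say which CRC polynomial or block layout the tape formats use, and the helper names are hypothetical.

```python
import zlib


def write_block(payload: bytes) -> bytes:
    """Append a 4-byte CRC to the payload (illustrative block layout)."""
    crc = zlib.crc32(payload)
    return payload + crc.to_bytes(4, "big")


def block_is_intact(block: bytes) -> bool:
    """Recompute the CRC over the payload and compare with the stored value."""
    payload, stored = block[:-4], int.from_bytes(block[-4:], "big")
    return zlib.crc32(payload) == stored


block = write_block(b"event data " * 100)

damaged = bytearray(block)
damaged[10] ^= 0x01  # flip a single bit, as a media defect might

print(block_is_intact(block))           # the undamaged block passes the check
print(block_is_intact(bytes(damaged)))  # the damaged block fails it
```

A CRC only detects the damage; recovering the data is the job of the separate error correction data mentioned above.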

linear recording
  • linear recording
  • tape passes over fixed head
  • multiple track read write
  • serpentine dual-directional recording
  • head unit
  • dual-directional
  • low head-medium contact pressure
  • multi-channel head array

linear recording
  • media issues
  • tape roughness, head contact, surface wear, dust
  • tape path complexity, tension -> distortion
  • lateral expansion/contraction with environmental
  • reel sag in long term storage

head array
tape has expanded laterally since it was recorded
helical scan
  • developed for entertainment business
  • high end market in broadcasting
  • mass market in domestic VCR
  • tape moves slowly past rapidly spinning head on

helical scan
  • head wear problems due to tape contact pressure
  • helical path controlled using tape edge -
    requires very accurate slitting in manufacture
  • edge damage, tape warp cause track curving
  • linear tapes reserve a guard band at the edges
  • historically helical scan has had a higher track
    density than linear
  • 2800 tracks per inch helical
  • 700-800 tracks per inch linear
  • but linear tape is improving track density with
    MR heads, track following technology

data compression
  • an advantage of sequential access over random
    access disks is that the device can implement
    data compression
  • digital Lempel-Ziv 1 algorithm
  • replaces variable length phrases with code words
  • enhanced LZ 1 algorithm (e.g. StorageTek 9840)
    can give up to four times compression on
    commercial data, 2 times on pre-compressed
    physics data
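
How much a Lempel-Ziv style compressor gains depends heavily on the data. The sketch below uses Python's `zlib` (DEFLATE, i.e. LZ77 plus Huffman coding) purely as a stand-in; it is not the enhanced LZ-1 algorithm implemented in the drive, and the sample data is invented for the illustration.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size."""
    return len(data) / len(zlib.compress(data))


# Highly repetitive text, loosely standing in for "commercial" data
repetitive = b"run=1999 detector=CMS status=ok\n" * 2000

# Random bytes, standing in for data with no redundancy left to remove
random_like = os.urandom(64_000)

print(f"repetitive data: {compression_ratio(repetitive):.1f}x")
print(f"random data:     {compression_ratio(random_like):.2f}x")
```

Data with no remaining redundancy does not shrink at all, which is why the achievable ratio depends so strongly on what is being written.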

the recording channel
write channel
read channel
channel complexity can increase with improved
ASIC technology
9840 Mechanism
Head - 23 Patents Pending, 1 Patent Issued
Mechanism - 10 Patents Pending, 1 Patent Issued
Reel Motor
Operator Panel
  • 1/2" tape in IBM 3480 form factor
  • MP on PEN medium
  • 288 tracks
  • 16 parallel heads ( × 18 stripes )
  • 2 metres/second past head - 10 MB/sec data rate
  • cassette (2 reel) with tape unloaded at mid point
  • tape path entirely in cassette
  • 4 sec load
  • 900 feet of tape ( 274 metres )
  • 8 sec average search
  • 16 sec max rewind
  • 20 Gbytes user data (uncompressed)
  • LZ-1 enhanced compression
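
The figures in this list can be checked against each other. The sketch below assumes that every pass over the tape uses all 16 heads along the full 274 metres, which the slide does not state explicitly; the constant names are illustrative.

```python
TRACKS = 288
PARALLEL_HEADS = 16
TAPE_LENGTH_FEET = 900
TAPE_LENGTH_M = 274
TAPE_SPEED_M_S = 2.0
DATA_RATE_MB_S = 10.0
CAPACITY_GB = 20.0

# 288 tracks written 16 at a time means 18 passes over the tape
passes = TRACKS // PARALLEL_HEADS

# Two independent estimates of the time to read a full cartridge
time_by_tape_motion_s = passes * TAPE_LENGTH_M / TAPE_SPEED_M_S
time_by_data_rate_s = CAPACITY_GB * 1000 / DATA_RATE_MB_S

print(passes, "passes")
print(round(TAPE_LENGTH_FEET * 0.3048), "metres of tape")
print(time_by_tape_motion_s, "s by tape motion")
print(time_by_data_rate_s, "s by data rate")
```

The two time estimates are of the same order but not equal; the slides do not explain the difference and this sketch does not try to reconcile them.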

Cartridge - 6 Patents Pending, 1 Patent Issued
current trends
  • Many new drives
  • Several aggressive road maps
  • Major application is backup
  • Expect strong competition at the low end from optical
scheduled for 2000
Optical Recording
  • The historical advantage of optical over magnetic
    technology was the potential recording density
  • Red laser -- spot size 0.4μ diameter 5
  • Many high end products - but never gave real
    competition to magnetic products
  • performance, cost
  • niche market for write-once applications
  • magnetic disk has now reached or exceeded optical
    recording densities
  • BUT for the first time we see real competition
    from low-end mass market products CD-R, DVD-R
    and DVD-RAM

Write Once - CD-R DVD-R
  • preformed polycarbonate substrate
  • wobbled groove to guide and clock laser
  • photo/heat sensitive dye layer
  • cyanine
  • reflection layer
  • gold
  • laser spot heats dye, changes its structure which
    in turn deforms the substrate
  • read-out laser is absorbed/scattered by the

  • laser system
  • λ 640 nm, numerical aperture 0.6, refractive
    index 0.8
  • spot diameter 0.4μ
  • capacity of side 4.7GB
  • 1.3 MB/sec record & read speed
  • Prices (Panasonic)
  • 5.4K for the drive
  • 35 double sided media ( 3.90 / GB)
  • (a CD-R 640 MB disk costs about 1 in quantity)

Erasable DVD-RAM
  • phase change recording layer - TeGeSb
  • heated by laser spot
  • high power write - fast melt-cool cycle leaves
    amorphous spot with low reflectivity
  • lower power erase - slower melt-cool cycle leaves
    crystalline spot with high reflectivity
  • read-out - low power laser
  • land & groove recording

  • capacity 2.6 GB per side
  • single layer only, unlike DVD-ROM
  • 4.7 GB per side in version 2 due in 2000
  • record and read-back performance - 1.3 MB/sec
  • access time 210 ms
  • 1999 prices
  • drive 640
  • double sided disk (5.2 GB) 35 (6.70 per GB)
  • With high volume
  • could we expect media costs to come down to 1-2
    per disk (like CD-R today)?
  • giving 0.2 per GB
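
The per-GB figures in this list follow directly from the disk price and capacity. A minimal sketch, with prices taken as quoted (the currency symbol did not survive in the transcript, so none is assumed here) and a hypothetical helper name:

```python
def cost_per_gb(price: float, capacity_gb: float) -> float:
    """Media cost per GB, in whatever currency the price is quoted in."""
    return price / capacity_gb


DVD_RAM_DOUBLE_SIDED_GB = 5.2

today = cost_per_gb(35, DVD_RAM_DOUBLE_SIDED_GB)
projected_low = cost_per_gb(1, DVD_RAM_DOUBLE_SIDED_GB)
projected_high = cost_per_gb(2, DVD_RAM_DOUBLE_SIDED_GB)

print(f"1999 price: {today:.2f} per GB")
print(f"at CD-R-like media prices: {projected_low:.2f} to {projected_high:.2f} per GB")
```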

exotic storage technologies
  • holography
  • atomic force microscopy
  • Keele Ultra High Density Memory

holographic storage
graphic Byte Magazine
atomic force microscopy
  • atomic force microscopy applied to data storage
    by IBM
  • sharp tip mounted on a micro-mechanical
    cantilever made from silicon nitride
  • heat and pressure applied as it is passed over
    plastic substrate
  • read-out - the cantilever and tip are scanned over
    the surface
  • 45 GB/in2 demonstrated
  • 300 GB/in2 theoretically possible

pictures - IBM Research Almaden
Keele Ultra High Density Memory
  • Basic research done at Keele University, by
    emeritus professor Ted Williams (inventor of an
    NMR scanner in late 70s/early 80s)
  • The Keele Ultra High Density Memory uses magneto
    optical alloys to store 2.3 TeraBytes of user
    memory on a device the size of a credit card, but
    8.5 cm thick, for less than 50!
  • Uses optical techniques to store and retrieve data
    in 3D storage
  • Multi-layer (3) recording
  • Could put 100 Gbytes in a wristwatch
  • All information on the technology controlled by a
    venture capital company - which says that
    licensing negotiations are under way with a large
    company - products can be expected in under 2

Robotics - no problem but prices are best at the
NSM jukebox 620 DVDs - 65 per 9.4GB slot - 7/GB
20 per 50GB slot - 0.4/GB
Part II - LHC requirements & solutions
  • summary of the requirements of the LHC
  • strawman LHC computing farm
  • cost factors - an attempt to estimate the costs of
    storage in 2005
  • conclusions

LHC storage requirements
  • summary of the storage requirements of the LHC
  • but this is just part of the computing fabric
  • which also includes processing and networking

Data Recording and Offline Computing Facilities
at CERN - for LHC experiments
  • For each LHC experiment capacity at CERN is
    needed for
  • Data Recording
  • First-pass reconstruction
  • Some re-processing
  • Basic Analysis (pass-1 & pass-2) - ESD → AOD & TAG
  • Support for a few analysis groups
    (ATLAS & CMS - 4 groups, 100/1600 physicists)
  • Good external networking
  • Current assumption is that this would be
    complemented with a few large regional centres
    together providing about as much computing
    capacity as at CERN

raw data → ESD
Capacity Estimates
  • Estimate uses figures from CMS in mid-98 ATLAS
    would be similar
  • Raw data is recorded at 100 MB/sec

  • 10¹⁵ Bytes
  • 1,000 TeraBytes
  • 20,000 Redwood tapes
  • 30,000 Cheetah 36 disks
  • 100,000 dual-sided DVD-RAM disks
  • 1,500,000 sets of the Encyclopaedia Britannica
    (w/o photos)
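
These equivalents can be reproduced with simple arithmetic. The sketch below assumes about 10⁷ seconds of data taking per year, a common rule of thumb that is not stated on the slide, and works backwards from the quoted media counts to the capacity per unit they imply.

```python
RAW_RATE_MB_S = 100
SECONDS_PER_YEAR = 1e7  # assumption: effective data-taking time per year

yearly_bytes = RAW_RATE_MB_S * 1e6 * SECONDS_PER_YEAR

# Media counts as quoted on the slide
slide_counts = {
    "Redwood tape": 20_000,
    "Cheetah 36 disk": 30_000,
    "dual-sided DVD-RAM disk": 100_000,
}

# Capacity per unit implied by each count, in GB
implied_gb = {name: yearly_bytes / count / 1e9 for name, count in slide_counts.items()}

print(f"{yearly_bytes:.0e} bytes per year")
for name, gb in implied_gb.items():
    print(f"{name}: about {gb:.1f} GB each")
```

The implied capacities come out at roughly 50 GB per tape, 33 GB per disk and 10 GB per DVD-RAM, so the counts are evidently rounded; the DVD-RAM figure is closer to the 4.7 GB per side version than to the 2.6 GB per side one.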

disk capacity v. data rate
CERN physics 1999 12 MB/sec-per-TB
CMS 2006 74 MB/sec-per-TB
  • ALICE requires a much higher data recording rate
    than ATLAS or CMS
  • 1 GB/sec - during the 1-2 month ions run
  • Total raw data 1 PByte per year
  • Tape data rates may remain modestly in the
    15-20 MB/sec range
  • Requiring a nominal 50-70 drives - in practice
    100-150 drives and some good storage management
  • This problem will be addressed by Fabrizio in his
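
The nominal drive count follows from dividing the recording rate by the rate of a single drive. A minimal sketch, with a hypothetical function name and 1 GB/sec taken as 1000 MB/sec:

```python
import math


def drives_needed(recording_mb_s: float, drive_mb_s: float) -> int:
    """Drives needed to absorb the recording rate, all streaming at full speed."""
    return math.ceil(recording_mb_s / drive_mb_s)


ALICE_RATE_MB_S = 1000  # 1 GB/sec during the ions run

fewest = drives_needed(ALICE_RATE_MB_S, 20)  # optimistic drive speed
most = drives_needed(ALICE_RATE_MB_S, 15)    # pessimistic drive speed

print(f"nominal: {fewest}-{most} drives")
```

This only covers the nominal figure; the larger practical count quoted above allows for drives that are not streaming continuously.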

[Diagram: CMS Offline Farm at CERN circa 2006 - lmr for Monarc study - april 1999. Labels: 5600 processors, 1400 boxes, 160 clusters, 40; 0.5 M SPECint95; 0.5 PByte disk, 5400 disks, 340 arrays; 100 drives; farm network; storage network; LAN-WAN routers; link rates of 0.8 (daq), 0.8, 1.5, 5, 6, 8, 12, 24, 250 and 960 Gbps]
Is there a problem?
  • Because HEP computing has the property of event
    independence we can process any number of events
    in parallel and so we can use real commodity
    components (well, maybe not for tertiary
    storage) - nothing special - just lots of them
  • The technology is looking good
  • but there are two small problems which come from
    the scale
  • -- Cost
  • -- Management
  • Fabrizio will talk about the storage management
  • but note that the management problem applies
    across the board -
  • processors, network, storage, workflow, WAN

Cost evolution
  • cost factors
  • development costs
  • production costs ?
  • technology
  • market volume
  • marketing costs
  • distribution costs
  • price factors
  • production costs
  • profit
  • competition
  • the best technology often does not win

Share of Hard Disk Market - Units shipped in 1998
  • 1998 - 145M disks sold - total revenue 30 Bn
  • 110M in PCs (IDE)
  • 30M SCSI/FCAL - mostly storage systems, which
    generated 13Bn revenues
prices paid by CERN compared with 35% evolution
since 1990 - simple disk arrays (JBOD)
How much should we budget for hard disk?
  • So we are reasonably happy that LHC can use
    inexpensive disk, and that the prices will
    continue to decrease steadily
  • To minimise data loss and other operational
    problems associated with failing disks, we will
    use RAID. Today RAID systems come with a
    substantial price penalty, but we can expect that
    in 2005-06 we shall only have to pay for the
    redundant disk capacity.
  • Bottom line - at an estimated 4-8/GByte the
    500TB needed by CMS will cost 2-4M
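
The bottom line is a straightforward multiplication, sketched here with a hypothetical helper. Prices are taken as quoted, with no currency assumed, and 1 TB is taken as 1000 GB.

```python
def disk_budget(capacity_tb: float, price_per_gb: float) -> float:
    """Total cost of the given capacity at the given price per GB."""
    return capacity_tb * 1000 * price_per_gb


low = disk_budget(500, 4)
high = disk_budget(500, 8)

print(f"{low / 1e6:.0f}M to {high / 1e6:.0f}M")
```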

tape price evolution
  • Estimating the cost of magnetic tape is not
    nearly so easy.

Total revenues 5Bn
  • 0.5" linear devices - DLT, 3590, 9840, 3570
  • 0.5" helical - Redwood
  • 19mm helical - AMPEX, Sony D1
  • 8mm helical - EXABYTE, Sony AIT
  • 4mm helical - DAT
  • As we saw earlier, DVD-R and DVD-RAM have the
    potential to provide a very convenient way of
    archiving modest amounts of data - 5-10 GBytes -
    at a modest data rate (1.4 MB/sec).
  • DVD is a random access device - offering a
    significantly different functionality from
    sequential access tape.
  • The cost today for a DVD-RAM disk is a few per
    GB, rather similar to the cost of 8mm, 4mm tape.
  • With a little improvement in the cost of the
    DVD-RAM drive - DVD-RAM could destroy the
    market for low-end tape (home, small office
    backup & archive)

Data Centre Tapes
  • But we are concerned with data centre tapes -
    0.5" linear - where performance, capacity,
    robotics, …. are important factors
  • But so is overall cost - which today, for ATLAS
    or CMS, would be dominated by the media cost!
  • ALICE is a bit different

Can we estimate how tape costs will evolve?
  • NO - we cannot estimate - only guess for media
    - which dominates the overall cost
  • Cost of high quality drives will not change much
  • Cost of a robot slot will not change (but we may
    see competitive pricing for DLT format robots)

[Chart: tape media price evolution - CHF per GB of data and CHF per foot of tape (log scale!) - single supplier v. multiple suppliers]
guesstimate for Magnetic Tape
  • Maybe the recording density increases by a factor
    of 4
  • So the cost of the media will fall to CHF 0.5 per
  • And a cartridge will hold 100 GB
  • The 5-year cost then works out at CHF 1/GB
  • Two problems for tapes
  • raw disk may be only 3 times more expensive
  • as we guessed earlier, DVD-RAM might be
    substantially cheaper (if there are suitably
    priced robotics!)

time to change the balance?
  • the classic model
  • use the disk as a cache of the active data, which
    is kept on tape
  • may not be the right one for LHC
  • we should consider using much more disk for
    all of the really active data
  • and using tape or something cheaper to archive
    the rest

conclusion (i)
  • disks OK
  • merging of magnetic and magneto-optical
    techniques will ensure that the technology can
    evolve smoothly well into the LHC time-frame
  • unlikely to be displaced as the standard for
    secondary storage
  • DVD - too slow, too small
  • holography - waiting for a material breakthrough
  • the rest are not on the LHC time-scale
  • robots OK

conclusion (ii)
  • ---- BUT tertiary storage is a problem
  • tape - reliability, cost, market - all
  • DVD - may be a solution if a healthy market
  • very likely to eliminate tape for low-end PC
  • could well compete on price and reliability for
    data centre applications
  • but likely to remain low capacity, low
  • removable magnetic or magneto-optic disk may
    compete strongly with tape - but are not likely
    to be cheaper

conclusion (iii)
  • which may just give us the opportunity we need to
    change the analysis model
  • active data on disk
  • exchange data using random access DVDs
  • and use tape as the last resort - like the
    rest of the industry!
  • but how do you select the active raw data?