Transcript and Presenter's Notes

Title: Testing Network Attached Storage Presenting the Fermi Disk Test Suite and some Preliminary Results


1
Testing Network Attached StoragePresenting the
Fermi Disk Test Suite and some Preliminary Results
  • C. Brew, L. Giacchetti, J. Kaiser, H. Wenzel,
    R. Pasetes
  • Fermilab
  • http://www-oss.fnal.gov/projects/disksuite/

2
Dealing With Demo Systems
3
Aim
  1. To test the overall performance of some of the
    Network Attached Storage Systems becoming
    available.
  2. To develop a suite of tests and a procedure that
    can be used for objective comparisons between
    these different NAS devices independently of the
    technology being used.

4
How Would We Use The Storage?
  • Interactive Hot Data Store
  • Batch Farm Shared Disk
  • Home Area Server

5
Interactive Hot Data Store
  • Shared disk on an Interactive Analysis farm where
    users/admins can put the heavily used data to
    save copying it from mass storage
  • Requires
  • Support for large numbers of clients
  • Moderate throughput on Writes
  • High Throughput on Reads

6
Batch Farm Shared Disk
  • Shared disk for a batch farm, used for storage
    of executables and configuration files, and as
    a work disk where data sets copied out of Mass
    Storage are worked on.
  • Requires
  • Support for a very large number of clients
  • Ability to handle large numbers of simultaneous
    reads and writes
  • High Throughput on Reads and Writes

7
Home Area Server
  • Need to look at future technologies for user home
    areas
  • Requires
  • Support for access by all FERMI OSs
  • Global Namespace
  • I/O Ops rate more important than overall
    throughput
  • Scalable
  • Ability to back up/snapshot important areas
  • ACL and Kerberos support desirable

8
Fermilab Disk Test Suite
  • Set of scripts and binaries for running single
    client and cluster disk performance tests
  • Technology agnostic: can test anything that
    presents as a file system to the clients
  • Supports multiple clients and multiple processes
    per client
  • Standard tools: Bash scripts, Perl scripts,
    IOZone, Bonnie and some simple C programs
  • Runs on Linux, Solaris and IRIX
  • Configuration via a simple text file
  • http://www-oss.fnal.gov/projects/disksuite/fdts.tgz

9
Tests
  • Performance Tests
  • Max Throughput Read and Write
  • Max Throughput Reading a Single File
  • Simultaneous Reads and Writes
  • Creation, Listing and Deletion of Large Numbers
    of Small Files
  • Data Integrity
  • Manageability Tests
  • Ease of setup
  • Ease of Reconfiguration
  • Failure Tests
  • Fail various parts of the system and see what
    happens

10
FDTS Components
  • Benchmark Binaries
  • IOZone, Bonnie and Reader/Writer
  • Test Scripts
  • ops_cluster.sh, tput_cluster.sh and
    single_node.sh
  • Data Processing Scripts
  • proc_data_ops.pl, proc_data_tpt.pl and
    proc_data_single.pl
  • Internal Control Scripts
  • rfork, lfork, rw_control.pl and parse_config.sh
  • Configuration File
  • disksuite.conf

11
Benchmark Binaries
  • FDTS uses four benchmark binaries
  • Should be located in the bin directory and
    named exe_name.uname (see the sketch after
    this list)
  • Reader/Writer
  • Used for throughput measurements
  • Simple C programs written at Fermilab
  • Uses C native read and write functions
  • Bonnie
  • Used for Operations Measurement
  • IOZone
  • Used for optional Data Integrity Test
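As an illustration of the exe_name.uname naming
convention, a test script might select the right
binary for the current platform as sketched below.
This is a hedged sketch: the bin path and the
reader/writer binary names are assumptions, not
taken from the suite.

    # Pick the benchmark binaries for this platform,
    # following the <exe_name>.<uname> convention
    BIN_DIR=./bin
    READER="$BIN_DIR/reader.$(uname)"   # e.g. reader.Linux, reader.SunOS
    WRITER="$BIN_DIR/writer.$(uname)"
    [ -x "$READER" ] || { echo "no reader binary for $(uname)" >&2; exit 1; }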

12
Test Scripts
  • ops_cluster.sh and tput_cluster.sh
  • Run the cluster operations and throughput tests
    respectively
  • single_node.sh
  • Runs throughput and operations tests on a single
    node
  • Usage
  • ops_cluster.sh config_file key
  • tput_cluster.sh config_file key
  • single_node.sh config_file key

13
Data Processing Scripts
  • Take the raw data produces by the Testing scripts
    and process to produce numbers and output the
    results in comma separated variable format
  • Usage
  • proc_data_ops.pl --key test_key --datadir
    results_dir --out output_file --debug
  • proc_data_tpt.pl --key test_key --datadir
    results_dir --out output_file --debug
  • proc_data_single.pl --key test_key --host
    hostname
  • --datadir results_dir --out output_file
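Based only on the usage lines above, a full run
driven from the controlling node might look like
the example session below. The config file name,
test key, host name and output file names are
invented for illustration.

    # Run the cluster and single-node tests
    ./ops_cluster.sh  disksuite.conf nas_run1
    ./tput_cluster.sh disksuite.conf nas_run1
    ./single_node.sh  disksuite.conf nas_run1

    # Reduce the raw results to CSV
    ./proc_data_ops.pl    --key nas_run1 --datadir results --out nas_run1_ops.csv
    ./proc_data_tpt.pl    --key nas_run1 --datadir results --out nas_run1_tput.csv
    ./proc_data_single.pl --key nas_run1 --host node01 \
                          --datadir results --out nas_run1_single.csv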

14
Internal Control Scripts
  • rfork
  • Runs multiple copies of a command on remote nodes
  • lfork
  • Runs multiple copies of a command on the local
    node
  • rw_control.pl
  • Wrapper script for reader and writer
  • parse_config.sh
  • Contains the subroutine to parse the config file,
    called by the testing scripts
  • See http://www-oss.fnal.gov/projects/disksuite/control.html
    for a full write-up (a sketch of an rfork-style
    helper follows this list)
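To make the role of the control scripts concrete,
here is a minimal sketch of what an rfork-style
helper could look like. It is not the actual FDTS
rfork; it only assumes password-less remote login
(e.g. Kerberized ssh) to the nodes in the node
file.

    #!/bin/bash
    # Sketch of an rfork-style helper (NOT the FDTS script): start N copies
    # of a command on every node in a node file and wait for all of them.
    # usage: rfork_sketch nodefile copies command [args...]
    NODEFILE=$1; COPIES=$2; shift 2

    while read -r node; do
        [ -z "$node" ] && continue
        for i in $(seq 1 "$COPIES"); do
            ssh -n "$node" "$@" &    # -n: keep the remote job off our stdin
        done
    done < "$NODEFILE"

    wait                             # block until every remote copy has exited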

15
Configuration File (1)
  • NODEFILE: File containing a list of nodes for
    the cluster tests, one node per line
  • HOSTBASE: Host name prefix for the cluster
    tests. Ignored if NODEFILE is set
  • STARTNODE: Host suffix to start at for the
    cluster tests. Ignored if NODEFILE is set
  • TEST_RUNS: Space-separated list of numbers of
    nodes to use for each cluster test
  • TEST_THREADS: Space-separated list of the number
    of processes to run on each node in the
    throughput tests
  • WORK_DIR: Working directory; must be a full path
    and visible from all clients
  • RESULTS_DIR: Results directory; only needs to be
    visible from the node from which the tests are
    started
  • KERBEROS: If "YES", switch on Kerberos ticket
    and AFS token renewal
  • DEFUNCT: Comma-separated list of node suffixes
    to skip in the cluster tests. Ignored if
    NODEFILE is set
  • FILESIZE: File size for the throughput tests

16
Configuration File (2)
  • BLOCKSIZE: Block sizes for the throughput tests.
    Single-node tests will take a space-separated
    list; cluster tests will use just the first
    value in that case
  • DF_TEST: If "YES", monitor the output of df and
    calculate write throughput from it
  • DF_KEY: String identifying the line in the
    output of df that contains the working directory
  • OPS_FILES: Number of thousands of files to
    create per node during the operations tests
  • OPS_MIN: Minimum size of the files created
    during the operations tests (in bytes)
  • OPS_MAX: Maximum size of the files created
    during the operations tests (in bytes)
  • OPS_DIRS: Number of directories to create the
    files in during the operations tests
  • DATA_INTEG: If "YES", perform the IOZone data
    integrity test on all the nodes after the
    operations test has finished
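Putting the two configuration slides together, a
disksuite.conf might look like the sketch below.
It assumes the file is a set of plain KEY=value
assignments read by parse_config.sh; the paths,
values and units shown are illustrative guesses,
not values taken from the real suite.

    # disksuite.conf -- hedged example, values and units are assumptions
    NODEFILE=/home/tester/fdts/nodes.txt   # one node per line
    TEST_RUNS="1 2 5 10 20 50"             # cluster sizes to test
    TEST_THREADS="1 2 4"                   # processes per node (throughput tests)
    WORK_DIR=/mnt/nas/fdts_work            # must be visible from every client
    RESULTS_DIR=/home/tester/fdts/results  # only needed on the controlling node
    KERBEROS=NO
    FILESIZE=1024                          # throughput test file size (units assumed)
    BLOCKSIZE="64 256 1024"                # cluster tests use only the first value
    DF_TEST=YES
    DF_KEY=/mnt/nas                        # string matching df's line for WORK_DIR
    OPS_FILES=10                           # thousands of files per node
    OPS_MIN=1024                           # bytes
    OPS_MAX=16384                          # bytes
    OPS_DIRS=10
    DATA_INTEG=NO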

17
How to Measure Total Throughput?
  • Uses simple C programs to measure R/W speeds
  • We are testing black boxes, so we cannot run a
    process to monitor throughput on the server; we
    need to calculate it from the client side
  • Each client or process writes/reads five 1 GB
    files in succession and calculates the
    throughput individually for each file
  • Calculate two measures of throughput
  • Sustained throughput
  • Overall Throughput

18
Measures of Throughput
  • Sustained Throughput
  • The end time (T1) is defined as the moment the
    first client finishes its 5th file
  • Stats for all files that finished before the
    end time are included
  • Stats are averaged on each node and then summed
    across all nodes
  • Overall Throughput
  • (Total data written/read) / (Total time taken, T2)
  • If a node has not completed any files, the
    stats from its first file are included
    (a calculation sketch follows this list)
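The two measures can be made concrete with a small
post-processing sketch. The log format assumed
below (one line per file: node, file number, bytes,
start time, end time in seconds) is an illustration
only, not the format the FDTS scripts actually
write, and the special case of nodes that complete
no files is ignored.

    #!/bin/bash
    # Hedged sketch: compute the two throughput measures from a hypothetical
    # per-file log with lines "node file_no bytes start end" (times in seconds)
    LOG=${1:?usage: $0 perfile.log}

    awk '
    {
        node = $1; bytes = $3; start = $4; end = $5
        nfiles[node]++
        total_bytes += bytes
        if (t0 == 0 || start < t0) t0 = start   # earliest start
        if (end > t2) t2 = end                  # last finish (T2)
        if (nfiles[node] == 5 && (t1 == 0 || end < t1))
            t1 = end                            # first client done with its 5th file (T1)
        rec[NR] = node " " bytes " " start " " end
    }
    END {
        # Overall throughput: all data over total wall-clock time
        printf "overall   %.1f MB/s\n", total_bytes / (t2 - t0) / 1e6

        # Sustained: files finished before T1, averaged per node, then summed
        for (i = 1; i <= NR; i++) {
            split(rec[i], f, " ")
            if (f[4] <= t1) { sum[f[1]] += f[2] / (f[4] - f[3]); cnt[f[1]]++ }
        }
        s = 0
        for (n in sum) s += sum[n] / cnt[n]
        printf "sustained %.1f MB/s\n", s / 1e6
    }' "$LOG"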

19
Operations Tests
  • Storage companies always quote I/O ops per
    second or NFS ops per second, but what does that
    really mean? We still don't know!
  • We use Bonnie to measure
  • File Creates per sec
  • File stats per sec
  • File deletes per sec
  • For sequentially and randomly chosen files
  • Data Integrity
  • Uses 2 processes on all available clients to run
    IOZone in verify mode, which writes a pattern
    into the test file and checks it during the read

20
Some NAS systems we tested
21
The Test Farm
64 dual AMD 1.9 nodes in a 1U form factor. The
machines are connected via Fast Ethernet to a
Cisco 6509 switch.
22
Results (Throughput)
  • Zambeel, Spinnaker, Linux File Server
  • Linux Server did not drop clients
  • Zambeel had problems doing server side caching

23
Results (Operations)
  • Zambeel, Spinnaker, Linux File Server
  • With 50 clients, Zambeel timed out on an average
    of 7 clients

24
In addition, we developed a test suite for
storage systems which don't access files through
a mounted file system. Usually these systems
stage files in and out using get and put
commands. In some cases the data can be accessed
directly from within an application via POSIX
compliant function calls (e.g. TDCacheFile).
The processes are synchronized using the FBSNG
batch system.
25
dfarm is a product developed at Fermilab which
utilizes the data disks on the farm nodes.
  • The name space is organized as a virtual file
    name space
  • Virtual path: /E123/data/file.5 (this is what
    the user knows)
  • Physical path: fnpc221/local/stage2/XYZ123
    (this is what the disk farm knows, so that the
    user does not have to)
  • The user operates in a familiar UNIX-like file
    name space using familiar commands
  • Solution for the node unreliability problem:
    replicate the data
  • Make 2, 3 or 4 copies of the file on different
    nodes
  • Data that is easy to reproduce or has a short
    life: 1 copy
  • Precious data: 2, 5 or 10 copies
  • Disk Farm replicates data off-line
  • Remote access via GridFTP
  • Load sharing and control
  • Each node has a limit for the number of
    simultaneous reads/writes
  • Load is evenly distributed and optimized

26
Dfarm was installed on the test farm described
on the earlier slides. For this test we
used about 50 dual AMD 1.9 nodes with 80 GB of
local data disk. In this test the nodes serve as
servers and clients at the same time.
27
CD/ISD
[Diagram: file transfer between ENSTORE (the
Hierarchical Storage Manager) and the Disk Cache,
serving production, personal analysis and
CMS-specific use, with both random and sequential
access.]

28
What do we expect from dCache?
  • Making a multi-terabyte server farm look like
    one coherent and homogeneous storage system.
  • Rate adaptation between the application and the
    tertiary storage resources.
  • Optimized usage of tape robot systems and drives
    by coordinated read and write requests.
  • No explicit staging is necessary to access the
    data (but pre-staging is possible).
  • The data access method is unique, independent of
    where the data resides.
  • High-performance and fault-tolerant transport
    protocol between applications and data servers.
  • Fault tolerant: no specialized servers which can
    cause severe downtime when crashing.
  • Can be accessed directly from the application
    (e.g. the ROOT TDCacheFile class).
  • Can be used as a scalable file store without an
    HSM.
  • Remote access via GridFTP/SRM.

29
dCache
30
Conclusion
  • With these tools we have the basis of a test
    suite and a procedure for comparing the
    different storage technologies that are becoming
    available.
  • We have tested devices from Spinnaker and
    Zambeel, along with a Linux terabyte file server
    for comparison. We have also tested dfarm and
    dCache. In the coming months we hope to test
    devices from Panasas and DataDirect. We also
    hope to work with DESY to run these tests on the
    Exanet they have been testing.