IO Performance Case I

Sequential Write

Env Info

                | Host1                                     | Host2                                                                    | Host3
CPU             | Xeon(R) E-2288G CPU @ 3.70GHz (16 Cores)  | Xeon(R) CPU E5-2650 v4 @ 2.20GHz (48 Cores)                              | Xeon(R) CPU E5-2650 v4 @ 2.20GHz (48 Cores)
Memory          | DDR4 2666 MHz 32G x 4                     | DDR4 2400 MHz 32G x 4                                                    | DDR4 2400 MHz 32G x 4
Raid Controller | AVAGO MegaRAID SAS 9361-4i (1G Cache)     | HPE Smart Array P440 (4G Cache)                                          | HPE Smart Array P440 (4G Cache)
SSD             | INTEL SSDSC2KB960G8 (D3-S4510 Series)     | INTEL SSDSC2KG96 (D3-S4610 Series) & INTEL SSDSC2KG96 (DC S4600 Series)  | INTEL SSDSC2KG96 (D3-S4610 Series) & INTEL SSDSC2KG96 (DC S4600 Series)
RAID Info       | 4 SSDs → RAID0                            | 6 SSDs → RAID5                                                           | 6 SSDs → RAID0
Filesystem      | EXT4                                      | XFS                                                                      | XFS
Mountpoint      | /var                                      | /var                                                                     | /var
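
For reference, the hardware details above can be collected with standard tools; a minimal sketch (this exact command set is an assumption, not necessarily what was used for this report):

$ lscpu | grep 'Model name'                    # CPU model and base frequency
$ dmidecode -t memory | grep -E 'Size|Speed'   # DIMM sizes and speeds
$ lsblk -o NAME,MODEL,SIZE,MOUNTPOINT          # logical volumes exposed by the RAID controller
$ lspci | grep -i raid                         # RAID controller model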

Test Script

#! /bin/bash

sn=$1

# Collect per-process and per-device IO stats in the background
iotop -b -o -t -d1 > iotop.$sn &
iotop_pid=$!
iostat -tkx 1 sda > iostat.$sn &
iostat_pid=$!

# Drop the page cache, then time a 20GB direct sequential write
echo 3 > /proc/sys/vm/drop_caches
( time dd if=/dev/zero of=/var/tmp/test.img bs=1M count=20000 oflag=direct ) |& tee iooutput.$sn
rm -vf /var/tmp/test.img

# Stop the background collectors (capturing $! above is more reliable
# than grepping ps output, which can match the grep/awk itself)
kill -9 ${iotop_pid}
kill -9 ${iostat_pid}
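
A hypothetical invocation (assuming the script above is saved as seqwrite.sh) runs three rounds so the times can be averaged:

$ for sn in `seq 1 3`; do ./seqwrite.sh ${sn}; done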

Result (completion time in seconds)

                       | Host1           | Host2           | Host3
Avg Time to Write 20GB | 20.6339 seconds | 13.8235 seconds | 12.4453 seconds
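
One way to compute the averages from the collected iooutput.$sn files; a minimal sketch, assuming GNU dd's "... bytes ... copied, <seconds> s, <rate>" summary line format:

$ grep 'copied' iooutput.* | awk -F', ' '{sum += $2; n++} END {print sum / n " seconds (avg of " n " runs)"}'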

iostat/iotop output collected for each run

Conclusion

In this sequential-write throughput scenario, XFS performs better than EXT4.

Random Write

Env Info

                | Host1                                       | Host2 (CPU on ondemand)                     | Host2 (CPU on performance)                  | Host3
CPU             | Xeon(R) CPU E5-2650 v4 @ 2.20GHz (48 Cores) | Xeon(R) CPU E5-2650 v4 @ 2.20GHz (48 Cores) | Xeon(R) CPU E5-2650 v4 @ 2.20GHz (48 Cores) | Xeon(R) E-2288G CPU @ 3.70GHz (16 Cores)
CPU Governor    | performance                                 | ondemand                                    | performance                                 | powersave
Memory          | DDR4 2400 MHz 32G x 4                       | DDR4 2400 MHz 32G x 4                       | DDR4 2400 MHz 32G x 4                       | DDR4 2666 MHz 32G x 4
Raid Controller | AVAGO MegaRAID SAS 9361-8i (1G Cache)       | HPE Smart Array P440 (4G Cache)             | HPE Smart Array P440 (4G Cache)             | AVAGO MegaRAID SAS 9361-4i (1G Cache)
SSD             | INTEL SSDSC2KG96 (D3-S4610 Series)          | INTEL SSDSC2KG96 (D3-S4610 Series)          | INTEL SSDSC2KG96 (D3-S4610 Series)          | INTEL SSDSC2KB960G8 (D3-S4510 Series)
RAID Info       | 4 SSDs → RAID0                              | 4 SSDs → RAID0                              | 4 SSDs → RAID0                              | 4 SSDs → RAID0
Filesystem      | EXT4                                        | EXT4                                        | EXT4                                        | EXT4
Mountpoint      | /export                                     | /export                                     | /export                                     | /var

How to Run the Test

FIO Profile

$ cat randw.fio
[global]
        ioengine=libaio
        invalidate=1
        direct=1
        iodepth=20
        ramp_time=30
        random_generator=tausworthe64
        randrepeat=0
        verify=0
        verify_fatal=0
        runtime=300
        exitall=1

[rand]
        filename=/export/test.img
        size=10G
        rw=randwrite
        bs=${blocksize}
        numjobs=${threads}

Test Script

$ cat test.sh
#! /bin/bash

blockdevice='sdb'
blocksize=$1
threads=$2
sn=$3

# Exported so fio can expand ${blocksize}/${threads} in randw.fio
export blocksize
export threads
output_dir="${blocksize}-${threads}t"

if [ ! -d ${output_dir} ]; then
        mkdir ${output_dir}
fi

echo "====== ${blocksize}/${threads}threads Test Round ${sn} started ======"

# Use the noop io scheduler for the SSD-backed device, then verify
echo 'noop' > /sys/block/${blockdevice}/queue/scheduler
cat /sys/block/${blockdevice}/queue/scheduler

# Pin every core to the performance governor, then verify
find /sys/devices/system/cpu -maxdepth 3 -name scaling_governor | xargs -I {} sh -c 'echo performance > {}'
cpucount=$(lscpu | awk '/^CPU\(s\)/{print $NF}')
performance_count=$(for i in `seq 0 $(expr ${cpucount} - 1)`; do cat /sys/devices/system/cpu/cpu${i}/cpufreq/scaling_governor; done | grep 'performance' | wc -l)

if [ ${performance_count} -ne ${cpucount} ]; then
        echo "something wrong with cpu setting"
else
        echo "${cpucount} cpus are running on performance state"
fi

# Collect per-process and per-device IO stats in the background
iotop -b -o -t -d1 > ${output_dir}/iotop.$sn &
iotop_pid=$!
iostat -tkx 1 ${blockdevice} > ${output_dir}/iostat.$sn &
iostat_pid=$!

# Drop the page cache and start from a fresh test file
echo 3 > /proc/sys/vm/drop_caches
rm -vf /export/test.img
fio randw.fio > ${output_dir}/fio.result.$sn

echo "====== ${blocksize}/${threads}threads Test Round ${sn} ended ======"

kill -9 ${iotop_pid} > /dev/null 2>&1
kill -9 ${iostat_pid} > /dev/null 2>&1

unset blocksize
unset threads
unset iotop_pid
unset iostat_pid

Test Command with 32 Threads

$ for j in `seq 1 3`; do for i in {4,8,16}; do ./test.sh ${i}k 32 ${j}; done; done

Command to Collect Results

$ for i in `seq 1 3`; do grep -R '^\s*write' 4k-32t/fio.result.$i | awk -F',' '{print $3}' | awk -F'=' 'BEGIN{SUM=0}{SUM+=$2}END{print SUM}'; done | awk 'BEGIN{SUM=0}{SUM+=$0}END{print SUM/NR}'
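
The one-liner above sums the per-job iops= values in each round's fio result (legacy fio text output emits one "write: io=..., bw=..., iops=..., runt=..." line per job), then averages the three per-round totals. A more readable sketch of the same aggregation (the helper name and result layout are assumptions):

$ cat collect.sh
#! /bin/bash
# collect.sh <result_dir> -- average the summed per-job IOPS over three rounds
dir=$1
for i in `seq 1 3`; do
        # one "write: ..." line per fio job; pull the number after iops=
        grep '^\s*write' ${dir}/fio.result.$i \
                | grep -o 'iops=[0-9.]*' \
                | awk -F'=' '{sum += $2} END {print sum}'
done | awk '{total += $1} END {print total / NR " avg IOPS"}'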

Result (in IOPS with different blocksizes)

                           | 4k      | 8k      | 16k
Host1                      | 159703  | 124112  | 58969.3
Host2 (CPU on ondemand)    | 34277.9 | 55004.9 | 50909.5
Host2 (CPU on performance) | 178449  | 112723  | 54417
Host3                      | 142179  | 115600  | 36361.3

fio/iostat/iotop output collected for each run

Conclusion

CPU frequency is crucial for io performance: on the same Host2 hardware, switching the governor from ondemand to performance raised 4k random-write IOPS from ~34k to ~178k.
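
Besides the sysfs writes in test.sh, the governor can also be set with cpupower (from the kernel-tools package on CentOS/RHEL); a minimal sketch, assuming the package is installed:

$ cpupower frequency-set -g performance     # pin all cores to the performance governor
$ cpupower frequency-info | grep governor   # verify the active governor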

Random Write (io_scheduler / filesystem)

Env Info

                | Host (ext4/noop)                            | Host (ext4/cfq)                             | Host (xfs/noop)
CPU             | Xeon(R) CPU E5-2650 v4 @ 2.20GHz (48 Cores) | Xeon(R) CPU E5-2650 v4 @ 2.20GHz (48 Cores) | Xeon(R) CPU E5-2650 v4 @ 2.20GHz (48 Cores)
CPU Governor    | performance                                 | performance                                 | performance
Memory          | DDR4 2400 MHz 32G x 4                       | DDR4 2400 MHz 32G x 4                       | DDR4 2400 MHz 32G x 4
Raid Controller | HPE Smart Array P440 (4G Cache)             | HPE Smart Array P440 (4G Cache)             | HPE Smart Array P440 (4G Cache)
SSD             | INTEL SSDSC2KG96 (D3-S4610 Series)          | INTEL SSDSC2KG96 (D3-S4610 Series)          | INTEL SSDSC2KG96 (D3-S4610 Series)
RAID Info       | 6 SSDs → RAID0                              | 6 SSDs → RAID0                              | 6 SSDs → RAID0
Filesystem      | EXT4                                        | EXT4                                        | XFS
IO Scheduler    | noop                                        | cfq                                         | noop
Mountpoint      | /export                                     | /export                                     | /export

Result (IOPS with 32 threads)

                 | 4k      | 8k      | 16k
Host (ext4/cfq)  | 44711.7 | 40876.7 | 34597.7
Host (ext4/noop) | 186779  | 155757  | 111132
Host (xfs/noop)  | 138856  | 145067  | 111602

fio/iostat/iotop output collected for each run

Conclusion

  • For SSDs, the noop io scheduler is much better than cfq, which is the default io scheduler in CentOS7/RHEL7; a sketch for making noop persistent follows this list.
  • XFS random-write performance is worse than EXT4's, especially at smaller io blocksizes.
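
A minimal sketch for making noop the default io scheduler across reboots on CentOS7/RHEL7 via the elevator kernel parameter (the existing GRUB_CMDLINE_LINUX contents shown are an assumption):

$ grep GRUB_CMDLINE_LINUX /etc/default/grub
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet elevator=noop"
$ grub2-mkconfig -o /boot/grub2/grub.cfg    # regenerate the grub config, then reboot
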
Tags: storage