Description of problem:
-----------------------
As per the admin guide, the tuned-adm profile tailored for random workloads is "rhgs-random-io". I see that random write performance degrades sharply when nodes are tuned to it.

Version-Release number of selected component (if applicable):
------------------------------------------------------------
glusterfs-3.7.9-1.el6rhs.x86_64

How reproducible:
-----------------
2/2

Steps to Reproduce:
-------------------
1. Tune all the nodes in the gluster cluster to any profile, say "rhgs-sequential-io" or "throughput-performance". Run the random write workload thrice.
2. Clean the mount point. Switch to "rhgs-random-io". Restart glusterd. Remount the volume on the clients using FUSE.
3. Run the random write workload again, thrice.

Actual results:
---------------
I see a >50% performance hit on random writes with "rhgs-random-io" as compared to the other profiles on the same setup.

Expected results:
-----------------
RHGS should be equally or more performant on random writes when nodes are tuned to the "rhgs-random-io" profile.

Additional info:
----------------
OS: RHEL 7.2
Iozone was used in a distributed, multithreaded manner with a 2G file size, a record size of 64K and a total of 16 threads.
The setup consisted of 4 servers and 4 clients (1 mount per server) on a 10GbE network.

Volume settings:

[root@gqas001 ~]# gluster v info

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 2a668beb-7f26-48f9-8550-157108fe1a55
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas001.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas014.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas015.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas016.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
performance.readdir-ahead: on
performance.stat-prefetch: off
server.allow-insecure: on
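For reference, the profile-switch step above looks roughly like this on the command line (a sketch assuming standard tuned/glusterfs tooling on RHEL 7; the /mnt/glustervol mount point is taken from the fio jobfiles later in this BZ, and the server hostname is one of the bricks above):

```shell
# On each of the 4 gluster servers: switch the tuned profile.
tuned-adm profile rhgs-random-io
tuned-adm active                      # confirm which profile is now active

# Restart glusterd so the brick processes run under the new tuning.
systemctl restart glusterd

# On each client: clean remount of the volume over FUSE.
umount /mnt/glustervol
mount -t glusterfs gqas001.sbu.lab.eng.bos.redhat.com:/testvol /mnt/glustervol
```

This is an environment-specific fragment, not a verbatim transcript from the reporter's setup.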
***************** On RHEL 7.2 Setup *****************

The random write workload was run thrice against each tuned profile. Mean throughputs for each of the profiles are given below:

> rhgs-random-io         : 133086.533333 KB/s
> rhgs-sequential-io     : 337787.506667 KB/s
> throughput-performance : 356840.130000 KB/s
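To quantify the hit from the RHEL 7.2 means above (a quick arithmetic sketch using the numbers reported in this comment):

```shell
# Percentage drop of rhgs-random-io relative to the other two profiles.
awk 'BEGIN {
    r = 133086.533333   # rhgs-random-io, KB/s
    s = 337787.506667   # rhgs-sequential-io, KB/s
    t = 356840.130000   # throughput-performance, KB/s
    printf "vs rhgs-sequential-io    : %.1f%% drop\n", (1 - r/s) * 100
    printf "vs throughput-performance: %.1f%% drop\n", (1 - r/t) * 100
}'
```

This works out to drops of roughly 60.6% and 62.7%, consistent with the ">50% performance hit" stated in the description.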
I see the same problem on a RHEL 6.x setup:

> rhgs-random-io         : 146881.85 KB/s
> rhgs-sequential-io     : 326301.74 KB/s
> throughput-performance : 366735.39 KB/s
The exact workload:

3 * [ iozone -+m <your iozone conf file containing all hostnames> -+h <one of the hostnames> -C -w -c -e -i 2 -J 3 -+n -r 64k -s 2g -t 16 ]
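The file passed to -+m is iozone's cluster-mode client list: one line per client giving the client hostname, the working directory on that client, and the path to the iozone binary there. A sketch (the hostnames and paths below are placeholders, not the reporter's actual clients):

```shell
# Hypothetical iozone -+m conf file for 4 clients; adjust names and paths.
cat > iozone.conf <<'EOF'
client1.example.com /mnt/glustervol /usr/bin/iozone
client2.example.com /mnt/glustervol /usr/bin/iozone
client3.example.com /mnt/glustervol /usr/bin/iozone
client4.example.com /mnt/glustervol /usr/bin/iozone
EOF
```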
I'll update the BZ with server profiles soon.
I'll check this with 3.1.2, just to see if it's a regression, and update my findings.
Here are the results of some fio tests varying the vm.dirty* parameters.

The fio tests have a sequential write test, for which I used a jobfile like this:

[global]
rw=write
create_on_open=1
fsync_on_close=1
size=4g
bs=64k
openfiles=1
startdelay=0
ioengine=sync

[lgf-write]
directory=/mnt/glustervol/${HOSTNAME}
nrfiles=1
filename_format=f.$jobnum.$filenum
numjobs=8

And a random write test, with a jobfile like this:

[global]
rw=randwrite
fsync_on_close=1
io_size=1g
size=4g
bs=64k
openfiles=1
startdelay=0
ioengine=sync

[lgf-randwrite]
directory=/mnt/glustervol/${HOSTNAME}
nrfiles=1
filename_format=f.$jobnum.$filenum
numjobs=8

So the random write test accesses only a portion of the file (determined by io_size) instead of the whole file (determined by size). The tests are run from 4 clients (8 jobs on each client) against a 2x2 gluster volume on 4 servers.

Here are the randwrite results for different values of size and io_size, under different vm.dirty* settings. I'm only reporting results for the randwrite test.

size=4g, io_size=1g
-------------------
vm.dirty_ratio=20; vm.dirty_background_ratio=10
  write: io=32768MB, bw=79145KB/s, iops=1236, runt=423963msec
    clat (usec): min=97, max=115652K, avg=21957.24, stdev=637489.85

vm.dirty_ratio=5; vm.dirty_background_ratio=2
  write: io=32768MB, bw=58347KB/s, iops=911, runt=575089msec
    clat (usec): min=116, max=14990K, avg=33738.72, stdev=108229.87

[In this case, the vm.dirty* values that correspond to rhgs-sequential-io give a big boost in throughput. But note that the max clat and clat stdev are much higher when the vm.dirty* values are higher.]

size=16g, io_size=1g
--------------------
vm.dirty_ratio=20; vm.dirty_background_ratio=10
  write: io=32768MB, bw=46383KB/s, iops=724, runt=723428msec
    clat (usec): min=126, max=251072K, avg=36163.69, stdev=1386874.60

vm.dirty_ratio=5; vm.dirty_background_ratio=2
  write: io=32768MB, bw=37976KB/s, iops=593, runt=883569msec
    clat (usec): min=110, max=44342K, avg=50008.68, stdev=232231.87

size=16g, io_size=0.5g
----------------------
vm.dirty_ratio=20; vm.dirty_background_ratio=10
  write: io=16384MB, bw=42787KB/s, iops=668, runt=392107msec
    clat (usec): min=119, max=299196K, avg=21617.41, stdev=2337747.27

vm.dirty_ratio=5; vm.dirty_background_ratio=2
  write: io=16384MB, bw=41841KB/s, iops=653, runt=400973msec
    clat (usec): min=118, max=25975K, avg=46318.03, stdev=222292.08

[In this case, there is hardly any difference in throughput, but as before the clat max and stdev are much lower for vm.dirty*=5,2.]
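The vm.dirty* values in the runs above can be switched with sysctl before each fio run. A minimal sketch, applied on the gluster servers (these are the standard Linux writeback knobs named in the results; a tuned profile would normally set them for you):

```shell
# rhgs-sequential-io-like writeback settings (more dirty data buffered):
sysctl -w vm.dirty_ratio=20 vm.dirty_background_ratio=10

# rhgs-random-io-like writeback settings (earlier, smaller writebacks):
sysctl -w vm.dirty_ratio=5 vm.dirty_background_ratio=2

# Verify the current values:
sysctl vm.dirty_ratio vm.dirty_background_ratio
```

This is a root-only, environment-specific fragment; changes made this way do not persist across reboots.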
(In reply to Manoj Pillai from comment #9)
> Here are the results of some fio tests varying vm.dirty* parameters.

What I see from these results is that when the workload is truly random, there is not much difference in throughput between the two profiles. However, by reducing the amount of dirty data buffered in server memory, rhgs-random-io should reduce unpleasant effects like long delays (which appear as system freeze-ups) while dirty data is written back. The goal of the rhgs-random-io profile is not so much to boost throughput for random I/O (compared to rhgs-sequential-io) as to smooth out the latency spikes.

My fio test with size=4g, io_size=1g, and Ambarish's iozone test, where the entire 2g file is overwritten in the random I/O phase, are not particularly random during the writeback to disk. Individual iozone or fio writes get buffered in the server page cache, and multiple writes will often be batched into larger writes before writeback. This effect is stronger for rhgs-sequential-io, since it allows more dirty data to be buffered and therefore allows more batching of writes before writeback.
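To make the batching argument concrete: vm.dirty_ratio caps dirty page cache as a percentage of (roughly) the server's available memory. On a hypothetical server with 64 GiB of RAM (the RAM size is an assumption for illustration, not from this report), the two settings allow very different amounts of un-written data to accumulate:

```shell
# Dirty-data ceiling for a hypothetical 64 GiB server under each setting.
awk 'BEGIN {
    ram_gib = 64    # assumed server RAM, not taken from this BZ
    printf "dirty_ratio=20 -> up to ~%.1f GiB dirty before writers throttle\n", ram_gib * 0.20
    printf "dirty_ratio=5  -> up to ~%.1f GiB dirty before writers throttle\n", ram_gib * 0.05
}'
```

With ~12.8 GiB vs ~3.2 GiB of buffering headroom, the 20/10 setting lets far more 64k writes coalesce into large sequential writebacks, which fits both the throughput gap and the much larger clat max/stdev seen in the fio results above.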