Description of problem:
The untar of a Linux kernel tarball has been running for more than 5 hours; the performance is pathetic.

Volume info:

[root@bigbend1 ~]# gluster volume info

Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: 5912c35e-ebff-4d5d-9258-a5426b14a6e3
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.37.155:/rhs/brick1/d1r1
Brick2: 10.70.37.100:/rhs/brick1/d1r2
Brick3: 10.70.37.121:/rhs/brick1/d2r1
Brick4: 10.70.37.211:/rhs/brick1/d2r2
Brick5: 10.70.37.155:/rhs/brick1/d3r1
Brick6: 10.70.37.100:/rhs/brick1/d3r2
Brick7: 10.70.37.121:/rhs/brick1/d4r1
Brick8: 10.70.37.211:/rhs/brick1/d4r2
Brick9: 10.70.37.155:/rhs/brick1/d5r1
Brick10: 10.70.37.100:/rhs/brick1/d5r2
Brick11: 10.70.37.121:/rhs/brick1/d6r1
Brick12: 10.70.37.211:/rhs/brick1/d6r2
Options Reconfigured:
server.root-squash: disable

[root@rhsauto020 ~]# mount | grep nfs
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
10.70.34.114:/opt on /opt type nfs (rw,vers=4,addr=10.70.34.114,clientaddr=10.70.36.250)
bigbend1:dist-rep on /mnt/nfs-test type nfs (rw,vers=3,addr=10.70.37.155)

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.6rhs-1.el6rhs.x86_64

How reproducible:
On the latest bits.

Steps to Reproduce:
1. Create a volume and start it.
2. NFS-mount the volume.
3. Run the Linux kernel compile script.

Actual results:
The untar has been running for the last five hours or more.

Expected results:
This should have finished in an hour at most; the observed performance is far too slow.

Additional info:
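For reference, the reproduction corresponds roughly to the commands below. This is a minimal sketch: the brick addresses are taken from the volume info above, but the kernel tarball path is a hypothetical example.

# On a server node: create and start the volume
gluster volume create dist-rep replica 2 \
    10.70.37.155:/rhs/brick1/d1r1 10.70.37.100:/rhs/brick1/d1r2 \
    10.70.37.121:/rhs/brick1/d2r1 10.70.37.211:/rhs/brick1/d2r2 \
    10.70.37.155:/rhs/brick1/d3r1 10.70.37.100:/rhs/brick1/d3r2 \
    10.70.37.121:/rhs/brick1/d4r1 10.70.37.211:/rhs/brick1/d4r2 \
    10.70.37.155:/rhs/brick1/d5r1 10.70.37.100:/rhs/brick1/d5r2 \
    10.70.37.121:/rhs/brick1/d6r1 10.70.37.211:/rhs/brick1/d6r2
gluster volume start dist-rep

# On the client: NFSv3 mount (as in the mount output above), then untar
mount -t nfs -o vers=3 bigbend1:/dist-rep /mnt/nfs-test
cd /mnt/nfs-test
time tar xf /root/linux-3.9.tar.xz    # hypothetical tarball path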
I am also seeing performance issues with bonnie++. Normally my bonnie++ runs take 30-40 minutes on a gigabit LAN; the current run has been going for over 16 hours. It was going so slowly that I thought there was a crash or something, but when I straced the PID of bonnie++ I saw that it was still working, just slowly. I tried to quantify this with dd and I didn't see bad performance:

<Note the current volume>

[root@storage-qe03 ~]# gluster volume info

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 7eb4179b-3b27-4131-8887-589cebe078d1
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: storage-qe03.lab.eng.rdu2.redhat.com:/bricks/testvol_brick0
Brick2: storage-qe04.lab.eng.rdu2.redhat.com:/bricks/testvol_brick1
Brick3: storage-qe10.lab.eng.rdu2.redhat.com:/bricks/testvol_brick2
Brick4: storage-qe11.lab.eng.rdu2.redhat.com:/bricks/testvol_brick3
Brick5: storage-qe03.lab.eng.rdu2.redhat.com:/bricks/testvol_brick4
Brick6: storage-qe04.lab.eng.rdu2.redhat.com:/bricks/testvol_brick5
Brick7: storage-qe10.lab.eng.rdu2.redhat.com:/bricks/testvol_brick6
Brick8: storage-qe11.lab.eng.rdu2.redhat.com:/bricks/testvol_brick7
Brick9: storage-qe03.lab.eng.rdu2.redhat.com:/bricks/testvol_brick8
Brick10: storage-qe04.lab.eng.rdu2.redhat.com:/bricks/testvol_brick9
Brick11: storage-qe10.lab.eng.rdu2.redhat.com:/bricks/testvol_brick10
Brick12: storage-qe11.lab.eng.rdu2.redhat.com:/bricks/testvol_brick11

<now for some dds>

[root@storage-qe06 gluster-mount]# echo 3 > /proc/sys/vm/drop_caches; time `dd if=/dev/zero of=./myfile.test bs=1024k count=1000; sync`
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 18.7489 s, 55.9 MB/s

real    0m19.050s
user    0m0.001s
sys     0m0.731s

1 GB in 19 seconds works out to about 53 MB/second, which is not bad; it is actually pretty close to maxing out a 2-way replica over gigabit. I thought maybe bonnie++ was using a smaller block size, so I tried:

[root@storage-qe06 gluster-mount]# echo 3 > /proc/sys/vm/drop_caches; time `dd if=/dev/zero of=./myfile.test bs=1024 count=1000000; sync`
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 18.9923 s, 53.9 MB/s

real    0m19.191s
user    0m0.072s
sys     0m1.406s

That is really close even with the small block size. Given how long bonnie++ was taking to finish, I would have expected single-threaded I/O to be in the kilobytes per second. Now what about reads:

[root@storage-qe06 gluster-mount]# echo 3 > /proc/sys/vm/drop_caches; dd if=./myfile.test of=/dev/null bs=1024k
976+1 records in
976+1 records out
1024000000 bytes (1.0 GB) copied, 9.22086 s, 111 MB/s
[root@storage-qe06 gluster-mount]# echo 3 > /proc/sys/vm/drop_caches; dd if=./myfile.test of=/dev/null bs=1024
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 9.19747 s, 111 MB/s

Reads are also really close to what I would expect. This has to be something other than normal I/O.

Per BZ #961929, the next thing I tried was disabling eager locking. During one of my bonnie++ runs I ran:

# gluster volume set testvol eager-lock disable

Instantly the test started going faster, and it finished within the next several minutes. Next I tried a kernel compile, and that also finished in a reasonable amount of time. It looks like Amar was correct and this has something to do with eager locking being enabled.
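For anyone reproducing this, a minimal sketch of a back-to-back comparison follows. Assumptions: the volume set command runs on a node in the trusted pool while dd runs on the client, and the test file name is illustrative.

# On a server node: toggle eager locking on the volume
gluster volume set testvol eager-lock disable    # or "enable" for the other run

# On the client: repeat the small-block write test from above
echo 3 > /proc/sys/vm/drop_caches
time sh -c 'dd if=/dev/zero of=./eagerlock.test bs=1024 count=1000000; sync'
rm -f ./eagerlock.test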
I may have spoken too soon about disabling eager-lock for bonnie++. I re-ran bonnie++ with eager-lock disabled from the beginning and I am still seeing very poor performance. I am about to leave for the day and bonnie++ has already been running for 4 hours. I was never able to get to my kernel compile tests, as the crash fixed in 3.4.0.7 was preventing me from running that test in my overnights. Looking back at my tests from last night, my kernel compile (which includes an untar) only took about 30 minutes for the whole thing. I was under the assumption that Saurabh and I were seeing the same performance problem here, but that might not be the case. Should I open a different BZ for what I am seeing in my performance tests, or do we want to work everything here until we are sure that these aren't two different issues?
When I did an untar of the kernel source with eager-lock disabled, I saw in the "gluster volume profile" output that the write calls took more than 95% of the time on average. They were all very small stable writes (32k), but even so, it seems write performance is very low. With eager-locking enabled, it was the inodelk op that took the major time slice; writes came in second.

Ben, if the performance issues you are seeing with eager-lock disabled are NOT write-related, you may open a different BZ.
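For reference, the per-FOP latency breakdown referred to above can be collected with the profile commands, shown here against the dist-rep volume from the original report (run from any node in the trusted pool):

gluster volume profile dist-rep start
# ... run the untar workload on the NFS client ...
gluster volume profile dist-rep info    # %-latency per FOP (WRITE, INODELK, ...) for each brick
gluster volume profile dist-rep stop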
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html