Bug 962510
| Summary: | nfs: performance of untar of Linux kernel tarball is pathetic | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Saurabh <saujain> |
| Component: | glusterd | Assignee: | Pranith Kumar K <pkarampu> |
| Status: | CLOSED ERRATA | QA Contact: | Saurabh <saujain> |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | | |
| Version: | 2.1 | CC: | amarts, ansubram, bturner, mzywusko, pkarampu, rcyriac, rhs-bugs, spradhan, vbellur |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 966018 (view as bug list) | Environment: | |
| Last Closed: | 2013-09-23 22:39:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 966018 | | |
Description (Saurabh, 2013-05-13 17:53:01 UTC)
I am also seeing performance issues with bonnie++. Normally my bonnie++ runs take 30-40 minutes on gigabit LAN; the current run has been going for over 16 hours. It was going so slowly that I thought there was a crash or something, but when I straced the PID of bonnie I saw that it was still working, just slow. I tried to quantify this with dd and I didn't see bad performance:

<Note the current volume>

```
[root@storage-qe03 ~]# gluster volume info

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 7eb4179b-3b27-4131-8887-589cebe078d1
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: storage-qe03.lab.eng.rdu2.redhat.com:/bricks/testvol_brick0
Brick2: storage-qe04.lab.eng.rdu2.redhat.com:/bricks/testvol_brick1
Brick3: storage-qe10.lab.eng.rdu2.redhat.com:/bricks/testvol_brick2
Brick4: storage-qe11.lab.eng.rdu2.redhat.com:/bricks/testvol_brick3
Brick5: storage-qe03.lab.eng.rdu2.redhat.com:/bricks/testvol_brick4
Brick6: storage-qe04.lab.eng.rdu2.redhat.com:/bricks/testvol_brick5
Brick7: storage-qe10.lab.eng.rdu2.redhat.com:/bricks/testvol_brick6
Brick8: storage-qe11.lab.eng.rdu2.redhat.com:/bricks/testvol_brick7
Brick9: storage-qe03.lab.eng.rdu2.redhat.com:/bricks/testvol_brick8
Brick10: storage-qe04.lab.eng.rdu2.redhat.com:/bricks/testvol_brick9
Brick11: storage-qe10.lab.eng.rdu2.redhat.com:/bricks/testvol_brick10
Brick12: storage-qe11.lab.eng.rdu2.redhat.com:/bricks/testvol_brick11
```

<now for some dds>

```
[root@storage-qe06 gluster-mount]# echo 3 > /proc/sys/vm/drop_caches; time `dd if=/dev/zero of=./myfile.test bs=1024k count=1000; sync`
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 18.7489 s, 55.9 MB/s

real    0m19.050s
user    0m0.001s
sys     0m0.731s
```

1 GB in 19 seconds works out to about 53 MB/s, which is not bad; that is actually pretty close to maxing out a two-way replica over gigabit.
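The write test above can be sketched as a small script: drop the page cache, time a sequential write plus `sync`, and divide bytes by elapsed time. The target path is an assumption (point `TARGET` at the gluster client mount, e.g. `/mnt/gluster-mount`, to test the volume); it defaults to a temp directory so the sketch runs anywhere, and uses 100 MiB rather than 1 GB to keep it quick.

```shell
#!/bin/sh
# Minimal sketch of the sequential-write throughput test above.
# TARGET is an assumption: use the gluster client mount in real runs;
# a temp directory is used here only so the sketch is self-contained.
TARGET=${TARGET:-$(mktemp -d)}
OUT="$TARGET/myfile.test"

# Drop the page cache so writes are not absorbed by client RAM
# (needs root; silently skipped otherwise).
[ -w /proc/sys/vm/drop_caches ] && echo 3 > /proc/sys/vm/drop_caches

# 100 MiB sequential write in 1 MiB blocks; sync so the timing
# includes flushing dirty pages, as in the original test.
start=$(date +%s.%N)
dd if=/dev/zero of="$OUT" bs=1024k count=100 2>/dev/null
sync
end=$(date +%s.%N)

# Throughput = MiB written / elapsed seconds.
awk -v s="$start" -v e="$end" 'BEGIN { printf "%.1f MiB/s\n", 100 / (e - s) }'
```

The `sync` matters: without it, dd reports the rate at which pages entered the client cache, not the rate they reached the bricks.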
I thought maybe bonnie was using a smaller block size, so I tried:

```
[root@storage-qe06 gluster-mount]# echo 3 > /proc/sys/vm/drop_caches; time `dd if=/dev/zero of=./myfile.test bs=1024 count=1000000; sync`
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 18.9923 s, 53.9 MB/s

real    0m19.191s
user    0m0.072s
sys     0m1.406s
```

That is really close, even with the small block size. Given how long bonnie was taking to finish, I would have expected single-threaded I/O to be in the kilobytes. Now what about reads:

```
[root@storage-qe06 gluster-mount]# echo 3 > /proc/sys/vm/drop_caches; dd if=./myfile.test of=/dev/null bs=1024k
976+1 records in
976+1 records out
1024000000 bytes (1.0 GB) copied, 9.22086 s, 111 MB/s
[root@storage-qe06 gluster-mount]# echo 3 > /proc/sys/vm/drop_caches; dd if=./myfile.test of=/dev/null bs=1024
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 9.19747 s, 111 MB/s
```

Reads are also really close to what I would expect. This has to be something other than normal I/O.

Per BZ #961929, the next thing I tried was disabling eager locking. During one of my bonnie++ runs I ran:

```
# gluster volume set testvol eager-lock disable
```

Instantly the test started going faster and finished in the next several minutes. Next I tried a kernel compile, and that also finished in a reasonable amount of time. It looks like Amar was correct and this has something to do with eager locking being enabled.

I may have spoken too soon on the eager-lock disable with bonnie++. I re-ran bonnie++ with eager lock disabled from the beginning and I am still seeing very poor performance. I am about to leave for the day and bonnie has already been running for 4 hours. I was never able to get to my kernel compile tests, as the crash fixed in 3.4.0.7 was preventing me from running that test in my overnights. Looking back at my tests from last night, my kernel compile (which includes an untar) only took about 30 minutes for the whole thing.
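The block-size comparison above can be sketched generically: write the same amount of data with 1 KiB and 1 MiB blocks and compare throughput. The path handling is an assumption (as before, point `TARGET` at the gluster mount for a real test), and 10 MiB per run is used to keep the sketch fast.

```shell
#!/bin/sh
# Sketch of the block-size comparison: write the same 10 MiB twice,
# once with 1 KiB blocks and once with 1 MiB blocks. On this volume
# the two rates came out nearly identical; a large gap would instead
# point at per-request overhead.
TARGET=${TARGET:-$(mktemp -d)}    # assumption: use the gluster mount in real tests

for bs in 1024 1048576; do
    count=$((10485760 / bs))      # keep total bytes constant at 10 MiB
    start=$(date +%s.%N)
    dd if=/dev/zero of="$TARGET/f.$bs" bs="$bs" count="$count" conv=fsync 2>/dev/null
    end=$(date +%s.%N)
    awk -v s="$start" -v e="$end" -v bs="$bs" \
        'BEGIN { printf "bs=%-8d %.1f MiB/s\n", bs, 10 / (e - s) }'
done
```

`conv=fsync` makes dd flush the file before reporting, which serves the same purpose as the trailing `sync` in the original commands.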
I was under the assumption that Saurabh and I were seeing the same performance problem here, but that might not be the case. Should I open a different BZ to track what I am seeing in my performance tests, or do we want to work everything here until we are sure that these aren't two different issues?

When I did an untar of the kernel source with eager-lock disabled, I saw through the "gluster volume profile" output that the write calls took more than 95% of the time on average. They were all very small stable writes (32k), but still it seems write performance is very low. With eager-locking, it was the inodelk op which took the major time slice; writes came in second. Ben, if the performance issues you are seeing with eager-lock disabled are NOT write-related, you may open a different BZ.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
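The "gluster volume profile" analysis described above (finding which fop dominates the latency) can be sketched as a quick filter over the profile output. The excerpt below is hypothetical and heavily simplified (real `gluster volume profile <vol> info` output has more columns); only the "%-latency ... fop" shape and the illustrative numbers are assumptions.

```shell
#!/bin/sh
# Sketch: find the fop that dominates %-latency in profile output.
# profile.txt is a hypothetical, simplified excerpt -- the figures are
# illustrative, not measurements from this bug.
cat > profile.txt <<'EOF'
95.20   487.31 us   WRITE
 3.10    92.10 us   INODELK
 1.70    12.45 us   LOOKUP
EOF

# Pick the row with the highest first column (%-latency).
top=$(awk '$1 + 0 > max { max = $1 + 0; op = $NF } END { print op }' profile.txt)
echo "dominant fop: $top"
```

With eager-lock disabled the dominant fop was WRITE; with it enabled, INODELK took the top slot, which is what pointed the investigation at eager locking.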