Bug 962510

Summary: nfs: performance of untar of linux kernel tarball is pathetic
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: glusterd
Version: 2.1
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: urgent
Priority: high
Reporter: Saurabh <saujain>
Assignee: Pranith Kumar K <pkarampu>
QA Contact: Saurabh <saujain>
CC: amarts, ansubram, bturner, mzywusko, pkarampu, rcyriac, rhs-bugs, spradhan, vbellur
Doc Type: Bug Fix
Type: Bug
Clones: 966018 (view as bug list)
Bug Blocks: 966018
Last Closed: 2013-09-23 22:39:46 UTC

Description Saurabh 2013-05-13 17:53:01 UTC
Description of problem:

The untar of the linux kernel tarball has been running for more than 5 hours.

The performance is pathetic.

volume info
[root@bigbend1 ~]# gluster volume info
 
Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: 5912c35e-ebff-4d5d-9258-a5426b14a6e3
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.37.155:/rhs/brick1/d1r1
Brick2: 10.70.37.100:/rhs/brick1/d1r2
Brick3: 10.70.37.121:/rhs/brick1/d2r1
Brick4: 10.70.37.211:/rhs/brick1/d2r2
Brick5: 10.70.37.155:/rhs/brick1/d3r1
Brick6: 10.70.37.100:/rhs/brick1/d3r2
Brick7: 10.70.37.121:/rhs/brick1/d4r1
Brick8: 10.70.37.211:/rhs/brick1/d4r2
Brick9: 10.70.37.155:/rhs/brick1/d5r1
Brick10: 10.70.37.100:/rhs/brick1/d5r2
Brick11: 10.70.37.121:/rhs/brick1/d6r1
Brick12: 10.70.37.211:/rhs/brick1/d6r2
Options Reconfigured:
server.root-squash: disable
 
[root@rhsauto020 ~]# mount | grep nfs
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
10.70.34.114:/opt on /opt type nfs (rw,vers=4,addr=10.70.34.114,clientaddr=10.70.36.250)
bigbend1:dist-rep on /mnt/nfs-test type nfs (rw,vers=3,addr=10.70.37.155)

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.6rhs-1.el6rhs.x86_64

How reproducible:
on latest bits

Steps to Reproduce:
1. Create a volume and start it
2. NFS-mount the volume
3. Run the linux kernel compile script (which starts with an untar of the kernel tarball; see the sketch below)
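
A minimal sketch of the untar portion of these steps, for reference only. Hostnames, brick paths, the brick count, and the tarball filename are placeholders and not taken from this report (the original volume was a 6x2 distributed-replicate across four servers, mounted with NFSv3 as shown above):

# Create and start a distributed-replicate volume (placeholder hosts/bricks)
gluster volume create dist-rep replica 2 \
    server1:/rhs/brick1/d1r1 server2:/rhs/brick1/d1r2 \
    server1:/rhs/brick1/d2r1 server2:/rhs/brick1/d2r2
gluster volume start dist-rep

# NFS-mount the volume on a client (Gluster NFS serves NFSv3)
mkdir -p /mnt/nfs-test
mount -t nfs -o vers=3 server1:/dist-rep /mnt/nfs-test

# Untar a kernel tarball on the mount and time it (tarball path is a placeholder)
cd /mnt/nfs-test
time tar xzf /root/linux-3.9.tar.gz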
  
Actual results:
The untar has been running for the last five hours or more.

Expected results:
The performance is bad; this should have finished in an hour at most.

Additional info:

Comment 4 Ben Turner 2013-05-14 15:50:50 UTC
I am also seeing performance issues with bonnie++.  Normally my bonnie++ runs take 30-40 minutes on gigabit LAN; the current run has been going for over 16 hours.  It was going so slowly that I thought there was a crash or something, but when I straced the PID of bonnie I saw that it was still working, just slow.  I tried to quantify this with dd and I didn't see bad performance:

<Note the current volume>
[root@storage-qe03 ~]# gluster volume info
 
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 7eb4179b-3b27-4131-8887-589cebe078d1
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: storage-qe03.lab.eng.rdu2.redhat.com:/bricks/testvol_brick0
Brick2: storage-qe04.lab.eng.rdu2.redhat.com:/bricks/testvol_brick1
Brick3: storage-qe10.lab.eng.rdu2.redhat.com:/bricks/testvol_brick2
Brick4: storage-qe11.lab.eng.rdu2.redhat.com:/bricks/testvol_brick3
Brick5: storage-qe03.lab.eng.rdu2.redhat.com:/bricks/testvol_brick4
Brick6: storage-qe04.lab.eng.rdu2.redhat.com:/bricks/testvol_brick5
Brick7: storage-qe10.lab.eng.rdu2.redhat.com:/bricks/testvol_brick6
Brick8: storage-qe11.lab.eng.rdu2.redhat.com:/bricks/testvol_brick7
Brick9: storage-qe03.lab.eng.rdu2.redhat.com:/bricks/testvol_brick8
Brick10: storage-qe04.lab.eng.rdu2.redhat.com:/bricks/testvol_brick9
Brick11: storage-qe10.lab.eng.rdu2.redhat.com:/bricks/testvol_brick10
Brick12: storage-qe11.lab.eng.rdu2.redhat.com:/bricks/testvol_brick11

<now for some dds>
[root@storage-qe06 gluster-mount]# echo 3 > /proc/sys/vm/drop_caches; time `dd if=/dev/zero of=./myfile.test bs=1024k count=1000; sync`
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 18.7489 s, 55.9 MB/s

real	0m19.050s
user	0m0.001s
sys	0m0.731s

1 GB in 19 seconds works out to about 53 MB/s, which is not bad; it is actually pretty close to maxing out a 2-way replica over gigabit, since the data has to cross the wire twice and a gigabit link at roughly 110 MB/s of wire speed therefore caps effective throughput at around 55 MB/s.  I thought maybe bonnie was using a smaller block size, so I tried:

[root@storage-qe06 gluster-mount]# echo 3 > /proc/sys/vm/drop_caches; time `dd if=/dev/zero of=./myfile.test bs=1024 count=1000000; sync`
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 18.9923 s, 53.9 MB/s

real	0m19.191s
user	0m0.072s
sys	0m1.406s

That is really close even with the small block size.  Given how long bonnie was taking to finish, I would have expected single-threaded I/O to be down in the kilobytes per second.  Now what about reads:

[root@storage-qe06 gluster-mount]# echo 3 > /proc/sys/vm/drop_caches; dd if=./myfile.test of=/dev/null bs=1024k 
976+1 records in
976+1 records out
1024000000 bytes (1.0 GB) copied, 9.22086 s, 111 MB/s
[root@storage-qe06 gluster-mount]# echo 3 > /proc/sys/vm/drop_caches; dd if=./myfile.test of=/dev/null bs=1024
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 9.19747 s, 111 MB/s

Reads are also really close to what I would expect.  This has to be something other than normal I/O.  

Per BZ #961929 the next thing I tried was disabling eager locking.  During one of my bonnie++ runs I ran:

# gluster volume set testvol eager-lock disable

Instantly the test started going faster and finished within the next several minutes.  Next I tried a kernel compile, and that also finished in a reasonable amount of time.  It looks like Amar was correct and this has something to do with eager locking being enabled.
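
For anyone reproducing this, a rough sketch of the commands involved, using the standard volume set/info CLI and the testvol name from above; the option shows up under "Options Reconfigured" once changed:

# Disable eager locking mid-run (as done above) and confirm the change
gluster volume set testvol eager-lock disable
gluster volume info testvol

# Re-enable it afterwards to go back to the default behaviour
gluster volume set testvol eager-lock enable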

Comment 5 Ben Turner 2013-05-14 21:46:54 UTC
I may have spoken too soon on the eager-lock disable with bonnie++.  I re-ran bonnie++ with eager-lock disabled from the beginning and I am still seeing very poor performance.  I am about to leave for the day and bonnie has already been running for 4 hours.  I was never able to get to my kernel compile tests, as the crash fixed in 3.4.0.7 was preventing me from running that test in my overnights.  Looking back at my tests from last night, my kernel compile (which includes an untar) only took about 30 minutes for the whole thing.  I was under the assumption that Saurabh and I were seeing the same performance problem here, but that might not be the case.  Should I open a different BZ for what I am seeing in my performance tests, or do we want to work everything here until we are sure that these aren't two different issues?

Comment 6 Rajesh 2013-05-15 09:12:16 UTC
When I did an untar of the kernel source with eager-lock disabled, I saw through the "gluster volume profile" output that the write calls took more than 95% of the time on average. They were all very small stable writes (32k), but even so the write performance seems very low.
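
For reference, the profile numbers mentioned above can be collected with the standard volume profile commands; a rough sketch, assuming the dist-rep volume name from this report:

# Start profiling, run the untar on the NFS mount, then dump per-FOP statistics
gluster volume profile dist-rep start
# ... run the untar on the NFS mount ...
gluster volume profile dist-rep info

# Stop profiling when finished
gluster volume profile dist-rep stop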

With eager-locking enabled, it was the inodelk op that took the major share of the time; writes came in second.

Ben, if the performance issues you are seeing with eager-lock disabled are NOT write-related, you may open a different BZ.

Comment 9 Scott Haines 2013-09-23 22:39:46 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
