Bug 1000916

Summary: quota build 2: linux untar goes to "D+" state, during rebalance
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Saurabh <saujain>
Component: glusterd Assignee: Raghavendra G <rgowdapp>
Status: CLOSED ERRATA QA Contact: Saurabh <saujain>
Severity: high Docs Contact:
Priority: high    
Version: 2.1 CC: grajaiya, kparthas, mzywusko, rhs-bugs, saujain, sdharane, shmohan, vbellur
Target Milestone: --- Keywords: ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.4.0.34rhs Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-11-27 15:32:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Saurabh 2013-08-26 06:15:15 UTC
Description of problem:
I had a 6x2 volume with quota enabled and a limit of 100GB set.

I started a Linux kernel untar, then added bricks and started a rebalance.

After some time, I found the untar process in the "D+" state.

[root@rhsauto032 ~]# gluster volume quota dist-rep3 list /
                  Path                   Hard-limit Soft-limit   Used  Available
--------------------------------------------------------------------------------
/                                        100.0GB       90%       4.0GB  96.0GB


Current volume status:
----------------------
[root@rhsauto033 ~]# gluster volume status dist-rep3
Status of volume: dist-rep3
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/bricks/d1r
1-3							49167	Y	30085
Brick rhsauto033.lab.eng.blr.redhat.com:/rhs/bricks/d1r
2-3							49152	Y	27893
Brick rhsauto034.lab.eng.blr.redhat.com:/rhs/bricks/d2r
1-3							49152	Y	9693
Brick rhsauto035.lab.eng.blr.redhat.com:/rhs/bricks/d2r
2-3							49152	Y	9538
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/bricks/d3r
1-3							49168	Y	30096
Brick rhsauto033.lab.eng.blr.redhat.com:/rhs/bricks/d3r
2-3							49153	Y	27904
Brick rhsauto034.lab.eng.blr.redhat.com:/rhs/bricks/d4r
1-3							49153	Y	9704
Brick rhsauto035.lab.eng.blr.redhat.com:/rhs/bricks/d4r
2-3							49153	Y	9549
Brick rhsauto032.lab.eng.blr.redhat.com:/rhs/bricks/d5r
1-3							49169	Y	30107
Brick rhsauto033.lab.eng.blr.redhat.com:/rhs/bricks/d5r
2-3							49154	Y	27915
Brick rhsauto034.lab.eng.blr.redhat.com:/rhs/bricks/d6r
1-3							49154	Y	9715
Brick rhsauto035.lab.eng.blr.redhat.com:/rhs/bricks/d6r
2-3							49154	Y	9560
Brick rhsauto034.lab.eng.blr.redhat.com:/rhs/bricks/d2r
1-3-add							49158	Y	12608
Brick rhsauto035.lab.eng.blr.redhat.com:/rhs/bricks/d2r
2-3-add							49158	Y	12437
NFS Server on localhost					2049	Y	30804
Self-heal Daemon on localhost				N/A	Y	30811
Quota Daemon on localhost				N/A	Y	30818
NFS Server on rhsauto034.lab.eng.blr.redhat.com		2049	Y	12620
Self-heal Daemon on rhsauto034.lab.eng.blr.redhat.com	N/A	Y	12627
Quota Daemon on rhsauto034.lab.eng.blr.redhat.com	N/A	Y	12634
NFS Server on 10.70.37.7				2049	Y	22667
Self-heal Daemon on 10.70.37.7				N/A	Y	22674
Quota Daemon on 10.70.37.7				N/A	Y	22681
NFS Server on rhsauto035.lab.eng.blr.redhat.com		2049	Y	12449
Self-heal Daemon on rhsauto035.lab.eng.blr.redhat.com	N/A	Y	12456
Quota Daemon on rhsauto035.lab.eng.blr.redhat.com	N/A	Y	12463
 
           Task                                      ID         Status
           ----                                      --         ------
      Rebalance    5119a3b0-3e9f-479b-8ad9-9e413df4821f              1


Version-Release number of selected component (if applicable):
glusterfs-rdma-3.4.0.20rhsquota2-1.el6rhs.x86_64
glusterfs-3.4.0.20rhsquota1-1.el6.x86_64
glusterfs-server-3.4.0.20rhsquota1-1.el6.x86_64
glusterfs-fuse-3.4.0.20rhsquota1-1.el6.x86_64


How reproducible:
Running a rebalance on this build triggers the issue.

Steps to Reproduce:
1. Create a 6x2 volume and start it.
2. Enable quota.
3. Set a limit of 100GB.
4. Mount the volume over NFS.
5. Untar the Linux kernel on the mount point.
6. While the untar is running, add bricks and start a rebalance.
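
For convenience, the reproduction (including the add-brick and rebalance described at the top of this report) can be sketched as a script. This is only a sketch under assumptions: the server names (server1, server2), the brick paths, the added brick pair, and the mount point are placeholders and not the exact layout used in this setup. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the reproduction. Placeholders (assumptions): S1/S2 hostnames,
# brick paths, the added brick pair, and the MNT mount point.
VOL=${VOL:-dist-rep3}
DRY_RUN=${DRY_RUN:-1}          # 1 = only print the commands (default)
S1=${S1:-server1}
S2=${S2:-server2}
MNT=${MNT:-/mnt/nfs-test}

run() {
    echo "+ $*"
    if [ "$DRY_RUN" != 1 ]; then "$@"; fi
}

# Run on one of the servers: steps 1-3.
server_setup() {
    bricks=""
    for i in 1 2 3 4 5 6; do
        # consecutive bricks form one replica pair
        bricks="$bricks $S1:/rhs/bricks/d${i}r1 $S2:/rhs/bricks/d${i}r2"
    done
    # $bricks is intentionally unquoted so it splits into 12 arguments
    run gluster volume create "$VOL" replica 2 $bricks
    run gluster volume start "$VOL"
    run gluster volume quota "$VOL" enable
    run gluster volume quota "$VOL" limit-usage / 100GB
}

# Run on the NFS client: steps 4-5.
client_workload() {
    run mount -t nfs -o vers=3 "$S1:/$VOL" "$MNT"
    run tar xjf /opt/qa/tools/linux-2.6.31.1.tar.bz2 -C "$MNT"
}

# Run on a server while the untar is still in progress.
server_expand() {
    run gluster volume add-brick "$VOL" \
        "$S1:/rhs/bricks/d7r1" "$S2:/rhs/bricks/d7r2"
    run gluster volume rebalance "$VOL" start
    run gluster volume rebalance "$VOL" status
}

server_setup
client_workload
server_expand
```

For a real run, set DRY_RUN=0 and call client_workload on the client and server_expand on a server from separate terminals, so that the rebalance overlaps with the untar.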



Actual results:
[root@rhsauto032 ~]# gluster volume rebalance dist-rep3 status
                                    Node Rebalanced-files          size       scanned      failures       skipped         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------   ------------   --------------
                               localhost             2398       347.4MB         16400             0          4330    in progress          1686.00
       rhsauto034.lab.eng.blr.redhat.com             3946       595.9MB         11857             0           360    in progress          1686.00
       rhsauto035.lab.eng.blr.redhat.com                0        0Bytes         25434             0             0    in progress          1686.00
       rhsauto033.lab.eng.blr.redhat.com                0        0Bytes         25436             0             0    in progress          1685.00
volume rebalance: dist-rep3: success:


However, on the client the Linux untar is hung:

[root@rhsauto036 ~]# ps -auxww | grep tar
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
root     19217  0.7  0.0 116012  1228 pts/2    D+   22:30   0:24 tar xvfj /opt/qa/tools/linux-2.6.31.1.tar.bz2
root     19276  0.0  0.0 103244   808 pts/0    S+   23:23   0:00 grep tar


Expected results:
Rebalance should not cause the untar on the client to hang.

Additional info:
From the client:
[root@rhsauto036 ~]#
[root@rhsauto036 ~]# service iptables status
iptables: Firewall is not running.

Comment 3 krishnan parthasarathi 2013-08-26 13:53:51 UTC
Could you attach the sosreport when the issue is seen?

Comment 4 Raghavendra G 2013-08-28 10:01:35 UTC
Saurabh,

I also observed tar in the D+ state, but it eventually completes, so there is no frame loss. Also, if it were hung in a syscall, you would not be able to kill it using SIGINT. So I think it's not a bug.

I observed the untar to be slow. Can you please pass the verbose option to tar and confirm that it's not a hang in a system call?

regards,
Raghavendra.
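
One possible way to tell a slow untar from a hung one, besides watching tar's verbose output, is to sample the process on the client. The sketch below is an illustration and not something that was run for this bug: the helper names are made up, and it assumes a Linux client where /proc/<pid>/io is readable (it needs kernel I/O accounting and sufficient privileges).

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from this report).

# Print the one-letter state from the contents of /proc/<pid>/stat.
# Format: "pid (comm) state ..."; comm may itself contain spaces or
# parentheses, so strip everything through the last ") ".
state_from_stat() {
    rest=${1##*') '}
    echo "${rest%% *}"
}

# sample_pid PID [INTERVAL_SECS] [SAMPLES]
# Prints the state and the cumulative write-syscall count for each sample.
sample_pid() {
    pid=$1
    interval=${2:-5}
    samples=${3:-6}
    n=0
    while [ "$n" -lt "$samples" ]; do
        if [ ! -r "/proc/$pid/stat" ]; then
            echo "pid $pid has exited"
            return 0
        fi
        state=$(state_from_stat "$(cat "/proc/$pid/stat")")
        syscw=$(awk '/^syscw:/ {print $2}' "/proc/$pid/io" 2>/dev/null)
        echo "$(date +%T) state=$state write_syscalls=${syscw:-n/a}"
        n=$((n + 1))
        sleep "$interval"
    done
}

# Usage on the client, with the tar PID from the ps output:
#   sample_pid 19217 5 12
```

If the state is D in every sample and write_syscalls never changes, the process is probably blocked; if the counter keeps advancing between samples, the untar is slow but still making progress.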

Comment 5 Raghavendra G 2013-08-28 16:28:09 UTC
tar succeeds with Build 3

Comment 6 shylesh 2013-10-09 04:25:59 UTC
Though I/O goes into the D state, it eventually finishes, so functionally it works. Marking as verified on 3.4.0.33rhs-1.el6rhs.x86_64.

Comment 7 errata-xmlrpc 2013-11-27 15:32:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html