Bug 985783

Summary: quota+nfs : add-brick results in EIO
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Saurabh <saujain>
Component: glusterd
Assignee: vpshastry <vshastry>
Status: CLOSED ERRATA
QA Contact: Saurabh <saujain>
Severity: high
Docs Contact:
Priority: high
Version: 2.1
CC: asriram, kparthas, mzywusko, nsathyan, rhs-bugs, shaines, surs, vagarwal, vbellur, vshastry
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: All
Whiteboard:
Fixed In Version: glusterfs-v3.4.0.33rhs
Doc Type: Bug Fix
Doc Text:
Previously, when the quota feature was enabled, the layout handling in DHT resulted in errors for the root inode. With this update, the interaction between quota and DHT has been modified to overcome this issue.
Last Closed: 2013-11-27 15:29:13 UTC
Type: Bug

Description Saurabh 2013-07-18 08:55:38 UTC
Description of problem:

Performed an add-brick operation while I/O was in progress on
a volume with a quota hard limit set.

The NFS mount point returns EIO.

[root@nfs1 ~]# gluster volume info quota-dist-rep
 
Volume Name: quota-dist-rep
Type: Distributed-Replicate
Volume ID: c2c503b9-19cf-44ef-b468-c4f02e3b35c7
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.37.180:/rhs/bricks/quota-d1r1
Brick2: 10.70.37.80:/rhs/bricks/quota-d1r2
Brick3: 10.70.37.216:/rhs/bricks/quota-d2r1
Brick4: 10.70.37.139:/rhs/bricks/quota-d2r2
Brick5: 10.70.37.180:/rhs/bricks/quota-d3r1
Brick6: 10.70.37.80:/rhs/bricks/quota-d3r2
Brick7: 10.70.37.216:/rhs/bricks/quota-d4r1
Brick8: 10.70.37.139:/rhs/bricks/quota-d4r2
Brick9: 10.70.37.180:/rhs/bricks/quota-d5r1
Brick10: 10.70.37.80:/rhs/bricks/quota-d5r2
Brick11: 10.70.37.216:/rhs/bricks/quota-d6r1
Brick12: 10.70.37.139:/rhs/bricks/quota-d6r2
Options Reconfigured:
features.quota: on


Version-Release number of selected component (if applicable):
[root@nfs1 ~]# rpm -qa | grep glusterfs
glusterfs-3.4.0.12rhs.beta4-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.12rhs.beta4-1.el6rhs.x86_64
glusterfs-server-3.4.0.12rhs.beta4-1.el6rhs.x86_64
[root@nfs1 ~]# 

How reproducible:
Seen on the first attempt at brick operations with quota enabled.

Steps to Reproduce:
1. Create a 6x2 volume and start it.
2. Enable quota.
3. Set a limit of 1 GB on "/".
4. Mount the volume over NFS.
5. Create a directory.
6. In that directory, create files of size 1 MB in a loop.
7. While the files are being created, run add-brick:

[root@nfs1 ~]# gluster volume add-brick quota-dist-rep 10.70.37.180:/rhs/bricks/quota-d1r1-add 10.70.37.80:/rhs/bricks/quota-d1r2-add
volume add-brick: success
[root@nfs1 ~]# gluster volume status
Volume dist-rep is not started
 
Status of volume: quota-dist-rep
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.180:/rhs/bricks/quota-d1r1               49168   Y       29747
Brick 10.70.37.80:/rhs/bricks/quota-d1r2                49168   Y       26314
Brick 10.70.37.216:/rhs/bricks/quota-d2r1               49168   Y       12782
Brick 10.70.37.139:/rhs/bricks/quota-d2r2               49169   Y       26332
Brick 10.70.37.180:/rhs/bricks/quota-d3r1               49169   Y       29758
Brick 10.70.37.80:/rhs/bricks/quota-d3r2                49169   Y       26325
Brick 10.70.37.216:/rhs/bricks/quota-d4r1               49169   Y       12793
Brick 10.70.37.139:/rhs/bricks/quota-d4r2               49170   Y       26343
Brick 10.70.37.180:/rhs/bricks/quota-d5r1               49170   Y       29769
Brick 10.70.37.80:/rhs/bricks/quota-d5r2                49170   Y       26336
Brick 10.70.37.216:/rhs/bricks/quota-d6r1               49170   Y       12804
Brick 10.70.37.139:/rhs/bricks/quota-d6r2               49171   Y       26354
Brick 10.70.37.180:/rhs/bricks/quota-d1r1-add           49171   Y       31031
Brick 10.70.37.80:/rhs/bricks/quota-d1r2-add            49171   Y       27232
NFS Server on localhost                                 2049    Y       31043
Self-heal Daemon on localhost                           N/A     Y       31050
NFS Server on 10.70.37.216                              2049    Y       13715
Self-heal Daemon on 10.70.37.216                        N/A     Y       13722
NFS Server on 10.70.37.80                               2049    Y       27244
Self-heal Daemon on 10.70.37.80                         N/A     Y       27251
NFS Server on 10.70.37.139                              2049    Y       27257
Self-heal Daemon on 10.70.37.139                        N/A     Y       27264
 
There are no active volume tasks
[root@nfs1 ~]# 
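
The report does not include the client-side loop used in steps 5 and 6. A minimal sketch of such a loop is shown here, assuming a POSIX shell and dd; the function name write_files and its arguments are hypothetical, not taken from the original test.

```shell
# Sketch of the client-side loop (steps 5 and 6). write_files is a
# hypothetical helper; DIR would be a directory on the NFS mount.
write_files() {
    dir=$1      # target directory, created if missing
    count=$2    # number of 1 MB files to write
    mkdir -p "$dir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        # Write one file of 1048576 zero bytes. dd's own messages are
        # suppressed; a failed write is reported on stderr and the
        # loop moves on to the next file.
        dd if=/dev/zero of="$dir/$i" bs=1048576 count=1 2>/dev/null \
            || echo "write failed: $i" >&2
        # Print the file name after each attempt.
        echo "$i"
        i=$((i + 1))
    done
}
```

It could be called as `write_files <mountpoint>/<dir> 2000`, with add-brick issued from a server node while the loop is still running.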

Actual results:
From the client:

dd: opening `1063': Remote I/O error
1063
dd: opening `1064': Remote I/O error
1064
dd: opening `1065': Remote I/O error
1065
dd: opening `1066': Remote I/O error
1066
dd: opening `1067': Remote I/O error
1067
dd: opening `1068': Remote I/O error
1068
dd: opening `1069': Remote I/O error
1069
dd: opening `1070': Remote I/O error
1070
dd: opening `1071': Remote I/O error
1071
dd: opening `1072': Remote I/O error
1072
dd: opening `1073': Remote I/O error
1073
dd: opening `1074': Remote I/O error
1074
dd: opening `1075': Remote I/O error
1075
dd: opening `1076': Remote I/O error
1076
dd: opening `1077': Remote I/O error
1077
dd: opening `1078': Remote I/O error
1078
dd: opening `1079': Remote I/O error
1079
dd: opening `1080': Remote I/O error
1080



From the server, nfs.log:
[2013-07-17 21:38:54.249407] W [nfs3-helpers.c:3391:nfs3_log_common_res] 0-nfs-nfsv3: XID: e408bb0e, LOOKUP: NFS: 10006(Error occurred on the server or IO Error), POSIX: 14(Bad address)
[2013-07-17 21:38:54.255102] I [dht-layout.c:636:dht_layout_normalize] 0-quota-dist-rep-dht: found anomalies in <gfid:9949b438-8348-4205-81b5-c5bce10927f0>. holes=1 overlaps=0 missing=1 down=0 misc=0
[2013-07-17 21:38:54.255190] W [dht-common.c:213:dht_discover_complete] 0-quota-dist-rep-dht: normalizing failed on <gfid:9949b438-8348-4205-81b5-c5bce10927f0> (overlaps/holes present)
[2013-07-17 21:38:54.255242] E [nfs3-helpers.c:3606:nfs3_fh_resolve_inode_lookup_cbk] 0-nfs-nfsv3: Lookup failed: <gfid:9949b438-8348-4205-81b5-c5bce10927f0>: Resource temporarily unavailable
[2013-07-17 21:38:54.255318] E [nfs3.c:1387:nfs3_lookup_resume] 0-nfs-nfsv3: Resource temporarily unavailable: (10.70.37.5:759) quota-dist-rep : 9949b438-8348-4205-81b5-c5bce10927f0
[2013-07-17 21:38:54.255372] W [nfs3-helpers.c:3391:nfs3_log_common_res] 0-nfs-nfsv3: XID: e508bb0e, LOOKUP: NFS: 10006(Error occurred on the server or IO Error), POSIX: 14(Bad address)
[2013-07-17 21:38:54.262039] I [dht-layout.c:636:dht_layout_normalize] 0-quota-dist-rep-dht: found anomalies in <gfid:9949b438-8348-4205-81b5-c5bce10927f0>. holes=1 overlaps=0 missing=1 down=0 misc=0
[2013-07-17 21:38:54.262134] W [dht-common.c:213:dht_discover_complete] 0-quota-dist-rep-dht: normalizing failed on <gfid:9949b438-8348-4205-81b5-c5bce10927f0> (overlaps/holes present)
[2013-07-17 21:38:54.262183] E [nfs3-helpers.c:3606:nfs3_fh_resolve_inode_lookup_cbk] 0-nfs-nfsv3: Lookup failed: <gfid:9949b438-8348-4205-81b5-c5bce10927f0>: Resource temporarily unavailable
[2013-07-17 21:38:54.262235] E [nfs3.c:1387:nfs3_lookup_resume] 0-nfs-nfsv3: Resource temporarily unavailable: (10.70.37.5:759) quota-dist-rep : 9949b438-8348-4205-81b5-c5bce10927f0
[2013-07-17 21:38:54.262325] W [nfs3-helpers.c:3391:nfs3_log_common_res] 0-nfs-nfsv3: XID: e608bb0e, LOOKUP: NFS: 10006(Error occurred on the server or IO Error), POSIX: 14(Bad address)
[2013-07-17 21:38:54.269088] I [dht-layout.c:636:dht_layout_normalize] 0-quota-dist-rep-dht: found anomalies in <gfid:9949b438-8348-4205-81b5-c5bce10927f0>. holes=1 overlaps=0 missing=1 down=0 misc=0
[2013-07-17 21:38:54.269175] W [dht-common.c:213:dht_discover_complete] 0-quota-dist-rep-dht: normalizing failed on <gfid:9949b438-8348-4205-81b5-c5bce10927f0> (overlaps/holes present)
[2013-07-17 21:38:54.269226] E [nfs3-helpers.c:3606:nfs3_fh_resolve_inode_lookup_cbk] 0-nfs-nfsv3: Lookup failed: <gfid:9949b438-8348-4205-81b5-c5bce10927f0>: Resource temporarily unavailable
[2013-07-17 21:38:54.269299] E [nfs3.c:1387:nfs3_lookup_resume] 0-nfs-nfsv3: Resource temporarily unavailable: (10.70.37.5:759) quota-dist-rep : 9949b438-8348-4205-81b5-c5bce10927f0
[2013-07-17 21:38:54.269355] W [nfs3-helpers.c:3391:nfs3_log_common_res] 0-nfs-nfsv3: XID: e708bb0e, LOOKUP: NFS: 10006(Error occurred on the server or IO Error), POSIX: 14(Bad address)
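
The nfs.log excerpt above repeats the same layout anomaly for a single gfid. A rough way to summarise such lines is sketched here, assuming the message format shown in this excerpt; the function name layout_anomalies is hypothetical.

```shell
# Sketch: count dht_layout_normalize anomaly messages per gfid.
# layout_anomalies is a hypothetical helper; it assumes the log line
# format seen in the nfs.log excerpt of this report.
layout_anomalies() {
    # $1 is the path to the log file.
    grep 'dht_layout_normalize' "$1" \
        | sed -n 's/.*found anomalies in <gfid:\([0-9a-f-]*\)>\. \(holes=[0-9]* overlaps=[0-9]*\).*/\1 \2/p' \
        | sort | uniq -c
}
```

Each output line carries a count followed by the gfid and its holes/overlaps values, so a single dominant gfid stands out quickly.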

quotad.log:

[2013-07-17 21:59:35.490725] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-quota-dist-rep: / - disk layout missing
[2013-07-17 21:59:35.490901] I [dht-common.c:657:dht_revalidate_cbk] 0-quota-dist-rep: mismatching layouts for /
[2013-07-17 21:59:45.496305] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-quota-dist-rep: / - disk layout missing
[2013-07-17 21:59:45.496365] I [dht-common.c:657:dht_revalidate_cbk] 0-quota-dist-rep: mismatching layouts for /
[2013-07-17 21:59:55.500900] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-quota-dist-rep: / - disk layout missing
[2013-07-17 21:59:55.500967] I [dht-common.c:657:dht_revalidate_cbk] 0-quota-dist-rep: mismatching layouts for /
[2013-07-17 22:00:05.506501] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-quota-dist-rep: / - disk layout missing
[2013-07-17 22:00:05.506543] I [dht-common.c:657:dht_revalidate_cbk] 0-quota-dist-rep: mismatching layouts for /
[2013-07-17 22:00:15.510453] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-quota-dist-rep: / - disk layout missing
[2013-07-17 22:00:15.510486] I [dht-common.c:657:dht_revalidate_cbk] 0-quota-dist-rep: mismatching layouts for /
[2013-07-17 22:00:25.515638] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-quota-dist-rep: / - disk layout missing
[2013-07-17 22:00:25.515666] I [dht-common.c:657:dht_revalidate_cbk] 0-quota-dist-rep: mismatching layouts for /
[2013-07-17 22:00:35.520042] I [dht-layout.c:722:dht_layout_dir_mismatch] 0-quota-dist-rep: / - disk layout missing
[2013-07-17 22:00:35.520108] I [dht-common.c:657:dht_revalidate_cbk] 0-quota-dist-rep: mismatching layouts for /
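
In the quotad.log excerpt above, the "disk layout missing" message for / recurs about every 10 seconds. A small sketch for reporting how many times it appears, with the first and last timestamps, is shown here; the function name mismatch_window is hypothetical and the line format is assumed to match this excerpt.

```shell
# Sketch: report the count, first timestamp and last timestamp of
# "disk layout missing" messages. mismatch_window is a hypothetical
# helper; it assumes the log line format seen in this report.
mismatch_window() {
    # $1 is the path to the log file.
    awk '/dht_layout_dir_mismatch/ && /disk layout missing/ {
             ts = $1 " " $2      # e.g. "[2013-07-17 21:59:35.490725]"
             if (n == 0) first = ts
             last = ts
             n++
         }
         END { printf "%d %s %s\n", n, first, last }' "$1"
}
```

Running it before and after an upgrade gives a quick check of whether the message is still being logged.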


Expected results:
add-brick should complete without causing I/O errors on the NFS mount, even when quota is enabled.

Additional info:

Comment 8 Scott Haines 2013-08-27 14:11:42 UTC
Per 08/27 scrum-of-scrum, not a blocker.

Comment 10 Saurabh 2013-10-22 13:57:23 UTC
Executed the same test as described in the Description section.
Did not find the same issue or the same log messages this time.
Verified on glusterfs.3.4.0.35rhs

Comment 12 errata-xmlrpc 2013-11-27 15:29:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html