Bug 985907 - quota + nfs: EIO with "CREATE" when gluster processes on two nodes are killed(kill -9)
Status: CLOSED NOTABUG
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: 2.1
Hardware: x86_64 Linux
Priority: high    Severity: high
Assigned To: krishnan parthasarathi
Sudhir D
Depends On:
Blocks:
 
Reported: 2013-07-18 09:24 EDT by Saurabh
Modified: 2016-01-19 01:15 EST
6 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-09-12 06:04:12 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments

None
Description Saurabh 2013-07-18 09:24:39 EDT
Description of problem:
Killing the gluster processes on two of the nodes results in data creation over the NFS mount failing with EIO.

The volume and the directories have quota limits set.

Volume Name: quota-dist-rep
Type: Distributed-Replicate
Volume ID: c2c503b9-19cf-44ef-b468-c4f02e3b35c7
Status: Started
Number of Bricks: 7 x 2 = 14
Transport-type: tcp
Bricks:
Brick1: 10.70.37.180:/rhs/bricks/quota-d1r1
Brick2: 10.70.37.80:/rhs/bricks/quota-d1r2
Brick3: 10.70.37.216:/rhs/bricks/quota-d2r1
Brick4: 10.70.37.139:/rhs/bricks/quota-d2r2
Brick5: 10.70.37.180:/rhs/bricks/quota-d3r1
Brick6: 10.70.37.80:/rhs/bricks/quota-d3r2
Brick7: 10.70.37.216:/rhs/bricks/quota-d4r1
Brick8: 10.70.37.139:/rhs/bricks/quota-d4r2
Brick9: 10.70.37.180:/rhs/bricks/quota-d5r1
Brick10: 10.70.37.80:/rhs/bricks/quota-d5r2
Brick11: 10.70.37.216:/rhs/bricks/quota-d6r1
Brick12: 10.70.37.139:/rhs/bricks/quota-d6r2
Brick13: 10.70.37.180:/rhs/bricks/quota-d1r1-add
Brick14: 10.70.37.80:/rhs/bricks/quota-d1r2-add
Options Reconfigured:
features.quota: on


Version-Release number of selected component (if applicable):
[root@nfs1 ~]# rpm -qa | grep glusterfs
glusterfs-3.4.0.12rhs.beta4-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.12rhs.beta4-1.el6rhs.x86_64
glusterfs-server-3.4.0.12rhs.beta4-1.el6rhs.x86_64
[root@nfs1 ~]# 


How reproducible:
Executed this test for the first time on this build.

Steps to Reproduce:
a. 4 nodes [node1, node2, node3, node4]
b. 1 client [client1]
c. NFS mount on client1
d. 6x2 volume
e. quota enabled and a limit of 30GB set on the volume
f. 10 directories
g. a limit of 2GB set on each directory
h. started creating data inside the directories
i. while I/O is going on, kill all glusterfs/glusterfsd/glusterd processes on node3 and node4 (a shell sketch of these steps follows below)
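
A rough shell sketch of steps c and e-i, in case it helps reproduce. Only the volume name and server addresses are taken from the volume info above; the mount point, file names and dd sizes are illustrative assumptions.

# On client1: NFS v3 mount and the ten test directories (steps c, f)
mount -t nfs -o vers=3 10.70.37.180:/quota-dist-rep /mnt/quota-dist-rep
for i in $(seq 1 10); do mkdir -p /mnt/quota-dist-rep/dir$i; done

# On one of the server nodes: enable quota and set the limits (steps e, g)
gluster volume quota quota-dist-rep enable
gluster volume quota quota-dist-rep limit-usage / 30GB
for i in $(seq 1 10); do
    gluster volume quota quota-dist-rep limit-usage /dir$i 2GB
done

# On client1: start creating data inside the directories (step h)
for i in $(seq 1 10); do
    dd if=/dev/zero of=/mnt/quota-dist-rep/dir$i/$(date +%s) bs=1M count=1024 &
done

# On node3 and node4, while the I/O is still running (step i)
killall -9 glusterfs glusterfsd glusterd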

Actual results:
EIO was seen several times on the mount point.

[root@nfs1 ~]# gluster volume quota quota-dist-rep list
                  Path                   Hard-limit Soft-limit   Used  Available
--------------------------------------------------------------------------------
/                                           30GB       90%       6.5GB  23.5GB
/dir1                                        2GB       90%       1.2GB 846.0MB
/dir2                                        2GB       90%       1.2GB 870.1MB
/dir3                                        2GB       90%       1.0GB 989.0MB
/dir4                                        2GB       90%      0Bytes   2.0GB
/dir5                                        2GB       90%      0Bytes   2.0GB
/dir6                                        2GB       90%      0Bytes   2.0GB
/dir7                                        2GB       90%      0Bytes   2.0GB
/dir8                                        2GB       90%      0Bytes   2.0GB
/dir9                                        2GB       90%      0Bytes   2.0GB
/dir10                                       2GB       90%      0Bytes   2.0GB
[root@nfs1 ~]# 


nfs.log,
[2013-07-18 02:18:50.052164] E [fd.c:536:fd_unref] (-->/usr/lib64/glusterfs/3.4.0.12rhs.beta4/xlator/cluster/distribute.so(dht_create_cbk+0x1b6) [0x7f2d2673b636] (-->/usr/lib64/glusterfs/3.4.0.12rhs.beta4/xlator/debug/io-stats.so(io_stats_create_cbk+0x260) [0x7f2d26303580] (-->/usr/lib64/glusterfs/3.4.0.12rhs.beta4/xlator/nfs/server.so(nfs_fop_create_cbk+0x99) [0x7f2d260ba0b9]))) 0-fd: fd is NULL
[2013-07-18 02:18:50.052191] W [nfs3.c:2354:nfs3svc_create_cbk] 0-nfs: 42201608: /dir3/1374128794 => -1 (Transport endpoint is not connected)
[2013-07-18 02:18:50.052311] W [nfs3-helpers.c:3460:nfs3_log_newfh_res] 0-nfs-nfsv3: XID: 42201608, CREATE: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000
[2013-07-18 02:18:50.063879] I [afr-common.c:3914:afr_local_init] 0-quota-dist-rep-replicate-1: no subvolumes up
[2013-07-18 02:18:50.064038] E [fd.c:536:fd_unref] (-->/usr/lib64/glusterfs/3.4.0.12rhs.beta4/xlator/cluster/distribute.so(dht_create_cbk+0x1b6) [0x7f2d2673b636] (-->/usr/lib64/glusterfs/3.4.0.12rhs.beta4/xlator/debug/io-stats.so(io_stats_create_cbk+0x260) [0x7f2d26303580] (-->/usr/lib64/glusterfs/3.4.0.12rhs.beta4/xlator/nfs/server.so(nfs_fop_create_cbk+0x99) [0x7f2d260ba0b9]))) 0-fd: fd is NULL
[2013-07-18 02:18:50.064066] W [nfs3.c:2354:nfs3svc_create_cbk] 0-nfs: 44201608: /dir3/1374128794 => -1 (Transport endpoint is not connected)
[2013-07-18 02:18:50.064141] W [nfs3-helpers.c:3460:nfs3_log_newfh_res] 0-nfs-nfsv3: XID: 44201608, CREATE: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000
[2013-07-18 02:18:50.078849] I [afr-common.c:3914:afr_local_init] 0-quota-dist-rep-replicate-1: no subvolumes up
[2013-07-18 02:18:50.079009] E [fd.c:536:fd_unref] (-->/usr/lib64/glusterfs/3.4.0.12rhs.beta4/xlator/cluster/distribute.so(dht_create_cbk+0x1b6) [0x7f2d2673b636] (-->/usr/lib64/glusterfs/3.4.0.12rhs.beta4/xlator/debug/io-stats.so(io_stats_create_cbk+0x260) [0x7f2d26303580] (-->/usr/lib64/glusterfs/3.4.0.12rhs.beta4/xlator/nfs/server.so(nfs_fop_create_cbk+0x99) [0x7f2d260ba0b9]))) 0-fd: fd is NULL
[2013-07-18 02:18:50.079036] W [nfs3.c:2354:nfs3svc_create_cbk] 0-nfs: 46201608: /dir3/1374128794 => -1 (Transport endpoint is not connected)
[2013-07-18 02:18:50.079112] W [nfs3-helpers.c:3460:nfs3_log_newfh_res] 0-nfs-nfsv3: XID: 46201608, CREATE: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000


Expected results:

Data creation should not be affected.


Additional info:
Comment 3 vpshastry 2013-09-12 06:04:12 EDT
When both bricks of an AFR replica pair are down, data cannot be created on the hashed brick. In the above case, 3 pairs of replica bricks reside entirely on nfs3 and nfs4. EIO is expected, since the bricks on both nfs3 and nfs4 are killed. The same behaviour is observed even without quota/NFS.
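
Not part of the original comment, but one way to confirm this from a surviving node is to check the brick status; the hostnames and brick layout below are taken from the volume info above.

# On a surviving node (node1/node2): list brick status for the volume
gluster volume status quota-dist-rep
# With the processes on node3 (10.70.37.216) and node4 (10.70.37.139) killed,
# every brick on those hosts shows "N" in the Online column. Brick3/Brick4,
# Brick7/Brick8 and Brick11/Brick12 are replica pairs made up entirely of
# those two hosts, so their AFR subvolumes (e.g. quota-dist-rep-replicate-1
# in the nfs.log above) have no subvolume up, and any CREATE that DHT hashes
# to one of them fails with EIO.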
