Bug 1179649

Summary: quota: quotad down and mkdir for few names fails with "EIO"
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Saurabh <saujain>
Component: quota
Assignee: Vijaikumar Mallikarjuna <vmallika>
Status: CLOSED NOTABUG
QA Contact: Anil Shah <ashah>
Severity: medium
Docs Contact:
Priority: unspecified
Version: rhgs-3.0
CC: annair, asriram, mzywusko, rghatvis, rhs-bugs, smohan, spandit, storage-qa-internal, vagarwal, vmallika
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
This is expected behaviour. When quotad is down, quota enforcement cannot take place, and the I/O therefore fails with an EIO error.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-07-22 04:15:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
node1 sosreport (flags: none)
node2 sosreport (flags: none)
node3 sosreport (flags: none)
node4 sosreport (flags: none)

Description Saurabh 2015-01-07 09:30:09 UTC
Description of problem:
I have a 4-node cluster of RHS nodes. The volume I created has quota enabled; enabling quota brings up a daemon called "quotad" on all nodes in the cluster.
For this test, I killed quotad on all nodes but one.
Then, from the mount point, I tried to create some directories; some directory creations succeeded and some failed.

Version-Release number of selected component (if applicable):
glusterfs-3.6.0.41-1.el6rhs.x86_64

How reproducible:
Always, for the particular set of directory names in this scenario.

Steps to Reproduce:
1. Create a volume of 6x2 type.
2. Enable quota on the volume.
3. Set a quota limit on the volume.
4. Mount the volume.
5. Create a directory inside the mount point, in this case /run17948.
5a. Set a quota limit on the directory created above.
6. Kill quotad on three of the four nodes.
7. Try to create directories inside the directory created in step 5 (see the command sketch below).
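
A minimal command sketch of the above steps, assuming a 4-node setup; the hostnames, brick paths, limit sizes and the /mnt/vol0 mount point below are illustrative placeholders, not the exact values used in this report:

# on a server node: create a 6x2 distributed-replicate volume and enable quota
gluster volume create vol0 replica 2 node1:/rhs/brick1/d1r1 node2:/rhs/brick1/d1r2 \
    node3:/rhs/brick1/d2r1 node4:/rhs/brick1/d2r2 ...   # remaining brick pairs omitted
gluster volume start vol0
gluster volume quota vol0 enable
gluster volume quota vol0 limit-usage / 240GB

# on the client: mount the volume and create the test directory
mount -t glusterfs node1:/vol0 /mnt/vol0
mkdir /mnt/vol0/run17948

# on a server node: set a limit on the new directory
gluster volume quota vol0 limit-usage /run17948 70GB

# on three of the four server nodes: kill the quota daemon
# (assumes the quotad process command line contains "quotad")
pkill -f quotad

# on the client again: some mkdir calls now fail with EIO
mkdir /mnt/vol0/run17948/dir4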

Actual results:
Step 7 fails for some directory names:
[root@rhsauto015 run17948]# mkdir dir4
mkdir: cannot create directory `dir4': Input/output error
[root@rhsauto015 run17948]# mkdir dir5
mkdir: cannot create directory `dir5': Input/output error
[root@rhsauto015 run17948]# 

whereas it succeeds for other names:
[root@rhsauto015 run17948]# mkdir dir7
[root@rhsauto015 run17948]# mkdir dir6


Expected results:
mkdir should work for all names, irrespective of whether quotad is up or down on any node(s).

Additional info:
[root@nfs1 ~]# gluster volume info
 
Volume Name: vol0
Type: Distributed-Replicate
Volume ID: d7f65230-efac-409f-9495-9479f986a27c
Status: Started
Snap Volume: no
Number of Bricks: 7 x 2 = 14
Transport-type: tcp
Bricks:
Brick1: 10.70.37.74:/rhs/brick1/d1r1
Brick2: 10.70.37.89:/rhs/brick1/d1r2
Brick3: 10.70.37.91:/rhs/brick1/d2r1
Brick4: 10.70.37.133:/rhs/brick1/d2r2
Brick5: 10.70.37.74:/rhs/brick1/d3r1
Brick6: 10.70.37.89:/rhs/brick1/d3r2
Brick7: 10.70.37.91:/rhs/brick1/d4r1
Brick8: 10.70.37.133:/rhs/brick1/d4r2
Brick9: 10.70.37.74:/rhs/brick1/d5r1
Brick10: 10.70.37.89:/rhs/brick1/d5r2
Brick11: 10.70.37.91:/rhs/brick1/d6r1
Brick12: 10.70.37.133:/rhs/brick1/d6r2
Brick13: 10.70.37.74:/rhs/brick1/d1r1-add
Brick14: 10.70.37.89:/rhs/brick1/d1r2-add
Options Reconfigured:
features.default-soft-limit: 90%
performance.readdir-ahead: on
features.quota: on
features.quota-deem-statfs: on
features.uss: off
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable


[root@nfs1 ~]# gluster volume quota vol0 list
                  Path                   Hard-limit Soft-limit   Used  Available  Soft-limit exceeded? Hard-limit exceeded?
---------------------------------------------------------------------------------------------------------------------------
/                                        240.5GB       90%     172.0GB  68.5GB              No                   No
/run16646                                200.0GB       90%     100.0GB 100.0GB              No                   No
/run16646/dir                            100.0GB       90%     100.0GB  0Bytes             Yes                  Yes
/run16646/dir/dir1                       200.0GB       90%     100.0GB 100.0GB              No                   No
/dir2                                     10.0GB       90%      12.0GB  0Bytes             Yes                  Yes
/run17948/dir2/dir3/dir4/di5/dir6/dir7/dir8/dir9  40.0GB       90%      40.0GB  0Bytes             Yes                  Yes
/run17948                                 70.0GB       80%      60.0GB  10.0GB             Yes                   No
/run17948/dir3                            10.0GB       80%      10.0GB  15.1MB             Yes                   No
/run17948/dir6                            10.0GB       90%      10.0GB   4.2MB             Yes                   No

Comment 1 Saurabh 2015-01-07 09:33:57 UTC
nfs.log entries at the time of the mkdir of "dir5":


[2015-01-07 09:32:20.366567] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 0-vol0-client-11: remote operation failed: Transport endpoint is not connected. Path: <gfid:1300faa9-207b-48ea-969e-61ddec0592d8>/dir5
[2015-01-07 09:32:20.366775] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 0-vol0-client-10: remote operation failed: Transport endpoint is not connected. Path: <gfid:1300faa9-207b-48ea-969e-61ddec0592d8>/dir5
[2015-01-07 09:32:20.366819] W [nfs3.c:2722:nfs3svc_mkdir_cbk] 0-nfs: b2020bff: <gfid:1300faa9-207b-48ea-969e-61ddec0592d8>/dir5 => -1 (Transport endpoint is not connected)
[2015-01-07 09:32:20.366858] W [nfs3-helpers.c:3470:nfs3_log_newfh_res] 0-nfs-nfsv3: XID: b2020bff, MKDIR: NFS: 5(I/O error), POSIX: 107(Transport endpoint is not connected), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000

Comment 2 Saurabh 2015-01-07 09:40:58 UTC
Created attachment 977174 [details]
node1 sosreport

Comment 3 Saurabh 2015-01-07 09:42:20 UTC
Created attachment 977175 [details]
node2 sosreport

Comment 4 Saurabh 2015-01-07 09:44:08 UTC
Created attachment 977176 [details]
node3 sosreport

Comment 5 Saurabh 2015-01-07 09:45:21 UTC
Created attachment 977177 [details]
node4 sosreport

Comment 7 Vijaikumar Mallikarjuna 2015-03-02 08:29:40 UTC
The quota enforcer communicates with quotad to fetch the current usage.

When a quota limit is exceeded, we must not allow creation of any new files/directories.

Without quotad, the enforcer cannot determine whether the quota limit has been exceeded.
So it is expected behaviour that you get 'Transport endpoint is not connected' when quotad is down.
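
For reference, a quick way to check whether quotad is running on each node is the volume status output (a sketch; the sample row below illustrates the format and is not captured from this cluster):

gluster volume status vol0
# when quota is enabled, the status table includes one row per node, e.g.:
#   Quota Daemon on localhost    N/A    N/A    Y    <pid>
# an 'N' in the Online column for a Quota Daemon row means quotad is down on that node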

Comment 8 Vivek Agarwal 2015-04-20 09:22:33 UTC
Primarily a documentation effort

Comment 11 Vijaikumar Mallikarjuna 2015-07-22 04:15:08 UTC
The quota enforcer will not work when quotad is down; this is expected behaviour.
So closing this as Not a Bug.