Bug 1339144

Summary: Getting EIO error when limit exceeds in disperse volume when bricks are down
Product: Red Hat Gluster Storage [Red Hat Storage]
Component: disperse
Version: rhgs-3.1
Reporter: Nag Pavan Chilakam <nchilaka>
Assignee: Sunil Kumar Acharya <sheggodu>
QA Contact: Nag Pavan Chilakam <nchilaka>
CC: amukherj, aspandey, pkarampu, rhs-bugs
Status: CLOSED WONTFIX
Severity: medium
Priority: low
Keywords: ZStream
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2018-02-13 11:34:37 UTC

Description Nag Pavan Chilakam 2016-05-24 08:54:27 UTC
Description of problem:
======================
As part of validating the fix for bug 1224180 (Getting EIO instead of EDQUOT when the limit is exceeded in a disperse volume),
I tested the case where bricks are brought down.
When I brought down two bricks in a dist-disperse volume, I could still see the EIO error for some files:
3072000000 bytes (3.1 GB) copied, 425.922 s, 7.2 MB/s
dd: error writing ‘file103.4’: Input/output error
dd: closing output file ‘file103.4’: Input/output error
dd: failed to open ‘file103.5’: Input/output error
dd: error writing ‘file103.6’: Disk quota exceeded
dd: closing output file ‘file103.6’: Input/output error
dd: failed to open ‘file103.7’: Input/output error
dd: failed to open ‘file103.8’: Disk quota exceeded
dd: failed to open ‘file103.9’: Disk quota exceeded
dd: failed to open ‘file103.10’: Disk quota exceeded
[root@dhcp35-103 103]# mount|grep disperse


Since the steps mentioned in bug 1224180 worked well, producing the "Disk quota exceeded" error message, I moved that bug to VERIFIED as discussed with dev.
Raising this bug to track the brick-down scenarios.
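For reference, the two messages in the dd output above correspond to distinct POSIX errno values: EDQUOT (the expected error once the quota hard limit is hit) and EIO (the wrong error this bug tracks). A minimal Python sketch of that mapping; the helper name classify_write_error is hypothetical, for illustration only:

```python
import errno
import os

def classify_write_error(err: OSError) -> str:
    """Map the errno of a failed write to the message dd prints for it."""
    if err.errno == errno.EDQUOT:   # 122 on Linux: the expected error at the quota limit
        return "Disk quota exceeded"
    if err.errno == errno.EIO:      # 5 on Linux: the wrong error this bug tracks
        return "Input/output error"
    return os.strerror(err.errno)   # any other failure, e.g. ENOSPC

print(classify_write_error(OSError(errno.EDQUOT, "")))  # Disk quota exceeded
print(classify_write_error(OSError(errno.EIO, "")))     # Input/output error
```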


Version-Release number of selected component (if applicable):
==========================================================
glusterfs-cli-3.7.9-6.el7rhgs.x86_64
glusterfs-libs-3.7.9-6.el7rhgs.x86_64
glusterfs-fuse-3.7.9-6.el7rhgs.x86_64
glusterfs-client-xlators-3.7.9-6.el7rhgs.x86_64
glusterfs-server-3.7.9-6.el7rhgs.x86_64
python-gluster-3.7.9-5.el7rhgs.noarch
glusterfs-3.7.9-6.el7rhgs.x86_64
glusterfs-api-3.7.9-6.el7rhgs.x86_64




Steps to Reproduce:
    TC#2: should get EDQUOT error instead of EIO when the limit is exceeded in a disperse volume while bricks are down --> FAIL

    1. Create a dist-disperse volume, 2 x (4+2).

    2. Mount the volume on, say, two clients.

    3. Create a directory per client (dir1 and dir2) and start creating files of about 1 GB each in a loop (from each of the mounts).

    4. Now enable quota.

    No errors or I/O issues should be seen.

    5. Set a quota limit of, say, 10 GB on dir1 and 5 GB on dir2.

    Once the quota limits are reached, the user must see "Disk quota exceeded" instead of the previous wrong error "Input/output error".

    6. Now bring down a couple of bricks.

    Once the quota limits are reached, the user must see "Disk quota exceeded" instead of the previous wrong error "Input/output error". ---> STEP FAILS: we sometimes see "Input/output error" and sometimes "Disk quota exceeded".

    Now extend the quota limit for dir1 to 100 GB.

    I/O must continue, as the quota is no longer hit.

    7. Now reduce the quota back to, say, 15 GB.

    Once the quota limits are reached, the user must see "Disk quota exceeded" instead of the previous wrong error "Input/output error". ---> STEP FAILS: we sometimes see "Input/output error" and sometimes "Disk quota exceeded".


Expected results:
==============
Should get the disk-quota error (EDQUOT) instead of the file I/O error (EIO).


sos reports will be attached




volinfo:
Volume Name: disperse
Type: Distributed-Disperse
Volume ID: f8d9157e-0d75-4b38-b8a3-d87d11e99e24
Status: Started
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.35.191:/rhs/brick1/disperse
Brick2: 10.70.35.27:/rhs/brick1/disperse
Brick3: 10.70.35.98:/rhs/brick1/disperse
Brick4: 10.70.35.64:/rhs/brick1/disperse
Brick5: 10.70.35.44:/rhs/brick1/disperse
Brick6: 10.70.35.114:/rhs/brick1/disperse
Brick7: 10.70.35.191:/rhs/brick2/disperse
Brick8: 10.70.35.27:/rhs/brick2/disperse
Brick9: 10.70.35.98:/rhs/brick2/disperse
Brick10: 10.70.35.64:/rhs/brick2/disperse
Brick11: 10.70.35.44:/rhs/brick2/disperse
Brick12: 10.70.35.114:/rhs/brick2/disperse
Options Reconfigured:
performance.readdir-ahead: on
[root@dhcp35-191 ~]# 
[root@dhcp35-191 ~]# 
[root@dhcp35-191 ~]# gluster v quota disperse enable
volume quota : success
[root@dhcp35-191 ~]# gluster v quota
Usage: volume quota <VOLNAME> {enable|disable|list [<path> ...]| list-objects [<path> ...] | remove <path>| remove-objects <path> | default-soft-limit <percent>} |
volume quota <VOLNAME> {limit-usage <path> <size> [<percent>]} |
volume quota <VOLNAME> {limit-objects <path> <number> [<percent>]} |
volume quota <VOLNAME> {alert-time|soft-timeout|hard-timeout} {<time>}
[root@dhcp35-191 ~]# gluster v quota disperse /root 2GB
Invalid quota option : /root
Usage: volume quota <VOLNAME> {enable|disable|list [<path> ...]| list-objects [<path> ...] | remove <path>| remove-objects <path> | default-soft-limit <percent>} |
volume quota <VOLNAME> {limit-usage <path> <size> [<percent>]} |
volume quota <VOLNAME> {limit-objects <path> <number> [<percent>]} |
volume quota <VOLNAME> {alert-time|soft-timeout|hard-timeout} {<time>}
[root@dhcp35-191 ~]# gluster v quota disperse limit-usage /root 2G
Please enter an integer value in the range of (1 - 9223372036854775807)
Usage: volume quota <VOLNAME> {enable|disable|list [<path> ...]| list-objects [<path> ...] | remove <path>| remove-objects <path> | default-soft-limit <percent>} |
volume quota <VOLNAME> {limit-usage <path> <size> [<percent>]} |
volume quota <VOLNAME> {limit-objects <path> <number> [<percent>]} |
volume quota <VOLNAME> {alert-time|soft-timeout|hard-timeout} {<time>}
[root@dhcp35-191 ~]# gluster v quota disperse limit-usage /root 2GB
volume quota : success
[root@dhcp35-191 ~]# gluster v quota disperse limit-usage /root 20GB
volume quota : success
[root@dhcp35-191 ~]# gluster v quota disperse limit-usage /root 2GB
volume quota : success
[root@dhcp35-191 ~]# gluster v quota disperse limit-usage / 20GB
volume quota : success
[root@dhcp35-191 ~]# gluster v info
 
Volume Name: consmerg
Type: Replicate
Volume ID: aa4e04d2-591d-4905-ad8b-7abcbc34ac37
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.35.98:/rhs/brick1/consmerg
Brick2: 10.70.35.64:/rhs/brick1/consmerg
Options Reconfigured:
cluster.entry-self-heal: off
cluster.data-self-heal: off
cluster.metadata-self-heal: off
cluster.self-heal-daemon: on
performance.readdir-ahead: on
 
Volume Name: disperse
Type: Distributed-Disperse
Volume ID: f8d9157e-0d75-4b38-b8a3-d87d11e99e24
Status: Started
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.35.191:/rhs/brick1/disperse
Brick2: 10.70.35.27:/rhs/brick1/disperse
Brick3: 10.70.35.98:/rhs/brick1/disperse
Brick4: 10.70.35.64:/rhs/brick1/disperse
Brick5: 10.70.35.44:/rhs/brick1/disperse
Brick6: 10.70.35.114:/rhs/brick1/disperse
Brick7: 10.70.35.191:/rhs/brick2/disperse
Brick8: 10.70.35.27:/rhs/brick2/disperse
Brick9: 10.70.35.98:/rhs/brick2/disperse
Brick10: 10.70.35.64:/rhs/brick2/disperse
Brick11: 10.70.35.44:/rhs/brick2/disperse
Brick12: 10.70.35.114:/rhs/brick2/disperse
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
[root@dhcp35-191 ~]# gluster v quota
Usage: volume quota <VOLNAME> {enable|disable|list [<path> ...]| list-objects [<path> ...] | remove <path>| remove-objects <path> | default-soft-limit <percent>} |
volume quota <VOLNAME> {limit-usage <path> <size> [<percent>]} |
volume quota <VOLNAME> {limit-objects <path> <number> [<percent>]} |
volume quota <VOLNAME> {alert-time|soft-timeout|hard-timeout} {<time>}
[root@dhcp35-191 ~]# gluster v quota disperse list
                  Path                   Hard-limit  Soft-limit      Used  Available  Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/root                                      2.0GB     80%(1.6GB)    4.3GB  0Bytes             Yes                  Yes
/                                         20.0GB     80%(16.0GB)   18.1GB   1.9GB             Yes                   No
[root@dhcp35-191 ~]# gluster v quota disperse list
                  Path                   Hard-limit  Soft-limit      Used  Available  Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/root                                      2.0GB     80%(1.6GB)    4.3GB  0Bytes             Yes                  Yes
/                                         20.0GB     80%(16.0GB)   18.4GB   1.6GB             Yes                   No
[root@dhcp35-191 ~]# gluster v quota disperse list
                  Path                   Hard-limit  Soft-limit      Used  Available  Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/root                                      2.0GB     80%(1.6GB)    4.3GB  0Bytes             Yes                  Yes
/                                         20.0GB     80%(16.0GB)   19.6GB 374.7MB             Yes                   No
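In the quota listings above, the "Available" column is (up to display rounding) the hard limit minus usage, clamped at zero once the limit is exceeded. A small sketch reproducing the /root and / rows; it assumes GB means 1024^3 bytes here, and the helper available() is illustrative, not taken from the gluster source:

```python
GB = 1024 ** 3

def available(hard_limit: float, used: float) -> float:
    # Available = hard-limit - used, clamped at 0 once the hard limit
    # is exceeded, matching the 'gluster v quota <vol> list' output.
    return max(hard_limit - used, 0.0)

# /root row: hard limit 2.0GB, used 4.3GB -> shown as 0Bytes
print(available(2 * GB, 4.3 * GB))
# / row: hard limit 20.0GB, used 18.1GB -> shown as 1.9GB
print(round(available(20 * GB, 18.1 * GB) / GB, 1))
```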

Comment 3 Sunil Kumar Acharya 2017-01-30 14:11:50 UTC
Tried to recreate the issue with the following configuration and steps; could not reproduce it.

[root@varada ~]# glusterd -V
glusterfs 3.7.9 built on Jan 27 2017 14:58:18
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
[root@varada ~]#

1. Created an EC volume [(4 + 2) = 6] and mounted it on 4 different mount points on the same node.

gluster volume create ec-1 disperse-data 4 redundancy 2 varada:/LAB/store/ec-{1..6} force

2. Created files as shown below on all the 4 mount points:

for i in {1..50}; do dd if=/dev/urandom of=/LAB/fuse_mounts/<mount-point>/dir1/file_<mount-point>-$i bs=1024 count=100000& done

3. Enabled Quota.

gluster volume quota ec-1 enable

gluster v quota ec-1 soft-timeout 0
gluster v quota ec-1 hard-timeout 0

4. Set the limit to 5 MB.

gluster volume quota ec-1 limit-usage /dir1 5mb

5. Killed 2 bricks while writes were in progress.

6. When the hard limit was hit, no input/output error was observed.

Comment 4 Sunil Kumar Acharya 2017-01-31 12:42:10 UTC
As per comment 19 of BZ 1224180, we would need transaction infrastructure to fix this issue.

Pranith,

I think it is good to wait till the infrastructure is implemented. Any suggestion?

Comment 5 Pranith Kumar K 2017-02-01 08:32:31 UTC
(In reply to Sunil Kumar Acharya from comment #4)
> As per the Comment 19 of BZ1224180, we would need a transaction
> infrastructure to fix this issue.
> 
> Pranith,
> 
> I think it is good to wait till the infrastructure is implemented. Any
> suggestion?

Agreed.

Comment 6 Ashish Pandey 2018-02-09 09:10:22 UTC
*** Bug 1339167 has been marked as a duplicate of this bug. ***

Comment 8 Sunil Kumar Acharya 2018-02-13 11:34:37 UTC
Due to: https://bugzilla.redhat.com/show_bug.cgi?id=1224180#c19

We won't be fixing this issue until the required infrastructure is in place.