Bug 1224180 - Getting EIO instead of EDQUOT when limit is exceeded in disperse volume
Summary: Getting EIO instead of EDQUOT when limit is exceeded in disperse volume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: disperse
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.1.3
Assignee: Raghavendra G
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On: 1208079 1292020
Blocks: qe_tracker_everglades 1223636 1299184
 
Reported: 2015-05-22 10:06 UTC by Bhaskarakiran
Modified: 2018-01-23 12:38 UTC
CC List: 14 users

Fixed In Version: glusterfs-3.7.9-4
Doc Type: Bug Fix
Doc Text:
When disk quotas were enabled and a write exceeded the disk quota, the failure to write was reported as an input/output error (EIO) instead of a quota exceeded error (EDQUOT). This occurred because writes that failed for reasons related to storage space were converted to EIO, even when space was technically available. This behavior has been corrected so that the excessive bytes are retried, and writes that exceed quota limits are treated correctly as EDQUOT.
Clone Of: 1208079
Environment:
Last Closed: 2016-09-14 06:21:30 UTC
Embargoed:
rgowdapp: needinfo+




Links
System ID: Red Hat Product Errata RHBA-2016:1240 | Private: no | Priority: normal | Status: SHIPPED_LIVE | Summary: Red Hat Gluster Storage 3.1 Update 3 | Last Updated: 2016-06-23 08:51:28 UTC

Description Bhaskarakiran 2015-05-22 10:06:32 UTC
+++ This bug was initially created as a clone of Bug #1208079 +++

Description of problem:
=======================
After the quota limit is exceeded, writes fail with an Input/output error instead of a "Disk quota exceeded" message. The file cannot even be deleted afterwards.

[root@dhcp37-61 fuse1]# dd if=/dev/urandom of=testfile1 bs=128k count=10240
dd: writing `testfile1': Input/output error
dd: closing output file `testfile1': Input/output error
[root@dhcp37-61 fuse1]# rm -f testfile1 
rm: cannot remove `testfile1': Input/output error
[root@dhcp37-61 fuse1]# 
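
A hypothetical way to observe the raw errno behind dd's messages is to trace the failing system calls (standard strace usage; the file name is taken from the transcript above):

strace -e trace=write,close dd if=/dev/urandom of=testfile1 bs=128k count=10240
# before the fix, a failing write ends in "= -1 EIO (Input/output error)";
# with the fix in place it should end in "= -1 EDQUOT (Disk quota exceeded)"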

Version-Release number of selected component (if applicable):
=============================================================
[root@dhcp37-164 ~]# gluster --version
glusterfs 3.7dev built on Apr  1 2015 01:04:00
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@dhcp37-164 ~]# 

How reproducible:
=================
100%

Steps to Reproduce:
1. Create a disperse volume 1x(4+2)
2. Set the quota limit to 1GB
3. Create a file from the mount exceeding 1GB (a command sketch follows below)
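
A minimal command sketch of these steps (the hostnames $h1..$h3, brick paths, volume name "vol1", and mount point are hypothetical; compare the confirmed repro in comment 18):

gluster volume create vol1 disperse 6 redundancy 2 \
    $h1:/bricks/b1 $h2:/bricks/b2 $h3:/bricks/b3 \
    $h1:/bricks/b4 $h2:/bricks/b5 $h3:/bricks/b6 force
gluster volume start vol1
mount -t glusterfs $h1:vol1 /mnt/vol1
gluster volume quota vol1 enable
gluster volume quota vol1 limit-usage / 1GB
# this write crosses the 1GB limit and should fail with EDQUOT, not EIO
dd if=/dev/urandom of=/mnt/vol1/testfile1 bs=1M count=2048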

Actual results:
===============
Input/output error

Expected results:
=================
"Disk quota exceeded" should be seen

Additional info:
================
Sosreports will be attached.

Comment 2 Anjana Suparna Sriram 2015-07-27 09:48:15 UTC
Please review and sign off to include in Known Issues chapter.

Comment 3 Pranith Kumar K 2015-07-27 09:49:36 UTC
Looks good to me Anjana.

Comment 4 Vijaikumar Mallikarjuna 2016-02-17 11:20:38 UTC
The upstream patch below fixes the issue:
http://review.gluster.org/#/c/13438/

Comment 6 Mike McCune 2016-03-28 23:25:37 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation; please contact mmccune with any questions.

Comment 7 Raghavendra G 2016-05-05 03:36:09 UTC
https://code.engineering.redhat.com/gerrit/73672

Comment 9 Nag Pavan Chilakam 2016-05-24 07:42:47 UTC
    QATP:

    TC#1: should get EDQUOT error instead of EIO when limit is exceeded in disperse volume

    1. Create a disperse volume 1x(4+2)

    2. mount the volume

    3. Now create a directory dir1 and start creating files of say 1GB each in a loop

    4. Now enable quota 

    No errors or IO issues should be seen

    5. Now set a quota limit of say 10GB on the directory dir1 (the quota commands for steps 4-7 are sketched after this test case)

    Once the quota limit is reached, the user must see "Disk quota exceeded" instead of the previous wrong error of "Input/output error"

    6. Now extend the quota limit to 100GB for dir1

    IO must continue as the quota limit is not hit

    7. Now reduce the quota back to say 15GB

    Once the quota limit is reached, the user must see "Disk quota exceeded" instead of the previous wrong error of "Input/output error"
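
    A hedged sketch of the quota commands for steps 4-7 (the volume name "vol1" is hypothetical):

    gluster volume quota vol1 enable                    # step 4
    gluster volume quota vol1 limit-usage /dir1 10GB    # step 5: expect EDQUOT once the limit is hit
    gluster volume quota vol1 limit-usage /dir1 100GB   # step 6: IO continues, limit no longer hit
    gluster volume quota vol1 limit-usage /dir1 15GB    # step 7: expect EDQUOT again once hit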


    TC#2: should get EDQUOT error instead of EIO when limit is exceeded in disperse volume when bricks are down

    1. Create a dist-disperse volume 2x(4+2)

    2. mount the volume on say two clients

    3. Now create directories dir1 and dir2 for the respective clients and start creating files of say 1GB each in a loop (from each of the mounts)

    4. Now enable quota 

    No errors or IO issues should be seen

    5. Now set a quota limit of say 10GB on dir1 and say 5GB on dir2

    Once the quota limits are reached, the user must see "Disk quota exceeded" instead of the previous wrong error of "Input/output error"

    6. Now bring down a couple of bricks (one way to do this is sketched after this test case)

    Once the quota limits are reached, the user must see "Disk quota exceeded" instead of the previous wrong error of "Input/output error"
    Now extend the quota limit to 100GB for dir1

    IO must continue as the quota limit is not hit

    7. Now reduce the quota back to say 15GB

    Once the quota limits are reached, the user must see "Disk quota exceeded" instead of the previous wrong error of "Input/output error"
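
    One possible way to bring a couple of bricks down for step 6 (a sketch; the volume name "vol1" and the <...> PIDs are placeholders):

    gluster volume status vol1            # note the PID column for the brick processes
    kill -9 <brick-pid-1> <brick-pid-2>   # kill two of the listed brick PIDs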

Comment 10 Nag Pavan Chilakam 2016-05-24 08:47:40 UTC
QA Validation:
=============
TC#1 ---> passed
TC#2 ---> failed at steps 6 and 7: the steps fail because we sometimes see Input/output errors and sometimes "Disk quota exceeded"
3072000000 bytes (3.1 GB) copied, 425.922 s, 7.2 MB/s
dd: error writing ‘file103.4’: Input/output error
dd: closing output file ‘file103.4’: Input/output error
dd: failed to open ‘file103.5’: Input/output error
dd: error writing ‘file103.6’: Disk quota exceeded
dd: closing output file ‘file103.6’: Input/output error
dd: failed to open ‘file103.7’: Input/output error
dd: failed to open ‘file103.8’: Disk quota exceeded
dd: failed to open ‘file103.9’: Disk quota exceeded
dd: failed to open ‘file103.10’: Disk quota exceeded
[root@dhcp35-103 103]# mount|grep disperse

As TC#1 passes (the all-happy scenario) and the bug was raised for the same steps as TC#1, moving to verified.
However, raising a new bug for TC#2.

[root@dhcp35-191 ~]# rpm -qa|grep gluster
glusterfs-cli-3.7.9-6.el7rhgs.x86_64
glusterfs-libs-3.7.9-6.el7rhgs.x86_64
glusterfs-fuse-3.7.9-6.el7rhgs.x86_64
glusterfs-client-xlators-3.7.9-6.el7rhgs.x86_64
glusterfs-server-3.7.9-6.el7rhgs.x86_64
python-gluster-3.7.9-5.el7rhgs.noarch
glusterfs-3.7.9-6.el7rhgs.x86_64
glusterfs-api-3.7.9-6.el7rhgs.x86_64

Comment 11 Nag Pavan Chilakam 2016-05-24 08:56:32 UTC
Raised a bug for the failure of TC#2: Bug 1339144 - Getting EIO error when limit exceeds in disperse volume when bricks are down

Comment 12 Nag Pavan Chilakam 2016-05-24 10:05:38 UTC
With parallel IO from multiple clients, I see the issue.
Raised a bug: 1339167 - Getting EIO error for the first few files when limit exceeds in disperse volume when we do writes from multiple clients

Also, I tested the bug on NFS and it worked well (TC#1):
mkdir: cannot create directory ‘dir1’: Disk quota exceeded
[root@dhcp35-103 126]# 
[root@dhcp35-103 126]# 
[root@dhcp35-103 126]# for i in {1..10};do dd if=/dev/urandom of=cool.$i bs=1024 count=50000;done 
50000+0 records in
50000+0 records out
51200000 bytes (51 MB) copied, 4.51085 s, 11.4 MB/s
50000+0 records in
50000+0 records out
51200000 bytes (51 MB) copied, 4.48226 s, 11.4 MB/s
dd: closing output file ‘cool.3’: Disk quota exceeded
dd: failed to open ‘cool.4’: Disk quota exceeded
dd: failed to open ‘cool.5’: Disk quota exceeded
dd: failed to open ‘cool.6’: Disk quota exceeded
dd: failed to open ‘cool.7’: Disk quota exceeded
dd: failed to open ‘cool.8’: Disk quota exceeded
dd: failed to open ‘cool.9’: Disk quota exceeded
dd: failed to open ‘cool.10’: Disk quota exceeded

Comment 15 Raghavendra G 2016-06-10 04:27:15 UTC
Laura,

Doc text is fine.

regards,
Raghavendra

Comment 17 errata-xmlrpc 2016-06-23 04:53:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240

Comment 18 Sanoj Unnikrishnan 2016-09-12 12:51:25 UTC
I have been able to repro it with just a single client doing IO and without bringing down bricks. Repro steps follow.

gluster volume create v_disp disperse 6 redundancy 2 $tm1:/export/sdb/br1 $tm2:/export/sdb/b2 $tm3:/export/sdb/br3  $tm1:/export/sdb/b4 $tm2:/export/sdb/b5 $tm3:/export/sdb/b6 force
# (Used only 3 nodes; should not matter here)
gluster volume start v_disp
mount -t glusterfs $tm1:v_disp /gluster_vols/v_disp
mkdir /gluster_vols/v_disp/dir1
dd if=/dev/zero of=/gluster_vols/v_disp/dir1/x bs=10k count=90000 &
gluster v quota v_disp enable
gluster v quota v_disp limit-usage /dir1 200MB
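# soft-timeout/hard-timeout of 0 make quota re-check usage on every operation
# instead of trusting cached sizes, so the limit trips promptly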
gluster v quota v_disp soft-timeout 0
gluster v quota v_disp hard-timeout 0
# Note: bringing down 2 bricks (as done in the bug below) is not needed here
# https://bugzilla.redhat.com/show_bug.cgi?id=1339167

Hence, BZ1339167 is likely a duplicate of this.

Comment 19 Xavi Hernandez 2016-09-13 06:51:08 UTC
I think this bug doesn't have a reliable solution right now. It might be mitigated, but I think it's impossible to solve it completely without some sort of transaction infrastructure that allows us to do a rollback of a write.

To give an easy example, suppose we have a disperse volume 4+2:

Success cases:

* 4 or more bricks succeed. The others fail with EDQUOT. The result of the operation for upper xlators will be a success. However, self-heal won't be able to heal the damaged files because there's not enough space (not absolutely sure about that).

* 4 or more bricks fail with EDQUOT. The result of the operation will be a failure with error EDQUOT. The bricks that have succeeded will be repaired (put back to the old version) by self-heal.

Failure cases:

* 3 bricks succeed and 3 fail with EDQUOT. This is an inconsistent state. There are not enough bricks to recover either the new or the old version, so the result of the operation is an I/O error. There's no way for disperse to recover the damaged file.

With a rollback feature, the operation could be completed by rolling back the bricks that succeeded and returning EDQUOT. But currently this is not possible.
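
As a hypothetical illustration of the outcome logic above (a sketch, not GlusterFS code), for a 4+2 set where some fragment writes fail with EDQUOT:

for ok in 6 5 4 3 2 1 0; do
  failed=$((6 - ok))                      # bricks failing with EDQUOT
  if [ "$ok" -ge 4 ]; then
    echo "$ok succeed / $failed EDQUOT -> success; failed bricks left to self-heal"
  elif [ "$failed" -ge 4 ]; then
    echo "$ok succeed / $failed EDQUOT -> EDQUOT; succeeded bricks healed back"
  else
    echo "$ok succeed / $failed EDQUOT -> no 4-brick quorum for either version: EIO"
  fi
done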

Comment 20 Pranith Kumar K 2016-09-13 19:09:23 UTC
We are tracking these changes as part of https://bugzilla.redhat.com/show_bug.cgi?id=1339167; essentially the same discussion happened some months back between Nag and me. We failed to capture it as a BZ comment, which caused this confusion. Sorry about that.

Sanoj,
    Is it okay to capture this as 1339167 itself?

Pranith

Comment 21 Sanoj Unnikrishnan 2016-09-14 03:00:30 UTC
Definitely, both are the same.
However, the bug is easier to reproduce by bringing 2 bricks down. So we could track it here and close 1339167 as a duplicate.

Comment 22 Pranith Kumar K 2016-09-14 06:01:57 UTC
Please discuss with Nag and come to a conclusion about which one to keep.

Comment 24 Rejy M Cyriac 2016-09-14 06:21:30 UTC
This BZ had been CLOSED with resolution ERRATA as part of a release. Refer to Comment 17

https://access.redhat.com/errata/RHBA-2016:1240

Please open a new BZ for regression of the original issue or any related issue

