+++ This bug was initially created as a clone of Bug #1208079 +++

Description of problem:
=======================
After the quota limit is exceeded, instead of a "Disk quota exceeded" message, an Input/output error is thrown. The file cannot even be deleted:

[root@dhcp37-61 fuse1]# dd if=/dev/urandom of=testfile1 bs=128k count=10240
dd: writing `testfile1': Input/output error
dd: closing output file `testfile1': Input/output error
[root@dhcp37-61 fuse1]# rm -f testfile1
rm: cannot remove `testfile1': Input/output error
[root@dhcp37-61 fuse1]#

Version-Release number of selected component (if applicable):
=============================================================
[root@dhcp37-164 ~]# gluster --version
glusterfs 3.7dev built on Apr 1 2015 01:04:00
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@dhcp37-164 ~]#

How reproducible:
=================
100%

Steps to Reproduce:
1. Create a disperse volume 1x(4+2)
2. Set the quota limit to 1GB
3. Create a file from the mount exceeding 1GB

Actual results:
===============
Input/output error

Expected results:
=================
"Disk quota exceeded" should be seen

Additional info:
================
Sosreports will be attached.

--- Additional comment from Bhaskarakiran on 2015-04-01 06:56:23 EDT ---

--- Additional comment from Bhaskarakiran on 2015-04-01 06:59:36 EDT ---

--- Additional comment from Bhaskarakiran on 2015-04-01 07:55:12 EDT ---
Please review and sign off to include in the Known Issues chapter.
Looks good to me Anjana.
The upstream patch below fixes the issue: http://review.gluster.org/#/c/13438/
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions
https://code.engineering.redhat.com/gerrit/73672
QATP:

TC#1: should get EDQUOT error instead of EIO when the limit is exceeded in a disperse volume
1. Create a disperse volume 1x(4+2)
2. Mount the volume
3. Create a dir dir1 and start creating files in a loop, of say 1GB each
4. Enable quota
   No errors or IO issues should be seen
5. Set a quota limit of say 10GB on the dir dir1
   Once the quota limit is reached, the user must see "Disk quota exceeded" instead of the previous wrong error of "Input/output error"
6. Extend the quota limit to 100GB for dir1
   IO must continue, as the quota is not hit
7. Reduce the quota back to say 15GB
   Once the quota limit is reached, the user must see "Disk quota exceeded" instead of "Input/output error"

TC#2: should get EDQUOT error instead of EIO when the limit is exceeded in a disperse volume when bricks are down
1. Create a dist-disperse volume 2x(4+2)
2. Mount the volume on say two clients
3. Create dirs dir1 and dir2 for the respective clients and start creating files in a loop, of say 1GB each (from each of the mounts)
4. Enable quota
   No errors or IO issues should be seen
5. Set a quota limit of say 10GB on dir1 and say 5GB on dir2
   Once the quota limits are reached, the user must see "Disk quota exceeded" instead of the previous wrong error of "Input/output error"
6. Bring down a couple of bricks
   Once the quota limits are reached, the user must still see "Disk quota exceeded" instead of "Input/output error"
   Now extend the quota limit to 100GB for dir1; IO must continue, as the quota is not hit
7. Reduce the quota back to say 15GB
   Once the quota limit is reached, the user must see "Disk quota exceeded" instead of "Input/output error"
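As a rough sketch, the TC#1 setup and quota changes above map onto gluster CLI calls like the following. The host variables ($h1..$h3) and brick paths are made-up placeholders, and the commands are emitted as strings here so the sequence can be reviewed without a live cluster:

```shell
# Hypothetical TC#1 command sequence; hosts/bricks are illustrative only.
tc1_cmds() {
  vol=$1; dir=$2
  echo "gluster volume create $vol disperse 6 redundancy 2 \$h1:/b1 \$h2:/b2 \$h3:/b3 \$h1:/b4 \$h2:/b5 \$h3:/b6 force"
  echo "gluster volume start $vol"
  echo "gluster volume quota $vol enable"                  # step 4: enable quota
  echo "gluster volume quota $vol limit-usage $dir 10GB"   # step 5: initial limit
  echo "gluster volume quota $vol limit-usage $dir 100GB"  # step 6: extend, IO continues
  echo "gluster volume quota $vol limit-usage $dir 15GB"   # step 7: shrink again
}
tc1_cmds v_disp /dir1
```

The same helper, pointed at dir2 with a 5GB limit, covers the per-directory limits in TC#2.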
QA Validation:
=============
TC#1 ---> passed
TC#2 ---> failed at steps 6 and 7

The step fails because we sometimes see Input/output errors and sometimes "Disk quota exceeded":

3072000000 bytes (3.1 GB) copied, 425.922 s, 7.2 MB/s
dd: error writing ‘file103.4’: Input/output error
dd: closing output file ‘file103.4’: Input/output error
dd: failed to open ‘file103.5’: Input/output error
dd: error writing ‘file103.6’: Disk quota exceeded
dd: closing output file ‘file103.6’: Input/output error
dd: failed to open ‘file103.7’: Input/output error
dd: failed to open ‘file103.8’: Disk quota exceeded
dd: failed to open ‘file103.9’: Disk quota exceeded
dd: failed to open ‘file103.10’: Disk quota exceeded
[root@dhcp35-103 103]# mount|grep disperse

As TC#1 passes (the all-happy scenario) and the bug was raised for the same steps as TC#1, moving to VERIFIED. However, raising a new bug for TC#2.

[root@dhcp35-191 ~]# rpm -qa|grep gluster
glusterfs-cli-3.7.9-6.el7rhgs.x86_64
glusterfs-libs-3.7.9-6.el7rhgs.x86_64
glusterfs-fuse-3.7.9-6.el7rhgs.x86_64
glusterfs-client-xlators-3.7.9-6.el7rhgs.x86_64
glusterfs-server-3.7.9-6.el7rhgs.x86_64
python-gluster-3.7.9-5.el7rhgs.noarch
glusterfs-3.7.9-6.el7rhgs.x86_64
glusterfs-api-3.7.9-6.el7rhgs.x86_64
Raised a bug for the failure of testcase #2: 1339144 - Getting EIO error when limit exceeds in disperse volume when bricks are down
With multiple clients doing parallel IO, I see the issue; raised a bug: 1339167 - Getting EIO error for the first few files when limit exceeds in disperse volume when we do writes from multiple clients.

Also, I tested the bug on NFS and it worked well (TC#1):

mkdir: cannot create directory ‘dir1’: Disk quota exceeded
[root@dhcp35-103 126]# for i in {1..10};do dd if=/dev/urandom of=cool.$i bs=1024 count=50000;done
50000+0 records in
50000+0 records out
51200000 bytes (51 MB) copied, 4.51085 s, 11.4 MB/s
50000+0 records in
50000+0 records out
51200000 bytes (51 MB) copied, 4.48226 s, 11.4 MB/s
dd: closing output file ‘cool.3’: Disk quota exceeded
dd: failed to open ‘cool.4’: Disk quota exceeded
dd: failed to open ‘cool.5’: Disk quota exceeded
dd: failed to open ‘cool.6’: Disk quota exceeded
dd: failed to open ‘cool.7’: Disk quota exceeded
dd: failed to open ‘cool.8’: Disk quota exceeded
dd: failed to open ‘cool.9’: Disk quota exceeded
dd: failed to open ‘cool.10’: Disk quota exceeded
Laura, the doc text is fine.

Regards,
Raghavendra
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240
I have been able to reproduce it with just a single client doing IO and without bringing down bricks. Repro steps follow:

gluster volume create v_disp disperse 6 redundancy 2 $tm1:/export/sdb/br1 $tm2:/export/sdb/b2 $tm3:/export/sdb/br3 $tm1:/export/sdb/b4 $tm2:/export/sdb/b5 $tm3:/export/sdb/b6 force
# (Used only 3 nodes; should not matter here)
gluster volume start v_disp
mount -t glusterfs $tm1:v_disp /gluster_vols/v_disp
mkdir /gluster_vols/v_disp/dir1
dd if=/dev/zero of=/gluster_vols/v_disp/dir1/x bs=10k count=90000 &
gluster v quota v_disp enable
gluster v quota v_disp limit-usage /dir1 200MB
gluster v quota v_disp soft-timeout 0
gluster v quota v_disp hard-timeout 0
# Removing 2 bricks is not needed, unlike in
# https://bugzilla.redhat.com/show_bug.cgi?id=1339167

Hence, BZ 1339167 is likely a duplicate of this.
I think this bug doesn't have a reliable solution right now. It might be mitigated, but I think it's impossible to solve it completely without some sort of transaction infrastructure that allows us to do a rollback of a write.

To take an easy example, suppose we have a dispersed volume 4+2.

Success cases:
* 4 or more bricks succeed; the others fail with EDQUOT. The result of the operation for upper xlators will be a success. However, self-heal won't be able to heal the damaged files because there's not enough space (not absolutely sure about that).
* 4 or more bricks fail with EDQUOT. The result of the operation will be a failure with error EDQUOT. The bricks that have succeeded will be repaired (put back to the old version) by self-heal.

Failure case:
* 3 bricks succeed and 3 fail with EDQUOT. This is an inconsistent state: there are not enough bricks to recover either the new or the old version, so the result of the operation is an I/O error. There's no way for disperse to recover the damaged file. With a rollback feature, the operation could be completed by rolling back the bricks that succeeded and returning EDQUOT. But currently this is not possible.
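The case analysis above can be sketched as a tiny decision function. This is a simplification I'm adding for illustration, assuming a 4+2 volume where every non-succeeding brick fails with EDQUOT and 4 fragments are needed to reconstruct a version:

```shell
# ok = number of bricks (out of 6) where the write succeeded; the rest
# are assumed to fail with EDQUOT. 4 fragments reconstruct a version.
ec_outcome() {
  ok=$1
  bad=$((6 - ok))
  if [ "$ok" -ge 4 ]; then
    echo "success (new version recoverable)"
  elif [ "$bad" -ge 4 ]; then
    echo "EDQUOT (old version recoverable, successful bricks healed back)"
  else
    echo "EIO (neither version recoverable)"
  fi
}
ec_outcome 3   # the inconsistent 3/3 split described above
```

The 3/3 split is the only input that lands in the EIO branch, which matches the observation that the error is intermittent: it depends on how many bricks cross the quota boundary on the same write.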
We are tracking these changes as part of https://bugzilla.redhat.com/show_bug.cgi?id=1339167; essentially the same discussion happened some months back between Nag and me. We failed to capture it as a bz comment, which caused this confusion. Sorry about that.

Sanoj, is it okay to capture this as 1339167 itself?

Pranith
Definitely, both are the same. However, the bug is easier to reproduce by bringing 2 bricks down. So we could track it here and close 1339167 as a duplicate.
Please discuss with Nag and come to a conclusion about which one to keep.
This BZ has been CLOSED with resolution ERRATA as part of a release; refer to comment 17: https://access.redhat.com/errata/RHBA-2016:1240. Please open a new BZ for a regression of the original issue or any related issue.