Description of problem:
======================
As part of the validation of the fix for bug 1224180 - "Getting EIO instead of EDQUOT when limit exceeds in disperse volume", I tested the case where bricks are brought down. When I brought down two bricks in a dist-disperse volume, I could still see the EIO error for some files:

3072000000 bytes (3.1 GB) copied, 425.922 s, 7.2 MB/s
dd: error writing ‘file103.4’: Input/output error
dd: closing output file ‘file103.4’: Input/output error
dd: failed to open ‘file103.5’: Input/output error
dd: error writing ‘file103.6’: Disk quota exceeded
dd: closing output file ‘file103.6’: Input/output error
dd: failed to open ‘file103.7’: Input/output error
dd: failed to open ‘file103.8’: Disk quota exceeded
dd: failed to open ‘file103.9’: Disk quota exceeded
dd: failed to open ‘file103.10’: Disk quota exceeded
[root@dhcp35-103 103]# mount|grep disperse

As the steps mentioned in bug 1224180 were working well with the "Disk quota exceeded" error message, I moved that bug to VERIFIED as discussed with dev. Raising this bug to track the brick-down scenarios.

Version-Release number of selected component (if applicable):
==========================================================
glusterfs-cli-3.7.9-6.el7rhgs.x86_64
glusterfs-libs-3.7.9-6.el7rhgs.x86_64
glusterfs-fuse-3.7.9-6.el7rhgs.x86_64
glusterfs-client-xlators-3.7.9-6.el7rhgs.x86_64
glusterfs-server-3.7.9-6.el7rhgs.x86_64
python-gluster-3.7.9-5.el7rhgs.noarch
glusterfs-3.7.9-6.el7rhgs.x86_64
glusterfs-api-3.7.9-6.el7rhgs.x86_64

Steps to Reproduce:
TC#2: should get the EDQUOT error instead of EIO when the limit exceeds in a disperse volume while bricks are down --> FAIL

1. Create a dist-disperse volume, 2 x (4 + 2).
2. Mount the volume on, say, two clients.
3. Create a directory for each client (dir1 and dir2) and start creating files of say 1 GB each in a loop from each of the mounts.
4. Now enable quota. No errors or IO issues should be seen.
5. Set a quota limit of say 10 GB on dir1 and say 5 GB on dir2. Once the quota limits are reached, the user must see "Disk quota exceeded" instead of the previous wrong error "Input/output error".
6. Now bring down a couple of bricks. Once the quota limits are reached, the user must see "Disk quota exceeded" instead of the previous wrong error "Input/output error". ---> STEP FAILS: we sometimes see "Input/output error" and sometimes "Disk quota exceeded". Now extend the quota limit for dir1 to 100 GB; the IO must continue, as the quota is not hit.
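For reference, the steps above roughly map to the shell sketch below. The hostnames, mount points, and file counts are illustrative placeholders, not taken from the actual test setup (the real bricks are listed in the volinfo further down):

# Create and start a 2 x (4 + 2) dist-disperse volume (12 bricks across 6 nodes).
gluster volume create disperse disperse-data 4 redundancy 2 \
    node{1..6}:/rhs/brick1/disperse node{1..6}:/rhs/brick2/disperse force
gluster volume start disperse

# On each client: mount the volume, create its directory, and write 1 GB files in a loop.
mount -t glusterfs node1:/disperse /mnt/disperse
mkdir -p /mnt/disperse/dir1
for i in $(seq 1 100); do
    dd if=/dev/urandom of=/mnt/disperse/dir1/file$i bs=1M count=1024
done &

# Enable quota and set the per-directory limits.
gluster volume quota disperse enable
gluster volume quota disperse limit-usage /dir1 10GB
gluster volume quota disperse limit-usage /dir2 5GB

# After bringing down two bricks, extend and then reduce the limit on dir1.
gluster volume quota disperse limit-usage /dir1 100GB
gluster volume quota disperse limit-usage /dir1 15GB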
7. Now reduce the quota back to say 15 GB. Once the quota limit is reached, the user must see "Disk quota exceeded" instead of the previous wrong error "Input/output error". ---> STEP FAILS: we sometimes see "Input/output error" and sometimes "Disk quota exceeded".

Expected results:
==============
Should get the "Disk quota exceeded" (EDQUOT) error instead of the "Input/output error" (EIO).

sos reports will be attached.

volinfo:

Volume Name: disperse
Type: Distributed-Disperse
Volume ID: f8d9157e-0d75-4b38-b8a3-d87d11e99e24
Status: Started
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.35.191:/rhs/brick1/disperse
Brick2: 10.70.35.27:/rhs/brick1/disperse
Brick3: 10.70.35.98:/rhs/brick1/disperse
Brick4: 10.70.35.64:/rhs/brick1/disperse
Brick5: 10.70.35.44:/rhs/brick1/disperse
Brick6: 10.70.35.114:/rhs/brick1/disperse
Brick7: 10.70.35.191:/rhs/brick2/disperse
Brick8: 10.70.35.27:/rhs/brick2/disperse
Brick9: 10.70.35.98:/rhs/brick2/disperse
Brick10: 10.70.35.64:/rhs/brick2/disperse
Brick11: 10.70.35.44:/rhs/brick2/disperse
Brick12: 10.70.35.114:/rhs/brick2/disperse
Options Reconfigured:
performance.readdir-ahead: on

[root@dhcp35-191 ~]# gluster v quota disperse enable
volume quota : success
[root@dhcp35-191 ~]# gluster v quota
Usage: volume quota <VOLNAME> {enable|disable|list [<path> ...]| list-objects [<path> ...] | remove <path>| remove-objects <path> | default-soft-limit <percent>} |
       volume quota <VOLNAME> {limit-usage <path> <size> [<percent>]} |
       volume quota <VOLNAME> {limit-objects <path> <number> [<percent>]} |
       volume quota <VOLNAME> {alert-time|soft-timeout|hard-timeout} {<time>}
[root@dhcp35-191 ~]# gluster v quota disperse /root 2GB
Invalid quota option : /root
Usage: volume quota <VOLNAME> {enable|disable|list [<path> ...]| list-objects [<path> ...] | remove <path>| remove-objects <path> | default-soft-limit <percent>} |
       volume quota <VOLNAME> {limit-usage <path> <size> [<percent>]} |
       volume quota <VOLNAME> {limit-objects <path> <number> [<percent>]} |
       volume quota <VOLNAME> {alert-time|soft-timeout|hard-timeout} {<time>}
[root@dhcp35-191 ~]# gluster v quota disperse limit-usage /root 2G
Please enter an integer value in the range of (1 - 9223372036854775807)
Usage: volume quota <VOLNAME> {enable|disable|list [<path> ...]| list-objects [<path> ...] | remove <path>| remove-objects <path> | default-soft-limit <percent>} |
       volume quota <VOLNAME> {limit-usage <path> <size> [<percent>]} |
       volume quota <VOLNAME> {limit-objects <path> <number> [<percent>]} |
       volume quota <VOLNAME> {alert-time|soft-timeout|hard-timeout} {<time>}
[root@dhcp35-191 ~]# gluster v quota disperse limit-usage /root 2GB
volume quota : success
[root@dhcp35-191 ~]# gluster v quota disperse limit-usage /root 20GB
volume quota : success
[root@dhcp35-191 ~]# gluster v quota disperse limit-usage /root 2GB
volume quota : success
[root@dhcp35-191 ~]# gluster v quota disperse limit-usage / 20GB
volume quota : success
[root@dhcp35-191 ~]# gluster v info

Volume Name: consmerg
Type: Replicate
Volume ID: aa4e04d2-591d-4905-ad8b-7abcbc34ac37
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.35.98:/rhs/brick1/consmerg
Brick2: 10.70.35.64:/rhs/brick1/consmerg
Options Reconfigured:
cluster.entry-self-heal: off
cluster.data-self-heal: off
cluster.metadata-self-heal: off
cluster.self-heal-daemon: on
performance.readdir-ahead: on

Volume Name: disperse
Type: Distributed-Disperse
Volume ID: f8d9157e-0d75-4b38-b8a3-d87d11e99e24
Status: Started
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.35.191:/rhs/brick1/disperse
Brick2: 10.70.35.27:/rhs/brick1/disperse
Brick3: 10.70.35.98:/rhs/brick1/disperse
Brick4: 10.70.35.64:/rhs/brick1/disperse
Brick5: 10.70.35.44:/rhs/brick1/disperse
Brick6: 10.70.35.114:/rhs/brick1/disperse
Brick7: 10.70.35.191:/rhs/brick2/disperse
Brick8: 10.70.35.27:/rhs/brick2/disperse
Brick9: 10.70.35.98:/rhs/brick2/disperse
Brick10: 10.70.35.64:/rhs/brick2/disperse
Brick11: 10.70.35.44:/rhs/brick2/disperse
Brick12: 10.70.35.114:/rhs/brick2/disperse
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on

[root@dhcp35-191 ~]# gluster v quota
Usage: volume quota <VOLNAME> {enable|disable|list [<path> ...]| list-objects [<path> ...] | remove <path>| remove-objects <path> | default-soft-limit <percent>} |
       volume quota <VOLNAME> {limit-usage <path> <size> [<percent>]} |
       volume quota <VOLNAME> {limit-objects <path> <number> [<percent>]} |
       volume quota <VOLNAME> {alert-time|soft-timeout|hard-timeout} {<time>}
[root@dhcp35-191 ~]# gluster v quota disperse list
                  Path                   Hard-limit  Soft-limit       Used  Available  Soft-limit exceeded?  Hard-limit exceeded?
---------------------------------------------------------------------------------------------------------------------------------
/root                                         2.0GB  80%(1.6GB)      4.3GB     0Bytes                   Yes                   Yes
/                                            20.0GB  80%(16.0GB)    18.1GB      1.9GB                   Yes                    No
[root@dhcp35-191 ~]# gluster v quota disperse list
                  Path                   Hard-limit  Soft-limit       Used  Available  Soft-limit exceeded?  Hard-limit exceeded?
---------------------------------------------------------------------------------------------------------------------------------
/root                                         2.0GB  80%(1.6GB)      4.3GB     0Bytes                   Yes                   Yes
/                                            20.0GB  80%(16.0GB)    18.4GB      1.6GB                   Yes                    No
[root@dhcp35-191 ~]# gluster v quota disperse list
                  Path                   Hard-limit  Soft-limit       Used  Available  Soft-limit exceeded?  Hard-limit exceeded?
---------------------------------------------------------------------------------------------------------------------------------
/root                                         2.0GB  80%(1.6GB)      4.3GB     0Bytes                   Yes                   Yes
/                                            20.0GB  80%(16.0GB)    19.6GB    374.7MB                   Yes                    No
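A side note on the quota list output above: usage can overshoot the hard limit (here /root shows 4.3GB used against a 2.0GB limit) because quota enforcement is timeout-based, so writes already in flight still land between checks. When testing enforcement, the timeouts can be set to zero so the limits are checked on every operation, as was done in the recreate attempt in the next comment:

# Check quota limits on every operation instead of at the default timeout intervals.
gluster volume quota disperse soft-timeout 0
gluster volume quota disperse hard-timeout 0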
Tried recreating the issue with the following configuration and steps. Couldn't recreate the issue.

[root@varada ~]# glusterd -V
glusterfs 3.7.9 built on Jan 27 2017 14:58:18
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
[root@varada ~]#

1. Created an EC volume [ (4 + 2) = 6 ] and mounted it on 4 different mount points on the same node.

gluster volume create ec-1 disperse-data 4 redundancy 2 varada:/LAB/store/ec-{1..6} force

2. Created files as shown below on all 4 mount points:

for i in {1..50}; do dd if=/dev/urandom of=/LAB/fuse_mounts/<mount-point>/dir1/file_<mount-point>-$i bs=1024 count=100000& done

3. Enabled quota:

gluster volume quota ec-1 enable
gluster v quota ec-1 soft-timeout 0
gluster v quota ec-1 hard-timeout 0

4. Set the limit to 5MB:

gluster volume quota ec-1 limit-usage /dir1 5mb

5. While the writes were going on, killed 2 bricks.

6. When the hard limit was hit, no error related to input/output was observed.
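For step 5, a common way to bring bricks down is to kill the brick processes directly. A minimal sketch follows; the PIDs are placeholders to be read from the status output, and writer.log is a hypothetical capture of the dd writers' stderr, not a file from this test:

# Look up the brick process PIDs for the volume.
gluster volume status ec-1

# Kill two brick processes while the writes are still running
# (replace the placeholders with the PIDs reported above).
kill -9 <brick-pid-1> <brick-pid-2>

# Confirm the two bricks are now shown as offline.
gluster volume status ec-1

# Optionally tally the error messages the writers produced,
# assuming their stderr was redirected to writer.log:
grep -ohE 'Input/output error|Disk quota exceeded' writer.log | sort | uniq -c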
As per Comment 19 of BZ 1224180, we would need a transaction infrastructure to fix this issue.

Pranith,

I think it is good to wait until the infrastructure is implemented. Any suggestion?
(In reply to Sunil Kumar Acharya from comment #4)
> As per Comment 19 of BZ 1224180, we would need a transaction
> infrastructure to fix this issue.
>
> Pranith,
>
> I think it is good to wait until the infrastructure is implemented. Any
> suggestion?

Agreed.
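As background for why EIO rather than EDQUOT surfaces here: if a write fails with EDQUOT on only some of the bricks, the file's fragments can end up with mismatched disperse metadata, and without a transaction/rollback mechanism the EC translator cannot reconcile them, so it reports EIO. One way to observe such a mismatch from the backend is sketched below; this is an illustration, not a step from the report (the brick path is taken from the volinfo above, the file name from the dd output):

# Run on two different brick nodes and compare: mismatched trusted.ec.version /
# trusted.ec.size values across fragments indicate an inconsistent file.
getfattr -d -m 'trusted.ec' -e hex /rhs/brick1/disperse/dir1/file103.4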
*** Bug 1339167 has been marked as a duplicate of this bug. ***
Due to: https://bugzilla.redhat.com/show_bug.cgi?id=1224180#c19

We won't be fixing this issue until the required infrastructure is in place.