Bug 1339167

Summary: Getting EIO error for the first few files when limit exceeds in disperse volume when we do writes from multiple clients
Product: Red Hat Gluster Storage (Red Hat Storage)
Component: disperse
Version: rhgs-3.1
Status: CLOSED DUPLICATE
Severity: high
Priority: medium
Reporter: Nag Pavan Chilakam <nchilaka>
Assignee: Sanoj Unnikrishnan <sunnikri>
QA Contact: Nag Pavan Chilakam <nchilaka>
CC: aspandey, jahernan, rgowdapp, rhs-bugs
Keywords: ZStream
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2018-02-09 09:10:22 UTC

Description Nag Pavan Chilakam 2016-05-24 09:58:12 UTC
Description of problem:
========================
As part of validating the fix for bug 1224180 (Getting EIO instead of EDQUOT when the limit is exceeded in a disperse volume),
I tested the case from multiple clients and found that the issue still exists.
I see an I/O error on the first few files once the quota is exceeded.
Raising a new bug, because bug 1224180 addressed the case where, in a disperse volume, I/O failed with an I/O error instead of "Disk quota exceeded" even from a single client.

Discussed with dev, who mentioned that bug 1224180 was a quota issue, while this one is more of an EC issue related to write sizes.

Version-Release number of selected component (if applicable):
==================================
glusterfs-cli-3.7.9-6.el7rhgs.x86_64
glusterfs-libs-3.7.9-6.el7rhgs.x86_64
glusterfs-fuse-3.7.9-6.el7rhgs.x86_64
glusterfs-client-xlators-3.7.9-6.el7rhgs.x86_64
glusterfs-server-3.7.9-6.el7rhgs.x86_64
python-gluster-3.7.9-5.el7rhgs.noarch
glusterfs-3.7.9-6.el7rhgs.x86_64
glusterfs-api-3.7.9-6.el7rhgs.x86_64

How reproducible:
==========
always

Steps to Reproduce:
=======================
1. Create a disperse volume, 2 x (4+2).

2. Mount the volume on two clients.

3. Create directories dir/dir1 and dir/dir2 (one for each client) and start creating files in a loop, say 1 GB each.

4. Now enable quota.

   No errors or I/O issues should be seen.

5. Now set a quota limit of, say, 10 GB on dir.

   Once the quota limit is reached, the user must see "Disk quota exceeded" instead of the previous wrong error "Input/output error".
---> this step fails

6. Now extend the quota limit to 100 GB for dir1.

   The I/O must continue, as the quota limit is not hit.

7. Now reduce the quota back to, say, 15 GB.
---> this step fails
   Once the quota limit is reached, the user must see "Disk quota exceeded" instead of the previous wrong error "Input/output error".

(A consolidated command sketch of these steps follows.)
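
A rough command-line sketch of the above steps. This is only a sketch: the server names (server1-server3), brick paths, volume name (ecvol), and mount point (/mnt/ecvol) are placeholders rather than the values used in this report, and the directory passed to limit-usage reflects one reading of steps 5-7.

# Step 1: create and start a 2 x (4+2) dispersed volume (12 bricks, placeholder layout).
gluster volume create ecvol disperse-data 4 redundancy 2 \
    server{1..3}:/bricks/brick{1..4}/ecvol force
gluster volume start ecvol

# Step 2: mount the volume on each client.
mkdir -p /mnt/ecvol
mount -t glusterfs server1:/ecvol /mnt/ecvol

# Step 3: per-client directory plus a ~1 GB write loop (client 1 shown;
# client 2 writes into dir/dir2).
mkdir -p /mnt/ecvol/dir/dir1 /mnt/ecvol/dir/dir2
for i in {1..30}; do
    dd if=/dev/urandom of=/mnt/ecvol/dir/dir1/file.$i bs=1M count=1024
done

# Steps 4-7: while the loops run on both clients, enable quota and then
# set, raise, and lower the limit on /dir (directory path assumed).
gluster volume quota ecvol enable
gluster volume quota ecvol limit-usage /dir 10GB      # step 5
gluster volume quota ecvol limit-usage /dir 100GB     # step 6
gluster volume quota ecvol limit-usage /dir 15GB      # step 7

After each limit is hit, the writers on both clients should see only "Disk quota exceeded"; any "Input/output error" is the behaviour reported in this bug.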

client1:
[root@dhcp35-103 103]# for i in {1..30};do dd if=/dev/urandom of=../126/f103.$i bs=1024 count=100000;done
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 15.6082 s, 6.6 MB/s
dd: error writing ‘../126/f103.2’: Input/output error
dd: closing output file ‘../126/f103.2’: Input/output error
dd: failed to open ‘../126/f103.3’: Disk quota exceeded
dd: failed to open ‘../126/f103.4’: Disk quota exceeded
dd: failed to open ‘../126/f103.5’: Disk quota exceeded
dd: failed to open ‘../126/f103.6’: Disk quota exceeded
dd: failed to open ‘../126/f103.7’: Disk quota exceeded
dd: failed to open ‘../126/f103.8’: Disk quota exceeded
dd: failed to open ‘../126/f103.9’: Disk quota exceeded
dd: failed to open ‘../126/f103.10’: Disk quota exceeded
dd: failed to open ‘../126/f103.11’: Disk quota exceeded
dd: failed to open ‘../126/f103.12’: Disk quota exceeded


client2:
[root@dhcp35-126 126]# for i in {1..30};do dd if=/dev/urandom of=nff.$i bs=1024 count=1000000;done
dd: error writing ‘nff.1’: Input/output error
dd: closing output file ‘nff.1’: Input/output error
dd: failed to open ‘nff.2’: Disk quota exceeded
dd: failed to open ‘nff.3’: Disk quota exceeded
dd: error writing ‘nff.4’: Input/output error
dd: closing output file ‘nff.4’: Input/output error
dd: failed to open ‘nff.5’: Disk quota exceeded
dd: failed to open ‘nff.6’: Disk quota exceeded
dd: failed to open ‘nff.7’: Disk quota exceeded
dd: failed to open ‘nff.8’: Disk quota exceeded
dd: failed to open ‘nff.9’: Disk quota exceeded
dd: failed to open ‘nff.10’: Disk quota exceeded
dd: failed to open ‘nff.11’: Disk quota exceeded
dd: failed to open ‘nff.12’: Disk quota exceeded
dd: failed to open ‘nff.13’: Disk quota exceeded
dd: failed to open ‘nff.14’: Disk quota exceeded
dd: failed to open ‘nff.15’: Disk quota exceeded
dd: failed to open ‘nff.16’: Disk quota exceeded
dd: failed to open ‘nff.17’: Disk quota exceeded
dd: failed to open ‘nff.18’: Disk quota exceeded
dd: failed to open ‘nff.19’: Disk quota exceeded
dd: failed to open ‘nff.20’: Disk quota exceeded
dd: failed to open ‘nff.21’: Disk quota exceeded
dd: failed to open ‘nff.22’: Disk quota exceeded


Related bug 1339144 - Getting EIO error when limit exceeds in disperse volume when bricks are down

Comment 2 Sanoj Unnikrishnan 2016-09-16 08:38:11 UTC
The quota tracking mechanism on the bricks can show slight differences in accounting, which affects when each brick detects EDQUOT. If it so happens that, in a 4+2 EC volume, a wound operation gets back 3 success and 3 EDQUOT return values, the resulting state leads to an EIO and cannot be resolved with the current infrastructure (see the sketch below).
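
To see why a 3/3 split cannot be resolved, here is a toy illustration (bash, not GlusterFS code) of the answer-matching described above: with 4 data and 2 redundancy fragments, at least 4 bricks (the data-fragment count) must return the same answer before EC can report that answer to the application, so 3 successes and 3 EDQUOT replies leave no reportable answer and the operation fails with EIO. The per-brick results below are hypothetical.

# Count hypothetical per-brick replies for one write on a 4+2 disperse set
# and apply the "at least 4 matching answers" rule described above.
results=(OK OK OK EDQUOT EDQUOT EDQUOT)

ok=0; edquot=0
for r in "${results[@]}"; do
    case $r in
        OK)     ok=$((ok + 1)) ;;
        EDQUOT) edquot=$((edquot + 1)) ;;
    esac
done

if   (( ok >= 4 ));     then echo "write succeeds"
elif (( edquot >= 4 )); then echo "write fails with EDQUOT"
else                         echo "no 4-brick agreement -> EIO"
fi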

Bringing a brick down, as in bug 1339144, makes the scenario easily reproducible.

More details can be found at
 https://bugzilla.redhat.com/show_bug.cgi?id=1224180#c18
 https://bugzilla.redhat.com/show_bug.cgi?id=1224180#c19

Comment 5 Ashish Pandey 2018-02-09 09:10:22 UTC

*** This bug has been marked as a duplicate of bug 1339144 ***