Bug 1344277 - [disperse] mkdir after rebalance gives Input/Output Error
Summary: [disperse] mkdir after rebalance gives Input/Output Error
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: disperse
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Ashish Pandey
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1344278 1344594 1344595
Reported: 2016-06-09 10:19 UTC by Ashish Pandey
Modified: 2017-03-08 08:31 UTC (History)

Fixed In Version: 3.10.0
Doc Text:
Clone Of:
: 1344278 1344594 1344595
Environment:
Last Closed: 2017-03-08 08:31:27 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Ashish Pandey 2016-06-09 10:19:50 UTC
Description of problem:

On an EC (disperse) volume mounted over NFS, creating a directory fails with EIO.


Version-Release number of selected component (if applicable):
glusterfs 3.7.9 built on Jun  7 2016 12:00:32
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.


How reproducible:
100%

Steps to Reproduce:
1. Create an EC volume and mount it through NFS on 2-3 different mount points.
2. Start I/O: create lots of files in a for loop.
3. While the I/O is in progress, do add-brick and rebalance.
4. Stop the I/O and try mkdir on the same mount point. It gives EIO.
 
Actual results:

mkdir fails with EIO.

Expected results:

mkdir should succeed.

Additional info:

Comment 1 Ashish Pandey 2016-06-09 10:37:27 UTC
RCA -
When mkdir fails, dht expects error information so that it can act accordingly.
In this case the layout has changed after the rebalance, and mkdir with the old layout returns EIO. EC receives the error details in cbk->xdata but does not pass them back to dht, so dht cannot take corrective action.
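
To illustrate the RCA, here is a minimal, self-contained C sketch of the difference between dropping and forwarding xdata on the error path. It is a toy model, not GlusterFS code: struct xdata, struct reply, the three functions, and the "layout-stale" key are all hypothetical names made up for this example, and the real dict_t and callback signatures are different.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an xdata dictionary: one key/value pair. */
struct xdata {
    const char *key;
    const char *value;
};

/* Hypothetical reply that a lower layer hands to the layer above it. */
struct reply {
    int op_ret;                /* 0 on success, -1 on failure */
    int op_errno;              /* errno value when op_ret is -1 */
    const struct xdata *xdata; /* extra details, may be NULL */
};

/* Models the reported behaviour: on failure the error code is passed
 * up, but the xdata that came with it is dropped. */
struct reply ec_reply_drop_xdata(struct reply child)
{
    struct reply up = child;
    if (child.op_ret < 0)
        up.xdata = NULL;
    return up;
}

/* Models the intended behaviour: the reply, including xdata, is passed
 * up unchanged on both success and failure. */
struct reply ec_reply_forward_xdata(struct reply child)
{
    return child;
}

/* Models the upper (dht-like) layer: it can only take corrective
 * action when the failed reply carries a hint it understands.
 * "layout-stale" is an invented key for this sketch. */
int dht_can_recover(struct reply r)
{
    assert(r.op_ret == 0 || r.op_ret == -1);
    if (r.op_ret == 0 || r.xdata == NULL)
        return 0;
    return strcmp(r.xdata->key, "layout-stale") == 0;
}
```

With a failed child reply that carries the hint, dht_can_recover() returns 0 after ec_reply_drop_xdata() and 1 after ec_reply_forward_xdata(), which is the behaviour change the fix is after.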

Comment 2 Vijay Bellur 2016-06-09 12:42:17 UTC
REVIEW: http://review.gluster.org/14679 (cluster/ec: Pass xdata to dht in case of error) posted (#1) for review on master by Ashish Pandey (aspandey)

Comment 3 Vijay Bellur 2016-06-09 15:44:50 UTC
REVIEW: http://review.gluster.org/14679 (cluster/ec: Pass xdata to dht in case of error) posted (#2) for review on master by Ashish Pandey (aspandey)

Comment 4 Vijay Bellur 2016-06-10 09:14:42 UTC
COMMIT: http://review.gluster.org/14679 committed in master by Xavier Hernandez (xhernandez) 
------
commit a837357c5c7873bf19155e76bf6c251fa799a605
Author: Ashish Pandey <aspandey>
Date:   Thu Jun 9 16:19:37 2016 +0530

    cluster/ec: Pass xdata to dht in case of error
    
    Problem: In case of mkdir failure, dht expects
    error information so that it can act accordingly.
    Aftre adding bricks and re balance, layout gets
    changed. Fop "mkdir" with old layout returns EIO.
    EC gets this error in xdata but does not pass it
    back to dht. In this case dht will not be able to
    take corrective action.
    
    Solution: Return xdata back to dht
    
    Change-Id: I24def8038e6880607689b7b046dc6428f564c6ab
    BUG: 1344277
    Signed-off-by: Ashish Pandey <aspandey>
    Reviewed-on: http://review.gluster.org/14679
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Reviewed-by: Xavier Hernandez <xhernandez>
    Tested-by: Atin Mukherjee <amukherj>
    Smoke: Gluster Build System <jenkins.com>
    CentOS-regression: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>

Comment 5 Niels de Vos 2016-06-10 14:49:56 UTC
Please reply (in Gerrit and here) to the comment in http://review.gluster.org/14690:

  xlators/cluster/ec/src/ec-dir-write.c
  Line 594:
  
  Other operations(fsync,writev....) have similar problems, modify together?

If this indeed is an issue, send a follow-up patch to address it there too.

Comment 6 Ashish Pandey 2016-06-10 15:47:33 UTC
We will send a separate patch for the rest of the fops, to reduce the amount of testing needed to qualify this patch.

