Bug 1272404

Summary: Data Tiering:error "[2015-10-14 18:15:09.270483] E [MSGID: 122037] [ec-common.c:1502:ec_update_size_version_done] 0-tiervolume-disperse-1: Failed to update version and size [Input/output error]"
Product: [Community] GlusterFS
Reporter: Nag Pavan Chilakam <nchilaka>
Component: disperse
Assignee: bugs <bugs>
Status: CLOSED UPSTREAM
QA Contact:
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.7.5
CC: aspandey, bugs, rhinduja, sankarshan, sarumuga
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1272407 1274629
Environment:
Last Closed: 2016-05-17 12:36:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1272407, 1274629

Description Nag Pavan Chilakam 2015-10-16 10:37:01 UTC
Description of problem:
=========================
On the longevity/stress setup we are getting the error message 
[2015-10-14 18:15:09.270412] E [MSGID: 122034] [ec-common.c:439:ec_child_select] 0-tiervolume-disperse-1: Insufficient available childs for this request (have 1, need 6)
[2015-10-14 18:15:09.270483] E [MSGID: 122037] [ec-common.c:1502:ec_update_size_version_done] 0-tiervolume-disperse-1: Failed to update version and size [Input/output error]
[2015-10-14 18:15:10.873293] E [MSGID: 122034] [ec-common.c:439:ec_child_select] 0-tiervolume-disperse-0: Insufficient available childs for this request (have 1, need 6)
[2015-10-14 18:15:10.873470] E [MSGID: 122037] [ec-common.c:1502:ec_update_size_version_done] 0-tiervolume-disperse-0: Failed to update version and size [Input/output error]
[2015-10-14 18:15:10.875723] E [MSGID: 122034] [ec-common.c:439:ec_child_select] 0-tiervolume-disperse-0: Insufficient available childs for this request (have 1, need 6)
[2015-10-14 18:15:10.875742] E [MSGID: 122037] [ec-common.c:1502:ec_update_size_version_done] 0-tiervolume-disperse-0: Failed to update version and size [Input/output error]
[2015-10-14 18:15:10.876542] E [MSGID: 122034] [ec-common.c:439:ec_child_select] 0-tiervolume-disperse-0: Insufficient available childs for this request (have 1, need 6)
[2015-10-14 18:15:10.876567] E [MSGID: 122037] [ec-common.c:1502:ec_update_size_version_done] 0-tiervolume-disperse-0: Failed to update version and size [Input/output error]
[2015-10-14 18:15:10.882593] E [MSGID: 122034] [ec-common.c:439:ec_child_select] 0-tiervolume-disperse-1: Insufficient available childs for this request (have 1, need 6)
[2015-10-14 18:15:10.882645] E [MSGID: 122037] [ec-common.c:1502:ec_update_size_version_done] 0-tiervolume-disperse-1: Failed to update version and size [Input/output error]
[2015-10-14 18:15:10.885247] E [MSGID: 122034] [ec-common.c:439:ec_child_select] 0-tiervolume-disperse-1: Insufficient available childs for this request (have 1, need 6)
[2015-10-14 18:15:10.885293] E [MSGID: 122037] [ec-common.c:1502:ec_update_size_version_done] 0-tiervolume-disperse-1: Failed to update version and size [Input/output error]





Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.7.5-0.19.git0f5c3e8.el7.centos.x86_64



Steps Carried:
==============

1. Created 12 node cluster
2. Create tiered volume with Hot tier as (6 x 2) and Cold tier as (2 x (6 + 2) = 16)
3. FUSE-mount the volume on 3 clients: RHEL7.2, RHEL7.1 and RHEL6.7
4. Start creating data from each client:

Client 1:
=========
[root@dj ~]# crefi --multi -n 10 -b 10 -d 10 --max=1024k --min=5k --random -T 5 -t text -I 5 --fop=create /mnt/fuse/

Client 2:
=========
[root@mia ~]# cd /mnt/fuse/
[root@mia fuse]# for i in {1..10}; do cp -rf /etc etc.$i ; sleep 100 ; done

Client 3:
=========
[root@wingo fuse]# for i in {1..999}; do dd if=/dev/zero of=dd.$i bs=1M count=1 ; sleep 10 ; done

5. After a while, the data creation from client 1 and client 2 should be complete, while the data creation from client 3 will still be in progress.

6. At this point, data is being created only by client 3, at a rate of 1 file every 10 seconds.

Comment 2 Vijay Bellur 2015-10-28 10:32:38 UTC
REVIEW: http://review.gluster.org/12440 (cluster/ec: update version and size on good bricks) posted (#1) for review on release-3.7 by Ashish Pandey (aspandey)

Comment 3 Vijay Bellur 2015-11-02 05:51:05 UTC
COMMIT: http://review.gluster.org/12440 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu) 
------
commit 9a0e3a7ecc61e47a0780708f86efc0170b8a85db
Author: Ashish Pandey <aspandey>
Date:   Fri Oct 23 13:27:51 2015 +0530

    cluster/ec: update version and size on good bricks
    
    Problem: readdir/readdirp fops call [f]xattrop with
    fop->good, which contains only one brick for these operations.
    That causes xattrop to fail, as it requires at least the
    "minimum" number of bricks.
    
    Solution: Use lock->good_mask to call xattrop. lock->good_mask
    contains all the good locked bricks on which the previous write
    operation was successful.
    
    Change-Id: If1b500391aa6fca6bd863702e030957b694ab499
    BUG: 1272404
    Signed-off-by: Ashish Pandey <aspandey>
    Reviewed-on: http://review.gluster.org/12419
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Xavier Hernandez <xhernandez>
    Tested-by: Xavier Hernandez <xhernandez>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Reviewed-on: http://review.gluster.org/12440
    Tested-by: Gluster Build System <jenkins.com>
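
The problem and solution described in the commit message above can be illustrated with a minimal C sketch. This is not the actual ec translator code: count_bricks() and select_children() are hypothetical names, the mask values are made up, and the minimum of 6 is only inferred from the "need 6" in the log and the (6 + 2) cold tier layout. It only shows why a mask holding a single brick fails a "minimum bricks" check while a mask holding all good bricks passes it.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: count how many bricks are set in a bitmask. */
static int count_bricks(uintptr_t mask)
{
    int n = 0;

    while (mask != 0) {
        mask &= mask - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Hypothetical stand-in for the child-selection check: the request is
 * rejected with -EIO when fewer than "minimum" bricks are present in
 * the mask it was issued with.
 */
static int select_children(uintptr_t mask, int minimum)
{
    int have = count_bricks(mask);

    if (have < minimum) {
        fprintf(stderr,
                "Insufficient available childs for this request "
                "(have %d, need %d)\n", have, minimum);
        return -EIO;
    }
    return 0;
}
```

Under these assumptions, select_children(0x01, 6) models issuing xattrop with a one-brick mask such as fop->good after readdir and returns -EIO, while select_children(0xFF, 6) models issuing it with a mask of all 8 good bricks, as lock->good_mask would hold after a fully successful write, and returns 0.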

Comment 4 Mike McCune 2016-03-28 22:22:31 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation. Please see mmccune with any questions.