1272404 – Data Tiering:error "[2015-10-14 18:15:09.270483] E [MSGID: 122037] [ec-common.c:1502:ec_update_size_version_done] 0-tiervolume-disperse-1: Failed to update version and size [Input/output error]"

Status:	CLOSED UPSTREAM
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	disperse
Version:	3.7.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	bugs@gluster.org
Blocks:	1272407 1274629

Reported:	2015-10-16 10:37 UTC by Nag Pavan Chilakam
Modified:	2018-11-30 05:43 UTC
CC List:	5 users
Clones:	1272407 1274629
Last Closed:	2016-05-17 12:36:16 UTC
Regression:	---
Mount Type:	---
Documentation:	---

Description Nag Pavan Chilakam 2015-10-16 10:37:01 UTC

Description of problem:
=========================
On the longevity/stress setup, we are getting the following error messages:
[2015-10-14 18:15:09.270412] E [MSGID: 122034] [ec-common.c:439:ec_child_select] 0-tiervolume-disperse-1: Insufficient available childs for this request (have 1, need 6)
[2015-10-14 18:15:09.270483] E [MSGID: 122037] [ec-common.c:1502:ec_update_size_version_done] 0-tiervolume-disperse-1: Failed to update version and size [Input/output error]
[2015-10-14 18:15:10.873293] E [MSGID: 122034] [ec-common.c:439:ec_child_select] 0-tiervolume-disperse-0: Insufficient available childs for this request (have 1, need 6)
[2015-10-14 18:15:10.873470] E [MSGID: 122037] [ec-common.c:1502:ec_update_size_version_done] 0-tiervolume-disperse-0: Failed to update version and size [Input/output error]
[2015-10-14 18:15:10.875723] E [MSGID: 122034] [ec-common.c:439:ec_child_select] 0-tiervolume-disperse-0: Insufficient available childs for this request (have 1, need 6)
[2015-10-14 18:15:10.875742] E [MSGID: 122037] [ec-common.c:1502:ec_update_size_version_done] 0-tiervolume-disperse-0: Failed to update version and size [Input/output error]
[2015-10-14 18:15:10.876542] E [MSGID: 122034] [ec-common.c:439:ec_child_select] 0-tiervolume-disperse-0: Insufficient available childs for this request (have 1, need 6)
[2015-10-14 18:15:10.876567] E [MSGID: 122037] [ec-common.c:1502:ec_update_size_version_done] 0-tiervolume-disperse-0: Failed to update version and size [Input/output error]
[2015-10-14 18:15:10.882593] E [MSGID: 122034] [ec-common.c:439:ec_child_select] 0-tiervolume-disperse-1: Insufficient available childs for this request (have 1, need 6)
[2015-10-14 18:15:10.882645] E [MSGID: 122037] [ec-common.c:1502:ec_update_size_version_done] 0-tiervolume-disperse-1: Failed to update version and size [Input/output error]
[2015-10-14 18:15:10.885247] E [MSGID: 122034] [ec-common.c:439:ec_child_select] 0-tiervolume-disperse-1: Insufficient available childs for this request (have 1, need 6)
[2015-10-14 18:15:10.885293] E [MSGID: 122037] [ec-common.c:1502:ec_update_size_version_done] 0-tiervolume-disperse-1: Failed to update version and size [Input/output error]

Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.7.5-0.19.git0f5c3e8.el7.centos.x86_64

Steps Carried Out:
==================

1. Created a 12-node cluster
2. Created a tiered volume with the hot tier as (6 x 2) and the cold tier as (2 x (6 + 2) = 16 bricks)
3. FUSE-mounted the volume on 3 clients (RHEL 7.2, RHEL 7.1 and RHEL 6.7)
4. Started creating data from each client:

Client 1:
=========
[root@dj ~]# crefi --multi -n 10 -b 10 -d 10 --max=1024k --min=5k --random -T 5 -t text -I 5 --fop=create /mnt/fuse/

Client 2:
=========
[root@mia ~]# cd /mnt/fuse/
[root@mia fuse]# for i in {1..10}; do cp -rf /etc etc.$i ; sleep 100 ; done

Client 3:
=========
[root@wingo fuse]# for i in {1..999}; do dd if=/dev/zero of=dd.$i bs=1M count=1 ; sleep 10 ; done

5. After a while, data creation from clients 1 and 2 should complete, while data creation from client 3 is still in progress.

6. At this point, the only remaining I/O is client 3 creating one file every 10 seconds.
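Working through the brick counts implied by step 2, assuming the usual GlusterFS notation in which (6 x 2) is 6 replica pairs and (6 + 2) is 6 data bricks plus 2 redundancy bricks:

```latex
\begin{aligned}
\text{hot tier:}\quad & 6 \times 2 = 12 \text{ bricks} \\
\text{cold tier:}\quad & 2 \times (6 + 2) = 16 \text{ bricks} \\
\text{total:}\quad & 12 + 16 = 28 \text{ bricks on 12 nodes} \\
\text{per cold subvolume:}\quad & \text{minimum} = 6, \quad \text{tolerated failures} = 8 - 6 = 2
\end{aligned}
```

So a request that reaches only 1 brick of a cold-tier subvolume (tiervolume-disperse-0 or tiervolume-disperse-1) is 5 bricks short of the minimum.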

Comment 2 Vijay Bellur 2015-10-28 10:32:38 UTC

REVIEW: http://review.gluster.org/12440 (cluster/ec: update version and size on good bricks) posted (#1) for review on release-3.7 by Ashish Pandey (aspandey)

Comment 3 Vijay Bellur 2015-11-02 05:51:05 UTC

COMMIT: http://review.gluster.org/12440 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu) 
------
commit 9a0e3a7ecc61e47a0780708f86efc0170b8a85db
Author: Ashish Pandey <aspandey>
Date:   Fri Oct 23 13:27:51 2015 +0530

    cluster/ec: update version and size on good bricks
    
    Problem: readdir/readdirp fops call [f]xattrop with
    fop->good, which contains only one brick for these operations.
    This causes xattrop to fail, as it requires at least the
    "minimum" number of bricks.
    
    Solution: Use lock->good_mask to call xattrop. lock->good_mask
    contains all the good locked bricks on which the previous write
    operation was successful.
    
    Change-Id: If1b500391aa6fca6bd863702e030957b694ab499
    BUG: 1272404
    Signed-off-by: Ashish Pandey <aspandey>
    Reviewed-on: http://review.gluster.org/12419
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Xavier Hernandez <xhernandez>
    Tested-by: Xavier Hernandez <xhernandez>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Reviewed-on: http://review.gluster.org/12440
    Tested-by: Gluster Build System <jenkins.com>

Comment 4 Mike McCune 2016-03-28 22:22:31 UTC

This bug was accidentally moved from POST to MODIFIED by an error in automation; please contact mmccune with any questions.
