Bug 808977

Summary: I/O errors on the mount point after rebalancing a distributed-replicate with one child down
Product: [Community] GlusterFS Reporter: shylesh <shmohan>
Component: core Assignee: shishir gowda <sgowda>
Status: CLOSED CURRENTRELEASE QA Contact: shylesh <shmohan>
Severity: urgent Docs Contact:
Priority: high    
Version: pre-release CC: gluster-bugs, nsathyan, shwetha.h.panduranga
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.4.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-07-24 18:00:24 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: 3.3.0qa42 Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 817967    
Attachments:
Description Flags
rebalancing dist-rep none

Description shylesh 2012-04-02 05:14:41 UTC
Created attachment 574405
rebalancing dist-rep

Description of problem:
Brought down one of the children of a replica pair while a rebalance was in progress, then brought it back up. After the rebalance finished, there were I/O errors on the mount point, along with a crash in the rebalance process.

Version-Release number of selected component (if applicable):
3.3.0qa32

How reproducible:


Steps to Reproduce:
1. Create a 2x2 distribute-replicate volume.
2. Start creating files on the mount point; in my case I created 5000 files of 50MB each.
3. Add 2 more bricks, so that the volume is now 3x2 dist-rep.
4. Initiate the rebalance.
5. While the rebalance is running, bring down one brick of any replica pair (in my case, the first child of a pair).
6. After some time, bring the brick back up with volume start force.
7. Let the rebalance finish (status complete), then try I/O on the mount point.
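
The steps above can be sketched as a list of gluster CLI calls. This is only an illustration, not the exact commands used for this report: the server names, brick paths and mount point are hypothetical, and the volume name dist-rep is inferred from the log prefix in Comment 1. Steps 2 and 5 are left manual because the report does not say how the files were created or how the brick was brought down.

```python
import shlex


def repro_commands(volume, old_bricks, new_bricks, mount_point="/mnt"):
    """Return the gluster CLI calls for the steps above as argv lists.

    Nothing is executed here; the caller decides whether to print or run them.
    Brick specs are "host:/path" strings (hypothetical values in the demo).
    """
    if len(old_bricks) != 4 or len(new_bricks) != 2:
        raise ValueError("expected 4 initial bricks (2x2) and 2 new bricks")
    server = old_bricks[0].split(":", 1)[0]
    return [
        # Step 1: 2x2 distribute-replicate; consecutive bricks form a replica pair.
        ["gluster", "volume", "create", volume, "replica", "2", *old_bricks],
        ["gluster", "volume", "start", volume],
        ["mount", "-t", "glusterfs", f"{server}:/{volume}", mount_point],
        # Step 2 (manual): create the test files under mount_point.
        # Step 3: add one more replica pair, giving 3x2.
        ["gluster", "volume", "add-brick", volume, *new_bricks],
        # Step 4: start the rebalance.
        ["gluster", "volume", "rebalance", volume, "start"],
        # Step 5 (manual): bring down one brick of a pair while rebalance runs.
        # Step 6: restart the missing brick process.
        ["gluster", "volume", "start", volume, "force"],
        # Step 7: poll until the rebalance reports completion, then try I/O.
        ["gluster", "volume", "rebalance", volume, "status"],
    ]


if __name__ == "__main__":
    demo = repro_commands(
        "dist-rep",
        ["server1:/export/b1", "server2:/export/b1",
         "server1:/export/b2", "server2:/export/b2"],
        ["server1:/export/b3", "server2:/export/b3"],
    )
    for argv in demo:
        print(shlex.join(argv))
```

Running the script only prints the commands, so it is safe to use on a machine without GlusterFS installed.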

Actual results:
I/O errors on the mount point 

Expected results:


Additional info:

Attached the logs.

Comment 1 shylesh 2012-04-02 05:21:16 UTC
After this, if I try to remount the volume, the mount fails, saying the file type differs on subvolumes.



[2012-04-01 11:05:02.239511] I [afr-common.c:1866:afr_set_root_inode_on_first_lookup] 0-dist-rep-replicate-0: added root inode
[2012-04-01 11:05:02.239707] E [afr-common.c:1115:afr_lookup_update_lk_counts] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/protocol/client.so(client3_1_lookup_cbk+0x6f1) [0x7effee03afd3] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(afr_lookup_cbk+0xb5) [0x7effeddf284a] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(+0x6973d) [0x7effeddf273d]))) 0-: Assertion failed: xattr
[2012-04-01 11:05:02.239749] W [dict.c:458:dict_ref] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(afr_lookup_cbk+0xb5) [0x7effeddf284a] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(+0x6975d) [0x7effeddf275d] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(+0x69496) [0x7effeddf2496]))) 0-dict: dict is NULL
[2012-04-01 11:05:02.239764] W [afr-common.c:1400:afr_conflicting_iattrs] 0-dist-rep-replicate-0: /: filetype differs on subvolumes (0, 1)
[2012-04-01 11:05:02.244120] E [afr-common.c:1115:afr_lookup_update_lk_counts] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/protocol/client.so(client3_1_lookup_cbk+0x6f1) [0x7effee03afd3] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(afr_lookup_cbk+0xb5) [0x7effeddf284a] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(+0x6973d) [0x7effeddf273d]))) 0-: Assertion failed: xattr
[2012-04-01 11:05:02.244162] W [dict.c:458:dict_ref] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(afr_lookup_cbk+0xb5) [0x7effeddf284a] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(+0x6975d) [0x7effeddf275d] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(+0x69496) [0x7effeddf2496]))) 0-dict: dict is NULL
[2012-04-01 11:05:02.244175] W [afr-common.c:1400:afr_conflicting_iattrs] 0-dist-rep-replicate-0: /: filetype differs on subvolumes (0, 1)
[2012-04-01 11:05:02.244245] W [fuse-bridge.c:490:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Input/output error)
[2012-04-01 11:05:02.252143] I [fuse-bridge.c:3980:fuse_thread_proc] 0-fuse: unmounting /mnt
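
For triage, the log lines above can be split into fields with a small parser. This is a rough sketch that assumes every entry follows the layout visible in this excerpt (timestamp, one-letter level, file:line:function, then the message); the helper names parse_line and problems are made up for illustration and are not part of GlusterFS.

```python
import re

# Layout assumed from the excerpt above:
# [timestamp] LEVEL [file:line:function] message
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] (?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["line"] = int(entry["line"])
    return entry


def problems(lines):
    """Yield parsed entries whose level is E (error) or W (warning)."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry["level"] in ("E", "W"):
            yield entry


if __name__ == "__main__":
    sample = [
        "[2012-04-01 11:05:02.239764] W [afr-common.c:1400:afr_conflicting_iattrs] "
        "0-dist-rep-replicate-0: /: filetype differs on subvolumes (0, 1)",
        "[2012-04-01 11:05:02.252143] I [fuse-bridge.c:3980:fuse_thread_proc] "
        "0-fuse: unmounting /mnt",
    ]
    for entry in problems(sample):
        print(entry["level"], entry["func"], entry["msg"])
```

Lines that were wrapped or truncated during copy and paste will not match and are skipped, so the raw log file is a better input than a pasted excerpt.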

Comment 2 Anand Avati 2012-05-04 04:29:28 UTC
CHANGE: http://review.gluster.com/3263 (glusterd/rebalance: Switch off afr self heal in rebalance process.) merged in master by Vijay Bellur (vijay)

Comment 3 shishir gowda 2012-05-08 04:36:50 UTC
*** Bug 810103 has been marked as a duplicate of this bug. ***

Comment 4 shylesh 2012-05-24 11:08:09 UTC
I/O errors are no longer seen on the mount point.