Bug 808977 - I/O errors on the mount point after rebalancing a distributed-replicate with one child down
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: pre-release
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Assignee: shishir gowda
QA Contact: shylesh
URL:
Whiteboard:
Duplicates: 810103
Depends On:
Blocks: 817967
Reported: 2012-04-02 05:14 UTC by shylesh
Modified: 2015-12-01 16:45 UTC

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 18:00:24 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: 3.3.0qa42
Embargoed:


Attachments
rebalancing dist-rep (766.61 KB, application/x-gzip)
2012-04-02 05:14 UTC, shylesh

Description shylesh 2012-04-02 05:14:41 UTC
Created attachment 574405
rebalancing dist-rep

Description of problem:
Brought down one child of a replica pair while a rebalance was in progress, then brought it back up. After the rebalance finished, there were I/O errors on the mount point, along with a crash in the rebalance process.

Version-Release number of selected component (if applicable):
3.3.0qa32

How reproducible:


Steps to Reproduce:
1. Create a 2x2 distributed-replicate volume.
2. Start creating files on the mount point; in my case I created 5000 files of 50 MB each.
3. Add 2 more bricks, so that the volume becomes 3x2 distributed-replicate.
4. Initiate the rebalance.
5. While the rebalance is in progress, bring down one brick of any replica pair (in my case, the first child of a pair).
6. After some time, bring the brick back up with volume start force.
7. Let the rebalance finish (status complete), then try I/O on the mount point.
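
The steps above can be sketched as a gluster CLI command sequence. This is only a rough sketch, not the script used for this report: the host names (server1, server2), the brick paths (/bricks/bN) and the helpers brick, build_repro_commands and run_all are hypothetical. The volume name dist-rep and the mount point /mnt are taken from the log excerpt in comment 1. Step 2 (file creation) and step 5 (bringing a brick down) are left out because the report does not say how they were done.

```python
import subprocess

VOLUME = "dist-rep"  # volume name as it appears in the comment 1 logs
MOUNT = "/mnt"       # mount point as it appears in the comment 1 logs


def brick(host, n):
    """Hypothetical brick naming scheme: <host>:/bricks/b<n>."""
    return f"{host}:/bricks/b{n}"


def build_repro_commands(hosts=("server1", "server2")):
    """Return the CLI commands for the reproduction steps as argv lists."""
    h1, h2 = hosts
    # Consecutive bricks form a replica pair, so each pair spans both hosts.
    initial = [brick(h1, 1), brick(h2, 1), brick(h1, 2), brick(h2, 2)]
    added = [brick(h1, 3), brick(h2, 3)]
    return [
        # Step 1: 2x2 distributed-replicate volume (4 bricks, replica 2).
        ["gluster", "volume", "create", VOLUME, "replica", "2", *initial],
        ["gluster", "volume", "start", VOLUME],
        ["mount", "-t", "glusterfs", f"{h1}:/{VOLUME}", MOUNT],
        # Step 2 (create files on the mount point) is not shown.
        # Step 3: add one more replica pair, giving a 3x2 volume.
        ["gluster", "volume", "add-brick", VOLUME, *added],
        # Step 4: start the rebalance.
        ["gluster", "volume", "rebalance", VOLUME, "start"],
        # Step 5 (bring one brick of a pair down) is a manual step.
        # Step 6: bring the downed brick back up.
        ["gluster", "volume", "start", VOLUME, "force"],
        # Step 7: poll until the rebalance reports completion.
        ["gluster", "volume", "rebalance", VOLUME, "status"],
    ]


def run_all(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(build_repro_commands())
```

By default the script only prints the commands; passing dry_run=False to run_all would execute them and needs a test cluster with glusterd running on both hosts.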

Actual results:
I/O errors on the mount point 

Expected results:


Additional info:

Attached the logs.

Comment 1 shylesh 2012-04-02 05:21:16 UTC
After this, if I try to remount the volume, the mount fails with a message saying that the file type differs on subvolumes.



[2012-04-01 11:05:02.239511] I [afr-common.c:1866:afr_set_root_inode_on_first_lookup] 0-dist-rep-replicate-0: added root inode
[2012-04-01 11:05:02.239707] E [afr-common.c:1115:afr_lookup_update_lk_counts] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/protocol/client.so(client3_1_lookup_cbk+0x6f1) [0x7effee03afd3] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(afr_lookup_cbk+0xb5) [0x7effeddf284a] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(+0x6973d) [0x7effeddf273d]))) 0-: Assertion failed: xattr
[2012-04-01 11:05:02.239749] W [dict.c:458:dict_ref] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(afr_lookup_cbk+0xb5) [0x7effeddf284a] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(+0x6975d) [0x7effeddf275d] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(+0x69496) [0x7effeddf2496]))) 0-dict: dict is NULL
[2012-04-01 11:05:02.239764] W [afr-common.c:1400:afr_conflicting_iattrs] 0-dist-rep-replicate-0: /: filetype differs on subvolumes (0, 1)
[2012-04-01 11:05:02.244120] E [afr-common.c:1115:afr_lookup_update_lk_counts] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/protocol/client.so(client3_1_lookup_cbk+0x6f1) [0x7effee03afd3] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(afr_lookup_cbk+0xb5) [0x7effeddf284a] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(+0x6973d) [0x7effeddf273d]))) 0-: Assertion failed: xattr
[2012-04-01 11:05:02.244162] W [dict.c:458:dict_ref] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(afr_lookup_cbk+0xb5) [0x7effeddf284a] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(+0x6975d) [0x7effeddf275d] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(+0x69496) [0x7effeddf2496]))) 0-dict: dict is NULL
[2012-04-01 11:05:02.244175] W [afr-common.c:1400:afr_conflicting_iattrs] 0-dist-rep-replicate-0: /: filetype differs on subvolumes (0, 1)
[2012-04-01 11:05:02.244245] W [fuse-bridge.c:490:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Input/output error)
[2012-04-01 11:05:02.252143] I [fuse-bridge.c:3980:fuse_thread_proc] 0-fuse: unmounting /mnt
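
For triaging longer logs such as the attached ones, the lines above can be split into fields and counted by severity. This is a small sketch that assumes every line follows the layout seen in this excerpt (timestamp, one-letter level, then [file:line:function]); the names LOG_LINE, parse_log_line and count_by_level are made up for illustration and are not part of GlusterFS.

```python
import re
from collections import Counter

# Layout assumed from the excerpt in this comment:
# [timestamp] LEVEL [file:line:function] message
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[(?P<source>[^:\]]+):(?P<lineno>\d+):(?P<function>[^\]]+)\]\s+"
    r"(?P<message>.*)$"
)

# Three lines copied from the excerpt above.
SAMPLE = [
    "[2012-04-01 11:05:02.244175] W [afr-common.c:1400:afr_conflicting_iattrs] "
    "0-dist-rep-replicate-0: /: filetype differs on subvolumes (0, 1)",
    "[2012-04-01 11:05:02.244245] W [fuse-bridge.c:490:fuse_attr_cbk] "
    "0-glusterfs-fuse: 2: LOOKUP() / => -1 (Input/output error)",
    "[2012-04-01 11:05:02.252143] I [fuse-bridge.c:3980:fuse_thread_proc] "
    "0-fuse: unmounting /mnt",
]


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def count_by_level(lines):
    """Count the parsed lines per severity letter (E, W, I, ...)."""
    parsed = (parse_log_line(line) for line in lines)
    return Counter(record["level"] for record in parsed if record)


if __name__ == "__main__":
    for line in SAMPLE:
        record = parse_log_line(line)
        print(record["level"], record["function"], record["message"])
    print(dict(count_by_level(SAMPLE)))
```

Lines that do not match the assumed layout are skipped rather than raising, so the same functions can be run over a whole log file.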

Comment 2 Anand Avati 2012-05-04 04:29:28 UTC
CHANGE: http://review.gluster.com/3263 (glusterd/rebalance: Switch off afr self heal in rebalance process.) merged in master by Vijay Bellur (vijay)

Comment 3 shishir gowda 2012-05-08 04:36:50 UTC
*** Bug 810103 has been marked as a duplicate of this bug. ***

Comment 4 shylesh 2012-05-24 11:08:09 UTC
No I/O errors are seen on the mount point anymore.

