Bug 808977 - I/O errors on the mount point after rebalancing a distributed-replicate with one child down
Product: GlusterFS
Classification: Community
Component: core
Hardware: x86_64 Linux
Priority: high, Severity: urgent
Assigned To: shishir gowda
Duplicates: 810103
Depends On:
Blocks: 817967
Reported: 2012-04-02 01:14 EDT by shylesh
Modified: 2015-12-01 11:45 EST
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Last Closed: 2013-07-24 14:00:24 EDT
Verified Versions: 3.3.0qa42

Attachments
rebalancing dist-rep (766.61 KB, application/x-gzip)
2012-04-02 01:14 EDT, shylesh

Description shylesh 2012-04-02 01:14:41 EDT
Created attachment 574405
rebalancing dist-rep

Description of problem:
One child (brick) of a replica pair was brought down while a rebalance was in progress and then brought back up. After the rebalance finished, I/O on the mount point failed with errors, and the rebalance process crashed.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a 2x2 distribute-replicate volume.
2. Start creating files on the mount point; in my case I created 5000 files of 50 MB each.
3. Add 2 more bricks so that the volume becomes 3x2 distribute-replicate.
4. Initiate the rebalance.
5. While the rebalance is in progress, bring down one brick of any replica pair (in my case, the first child of a pair).
6. After some time, bring the brick back up with volume start force.
7. Let the rebalance finish (status complete), then try I/O on the mount point.
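
The steps above can be sketched as a command sequence. This is only a dry-run illustration that prints the commands rather than running them. The volume name dist-rep and the /mnt mount point are taken from the logs in this report; the host names (server1, server2) and brick paths are hypothetical placeholders, and the file creation and brick kill are left as manual steps because the report does not say how they were done.

```python
# Dry-run sketch of the reproduction steps: prints commands, runs nothing.
VOLUME = "dist-rep"             # volume name as it appears in the logs
HOSTS = ["server1", "server2"]  # hypothetical host names


def brick(i):
    """Return a hypothetical brick path; consecutive bricks alternate hosts,
    so each replica pair spans both servers."""
    return f"{HOSTS[i % 2]}:/bricks/b{i + 1}"


def repro_commands():
    first_four = " ".join(brick(i) for i in range(4))    # 2x2 volume
    extra_two = " ".join(brick(i) for i in range(4, 6))  # grows it to 3x2
    return [
        f"gluster volume create {VOLUME} replica 2 {first_four}",
        f"gluster volume start {VOLUME}",
        f"mount -t glusterfs {HOSTS[0]}:/{VOLUME} /mnt",
        "# create the test files under /mnt (5000 files of 50 MB each in the report)",
        f"gluster volume add-brick {VOLUME} {extra_two}",
        f"gluster volume rebalance {VOLUME} start",
        "# bring down one brick of any replica pair while the rebalance is running",
        f"gluster volume start {VOLUME} force",
        f"gluster volume rebalance {VOLUME} status",
    ]


if __name__ == "__main__":
    for cmd in repro_commands():
        print(cmd)
```

Once the status command reports the rebalance as complete, I/O is retried on /mnt as in step 7.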

Actual results:
I/O errors on the mount point 

Expected results:

Additional info:

Attached the logs.
Comment 1 shylesh 2012-04-02 01:21:16 EDT
After this, if I try to remount the volume, the mount fails with a message that the file type differs on subvolumes.

[2012-04-01 11:05:02.239511] I [afr-common.c:1866:afr_set_root_inode_on_first_lookup] 0-dist-rep-replicate-0: added root inode
[2012-04-01 11:05:02.239707] E [afr-common.c:1115:afr_lookup_update_lk_counts] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/protocol/client.so(client3_1_lookup_cbk+0x6f1) [0x7effee03afd3] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(afr_lookup_cbk+0xb5) [0x7effeddf284a] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(+0x6973d) [0x7effeddf273d]))) 0-: Assertion failed: xattr
[2012-04-01 11:05:02.239749] W [dict.c:458:dict_ref] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(afr_lookup_cbk+0xb5) [0x7effeddf284a] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(+0x6975d) [0x7effeddf275d] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(+0x69496) [0x7effeddf2496]))) 0-dict: dict is NULL
[2012-04-01 11:05:02.239764] W [afr-common.c:1400:afr_conflicting_iattrs] 0-dist-rep-replicate-0: /: filetype differs on subvolumes (0, 1)
[2012-04-01 11:05:02.244120] E [afr-common.c:1115:afr_lookup_update_lk_counts] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/protocol/client.so(client3_1_lookup_cbk+0x6f1) [0x7effee03afd3] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(afr_lookup_cbk+0xb5) [0x7effeddf284a] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(+0x6973d) [0x7effeddf273d]))) 0-: Assertion failed: xattr
[2012-04-01 11:05:02.244162] W [dict.c:458:dict_ref] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(afr_lookup_cbk+0xb5) [0x7effeddf284a] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(+0x6975d) [0x7effeddf275d] (-->/usr/local/lib/glusterfs/3.3.0qa32/xlator/cluster/replicate.so(+0x69496) [0x7effeddf2496]))) 0-dict: dict is NULL
[2012-04-01 11:05:02.244175] W [afr-common.c:1400:afr_conflicting_iattrs] 0-dist-rep-replicate-0: /: filetype differs on subvolumes (0, 1)
[2012-04-01 11:05:02.244245] W [fuse-bridge.c:490:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Input/output error)
[2012-04-01 11:05:02.252143] I [fuse-bridge.c:3980:fuse_thread_proc] 0-fuse: unmounting /mnt
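
For triage, the warning and error lines in an excerpt like the one above can be separated from the informational ones. The following is a minimal sketch that assumes every line follows the layout seen here, `[timestamp] LEVEL [file:line:function] message`; the regular expression and helper names are illustrative and are not part of GlusterFS.

```python
import re

# Layout assumed from the excerpt: [timestamp] LEVEL [file:line:function] message
LOG_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] (?P<message>.*)$"
)


def parse(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None


def warnings_and_errors(lines):
    """Keep only the records logged at level W or E."""
    return [rec for rec in map(parse, lines) if rec and rec["level"] in ("W", "E")]


# Three lines copied from the excerpt in this comment.
SAMPLE = [
    "[2012-04-01 11:05:02.239511] I [afr-common.c:1866:afr_set_root_inode_on_first_lookup] 0-dist-rep-replicate-0: added root inode",
    "[2012-04-01 11:05:02.239764] W [afr-common.c:1400:afr_conflicting_iattrs] 0-dist-rep-replicate-0: /: filetype differs on subvolumes (0, 1)",
    "[2012-04-01 11:05:02.244245] W [fuse-bridge.c:490:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Input/output error)",
]

if __name__ == "__main__":
    for rec in warnings_and_errors(SAMPLE):
        print(rec["level"], rec["func"], rec["message"])
```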
Comment 2 Anand Avati 2012-05-04 00:29:28 EDT
CHANGE: http://review.gluster.com/3263 (glusterd/rebalance: Switch off afr self heal in rebalance process.) merged in master by Vijay Bellur (vijay@gluster.com)
Comment 3 shishir gowda 2012-05-08 00:36:50 EDT
*** Bug 810103 has been marked as a duplicate of this bug. ***
Comment 4 shylesh 2012-05-24 07:08:09 EDT
I/O errors are no longer seen on the mount point.