Bug 823404

Summary: I/O fails on the mount point when remove-brick migrates data and is then committed
Product: [Community] GlusterFS
Reporter: shylesh <shmohan>
Component: core
Assignee: shishir gowda <sgowda>
Status: CLOSED WORKSFORME
Severity: high
Priority: medium
Version: pre-release
CC: gluster-bugs, nsathyan
Target Milestone: ---
Keywords: Triaged
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
: 854647 (view as bug list) Environment:
Last Closed: 2012-12-26 10:28:22 UTC
Type: Bug
Bug Blocks: 854647
Attachments:
  Description                        Flags
  mnt logs : volume name is "test"   none
  rebalance logs                     none

Description shylesh 2012-05-21 07:34:41 UTC
Description of problem:

Untarring a kernel source tree on the mount point while running remove-brick start on a brick, and then committing the removal, makes I/O fail.

Version-Release number of selected component (if applicable):
3.3.0qa42

How reproducible:


Steps to Reproduce:
1. Create a 2-brick distribute volume (say b1 & b2)
2. Start a kernel untar on the mount point
3. While I/O is in progress, remove one of the bricks with the start option so that the data gets migrated
4. After status reports completed, commit the remove-brick operation
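The steps above can be sketched as a command sequence. This is only an illustration, not the reporter's actual script: the host name, brick paths, mount point and tarball path are hypothetical placeholders, and the volume name "test" is taken from the attachment description. The script only builds and prints the commands; in a real run the untar from step 2 has to keep running in the background while step 3 is issued.

```python
import shlex


def repro_commands(volume="test", host="localhost",
                   bricks=("/bricks/b1", "/bricks/b2"),
                   mount_point="/mnt/test",
                   tarball="/tmp/linux-2.6.31.1.tar.bz2"):
    """Return the commands for the reproduction steps, in order.

    All default values except the volume name are assumptions made
    for illustration; adjust them to the local setup.
    """
    b1, b2 = (f"{host}:{path}" for path in bricks)
    removed = b2  # the brick whose data gets migrated away
    return [
        # Step 1: create, start and mount a 2-brick distribute volume
        ["gluster", "volume", "create", volume, b1, b2],
        ["gluster", "volume", "start", volume],
        ["mount", "-t", "glusterfs", f"{host}:/{volume}", mount_point],
        # Step 2: untar the kernel on the mount point (run in background)
        ["tar", "-xf", tarball, "-C", mount_point],
        # Step 3: start the remove-brick so that data is migrated
        ["gluster", "volume", "remove-brick", volume, removed, "start"],
        # Poll this until the migration is reported as completed
        ["gluster", "volume", "remove-brick", volume, removed, "status"],
        # Step 4: commit the remove-brick operation
        ["gluster", "volume", "remove-brick", volume, removed, "commit"],
    ]


if __name__ == "__main__":
    for cmd in repro_commands():
        print(shlex.join(cmd))
```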

Actual results:
Untarring fails

Expected results:


Additional info:

Comment 1 shylesh 2012-05-21 07:35:54 UTC
Created attachment 585754 [details]
mnt logs : volume name is "test"

Comment 2 shylesh 2012-05-21 08:59:08 UTC
Created attachment 585767 [details]
rebalance logs

Comment 3 shishir gowda 2012-05-22 05:00:00 UTC
After remove-brick commit, the connection to the only available peer is reset, and it looks like it takes some time to come up again.

Log msg:

[2012-05-22 10:24:04.989658] I [glusterfsd-mgmt.c:64:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2012-05-22 10:24:04.998531] W [socket.c:195:__socket_rwv] 0-new-client-0: readv failed (Connection reset by peer)
[2012-05-22 10:24:04.998617] W [socket.c:1512:__socket_proto_state_machine] 0-new-client-0: reading from socket failed. Error (Connection reset by peer), peer (127.0.0.1:24010)
[2012-05-22 10:24:04.998972] E [rpc-clnt.c:373:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x123) [0x7f0e0b4a3517] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7f0e0b4a2a8c] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f0e0b4a251a]))) 0-new-client-0: forced unwinding frame type(GlusterFS 3.1) op(FLUSH(15)) called at 2012-05-22 10:24:04.996732 (xid=0x249573x)
[2012-05-22 10:24:04.999016] W [client3_1-fops.c:881:client3_1_flush_cbk] 0-new-client-0: remote operation failed: Transport endpoint is not connected
[2012-05-22 10:24:04.999504] I [socket.c:2315:socket_submit_request] 0-new-client-0: not connected (priv->connected = 0)
[2012-05-22 10:24:04.999537] W [rpc-clnt.c:1498:rpc_clnt_submit] 0-new-client-0: failed to submit rpc-request (XID: 0x249575x Program: GlusterFS 3.1, ProgVers: 330, Proc: 41) to rpc-transport (new-client-0)
[2012-05-22 10:24:04.999621] E [rpc-clnt.c:373:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x123) [0x7f0e0b4a3517] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7f0e0b4a2a8c] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f0e0b4a251a]))) 0-new-client-0: forced unwinding frame type(GlusterFS 3.1) op(STAT(1)) called at 2012-05-22 10:24:04.996978 (xid=0x249574x)
[2012-05-22 10:24:04.999653] W [client3_1-fops.c:474:client3_1_stat_cbk] 0-new-client-0: remote operation failed: Transport endpoint is not connected
[2012-05-22 10:24:04.999713] I [client.c:2090:client_rpc_notify] 0-new-client-0: disconnected
[2012-05-22 10:24:04.999747] E [socket.c:1715:socket_connect_finish] 0-new-client-0: connection to 127.0.0.1:24010 failed (Connection refused)
[2012-05-22 10:24:05.000211] W [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-new-client-0: remote operation failed: Transport endpoint is not connected. Path: /linux-2.6.31.1/drivers/infiniband/hw/ipath/ipath_iba7220.c (00000000-0000-0000-0000-000000000000)
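To follow the sequence in a longer client log, the connection-related entries can be filtered out with a small script. The following is a rough sketch, not part of GlusterFS: the regular expression is inferred only from the entries quoted in this comment ("[timestamp] level [file:line:function] message"), the keyword list is an assumption, and each entry is expected to sit on a single line, so wrapped continuation lines are skipped.

```python
import re

# Layout inferred from the quoted entries:
# [timestamp] LEVEL [file:line:function] message
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[(?P<source>[^\]]+)\]\s+"
    r"(?P<message>.*)$"
)

# Phrases taken from the log above that mark the reconnect sequence
# (an assumed, non-exhaustive list).
KEYWORDS = (
    "Volume file changed",
    "Connection reset by peer",
    "disconnected",
    "Connection refused",
)


def parse_line(line):
    """Return the fields of one log entry as a dict, or None if no match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def connection_events(lines):
    """Return (timestamp, level, message) for connection-related entries."""
    events = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # not the start of an entry, e.g. a wrapped line
        if any(keyword in entry["message"] for keyword in KEYWORDS):
            events.append(
                (entry["timestamp"], entry["level"], entry["message"])
            )
    return events
```

Run over the entries above, this keeps the volfile change, the connection reset, the disconnect and the refused reconnect, and drops the per-operation "Transport endpoint is not connected" failures that follow from them.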

Comment 4 shishir gowda 2012-07-11 03:51:31 UTC
Can you please check if the issue still exists on the latest git repo?

Comment 5 shishir gowda 2012-12-26 10:28:22 UTC
This works fine with 3.4.0qa5. Please re-open if found otherwise.