Bug 823404 - I/O fails on the mount point while remove brick migrates data and committed
Status: CLOSED WORKSFORME
Product: GlusterFS
Classification: Community
Component: core
Version: pre-release
Hardware: x86_64 Linux
Priority: medium, Severity: high
Assigned To: shishir gowda
Keywords: Triaged
Depends On:
Blocks: 854647
Reported: 2012-05-21 03:34 EDT by shylesh
Modified: 2013-12-08 20:32 EST
CC List: 2 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 854647
Environment:
Last Closed: 2012-12-26 05:28:22 EST
Type: Bug

Attachments
mnt logs : volume name is "test" (3.79 MB, application/x-gzip)
2012-05-21 03:35 EDT, shylesh
rebalance logs (36.71 KB, application/x-gzip)
2012-05-21 04:59 EDT, shylesh
Description shylesh 2012-05-21 03:34:41 EDT
Description of problem:

Untarring a kernel tarball on the mount point while running remove-brick start and then commit on a brick makes the I/O fail.

Version-Release number of selected component (if applicable):
3.3.0qa42

How reproducible:


Steps to Reproduce:
1. Create a 2-brick distribute volume (say b1 and b2).
2. Start a kernel untar on the mount point.
3. While I/O is in progress, remove one of the bricks with the start option so that its data gets migrated.
4. After status reports completed, commit the remove-brick operation.
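
The steps above can be sketched as a command list. This is only an illustration, not the reporter's actual script: the host name `server1`, the brick paths, the mount point and the tarball file name are all assumptions (only the volume name "test" and the kernel version come from this report). The sketch prints the commands rather than running them.

```python
import shlex


def repro_commands(volume="test", host="server1",
                   bricks=("/bricks/b1", "/bricks/b2"),
                   mount="/mnt/test", tarball="linux-2.6.31.1.tar.bz2"):
    """Return the reproduction commands grouped by phase.

    host, bricks, mount and tarball are hypothetical placeholders.
    """
    b1, b2 = (f"{host}:{path}" for path in bricks)
    return {
        # Step 1: two-brick distribute volume, started and mounted.
        "setup": [
            ["gluster", "volume", "create", volume, b1, b2],
            ["gluster", "volume", "start", volume],
            ["mount", "-t", "glusterfs", f"{host}:/{volume}", mount],
        ],
        # Step 2: the untar must keep running while the brick is removed.
        "io": [
            ["tar", "-xf", tarball, "-C", mount],
        ],
        # Steps 3 and 4: start migration, check status, then commit.
        "remove": [
            ["gluster", "volume", "remove-brick", volume, b2, "start"],
            ["gluster", "volume", "remove-brick", volume, b2, "status"],
            ["gluster", "volume", "remove-brick", volume, b2, "commit"],
        ],
    }


if __name__ == "__main__":
    for phase, commands in repro_commands().items():
        print(f"# {phase}")
        for command in commands:
            print(shlex.join(command))
```

In a real run the `io` phase would be started in the background before the `remove` phase, and the `status` command would be repeated until it reports completed before `commit` is issued.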

Actual results:
Untarring fails.

Expected results:


Additional info:
Comment 1 shylesh 2012-05-21 03:35:54 EDT
Created attachment 585754
mnt logs : volume name is "test"
Comment 2 shylesh 2012-05-21 04:59:08 EDT
Created attachment 585767
rebalance logs
Comment 3 shishir gowda 2012-05-22 01:00:00 EDT
After remove-brick commit, the connection to the only available peer is reset, and it looks like it takes some time to come up again.

Log msg:

[2012-05-22 10:24:04.989658] I [glusterfsd-mgmt.c:64:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2012-05-22 10:24:04.998531] W [socket.c:195:__socket_rwv] 0-new-client-0: readv failed (Connection reset by peer)
[2012-05-22 10:24:04.998617] W [socket.c:1512:__socket_proto_state_machine] 0-new-client-0: reading from socket failed. Error (Connection reset by peer), peer (127.0.0.1:24010)
[2012-05-22 10:24:04.998972] E [rpc-clnt.c:373:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x123) [0x7f0e0b4a3517] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7f0e0b4a2a8c] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f0e0b4a251a]))) 0-new-client-0: forced unwinding frame type(GlusterFS 3.1) op(FLUSH(15)) called at 2012-05-22 10:24:04.996732 (xid=0x249573x)
[2012-05-22 10:24:04.999016] W [client3_1-fops.c:881:client3_1_flush_cbk] 0-new-client-0: remote operation failed: Transport endpoint is not connected
[2012-05-22 10:24:04.999504] I [socket.c:2315:socket_submit_request] 0-new-client-0: not connected (priv->connected = 0)
[2012-05-22 10:24:04.999537] W [rpc-clnt.c:1498:rpc_clnt_submit] 0-new-client-0: failed to submit rpc-request (XID: 0x249575x Program: GlusterFS 3.1, ProgVers: 330, Proc: 41) to rpc-transport (new-client-0)
[2012-05-22 10:24:04.999621] E [rpc-clnt.c:373:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x123) [0x7f0e0b4a3517] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x155) [0x7f0e0b4a2a8c] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f0e0b4a251a]))) 0-new-client-0: forced unwinding frame type(GlusterFS 3.1) op(STAT(1)) called at 2012-05-22 10:24:04.996978 (xid=0x249574x)
[2012-05-22 10:24:04.999653] W [client3_1-fops.c:474:client3_1_stat_cbk] 0-new-client-0: remote operation failed: Transport endpoint is not connected
[2012-05-22 10:24:04.999713] I [client.c:2090:client_rpc_notify] 0-new-client-0: disconnected
[2012-05-22 10:24:04.999747] E [socket.c:1715:socket_connect_finish] 0-new-client-0: connection to 127.0.0.1:24010 failed (Connection refused)
[2012-05-22 10:24:05.000211] W [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-new-client-0: remote operation failed: Transport endpoint is not connected. Path: /linux-2.6.31.1/drivers/infiniband/hw/ipath/ipath_iba7220.c (00000000-0000-0000-0000-000000000000)
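
To see how quickly the client loses its connection after the volume file change, the log lines above can be parsed for timestamps. The following is a rough sketch, not a GlusterFS tool: the regular expression is inferred only from the lines quoted in this comment, and `parse_line` and `first_match` are hypothetical helper names.

```python
import re
from collections import Counter
from datetime import datetime

# Layout inferred from the excerpt: [timestamp] LEVEL [file:line:func] message
LOG_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<source>[^\]]+)\] "
    r"(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict with ts, level, source and msg, or None if no match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%Y-%m-%d %H:%M:%S.%f")
    return rec


def first_match(records, needle):
    """Return the first record whose message contains needle, else None."""
    return next((r for r in records if needle in r["msg"]), None)


# Four lines copied from the log excerpt in this comment.
SAMPLE = [
    "[2012-05-22 10:24:04.989658] I [glusterfsd-mgmt.c:64:mgmt_cbk_spec] 0-mgmt: Volume file changed",
    "[2012-05-22 10:24:04.998531] W [socket.c:195:__socket_rwv] 0-new-client-0: readv failed (Connection reset by peer)",
    "[2012-05-22 10:24:04.999713] I [client.c:2090:client_rpc_notify] 0-new-client-0: disconnected",
    "[2012-05-22 10:24:04.999747] E [socket.c:1715:socket_connect_finish] 0-new-client-0: connection to 127.0.0.1:24010 failed (Connection refused)",
]

records = [r for r in map(parse_line, SAMPLE) if r is not None]
levels = Counter(r["level"] for r in records)
changed = first_match(records, "Volume file changed")
lost = first_match(records, "disconnected")
gap = lost["ts"] - changed["ts"]

print(dict(levels))
print(f"volfile change to disconnect: {gap.total_seconds():.6f} s")
```

On this excerpt the gap between the volume file change and the disconnect is roughly 10 ms. Running the same parser over the full attached mount log would show how long the client stays disconnected before a reconnect succeeds.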
Comment 4 shishir gowda 2012-07-10 23:51:31 EDT
Can you please check if the issue still exists on the latest git repo?
Comment 5 shishir gowda 2012-12-26 05:28:22 EST
This works fine with 3.4.0qa5. Please re-open if found otherwise.
