The source brick (with pump) crashes on an assert condition in afr_lookup_save_gfid. Seen on master as of Nov 3, commit id 3200a2be434c462b43bf3ffe0343ddc8900c5d88.

Steps to reproduce:
1) Create a pure-replicate volume. (cluster.self-heal-daemon must be on)
2) Start a replace-brick operation on one of the replicas.

Info:
root@trantor:~# gluster volume info

Volume Name: vol
Type: Replicate
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: trantor:/gfs/brick1
Brick2: trantor:/gfs/brick2
Options Reconfigured:
diagnostics.brick-log-level: DEBUG

Last seen activity in self-heal-daemon,
<snip>
[2011-11-03 15:39:08.734125] I [afr-common.c:3479:afr_notify] 0-vol-replicate-0: subvol 1 came up, start crawl
[2011-11-03 15:39:08.734148] I [afr-self-heald.c:487:afr_proactive_self_heal] 0-vol-replicate-0: starting crawl for 1
[2011-11-03 15:39:09.043706] W [socket.c:1510:__socket_proto_state_machine] 0-vol-client-1: reading from socket failed. Error (Transport endpoint is not connected), peer (192.168.1.84:24011)
[2011-11-03 15:39:09.043989] E [rpc-clnt.c:380:saved_frames_unwind] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x13c) [0x7f35adfabe01] (-->/usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x110) [0x7f35adfab326] (-->/usr/local/lib/libgfrpc.so.0(saved_frames_destroy+0x1f) [0x7f35adfaadb1]))) 0-vol-client-1: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2011-11-03 15:39:08.849370
[2011-11-03 15:39:09.044016] W [client3_1-fops.c:2250:client3_1_lookup_cbk] 0-vol-client-1: remote operation failed: Transport endpoint is not connected. Path: /file12
[2011-11-03 15:39:09.044089] I [client.c:1885:client_rpc_notify] 0-vol-client-1: disconnected
</snip>

Last seen activity on server (brick2),
<snip>
[2011-11-03 15:39:08.845637] D [inodelk.c:297:__inode_unlock_lock] 0-vol-locks: Matching lock found for unlock
[2011-11-03 15:39:08.849437] D [afr-common.c:128:afr_lookup_xattr_req_prepare] 0-vol-pump: /file12: failed to get the gfid from dict
</snip>

(gdb) bt
#0 0x00007f0136053ba5 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007f01360576b0 in abort () at abort.c:92
#2 0x00007f013604ca71 in __assert_fail (assertion=0x7f0130c23bbd "new && !uuid_is_null (new)", file=<value optimized out>, line=145, function=0x7f0130c27a00 "afr_lookup_save_gfid") at assert.c:81
#3 0x00007f0130bf48fa in afr_lookup_save_gfid (dst=0x7f011c00de38 "", new=0x0, inode=0x7f012cea2238) at afr-common.c:145
#4 0x00007f0130bf9a44 in afr_lookup (frame=0x7f01355228d8, this=0xd1ad40, loc=0x7f011c005fa0, xattr_req=0x7f011c00bc90) at afr-common.c:2017
#5 0x00007f0130c189d3 in pump_lookup (frame=0x7f01355228d8, this=0xd1ad40, loc=0x7f011c005fa0, xattr_req=0x7f011c00bc90) at pump.c:1754
#6 0x00007f013098794d in marker_lookup (frame=0x7f0135525d40, this=0xd1c000, loc=0x7f011c005fa0, xattr_req=0x7f011c00bc90) at marker.c:2193
#7 0x00007f013076adf9 in io_stats_lookup (frame=0x7f013553b744, this=0xd1d630, loc=0x7f011c005fa0, xattr_req=0x7f011c00bc90) at io-stats.c:1822
#8 0x00007f0130548828 in server_lookup_resume (frame=0x7f013529b394, bound_xl=0xd1d630) at server3_1-fops.c:2665
#9 0x00007f01305340fe in server_resolve_done (frame=0x7f013529b394) at server-resolve.c:597
#10 0x00007f01305341ff in server_resolve_all (frame=0x7f013529b394) at server-resolve.c:632
#11 0x00007f0130534092 in server_resolve (frame=0x7f013529b394) at server-resolve.c:579
#12 0x00007f01305341d6 in server_resolve_all (frame=0x7f013529b394) at server-resolve.c:628
#13 0x00007f0130533caf in server_resolve_entry (frame=0x7f013529b394) at server-resolve.c:453
#14 0x00007f0130533fa7 in server_resolve (frame=0x7f013529b394) at server-resolve.c:561
#15 0x00007f0130534181 in server_resolve_all (frame=0x7f013529b394) at server-resolve.c:621
#16 0x00007f0130534297 in resolve_and_resume (frame=0x7f013529b394, fn=0x7f01305485d7 <server_lookup_resume>) at server-resolve.c:651
#17 0x00007f013054eae6 in server_lookup (req=0x7f0136f3904c) at server3_1-fops.c:5119
#18 0x00007f01369e6170 in rpcsvc_handle_rpc_call (svc=0xd21500, trans=0xf7f960, msg=0x7f011c00a200) at rpcsvc.c:507
#19 0x00007f01369e6513 in rpcsvc_notify (trans=0xf7f960, mydata=0xd21500, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f011c00a200) at rpcsvc.c:603
#20 0x00007f01369ebfa9 in rpc_transport_notify (this=0xf7f960, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f011c00a200) at rpc-transport.c:498
#21 0x00007f0133f3a3cc in socket_event_poll_in (this=0xf7f960) at socket.c:1675
#22 0x00007f0133f3a950 in socket_event_handler (fd=21, idx=6, data=0xf7f960, poll_in=1, poll_out=0, poll_err=0) at socket.c:1790
#23 0x00007f0136c44d92 in event_dispatch_epoll_handler (event_pool=0xd0a150, events=0xd0ede0, i=0) at event.c:794
#24 0x00007f0136c44fb5 in event_dispatch_epoll (event_pool=0xd0a150) at event.c:856
#25 0x00007f0136c45340 in event_dispatch (event_pool=0xd0a150) at event.c:956
#26 0x0000000000407d2c in main (argc=17, argv=0x7ffffbf9a578) at glusterfsd.c:1592
This crash no longer occurs after the fix for 3783.