+++ This bug was initially created as a clone of Bug #786068 +++

Created attachment 558599 [details]
glusterd log file

Description of problem:
I was mounting and unmounting the FUSE client in a for loop. From another machine I issued a replace-brick. The replace-brick status command hung for a long time and then exited with a non-zero exit status. There was no data on the mountpoint.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0qa20

How reproducible:
1/1

Steps to Reproduce:
1. Create and start a stripe-rep volume with rdma transport type.
2. In a for loop, mount the volume, sleep for some time, and unmount it.
3. After some time, issue replace-brick start from another machine.
4. Issue replace-brick status from the same machine.

Actual results:
replace-brick started successfully but status failed.

[root@client4 /]# gluster v replace-brick hosdu 10.1.10.24:/data/export-brick/hosdu_brick4 10.1.10.21:/data/export-brick/hosdu_brick5 status
[root@client4 /]# echo $?
110

Subsequent replace-brick status calls hung forever.

Expected results:
replace-brick status should report the status of the replace-brick operation. It should not fail.

Additional info:

Log entries from the replace-brick temporary mount:

[2012-01-31 03:24:48.586673] I [glusterfsd.c:1578:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.3.0qa20
[2012-01-31 03:24:48.606171] W [write-behind.c:2892:init] 0-mnt-wb: dangling volume. check volfile
[2012-01-31 03:24:51.524821] I [client.c:1937:notify] 0-mnt-client: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
  1: volume mnt-client
  2:     type protocol/client
  3:     option remote-host 10.1.10.24
  4:     option remote-subvolume /data/export-brick/hosdu_brick4
  5:     option remote-port 24010
  6:     option transport-type rdma
  7: end-volume
  8: volume mnt-wb
  9:     type performance/write-behind
 10:     subvolumes mnt-client
 11: end-volume
+------------------------------------------------------------------------------+
[2012-01-31 03:24:52.301751] I [client-handshake.c:1085:select_server_supported_programs] 0-mnt-client: Using Program GlusterFS 3.3.0qa20, Num (1298437), Version (310)
[2012-01-31 03:24:52.305480] I [client-handshake.c:917:client_setvolume_cbk] 0-mnt-client: Connected to 10.1.10.24:24010, attached to remote volume '/data/export-brick/hosdu_brick4'.
[2012-01-31 03:24:52.311484] I [fuse-bridge.c:3718:fuse_graph_setup] 0-fuse: switched to graph 0
[2012-01-31 03:24:52.311728] I [fuse-bridge.c:3297:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13
[2012-01-31 03:24:53.739019] I [fuse-bridge.c:3617:fuse_thread_proc] 0-fuse: unmounting /etc/glusterd/vols/hosdu/rb_mount
[2012-01-31 03:24:53.752118] W [glusterfsd.c:783:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x31940e577d] (-->/lib64/libpthread.so.0() [0x31948077e1] (-->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0xfc) [0x40716f]))) 0-: received signum (15), shutting down
[2012-01-31 03:35:14.757741] I [glusterfsd.c:1578:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.3.0qa20
[2012-01-31 03:35:14.838450] W [write-behind.c:2892:init] 0-mnt-wb: dangling volume. check volfile

Log entries from the replace-brick destination brick:

[2012-01-31 03:25:34.746459] I [glusterfsd.c:1578:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.3.0qa20
[2012-01-31 03:25:34.836311] I [graph.c:250:gf_add_cmdline_options] 0-src-server: adding option 'listen-port' for volume 'src-server' with value '24011'
[2012-01-31 03:25:34.842592] W [options.c:661:xl_opt_validate] 0-src-server: option 'listen-port' is deprecated, preferred is 'transport.rdma.listen-port', continuing with correction
Given volfile:
+------------------------------------------------------------------------------+
  1: volume src-posix
  2:     type storage/posix
  3:     option directory /data/export-brick/hosdu_brick5
  4: end-volume
  5: volume /data/export-brick/hosdu_brick5
  6:     type features/locks
  7:     subvolumes src-posix
  8: end-volume
  9: volume src-server
 10:     type protocol/server
 11:     option auth.addr./data/export-brick/hosdu_brick5.allow *
 12:     option transport-type rdma
 13:     subvolumes /data/export-brick/hosdu_brick5
 14: end-volume
+------------------------------------------------------------------------------+
[2012-01-31 03:25:45.215499] I [server-handshake.c:540:server_setvolume] 0-src-server: accepted client from 10.1.10.24:980 (version: 3.3.0qa20)

I have attached the glusterd logs from the machine where I issued the replace-brick command.
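
Because the later status calls never return, it may help to wrap step 4 in a timeout when reproducing this. The following is only a minimal sketch, not part of the original report: the helper names (replace_brick_cmd, run_with_timeout, check_status) and the 300-second default are hypothetical, and it assumes the gluster CLI is on PATH. The observed exit status 110 happens to match ETIMEDOUT on x86 Linux, but the report does not confirm that this is where the value comes from.

```python
import subprocess

GLUSTER = "gluster"  # assumption: the gluster CLI is on PATH


def replace_brick_cmd(volume, src_brick, dst_brick, action):
    """Build the argv for `gluster volume replace-brick <vol> <src> <dst> <action>`."""
    return [GLUSTER, "volume", "replace-brick", volume, src_brick, dst_brick, action]


def run_with_timeout(argv, timeout_s):
    """Run argv; return its exit status, or None if it did not finish in time."""
    try:
        # subprocess.run kills the child process when the timeout expires.
        return subprocess.run(argv, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None


def check_status(volume, src_brick, dst_brick, timeout_s=300, runner=run_with_timeout):
    """Classify one `replace-brick ... status` call as ok, failed, or hung."""
    argv = replace_brick_cmd(volume, src_brick, dst_brick, "status")
    rc = runner(argv, timeout_s)
    if rc is None:
        return "hung"
    return "ok" if rc == 0 else "failed (exit %d)" % rc
```

Calling check_status("hosdu", "10.1.10.24:/data/export-brick/hosdu_brick4", "10.1.10.21:/data/export-brick/hosdu_brick5") on the machine that issued replace-brick start would then separate the two symptoms seen here: a non-zero exit from the first status call and a hang on the following ones.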
Not planning to work on fixing replace-brick for RHS 2.1. Technically, replace-brick should be removed. For now, moving this to a future release.
Retargeting for 2.1.z U2 (Corbett) release.