Bug 786068
Summary: | replace-brick on a volume with rdma transport failed
---|---
Product: | [Community] GlusterFS
Component: | glusterd
Version: | pre-release
Status: | CLOSED EOL
Severity: | medium
Priority: | low
Hardware: | Unspecified
OS: | Unspecified
Reporter: | M S Vishwanath Bhat <vbhat>
Assignee: | krishnan parthasarathi <kparthas>
CC: | bugs, gluster-bugs, mzywusko, nsathyan, rwheeler, vbellur
Keywords: | Triaged
Doc Type: | Bug Fix
: | 852311
Bug Blocks: | 852311, 858478
Last Closed: | 2015-10-22 15:40:20 UTC
Blocked on RDMA support on master to start testing.

pre-release version is ambiguous and about to be removed as a choice. If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.
Created attachment 558599
glusterd log file

Description of problem:
I was mounting and unmounting the FUSE client in a for loop. From another machine I issued a replace-brick. The replace-brick status command hung for a long time and then simply exited with a non-zero exit status. There was no data on the mountpoint.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0qa20

How reproducible:
1/1

Steps to Reproduce:
1. Create and start a stripe-rep volume with rdma transport type.
2. In a for loop, mount the volume, sleep for some time, and unmount it.
3. After some time, issue replace-brick start from another machine.
4. Issue replace-brick status from the same machine.

Actual results:
replace-brick started successfully, but status failed.

[root@client4 /]# gluster v replace-brick hosdu 10.1.10.24:/data/export-brick/hosdu_brick4 10.1.10.21:/data/export-brick/hosdu_brick5 status
[root@client4 /]# echo $?
110

Subsequent replace-brick status commands hung forever.

Expected results:
replace-brick status should report the status of the replace-brick operation. It should not fail.

Additional info:

Log entries from the replace-brick temporary mount:

[2012-01-31 03:24:48.586673] I [glusterfsd.c:1578:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.3.0qa20
[2012-01-31 03:24:48.606171] W [write-behind.c:2892:init] 0-mnt-wb: dangling volume. check volfile
[2012-01-31 03:24:51.524821] I [client.c:1937:notify] 0-mnt-client: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
1: volume mnt-client
2: type protocol/client
3: option remote-host 10.1.10.24
4: option remote-subvolume /data/export-brick/hosdu_brick4
5: option remote-port 24010
6: option transport-type rdma
7: end-volume
8: volume mnt-wb
9: type performance/write-behind
10: subvolumes mnt-client
11: end-volume
+------------------------------------------------------------------------------+
[2012-01-31 03:24:52.301751] I [client-handshake.c:1085:select_server_supported_programs] 0-mnt-client: Using Program GlusterFS 3.3.0qa20, Num (1298437), Version (310)
[2012-01-31 03:24:52.305480] I [client-handshake.c:917:client_setvolume_cbk] 0-mnt-client: Connected to 10.1.10.24:24010, attached to remote volume '/data/export-brick/hosdu_brick4'.
[2012-01-31 03:24:52.311484] I [fuse-bridge.c:3718:fuse_graph_setup] 0-fuse: switched to graph 0
[2012-01-31 03:24:52.311728] I [fuse-bridge.c:3297:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13
[2012-01-31 03:24:53.739019] I [fuse-bridge.c:3617:fuse_thread_proc] 0-fuse: unmounting /etc/glusterd/vols/hosdu/rb_mount
[2012-01-31 03:24:53.752118] W [glusterfsd.c:783:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x31940e577d] (-->/lib64/libpthread.so.0() [0x31948077e1] (-->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0xfc) [0x40716f]))) 0-: received signum (15), shutting down
[2012-01-31 03:35:14.757741] I [glusterfsd.c:1578:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.3.0qa20
[2012-01-31 03:35:14.838450] W [write-behind.c:2892:init] 0-mnt-wb: dangling volume. check volfile

Log entries in the replace-brick destination brick:

[2012-01-31 03:25:34.746459] I [glusterfsd.c:1578:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.3.0qa20
[2012-01-31 03:25:34.836311] I [graph.c:250:gf_add_cmdline_options] 0-src-server: adding option 'listen-port' for volume 'src-server' with value '24011'
[2012-01-31 03:25:34.842592] W [options.c:661:xl_opt_validate] 0-src-server: option 'listen-port' is deprecated, preferred is 'transport.rdma.listen-port', continuing with correction
Given volfile:
+------------------------------------------------------------------------------+
1: volume src-posix
2: type storage/posix
3: option directory /data/export-brick/hosdu_brick5
4: end-volume
5: volume /data/export-brick/hosdu_brick5
6: type features/locks
7: subvolumes src-posix
8: end-volume
9: volume src-server
10: type protocol/server
11: option auth.addr./data/export-brick/hosdu_brick5.allow *
12: option transport-type rdma
13: subvolumes /data/export-brick/hosdu_brick5
14: end-volume
+------------------------------------------------------------------------------+
[2012-01-31 03:25:45.215499] I [server-handshake.c:540:server_setvolume] 0-src-server: accepted client from 10.1.10.24:980 (version: 3.3.0qa20)

I have attached the glusterd logs from the machine where I issued the replace-brick command.
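
For anyone trying to reproduce this, steps 2 to 4 of the report could be scripted roughly as follows. This is only a sketch under stated assumptions, not the reporter's actual script: the volume name and brick paths are taken from the report, while the mount point `/mnt/hosdu`, the choice of 10.1.10.24 as the mount server, the `transport=rdma` mount option, and the helper names `run`, `mount_loop` and `rb` are assumptions made for illustration. It uses the long form `gluster volume` where the report uses `gluster v`. The helpers default to a dry run that only prints the commands, and `rb` wraps the gluster call in `timeout` so that a hung status call returns instead of blocking forever.

```shell
#!/bin/sh
# Reproduction sketch for the steps in this report.
# Dry run by default: commands are printed, not executed, unless DRY_RUN=0.

VOL=${VOL:-hosdu}                                        # volume name from the report
SRC=${SRC:-10.1.10.24:/data/export-brick/hosdu_brick4}   # source brick from the report
DST=${DST:-10.1.10.21:/data/export-brick/hosdu_brick5}   # destination brick from the report
SERVER=${SERVER:-10.1.10.24}                             # assumed server to mount from
MNT=${MNT:-/mnt/hosdu}                                   # hypothetical mount point
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 2: mount, sleep, unmount. $1 = iterations, $2 = seconds to sleep.
# The transport=rdma mount option is an assumption, not taken from the report.
mount_loop() {
    ml_count=$1
    ml_pause=$2
    ml_i=0
    while [ "$ml_i" -lt "$ml_count" ]; do
        run mount -t glusterfs -o transport=rdma "$SERVER:/$VOL" "$MNT"
        run sleep "$ml_pause"
        run umount "$MNT"
        ml_i=$((ml_i + 1))
    done
}

# Steps 3 and 4: replace-brick start or status, bounded by a time limit.
# $1 = start|status, $2 = limit in seconds (defaults to 300).
rb() {
    rb_op=$1
    rb_limit=${2:-300}
    run timeout "$rb_limit" gluster volume replace-brick "$VOL" "$SRC" "$DST" "$rb_op"
}
```

After sourcing the file, one would run something like `DRY_RUN=0 mount_loop 100 5` as root on the client and, from another machine, `DRY_RUN=0 rb start` followed by `DRY_RUN=0 rb status 60; echo $?`. With GNU `timeout`, an exit status of 124 means the time limit was hit; the 110 seen in this report came from the gluster command itself.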