Bug 786068

Summary: replace-brick on a volume with rdma transport failed
Product: [Community] GlusterFS
Component: glusterd
Version: pre-release
Hardware: Unspecified
OS: Unspecified
Status: CLOSED EOL
Severity: medium
Priority: low
Reporter: M S Vishwanath Bhat <vbhat>
Assignee: krishnan parthasarathi <kparthas>
CC: bugs, gluster-bugs, mzywusko, nsathyan, rwheeler, vbellur
Keywords: Triaged
Doc Type: Bug Fix
Cloned As: 852311 (view as bug list)
Bug Blocks: 852311, 858478
Last Closed: 2015-10-22 15:40:20 UTC

Attachments:
glusterd log file (no flags)

Description M S Vishwanath Bhat 2012-01-31 11:03:47 UTC
Created attachment 558599 [details]
glusterd log file

Description of problem:
I was mounting and unmounting the FUSE client in a for loop. From another machine I issued a replace-brick. replace-brick status hung for a long time and then exited with a non-zero exit status. There was no data on the mountpoint.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0qa20

How reproducible:
1/1

Steps to Reproduce:
1. Create and start a striped-replicate volume with the rdma transport type.
2. In a for loop, mount the volume, sleep for some time, and unmount it.
3. After some time, issue replace-brick start from another machine.
4. Issue replace-brick status from the same machine (see the scripted sketch below).
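
For concreteness, here is a scripted sketch of the steps above, not a verbatim transcript. The volume name, the replace-brick brick paths, and the server IPs are taken from the report; the mountpoint /mnt/hosdu, the extra peers and bricks in the volume create command, and the sleep interval are assumptions.

# Step 1: create and start a striped-replicate volume over rdma (on a server).
# The brick layout below is an assumed example; stripe 2 replica 2 needs 4 bricks.
gluster volume create hosdu stripe 2 replica 2 transport rdma \
    10.1.10.21:/data/export-brick/hosdu_brick1 \
    10.1.10.22:/data/export-brick/hosdu_brick2 \
    10.1.10.23:/data/export-brick/hosdu_brick3 \
    10.1.10.24:/data/export-brick/hosdu_brick4
gluster volume start hosdu

# Step 2: mount/unmount churn on a client (mountpoint and interval assumed).
while true; do
    mount -t glusterfs 10.1.10.21:/hosdu /mnt/hosdu
    sleep 10
    umount /mnt/hosdu
done &

# Steps 3 and 4: from another machine.
gluster volume replace-brick hosdu \
    10.1.10.24:/data/export-brick/hosdu_brick4 \
    10.1.10.21:/data/export-brick/hosdu_brick5 start
gluster volume replace-brick hosdu \
    10.1.10.24:/data/export-brick/hosdu_brick4 \
    10.1.10.21:/data/export-brick/hosdu_brick5 status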

Actual results:
replace-brick start succeeded, but replace-brick status failed:
[root@client4 /]# gluster v replace-brick hosdu 10.1.10.24:/data/export-brick/hosdu_brick4 10.1.10.21:/data/export-brick/hosdu_brick5 status
[root@client4 /]# echo $?
110
Subsequent replace-brick status invocations hung forever.
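
Because later status calls hang indefinitely, one way to observe the failure without blocking the shell is to bound the call with coreutils timeout. This wrapper is hypothetical and was not part of the original reproduction:

# Hypothetical wrapper: bound the hanging status call with coreutils 'timeout'.
# Exit status 124 means the command was killed at the time limit; anything
# else is the command's own exit status (110 in the transcript above).
timeout 120 gluster volume replace-brick hosdu \
    10.1.10.24:/data/export-brick/hosdu_brick4 \
    10.1.10.21:/data/export-brick/hosdu_brick5 status
echo "replace-brick status exit code: $?"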

Expected results:
replace-brick status should report the status of the ongoing replace-brick operation; it should not fail.

Additional info:

Log entries from the replace-brick temporary mount:

[2012-01-31 03:24:48.586673] I [glusterfsd.c:1578:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.3.0qa20
[2012-01-31 03:24:48.606171] W [write-behind.c:2892:init] 0-mnt-wb: dangling volume. check volfile
[2012-01-31 03:24:51.524821] I [client.c:1937:notify] 0-mnt-client: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
  1: volume mnt-client
  2:  type protocol/client
  3:  option remote-host 10.1.10.24
  4:  option remote-subvolume /data/export-brick/hosdu_brick4
  5:  option remote-port 24010
  6:  option transport-type rdma
  7: end-volume
  8: volume mnt-wb
  9:  type performance/write-behind
 10:  subvolumes mnt-client
 11: end-volume

+------------------------------------------------------------------------------+
[2012-01-31 03:24:52.301751] I [client-handshake.c:1085:select_server_supported_programs] 0-mnt-client: Using Program GlusterFS 3.3.0qa20, Num (1298437), Version (310)
[2012-01-31 03:24:52.305480] I [client-handshake.c:917:client_setvolume_cbk] 0-mnt-client: Connected to 10.1.10.24:24010, attached to remote volume '/data/export-brick/hosdu_brick4'.
[2012-01-31 03:24:52.311484] I [fuse-bridge.c:3718:fuse_graph_setup] 0-fuse: switched to graph 0
[2012-01-31 03:24:52.311728] I [fuse-bridge.c:3297:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13
[2012-01-31 03:24:53.739019] I [fuse-bridge.c:3617:fuse_thread_proc] 0-fuse: unmounting /etc/glusterd/vols/hosdu/rb_mount
[2012-01-31 03:24:53.752118] W [glusterfsd.c:783:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x31940e577d] (-->/lib64/libpthread.so.0() [0x31948077e1] (-->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0xfc) [0x40716f]))) 0-: received signum (15), shutting down
[2012-01-31 03:35:14.757741] I [glusterfsd.c:1578:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.3.0qa20
[2012-01-31 03:35:14.838450] W [write-behind.c:2892:init] 0-mnt-wb: dangling volume. check volfile


Log entries from the replace-brick destination brick:


[2012-01-31 03:25:34.746459] I [glusterfsd.c:1578:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.3.0qa20
[2012-01-31 03:25:34.836311] I [graph.c:250:gf_add_cmdline_options] 0-src-server: adding option 'listen-port' for volume 'src-server' with value '24011'
[2012-01-31 03:25:34.842592] W [options.c:661:xl_opt_validate] 0-src-server: option 'listen-port' is deprecated, preferred is 'transport.rdma.listen-port', continuing with correction
Given volfile:
+------------------------------------------------------------------------------+
  1: volume src-posix
  2:  type storage/posix
  3:  option directory /data/export-brick/hosdu_brick5
  4: end-volume
  5: volume /data/export-brick/hosdu_brick5
  6:  type features/locks
  7:  subvolumes src-posix
  8: end-volume
  9: volume src-server
 10:  type protocol/server
 11:  option auth.addr./data/export-brick/hosdu_brick5.allow *
 12:  option transport-type rdma
 13:  subvolumes /data/export-brick/hosdu_brick5
 14: end-volume

+------------------------------------------------------------------------------+
[2012-01-31 03:25:45.215499] I [server-handshake.c:540:server_setvolume] 0-src-server: accepted client from 10.1.10.24:980 (version: 3.3.0qa20)

I have attached the glusterd log from the machine where I issued the replace-brick command.

Comment 1 Amar Tumballi 2013-02-26 10:34:47 UTC
Blocked on RDMA support landing on master before testing can start.

Comment 2 Kaleb KEITHLEY 2015-10-22 15:40:20 UTC
The "pre-release" version is ambiguous and is about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.