Bug 852311 - replace-brick on a volume with rdma transport failed
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: rdma
Priority: low
Severity: medium
Assigned To: Raghavendra G
Keywords: ZStream
Depends On: 786068
Blocks: 858478
Reported: 2012-08-28 03:25 EDT by Vidya Sakar
Modified: 2015-02-13 05:18 EST
Doc Type: Bug Fix
Clone Of: 786068
Clones: 858478
Last Closed: 2015-02-13 05:18:02 EST

Description Vidya Sakar 2012-08-28 03:25:38 EDT
+++ This bug was initially created as a clone of Bug #786068 +++

Created attachment 558599
glusterd log file

Description of problem:
I was mounting and unmounting the FUSE client in a for loop. From another machine, I issued a replace-brick. The replace-brick status command hung for a long time and then exited with a non-zero exit status. There was no data on the mountpoint.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create and start a stripe-rep volume with rdma transport type.
2. In a for loop, mount the volume, sleep for some time, and unmount it.
3. After some time, issue replace-brick start from another machine.
4. Issue replace-brick status from the same machine.
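
The steps above can be sketched as a small Python helper. This is only an illustration under assumptions: the host name, mount point, and brick arguments are hypothetical placeholders, and the `transport=rdma` mount option is assumed to be accepted by the installed mount.glusterfs helper. The `run` and `sleep` parameters exist only so the loop can be exercised without a real cluster.

```python
import subprocess
import time


def mount_cycle(server, volume, mountpoint, iterations, pause_seconds,
                run=subprocess.check_call, sleep=time.sleep):
    """Step 2: repeatedly mount the volume over rdma, wait, and unmount it."""
    for _ in range(iterations):
        # Assumption: the FUSE mount helper accepts transport=rdma.
        run(["mount", "-t", "glusterfs", "-o", "transport=rdma",
             f"{server}:/{volume}", mountpoint])
        sleep(pause_seconds)
        run(["umount", mountpoint])


def replace_brick_argv(volume, src_brick, dst_brick, action):
    """Steps 3 and 4: build the gluster CLI call for 'start' or 'status'."""
    return ["gluster", "volume", "replace-brick", volume,
            src_brick, dst_brick, action]
```

On the client, `mount_cycle("server1", "hosdu", "/mnt/hosdu", 100, 5)` would run the loop (all three names are placeholders); on the second machine, the list returned by `replace_brick_argv(...)` can be passed to `subprocess.call`.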

Actual results:
replace-brick started successfully but status failed.
[root@client4 /]# gluster v replace-brick hosdu status
[root@client4 /]# echo $?
Subsequent replace-brick status commands hung forever.

Expected results:
replace-brick status should report the status of the replace-brick operation. It should not fail.
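
To tell a hang apart from a plain non-zero exit when re-running the status command, a timeout wrapper along these lines may help. It is a minimal sketch and not part of gluster; the function name and the timeout value are arbitrary choices made here.

```python
import subprocess


def run_with_timeout(argv, timeout_seconds):
    """Run argv and return (exit_status, output).

    exit_status is None when the command did not finish within
    timeout_seconds, which is how a hang shows up here.
    """
    try:
        proc = subprocess.run(argv, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, text=True,
                              timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return None, ""
    return proc.returncode, proc.stdout
```

Passing the full replace-brick status command line as `argv` with a timeout of, say, 120 seconds would return either the exit status and any CLI output, or `None` if the command hung.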

Additional info:

Log entries from the replace-brick temporary mount:

[2012-01-31 03:24:48.586673] I [glusterfsd.c:1578:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.3.0qa20
[2012-01-31 03:24:48.606171] W [write-behind.c:2892:init] 0-mnt-wb: dangling volume. check volfile
[2012-01-31 03:24:51.524821] I [client.c:1937:notify] 0-mnt-client: parent translators are ready, attempting connect on transport
Given volfile:
  1: volume mnt-client
  2:  type protocol/client
  3:  option remote-host
  4:  option remote-subvolume /data/export-brick/hosdu_brick4
  5:  option remote-port 24010
  6:  option transport-type rdma
  7: end-volume
  8: volume mnt-wb
  9:  type performance/write-behind
 10:  subvolumes mnt-client
 11: end-volume

[2012-01-31 03:24:52.301751] I [client-handshake.c:1085:select_server_supported_programs] 0-mnt-client: Using Program GlusterFS 3.3.0qa20, Num (1298437), Version (310)
[2012-01-31 03:24:52.305480] I [client-handshake.c:917:client_setvolume_cbk] 0-mnt-client: Connected to, attached to remote volume '/data/export-brick/hosdu_brick4'.
[2012-01-31 03:24:52.311484] I [fuse-bridge.c:3718:fuse_graph_setup] 0-fuse: switched to graph 0
[2012-01-31 03:24:52.311728] I [fuse-bridge.c:3297:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13
[2012-01-31 03:24:53.739019] I [fuse-bridge.c:3617:fuse_thread_proc] 0-fuse: unmounting /etc/glusterd/vols/hosdu/rb_mount
[2012-01-31 03:24:53.752118] W [glusterfsd.c:783:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x31940e577d] (-->/lib64/libpthread.so.0() [0x31948077e1] (-->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0xfc) [0x40716f]))) 0-: received signum (15), shutting down
[2012-01-31 03:35:14.757741] I [glusterfsd.c:1578:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.3.0qa20
[2012-01-31 03:35:14.838450] W [write-behind.c:2892:init] 0-mnt-wb: dangling volume. check volfile
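
For triage, the log lines above can be split into fields with a small parser. This is a sketch that assumes every entry follows the layout seen in this excerpt (timestamp, one-letter level, `[file:line:function]`, then the message); the field names are chosen here and are not gluster terminology.

```python
import re

# The leading '[' is optional so that pasted lines which lost it still parse.
LOG_LINE = re.compile(
    r"^\[?(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<function>[^\]]+)\]\s+"
    r"(?P<message>.*)$"
)


def parse_log_line(text):
    """Return a dict of fields for one log entry, or None for other lines."""
    match = LOG_LINE.match(text.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["line"] = int(entry["line"])
    return entry


def warnings_and_errors(lines):
    """Keep only the entries logged at level W or E."""
    parsed = (parse_log_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] in ("W", "E")]
```

Lines that are not log entries, such as the volfile dump, come back as `None` and are skipped by `warnings_and_errors`.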

Log entries from the replace-brick destination brick:

[2012-01-31 03:25:34.746459] I [glusterfsd.c:1578:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.3.0qa20
[2012-01-31 03:25:34.836311] I [graph.c:250:gf_add_cmdline_options] 0-src-server: adding option 'listen-port' for volume 'src-server' with value '24011'
[2012-01-31 03:25:34.842592] W [options.c:661:xl_opt_validate] 0-src-server: option 'listen-port' is deprecated, preferred is 'transport.rdma.listen-port', continuing with correction
Given volfile:
  1: volume src-posix
  2:  type storage/posix
  3:  option directory /data/export-brick/hosdu_brick5
  4: end-volume
  5: volume /data/export-brick/hosdu_brick5
  6:  type features/locks
  7:  subvolumes src-posix
  8: end-volume
  9: volume src-server
 10:  type protocol/server
 11:  option auth.addr./data/export-brick/hosdu_brick5.allow *
 12:  option transport-type rdma
 13:  subvolumes /data/export-brick/hosdu_brick5
 14: end-volume

[2012-01-31 03:25:45.215499] I [server-handshake.c:540:server_setvolume] 0-src-server: accepted client from (version: 3.3.0qa20)
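
The "Given volfile" dumps in the logs above can be read back into a structure to compare the client side and the brick side, for example to confirm that both use `transport-type rdma`. The parser below is a rough sketch based only on the keywords that appear in these dumps (`volume`, `type`, `option`, `subvolumes`, `end-volume`); it is not the parser gluster itself uses.

```python
import re

# Strips the "  12: " style numbering that the log adds in front of each line.
NUMBER_PREFIX = re.compile(r"^\s*\d+:\s*")


def parse_volfile(lines):
    """Return a list of dicts, one per 'volume ... end-volume' block."""
    volumes = []
    current = None
    for raw in lines:
        line = NUMBER_PREFIX.sub("", raw).strip()
        if not line:
            continue
        keyword, _, rest = line.partition(" ")
        rest = rest.strip()
        if keyword == "volume":
            current = {"name": rest, "type": None,
                       "options": {}, "subvolumes": []}
        elif current is None:
            continue  # ignore anything outside a volume block
        elif keyword == "type":
            current["type"] = rest
        elif keyword == "option":
            key, _, value = rest.partition(" ")
            # An option printed without a value is stored as "".
            current["options"][key] = value.strip()
        elif keyword == "subvolumes":
            current["subvolumes"] = rest.split()
        elif keyword == "end-volume":
            volumes.append(current)
            current = None
    return volumes
```

Feeding it the numbered lines of either dump yields one entry per volume, so the `transport-type` and `remote-port` options of the client can be checked against the server's options.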

I have attached the glusterd logs from the machine where I issued the replace-brick command.
Comment 2 Amar Tumballi 2012-10-05 13:12:28 EDT
Not planning to work on fixing replace-brick for rhs-2.1. Technically, replace-brick should be removed. For now, moving it to future.
Comment 5 Scott Haines 2013-09-23 19:18:48 EDT
Retargeting for 2.1.z U2 (Corbett) release.
