Bug 852311 - replace-brick on a volume with rdma transport failed
Summary: replace-brick on a volume with rdma transport failed
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: rdma
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Raghavendra G
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On: 786068
Blocks: 858478
 
Reported: 2012-08-28 07:25 UTC by Vidya Sakar
Modified: 2015-02-13 10:18 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 786068
Clones: 858478
Environment:
Last Closed: 2015-02-13 10:18:02 UTC
Embargoed:


Attachments

Description Vidya Sakar 2012-08-28 07:25:38 UTC
+++ This bug was initially created as a clone of Bug #786068 +++

Created attachment 558599 [details]
glusterd log file

Description of problem:
I was mounting and unmounting the FUSE client in a for loop. From another machine I issued a replace-brick. The replace-brick status command hung for a long time and then simply exited with a non-zero exit status. There was no data on the mountpoint.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0qa20

How reproducible:
1/1

Steps to Reproduce:
1. Create and start a stripe-replicate volume with the rdma transport type.
2. In a for loop, mount the volume, sleep for some time, and unmount it (see the scripted sketch after this list).
3. After some time, issue replace-brick start from another machine.
4. Issue replace-brick status from the same machine.
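
A scripted sketch of the above steps, assuming the volume name (hosdu), hosts, and brick paths seen in the command output and logs below; the mount point, loop count, and sleep interval are hypothetical, and the fuse mount option syntax may vary between glusterfs versions:

# On one client: repeatedly mount and unmount the rdma volume.
for i in $(seq 1 100); do
    mount -t glusterfs -o transport=rdma 10.1.10.24:/hosdu /mnt/hosdu
    sleep 5
    umount /mnt/hosdu
done

# Meanwhile, from another machine: start replace-brick, then query its status.
gluster volume replace-brick hosdu \
    10.1.10.24:/data/export-brick/hosdu_brick4 \
    10.1.10.21:/data/export-brick/hosdu_brick5 start
gluster volume replace-brick hosdu \
    10.1.10.24:/data/export-brick/hosdu_brick4 \
    10.1.10.21:/data/export-brick/hosdu_brick5 status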

Actual results:
replace-brick start succeeded, but replace-brick status failed.
[root@client4 /]# gluster v replace-brick hosdu 10.1.10.24:/data/export-brick/hosdu_brick4 10.1.10.21:/data/export-brick/hosdu_brick5 status
[root@client4 /]# echo $?
110
Subsequent replace-brick status commands hung forever. (Exit status 110 corresponds to ETIMEDOUT on Linux, which suggests the CLI timed out waiting for a response.)

Expected results:
replace-brick status should report the status of the replace-brick operation; it should not fail.

Additional info:

Log entries from the replace-brick temporary mount:

[2012-01-31 03:24:48.586673] I [glusterfsd.c:1578:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.3.0qa20
[2012-01-31 03:24:48.606171] W [write-behind.c:2892:init] 0-mnt-wb: dangling volume. check volfile
[2012-01-31 03:24:51.524821] I [client.c:1937:notify] 0-mnt-client: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
  1: volume mnt-client
  2:  type protocol/client
  3:  option remote-host 10.1.10.24
  4:  option remote-subvolume /data/export-brick/hosdu_brick4
  5:  option remote-port 24010
  6:  option transport-type rdma
  7: end-volume
  8: volume mnt-wb
  9:  type performance/write-behind
 10:  subvolumes mnt-client
 11: end-volume

+------------------------------------------------------------------------------+
[2012-01-31 03:24:52.301751] I [client-handshake.c:1085:select_server_supported_programs] 0-mnt-client: Using Program GlusterFS 3.3.0qa20, Num (1298437), Version (310)
[2012-01-31 03:24:52.305480] I [client-handshake.c:917:client_setvolume_cbk] 0-mnt-client: Connected to 10.1.10.24:24010, attached to remote volume '/data/export-brick/hosdu_brick4'.
[2012-01-31 03:24:52.311484] I [fuse-bridge.c:3718:fuse_graph_setup] 0-fuse: switched to graph 0
[2012-01-31 03:24:52.311728] I [fuse-bridge.c:3297:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13
[2012-01-31 03:24:53.739019] I [fuse-bridge.c:3617:fuse_thread_proc] 0-fuse: unmounting /etc/glusterd/vols/hosdu/rb_mount
[2012-01-31 03:24:53.752118] W [glusterfsd.c:783:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x31940e577d] (-->/lib64/libpthread.so.0() [0x31948077e1] (-->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0xfc) [0x40716f]))) 0-: received signum (15), shutting down
[2012-01-31 03:35:14.757741] I [glusterfsd.c:1578:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.3.0qa20
[2012-01-31 03:35:14.838450] W [write-behind.c:2892:init] 0-mnt-wb: dangling volume. check volfile


Log entries from the replace-brick destination brick:


[2012-01-31 03:25:34.746459] I [glusterfsd.c:1578:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.3.0qa20
[2012-01-31 03:25:34.836311] I [graph.c:250:gf_add_cmdline_options] 0-src-server: adding option 'listen-port' for volume 'src-server' with value '24011'
[2012-01-31 03:25:34.842592] W [options.c:661:xl_opt_validate] 0-src-server: option 'listen-port' is deprecated, preferred is 'transport.rdma.listen-port', continuing with correction
Given volfile:
+------------------------------------------------------------------------------+
  1: volume src-posix
  2:  type storage/posix
  3:  option directory /data/export-brick/hosdu_brick5
  4: end-volume
  5: volume /data/export-brick/hosdu_brick5
  6:  type features/locks
  7:  subvolumes src-posix
  8: end-volume
  9: volume src-server
 10:  type protocol/server
 11:  option auth.addr./data/export-brick/hosdu_brick5.allow *
 12:  option transport-type rdma
 13:  subvolumes /data/export-brick/hosdu_brick5
 14: end-volume

+------------------------------------------------------------------------------+
[2012-01-31 03:25:45.215499] I [server-handshake.c:540:server_setvolume] 0-src-server: accepted client from 10.1.10.24:980 (version: 3.3.0qa20)
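
For reference, the deprecation warning above names the preferred option; in the src-server volume it would be spelled as follows (a sketch based only on that warning message, not taken from an actual volfile):

volume src-server
  type protocol/server
  option transport.rdma.listen-port 24011
  option transport-type rdma
  subvolumes /data/export-brick/hosdu_brick5
end-volume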

I have attached the glusterd log from the machine where I issued the replace-brick command.

Comment 2 Amar Tumballi 2012-10-05 17:12:28 UTC
Not planning to work on fixing replace-brick for rhs-2.1; technically, replace-brick should be removed. For now, moving it to future.

Comment 5 Scott Haines 2013-09-23 23:18:48 UTC
Retargeting for 2.1.z U2 (Corbett) release.

