Bug 786068 - replace-brick on a volume with rdma transport failed
Status: CLOSED EOL
Product: GlusterFS
Classification: Community
Component: glusterd
Version: pre-release
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Assigned To: krishnan parthasarathi
Keywords: Triaged
Depends On:
Blocks: 852311 858478
 
Reported: 2012-01-31 06:03 EST by M S Vishwanath Bhat
Modified: 2015-11-03 18:06 EST
CC List: 5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 852311
Environment:
Last Closed: 2015-10-22 11:40:20 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments
glusterd log file (115.93 KB, text/x-log)
2012-01-31 06:03 EST, M S Vishwanath Bhat

Description M S Vishwanath Bhat 2012-01-31 06:03:47 EST
Created attachment 558599
glusterd log file

Description of problem:
I was mounting and unmounting the FUSE client in a for loop. From another machine, I issued a replace-brick. The replace-brick status command hung for a long time and then simply exited with a non-zero exit status. There was no data on the mountpoint.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0qa20

How reproducible:
1/1

Steps to Reproduce:
1. Create and start a stripe-rep volume with the rdma transport type.
2. In a for loop, mount the volume, sleep for some time, and unmount it.
3. After some time, issue replace-brick start from another machine.
4. Issue replace-brick status from the same machine.
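
Steps 2 to 4 above could be scripted roughly as follows. This is only a sketch, not the script used for the original run: the volume name, server address, and brick paths are taken from this report, while the mount point /mnt/hosdu, the iteration count, and the sleep interval are assumptions, and any rdma-specific mount options are left out. The script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Reproduction sketch for steps 2-4. Defaults to a dry run that only
# prints the commands; set DRY_RUN=0 to execute them for real.
DRY_RUN="${DRY_RUN:-1}"

VOLUME="hosdu"                                          # from the report
SERVER="10.1.10.24"                                     # from the report
SRC_BRICK="10.1.10.24:/data/export-brick/hosdu_brick4"  # from the report
DST_BRICK="10.1.10.21:/data/export-brick/hosdu_brick5"  # from the report
MOUNTPOINT="${MOUNTPOINT:-/mnt/hosdu}"                  # assumption
ITERATIONS="${ITERATIONS:-3}"                           # assumption
PAUSE="${PAUSE:-5}"                                     # assumption (seconds)

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Step 2: mount, sleep, unmount, repeated ITERATIONS times (run on the client).
mount_loop() {
    i=0
    while [ "$i" -lt "$ITERATIONS" ]; do
        run mount -t glusterfs "$SERVER:/$VOLUME" "$MOUNTPOINT"
        run sleep "$PAUSE"
        run umount "$MOUNTPOINT"
        i=$((i + 1))
    done
}

# Steps 3 and 4: replace-brick with the given sub-command (run on the other machine).
replace_brick() {
    run gluster volume replace-brick "$VOLUME" "$SRC_BRICK" "$DST_BRICK" "$1"
}

# Pick a role on the command line; with no argument nothing is executed.
case "${1:-}" in
    client) mount_loop ;;
    admin)  replace_brick start; replace_brick status ;;
esac
```

Saved under a hypothetical name such as repro.sh, it would be started as "sh repro.sh client" on the client and, some time later, as "sh repro.sh admin" on the second machine.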

Actual results:
replace-brick start succeeded, but replace-brick status failed:
[root@client4 /]# gluster v replace-brick hosdu 10.1.10.24:/data/export-brick/hosdu_brick4 10.1.10.21:/data/export-brick/hosdu_brick5 status
[root@client4 /]# echo $?
110
Subsequent replace-brick status commands hung forever.
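
While reproducing a hang like this, it may help to bound the status call so that the shell is not blocked indefinitely. The sketch below wraps any command in timeout(1) from GNU coreutils, which exits with status 124 when the limit is reached. The helper name bounded and the 120-second limit in the example are assumptions for illustration and are not part of the gluster CLI.

```shell
#!/bin/sh
# Run a command under a time limit and report how it ended.
# Relies on timeout(1) from GNU coreutils, which exits with 124
# when the limit is hit; other statuses come from the command itself.
bounded() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    elif [ "$rc" -ne 0 ]; then
        echo "exit status $rc: $*" >&2
    fi
    return "$rc"
}

# Example (hypothetical 120-second limit, command from this report):
# bounded 120 gluster volume replace-brick hosdu \
#     10.1.10.24:/data/export-brick/hosdu_brick4 \
#     10.1.10.21:/data/export-brick/hosdu_brick5 status
```

A return value of 124 then distinguishes a hang from the CLI's own non-zero exit status, such as the 110 shown above.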

Expected results:
replace-brick status should report the status of the replace-brick operation. It should not fail.

Additional info:

Log entries from the replace-brick temporary mount:

[2012-01-31 03:24:48.586673] I [glusterfsd.c:1578:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.3.0qa20
[2012-01-31 03:24:48.606171] W [write-behind.c:2892:init] 0-mnt-wb: dangling volume. check volfile
[2012-01-31 03:24:51.524821] I [client.c:1937:notify] 0-mnt-client: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
  1: volume mnt-client
  2:  type protocol/client
  3:  option remote-host 10.1.10.24
  4:  option remote-subvolume /data/export-brick/hosdu_brick4
  5:  option remote-port 24010
  6:  option transport-type rdma
  7: end-volume
  8: volume mnt-wb
  9:  type performance/write-behind
 10:  subvolumes mnt-client
 11: end-volume

+------------------------------------------------------------------------------+
[2012-01-31 03:24:52.301751] I [client-handshake.c:1085:select_server_supported_programs] 0-mnt-client: Using Program GlusterFS 3.3.0qa20, Num (1298437), Version (310)
[2012-01-31 03:24:52.305480] I [client-handshake.c:917:client_setvolume_cbk] 0-mnt-client: Connected to 10.1.10.24:24010, attached to remote volume '/data/export-brick/hosdu_brick4'.
[2012-01-31 03:24:52.311484] I [fuse-bridge.c:3718:fuse_graph_setup] 0-fuse: switched to graph 0
[2012-01-31 03:24:52.311728] I [fuse-bridge.c:3297:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13
[2012-01-31 03:24:53.739019] I [fuse-bridge.c:3617:fuse_thread_proc] 0-fuse: unmounting /etc/glusterd/vols/hosdu/rb_mount
[2012-01-31 03:24:53.752118] W [glusterfsd.c:783:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x31940e577d] (-->/lib64/libpthread.so.0() [0x31948077e1] (-->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0xfc) [0x40716f]))) 0-: received signum (15), shutting down
[2012-01-31 03:35:14.757741] I [glusterfsd.c:1578:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.3.0qa20
[2012-01-31 03:35:14.838450] W [write-behind.c:2892:init] 0-mnt-wb: dangling volume. check volfile
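
When scanning longer logs such as the attached glusterd log, a small filter can pull out only the lines above informational level. The sketch below assumes the line layout seen in the excerpts here (a bracketed timestamp followed by a one-letter level) and keeps only the W, E, and C levels. The function name filter_log and the LOGFILE variable are assumptions.

```shell
#!/bin/sh
# Read a glusterfs log on stdin and print only warning (W), error (E),
# and critical (C) lines. Lines that do not start with a bracketed
# timestamp, such as the embedded volfile, are dropped.
filter_log() {
    grep -E '^\[[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:.]+\] [WEC] '
}

# Example (LOGFILE is a hypothetical path to a saved log):
# filter_log < "$LOGFILE"
```

Applied to the temporary-mount log above, this would keep the "dangling volume" and "received signum (15)" warnings and drop the informational lines.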


Log entries from the replace-brick destination brick:


[2012-01-31 03:25:34.746459] I [glusterfsd.c:1578:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.3.0qa20
[2012-01-31 03:25:34.836311] I [graph.c:250:gf_add_cmdline_options] 0-src-server: adding option 'listen-port' for volume 'src-server' with value '24011'
[2012-01-31 03:25:34.842592] W [options.c:661:xl_opt_validate] 0-src-server: option 'listen-port' is deprecated, preferred is 'transport.rdma.listen-port', continuing with correction
Given volfile:
+------------------------------------------------------------------------------+
  1: volume src-posix
  2:  type storage/posix
  3:  option directory /data/export-brick/hosdu_brick5
  4: end-volume
  5: volume /data/export-brick/hosdu_brick5
  6:  type features/locks
  7:  subvolumes src-posix
  8: end-volume
  9: volume src-server
 10:  type protocol/server
 11:  option auth.addr./data/export-brick/hosdu_brick5.allow *
 12:  option transport-type rdma
 13:  subvolumes /data/export-brick/hosdu_brick5
 14: end-volume

+------------------------------------------------------------------------------+
[2012-01-31 03:25:45.215499] I [server-handshake.c:540:server_setvolume] 0-src-server: accepted client from 10.1.10.24:980 (version: 3.3.0qa20)

I have attached the glusterd logs from the machine where I issued the replace-brick command.
Comment 1 Amar Tumballi 2013-02-26 05:34:47 EST
Blocked on RDMA support on master before testing can start.
Comment 2 Kaleb KEITHLEY 2015-10-22 11:40:20 EDT
The pre-release version is ambiguous and is about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.
