Bug 786068 - replace-brick on a volume with rdma transport failed
Summary: replace-brick on a volume with rdma transport failed
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: pre-release
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Assignee: krishnan parthasarathi
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 852311 858478
Reported: 2012-01-31 11:03 UTC by M S Vishwanath Bhat
Modified: 2016-06-01 01:57 UTC
CC List: 6 users

Fixed In Version:
Clone Of:
Cloned to: 852311
Environment:
Last Closed: 2015-10-22 15:40:20 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
glusterd log file (115.93 KB, text/x-log)
2012-01-31 11:03 UTC, M S Vishwanath Bhat

Description M S Vishwanath Bhat 2012-01-31 11:03:47 UTC
Created attachment 558599 [details]
glusterd log file

Description of problem:
I was mounting and unmounting the fuse client in a for loop. From another machine I issued a replace-brick. replace-brick status hung for a long time and then exited with a non-zero exit status. There was no data on the mountpoint.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0qa20

How reproducible:
1/1

Steps to Reproduce:
1. Create and start a stripe-replicate volume with rdma transport type.
2. In a for loop, mount the volume, sleep for some time, and unmount it.
3. After some time, issue replace-brick start from another machine (see the sketch after this list).
4. Issue replace-brick status from the same machine.
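
A minimal shell sketch of these steps, assuming the volume name, peer addresses, and brick paths seen in the attached logs (hosdu, 10.1.10.2x, /data/export-brick/hosdu_brick*); exact hostnames and mount options may differ on other setups:

#!/bin/bash
# Step 1: create and start a 2x2 stripe-replicate volume over rdma
VOL=hosdu
gluster volume create $VOL stripe 2 replica 2 transport rdma \
    10.1.10.21:/data/export-brick/${VOL}_brick1 \
    10.1.10.22:/data/export-brick/${VOL}_brick2 \
    10.1.10.23:/data/export-brick/${VOL}_brick3 \
    10.1.10.24:/data/export-brick/${VOL}_brick4
gluster volume start $VOL

# Step 2: mount/unmount churn from one client
mkdir -p /mnt/$VOL
for i in $(seq 1 100); do
    mount -t glusterfs 10.1.10.24:/$VOL /mnt/$VOL
    sleep 5
    umount /mnt/$VOL
done

# Steps 3-4: from another machine, while the loop above is running
gluster volume replace-brick $VOL \
    10.1.10.24:/data/export-brick/${VOL}_brick4 \
    10.1.10.21:/data/export-brick/${VOL}_brick5 start
gluster volume replace-brick $VOL \
    10.1.10.24:/data/export-brick/${VOL}_brick4 \
    10.1.10.21:/data/export-brick/${VOL}_brick5 status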

Actual results:
replace-brick start succeeded, but status failed.
[root@client4 /]# gluster v replace-brick hosdu 10.1.10.24:/data/export-brick/hosdu_brick4 10.1.10.21:/data/export-brick/hosdu_brick5 status
[root@client4 /]# echo $?
110
Subsequent replace-brick status invocations hung forever.
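
For reference, exit status 110 matches ETIMEDOUT on Linux, which is consistent with the observed hang. A small sketch that bounds the status call with an explicit watchdog, assuming the same volume and bricks as above (the 120-second limit is an arbitrary choice):

# Run status under a watchdog so a hang surfaces as a nonzero exit
timeout 120 gluster volume replace-brick hosdu \
    10.1.10.24:/data/export-brick/hosdu_brick4 \
    10.1.10.21:/data/export-brick/hosdu_brick5 status
rc=$?
if [ $rc -ne 0 ]; then
    # timeout(1) returns 124 when it had to kill the command
    echo "replace-brick status failed or hung (exit $rc)" >&2
fi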

Expected results:
replace-brick status should report the progress of the replace-brick operation; it should not fail or hang.

Additional info:

Log entries from the replace-brick temporary mount:

[2012-01-31 03:24:48.586673] I [glusterfsd.c:1578:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.3.0qa20
[2012-01-31 03:24:48.606171] W [write-behind.c:2892:init] 0-mnt-wb: dangling volume. check volfile
[2012-01-31 03:24:51.524821] I [client.c:1937:notify] 0-mnt-client: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
  1: volume mnt-client
  2:  type protocol/client
  3:  option remote-host 10.1.10.24
  4:  option remote-subvolume /data/export-brick/hosdu_brick4
  5:  option remote-port 24010
  6:  option transport-type rdma
  7: end-volume
  8: volume mnt-wb
  9:  type performance/write-behind
 10:  subvolumes mnt-client
 11: end-volume

+------------------------------------------------------------------------------+
[2012-01-31 03:24:52.301751] I [client-handshake.c:1085:select_server_supported_programs] 0-mnt-client: Using Program GlusterFS 3.3.0qa20, Num (1298437), Version (310)
[2012-01-31 03:24:52.305480] I [client-handshake.c:917:client_setvolume_cbk] 0-mnt-client: Connected to 10.1.10.24:24010, attached to remote volume '/data/export-brick/hosdu_brick4'.
[2012-01-31 03:24:52.311484] I [fuse-bridge.c:3718:fuse_graph_setup] 0-fuse: switched to graph 0
[2012-01-31 03:24:52.311728] I [fuse-bridge.c:3297:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13
[2012-01-31 03:24:53.739019] I [fuse-bridge.c:3617:fuse_thread_proc] 0-fuse: unmounting /etc/glusterd/vols/hosdu/rb_mount
[2012-01-31 03:24:53.752118] W [glusterfsd.c:783:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x31940e577d] (-->/lib64/libpthread.so.0() [0x31948077e1] (-->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0xfc) [0x40716f]))) 0-: received signum (15), shutting down
[2012-01-31 03:35:14.757741] I [glusterfsd.c:1578:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.3.0qa20
[2012-01-31 03:35:14.838450] W [write-behind.c:2892:init] 0-mnt-wb: dangling volume. check volfile


Log entries from the replace-brick destination brick:

[2012-01-31 03:25:34.746459] I [glusterfsd.c:1578:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.3.0qa20
[2012-01-31 03:25:34.836311] I [graph.c:250:gf_add_cmdline_options] 0-src-server: adding option 'listen-port' for volume 'src-server' with value '24011'
[2012-01-31 03:25:34.842592] W [options.c:661:xl_opt_validate] 0-src-server: option 'listen-port' is deprecated, preferred is 'transport.rdma.listen-port', continuing with correction
Given volfile:
+------------------------------------------------------------------------------+
  1: volume src-posix
  2:  type storage/posix
  3:  option directory /data/export-brick/hosdu_brick5
  4: end-volume
  5: volume /data/export-brick/hosdu_brick5
  6:  type features/locks
  7:  subvolumes src-posix
  8: end-volume
  9: volume src-server
 10:  type protocol/server
 11:  option auth.addr./data/export-brick/hosdu_brick5.allow *
 12:  option transport-type rdma
 13:  subvolumes /data/export-brick/hosdu_brick5
 14: end-volume

+------------------------------------------------------------------------------+
[2012-01-31 03:25:45.215499] I [server-handshake.c:540:server_setvolume] 0-src-server: accepted client from 10.1.10.24:980 (version: 3.3.0qa20)

I have attached the glusterd logs from the machine where I issued the replace-brick command.

Comment 1 Amar Tumballi 2013-02-26 10:34:47 UTC
Blocked on RDMA support landing on master before testing can start.

Comment 2 Kaleb KEITHLEY 2015-10-22 15:40:20 UTC
The "pre-release" version is ambiguous and is about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate version for it.

