Bug 977543

Summary: RDMA Start/Stop Volume not Reliable
Product: [Community] GlusterFS
Component: rdma
Version: 3.4.0-beta
Hardware: Unspecified
OS: Linux
Severity: high
Priority: unspecified
Status: CLOSED CURRENTRELEASE
Type: Bug
Doc Type: Bug Fix
Reporter: Ryan <ryade>
Assignee: GlusterFS Bugs list <gluster-bugs>
CC: bugs, gluster-bugs, kkeithle
Last Closed: 2014-10-14 13:13:26 UTC

Description Ryan 2013-06-24 20:46:21 UTC
Description of problem:

When I create an RDMA-only volume, it does not start or stop reliably.


Version-Release number of selected component (if applicable): 3.4beta3


How reproducible: Very


Steps to Reproduce:
1. Install GlusterFS 3.4beta3 on Debian from the PPA packages.
2. Create an RDMA-only volume of any type (the bug does not occur for mixed tcp,rdma volumes).
3. Issue a start/stop command against the volume (a command sketch follows below).
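
A minimal reproduction sketch, assuming the two-peer stripe layout shown in the transcript below (host names, brick paths, and the stripe count are taken from that transcript and will differ on other clusters):

# create an RDMA-only striped volume across two peers
gluster volume create test2 stripe 2 transport rdma cs1-i:/gluster/test2 cs2-i:/gluster/test2
# starting and stopping this volume are the operations that fail intermittently
gluster volume start test2
gluster volume stop test2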

Actual results:

root@cs1-p:/var/log/glusterfs/bricks# gluster volume info test2
 
Volume Name: test2
Type: Stripe
Volume ID: fd2a2fc2-c5b3-4241-9e02-e2d936972357
Status: Stopped
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: cs1-i:/gluster/test2
Brick2: cs2-i:/gluster/test2

root@cs1-p:/var/log/glusterfs/bricks# gluster volume start test2
[2013-06-24 20:43:06.736688] I [glusterfsd.c:1878:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.4.0beta3 (/usr/sbin/glusterfsd -s cs1-i --volfile-id test2.cs1-i.gluster-test2 -p /var/lib/glusterd/vols/test2/run/cs1-i-gluster-test2.pid -S /var/run/b3bd7ac43203f9d2ad826ca1afa3578b.socket --brick-name /gluster/test2 -l /var/log/glusterfs/bricks/gluster-test2.log --xlator-option *-posix.glusterd-uuid=aa3ee3a2-b409-4e92-ac65-fe5cec361e74 --brick-port 49156 --xlator-option test2-server.listen-port=49156)
[2013-06-24 20:43:06.737885] I [socket.c:3480:socket_init] 0-socket.glusterfsd: SSL support is NOT enabled
[2013-06-24 20:43:06.737917] I [socket.c:3495:socket_init] 0-socket.glusterfsd: using system polling thread
[2013-06-24 20:43:06.737998] I [socket.c:3480:socket_init] 0-glusterfs: SSL support is NOT enabled
[2013-06-24 20:43:06.738011] I [socket.c:3495:socket_init] 0-glusterfs: using system polling thread
[2013-06-24 20:43:06.740208] I [graph.c:239:gf_add_cmdline_options] 0-test2-server: adding option 'listen-port' for volume 'test2-server' with value '49156'
[2013-06-24 20:43:06.740226] I [graph.c:239:gf_add_cmdline_options] 0-test2-posix: adding option 'glusterd-uuid' for volume 'test2-posix' with value 'aa3ee3a2-b409-4e92-ac65-fe5cec361e74'
[2013-06-24 20:43:06.741268] W [options.c:848:xl_opt_validate] 0-test2-server: option 'listen-port' is deprecated, preferred is 'transport.rdma.listen-port', continuing with correction
Given volfile:
+------------------------------------------------------------------------------+
  1: volume test2-posix
  2:     type storage/posix
  3:     option volume-id fd2a2fc2-c5b3-4241-9e02-e2d936972357
  4:     option directory /gluster/test2
  5: end-volume
  6: 
  7: volume test2-access-control
  8:     type features/access-control
  9:     subvolumes test2-posix
 10: end-volume
 11: 
 12: volume test2-locks
 13:     type features/locks
 14:     subvolumes test2-access-control
 15: end-volume
 16: 
 17: volume test2-io-threads
 18:     type performance/io-threads
 19:     subvolumes test2-locks
 20: end-volume
 21: 
 22: volume test2-index
 23:     type features/index
 24:     option index-base /gluster/test2/.glusterfs/indices
 25:     subvolumes test2-io-threads
 26: end-volume
 27: 
 28: volume test2-marker
 29:     type features/marker
 30:     option quota off
 31:     option xtime off
 32:     option timestamp-file /var/lib/glusterd/vols/test2/marker.tstamp
 33:     option volume-uuid fd2a2fc2-c5b3-4241-9e02-e2d936972357
 34:     subvolumes test2-index
 35: end-volume
 36: 
 37: volume /gluster/test2
 38:     type debug/io-stats
 39:     option count-fop-hits off
 40:     option latency-measurement off
 41:     subvolumes test2-marker
 42: end-volume
 43: 
 44: volume test2-server
 45:     type protocol/server
 46:     option auth.addr./gluster/test2.allow *
 47:     option auth.login.74c6f975-4a3d-4597-ba18-b1e720680673.password 77e91996-e69b-4554-a5a8-e98b6b76ca6a
 48:     option auth.login./gluster/test2.allow 74c6f975-4a3d-4597-ba18-b1e720680673
 49:     option transport-type rdma
 50:     subvolumes /gluster/test2
 51: end-volume

+------------------------------------------------------------------------------+
volume start: test2: failed
root@cs1-p:/var/log/glusterfs/bricks# [2013-06-24 20:43:12.564942] I [server-handshake.c:567:server_setvolume] 0-test2-server: accepted client from cs1-p-25099-2013/06/24-20:43:07:872561-test2-client-0-0 (version: 3.4.0beta3)

root@cs1-p:/var/log/glusterfs/bricks# gluster volume info test2
 
Volume Name: test2
Type: Stripe
Volume ID: fd2a2fc2-c5b3-4241-9e02-e2d936972357
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: cs1-i:/gluster/test2
Brick2: cs2-i:/gluster/test2

root@cs1-p:/var/log/glusterfs/bricks# gluster volume stop test2
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: test2: failed
root@cs1-p:/var/log/glusterfs/bricks# gluster volume info test2
 
Volume Name: test2
Type: Stripe
Volume ID: fd2a2fc2-c5b3-4241-9e02-e2d936972357
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: cs1-i:/gluster/test2
Brick2: cs2-i:/gluster/test2

root@cs1-p:/var/log/glusterfs/bricks# gluster volume stop test2 force
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
[2013-06-24 20:45:00.765789] W [glusterfsd.c:970:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libc.so.6(+0x48650) [0x7feb26855650] (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(synctask_wrap+0x12) [0x7feb272581b2] (-->/usr/sbin/glusterfsd(glusterfs_handle_terminate+0x15) [0x7feb276bef55]))) 0-: received signum (15), shutting down
[2013-06-24 20:45:00.765867] W [glusterfsd.c:970:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7feb26900ccd] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7feb26bd3e9a] (-->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xc5) [0x7feb276bcd95]))) 0-: received signum (15), shutting down
volume stop: test2: success
root@cs1-p:/var/log/glusterfs/bricks# gluster volume info test2
 
Volume Name: test2
Type: Stripe
Volume ID: fd2a2fc2-c5b3-4241-9e02-e2d936972357
Status: Stopped
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: cs1-i:/gluster/test2
Brick2: cs2-i:/gluster/test2


Expected results:

The volume should start and stop reliably, as it does with TCP volumes.
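
For reference, the only stop path that succeeded in the session above was the forced stop; a minimal sketch of that observed workaround (volume name as above):

# a plain stop fails for the rdma-only volume
gluster volume stop test2
# adding 'force' succeeded in the transcript above (the brick processes received SIGTERM and shut down)
gluster volume stop test2 force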

Additional info: the log lines interleaved with the shell output above come from a background tail of the brick log:

root@cs1-p:/var/log/glusterfs/bricks# jobs
[1]+  Running                 tail -f gluster-test2.log &

Comment 1 Ryan 2013-07-15 19:26:40 UTC
I have retested this on the 3.4.0 GA release and the behaviour is unchanged:

root@cs1-p:~# gluster volume start perftest
volume start: perftest: failed

root@cs1-p:~# gluster volume info perftest
 
Volume Name: perftest
Type: Distributed-Replicate
Volume ID: ef206a76-7b26-4c12-9ccf-b3d250f36403
Status: Started
Number of Bricks: 50 x 2 = 100
Transport-type: rdma
Bricks:
Brick1: cs1-i:/gluster/perftest
Brick2: cs2-i:/gluster/perftest
Brick3: cs3-i:/gluster/perftest
Brick4: cs4-i:/gluster/perftest
Brick5: cs5-i:/gluster/perftest
Brick6: cs6-i:/gluster/perftest
.....
Brick97: cs97-i:/gluster/perftest
Brick98: cs98-i:/gluster/perftest
Brick99: cs99-i:/gluster/perftest
Brick100: cs100-i:/gluster/perftest

[2013-07-15 19:11:32.243142] I [glusterfsd.c:1910:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.4.0 (/usr/sbin/glusterfsd -s cs1-i --volfile-id perftest.cs1-i.gluster-perftest -p /var/lib/glusterd/vols/perftest/run/cs1-i-gluster-perftest.pid -S /var/run/5a1d1df34e29c65c2acaf3da6530ec15.socket --brick-name /gluster/perftest -l /var/log/glusterfs/bricks/gluster-perftest.log --xlator-option *-posix.glusterd-uuid=aab4ef7d-6041-45dd-ad05-b25f603ed8e8 --brick-port 49155 --xlator-option perftest-server.listen-port=49155)
[2013-07-15 19:11:32.244396] I [socket.c:3480:socket_init] 0-socket.glusterfsd: SSL support is NOT enabled
[2013-07-15 19:11:32.244431] I [socket.c:3495:socket_init] 0-socket.glusterfsd: using system polling thread
[2013-07-15 19:11:32.244529] I [socket.c:3480:socket_init] 0-glusterfs: SSL support is NOT enabled
[2013-07-15 19:11:32.244543] I [socket.c:3495:socket_init] 0-glusterfs: using system polling thread
[2013-07-15 19:11:32.246864] I [graph.c:239:gf_add_cmdline_options] 0-perftest-server: adding option 'listen-port' for volume 'perftest-server' with value '49155'
[2013-07-15 19:11:32.246883] I [graph.c:239:gf_add_cmdline_options] 0-perftest-posix: adding option 'glusterd-uuid' for volume 'perftest-posix' with value 'aab4ef7d-6041-45dd-ad05-b25f603ed8e8'
[2013-07-15 19:11:32.247980] W [options.c:848:xl_opt_validate] 0-perftest-server: option 'listen-port' is deprecated, preferred is 'transport.rdma.listen-port', continuing with correction
Given volfile:
+------------------------------------------------------------------------------+
  1: volume perftest-posix
  2:     type storage/posix
  3:     option volume-id ef206a76-7b26-4c12-9ccf-b3d250f36403
  4:     option directory /gluster/perftest
  5: end-volume
  6: 
  7: volume perftest-access-control
  8:     type features/access-control
  9:     subvolumes perftest-posix
 10: end-volume
 11: 
 12: volume perftest-locks
 13:     type features/locks
 14:     subvolumes perftest-access-control
 15: end-volume
 16: 
 17: volume perftest-io-threads
 18:     type performance/io-threads
 19:     subvolumes perftest-locks
 20: end-volume
 21: 
 22: volume perftest-index
 23:     type features/index
 24:     option index-base /gluster/perftest/.glusterfs/indices
 25:     subvolumes perftest-io-threads
 26: end-volume
 27: 
 28: volume perftest-marker
 29:     type features/marker
 30:     option quota off
 31:     option xtime off
 32:     option timestamp-file /var/lib/glusterd/vols/perftest/marker.tstamp
 33:     option volume-uuid ef206a76-7b26-4c12-9ccf-b3d250f36403
 34:     subvolumes perftest-index
 35: end-volume
 36: 
 37: volume /gluster/perftest
 38:     type debug/io-stats
 39:     option count-fop-hits off
 40:     option latency-measurement off
 41:     subvolumes perftest-marker
 42: end-volume
 43: 
 44: volume perftest-server
 45:     type protocol/server
 46:     option auth.addr./gluster/perftest.allow *
 47:     option auth.login.545778aa-6feb-49ba-9017-19405eb5e70c.password 8d3617ca-4203-4a87-a53a-72432673775c
 48:     option auth.login./gluster/perftest.allow 545778aa-6feb-49ba-9017-19405eb5e70c
 49:     option transport-type rdma
 50:     subvolumes /gluster/perftest
 51: end-volume

+------------------------------------------------------------------------------+
[2013-07-15 19:11:34.599269] I [server-handshake.c:567:server_setvolume] 0-perftest-server: accepted client from cs1-p-18307-2013/07/12-19:12:34:753662-perftest-client-0-0 (version: 3.4.0beta4)
[2013-07-15 19:11:42.360334] I [server-handshake.c:567:server_setvolume] 0-perftest-server: accepted client from cs71-p-24042-2013/07/15-19:11:37:89939-perftest-client-0-0 (version: 3.4.0)
[2013-07-15 19:11:43.698185] I [server-handshake.c:567:server_setvolume] 0-perftest-server: accepted client from cs93-p-3539-2013/07/15-19:11:37:645851-perftest-client-0-0 (version: 3.4.0)
[2013-07-15 19:11:46.391855] I [server-handshake.c:567:server_setvolume] 0-perftest-server: accepted client from cs1-p-1346-2013/07/15-19:11:35:213516-perftest-client-0-0 (version: 3.4.0)
[2013-07-15 19:11:48.158220] I [server-handshake.c:567:server_setvolume] 0-perftest-server: accepted client from cs33-p-22368-2013/07/15-19:11:37:110944-perftest-client-0-0 (version: 3.4.0)
[2013-07-15 19:11:48.899066] I [server-handshake.c:567:server_setvolume] 0-perftest-server: accepted client from cs90-p-5022-2013/07/15-19:11:37:162206-perftest-client-0-0 (version: 3.4.0)
[2013-07-15 19:11:51.235350] I [server-handshake.c:567:server_setvolume] 0-perftest-server: accepted client from cs30-p-29423-2013/07/15-19:11:37:641448-perftest-client-0-0 (version: 3.4.0)
[2013-07-15 19:11:52.902307] I [server-handshake.c:567:server_setvolume] 0-perftest-server: accepted client from cs41-p-15820-2013/07/15-19:11:38:124524-perftest-client-0-0 (version: 3.4.0)
[2013-07-15 19:11:53.083115] I [server-handshake.c:567:server_setvolume] 0-perftest-server: accepted client from cs64-p-13864-2013/07/15-19:11:37:195751-perftest-client-0-0 (version: 3.4.0)
[2013-07-15 19:11:54.086491] I [server-handshake.c:567:server_setvolume] 0-perftest-server: accepted client from cs51-p-2182-2013/07/15-19:11:37:184898-perftest-client-0-0 (version: 3.4.0)
[2013-07-15 19:11:55.154370] I [server-handshake.c:567:server_setvolume] 0-perftest-server: accepted client from cs50-p-16867-2013/07/15-19:11:38:120286-perftest-client-0-0 (version: 3.4.0)
[2013-07-15 19:12:08.537211] I [server-handshake.c:567:server_setvolume] 0-perftest-server: accepted client from cs50-p-16860-2013/07/15-19:11:37:90255-perftest-client-0-0 (version: 3.4.0)
[2013-07-15 19:12:10.272735] I [server-handshake.c:567:server_setvolume] 0-perftest-server: accepted client from cs87-p-21820-2013/07/15-19:11:36:716770-perftest-client-0-0 (version: 3.4.0)
[2013-07-15 19:12:14.991785] I [server-handshake.c:567:server_setvolume] 0-perftest-server: accepted client from cs28-p-28721-2013/07/15-19:11:36:619289-perftest-client-0-0 (version: 3.4.0)
[2013-07-15 19:12:32.321067] I [server-handshake.c:567:server_setvolume] 0-perftest-server: accepted client from cs1-p-1334-2013/07/15-19:11:34:182573-perftest-client-0-0 (version: 3.4.0)

Comment 2 Kaleb KEITHLEY 2014-10-14 13:13:26 UTC
RDMA on IB works in 3.5. Please upgrade to a current 3.5 release if you need RDMA.
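
A quick way to confirm what is actually running before and after such an upgrade (a generic sketch; package and repository names depend on the distribution and are not given here):

# report the installed GlusterFS version
glusterfsd --version
# confirm the volume's transport type after the upgrade
gluster volume info test2 | grep -i transport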