Description of problem: glusterd : 'gluster volume status <vol_name>' is showing 'N/A' under Port column for all volumes. - same result after gluster volume start <vol_name> force. stopping and starting all volume again shows Port number Version-Release number of selected component (if applicable): 3.4.0.30rhs-2.el6_4.x86_64 How reproducible: not always Steps to Reproduce: server :- 10.70.37.66 [root@DHT2 ~]# gluster v status Status of volume: dht Gluster process Port Online Pid ------------------------------------------------------------------------------ Brick 10.70.37.66:/rhs/brick1/d1 N/A Y1151 Brick 10.70.37.66:/rhs/brick1/d2 N/A Y1158 Brick 10.70.37.195:/rhs/brick1/d4 49158 Y5496 NFS Server on localhost 2049 Y1305 NFS Server on 10.70.37.195 2049 Y28732 There are no active volume tasks Status of volume: sanity Gluster process Port Online Pid ------------------------------------------------------------------------------ Brick 10.70.37.195:/rhs/brick1/s1 49155 Y4422 Brick 10.70.37.195:/rhs/brick1/s2 49156 Y4433 Brick 10.70.37.195:/rhs/brick1/s3 49157 Y4444 Brick 10.70.37.66:/rhs/brick1/s1 N/A Y1173 Brick 10.70.37.66:/rhs/brick1/s2 N/A Y1178 NFS Server on localhost 2049 Y1305 NFS Server on 10.70.37.195 2049 Y28732 There are no active volume tasks Status of volume: test1 Gluster process Port Online Pid ------------------------------------------------------------------------------ Brick 10.70.37.195:/rhs/brick1/t1 49159 Y19367 Brick 10.70.37.195:/rhs/brick1/t2 49160 Y17821 Brick 10.70.37.66:/rhs/brick1/t1 N/A Y1167 NFS Server on localhost 2049 Y1305 NFS Server on 10.70.37.195 2049 Y28732 There are no active volume tasks ---> start force volume [root@DHT2 ~]# gluster volume start dht force volume start: dht: success [root@DHT2 ~]# gluster v status dht Status of volume: dht Gluster process Port Online Pid ------------------------------------------------------------------------------ Brick 10.70.37.66:/rhs/brick1/d1 N/A Y 1151 Brick 10.70.37.66:/rhs/brick1/d2 N/A Y 1158 Brick 10.70.37.195:/rhs/brick1/d4 49158 Y 5496 NFS Server on localhost 2049 Y 1768 NFS Server on 10.70.37.195 2049 Y 29057 There are no active volume tasks [root@DHT2 ~]# gluster volume start test1 force volume start: test1: success [root@DHT2 ~]# gluster v status test1 Status of volume: test1 Gluster process Port Online Pid ------------------------------------------------------------------------------ Brick 10.70.37.195:/rhs/brick1/t1 49159 Y 19367 Brick 10.70.37.195:/rhs/brick1/t2 49160 Y 17821 Brick 10.70.37.66:/rhs/brick1/t1 N/A Y 1167 NFS Server on localhost 2049 Y 1828 NFS Server on 10.70.37.195 2049 Y 29098 There are no active volume tasks ---> stop and start again [root@DHT2 ~]# gluster volume stop dht Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y volume stop: dht: success [root@DHT2 ~]# gluster volume start dht volume start: dht: success [root@DHT2 ~]# gluster volume status dht Status of volume: dht Gluster process Port Online Pid ------------------------------------------------------------------------------ Brick 10.70.37.66:/rhs/brick1/d1 49159 Y 2080 Brick 10.70.37.66:/rhs/brick1/d2 49160 Y 2091 Brick 10.70.37.195:/rhs/brick1/d4 49158 Y 29257 NFS Server on localhost 2049 Y 2104 NFS Server on 10.70.37.195 2049 Y 29269 There are no active volume tasks Actual results: showing 'N/A' under Port column for all volumes. - same result after gluster volume start <vol_name> force. stopping and starting all volume again shows Port number Expected results: status should show Port numbers Additional info: log snippet :- (start force) [2013-09-05 02:39:46.025972] E [glusterd-utils.c:3797:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/fcdb25f1afb31c273d9517c141c6a7de.socket error: Permission denied [2013-09-05 02:39:46.026730] I [glusterd-utils.c:3831:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully [2013-09-05 02:39:46.027206] I [glusterd-utils.c:3836:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully [2013-09-05 02:39:46.027645] I [glusterd-utils.c:3841:glusterd_nfs_pmap_deregister] 0-: De-registered NFSV3 successfully [2013-09-05 02:39:46.027952] I [glusterd-utils.c:3846:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v4 successfully [2013-09-05 02:39:46.028244] I [glusterd-utils.c:3851:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v1 successfully [2013-09-05 02:39:46.028583] I [glusterd-utils.c:3856:glusterd_nfs_pmap_deregister] 0-: De-registered ACL v3 successfully [2013-09-05 02:39:46.036150] I [rpc-clnt.c:974:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2013-09-05 02:39:46.036339] I [socket.c:3487:socket_init] 0-management: SSL support is NOT enabled [2013-09-05 02:39:46.036372] I [socket.c:3502:socket_init] 0-management: using system polling thread [2013-09-05 02:39:46.037496] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=2236 max=0 total=0 [2013-09-05 02:39:46.037573] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=0 total=0 [2013-09-05 02:39:46.037742] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now [2013-09-05 02:39:46.044595] I [run.c:190:runner_log] 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=dht --first=no [2013-09-05 02:39:46.067226] I [run.c:190:runner_log] 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=dht --first=no [2013-09-05 02:40:12.237981] I [glusterd-handler.c:3498:__glusterd_handle_status_volume] 0-management: Received status volume req for volume dht [2013-09-05 02:40:31.950070] E [glusterd-utils.c:3797:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/fcdb25f1afb31c273d9517c141c6a7de.socket error: Permission denied [2013-09-05 02:40:31.950664] I [glusterd-utils.c:3831:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully [2013-09-05 02:40:31.950977] I [glusterd-utils.c:3836:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully [2013-09-05 02:40:31.951272] I [glusterd-utils.c:3841:glusterd_nfs_pmap_deregister] 0-: De-registered NFSV3 successfully [2013-09-05 02:40:31.951559] I [glusterd-utils.c:3846:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v4 successfully [2013-09-05 02:40:31.951727] I [glusterd-utils.c:3851:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v1 successfully [2013-09-05 02:40:31.951892] I [glusterd-utils.c:3856:glusterd_nfs_pmap_deregister] 0-: De-registered ACL v3 successfully [2013-09-05 02:40:31.959398] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=2236 max=0 total=0 [2013-09-05 02:40:31.959537] I [rpc-clnt.c:974:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2013-09-05 02:40:31.959563] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=0 total=0 [2013-09-05 02:40:31.959773] I [socket.c:3487:socket_init] 0-management: SSL support is NOT enabled [2013-09-05 02:40:31.959824] I [socket.c:3502:socket_init] 0-management: using system polling thread [2013-09-05 02:40:31.960275] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now stop and start:- [2013-09-05 02:43:19.740118] I [run.c:190:runner_log] 0-management: Ran script: /var/lib/glusterd/hooks/1/stop/pre/S29CTDB-teardown.sh --volname=dht --last=no [2013-09-05 02:43:19.773990] I [run.c:190:runner_log] 0-management: Ran script: /var/lib/glusterd/hooks/1/stop/pre/S30samba-stop.sh --volname=dht --last=no [2013-09-05 02:43:19.774297] E [glusterd-utils.c:1337:glusterd_brick_unlink_socket_file] 0-management: Failed to remove /var/run/8c6feb351ffd74009e694bc1147e8a8c.socket error: Permission denied [2013-09-05 02:43:19.774548] E [glusterd-utils.c:1337:glusterd_brick_unlink_socket_file] 0-management: Failed to remove /var/run/4136d34e8b779dd3af69cb621602bfd5.socket error: Permission denied [2013-09-05 02:43:20.872900] E [glusterd-utils.c:3797:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/fcdb25f1afb31c273d9517c141c6a7de.socket error: Permission denied [2013-09-05 02:43:20.873475] I [glusterd-utils.c:3831:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully [2013-09-05 02:43:20.873828] I [glusterd-utils.c:3836:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully [2013-09-05 02:43:20.874130] I [glusterd-utils.c:3841:glusterd_nfs_pmap_deregister] 0-: De-registered NFSV3 successfully [2013-09-05 02:43:20.874497] I [glusterd-utils.c:3846:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v4 successfully [2013-09-05 02:43:20.874755] I [glusterd-utils.c:3851:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v1 successfully [2013-09-05 02:43:20.875065] I [glusterd-utils.c:3856:glusterd_nfs_pmap_deregister] 0-: De-registered ACL v3 successfully [2013-09-05 02:43:20.881908] I [rpc-clnt.c:974:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2013-09-05 02:43:20.882011] I [socket.c:3487:socket_init] 0-management: SSL support is NOT enabled [2013-09-05 02:43:20.882027] I [socket.c:3502:socket_init] 0-management: using system polling thread [2013-09-05 02:43:20.888250] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=2236 max=1 total=1 [2013-09-05 02:43:20.888377] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=1 total=1 [2013-09-05 02:43:20.893408] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=2236 max=1 total=1 [2013-09-05 02:43:20.893459] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=1 total=1 [2013-09-05 02:43:20.899042] I [glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick /rhs/brick1/d2 on port 49153 [2013-09-05 02:43:20.900470] I [glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick /rhs/brick1/d1 on port 49152 [2013-09-05 02:43:20.901604] W [socket.c:522:__socket_rwv] 0-management: readv on /var/run/8c6feb351ffd74009e694bc1147e8a8c.socket failed (No data available) [2013-09-05 02:43:20.903824] W [socket.c:522:__socket_rwv] 0-management: readv on /var/run/4136d34e8b779dd3af69cb621602bfd5.socket failed (No data available) [2013-09-05 02:43:20.905946] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=2236 max=0 total=0 [2013-09-05 02:43:20.906000] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=0 total=0 [2013-09-05 02:43:20.906132] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now [2013-09-05 02:43:21.106485] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now [2013-09-05 02:43:21.108224] I [glusterd-handler.c:3681:__glusterd_brick_rpc_notify] 0-management: Disconnected from 10.70.37.66:/rhs/brick1/d1 [2013-09-05 02:43:21.110769] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now [2013-09-05 02:43:21.113016] I [glusterd-handler.c:3681:__glusterd_brick_rpc_notify] 0-management: Disconnected from 10.70.37.66:/rhs/brick1/d2 [2013-09-05 02:43:24.106998] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now [2013-09-05 02:43:24.109136] I [glusterd-handler.c:3681:__glusterd_brick_rpc_notify] 0-management: Disconnected from 10.70.37.66:/rhs/brick1/d1 [2013-09-05 02:43:24.110929] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now [2013-09-05 02:43:24.112517] I [glusterd-handler.c:3681:__glusterd_brick_rpc_notify] 0-management: Disconnected from 10.70.37.66:/rhs/brick1/d2 [2013-09-05 02:43:27.107210] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now [2013-09-05 02:43:27.109140] I [glusterd-handler.c:3681:__glusterd_brick_rpc_notify] 0-management: Disconnected from 10.70.37.66:/rhs/brick1/d1 [2013-09-05 02:43:27.112126] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now [2013-09-05 02:43:27.114104] I [glusterd-handler.c:3681:__glusterd_brick_rpc_notify] 0-management: Disconnected from 10.70.37.66:/rhs/brick1/d2 [2013-09-05 02:43:30.108076] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now [2013-09-05 02:43:30.110269] I [glusterd-handler.c:3681:__glusterd_brick_rpc_notify] 0-management: Disconnected from 10.70.37.66:/rhs/brick1/d1 [2013-09-05 02:43:30.112422] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now [2013-09-05 02:43:30.114172] I [glusterd-handler.c:3681:__glusterd_brick_rpc_notify] 0-management: Disconnected from 10.70.37.66:/rhs/brick1/d2 [2013-09-05 02:43:32.023855] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /rhs/brick1/d1 on port 49159 [2013-09-05 02:43:32.025852] I [rpc-clnt.c:974:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2013-09-05 02:43:32.026039] I [socket.c:3487:socket_init] 0-management: SSL support is NOT enabled [2013-09-05 02:43:32.026115] I [socket.c:3502:socket_init] 0-management: using system polling thread [2013-09-05 02:43:32.059390] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /rhs/brick1/d2 on port 49160 [2013-09-05 02:43:32.061535] I [rpc-clnt.c:974:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2013-09-05 02:43:32.061780] I [socket.c:3487:socket_init] 0-management: SSL support is NOT enabled [2013-09-05 02:43:32.061850] I [socket.c:3502:socket_init] 0-management: using system polling thread [2013-09-05 02:43:33.142722] E [glusterd-utils.c:3797:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/fcdb25f1afb31c273d9517c141c6a7de.socket error: Permission denied [2013-09-05 02:43:33.143240] I [glusterd-utils.c:3831:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully [2013-09-05 02:43:33.143648] I [glusterd-utils.c:3836:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully [2013-09-05 02:43:33.143988] I [glusterd-utils.c:3841:glusterd_nfs_pmap_deregister] 0-: De-registered NFSV3 successfully [2013-09-05 02:43:33.144281] I [glusterd-utils.c:3846:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v4 successfully [2013-09-05 02:43:33.144615] I [glusterd-utils.c:3851:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v1 successfully [2013-09-05 02:43:33.144855] I [glusterd-utils.c:3856:glusterd_nfs_pmap_deregister] 0-: De-registered ACL v3 successfully [2013-09-05 02:43:33.153140] I [rpc-clnt.c:974:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2013-09-05 02:43:33.153258] I [socket.c:3487:socket_init] 0-management: SSL support is NOT enabled [2013-09-05 02:43:33.153283] I [socket.c:3502:socket_init] 0-management: using system polling thread [2013-09-05 02:43:33.158507] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=2236 max=0 total=0 [2013-09-05 02:43:33.158596] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=0 total=0 [2013-09-05 02:43:33.161343] I [run.c:190:runner_log] 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=dht --first=no [2013-09-05 02:43:33.163744] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now [2013-09-05 02:43:33.190595] I [run.c:190:runner_log] 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=dht --first=no [2013-09-05 02:43:44.334751] I [glusterd-handler.c:3498:__glusterd_handle_status_volume] 0-management: Received status volume req for volume dht
I have a customer that had a similar problem, and I managed to reproduce it in the labs. I found out that this may have been solved in bz923164 with the change http://review.gluster.org/#/c/6786/ "glusterd: Fix race in pid file update", so I proceeded to backport that change to a test package based on glusterfs*-3.4.0.59rhs-1.el6rhs in RHS 2.1 and with the test package I created, the problem hasn't appeared any more. The errata from bz923164 is https://rhn.redhat.com/errata/RHEA-2014-1278.html and was released only for RHS 3.0. If more testing proves that the fix resolves the issue in RHS 2.x, would it be possible to backport this fix?
This behavior is also seen in scenarios where glusterd is down, and the bricks are up. Following are the steps to reproduce it. 1. "pkill glusterd" on one node. 2. Perform any volume set operation from another node. #gluster volume set test_vol diagnostics.brick-log-level DEBUG 3. Now bring back glusterd on that node. #service glusterd start 4. Check volume status. The port for the brick(s) hosted on that node will show N/A. This happens because while importing new volume information from another node, even though we retain the already running bricks that are still part of the volume, we don't retain their port number.
The fix is already into upstream with http://review.gluster.org/9297 BZ should be moved to 'Modified' the moment branching takes place.