Bug 1004650

Summary: glusterd : 'gluster volume status <vol_name>' is showing 'N/A' under Port column for all volumes. - same result after gluster volume start <vol_name> force
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Rachana Patel <racpatel>
Component: glusterdAssignee: Avra Sengupta <asengupt>
Status: CLOSED CURRENTRELEASE QA Contact: amainkar
Severity: medium Docs Contact:
Priority: unspecified    
Version: 2.1CC: amukherj, asengupt, dblack, jcastillo, nsathyan, rhs-bugs, sasundar, vagarwal, vbellur
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1175700 (view as bug list) Environment:
Last Closed: 2015-04-17 07:11:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1175700    

Description Rachana Patel 2013-09-05 07:03:52 UTC
Description of problem:
glusterd : 'gluster volume status <vol_name>' is showing 'N/A' under Port column for all volumes. - same result after gluster volume start <vol_name> force.
stopping and starting all volume again shows Port number

Version-Release number of selected component (if applicable):
3.4.0.30rhs-2.el6_4.x86_64

How reproducible:
not always

Steps to Reproduce:
server :- 10.70.37.66

[root@DHT2 ~]# gluster v status
Status of volume: dht
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.70.37.66:/rhs/brick1/d1			N/A	Y1151
Brick 10.70.37.66:/rhs/brick1/d2			N/A	Y1158
Brick 10.70.37.195:/rhs/brick1/d4			49158	Y5496
NFS Server on localhost					2049	Y1305
NFS Server on 10.70.37.195				2049	Y28732
 
There are no active volume tasks
Status of volume: sanity
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.70.37.195:/rhs/brick1/s1			49155	Y4422
Brick 10.70.37.195:/rhs/brick1/s2			49156	Y4433
Brick 10.70.37.195:/rhs/brick1/s3			49157	Y4444
Brick 10.70.37.66:/rhs/brick1/s1			N/A	Y1173
Brick 10.70.37.66:/rhs/brick1/s2			N/A	Y1178
NFS Server on localhost					2049	Y1305
NFS Server on 10.70.37.195				2049	Y28732
 
There are no active volume tasks
Status of volume: test1
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.70.37.195:/rhs/brick1/t1			49159	Y19367
Brick 10.70.37.195:/rhs/brick1/t2			49160	Y17821
Brick 10.70.37.66:/rhs/brick1/t1			N/A	Y1167
NFS Server on localhost					2049	Y1305
NFS Server on 10.70.37.195				2049	Y28732
 
There are no active volume tasks

---> start force volume

[root@DHT2 ~]# gluster volume start dht force
volume start: dht: success
[root@DHT2 ~]# gluster v status dht
Status of volume: dht
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.70.37.66:/rhs/brick1/d1			N/A	Y	1151
Brick 10.70.37.66:/rhs/brick1/d2			N/A	Y	1158
Brick 10.70.37.195:/rhs/brick1/d4			49158	Y	5496
NFS Server on localhost					2049	Y	1768
NFS Server on 10.70.37.195				2049	Y	29057
 
There are no active volume tasks
[root@DHT2 ~]# gluster volume start test1 force
volume start: test1: success
[root@DHT2 ~]# gluster v status test1
Status of volume: test1
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.70.37.195:/rhs/brick1/t1			49159	Y	19367
Brick 10.70.37.195:/rhs/brick1/t2			49160	Y	17821
Brick 10.70.37.66:/rhs/brick1/t1			N/A	Y	1167
NFS Server on localhost					2049	Y	1828
NFS Server on 10.70.37.195				2049	Y	29098
 
There are no active volume tasks


---> stop and start again
[root@DHT2 ~]# gluster volume stop dht
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: dht: success
[root@DHT2 ~]# gluster volume start dht
volume start: dht: success
[root@DHT2 ~]# gluster volume status dht
Status of volume: dht
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.70.37.66:/rhs/brick1/d1			49159	Y	2080
Brick 10.70.37.66:/rhs/brick1/d2			49160	Y	2091
Brick 10.70.37.195:/rhs/brick1/d4			49158	Y	29257
NFS Server on localhost					2049	Y	2104
NFS Server on 10.70.37.195				2049	Y	29269
 
There are no active volume tasks


Actual results:
 showing 'N/A' under Port column for all volumes. - same result after gluster volume start <vol_name> force.
stopping and starting all volume again shows Port number

Expected results:
status should show Port numbers

Additional info:
log snippet :- (start force)

[2013-09-05 02:39:46.025972] E [glusterd-utils.c:3797:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/fcdb25f1afb31c273d9517c141c6a7de.socket error: Permission denied
[2013-09-05 02:39:46.026730] I [glusterd-utils.c:3831:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully
[2013-09-05 02:39:46.027206] I [glusterd-utils.c:3836:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully
[2013-09-05 02:39:46.027645] I [glusterd-utils.c:3841:glusterd_nfs_pmap_deregister] 0-: De-registered NFSV3 successfully
[2013-09-05 02:39:46.027952] I [glusterd-utils.c:3846:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v4 successfully
[2013-09-05 02:39:46.028244] I [glusterd-utils.c:3851:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v1 successfully
[2013-09-05 02:39:46.028583] I [glusterd-utils.c:3856:glusterd_nfs_pmap_deregister] 0-: De-registered ACL v3 successfully
[2013-09-05 02:39:46.036150] I [rpc-clnt.c:974:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-09-05 02:39:46.036339] I [socket.c:3487:socket_init] 0-management: SSL support is NOT enabled
[2013-09-05 02:39:46.036372] I [socket.c:3502:socket_init] 0-management: using system polling thread
[2013-09-05 02:39:46.037496] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=2236 max=0 total=0
[2013-09-05 02:39:46.037573] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2013-09-05 02:39:46.037742] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now
[2013-09-05 02:39:46.044595] I [run.c:190:runner_log] 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=dht --first=no
[2013-09-05 02:39:46.067226] I [run.c:190:runner_log] 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=dht --first=no
[2013-09-05 02:40:12.237981] I [glusterd-handler.c:3498:__glusterd_handle_status_volume] 0-management: Received status volume req for volume dht






[2013-09-05 02:40:31.950070] E [glusterd-utils.c:3797:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/fcdb25f1afb31c273d9517c141c6a7de.socket error: Permission denied
[2013-09-05 02:40:31.950664] I [glusterd-utils.c:3831:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully
[2013-09-05 02:40:31.950977] I [glusterd-utils.c:3836:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully
[2013-09-05 02:40:31.951272] I [glusterd-utils.c:3841:glusterd_nfs_pmap_deregister] 0-: De-registered NFSV3 successfully
[2013-09-05 02:40:31.951559] I [glusterd-utils.c:3846:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v4 successfully
[2013-09-05 02:40:31.951727] I [glusterd-utils.c:3851:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v1 successfully
[2013-09-05 02:40:31.951892] I [glusterd-utils.c:3856:glusterd_nfs_pmap_deregister] 0-: De-registered ACL v3 successfully
[2013-09-05 02:40:31.959398] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=2236 max=0 total=0
[2013-09-05 02:40:31.959537] I [rpc-clnt.c:974:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-09-05 02:40:31.959563] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2013-09-05 02:40:31.959773] I [socket.c:3487:socket_init] 0-management: SSL support is NOT enabled
[2013-09-05 02:40:31.959824] I [socket.c:3502:socket_init] 0-management: using system polling thread
[2013-09-05 02:40:31.960275] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now


stop and start:-
[2013-09-05 02:43:19.740118] I [run.c:190:runner_log] 0-management: Ran script: /var/lib/glusterd/hooks/1/stop/pre/S29CTDB-teardown.sh --volname=dht --last=no
[2013-09-05 02:43:19.773990] I [run.c:190:runner_log] 0-management: Ran script: /var/lib/glusterd/hooks/1/stop/pre/S30samba-stop.sh --volname=dht --last=no
[2013-09-05 02:43:19.774297] E [glusterd-utils.c:1337:glusterd_brick_unlink_socket_file] 0-management: Failed to remove /var/run/8c6feb351ffd74009e694bc1147e8a8c.socket error: Permission denied
[2013-09-05 02:43:19.774548] E [glusterd-utils.c:1337:glusterd_brick_unlink_socket_file] 0-management: Failed to remove /var/run/4136d34e8b779dd3af69cb621602bfd5.socket error: Permission denied
[2013-09-05 02:43:20.872900] E [glusterd-utils.c:3797:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/fcdb25f1afb31c273d9517c141c6a7de.socket error: Permission denied
[2013-09-05 02:43:20.873475] I [glusterd-utils.c:3831:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully
[2013-09-05 02:43:20.873828] I [glusterd-utils.c:3836:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully
[2013-09-05 02:43:20.874130] I [glusterd-utils.c:3841:glusterd_nfs_pmap_deregister] 0-: De-registered NFSV3 successfully
[2013-09-05 02:43:20.874497] I [glusterd-utils.c:3846:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v4 successfully
[2013-09-05 02:43:20.874755] I [glusterd-utils.c:3851:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v1 successfully
[2013-09-05 02:43:20.875065] I [glusterd-utils.c:3856:glusterd_nfs_pmap_deregister] 0-: De-registered ACL v3 successfully
[2013-09-05 02:43:20.881908] I [rpc-clnt.c:974:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-09-05 02:43:20.882011] I [socket.c:3487:socket_init] 0-management: SSL support is NOT enabled
[2013-09-05 02:43:20.882027] I [socket.c:3502:socket_init] 0-management: using system polling thread
[2013-09-05 02:43:20.888250] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=2236 max=1 total=1
[2013-09-05 02:43:20.888377] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=1 total=1
[2013-09-05 02:43:20.893408] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=2236 max=1 total=1
[2013-09-05 02:43:20.893459] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=1 total=1
[2013-09-05 02:43:20.899042] I [glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick /rhs/brick1/d2 on port 49153
[2013-09-05 02:43:20.900470] I [glusterd-pmap.c:271:pmap_registry_remove] 0-pmap: removing brick /rhs/brick1/d1 on port 49152
[2013-09-05 02:43:20.901604] W [socket.c:522:__socket_rwv] 0-management: readv on /var/run/8c6feb351ffd74009e694bc1147e8a8c.socket failed (No data available)
[2013-09-05 02:43:20.903824] W [socket.c:522:__socket_rwv] 0-management: readv on /var/run/4136d34e8b779dd3af69cb621602bfd5.socket failed (No data available)
[2013-09-05 02:43:20.905946] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=2236 max=0 total=0
[2013-09-05 02:43:20.906000] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2013-09-05 02:43:20.906132] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now
[2013-09-05 02:43:21.106485] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now
[2013-09-05 02:43:21.108224] I [glusterd-handler.c:3681:__glusterd_brick_rpc_notify] 0-management: Disconnected from 10.70.37.66:/rhs/brick1/d1
[2013-09-05 02:43:21.110769] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now
[2013-09-05 02:43:21.113016] I [glusterd-handler.c:3681:__glusterd_brick_rpc_notify] 0-management: Disconnected from 10.70.37.66:/rhs/brick1/d2
[2013-09-05 02:43:24.106998] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now
[2013-09-05 02:43:24.109136] I [glusterd-handler.c:3681:__glusterd_brick_rpc_notify] 0-management: Disconnected from 10.70.37.66:/rhs/brick1/d1
[2013-09-05 02:43:24.110929] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now
[2013-09-05 02:43:24.112517] I [glusterd-handler.c:3681:__glusterd_brick_rpc_notify] 0-management: Disconnected from 10.70.37.66:/rhs/brick1/d2



[2013-09-05 02:43:27.107210] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now
[2013-09-05 02:43:27.109140] I [glusterd-handler.c:3681:__glusterd_brick_rpc_notify] 0-management: Disconnected from 10.70.37.66:/rhs/brick1/d1
[2013-09-05 02:43:27.112126] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now
[2013-09-05 02:43:27.114104] I [glusterd-handler.c:3681:__glusterd_brick_rpc_notify] 0-management: Disconnected from 10.70.37.66:/rhs/brick1/d2
[2013-09-05 02:43:30.108076] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now
[2013-09-05 02:43:30.110269] I [glusterd-handler.c:3681:__glusterd_brick_rpc_notify] 0-management: Disconnected from 10.70.37.66:/rhs/brick1/d1
[2013-09-05 02:43:30.112422] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now
[2013-09-05 02:43:30.114172] I [glusterd-handler.c:3681:__glusterd_brick_rpc_notify] 0-management: Disconnected from 10.70.37.66:/rhs/brick1/d2
[2013-09-05 02:43:32.023855] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /rhs/brick1/d1 on port 49159
[2013-09-05 02:43:32.025852] I [rpc-clnt.c:974:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-09-05 02:43:32.026039] I [socket.c:3487:socket_init] 0-management: SSL support is NOT enabled
[2013-09-05 02:43:32.026115] I [socket.c:3502:socket_init] 0-management: using system polling thread
[2013-09-05 02:43:32.059390] I [glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick /rhs/brick1/d2 on port 49160
[2013-09-05 02:43:32.061535] I [rpc-clnt.c:974:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-09-05 02:43:32.061780] I [socket.c:3487:socket_init] 0-management: SSL support is NOT enabled
[2013-09-05 02:43:32.061850] I [socket.c:3502:socket_init] 0-management: using system polling thread
[2013-09-05 02:43:33.142722] E [glusterd-utils.c:3797:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/fcdb25f1afb31c273d9517c141c6a7de.socket error: Permission denied
[2013-09-05 02:43:33.143240] I [glusterd-utils.c:3831:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV3 successfully
[2013-09-05 02:43:33.143648] I [glusterd-utils.c:3836:glusterd_nfs_pmap_deregister] 0-: De-registered MOUNTV1 successfully
[2013-09-05 02:43:33.143988] I [glusterd-utils.c:3841:glusterd_nfs_pmap_deregister] 0-: De-registered NFSV3 successfully
[2013-09-05 02:43:33.144281] I [glusterd-utils.c:3846:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v4 successfully
[2013-09-05 02:43:33.144615] I [glusterd-utils.c:3851:glusterd_nfs_pmap_deregister] 0-: De-registered NLM v1 successfully
[2013-09-05 02:43:33.144855] I [glusterd-utils.c:3856:glusterd_nfs_pmap_deregister] 0-: De-registered ACL v3 successfully
[2013-09-05 02:43:33.153140] I [rpc-clnt.c:974:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2013-09-05 02:43:33.153258] I [socket.c:3487:socket_init] 0-management: SSL support is NOT enabled
[2013-09-05 02:43:33.153283] I [socket.c:3502:socket_init] 0-management: using system polling thread
[2013-09-05 02:43:33.158507] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=2236 max=0 total=0
[2013-09-05 02:43:33.158596] I [mem-pool.c:539:mem_pool_destroy] 0-management: size=124 max=0 total=0
[2013-09-05 02:43:33.161343] I [run.c:190:runner_log] 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=dht --first=no
[2013-09-05 02:43:33.163744] I [socket.c:2237:socket_event_handler] 0-transport: disconnecting now
[2013-09-05 02:43:33.190595] I [run.c:190:runner_log] 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=dht --first=no



[2013-09-05 02:43:44.334751] I [glusterd-handler.c:3498:__glusterd_handle_status_volume] 0-management: Received status volume req for volume dht

Comment 3 Jose Castillo 2014-12-17 15:47:30 UTC
I have a customer that had a similar problem, and I managed to reproduce it in the labs. I found out that this may have been solved in bz923164 with the change http://review.gluster.org/#/c/6786/ "glusterd: Fix race in pid file update", so I proceeded to backport that change to a test package based on glusterfs*-3.4.0.59rhs-1.el6rhs in RHS 2.1 and with the test package I created, the problem hasn't appeared any more. 

The errata from bz923164 is https://rhn.redhat.com/errata/RHEA-2014-1278.html and was released only for RHS 3.0. If more testing proves that the fix resolves the issue in RHS 2.x, would it be possible to backport this fix?

Comment 4 Avra Sengupta 2014-12-18 11:01:33 UTC
This behavior is also seen in scenarios where glusterd is down, and the bricks are up. Following are the steps to reproduce it.

1. "pkill glusterd" on one node.
2. Perform any volume set operation from another node.
#gluster volume set test_vol diagnostics.brick-log-level DEBUG
3. Now bring back glusterd on that node.
#service glusterd start
4. Check volume status. The port for the brick(s) hosted on that node will show N/A.

This happens because while importing new volume information from another node, even though we retain the already running bricks that are still part of the volume, we don't retain their port number.

Comment 5 Atin Mukherjee 2015-04-02 04:16:54 UTC
The fix is already into upstream with http://review.gluster.org/9297
BZ should be moved to 'Modified' the moment branching takes place.