Bug 1402687 - "gluster get-state" reports the port number for a brick process in the Stopped state.
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Atin Mukherjee
QA Contact: Byreddy
URL:
Whiteboard: USM-Gluster integration
Depends On:
Blocks: 1402688
 
Reported: 2016-12-08 07:11 UTC by Byreddy
Modified: 2017-08-30 05:26 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1402688 (view as bug list)
Environment:
Last Closed: 2017-08-09 09:55:14 UTC
Embargoed:


Attachments: none

Description Byreddy 2016-12-08 07:11:31 UTC
Description of problem:
=======================
"gluster get-state" is capturing the port number for the stopped state brick process.

Volume1.Brick1.path: 10.70.41.198:/bricks/brick0/q0
Volume1.Brick1.hostname: 10.70.41.198
Volume1.Brick1.port: 49152         <==========
Volume1.Brick1.rdma_port: 0
Volume1.Brick1.status: Stopped   <===========
Volume1.Brick1.signedin: False
Volume1.Brick2.path: 10.70.41.217:/bricks/brick0/q1
Volume1.Brick2.hostname: 10.70.41.217
Volume1.Brick3.path: 10.70.41.198:/bricks/brick1/q2
Volume1.Brick3.hostname: 10.70.41.198
Volume1.Brick3.port: 49153
Volume1.Brick3.rdma_port: 0
Volume1.Brick3.status: Stopped
Volume1.Brick3.signedin: False
Volume1.Brick4.path: 10.70.41.217:/bricks/brick1/q3
Volume1.Brick4.hostname: 10.70.41.217 


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-7.el7rhgs.x86_64

How reproducible:
=================
Always


Steps to Reproduce:
===================
1. Have a two-node cluster.
2. Create a 2x2 volume and start it.
3. Enable server-side quorum.
4. Stop glusterd on one of the cluster nodes.
5. Capture the local state using "gluster get-state" and check the status and port details of the local bricks.

Actual results:
===============
"gluster get-state" is capturing the port number for the stopped state brick process.

Expected results:
=================
Port details should not be shown if the brick process is in the Stopped state.


Additional info:

Comment 2 Atin Mukherjee 2016-12-08 07:17:52 UTC
Upstream mainline patch http://review.gluster.org/#/c/16064 has been posted for review.

Comment 3 Atin Mukherjee 2016-12-08 13:01:28 UTC
Initially I thought we could reset the port value to 0 when an RPC disconnect is received. But it turns out we can't: if we end up with stale port entries (in the case of an abrupt shutdown of the daemons, where pmap_signout is never received by glusterd), having the port reset to 0 would not help in cleaning up those entries. It looks like we have to live with this problem.
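
To make the trade-off concrete, here is a minimal Python sketch of this reasoning (illustrative only: glusterd itself is implemented in C, and all names below are illustrative, not glusterd identifiers):

class Brick:
    def __init__(self, path):
        self.path = path
        self.port = 0              # last port glusterd allocated to this brick

portmap = {}                       # port -> brick path (the pmap registry)

def allocate(brick, port):
    brick.port = port
    portmap[port] = brick.path

def pmap_signout(brick):
    # Graceful shutdown: the brick signs out and the registry entry is
    # removed, so resetting brick.port to 0 here would be harmless.
    portmap.pop(brick.port, None)

def abrupt_kill(brick):
    # Abrupt shutdown: no signout arrives; the portmap keeps a stale entry
    # and brick.port keeps the last allocated value.
    pass

def restart(brick, new_port):
    # On restart the remembered last port is used to find and drop any stale
    # registry entry. Had brick.port been reset to 0 on disconnect, the stale
    # entry could never be located and cleaned.
    portmap.pop(brick.port, None)
    allocate(brick, new_port)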

Comment 4 Atin Mukherjee 2016-12-08 13:06:04 UTC
And I have to say this is not a flaw in the gluster get-state CLI but rather in glusterd itself: get-state picks up its data from glusterd's in-memory state, which still holds the last allocated port value. I'd like to close this bug as won't fix. Let me know your thoughts.

Comment 5 Byreddy 2016-12-09 03:33:23 UTC
(In reply to Atin Mukherjee from comment #4)
> And I have to say this is not a flaw in the gluster get-state CLI but rather
> in glusterd itself: get-state picks up its data from glusterd's in-memory
> state, which still holds the last allocated port value. I'd like to close
> this bug as won't fix. Let me know your thoughts.

I have a few questions before closing this one:

1) When server-side quorum is not met, glusterd stops the brick processes gracefully, right? If so, why is the last allocated port still present in memory in this case?


2) How will USM behave if it gets port numbers for brick processes that are in the Stopped state?

Comment 6 Atin Mukherjee 2016-12-09 04:59:21 UTC
(In reply to Byreddy from comment #5)
> (In reply to Atin Mukherjee from comment #4)
> > And I have to say this is not a flaw in the gluster get-state CLI but
> > rather in glusterd itself: get-state picks up its data from glusterd's
> > in-memory state, which still holds the last allocated port value. I'd like
> > to close this bug as won't fix. Let me know your thoughts.
> 
> I have a few questions before closing this one:
> 
> 1) When server-side quorum is not met, glusterd stops the brick processes
> gracefully, right? If so, why is the last allocated port still present in
> memory in this case?

As I said earlier, be it a graceful or an abrupt shutdown, we don't reset the port back to 0. There is no way for glusterd to tell whether a shutdown was graceful or abrupt. On a graceful shutdown glusterd receives pmap_signout, which cleans up the portmap entry, and in that case resetting the port to 0 poses no issues; but on an abrupt shutdown there is no way to clean up the stale entry. That's the reason why, when the process is brought back, we first check whether the last port allocated for it has been removed from the portmap, and this is where the fix I sent in would go for a toss, as we would be unable to clean up in that case. So server-side quorum is irrelevant here.

> 
> 
> 2) How will USM behave if it gets port numbers for brick processes that are
> in the Stopped state?

USM/tendrl will filter it out, just as the gluster volume status CLI does: gluster volume status never shows a brick's port if the brick process is not running.
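
For illustration, a minimal Python sketch of such consumer-side filtering over a get-state output file (the parsing assumptions and the "Started" status string are assumptions, not an actual tendrl implementation):

def brick_ports(state_file):
    # Collect per-brick fields from lines like "Volume1.Brick1.port: 49152".
    bricks = {}
    with open(state_file) as f:
        for raw in f:
            line = raw.strip()
            if ": " not in line:
                continue
            key, value = line.split(": ", 1)
            parts = key.split(".")
            if len(parts) == 3 and parts[1].startswith("Brick"):
                name = ".".join(parts[:2])       # e.g. "Volume1.Brick1"
                bricks.setdefault(name, {})[parts[2]] = value
    # Report a port only for bricks whose process is running.
    return {name: fields["port"] for name, fields in bricks.items()
            if fields.get("status") == "Started" and "port" in fields}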

Hope this clarifies your question.

Comment 7 Byreddy 2016-12-09 05:30:43 UTC
(In reply to Atin Mukherjee from comment #6)
> (In reply to Byreddy from comment #5)
> > (In reply to Atin Mukherjee from comment #4)
> > > And I have to say this is not a flaw in the gluster get-state CLI but
> > > rather in glusterd itself: get-state picks up its data from glusterd's
> > > in-memory state, which still holds the last allocated port value. I'd
> > > like to close this bug as won't fix. Let me know your thoughts.
> > 
> > I have a few questions before closing this one:
> > 
> > 1) When server-side quorum is not met, glusterd stops the brick processes
> > gracefully, right? If so, why is the last allocated port still present in
> > memory in this case?
> 
> As I said earlier, be it a graceful or an abrupt shutdown, we don't reset
> the port back to 0. There is no way for glusterd to tell whether a shutdown
> was graceful or abrupt. On a graceful shutdown glusterd receives
> pmap_signout, which cleans up the portmap entry, and in that case resetting
> the port to 0 poses no issues; but on an abrupt shutdown there is no way to
> clean up the stale entry. That's the reason why, when the process is brought
> back, we first check whether the last port allocated for it has been removed
> from the portmap, and this is where the fix I sent in would go for a toss,
> as we would be unable to clean up in that case. So server-side quorum is
> irrelevant here.
> 
> > 
I did some testing on graceful and abrupt shutdown of a brick process and got the results below: for a graceful shutdown, the brick's signedin flag is set to False. I am OK with the abrupt-shutdown case keeping a port number, BUT ***for a graceful shutdown, the port should be zero instead of the old allocated port***

For abrupt shutdown of brick:
Volume1.Brick1.port: 49152
Volume1.Brick1.rdma_port: 0
Volume1.Brick1.status: Stopped
Volume1.Brick1.signedin: True    <====

For graceful shutdown of brick:
Volume1.Brick1.port: 49152
Volume1.Brick1.rdma_port: 0
Volume1.Brick1.status: Stopped
Volume1.Brick1.signedin: False  <=======
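
Based on the observation above, a consumer could tell the two cases apart from the get-state fields. A minimal Python sketch (the field semantics are assumed from the output shown here, not from glusterd documentation):

def shutdown_kind(brick_fields):
    # brick_fields is a dict like {"status": "Stopped", "signedin": "False"}.
    if brick_fields.get("status") != "Stopped":
        return "running"
    # signedin False => the brick signed out (graceful); True => it did not.
    return "graceful" if brick_fields.get("signedin") == "False" else "abrupt"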
 



> > 
> > 2) How will USM behave if it gets port numbers for brick processes that
> > are in the Stopped state?
> 
> USM/tendrl will filter it out, just as the gluster volume status CLI does:
> gluster volume status never shows a brick's port if the brick process is not
> running.
> 
> Hope this clarifies your question.

Comment 12 Gaurav Yadav 2017-08-09 09:55:14 UTC
The fix will be available in the next release.

