Description of problem:
=======================
"gluster get-state" is capturing the port number of brick processes that are in the Stopped state.

Volume1.Brick1.path: 10.70.41.198:/bricks/brick0/q0
Volume1.Brick1.hostname: 10.70.41.198
Volume1.Brick1.port: 49152 <==========
Volume1.Brick1.rdma_port: 0
Volume1.Brick1.status: Stopped <===========
Volume1.Brick1.signedin: False
Volume1.Brick2.path: 10.70.41.217:/bricks/brick0/q1
Volume1.Brick2.hostname: 10.70.41.217
Volume1.Brick3.path: 10.70.41.198:/bricks/brick1/q2
Volume1.Brick3.hostname: 10.70.41.198
Volume1.Brick3.port: 49153
Volume1.Brick3.rdma_port: 0
Volume1.Brick3.status: Stopped
Volume1.Brick3.signedin: False
Volume1.Brick4.path: 10.70.41.217:/bricks/brick1/q3
Volume1.Brick4.hostname: 10.70.41.217

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-7.el7rhgs.x86_64

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Have a two-node cluster.
2. Create a 2x2 volume and start it.
3. Enable server-side quorum.
4. Stop glusterd on one of the cluster nodes.
5. Capture the local state using "gluster get-state" and check the status and port details of the local bricks.

Actual results:
===============
"gluster get-state" captures the port number of brick processes that are in the Stopped state.

Expected results:
=================
Port details should not be shown if the brick process is in the Stopped state.

Additional info:
Upstream mainline patch http://review.gluster.org/#/c/16064 posted for review.
Initially I thought we could reset the port value to 0 when an RPC disconnect is received. But it turns out we can't: if daemons are shut down abruptly, glusterd never receives pmap_signout and we end up with stale port entries, and having the port reset to 0 would prevent us from cleaning those entries up later. It looks like we have to live with this problem.
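The constraint described above can be sketched with a minimal Python model (this is illustrative code, not glusterd source; all names are made up): on graceful shutdown a signout removes the portmap entry, while after an abrupt shutdown the stale entry can only be found and removed later via the remembered last port — which is why resetting that port to 0 would break the cleanup.

```python
# Minimal model of the glusterd portmap behaviour described above.
# Illustrative only -- not glusterd code; all names are hypothetical.

class Portmap:
    def __init__(self):
        self.entries = {}  # port -> brick path

    def sign_in(self, port, brick):
        self.entries[port] = brick

    def sign_out(self, port):
        # Graceful shutdown: pmap_signout removes the entry immediately.
        self.entries.pop(port, None)

    def cleanup_stale(self, brick, last_port):
        # On brick restart, check whether the brick's last allocated
        # port is still registered and remove the stale entry.
        # If last_port had been reset to 0 on disconnect, this lookup
        # could no longer locate the stale registration.
        if self.entries.get(last_port) == brick:
            del self.entries[last_port]

pmap = Portmap()
pmap.sign_in(49152, "/bricks/brick0/q0")

# Abrupt shutdown: no sign_out, so the entry goes stale and survives
# until the brick is restarted and cleaned up via its last known port.
pmap.cleanup_stale("/bricks/brick0/q0", 49152)
print(49152 in pmap.entries)  # False
```

The sketch shows why the last allocated port has to stay in memory: it is the only key by which a stale entry from an abrupt shutdown can be found again.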
And I have to say that this is not a flaw in the gluster get-state CLI but rather in glusterd itself: the get-state CLI picks up the data from memory, which still holds the last port value allocated. I'd like to close this bug as WONTFIX. Let me know your thoughts.
(In reply to Atin Mukherjee from comment #4)
> And I have to say here this is not a flaw in gluster get-state CLI rather
> in glusterd itself. gluster get-state CLI picks up the data from in memory
> which still has the last port value allocated. I'd like to close this bug
> as won't fix. Let me know your thoughts.

I have a few questions before closing this one:

1) When server-side quorum is not met, glusterd stops the brick processes gracefully, right? If so, why would the last allocated port details still be in memory in this case?

2) How will USM behave/understand if it gets port numbers for brick processes that are in the Stopped state?
(In reply to Byreddy from comment #5)
> (In reply to Atin Mukherjee from comment #4)
> > And I have to say here this is not a flaw in gluster get-state CLI rather
> > in glusterd itself. gluster get-state CLI picks up the data from in memory
> > which still has the last port value allocated. I'd like to close this bug
> > as won't fix. Let me know your thoughts.
>
> I have few questions before closing this one:
>
> 1) When server side quorum is not met, glusterd will stop the bricks
> processes gracefully right? if yes, then why the last allocated port
> details will be there in memory in this case?

As I said earlier, whether the shutdown is graceful or abrupt, we don't reset the port back to 0, and glusterd has no way of knowing which kind of shutdown occurred. On a graceful shutdown glusterd receives pmap_signout, which cleans up the portmap entry, so resetting the port to 0 would cause no problems there. But on an abrupt shutdown there is no pmap_signout to clean up the stale entry; when the process is brought back, we first check whether the last port allocated to it has been removed from the portmap, and this is where the fix I sent would fall apart, since we would no longer be able to clean up the stale entry. So server-side quorum is irrelevant here.

> 2) How USM will behave/understand if it gets the port numbers for the
> stopped state brick processes?

USM/tendrl will filter it out, just like the gluster volume status CLI does: gluster volume status never shows the brick port if the brick process is not running.

Hope this clarifies your question.
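The consumer-side filtering described in that reply can be illustrated with a short sketch (hypothetical code, not actual USM/tendrl or CLI logic): a consumer of get-state output treats the port as meaningless unless the brick process is running.

```python
# Hypothetical consumer-side filtering of get-state brick records.
# Not actual USM/tendrl code -- just an illustration of the rule that
# a port is only meaningful while the brick process is running.

def effective_port(brick):
    """Return the brick port, or None if the brick is not running."""
    if brick.get("status") != "Started":
        return None
    return brick.get("port")

brick = {
    "path": "10.70.41.198:/bricks/brick0/q0",
    "port": 49152,
    "status": "Stopped",
}
print(effective_port(brick))  # None
```

This mirrors what `gluster volume status` does: the last allocated port may still sit in glusterd's memory, but it is simply never displayed for a stopped brick.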
(In reply to Atin Mukherjee from comment #6)
> (In reply to Byreddy from comment #5)
> > (In reply to Atin Mukherjee from comment #4)
> > > And I have to say here this is not a flaw in gluster get-state CLI
> > > rather in glusterd itself. gluster get-state CLI picks up the data
> > > from in memory which still has the last port value allocated. I'd like
> > > to close this bug as won't fix. Let me know your thoughts.
> >
> > I have few questions before closing this one:
> >
> > 1) When server side quorum is not met, glusterd will stop the bricks
> > processes gracefully right? if yes, then why the last allocated port
> > details will be there in memory in this case?
>
> As I said earlier be it a graceful or abrupt shutdown we don't reset the
> port back to 0. There is no way for glusterd to understand if its a
> graceful shutdown or abrupt shutdown. On graceful shutdown glusterd
> receives pmap_signout which cleans up the portmap entry and in that case
> resetting the port to 0 has no issues, but if its an abrupt shutdown there
> is no way to clean up the stale entry and that's the reason when the
> process is brought back we first check if the last port allocated for it
> has been removed from the portmap entry and this is where the fix which I
> sent in will go for a toss as the we will be unable to clean up in this
> case. So server side quorum is of irrelevance in this case.

I did some testing on graceful and abrupt shutdown of the brick process and got the results below; on graceful shutdown, the brick's signedin flag is set to False.
I am OK with an abruptly shut down brick process keeping its port number, BUT ***on graceful shutdown, the port should be zero instead of the old allocated port***.

For abrupt shutdown of a brick:
Volume1.Brick1.port: 49152
Volume1.Brick1.rdma_port: 0
Volume1.Brick1.status: Stopped
Volume1.Brick1.signedin: True <====

For graceful shutdown of a brick:
Volume1.Brick1.port: 49152
Volume1.Brick1.rdma_port: 0
Volume1.Brick1.status: Stopped
Volume1.Brick1.signedin: False <=======

> > 2) How USM will behave/understand if it gets the port numbers for the
> > stopped state brick processes?
>
> USM/tendrl will filter it out just like gluster volume status CLI does.
> gluster volume status never shows the brick port if the brick process is
> not running.
>
> Hope this clarifies your question.
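Going by the output above, the signedin flag appears to distinguish the two shutdown cases, so a consumer could in principle combine it with the status field. A rough sketch of that inference (illustrative only, based solely on the get-state output shown in this comment, not on glusterd internals):

```python
# Rough sketch based only on the get-state output shown above:
# a Stopped brick with signedin=False went down gracefully, while
# signedin=True suggests an abrupt shutdown left a stale port entry.
# Illustrative logic, not glusterd code.

def classify_stopped_brick(status, signedin):
    if status != "Stopped":
        return "running"
    return "abrupt-shutdown" if signedin else "graceful-shutdown"

print(classify_stopped_brick("Stopped", False))  # graceful-shutdown
print(classify_stopped_brick("Stopped", True))   # abrupt-shutdown
```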
The fix will be available in the next release.