Bug 1637459 - volume status doesn't show bricks and shd from one node while it shows from other nodes
Summary: volume status doesn't show bricks and shd from one node while it shows from other nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.4.z Batch Update 2
Assignee: Sanju
QA Contact: Bala Konda Reddy M
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-10-09 10:03 UTC by Nag Pavan Chilakam
Modified: 2018-12-17 17:07 UTC
CC List: 8 users

Fixed In Version: glusterfs-3.12.2-27
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-12-17 17:07:11 UTC
Embargoed:




Links:
Red Hat Product Errata RHBA-2018:3827 (last updated 2018-12-17 17:07:27 UTC)

Description Nag Pavan Chilakam 2018-10-09 10:03:54 UTC
Description of problem:
=========================
I have landed my testbed in a situation where I don't see brick information in volume status when the command is issued from one node.
However, I am able to see it from another node.
Also, the shd (self-heal daemon) entries for all nodes are not seen in volume status from this node, but they are seen from another node.


Problematic node view:
===========================
In the case below I don't even see the bricks:
[root@dhcp35-38 glusterfs]# gluster v status z-rep3-9
Status of volume: z-rep3-9
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Self-heal Daemon on localhost               N/A       N/A        Y       963  
Self-heal Daemon on dhcp35-184.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       19618
Self-heal Daemon on dhcp35-140.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       5480 
 
Task Status of Volume z-rep3-9
------------------------------------------------------------------------------
There are no active volume tasks
 
In the case below I don't see the shd of the remaining 3 nodes, i.e. the nodes which don't host bricks:

[root@dhcp35-38 glusterfs]# gluster v status y-rep3-8
Status of volume: y-rep3-8
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp35-140.lab.eng.blr.redhat.com:/gl
uster/brick8/y-rep3-8                       49152     0          Y       3216 
Brick dhcp35-38.lab.eng.blr.redhat.com:/glu
ster/brick8/y-rep3-8                        49153     0          Y       26507
Brick dhcp35-184.lab.eng.blr.redhat.com:/gl
uster/brick8/y-rep3-8                       49152     0          Y       3266 
Self-heal Daemon on localhost               N/A       N/A        Y       963  
Self-heal Daemon on dhcp35-184.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       19618
Self-heal Daemon on dhcp35-140.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       5480 
 
Task Status of Volume y-rep3-8
------------------------------------------------------------------------------
There are no active volume tasks
 

Same info fetched from the good node:
================================
[root@dhcp35-140 test-scripts]# gluster v status y-rep3-8
Status of volume: y-rep3-8
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp35-140.lab.eng.blr.redhat.com:/gl
uster/brick8/y-rep3-8                       49152     0          Y       3216
Brick dhcp35-38.lab.eng.blr.redhat.com:/glu
ster/brick8/y-rep3-8                        49153     0          Y       26507
Brick dhcp35-184.lab.eng.blr.redhat.com:/gl
uster/brick8/y-rep3-8                       49152     0          Y       3266
Self-heal Daemon on localhost               N/A       N/A        Y       5480
Self-heal Daemon on dhcp35-218.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       10764
Self-heal Daemon on dhcp35-83.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       22560
Self-heal Daemon on dhcp35-127.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       5931
Self-heal Daemon on dhcp35-38.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       963
Self-heal Daemon on dhcp35-184.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       19618

Task Status of Volume y-rep3-8
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp35-140 test-scripts]#  gluster v status z-rep3-9
Status of volume: z-rep3-9
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp35-83.lab.eng.blr.redhat.com:/glu
ster/brick9/z-rep3-9                        49152     0          Y       3260
Brick dhcp35-127.lab.eng.blr.redhat.com:/gl
uster/brick9/z-rep3-9                       49152     0          Y       3237
Brick dhcp35-218.lab.eng.blr.redhat.com:/gl
uster/brick9/z-rep3-9                       49152     0          Y       3247
Self-heal Daemon on localhost               N/A       N/A        Y       5480
Self-heal Daemon on dhcp35-184.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       19618
Self-heal Daemon on dhcp35-38.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       963
Self-heal Daemon on dhcp35-218.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       10764
Self-heal Daemon on dhcp35-127.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       5931
Self-heal Daemon on dhcp35-83.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       22560

Task Status of Volume z-rep3-9
------------------------------------------------------------------------------
There are no active volume tasks



Peer status shows all peers as connected.


Version-Release number of selected component (if applicable):
=======================
3.12.2-18.1


How reproducible:
===============
Hit it once.


Steps to Reproduce:
1. Created about 22 volumes of type 1x3.
2. Started pumping I/O from 8 FUSE clients to 8 of those volumes.
3. Started creating ~90 new volumes, starting and then deleting them; repeated this for about 3 iterations.
4. At the start of the 4th iteration, stopped this volume create/delete process.
5. During this process, killed a brick on n2 and was bringing bricks back up for some volumes using start force.
6. Then restarted glusterd on n2 (see the command sketch after this list).
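
A minimal command sketch of the above sequence, assuming hypothetical volume names (myvolN, scratchN), brick paths (/gluster/brickN), and peer hostnames (n1, n2, n3); the real testbed used the dhcp35-* hosts shown in the output above:

# Steps 1-2: create and start 1x3 volumes, then drive I/O from FUSE clients
for i in $(seq 1 22); do
    gluster volume create myvol$i replica 3 \
        n1:/gluster/brick$i/myvol$i n2:/gluster/brick$i/myvol$i n3:/gluster/brick$i/myvol$i force
    gluster volume start myvol$i
done

# Step 3: churn of volume create/start/delete, repeated for a few iterations
for i in $(seq 1 90); do
    gluster volume create scratch$i n1:/gluster/b$i/s$i n2:/gluster/b$i/s$i n3:/gluster/b$i/s$i force
    gluster volume start scratch$i
    gluster --mode=script volume stop scratch$i
    gluster --mode=script volume delete scratch$i
done

# Step 5: kill one brick process on n2 (PID taken from the Pid column of 'gluster v status'),
# then bring bricks back for some volumes with start force
kill -9 "$BRICK_PID"        # $BRICK_PID is a placeholder for the brick PID on n2
gluster volume start myvol1 force

# Step 6: restart glusterd on n2
systemctl restart glusterd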



Actual results:
==============
Incorrect volume status output is shown when the command is issued from n2.

Comment 2 Nag Pavan Chilakam 2018-10-09 11:01:09 UTC
sosreports and gluster-health reports @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1637459

Comment 3 Atin Mukherjee 2018-10-10 03:46:16 UTC
I'd like to highlight that when writing down steps to reproduce, please mention explicitly from which nodes the commands were issued, as that is a crucial data point for starting to analyze the issue. Unfortunately, it is not clear to me what exactly has been tested here.

Comment 6 Atin Mukherjee 2018-10-12 03:26:05 UTC
Now I understand that we have a fix posted for this mismatch in the caps value at https://review.gluster.org/#/c/21336/ . It would be worth updating the reproducer steps here as well, Sanju.

Comment 7 Sanju 2018-10-13 09:09:09 UTC
Reproducing steps:

1. In a cluster of n nodes, create a volume using bricks hosted on any n-1 nodes.
2. After volume creation, restart glusterd on any node.
3. Check peer status from the node where glusterd was restarted. The peer which is not hosting any bricks will be in the Rejected state (a command sketch follows below).

Thanks,
Sanju
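
A hedged shell sketch of this reproducer, assuming a hypothetical 3-node cluster (peers n1, n2, n3) where the volume's bricks are placed only on n1 and n2:

# Step 1: create and start a volume whose bricks live on only n-1 of the n peers
gluster volume create repro-vol n1:/gluster/brick1/repro n2:/gluster/brick1/repro force
gluster volume start repro-vol

# Step 2: restart glusterd on any node (run on that node)
systemctl restart glusterd

# Step 3: from the restarted node, check peer status; the brick-less peer (n3 here)
# is expected to show up in the Rejected state
gluster peer status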

Comment 13 errata-xmlrpc 2018-12-17 17:07:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3827

