+++ This bug was initially created as a clone of Bug #1726219 +++

Description of problem:
Volume info output is not consistent across the cluster: two nodes report the volume as Stopped, while one node reports it as Started.

Node1:
[root@dhcp35-50 ~]# gluster v info test3

Volume Name: test3
Type: Replicate
Volume ID: 11e30537-ce20-42d6-8a5e-a2668dc6b983
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.35.50:/bricks/brick1/tes3
Brick2: 10.70.46.216:/bricks/brick1/tes3
Brick3: 10.70.46.132:/bricks/brick1/tes3
Options Reconfigured:
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off

[root@dhcp35-50 ~]# gluster v status test3
Staging failed on 10.70.46.216. Error: Volume test3 is not started
Staging failed on 10.70.46.132. Error: Volume test3 is not started

Node2:
[root@dhcp46-216 ~]# gluster v info test3 | egrep 'Volume ID|Status'
Volume ID: 11e30537-ce20-42d6-8a5e-a2668dc6b983
Status: Stopped

Node3:
[root@dhcp46-132 ~]# gluster v info test3 | egrep 'Volume ID|Status'
Volume ID: 11e30537-ce20-42d6-8a5e-a2668dc6b983
Status: Stopped

==================================================

Version-Release number of selected component (if applicable):

How reproducible:
2/2

Steps to Reproduce:
1. Create two replica 3 volumes.

2. Stop one volume, executing the command on node 1 (35.50):
[root@dhcp35-50 ~]# gluster v stop test3
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: test3: success

3. Kill shd on one node:
# kill -15 5928

4. Check gluster v info from all 3 nodes. The volume is reported as Stopped in the output of all three nodes.

5. Now start the volume from node 1:
# gluster v start test3
volume start: test3: failed: Commit failed on localhost. Please check log file for details.
The output says the volume start failed.

6. Now check the vol info output on all three nodes:

Node1:
[root@dhcp35-50 ~]# gluster v info test3 | egrep 'Volume ID|Status'
Volume ID: 11e30537-ce20-42d6-8a5e-a2668dc6b983
Status: Started

Node2:
[root@dhcp46-216 ~]# gluster v info test3 | egrep 'Volume ID|Status'
Volume ID: 11e30537-ce20-42d6-8a5e-a2668dc6b983
Status: Stopped

Node3:
[root@dhcp46-132 ~]# gluster v info test3 | egrep 'Volume ID|Status'
Volume ID: 11e30537-ce20-42d6-8a5e-a2668dc6b983
Status: Stopped

Actual results:
As described above in Steps to Reproduce.

Expected results:
1. The volume should start without any error (confirmed that the volume starts on the older release, glusterfs-fuse-3.12.2-47.2.el7rhgs.x86_64).
2. Command output should be consistent when executed from any node (all automation cases randomly pick a node as the master for command execution).
3. Volume start force should bring up shd on the node where it was killed (confirmed on the older release, glusterfs-fuse-3.12.2-47.2.el7rhgs.x86_64).

Additional info:
There is also a discrepancy in the output of vol status when executed from different nodes.

[root@dhcp35-50 ~]# gluster v status test3
Staging failed on 10.70.46.132. Error: Volume test3 is not started
Staging failed on 10.70.46.216. Error: Volume test3 is not started

[root@dhcp46-132 ~]# gluster v status test3
Volume test3 is not started

[root@dhcp46-216 ~]# gluster v status test3
Volume test3 is not started

[root@dhcp46-216 ~]# gluster v start test3 force
volume start: test3: failed: Commit failed on dhcp35-50.lab.eng.blr.redhat.com. Please check log file for details.
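A quick way to check expected result 2 (consistent output from any node) is to compare the reported volume Status across all peers. Below is a minimal sketch, assuming the three node IPs from this report and passwordless root ssh between them; adjust NODES and VOL for other setups:

#!/bin/bash
# Compare the Status of a volume as reported by each peer.
# 'gluster volume info' reads the local copy of the volume configuration,
# so any mismatch here means the peers' views have diverged.
NODES="10.70.35.50 10.70.46.216 10.70.46.132"
VOL="test3"

for node in $NODES; do
    status=$(ssh root@"$node" gluster volume info "$VOL" | awk '/^Status:/ {print $2}')
    echo "$node $status"
done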
REVIEW: https://review.gluster.org/23007 (glusterd/shd: Return null proc if process is not running.) posted (#2) for review on master by mohammed rafi kc
REVIEW: https://review.gluster.org/23007 (glusterd/shd: Return null proc if process is not running.) merged (#5) on master by Amar Tumballi
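With the fix merged, a verification pass for the expected results above could look like the following. This is only a sketch, reusing the volume name and setup from this report (stop test3, then kill shd on one peer), and is not part of the merged patch:

# On node 1 (10.70.35.50): the start should now succeed.
gluster volume start test3

# On each of the three nodes: Status should read Started everywhere.
gluster volume info test3 | egrep 'Volume ID|Status'

# 'start force' should respawn the killed shd on the affected peer.
gluster volume start test3 force
gluster volume status test3     # Self-heal Daemon entries should be online on all nodes
pgrep -af glustershd            # run on the node where shd was killed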