Description of problem:
*********************************
On an existing nfs-ganesha cluster with one volume, I disabled nfs-ganesha and shared_storage and then enabled brick multiplexing on the cluster. After enabling brick multiplexing I created multiple volumes. The bricks of the new volumes all got the same PID, except those of the pre-existing volume (which is expected as per devel). When I then tried to enable shared storage, the command failed with the error "Another transaction in progress". After that I enabled shared_storage from the vol file and restarted glusterd. This took all the volume bricks offline. I then ran gluster vol start force, but that did not bring the bricks up. I also disabled and re-enabled brick multiplexing and restarted glusterd along with a volume start force, but the bricks still do not come up.
***********************************************************************
Status of volume: vol2
Gluster process                                        TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.132:/gluster/brick2/b1                  N/A       N/A        N       N/A
Brick 10.70.46.128:/gluster/brick2/b2                  N/A       N/A        N       N/A
Brick 10.70.46.138:/gluster/brick2/b3                  N/A       N/A        N       N/A
Brick 10.70.46.140:/gluster/brick2/b4                  N/A       N/A        N       N/A
Self-heal Daemon on localhost                          N/A       N/A        Y       5324
Self-heal Daemon on dhcp46-140.lab.eng.blr.redhat.com  N/A       N/A        Y       3096
Self-heal Daemon on dhcp46-128.lab.eng.blr.redhat.com  N/A       N/A        Y       2740
Self-heal Daemon on dhcp46-138.lab.eng.blr.redhat.com  N/A       N/A        Y       2576

Task Status of Volume vol2
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol3
Gluster process                                        TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.132:/gluster/brick3/b1                  N/A       N/A        N       N/A
Brick 10.70.46.128:/gluster/brick3/b2                  N/A       N/A        N       N/A
Brick 10.70.46.138:/gluster/brick3/b3                  N/A       N/A        N       N/A
Brick 10.70.46.140:/gluster/brick3/b4                  N/A       N/A        N       N/A
Self-heal Daemon on localhost                          N/A       N/A        Y       5324
Self-heal Daemon on dhcp46-128.lab.eng.blr.redhat.com  N/A       N/A        Y       2740
Self-heal Daemon on dhcp46-140.lab.eng.blr.redhat.com  N/A       N/A        Y       3096
Self-heal Daemon on dhcp46-138.lab.eng.blr.redhat.com  N/A       N/A        Y       2576

Task Status of Volume vol3
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol4
Gluster process                                        TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.132:/gluster/brick4/b1                  N/A       N/A        N       N/A
Brick 10.70.46.128:/gluster/brick4/b2                  N/A       N/A        N       N/A
Brick 10.70.46.138:/gluster/brick4/b3                  N/A       N/A        N       N/A
Brick 10.70.46.140:/gluster/brick4/b4                  N/A       N/A        N       N/A
Self-heal Daemon on localhost                          N/A       N/A        Y       5324
Self-heal Daemon on dhcp46-140.lab.eng.blr.redhat.com  N/A       N/A        Y       3096
Self-heal Daemon on dhcp46-128.lab.eng.blr.redhat.com  N/A       N/A        Y       2740
Self-heal Daemon on dhcp46-138.lab.eng.blr.redhat.com  N/A       N/A        Y       2576

Task Status of Volume vol4
------------------------------------------------------------------------------
There are no active volume tasks

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-22.el7rhgs.x86_64

How reproducible:
Tried once

Steps to Reproduce:
1. 4-node ganesha cluster; create a volume.
2. Disable ganesha, disable shared storage.
3. Enable brick multiplexing, create multiple volumes.
4. Enable shared storage.
5. Issue seen; restart glusterd.
6. gluster vol start force
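For reference, the steps above map roughly to the following CLI sequence. This is a sketch only: the volume name and replica layout are illustrative assumptions; the brick paths are taken from the vol2 status output above.

  # disable ganesha and shared storage on the existing cluster
  gluster nfs-ganesha disable
  gluster volume set all cluster.enable-shared-storage disable

  # turn on brick multiplexing, then create additional volumes
  gluster volume set all cluster.brick-multiplex on
  gluster volume create vol2 replica 2 \
      10.70.46.132:/gluster/brick2/b1 10.70.46.128:/gluster/brick2/b2 \
      10.70.46.138:/gluster/brick2/b3 10.70.46.140:/gluster/brick2/b4
  gluster volume start vol2

  # steps 4/5: this is where the "Another transaction in progress" error was seen
  gluster volume set all cluster.enable-shared-storage enable
  systemctl restart glusterd

  # step 6: does not bring the bricks back on the affected build
  gluster volume start vol2 force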
Actual results:
************************
After the glusterd restart, all the bricks of the volumes created after enabling brick multiplexing went down and never came back up.

Expected results:
***************************
A glusterd restart should not cause volume bricks to go down. gluster vol start force should bring the bricks back up.

Additional info:
Sosreports to follow
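As a side note (not from the original report): one quick way to check whether bricks on a node are actually multiplexed into a single process, and whether a start force brought them back, is to compare the PIDs in volume status against the running glusterfsd processes:

  # multiplexed bricks on the same node should report the same Pid
  gluster volume status

  # with multiplexing on, expect one glusterfsd per node hosting many bricks
  ps -C glusterfsd -o pid,cmd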
Sosreports available @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1443972/
Looks similar to BZ 1442787
Refer to https://bugzilla.redhat.com/show_bug.cgi?id=1443991#c6 for the initial analysis.
Upstream patch: https://review.gluster.org/#/c/17101/
Upstream patches: https://review.gluster.org/#/q/topic:bug-1444596

Downstream patches:
https://code.engineering.redhat.com/gerrit/#/c/105595/
https://code.engineering.redhat.com/gerrit/#/c/105596/
Retried the same steps on 3.8.4-32 and am not seeing the issue anymore; hence marking as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774