+++ This bug was initially created as a clone of Bug #1444596 +++
+++ This bug was initially created as a clone of Bug #1443972 +++

Description of problem:
*********************************
On an existing nfs-ganesha cluster with one volume, I disabled nfs-ganesha and shared_storage and enabled brick multiplexing on the cluster. After enabling brick multiplexing I created multiple volumes. The new volumes got the same PID, except the pre-existing one (which is expected, per devel). When I then tried to enable shared storage, it failed with the error "Another transaction in progress". After that I enabled shared_storage from the vol file and restarted glusterd. This caused all the volume bricks to go offline. I then ran `gluster vol start force`, but that did not bring the bricks up. I also disabled and re-enabled brick multiplexing and restarted glusterd along with `volume start force`, but the bricks still do not come up.
***********************************************************************

Status of volume: vol2
Gluster process                                        TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.132:/gluster/brick2/b1                  N/A       N/A        N       N/A
Brick 10.70.46.128:/gluster/brick2/b2                  N/A       N/A        N       N/A
Brick 10.70.46.138:/gluster/brick2/b3                  N/A       N/A        N       N/A
Brick 10.70.46.140:/gluster/brick2/b4                  N/A       N/A        N       N/A
Self-heal Daemon on localhost                          N/A       N/A        Y       5324
Self-heal Daemon on dhcp46-140.lab.eng.blr.redhat.com  N/A       N/A        Y       3096
Self-heal Daemon on dhcp46-128.lab.eng.blr.redhat.com  N/A       N/A        Y       2740
Self-heal Daemon on dhcp46-138.lab.eng.blr.redhat.com  N/A       N/A        Y       2576

Task Status of Volume vol2
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol3
Gluster process                                        TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.132:/gluster/brick3/b1                  N/A       N/A        N       N/A
Brick 10.70.46.128:/gluster/brick3/b2                  N/A       N/A        N       N/A
Brick 10.70.46.138:/gluster/brick3/b3                  N/A       N/A        N       N/A
Brick 10.70.46.140:/gluster/brick3/b4                  N/A       N/A        N       N/A
Self-heal Daemon on localhost                          N/A       N/A        Y       5324
Self-heal Daemon on dhcp46-128.lab.eng.blr.redhat.com  N/A       N/A        Y       2740
Self-heal Daemon on dhcp46-140.lab.eng.blr.redhat.com  N/A       N/A        Y       3096
Self-heal Daemon on dhcp46-138.lab.eng.blr.redhat.com  N/A       N/A        Y       2576

Task Status of Volume vol3
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol4
Gluster process                                        TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.132:/gluster/brick4/b1                  N/A       N/A        N       N/A
Brick 10.70.46.128:/gluster/brick4/b2                  N/A       N/A        N       N/A
Brick 10.70.46.138:/gluster/brick4/b3                  N/A       N/A        N       N/A
Brick 10.70.46.140:/gluster/brick4/b4                  N/A       N/A        N       N/A
Self-heal Daemon on localhost                          N/A       N/A        Y       5324
Self-heal Daemon on dhcp46-140.lab.eng.blr.redhat.com  N/A       N/A        Y       3096
Self-heal Daemon on dhcp46-128.lab.eng.blr.redhat.com  N/A       N/A        Y       2740
Self-heal Daemon on dhcp46-138.lab.eng.blr.redhat.com  N/A       N/A        Y       2576

Task Status of Volume vol4
------------------------------------------------------------------------------
There are no active volume tasks

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-22.el7rhgs.x86_64

How reproducible:
Tried once

Steps to Reproduce:
1. 4-node ganesha cluster; create a volume.
2. Disable ganesha, disable shared storage.
3. Enable brick multiplexing, create multiple volumes.
4. Enable shared storage.
5. Issue seen; then restart glusterd.
6. gluster vol start force

Actual results:
************************
After the glusterd restart, all the bricks of the volumes that were created after enabling brick multiplexing went down and never came up.

Expected results:
***************************
A glusterd restart should not make volume bricks go down. `gluster vol start force` should bring the bricks back up.

Additional info:
Sosreports to follow

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-04-20 07:16:37 EDT ---

This bug is automatically being proposed for the current release of Red Hat Gluster Storage 3 under active development, by setting the release flag 'rhgs-3.3.0' to '?'. If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from surabhi on 2017-04-20 07:25:37 EDT ---

Sosreports available @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1443972/

--- Additional comment from Atin Mukherjee on 2017-04-20 11:25:31 EDT ---

Looks similar to BZ 1442787.

--- Additional comment from Atin Mukherjee on 2017-04-21 02:37:45 EDT ---

Refer to https://bugzilla.redhat.com/show_bug.cgi?id=1443991#c6 for the initial analysis.

--- Additional comment from Worker Ant on 2017-04-22 23:45:48 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd: cli is not showing correct status after restart glusterd while mux is on) posted (#1) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-04-23 00:20:04 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd: cli is not showing correct status after restart glusterd while mux is on) posted (#2) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-04-28 06:31:58 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusterd while mux is on) posted (#3) for review on master by MOHIT AGRAWAL (moagrawa)
--- Additional comment from Worker Ant on 2017-04-28 06:38:08 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusterd while mux is on) posted (#4) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-04-29 00:30:47 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusterd while mux is on) posted (#5) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-04-29 00:44:16 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusterd while mux is on) posted (#6) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-04-29 00:56:36 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusterd while mux is on) posted (#7) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-04-29 11:10:21 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusterd while mux is on) posted (#8) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-04-29 21:32:31 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusterd while mux is on) posted (#9) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-04-29 21:58:09 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusterd while mux is on) posted (#10) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-04-30 04:01:11 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusterd while mux is on) posted (#11) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-04-30 05:15:51 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusterd while mux is on) posted (#12) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-04-30 08:12:01 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusterd while mux is on) posted (#13) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-04-30 09:06:26 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusterd while mux is on) posted (#14) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-04-30 09:20:21 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusterd while mux is on) posted (#15) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-04-30 23:59:00 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusterd while mux is on) posted (#16) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-05-01 03:31:21 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusterd while mux is on) posted (#17) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-05-01 08:05:49 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusterd while mux is on) posted (#18) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-05-02 00:03:39 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusterd while mux is on) posted (#19) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-05-02 00:57:52 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusterd while mux is on) posted (#20) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-05-02 03:15:24 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusterd while mux is on) posted (#21) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-05-02 05:06:51 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusterd while mux is on) posted (#22) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-05-02 07:02:21 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusterd while mux is on) posted (#23) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-05-02 12:20:30 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusterd while mux is on) posted (#24) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-05-03 06:42:02 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd: cli is not showing correct status after restart glusterd while mux is on) posted (#25) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-05-03 14:42:02 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd(WIP): cli is not showing correct status after restart glusterd while mux is on) posted (#27) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-05-03 14:42:07 EDT ---

REVIEW: https://review.gluster.org/17168 (glusterd: cleanup pidfile on pmap signout) posted (#1) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-05-03 14:58:28 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd: socketfile & pidfile related fixes for brick-multiplexing) posted (#28) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-05-03 15:02:30 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd: socketfile & pidfile related fixes for brick-multiplexing) posted (#29) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-05-04 00:16:56 EDT ---

REVIEW: https://review.gluster.org/17168 (glusterd: cleanup pidfile on pmap signout) posted (#2) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-05-04 00:17:01 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd: socketfile & pidfile related fixes for brick-multiplexing) posted (#30) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-05-04 05:55:41 EDT ---

REVIEW: https://review.gluster.org/17168 (glusterd: cleanup pidfile on pmap signout) posted (#3) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-05-04 05:55:46 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd: socketfile & pidfile related fixes for brick-multiplexing feature) posted (#31) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-05-07 13:17:40 EDT ---

REVIEW: https://review.gluster.org/17168 (glusterd: cleanup pidfile on pmap signout) posted (#4) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-05-07 13:17:45 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd: socketfile & pidfile related fixes for brick multiplexing feature) posted (#32) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-05-08 09:13:44 EDT ---

COMMIT: https://review.gluster.org/17168 committed in master by Jeff Darcy (jeff.us)

------

commit 3d35e21ffb15713237116d85711e9cd1dda1688a
Author: Atin Mukherjee <amukherj>
Date:   Wed May 3 12:17:30 2017 +0530

    glusterd: cleanup pidfile on pmap signout

    This patch ensures
    1. the brick pidfile is cleaned up on pmap signout
    2. a pmap signout event is sent for all the bricks when a brick
       process shuts down.

    Change-Id: I7606a60775b484651d4b9743b6037b40323931a2
    BUG: 1444596
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: https://review.gluster.org/17168
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Jeff Darcy <jeff.us>

--- Additional comment from Worker Ant on 2017-05-08 10:00:47 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd: socketfile & pidfile related fixes for brick multiplexing feature) posted (#33) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-05-08 10:18:35 EDT ---

REVIEW: https://review.gluster.org/17101 (glusterd: socketfile & pidfile related fixes for brick multiplexing feature) posted (#34) for review on master by MOHIT AGRAWAL (moagrawa)

--- Additional comment from Worker Ant on 2017-05-08 21:30:05 EDT ---

COMMIT: https://review.gluster.org/17101 committed in master by Atin Mukherjee (amukherj)

------

commit 21c7f7baccfaf644805e63682e5a7d2a9864a1e6
Author: Mohit Agrawal <moagrawa>
Date:   Mon May 8 19:29:22 2017 +0530

    glusterd: socketfile & pidfile related fixes for brick multiplexing
    feature

    Problem: While brick multiplexing is on, after restarting glusterd the
    CLI is not showing the pid of all brick processes in all volumes.

    Solution: While brick-mux is on, all local bricks communicate through
    one UNIX socket, but as per the current code (glusterd_brick_start)
    glusterd tries to communicate with a separate UNIX socket for each
    volume, whose path is populated based on brick-name and vol-name.
    Because of the multiplexing design only one UNIX socket is opened, so
    glusterd throws a poller error and is not able to fetch the correct
    status of the brick process through the CLI process. To resolve the
    problem, write a new function glusterd_set_socket_filepath_for_mux
    that is called by glusterd_brick_start to validate the existence of
    the socketpath. To avoid continuous EPOLLERR errors in the logs,
    update the socket_connect code.

    Test: To reproduce the issue, follow the steps below:
    1) Create two distributed volumes (dist1 and dist2)
    2) Set cluster.brick-multiplex on
    3) Kill glusterd
    4) Run `gluster v status`
    After applying the patch, it shows the correct pid for all volumes.

    BUG: 1444596
    Change-Id: I5d10af69dea0d0ca19511f43870f34295a54a4d2
    Signed-off-by: Mohit Agrawal <moagrawa>
    Reviewed-on: https://review.gluster.org/17101
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Prashanth Pai <ppai>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>

--- Additional comment from Worker Ant on 2017-05-08 21:37:56 EDT ---

REVIEW: https://review.gluster.org/17208 (posix: Send SIGKILL in 2nd attempt) posted (#1) for review on master by Atin Mukherjee (amukherj)
REVIEW: https://review.gluster.org/17209 (glusterd: cleanup pidfile on pmap signout) posted (#1) for review on release-3.10 by Atin Mukherjee (amukherj)
*** Bug 1449003 has been marked as a duplicate of this bug. ***
REVIEW: https://review.gluster.org/17210 (glusterd: socketfile & pidfile related fixes for brick multiplexing feature) posted (#1) for review on release-3.10 by MOHIT AGRAWAL (moagrawa)
COMMIT: https://review.gluster.org/17210 committed in release-3.10 by Raghavendra Talur (rtalur)

------

commit 38496dd45780e651647c294b782268557ce31836
Author: Mohit Agrawal <moagrawa>
Date:   Mon May 8 19:29:22 2017 +0530

    glusterd: socketfile & pidfile related fixes for brick multiplexing
    feature

    Problem: While brick multiplexing is on, after restarting glusterd the
    CLI is not showing the pid of all brick processes in all volumes.

    Solution: While brick-mux is on, all local bricks communicate through
    one UNIX socket, but as per the current code (glusterd_brick_start)
    glusterd tries to communicate with a separate UNIX socket for each
    volume, whose path is populated based on brick-name and vol-name.
    Because of the multiplexing design only one UNIX socket is opened, so
    glusterd throws a poller error and is not able to fetch the correct
    status of the brick process through the CLI process. To resolve the
    problem, write a new function glusterd_set_socket_filepath_for_mux
    that is called by glusterd_brick_start to validate the existence of
    the socketpath. To avoid continuous EPOLLERR errors in the logs,
    update the socket_connect code.

    Test: To reproduce the issue, follow the steps below:
    1) Create two distributed volumes (dist1 and dist2)
    2) Set cluster.brick-multiplex on
    3) Kill glusterd
    4) Run `gluster v status`
    After applying the patch, it shows the correct pid for all volumes.

    > BUG: 1444596
    > Change-Id: I5d10af69dea0d0ca19511f43870f34295a54a4d2
    > Signed-off-by: Mohit Agrawal <moagrawa>
    > Reviewed-on: https://review.gluster.org/17101
    > Smoke: Gluster Build System <jenkins.org>
    > Reviewed-by: Prashanth Pai <ppai>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Atin Mukherjee <amukherj>
    > (cherry picked from commit 21c7f7baccfaf644805e63682e5a7d2a9864a1e6)

    Change-Id: I1892c80b9ffa93974f20c92d421660bcf93c4cda
    BUG: 1449002
    Signed-off-by: Mohit Agrawal <moagrawa>
    Reviewed-on: https://review.gluster.org/17210
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>
    Reviewed-by: Prashanth Pai <ppai>
COMMIT: https://review.gluster.org/17209 committed in release-3.10 by Raghavendra Talur (rtalur)

------

commit 68047830e46f1ee2bd17d16ca6206cd0123e1ed2
Author: Atin Mukherjee <amukherj>
Date:   Wed May 3 12:17:30 2017 +0530

    glusterd: cleanup pidfile on pmap signout

    This patch ensures
    1. the brick pidfile is cleaned up on pmap signout
    2. a pmap signout event is sent for all the bricks when a brick
       process shuts down.

    > Reviewed-on: https://review.gluster.org/17168
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Jeff Darcy <jeff.us>
    > (cherry picked from commit 3d35e21ffb15713237116d85711e9cd1dda1688a)

    Change-Id: I7606a60775b484651d4b9743b6037b40323931a2
    BUG: 1449002
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: https://review.gluster.org/17209
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Prashanth Pai <ppai>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Jeff Darcy <jeff.us>
REVIEW: https://review.gluster.org/17259 (posix: Send SIGKILL in 2nd attempt) posted (#1) for review on release-3.10 by Atin Mukherjee (amukherj)
COMMIT: https://review.gluster.org/17259 committed in release-3.10 by Raghavendra Talur (rtalur)

------

commit 92b2725a1a698954dc3073ee15f43972d1a427ce
Author: Atin Mukherjee <amukherj>
Date:   Tue May 9 07:05:18 2017 +0530

    posix: Send SIGKILL in 2nd attempt

    Commit 21c7f7ba inadvertently changed the signal sent on the 2nd
    attempt to terminate the brick process from SIGKILL to SIGTERM, so a
    brick that survives the first SIGTERM is never forcibly killed. This
    patch fixes that problem by restoring SIGKILL for the 2nd attempt.

    > Reviewed-on: https://review.gluster.org/17208
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    > Smoke: Gluster Build System <jenkins.org>
    > (cherry picked from commit 4f4ad03e0c4739d3fe1b0640ab8b4e1ffc985374)

    Change-Id: I856df607b7109a215f2a2a4827ba3ea42d8a9729
    BUG: 1449002
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: https://review.gluster.org/17259
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Prashanth Pai <ppai>
    Reviewed-by: Raghavendra Talur <rtalur>
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.10.2, please open a new bug report.