Bug 1449002
Summary: | [Brick Multiplexing] : Bricks for multiple volumes going down after glusterd restart and not coming back up after volume start force | |
---|---|---|---
Product: | [Community] GlusterFS | Reporter: | Atin Mukherjee <amukherj> |
Component: | core | Assignee: | bugs <bugs> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 3.10 | CC: | amukherj, bugs, moagrawa, nchilaka, rhinduja, rtalur, sbairagy, sbhaloth, storage-qa-internal |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | brick-multiplexing | ||
Fixed In Version: | glusterfs-3.10.2 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | 1444596 | Environment: | |
Last Closed: | 2017-05-31 20:44:07 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1444596 | ||
Bug Blocks: | 1442603, 1449003 |
Description

Atin Mukherjee, 2017-05-09 03:54:18 UTC
REVIEW: https://review.gluster.org/17209 (glusterd: cleanup pidfile on pmap signout) posted (#1) for review on release-3.10 by Atin Mukherjee (amukherj)

*** Bug 1449003 has been marked as a duplicate of this bug. ***

REVIEW: https://review.gluster.org/17210 (glusterd: socketfile & pidfile related fixes for brick multiplexing feature) posted (#1) for review on release-3.10 by MOHIT AGRAWAL (moagrawa)

COMMIT: https://review.gluster.org/17210 committed in release-3.10 by Raghavendra Talur (rtalur)

------
commit 38496dd45780e651647c294b782268557ce31836
Author: Mohit Agrawal <moagrawa>
Date: Mon May 8 19:29:22 2017 +0530

glusterd: socketfile & pidfile related fixes for brick multiplexing feature

Problem: With brick multiplexing on, the CLI does not show the pid of every brick process in every volume after glusterd is restarted.

Solution: With brick multiplexing on, all local brick processes communicate through one UNIX socket, but the current code (glusterd_brick_start) tries to communicate with a separate UNIX socket for each volume, with the path populated from the brick name and volume name. Because the multiplexing design opens only one UNIX socket, this raises a poller error, and the CLI cannot fetch the correct status of the brick process. To resolve the problem, a new function, glusterd_set_socket_filepath_for_mux, is called by glusterd_brick_start to validate the existence of the socket path. The socket_connect code is also updated to avoid continuous EPOLLERR errors in the logs.
Test: To reproduce the issue, follow these steps:
1) Create two distributed volumes (dist1 and dist2)
2) Set cluster.brick-multiplex to on
3) Kill glusterd
4) Run the command: gluster v status
After applying the patch, the correct pid is shown for all volumes.

> BUG: 1444596
> Change-Id: I5d10af69dea0d0ca19511f43870f34295a54a4d2
> Signed-off-by: Mohit Agrawal <moagrawa>
> Reviewed-on: https://review.gluster.org/17101
> Smoke: Gluster Build System <jenkins.org>
> Reviewed-by: Prashanth Pai <ppai>
> NetBSD-regression: NetBSD Build System <jenkins.org>
> CentOS-regression: Gluster Build System <jenkins.org>
> Reviewed-by: Atin Mukherjee <amukherj>
> (cherry picked from commit 21c7f7baccfaf644805e63682e5a7d2a9864a1e6)

Change-Id: I1892c80b9ffa93974f20c92d421660bcf93c4cda
BUG: 1449002
Signed-off-by: Mohit Agrawal <moagrawa>
Reviewed-on: https://review.gluster.org/17210
Smoke: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Atin Mukherjee <amukherj>
Reviewed-by: Prashanth Pai <ppai>

COMMIT: https://review.gluster.org/17209 committed in release-3.10 by Raghavendra Talur (rtalur)

------
commit 68047830e46f1ee2bd17d16ca6206cd0123e1ed2
Author: Atin Mukherjee <amukherj>
Date: Wed May 3 12:17:30 2017 +0530

glusterd: cleanup pidfile on pmap signout

This patch ensures that:
1. the brick pidfile is cleaned up on pmap signout
2. a pmap signout event is sent for all the bricks when a brick process shuts down
> Reviewed-on: https://review.gluster.org/17168
> Smoke: Gluster Build System <jenkins.org>
> NetBSD-regression: NetBSD Build System <jenkins.org>
> CentOS-regression: Gluster Build System <jenkins.org>
> Reviewed-by: Jeff Darcy <jeff.us>
> (cherry picked from commit 3d35e21ffb15713237116d85711e9cd1dda1688a)

Change-Id: I7606a60775b484651d4b9743b6037b40323931a2
BUG: 1449002
Signed-off-by: Atin Mukherjee <amukherj>
Reviewed-on: https://review.gluster.org/17209
Smoke: Gluster Build System <jenkins.org>
Reviewed-by: Prashanth Pai <ppai>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Jeff Darcy <jeff.us>

REVIEW: https://review.gluster.org/17259 (posix: Send SIGKILL in 2nd attempt) posted (#1) for review on release-3.10 by Atin Mukherjee (amukherj)

COMMIT: https://review.gluster.org/17259 committed in release-3.10 by Raghavendra Talur (rtalur)

------
commit 92b2725a1a698954dc3073ee15f43972d1a427ce
Author: Atin Mukherjee <amukherj>
Date: Tue May 9 07:05:18 2017 +0530

posix: Send SIGKILL in 2nd attempt

Commit 21c7f7ba changed the signal used in the 2nd attempt to terminate the brick process from SIGKILL to SIGTERM, so a brick that does not exit on SIGTERM is never forcibly killed. This patch fixes the problem by restoring SIGKILL for the 2nd attempt.
> Reviewed-on: https://review.gluster.org/17208
> NetBSD-regression: NetBSD Build System <jenkins.org>
> CentOS-regression: Gluster Build System <jenkins.org>
> Reviewed-by: Pranith Kumar Karampuri <pkarampu>
> Smoke: Gluster Build System <jenkins.org>
> (cherry picked from commit 4f4ad03e0c4739d3fe1b0640ab8b4e1ffc985374)

Change-Id: I856df607b7109a215f2a2a4827ba3ea42d8a9729
BUG: 1449002
Signed-off-by: Atin Mukherjee <amukherj>
Reviewed-on: https://review.gluster.org/17259
Smoke: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Prashanth Pai <ppai>
Reviewed-by: Raghavendra Talur <rtalur>

This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.10.2, please open a new bug report.