Bug 1333711
Summary: | [scale] Brick process does not start after node reboot | ||
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Prasanna Kumar Kalever <prasanna.kalever> |
Component: | glusterd | Assignee: | Prasanna Kumar Kalever <prasanna.kalever> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.8.0 | CC: | amukherj, bugs, nerawat, prasanna.kalever, sasundar, smohan, storage-qa-internal |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-3.8rc2 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | 1322805 | Environment: | |
Last Closed: | 2016-06-16 14:05:31 UTC | Type: | Bug |
Bug Depends On: | 1322306, 1322805 | ||
Bug Blocks: | 1323564 |
Description
Prasanna Kumar Kalever
2016-05-06 08:00:34 UTC
REVIEW: http://review.gluster.org/14234 (rpc: define client port range) posted (#1) for review on release-3.8 by Prasanna Kumar Kalever (pkalever)

REVIEW: http://review.gluster.org/14235 (glusterd: add defence mechanism to avoid brick port clashes) posted (#1) for review on release-3.8 by Prasanna Kumar Kalever (pkalever)

COMMIT: http://review.gluster.org/14234 committed in release-3.8 by Raghavendra G (rgowdapp)
------
commit 4c58dd7f03e393b6dd5c01af3e7f4c786ba12e3f
Author: Prasanna Kumar Kalever <prasanna.kalever>
Date: Thu Apr 14 19:02:19 2016 +0530

rpc: define client port range

Problem:
When bind-insecure is 'off', all clients bind to secure ports. If the secure ports are exhausted, the client can no longer bind to a secure port and instead gets a random port, which is insecure. We have seen clients obtain port numbers in the range 49152-65535, which glusterd's pmap_registry reserves for bricks, so this leads to port clashes between client and brick processes.

Solution:
If we define a separate port range for clients to use when the secure ports are exhausted, we can avoid most port clashes within gluster processes. Clashes with other non-gluster processes are still possible, although the chances are very low; that is a separate problem, which will be handled in upcoming patches.
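As a rough illustration of the split proposed in the Solution above, the following Python sketch models port selection over a set of already-used ports. The brick range 49152-65535 and the notion of secure ports come from the commit message; the fallback range 1024-49151, the constant names and the function pick_client_port are assumptions made for illustration and are not taken from the GlusterFS source.

```python
# Ports below this value are the privileged ("secure") ports.
SECURE_PORT_CEILING = 1024

# From the commit message: glusterd's pmap_registry reserves this range for bricks.
BRICK_PORT_START = 49152
BRICK_PORT_END = 65535

# Assumption for illustration only: when secure ports run out, clients fall
# back to the unprivileged ports that lie below the brick range.
CLIENT_FALLBACK_START = SECURE_PORT_CEILING
CLIENT_FALLBACK_END = BRICK_PORT_START - 1


def pick_client_port(in_use):
    """Pick a client port, given a set of ports that are already taken.

    Secure ports are preferred. If none is left, a port from the fallback
    range is returned. A port from the brick range is never returned.
    Returns None when nothing is available.
    """
    # Try secure ports first, from the highest downwards.
    for port in range(SECURE_PORT_CEILING - 1, 0, -1):
        if port not in in_use:
            return port

    # Secure ports are exhausted: use the fallback range, which by
    # construction does not overlap with the brick range.
    for port in range(CLIENT_FALLBACK_END, CLIENT_FALLBACK_START - 1, -1):
        if port not in in_use:
            return port

    return None
```

With every secure port taken, pick_client_port(set(range(1, 1024))) returns 49151, which stays below the brick range.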
Backport of:
> Change-Id: Ib5ce05991aa1290ccb17f6f04ffd65caf411feaf
> BUG: 1322805
> Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever>
> Reviewed-on: http://review.gluster.org/13998
> Smoke: Gluster Build System <jenkins.com>
> NetBSD-regression: NetBSD Build System <jenkins.org>
> CentOS-regression: Gluster Build System <jenkins.com>
> Reviewed-by: Atin Mukherjee <amukherj>
> Reviewed-by: Raghavendra G <rgowdapp>

Change-Id: I2ab9608ddbefcdf5987d817c23dd066010148e19
BUG: 1333711
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever>
Reviewed-on: http://review.gluster.org/14234
Tested-by: Prasanna Kumar Kalever <pkalever>
Smoke: Gluster Build System <jenkins.com>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.com>
Reviewed-by: Atin Mukherjee <amukherj>

COMMIT: http://review.gluster.org/14235 committed in release-3.8 by Raghavendra G (rgowdapp)
------
commit 610a3f5bcc9f3443da55d857b162c83d50fa3a6b
Author: Prasanna Kumar Kalever <prasanna.kalever>
Date: Wed Apr 27 19:12:19 2016 +0530

glusterd: add defence mechanism to avoid brick port clashes

Intro:
Currently glusterd maintains the portmap registry, which tracks the ports between 49152 and 65535 that are free to use. The registry is initialized once and updated as and when glusterd sees ports being used. Glusterd first looks in the portmap registry for a port marked FREE, then checks that the port is currently free using connect(), and then passes it to the brick process, which has to bind to it.

Problem:
There is a time gap between glusterd checking the port with connect() and the brick process actually binding to it. In this gap any other process could occupy the port, in which case the brick fails to bind and exits.
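The time gap described in the Problem paragraph above can be reproduced on loopback with plain sockets. The sketch below is Python rather than glusterd's C code, and the function names port_looks_free and demonstrate_race are hypothetical; only the sequence (probe with connect(), then bind later) follows the commit message.

```python
import errno
import socket


def port_looks_free(port, host="127.0.0.1"):
    """Probe a port the way the commit message describes: try to connect().

    connect_ex() returns 0 when something is already listening, so any
    non-zero result is treated as "free at this moment".
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        return probe.connect_ex((host, port)) != 0


def demonstrate_race(host="127.0.0.1"):
    """Return (free_at_check, bind_errno) for one run of the race."""
    # Ask the kernel for an unused port number, then release it again.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tmp:
        tmp.bind((host, 0))
        port = tmp.getsockname()[1]

    # Step 1: the check. Nothing is listening yet, so the port looks free.
    free_at_check = port_looks_free(port, host)

    # Step 2: inside the time gap, a foreign process takes the port.
    foreign = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    foreign.bind((host, port))
    foreign.listen(1)

    # Step 3: the "brick" now tries to bind to the port it was handed.
    brick = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        brick.bind((host, port))
        bind_errno = 0
    except OSError as exc:
        bind_errno = exc.errno
    finally:
        brick.close()
        foreign.close()

    return free_at_check, bind_errno
```

The check in step 1 is only valid at the instant it runs; the bind in step 3 is the first operation that actually claims the port, which is why the failure surfaces in the brick and not in the checker.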
Case 1:
To avoid a gluster client process occupying the port supplied by glusterd, we have separated the client port range from the brick port range. More at http://review.gluster.org/#/c/13998/

Case 2: (Handled by this patch)
To avoid some other foreign process occupying the port supplied by glusterd, this patch implements a mechanism that returns the EADDRINUSE error code to glusterd, upon which glusterd allocates a new port and tries to restart the brick process with it.

Note:
In case of glusterd restarts, i.e. runner_run_nowait(), there is no way to handle Case 2, because runner_run_nowait() does not wait for the return/exit code of the executed command (the brick process). Hence, as of now, in that case we cannot know with what error the brick failed to connect.

This patch also fixes runner_end() to perform some cleanup w.r.t. return values.

Backport of:
> Change-Id: Iec52e7f5d87ce938d173f8ef16aa77fd573f2c5e
> BUG: 1322805
> Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever>
> Reviewed-on: http://review.gluster.org/14043
> Tested-by: Prasanna Kumar Kalever <pkalever>
> Reviewed-by: Atin Mukherjee <amukherj>
> Smoke: Gluster Build System <jenkins.com>
> NetBSD-regression: NetBSD Build System <jenkins.org>
> CentOS-regression: Gluster Build System <jenkins.com>
> Reviewed-by: Raghavendra G <rgowdapp>

Change-Id: Id7d8351a0082b44310177e714edc0571ad0f7195
BUG: 1333711
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever>
Reviewed-on: http://review.gluster.org/14235
Tested-by: Prasanna Kumar Kalever <pkalever>
Smoke: Gluster Build System <jenkins.com>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.com>
Reviewed-by: Atin Mukherjee <amukherj>

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.
glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
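For readers who want a concrete picture of the Case 2 defence from commit 610a3f5 (the brick reports EADDRINUSE, glusterd allocates a new port and restarts the brick), here is a minimal Python sketch. It is not the glusterd implementation: PortRegistry is a toy stand-in for the portmap registry, start_brick is a caller-supplied function that returns 0 or an errno value, and the max_attempts limit is an assumption added so the loop terminates.

```python
import errno

# From the commit message: ports tracked by glusterd's portmap registry.
BRICK_PORT_START = 49152
BRICK_PORT_END = 65535


class PortRegistry:
    """Toy stand-in for the portmap registry (hypothetical, illustration only)."""

    def __init__(self, start=BRICK_PORT_START, end=BRICK_PORT_END):
        self.start = start
        self.end = end
        self.used = set()

    def allocate(self):
        """Return the lowest port not yet marked used, and mark it used."""
        for port in range(self.start, self.end + 1):
            if port not in self.used:
                self.used.add(port)
                return port
        return None


def start_brick_with_retry(registry, start_brick, max_attempts=5):
    """Start a brick, moving to a new port whenever bind reports EADDRINUSE.

    start_brick(port) must return 0 on success or an errno value on failure.
    Returns the port the brick ended up on, or None if it could not start.
    """
    for _ in range(max_attempts):
        port = registry.allocate()
        if port is None:
            return None  # registry exhausted

        rc = start_brick(port)
        if rc == 0:
            return port
        if rc != errno.EADDRINUSE:
            return None  # a different failure; a new port would not help
        # EADDRINUSE: the port stays marked as used, so the next
        # iteration allocates a different one.

    return None
```

The retry only works because the caller waits for the brick's exit status. That matches the Note in the commit message: on the runner_run_nowait() path no exit code is collected, so there is nothing to drive the retry.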