Bug 1491060
Summary: | PID File handling: self-heal-daemon pid file leaves stale pid and indiscriminately kills pid when glusterd is started | ||
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Ben Werthmann <ben> |
Component: | glusterd | Assignee: | bugs <bugs> |
Status: | CLOSED EOL | QA Contact: | |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.10 | CC: | amukherj, ben, bugs, joe, moagrawa, peljasz |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2018-06-20 18:30:19 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1258561, 1464072 | ||
Bug Blocks: |
Description
Ben Werthmann
2017-09-12 23:32:01 UTC
Not sure if these patches will help. Looks like there may be a fix for this already:

https://review.gluster.org/#/c/13580/
https://review.gluster.org/#/c/17601

Specifically with the 'glusterd kills any process number in the stale pid file.' behavior.

May also lead to situations like this:

    $ gluster vol heal $vol statistics
    Gathering crawl statistics on volume $vol has been unsuccessful on bricks that are down. Please check if all brick processes are running.

or

    gluster v heal testvol statistics
    Gathering crawl statistics on volume testvol has been unsuccessful: Staging failed on vm1. Error: Self-heal daemon is not running. Check self-heal daemon log file.

Also occurs with 3.10.5 from ppa:gluster/glusterfs-3.10

Upgrading to urgent as this affects stability of gluster in general.

    commit 220d406ad13d840e950eef001a2b36f87570058d
    Author: Gaurav Kumar Garg <garg.gaurav52>
    Date: Wed Mar 2 17:42:07 2016 +0530

        glusterd: Gluster should keep PID file in correct location

        Currently Gluster keeps process pid information of all the daemons and brick processes in Gluster configuration file directory (ie., /var/lib/glusterd/*). These pid files should be seperate from configuration files. Deletion of the configuration file directory might result into serious problems. Also, /var/run/gluster is the default placeholder directory for pid files. So, with this fix Gluster will keep all process pid information of all processes in /var/run/gluster/* directory.

        Change-Id: Idb09e3fccb6a7355fbac1df31082637c8d7ab5b4
        BUG: 1258561
        Signed-off-by: Gaurav Kumar Garg <ggarg>
        Signed-off-by: Saravanakumar Arumugam <sarumuga>
        Reviewed-on: https://review.gluster.org/13580
        Tested-by: MOHIT AGRAWAL <moagrawa>
        Smoke: Gluster Build System <jenkins.org>
        CentOS-regression: Gluster Build System <jenkins.org>
        Reviewed-by: Atin Mukherjee <amukherj>

The above commit takes care of this issue. Please note this fix is available in the release-3.12 branch. Since this is a major change in the way pidfiles are placed, I don't have a plan to cherry-pick this into the release-3.10 branch.

Ben - Do you mind if I close this issue now? As I mentioned in the earlier comment, a stable release branch may not accept this change in behaviour. So if you're fine with the workaround, you can choose to stick to the release-3.10 branch; otherwise, please upgrade to release-3.12.

I think there should be a minimal fix for 3.10. The minimal fix in this context is:
- glusterd should only kill the pid in the glustershd pid file when the pid is a glusterfs process

I will also run my tests with 3.12 and report results.

I'll just chip in my vote - I emailed the mailing list for help too - please fix this in 3.10. It's a freaking frustrating problem.

Killing indiscriminate processes should be considered a major bug and a fix should most definitely be implemented in all supported branches.

Mohit - can you please backport https://review.gluster.org/13580 to the release-3.10 branch?

upstream patch : https://review.gluster.org/18025

(In reply to Atin Mukherjee from comment #11)
> upstream patch : https://review.gluster.org/18025

This was a wrong patch link. https://review.gluster.org/18484 is the right one and it got merged in the 3.10 branch.

This bug is reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained. As a result this bug is being closed. If the bug persists on a maintained version of gluster or against the mainline gluster repository, request that it be reopened and the Version field be marked appropriately.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
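
For readers who land on this closed bug: the minimal fix requested in this thread (only kill the pid in the glustershd pid file when that pid is a glusterfs process) can be illustrated with a short Python sketch. This is not glusterd's code and not the content of the merged patch. The function names, the `proc_root` parameter, and the assumption that the self-heal daemon shows up as `glusterfs` in `/proc/<pid>/comm` are all hypothetical choices made for the example, and it is Linux-only.

```python
import os
import signal
from pathlib import Path
from typing import Optional


def read_pid(pidfile: Path) -> Optional[int]:
    """Return the PID stored in pidfile, or None if it is missing or malformed."""
    try:
        text = pidfile.read_text().strip()
    except OSError:
        return None
    if not text.isdigit():
        return None
    pid = int(text)
    return pid if pid > 0 else None


def process_name(pid: int, proc_root: Path = Path("/proc")) -> Optional[str]:
    """Return the command name from <proc_root>/<pid>/comm, or None if unreadable."""
    try:
        return (proc_root / str(pid) / "comm").read_text().strip()
    except OSError:
        return None


def kill_if_expected(
    pidfile: Path,
    expected: str = "glusterfs",  # assumption: name the daemon runs under
    proc_root: Path = Path("/proc"),
    sig: int = signal.SIGTERM,
) -> bool:
    """Signal the PID in pidfile only if that process has the expected name.

    Returns True if a signal was sent, False if the pid file was missing,
    malformed, or pointed at some other (possibly unrelated) process.
    """
    pid = read_pid(pidfile)
    if pid is None:
        return False
    if process_name(pid, proc_root) != expected:
        # Stale pid file: the PID was reused by another program, or is gone.
        return False
    os.kill(pid, sig)
    return True
```

A caller would pass the actual pid file path for its installation, for example `kill_if_expected(Path(pidfile_path))`. The check narrows the window but does not close it: the PID could still be recycled between reading `comm` and sending the signal, so a real fix would likely also need to validate more than the process name.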