Description of problem:
Under some circumstances the cluster fails to relocate Apache httpd and the service becomes unavailable.
Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux Server release 5.6 (Tikanga)
How reproducible:
Have the httpd service running on nodeA. This will create the pidfile.
Cut the power to nodeA. The httpd service will move to nodeB.
NodeB will fence and reboot nodeA.
If you then try to relocate httpd back to nodeA after it has rejoined the cluster, the service may stop permanently.
The problem, I think, is in /usr/share/cluster/utils/config-utils.sh.
When the cluster tries to start the service on nodeA again, the check_pid_file() function checks whether the pid file exists (and it does, because of the power cut). It then checks whether the PID stored in the file is running, but not whether that PID belongs to httpd.
The check only tests whether a process with that PID exists:
    if [ ! -d /proc/`cat "$pid_file"` ]; then
        rm "$pid_file"
        ocf_log debug "PID File \"$pid_file\" Was Removed - PID Does Not Exist";
        return 0;
    fi
    return 1;
After the node reboots, there may well be a process with the same PID that is not httpd!
In my opinion it would be nice to delete all pid files when cman starts up.
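The failure mode described in this report can probably be reproduced without cutting power. The sketch below is only an illustration: it assumes a Linux /proc filesystem, uses a temporary file as a stand-in for the real pid file, and uses a background sleep as a stand-in for whatever unrelated process happens to get the old PID after a reboot. The directory test is the same one quoted from check_pid_file().

```shell
#!/bin/bash
# Simulate a stale pid file: it names the PID of an unrelated live process,
# standing in for a PID that was reused after the node rebooted.

pid_file=$(mktemp)            # hypothetical pid file path, for illustration only
sleep 60 &                    # unrelated process that is definitely not httpd
unrelated_pid=$!
echo "$unrelated_pid" > "$pid_file"

# Same test as the quoted check: it only asks whether /proc/<pid> exists.
if [ ! -d "/proc/$(cat "$pid_file")" ]; then
    stale_detected=yes
else
    stale_detected=no         # the pid file is wrongly treated as still valid
fi
echo "stale pid file detected: $stale_detected"

# Clean up the stand-in process and the temporary file.
kill "$unrelated_pid"
rm -f "$pid_file"
```

Because the PID exists, the check reports the pid file as valid even though the process is not httpd.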
@Luca:
Thanks for reporting the bug. Your objection is correct and such a scenario can occur. But I'm not sure about deleting all PID files. If a user runs an instance of the application independently of the resource manager, we could cause data corruption (e.g. in a database server).
What do you think about adding another condition that checks whether /proc/PID/cmdline contains a given string (e.g. httpd)?
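A rough sketch of what such a condition might look like follows. This is not the attached patch and not the actual config-utils.sh code: pid_matches_name is a hypothetical helper name, and the substring match on the command line is an assumption about how "contains given string" would be implemented.

```shell
#!/bin/bash
# pid_matches_name PIDFILE STRING
# Hypothetical helper (not part of config-utils.sh). Returns 0 only if the
# PID stored in PIDFILE belongs to a process whose command line contains STRING.
pid_matches_name() {
    local pid_file=$1 expected=$2 pid

    [ -r "$pid_file" ] || return 1
    pid=$(cat "$pid_file")

    # Reject empty or non-numeric pid file contents.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac

    # No such process (or cmdline not readable): not a match.
    [ -r "/proc/$pid/cmdline" ] || return 1

    # /proc/<pid>/cmdline is NUL-separated; convert NULs to spaces, then match.
    tr '\0' ' ' < "/proc/$pid/cmdline" | grep -q -- "$expected"
}
```

A caller could treat the pid file as stale when `pid_matches_name "$pid_file" httpd` fails. A plain substring match can still give false positives (for example an editor that has a file with "httpd" in its name open), so an agent would probably want a stricter comparison.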
Created attachment 623377
Proposed patch
Proposed patch that changes the library and the resource agent for httpd. We will very likely need to change all the other agents as well.
Please test it if possible.
This patch shouldn't be necessary.
I don't understand how this issue could be possible. These pidfiles should be placed in /var/run/cluster and that directory is cleaned out on startup.
Hi David,
the problem was exactly this: the PID files were not deleted.
When reporting the problem I suggested: "In my opinion will be nice to delete all pid file when cman startup."
Is it possible that the /var/run/cluster directory is only cleaned out in recent versions? The bug was reported two years ago.
(In reply to Luca Visconti from comment #9)
> Hi David,
> the problem was exactly this: the PID files were not deleted.
> When reporting the problem I suggested: "In my opinion will be nice to
> delete all pid file when cman startup."
>
> Is it possible that the /var/run/cluster directory is only cleaned out in
> recent versions? The bug was reported two years ago.
Yep, my guess is that in RHEL 5 the /var/run directory was not cleaned out on startup, and that the cleanup was introduced in RHEL 6. Nothing should be required now that this bug is targeting RHEL 6.
-- Vossel
(In reply to David Vossel from comment #11)
> Luca, Can you confirm this issue is not present in rhel6? If so we can close
> the issue.
>
> -- Vossel
Hi David,
I can't do that: I have no environment with this type of installation on RHEL 6.