Description of problem:
Under some circumstances the cluster fails to relocate Apache httpd and the service becomes unavailable.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux Server release 5.6 (Tikanga)

How reproducible:
Have the httpd service running on nodeA. This creates the pid file. Cut the power to nodeA. The httpd service moves to nodeB. nodeB fences and reboots nodeA. If you then try to relocate httpd to nodeA after it has rejoined the cluster, the service may stop permanently.

I think the problem is in /usr/share/cluster/utils/config-utils.sh. When the cluster tries to start the service on nodeA again, the check_pid_file() function in that script checks whether the pid file exists (and it does, because of the power-off). It then checks whether a process with that PID is running, but not whether that process is httpd. The only check made is whether a process with that PID exists:

    if [ ! -d /proc/`cat "$pid_file"` ]; then
        rm "$pid_file"
        ocf_log debug "PID File \"$pid_file\" Was Removed - PID Does Not Exist";
        return 0;
    fi
    return 1;

After the node reboots, there may be a process with the same PID that is not httpd. In that case the stale pid file is kept and the start fails.

In my opinion it would be nice to delete all pid files when cman starts up.
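The false positive described above can be illustrated in isolation. The following is a minimal sketch, not the code from config-utils.sh: pid_looks_alive is a hypothetical name that wraps the same /proc/PID directory test, and the PID of the current shell stands in for an unrelated process that reused httpd's old PID after the reboot.

```shell
#!/bin/bash
# Hypothetical wrapper around the same test the report quotes:
# succeeds whenever *any* process has the PID stored in the file.
pid_looks_alive() {
    [ -d "/proc/$(cat "$1")" ]
}

# Simulate a stale httpd pid file whose PID was reused after reboot.
# $$ (this shell) plays the role of the unrelated process.
stale_pid_file=$(mktemp)
echo "$$" > "$stale_pid_file"

if pid_looks_alive "$stale_pid_file"; then
    echo "pid file kept although the process is not httpd"
fi

rm -f "$stale_pid_file"
```

Because the test only looks at the existence of the /proc entry, the pid file is treated as valid no matter which program owns the PID.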
@Luca: Thanks for reporting the bug. Your objection is correct and such a scenario can occur. But I'm not sure about deleting all PID files. If a user runs an instance of the application independently of the resource manager, deleting its PID file could cause data corruption (e.g. in a database server). What do you think about adding another condition that checks whether /proc/PID/cmdline contains a given string (e.g. httpd)?
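A rough sketch of what such an extra condition might look like follows. This is an assumption for illustration only, not the attached patch and not the real check_pid_file(): the function name check_pid_file_for and its second argument are hypothetical. It keeps the return convention of the snippet quoted in the report (0 when the pid file is absent or was removed as stale, 1 when a matching process is running).

```shell
#!/bin/bash
# Hypothetical variant of the check; names and arguments are assumptions.
#   $1 = path to the pid file
#   $2 = string expected in /proc/PID/cmdline (e.g. httpd)
# Returns 0 if the pid file is absent or was removed as stale,
# returns 1 if the PID belongs to a live process whose cmdline matches.
check_pid_file_for() {
    local pid_file="$1" expected="$2" pid

    # No pid file: nothing to clean up.
    [ -e "$pid_file" ] || return 0

    pid=$(cat "$pid_file" 2>/dev/null)

    # The process counts as ours only if the PID is numeric, /proc/PID
    # exists, and its NUL-separated cmdline contains the expected string.
    if [[ "$pid" =~ ^[0-9]+$ ]] && [ -d "/proc/$pid" ] &&
       tr '\0' ' ' 2>/dev/null < "/proc/$pid/cmdline" | grep -qF -- "$expected"; then
        return 1
    fi

    # Missing process or a different program: the pid file is stale.
    rm -f "$pid_file"
    return 0
}
```

A caller could use it as `check_pid_file_for "$pid_file" httpd`. A plain substring match is deliberately loose; an unrelated command line that happens to contain "httpd" would still match, so a real agent may want a stricter comparison.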
@Marek: Mine was a quick and dirty solution only for my own problem. I think that your suggestion is much better. Thank you.
Created attachment 623377
Proposed patch

Proposed patch that changes the library and the resource agent for httpd. But we will very likely need to change all agents. Please test it if possible.
This patch shouldn't be necessary. I don't understand how this issue could be possible. These pidfiles should be placed in /var/run/cluster and that directory is cleaned out on startup.
Hi David,
the problem was exactly this: the PID files were not deleted. When reporting the problem I suggested: "In my opinion it would be nice to delete all pid files when cman starts up."

Is it possible that the directory /var/run/cluster is cleaned out only in recent versions? The bug was reported two years ago.
(In reply to Luca Visconti from comment #9)
> Hi David,
> the problem was exactly this: the PID files were not deleted.
> When reporting the problem I suggested: "In my opinion it would be nice to
> delete all pid files when cman starts up."
>
> Is it possible that the directory /var/run/cluster is cleaned out only in
> recent versions? The bug was reported two years ago.

Yep, my guess is that in rhel5 the /var/run directory was not cleaned out, and that this behavior was introduced in rhel6. Nothing should be required now that this bug is targeting rhel 6.

-- Vossel
Luca, Can you confirm this issue is not present in rhel6? If so we can close the issue. -- Vossel
(In reply to David Vossel from comment #11)
> Luca, Can you confirm this issue is not present in rhel6? If so we can close
> the issue.
>
> -- Vossel

Hi David,
I can't do that: I have no environment with this type of installation and RHEL6.