Bug 704582

Summary: Under some circumstances the cluster fails to relocate Apache httpd and the service becomes unavailable.
Product: Red Hat Enterprise Linux 6 Reporter: Luca Visconti <l.visconti>
Component: resource-agents Assignee: David Vossel <dvossel>
Status: CLOSED WORKSFORME QA Contact: Cluster QE <mspqa-list>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.3 CC: agk, cfeist, cluster-maint, edamato, fdinitto, lhh, lnovich, l.visconti, mgrac, mnovacek, tlavigne
Target Milestone: rc   
Target Release: 6.4   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-07-17 19:40:05 UTC Type: ---
Attachments:
Description Flags
Proposed patch mgrac: review?

Description Luca Visconti 2011-05-13 16:23:39 UTC
Description of problem:
Under some circumstances the cluster fails to relocate Apache httpd and the service becomes unavailable.


Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux Server release 5.6 (Tikanga)

How reproducible:
Have the httpd service running on nodeA. This will create the pidfile.
Cut the power to nodeA. The httpd service will move to nodeB.
NodeB will fence and reboot nodeA.
Relocating httpd back to nodeA after it has rejoined the cluster may then cause the service to stop permanently.

I think the problem is in /usr/share/cluster/utils/config-utils.sh.
When the cluster tries to start the service on nodeA again, the check_pid_file() function in that script checks whether the PID file exists (it does, because of the power cut). It then checks whether a process with that PID is running, but not whether that process is httpd.

The check only tests whether a process with that PID exists:

if [ ! -d /proc/`cat "$pid_file"` ]; then	
   rm "$pid_file"
   ocf_log debug "PID File \"$pid_file\" Was Removed - PID Does Not Exist";
   return 0;
fi

return 1;

After the node reboots, there may well be a process with the same PID that is not httpd!
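
The false positive can be reproduced on a single Linux machine without a cluster. The sketch below wraps the quoted check in a hypothetical function, original_check, and points a PID file at an unrelated live process. The function names, the ocf_log stub, and the use of sleep as the unrelated process are assumptions made for illustration; none of them come from config-utils.sh.

```shell
#!/bin/bash
# Stand-in for the cluster logging helper so the sketch runs on its own.
ocf_log() { echo "$@" >&2; }

# The check quoted above, wrapped in a hypothetical function.
# Returns 0 if the PID file was stale and has been removed, 1 otherwise.
original_check() {
    local pid_file="$1"
    if [ ! -d /proc/`cat "$pid_file"` ]; then
        rm "$pid_file"
        ocf_log debug "PID File \"$pid_file\" Was Removed - PID Does Not Exist"
        return 0
    fi
    return 1
}

# Simulate a PID file left over from the power cut whose PID now
# belongs to an unrelated process (here: a background sleep).
demo_stale_pid_file() {
    local pid_file other_pid
    pid_file=$(mktemp)
    sleep 30 &
    other_pid=$!
    echo "$other_pid" > "$pid_file"

    if original_check "$pid_file"; then
        echo "stale PID file removed"
    else
        echo "PID file kept: PID $other_pid is alive, but it is not httpd"
    fi

    kill "$other_pid" 2>/dev/null
    rm -f "$pid_file"
}
```

Calling demo_stale_pid_file should take the "PID file kept" branch, which is the situation that prevents the agent from starting httpd.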

In my opinion it would be nice to delete all PID files when cman starts up.

Comment 1 Marek Grac 2012-10-01 08:56:09 UTC
@Luca:

Thanks for reporting the bug. Your objection is correct and such a scenario can occur. But I'm not sure about deleting all PID files. If a user runs an instance of the application independently of the resource manager, we could cause data corruption (e.g. in a database server).

What do you think about adding another condition that checks whether /proc/PID/cmdline contains a given string (e.g. httpd)?
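
A minimal sketch of that extra condition is below. It is not the attached patch: the function name check_pid_file_cmdline and its two-argument interface are hypothetical, and it assumes a Linux /proc filesystem.

```shell
#!/bin/bash
# Hypothetical helper, not the patch attached to this bug.
# Treats a PID file as stale unless the PID exists AND its command line
# mentions the expected program name.
#
# Returns 0 if the PID file is absent or stale (removing it when stale),
# 1 if a matching process appears to be running.
check_pid_file_cmdline() {
    local pid_file="$1"
    local expected="$2"   # e.g. "httpd"
    local pid

    [ -e "$pid_file" ] || return 0

    pid=$(cat "$pid_file" 2>/dev/null)

    # Empty or non-numeric content cannot name a live process.
    case "$pid" in
        ''|*[!0-9]*) rm -f "$pid_file"; return 0 ;;
    esac

    # No such process: the PID file is stale.
    if [ ! -d "/proc/$pid" ]; then
        rm -f "$pid_file"
        return 0
    fi

    # /proc/PID/cmdline is NUL-separated; turn NULs into spaces before matching.
    if ! tr '\0' ' ' 2>/dev/null < "/proc/$pid/cmdline" | grep -q -- "$expected"; then
        rm -f "$pid_file"
        return 0
    fi

    return 1
}
```

A call such as check_pid_file_cmdline /var/run/cluster/apache/apache:web.pid httpd (the path here is only an example) would then report a recycled PID as stale. Note that a plain substring match can still be fooled by an unrelated process whose arguments happen to contain the string, so a real agent would probably want a stricter comparison.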

Comment 2 Luca Visconti 2012-10-01 11:11:11 UTC
@Marek
Mine was a quick and dirty solution just for my own problem. I think your suggestion is much better.
Thank you.

Comment 3 Marek Grac 2012-10-08 10:11:39 UTC
Created attachment 623377 [details]
Proposed patch

The proposed patch changes the library and the resource agent for httpd, but we will very likely need to change all agents.

Please test it if possible.

Comment 8 David Vossel 2013-05-28 20:18:53 UTC
This patch shouldn't be necessary.

I don't understand how this issue could be possible.  These pidfiles should be placed in /var/run/cluster and that directory is cleaned out on startup.

Comment 9 Luca Visconti 2013-05-29 08:28:48 UTC
Hi David,
   the problem was exactly this: the PID files were not deleted.
When reporting the problem I suggested: "In my opinion it would be nice to delete all PID files when cman starts up."

Is it possible that the /var/run/cluster directory is cleaned out only in recent versions? The bug was reported two years ago.

Comment 10 David Vossel 2013-05-29 13:23:22 UTC
(In reply to Luca Visconti from comment #9)
> Hi David,
>    the problem was exactly this: the PID files were not deleted.
> When reporting the problem I suggested: "In my opinion it would be nice to
> delete all PID files when cman starts up."
> 
> Is it possible that the /var/run/cluster directory is cleaned out only in
> recent versions? The bug was reported two years ago.

Yep, my guess is that in RHEL 5 the /var/run directory was not cleaned out, and that behavior was introduced in RHEL 6. Nothing should be required now that this bug is targeting RHEL 6.

-- Vossel

Comment 11 David Vossel 2013-05-29 20:40:56 UTC
Luca, Can you confirm this issue is not present in rhel6? If so we can close the issue.

-- Vossel

Comment 13 Luca Visconti 2013-06-17 08:52:47 UTC
(In reply to David Vossel from comment #11)
> Luca, Can you confirm this issue is not present in rhel6? If so we can close
> the issue.
> 
> -- Vossel

Hi David,
   I can't do that: I have no environment with this type of installation on RHEL 6.