Bug 704582 - Under some circumstances the cluster fails to relocate Apache Httpd and the service becomes unavailable.
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: resource-agents
Version: 6.3
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 6.4
Assignee: David Vossel
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-05-13 16:23 UTC by Luca Visconti
Modified: 2013-07-18 23:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-17 19:40:05 UTC
Target Upstream Version:


Attachments
Proposed patch (2.11 KB, patch)
2012-10-08 10:11 UTC, Marek Grac
mgrac: review?

Description Luca Visconti 2011-05-13 16:23:39 UTC
Description of problem:
Under some circumstances the cluster fails to relocate Apache Httpd and the service becomes unavailable.


Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux Server release 5.6 (Tikanga)

How reproducible:
Have the httpd service running on nodeA. This creates the pid file.
Cut the power to nodeA. The httpd service moves to nodeB.
nodeB fences and reboots nodeA.
If you then try to relocate httpd to nodeA after it has rejoined the cluster, the service may stop for good.

The problem, I think, is in /usr/share/cluster/utils/config-utils.sh.
When the cluster tries to start the service on nodeA again, the check_pid_file() function checks whether the pid file exists (it does, because of the power-off). It then checks whether a process with that PID is running, but not whether that process is httpd.

The only check is whether a process with that PID exists:

if [ ! -d /proc/`cat "$pid_file"` ]; then	
   rm "$pid_file"
   ocf_log debug "PID File \"$pid_file\" Was Removed - PID Does Not Exist";
   return 0;
fi

return 1;

After the node reboots, there may well be a process with the same PID that is not httpd!
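The effect can be shown without a cluster. This is a minimal sketch, assuming a Linux /proc filesystem; the helper name pid_looks_alive and the use of sleep as the unrelated process are only for illustration:

```shell
#!/bin/bash
# Same kind of test as above: it only asks whether *some* process has this PID.
pid_looks_alive() {
    [ -d "/proc/$(cat "$1")" ]
}

pid_file=$(mktemp)
sleep 30 &                 # stand-in for an unrelated process started after the reboot
echo "$!" > "$pid_file"    # stale pid file that happens to hold that PID

if pid_looks_alive "$pid_file"; then
    echo "pid file kept: PID $(cat "$pid_file") exists, but it is not httpd"
fi

# Clean up the stand-in process and the temporary file.
kill "$!" 2>/dev/null
wait "$!" 2>/dev/null
rm -f "$pid_file"
```

The pid file is accepted even though the process behind it has nothing to do with httpd.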

In my opinion it would be nice to delete all PID files when cman starts up.
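A rough sketch of such a cleanup, under stated assumptions: the function name clean_pid_dir is hypothetical, the directory is passed in by the caller, and pid files are assumed to end in .pid. It would have to run before any resource is started:

```shell
#!/bin/bash
# Hypothetical helper: remove every *.pid file below the given directory.
# Assumption: all pid files in that directory belong to cluster-managed services.
clean_pid_dir() {
    local dir="$1"

    # Nothing to do if the directory is missing.
    [ -d "$dir" ] || return 0

    # Delete regular files named *.pid, including those in subdirectories.
    find "$dir" -type f -name '*.pid' -exec rm -f {} +
}
```

It would be called once at startup with the directory that holds the cluster's pid files, for example clean_pid_dir "$some_pid_dir". Pid files written by instances started outside the cluster would also be removed if they live in the same directory.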

Comment 1 Marek Grac 2012-10-01 08:56:09 UTC
@Luca:

Thanks for reporting the bug. Your objection is correct and such a scenario can occur. But I'm not sure about deleting all PID files. If a user runs an instance of the application independently of the resource manager, then we could cause data corruption (e.g. in a database server).

What do you think about adding another condition that checks whether /proc/PID/cmdline contains a given string (e.g. httpd)?
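A minimal sketch of that extra condition, not an actual patch. The function name check_pid_file_cmd and its second argument (the string expected in the command line) are assumptions for illustration; the return values follow the snippet quoted in the description (0 = pid file absent or stale, 1 = process still running):

```shell
#!/bin/bash
# Hypothetical variant of check_pid_file() with a command-line check.
# Returns 0 if the pid file is absent or stale (stale files are removed),
# returns 1 if a process with that PID exists and its cmdline matches.
check_pid_file_cmd() {
    local pid_file="$1"
    local expected="$2"    # e.g. "httpd"
    local pid

    [ -n "$pid_file" ] || return 1
    [ -e "$pid_file" ] || return 0

    pid=$(cat "$pid_file" 2>/dev/null)

    # An empty or non-numeric pid file cannot point at a live process.
    case "$pid" in
        ''|*[!0-9]*) rm -f "$pid_file"; return 0 ;;
    esac

    # No process with that PID: the pid file is stale.
    if [ ! -d "/proc/$pid" ]; then
        rm -f "$pid_file"
        return 0
    fi

    # /proc/PID/cmdline is NUL-separated; turn it into one line before matching.
    if ! tr '\0' ' ' 2>/dev/null < "/proc/$pid/cmdline" | grep -q -- "$expected"; then
        # The PID was reused by some other program: also stale.
        rm -f "$pid_file"
        return 0
    fi

    return 1
}
```

A plain substring match is deliberately simple and can still be fooled (for example by an editor that has httpd.conf open), so an agent might prefer to compare against the full binary path.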

Comment 2 Luca Visconti 2012-10-01 11:11:11 UTC
@Marek
Mine was a quick and dirty solution for my own problem only: I think that your suggestion is much better.
Thank you.

Comment 3 Marek Grac 2012-10-08 10:11:39 UTC
Created attachment 623377 [details]
Proposed patch

Proposed patch that changes the library and the resource agent for httpd. We will very likely need to change all agents, though.

Please test it if possible.

Comment 8 David Vossel 2013-05-28 20:18:53 UTC
This patch shouldn't be necessary.

I don't understand how this issue could be possible.  These pidfiles should be placed in /var/run/cluster and that directory is cleaned out on startup.

Comment 9 Luca Visconti 2013-05-29 08:28:48 UTC
Hi David,
   the problem was exactly this: the PID files were not deleted.
When reporting the problem I suggested: "In my opinion it would be nice to delete all PID files when cman starts up."

Is it possible that the /var/run/cluster directory has only been cleaned out in recent versions? The bug was reported two years ago.

Comment 10 David Vossel 2013-05-29 13:23:22 UTC
(In reply to Luca Visconti from comment #9)
> Hi David,
>    the problem was exactly this: the PID files were not deleted.
> When reporting the problem I suggested: "In my opinion it would be nice to
> delete all PID files when cman starts up."
> 
> Is it possible that the /var/run/cluster directory has only been cleaned out
> in recent versions? The bug was reported two years ago.

Yep, my guess is that in RHEL 5 the /var/run directory was not cleaned out, and that behavior was introduced in RHEL 6. Nothing should be required now that this bug is targeting RHEL 6.

-- Vossel

Comment 11 David Vossel 2013-05-29 20:40:56 UTC
Luca, can you confirm this issue is not present in RHEL 6? If so, we can close the issue.

-- Vossel

Comment 13 Luca Visconti 2013-06-17 08:52:47 UTC
(In reply to David Vossel from comment #11)
> Luca, can you confirm this issue is not present in RHEL 6? If so, we can close
> the issue.
> 
> -- Vossel

Hi David,
   I can't do that: I have no environment with this type of installation on RHEL 6.

