Bug 704582

Summary:

Under some circumstances the cluster fails to relocate Apache Httpd and the service becomes unavailable.

Product:

Red Hat Enterprise Linux 6

Reporter:

Luca Visconti <l.visconti>

Component:

resource-agents

Assignee:

David Vossel <dvossel>

Status:

CLOSED WORKSFORME

QA Contact:

Cluster QE <mspqa-list>

Severity:

medium

Docs Contact:

Priority:

medium

Version:

6.3

CC:

agk, cfeist, cluster-maint, edamato, fdinitto, lhh, lnovich, l.visconti, mgrac, mnovacek, tlavigne

Target Milestone:

Target Release:

6.4

Hardware:

All

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2013-07-17 19:40:05 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
Proposed patch	mgrac: review?

Description Luca Visconti 2011-05-13 16:23:39 UTC

Description of problem:
Under some circumstances the cluster fails to relocate Apache Httpd and the service becomes unavailable.


Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux Server release 5.6 (Tikanga)

How reproducible:
Have the httpd service running on nodeA. This will create the pidfile.
Cut the power to nodeA. The httpd service will move to nodeB.
NodeB will fence and reboot nodeA. 
If you then try to relocate httpd back to nodeA after it has rejoined the cluster, the service may stop permanently.

The problem is, I think, in /usr/share/cluster/utils/config-utils.sh.
When the cluster tries to start the service on nodeA again, the check_pid_file() function checks whether the pid file exists (and it does, because the power cut left it behind). It then checks whether a process with that pid is running, but not whether that process is httpd.

The check only tests whether a process with that pid exists:

if [ ! -d /proc/`cat "$pid_file"` ]; then	
   rm "$pid_file"
   ocf_log debug "PID File \"$pid_file\" Was Removed - PID Does Not Exist";
   return 0;
fi

return 1;

and it may happen that, after the node reboots, there is a process with the same pid that is not httpd!
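The false positive can be reproduced outside the cluster with a small sketch. The function name existence_only_check and the demo wrapper are hypothetical; the function mirrors only the existence test quoted above, with ocf_log replaced by echo so that it runs standalone. It is not a copy of config-utils.sh.

```shell
#!/bin/bash
# Hypothetical stand-in for the existence test quoted above.
# Returns 0 when the start may proceed, 1 when the pid file is treated as live.
existence_only_check() {
    local pid_file="$1"

    # No pid file at all: nothing is in the way
    [ -e "$pid_file" ] || return 0

    # Same test as in the report: only asks whether /proc/<pid> exists
    if [ ! -d "/proc/$(cat "$pid_file")" ]; then
        rm -f "$pid_file"
        echo "PID File \"$pid_file\" Was Removed - PID Does Not Exist"
        return 0
    fi

    return 1
}

# Simulates a pid file left behind by a power cut whose pid now belongs
# to an unrelated process (here: this shell itself, which is not httpd).
demo_stale_pid_file() {
    local pid_file
    pid_file=$(mktemp)
    echo "$$" > "$pid_file"

    if existence_only_check "$pid_file"; then
        echo "stale pid file removed, start can proceed"
    else
        echo "pid file kept: pid $$ is alive although it is not httpd"
    fi

    rm -f "$pid_file"
}
```

Calling demo_stale_pid_file takes the second branch, because the only question asked is whether some process owns the pid.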

In my opinion it would be nice to delete all pid files when cman starts up.

Comment 1 Marek Grac 2012-10-01 08:56:09 UTC

@Luca:

Thanks for reporting the bug. Your objection is correct and such a scenario can occur. But I'm not sure about deleting all PID files. If a user runs an instance of the application independently of the resource manager, we could cause data corruption (e.g. in a database server).

What do you think about adding another condition that checks whether /proc/PID/cmdline contains a given string (e.g. httpd)?
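A minimal sketch of that extra condition, assuming a Linux /proc filesystem. The function name check_pid_file_cmdline and its two-argument interface are hypothetical. This is not the patch from attachment 623377, and ocf_log is left out so the function runs on its own.

```shell
#!/bin/bash
# Hypothetical helper illustrating the suggested condition.
# $1 = pid file, $2 = string expected in the owner's command line (e.g. httpd)
# Returns 0 when the pid file is absent or stale (and removes it),
# returns 1 when a live process matching the expected string owns the pid.
check_pid_file_cmdline() {
    local pid_file="$1"
    local expected="$2"
    local pid

    # No pid file: nothing to clean up
    [ -e "$pid_file" ] || return 0

    pid=$(cat "$pid_file" 2>/dev/null)

    # An empty or non-numeric pid cannot refer to a running process
    case "$pid" in
        ''|*[!0-9]*)
            rm -f "$pid_file"
            return 0
            ;;
    esac

    # Stale if the process is gone, or if its command line does not
    # contain the expected string (the pid was reused by another program).
    # /proc/<pid>/cmdline is NUL-separated, so turn NULs into spaces first.
    if [ ! -d "/proc/$pid" ] ||
       ! tr '\0' ' ' < "/proc/$pid/cmdline" 2>/dev/null | grep -qF -- "$expected"; then
        rm -f "$pid_file"
        return 0
    fi

    return 1
}
```

A substring match on the command line is deliberately loose: any process whose arguments happen to contain the string would also match, so a real agent might prefer to compare against the full path of the binary.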

Comment 2 Luca Visconti 2012-10-01 11:11:11 UTC

@Marek
Mine was a quick and dirty solution for my own problem only; I think that your suggestion is much better.
Thank you.

Comment 3 Marek Grac 2012-10-08 10:11:39 UTC

Created attachment 623377 [details]
Proposed patch

Proposed patch that changes the library and the resource agent for httpd. We will very likely need to change all agents, though.

Please test it if possible.

Comment 8 David Vossel 2013-05-28 20:18:53 UTC

This patch shouldn't be necessary.

I don't understand how this issue could be possible.  These pidfiles should be placed in /var/run/cluster and that directory is cleaned out on startup.

Comment 9 Luca Visconti 2013-05-29 08:28:48 UTC

Hi David,
   the problem was exactly this: the PID files were not deleted.
When reporting the problem I suggested: "In my opinion it would be nice to delete all pid files when cman starts up."

Is it possible that the directory /var/run/cluster is cleaned out only in recent versions? The bug was reported two years ago.

Comment 10 David Vossel 2013-05-29 13:23:22 UTC

(In reply to Luca Visconti from comment #9)
> Hi David,
>    the problem was exactly this: the PID files were not deleted.
> When reporting the problem I suggested: "In my opinion it would be nice to
> delete all pid files when cman starts up."
> 
> Is it possible that the directory /var/run/cluster is cleaned out only in
> recent versions? The bug was reported two years ago.

Yep, my guess is that in RHEL 5 the /var/run directory was not cleaned out, and that this behavior was introduced in RHEL 6. Nothing should be required now that this bug is targeting RHEL 6.

-- Vossel

Comment 11 David Vossel 2013-05-29 20:40:56 UTC

Luca, Can you confirm this issue is not present in rhel6? If so we can close the issue.

-- Vossel

Comment 13 Luca Visconti 2013-06-17 08:52:47 UTC

(In reply to David Vossel from comment #11)
> Luca, Can you confirm this issue is not present in rhel6? If so we can close
> the issue.
> 
> -- Vossel

Hi David,
   I can't do that: I have no environment with this type of installation on RHEL 6.