Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.


Bug 687936

Summary:	Argument to exec occasionally incorrectly copied as NULL
Product:	Red Hat Enterprise Linux 5	Reporter:	Marc Milgram <mmilgram>
Component:	kernel	Assignee:	Red Hat Kernel Manager <kernel-mgr>
Status:	CLOSED DUPLICATE	QA Contact:	Red Hat Kernel QE team <kernel-qe>
Severity:	high	Docs Contact:
Priority:	high
Version:	5.5
Target Milestone:	rc
Target Release:	---
Hardware:	s390
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2011-03-29 14:14:34 UTC	Type:	---

Description Marc Milgram 2011-03-15 18:43:12 UTC

Description of problem:
Customer has a shell script wrapper around /bin/kill.  This is called about once a second with the arguments:
  killwrapper -0 <pid>

It is used to test if a given process is still running.
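For reference, the check that "kill -0 <pid>" performs can be sketched as follows. This is a minimal Python illustration, not the customer's wrapper (which is a shell script). The function name is_running is hypothetical, and a Linux/POSIX system is assumed.

```python
import os


def is_running(pid: int) -> bool:
    """Return True if a process with this PID exists (the `kill -0 <pid>` check)."""
    try:
        # Signal 0 runs the existence and permission checks but delivers no signal.
        os.kill(pid, 0)
    except ProcessLookupError:
        # ESRCH: no process with this PID.
        return False
    except PermissionError:
        # EPERM: the process exists, but we may not signal it.
        return True
    return True
```

If an argument is lost before the real kill runs, a check like this can no longer be performed on the intended PID, which is one way a healthy process could be reported as gone.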

We used a systemtap script to determine the command lines.

In two cases, systemtap showed the expected arguments at the exec of the kill wrapper. At the wrapper's subsequent exec of the real kill command, however, the command line recorded for the kill wrapper showed one of its arguments as NULL instead of the original value.

The kill wrapper does not modify its command-line arguments.  If its first argument is not the expected one, it writes data to a file.  In the observed cases it wrote nothing, so the wrapper itself saw the expected first argument.
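The wrapper behaviour described above can be sketched roughly as below. This is an assumption-laden illustration in Python rather than the customer's actual shell script: the log path, the record format, and the names record_if_unexpected and main are hypothetical. Only /bin/kill and the expected "-0" first argument come from this report.

```python
import os
import time

REAL_KILL = "/bin/kill"            # real command named in the report
LOG_PATH = "/tmp/killwrapper.log"  # hypothetical; the report does not name the file
EXPECTED_FIRST_ARG = "-0"          # first argument the wrapper expects


def record_if_unexpected(args, log_path=LOG_PATH):
    """Append a record when the first argument is not the expected one.

    Returns True if a record was written, False otherwise.
    """
    if args and args[0] == EXPECTED_FIRST_ARG:
        return False
    with open(log_path, "a") as log:
        log.write(f"{time.time():.0f} pid={os.getpid()} args={args!r}\n")
    return True


def main(argv):
    args = argv[1:]
    record_if_unexpected(args)
    # Replace this process with the real kill; arguments pass through unchanged.
    os.execv(REAL_KILL, [REAL_KILL] + args)
```

With a wrapper of this shape, an empty log means the arguments were still intact when the wrapper inspected them, so any corruption would have to occur outside the wrapper's own logic.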

Version-Release number of selected component (if applicable):
kernel-2.6.18-194.3.1.el5.s390

How reproducible:
Difficult to reproduce.  Reproduces at the customer site every 2 weeks to 2 months when running Oracle clustering.

Steps to Reproduce:
1. Run Oracle RAC clustering between several nodes
2. Keep the cluster under load for several weeks
Actual results:
Cluster nodes evicted

Expected results:
Cluster remains running

Additional info:
There is plenty of memory available.

Comment 5 Marc Milgram 2011-03-29 14:14:34 UTC

Supposedly the machines in question didn't have a problem with the -194 kernel, but have a problem with the -194.3.1 kernel.  This may be a regression caused by the fix for BZ 545527.

This appears to have been fixed in the -238 kernel with BZ 627298.

*** This bug has been marked as a duplicate of bug 627298 ***