Bug 687936 - Argument to exec occasionally incorrectly copied as NULL
Status: CLOSED DUPLICATE of bug 627298
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: s390
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
Reported: 2011-03-15 18:43 UTC by Marc Milgram
Modified: 2018-11-14 18:43 UTC

Doc Type: Bug Fix
Last Closed: 2011-03-29 14:14:34 UTC


Description Marc Milgram 2011-03-15 18:43:12 UTC
Description of problem:
The customer has a shell script wrapper around /bin/kill. It is called about once a second with the arguments:
  killwrapper -0 <pid>

It is used to test if a given process is still running.
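Signal 0 is a common liveness-check idiom: the kernel performs its existence and permission checks but delivers no signal. A minimal sketch, assuming a POSIX sh; the function name `is_running` is hypothetical and not taken from the customer's script. Note that a nonzero status can also mean "no permission to signal", not only "no such process".

```sh
#!/bin/sh
# Hypothetical helper illustrating the "kill -0 <pid>" liveness check.
# Succeeds if process "$1" exists and the caller may signal it;
# fails if it does not exist (ESRCH) or permission is denied (EPERM).
is_running() {
    kill -0 "$1" 2>/dev/null
}
```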

We used a SystemTap script to capture the command lines.

In two cases, the exec of the kill wrapper had the expected arguments, but by the time the wrapper exec'd the real kill command, the wrapper's command line showed one of its arguments as NULL instead of the original value.

The kill wrapper does not modify its command-line arguments. When its first argument is not the expected one, it writes data to a file; in the observed cases, it wrote nothing.
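The argument check described above might look roughly like the following. This is a hypothetical sketch only, since the customer's script was not attached; the function name `check_args` and the log path `/tmp/killwrapper.log` are assumptions chosen for illustration.

```sh
#!/bin/sh
# Hypothetical sketch of the wrapper's argument check; not the customer's script.
# LOGFILE is an assumed path, overridable from the environment.
LOGFILE=${LOGFILE:-/tmp/killwrapper.log}

# If the first argument is not "-0", append a timestamped record of the
# argument count and the arguments to LOGFILE and return 1; otherwise return 0.
check_args() {
    if [ "${1-}" != "-0" ]; then
        printf '%s argc=%d argv=[%s]\n' \
            "$(date '+%Y-%m-%d %H:%M:%S')" "$#" "$*" >> "$LOGFILE"
        return 1
    fi
    return 0
}
```

A wrapper built this way would run `check_args "$@"` and then `exec /bin/kill "$@"`. Logging `$#` alongside the arguments is a design choice: it distinguishes an empty argument from a missing one.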

Version-Release number of selected component (if applicable):
kernel-2.6.18-194.3.1.el5.s390

How reproducible:
Difficult to reproduce. It reproduces at the customer site once every 2 weeks to 2 months while running Oracle clustering.

Steps to Reproduce:
1. Run Oracle RAC clustering across several nodes
2. Beat on it for several weeks
  
Actual results:
Cluster nodes evicted

Expected results:
Cluster remains running

Additional info:
There is plenty of memory available.

Comment 5 Marc Milgram 2011-03-29 14:14:34 UTC
Reportedly, the machines in question did not have the problem with the -194 kernel but do have it with the -194.3.1 kernel. This may be a regression introduced by the fix for BZ 545527.

This appears to have been fixed in the -238 kernel with BZ 627298.

*** This bug has been marked as a duplicate of bug 627298 ***

