The following has been reported by IBM LTC:
In RHEL 3 U4 -- top command gave segmentation fault
We were running fsstress and some LTP tests on x335b with RHEL3 U4 (kernel:
Linux x335b 2.4.21-21.ELsmp #1 SMP Fri Oct 1 09:28:06 EDT 2004 i386
GNU/Linux); meanwhile we also had the top command running. After some 10 to
12 hours we saw the message "segmentation fault". This defect looks like the
defect filed against SUSE with the same problem (bug number 9297),
but that is a 2.6 kernel.
Did you submit a fix to mainline and 2.4 ? Thanks.
No, I did not attempt to fix this bug in 2.4. The bug is fixed in
Didn't think to back port 2.6 fix to 2.4. I can create a patch for
2.4 kernel and send to Marcelo.
(In reply to comment #2)
> No, I did not attempt to fix this bug in 2.4. The bug is fixed in
> Didn't think to back port 2.6 fix to 2.4. I can create a patch for
> 2.4 kernel and send to Marcelo.
Please attach the fix to this bug report and we will let the test team test it
out first. Thanks.
Assigning problem to you since you are providing the fix. Thanks.
Created an attachment (id=7311)
Patch for kernel-2.4.21-21.EL
I created this patch for the source in kernel-2.4.21-21.EL.src.rpm
that I found on the ftp site. Hope this is the 'correct' kernel. I'm
not sure. Also, I have not tested this as I don't have easy access to a
machine for testing. Can someone try it to ensure that it does solve
I'm attempting to move this to the FIXEDAWAITINGTEST state, due to
there being a patch available. If this patch doesn't work, or needs
to be for another kernel version, please let me know.
Please make sure your team tests Mike's patch.
We have a deadline to submit the patch to RH.
Created attachment 106450
----- Additional Comments From email@example.com 2004-11-10 16:59 EDT -------
The India team is out on holiday, which presents a problem for us. We are
going to try and test it here in Austin.
Created attachment 106464
----- Additional Comments From firstname.lastname@example.org(prefers email via email@example.com) 2004-11-10 20:09 EDT -------
Updated version of the patch
Better version of the patch that will apply with '-p1'. Note that the code
changes are the same, I just changed the format of the data.
----- Additional Comments From firstname.lastname@example.org 2004-11-11 11:47 EDT -------
I downloaded the patch and built a kernel with it. I then started top, fsstress
and a couple of other tests. I did not see the message "segmentation fault" because
I hit another bug that I already have open - 11109. That is an assertion in
do_get_write_access caused by fsstress. That problem always occurs for me within an
hour or two. Until that bug is fixed, I will not be able to test this problem.
----- Additional Comments From email@example.com 2004-11-11 13:08 EDT -------
Thanks for trying, also looks like there are other problems besides 11109 e.g.
We may have to pick up multiple fixes etc. before we can test this one.
Let me know if you are willing to do that and re-test. Thanks.
----- Additional Comments From firstname.lastname@example.org(prefers email via email@example.com) 2004-11-11 13:24 EDT -------
You don't need to run your fsstress tests to recreate/test this problem. Here
is what you can do. Use the source code below to build two simple programs:
Source for program fe_long
if (c > 0)
execl("./fe", "./fe", NULL);
Source for program fe
Note that the program fe_long simply forks and execs the program fe in an
infinite loop. The key here is generating many instances where a program execs
another program with a shorter name.
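The two listings above did not survive intact, so here is a minimal sketch of what the programs could look like, based only on the surviving fragment and the description in this comment. It is not the original source. The file name fe_sketch.c, the helper fork_then_exec_in_parent and the BUILD_FE_LONG / BUILD_FE macros are hypothetical, introduced so one source file can build either program. The body of fe is assumed to do nothing but exit, and the bail-out when execl() fails is an added safeguard that the fragment does not show.

```c
/* fe_sketch.c: hypothetical reconstruction, not the original listing.
 *   cc -DBUILD_FE_LONG -o fe_long fe_sketch.c
 *   cc -DBUILD_FE      -o fe      fe_sketch.c
 */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Fork once; the parent then tries to exec short_prog, as in the
 * "if (c > 0) execl(...)" fragment. Returns 0 in the child, -1 if
 * fork() failed, and the child's pid in the parent only when
 * execl() failed (on success execl() does not return).
 */
pid_t fork_then_exec_in_parent(const char *short_prog)
{
    pid_t c;

    assert(short_prog != NULL);
    c = fork();
    if (c > 0)
        execl(short_prog, short_prog, (char *)NULL);
    return c;
}

#if defined(BUILD_FE_LONG)
int main(void)
{
    for (;;) {
        pid_t c = fork_then_exec_in_parent("./fe");

        if (c > 0) {
            /* Safeguard (assumption): without it a missing ./fe
             * would leave both parent and child forking forever. */
            perror("execl ./fe");
            return 1;
        }
        if (c < 0)
            sleep(1);   /* fork() failed; back off and retry */
        /* c == 0: the child keeps the long name and loops again. */
    }
}
#elif defined(BUILD_FE)
/* Assumed body of fe: exit at once so the short-named task is brief. */
int main(void)
{
    return 0;
}
#endif
```

Built this way, each fe_long task replaces itself with the shorter-named fe while its child carries on as fe_long, which is the "execs another program with a shorter name" pattern described here.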
After building the programs, start up top with no delay ('top -d 0'). Then start
up several instances of the program 'fe_long'. I would suggest n instances,
where n is the number of CPUs in the system. Also note that multiple CPUs are
almost required to recreate/test this problem. I really wouldn't expect one to
recreate this on a single CPU system.
On a kernel without the fix, you should see top segfault within an hour.
Hopefully, much sooner (like 5 minutes). On a kernel with the fix, there should
be no segfault.
----- Additional Comments From firstname.lastname@example.org 2004-11-11 21:20 EDT -------
I will install a multiple cpu machine and retest the fix.
----- Additional Comments From email@example.com 2004-11-14 23:55 EDT -------
Thanks David! Marking this defect as TESTED.
----- Additional Comments From firstname.lastname@example.org 2004-11-15 00:30 EDT -------
Looks like this fix may not go into RHEL3 U4 since U4 is already closed (see
Last build of U4 was last week. No fix has yet been committed to U5.
----- Additional Comments From email@example.com(prefers email via firstname.lastname@example.org) 2004-11-16 13:45 EDT -------
Date: Tue, 16 Nov 2004 08:16:04 -0200
From: Marcelo Tosatti <email@example.com>
Subject: Re: [PATCH] Task name handling for 2.4
To: Mike Kravetz <firstname.lastname@example.org>
I've saved it to 2.4.29pre.
On Fri, Nov 12, 2004 at 09:31:16AM -0800, Mike Kravetz wrote:
> Hi Marcelo,
> There is a problem with task name handling in the /proc fs. See
> for the patch that eventually made its way into the 2.6 tree.
> We now have people experiencing the same problem/bug in 2.4. Here
> is a patch for 2.4 that implements the same fix. Please consider
> Signed-off-by: Mike Kravetz <email@example.com>
----- Additional Comments From firstname.lastname@example.org 2004-11-16 17:59 EDT -------
Mainline accepted Mike's patch.
Can we please have this committed for U5 if it is too late for U4? Thanks.
----- Additional Comments From email@example.com(prefers email via firstname.lastname@example.org) 2004-11-15 12:00 EDT -------
FYI - On Friday I sent the patch to Marcelo for inclusion in 2.4 mainline.
---- Additional Comments From email@example.com 2005-03-18 04:11 EST -------
Verification is in progress with RHEL3 U5 (2.4.21-31).
------- Additional Comments From firstname.lastname@example.org 2005-03-23 00:41 EST -------
Verified that top command is stable in RHEL3 U5.
Closing the defect report.
Glen, could you please explain what's going on here? No fix for this
problem has been committed to U5, so I'm not sure why anyone on your
end attempted to verify that the problem is fixed.
---- Additional Comments From email@example.com 2005-03-31 14:52 EST -------
Sorry, our test team was anxious to test this. We had tried to request a RHEL
3 U5 target.
---- Additional Comments From firstname.lastname@example.org(prefers email via email@example.com) 2005-07-11 14:56 EDT -------
Hi Michael, Prakash, Salina,
Do we know if this patch was picked up for RHEL3-U5? If so, let's set this bug
to "accepted". If not, let's move the target-milestone out to RHEL3-U6. Thanks!
---- Additional Comments From firstname.lastname@example.org 2005-07-11 16:01 EDT -------
which is RHEL 3 U5 kernel, still does not have Mike's patch.
Glen, no kernel fix related to this made it into U6.
---- Additional Comments From email@example.com(prefers email via firstname.lastname@example.org) 2005-09-13 20:47 EDT -------
Any update on this bug?
---- Additional Comments From email@example.com(prefers email via firstname.lastname@example.org) 2005-09-13 22:15 EDT -------
Not sure who you are talking to, John. If there is anything else I (as bug
owner) can do to help, let me know. Patch has been provided and even accepted
It was proposed for RHEL-U7 on 9/12. No work has been done on it as of yet.
Please download and test the kernel found here:
In this location, you will find an i386 smp kernel, the associated kernel
src.rpm, and the patch that was applied:
So far I haven't been able to reproduce the problem with a 4-cpu system
running a kernel without the patch.
Please report your test results back to the Bugzilla.
BTW, my test consists of running 4 "fe_long" tasks, along with a "top -d 0",
on a 4-cpu box. It's still running strong on an unpatched kernel for well
over 2 hours.
1. I was able to reproduce the top segfault.
2. But the kernel above (kernel-smp-2.4.21-37.3.EL.bz138730.i686.rpm) may not
Apparently 2.4.21-37.1 introduced a patch associated with NX-in-kernel-code
and largepages, that causes some i386 machines to go into an infinite reboot
I will update the kernel in the http://people.redhat.com/anderson directory
I have replaced the test kernel binary, src.rpm and applied patch in:
I am testing it now, but we require the reporting partner's test and buy-in
of the test kernel.
Glen, if this bugzilla doesn't need to remain IBM-confidential, then
please uncheck the two "IBM Confidential Group" boxes below. Thanks.
Thanks, Glen. Completing transition to public bug.
*** Bug 162683 has been marked as a duplicate of this bug. ***
A fix for this problem has just been committed to the RHEL3 U7
patch pool this evening (in kernel version 2.4.21-37.5.EL).
Is the fix for this the same patch that is attached to this report? If not, is
it possible to point me to the patch that was finally used or the SRPM
See comment #30, and click on the link. The patch is "linux-kernel-test-patch".
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.