Bug 185481

Summary: Kernel crashed with "Fixing recursive fault but reboot is needed!" error
Product: [Fedora] Fedora Reporter: Erik A. Espinoza <phomey>
Component: kernel Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Brian Brock <bbrock>
Severity: low Docs Contact:
Priority: medium    
Version: 5 CC: anderson, jonstanley, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: MassClosed
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2008-01-20 04:38:14 UTC Type: ---

Description Erik A. Espinoza 2006-03-15 02:34:39 UTC
Description of problem:
We have a cluster of 250 dual-processor, dual-core Opteron machines running jobs
in a grid. One crashed with an interesting kernel message that points to a
software bug. All machines are installed via kickstart and are identical in both
software and hardware. The problem has not been reproduced.

Version-Release number of selected component (if applicable):
kernel 2.6.15-1.1831_FC4smp (as shown in the oops output under "Actual results")

How reproducible:
Not reproducible as of yet.

Steps to Reproduce:
None yet.

Actual results:
Kernel BUG at mm/rmap.c:493
invalid operand: 0000 [1] SMP
last sysfs file: /block/loop7/dev
Modules linked in: loop nfsd exportfs autofs4 nfs lockd nfs_acl sunrpc dm_mod
video button battery ac ohci_hcd i2c_amd8111 i2c_amd756 i2c_core tg3 floppy ext3
jbd sata_mv libata 3w_9xxx sd_mod scsi_mod
Pid: 2024, comm: condor_exec.287 Tainted: G   M  2.6.15-1.1831_FC4smp #1
RIP: 0010:[<ffffffff8017dd87>] <ffffffff8017dd87>{page_remove_rmap+129}
RSP: 0000:ffff81006456dc38  EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff810001000000 RCX: ffffffff8044cf58
RDX: 0000000000000000 RSI: 0000000000000292 RDI: ffffffff8044cf40
RBP: 000000000d472000 R08: 0000000000000004 R09: 0000000000000004
R10: ffff81006456d908 R11: 0000000000000000 R12: ffff8100005fa390
R13: 000000000d600000 R14: ffff810001000000 R15: 0000000029fa1000
FS:  00002aaaab4cc1a0(0000) GS:ffffffff8059b000(0000) knlGS:00000000f7f259e0
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 000000000d472000 CR3: 0000000000101000 CR4: 00000000000006e0
Process condor_exec.287 (pid: 2024, threadinfo ffff81006456c000, task
Stack: 0000000000000000 ffffffff801767c0 0000000000000000 ffff81006456dd38
       ffffffffffffffff 0000000000000000 ffff81013f465d18 ffff81006456dd40
       000000000036d000 0000000100000001
Call Trace:<ffffffff801767c0>{unmap_vmas+1071} <ffffffff80179938>{exit_mmap+124}
       <ffffffff80139f66>{mmput+37} <ffffffff8013f1c4>{do_exit+584}
       <ffffffff8010f23a>{do_signal+116} <ffffffff80356a8b>{thread_return+158}
       <ffffffff8017bf34>{do_brk+474} <ffffffff8011017e>{retint_signal+61}

Code: 0f 0b 68 6e ff 37 80 c2 ed 01 48 c7 c6 ff ff ff ff bf 20 00
RIP <ffffffff8017dd87>{page_remove_rmap+129} RSP <ffff81006456dc38>
<1>Fixing recursive fault but reboot is needed!

Expected results:
The machine keeps running like the other machines in our grid.

Additional info:
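For triage, the bracketed entries in the call trace under "Actual results" can be pulled into (address, symbol, offset) tuples. What follows is only a minimal Python sketch and is not part of the original report: the names FRAME_RE and parse_frames are hypothetical, and it assumes the `<address>{symbol+offset}` layout seen in this oops, with decimal offsets.

```python
import re

# Assumed frame layout, as seen in this oops: <16 hex digits>{symbol+decimal_offset}
FRAME_RE = re.compile(r"<([0-9a-f]{16})>\{(\w+)\+(\d+)\}")


def parse_frames(text):
    """Return (address, symbol, offset) tuples in the order they appear in text."""
    return [
        (int(addr, 16), symbol, int(offset))
        for addr, symbol, offset in FRAME_RE.findall(text)
    ]


# First two lines of the call trace from this report, copied verbatim.
trace = """Call Trace:<ffffffff801767c0>{unmap_vmas+1071} <ffffffff80179938>{exit_mmap+124}
       <ffffffff80139f66>{mmput+37} <ffffffff8013f1c4>{do_exit+584}"""

for addr, symbol, offset in parse_frames(trace):
    print(f"{addr:#018x}  {symbol}+{offset}")
```

Having the frames as tuples makes it easier to compare this oops against traces from other nodes, should the crash recur.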

Comment 1 Dave Jones 2006-09-17 01:49:15 UTC
[This comment added as part of a mass-update to all open FC4 kernel bugs]

FC4 has now transitioned to the Fedora Legacy project, which will continue to
release security-related updates for the kernel.  As this bug is not
security-related, it is unlikely to be fixed in an update for FC4, and has been
migrated to FC5.

Please retest with Fedora Core 5.

Thank you.

Comment 2 Dave Jones 2006-10-16 17:48:52 UTC
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in Bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem,
please check that you have only one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.

Comment 3 Jon Stanley 2008-01-20 04:38:14 UTC
(this is a mass-close to kernel bugs in NEEDINFO state)

As indicated previously, there has been no update on the progress of this bug,
so I am closing it as INSUFFICIENT_DATA. Please re-open if the issue
still occurs for you and I will try to assist in its resolution. Thank you for
taking the time to report the initial bug.

If you believe that this bug was closed in error, please feel free to reopen
this bug.