227974 – Kernel BUG at mm/rmap.c:587

Summary: Kernel BUG at mm/rmap.c:587

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	5
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:	bzcl34nup
Depends On:
Blocks:

Reported:	2007-02-09 08:32 UTC by Ronnie Mose
Modified:	2008-05-06 19:12 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-05-06 19:12:04 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
/var/log/messages for kernel BUG at mm/rmap.c:587! (5.87 KB, text/plain) 2007-02-16 22:24 UTC, Martin Burchell

Description Ronnie Mose 2007-02-09 08:32:32 UTC

Description of problem:
General protection fault. I am not sure, but I suspect this is somehow
related to the sata_nv/nforce2 driver. We currently have three systems based on
the nforce chipset, and all of them suffer from the problem. After a system has
been running for some time (most recently 72 days), the kernel suddenly panics
and starts throwing disks off the arrays. A quick reboot and rebuild will save
them, but if that is not done quickly, the kernel completely destroys the arrays.

Version-Release number of selected component (if applicable):
Linux asp-a 2.6.18-1.2257.fc5 #1 SMP Fri Dec 15 16:07:14 EST 2006 x86_64 x86_64
x86_64 GNU/Linux

CPU: AMD Athlon(tm)64 X2 Dual Core Processor  3800+

/proc/mdstat
Personalities : [raid1] [raid0]
md1 : active raid0 sdb2[1] sda2[0]
      4096000 blocks 256k chunks

md0 : active raid1 sda1[0] sdb1[1]
      291001280 blocks [2/2] [UU]
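
Because the failure described above ends with disks being dropped from the arrays, it may be worth watching /proc/mdstat for a degraded array between crashes. The following is a minimal Python sketch, assuming the mdstat layout shown above; the function name `degraded_arrays` and the `SAMPLE` constant are illustrative and not part of any existing tool.

```python
import re

# Sample copied from the /proc/mdstat output in this report.
SAMPLE = """\
Personalities : [raid1] [raid0]
md1 : active raid0 sdb2[1] sda2[0]
      4096000 blocks 256k chunks

md0 : active raid1 sda1[0] sdb1[1]
      291001280 blocks [2/2] [UU]
"""


def degraded_arrays(mdstat_text):
    """Return the md arrays whose status line shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Redundant arrays report "[total/active] [UU]"; "_" marks a lost member.
        # raid0 has no such field, so it is never reported by this check.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current is not None:
            total, active, flags = status.groups()
            if total != active or "_" in flags:
                degraded.append(current)
    return degraded
```

On a live system the function would be given the contents of /proc/mdstat, for example from a cron job that sends mail when the returned list is not empty.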

How reproducible:
Difficult. Different kinds of load have not helped me provoke it. If
wanted, I can give developers access to the affected machines over the
internet.

Steps to Reproduce:
1. Start a computer with sata_nv/nforce and raid0/1/5, and let it run for
two to three months without a reboot.
  
Actual results:


Expected results:


Additional info:
Message from syslogd@bbeu at Fri Feb  9 08:53:02 2007 ...
bbeu kernel: Eeek! page_mapcount(page) went negative! (-1)

Message from syslogd@bbeu at Fri Feb  9 08:53:03 2007 ...
bbeu kernel:   page->flags = 3808000000083c

Message from syslogd@bbeu at Fri Feb  9 08:53:03 2007 ...
bbeu kernel:   page->count = 2

Message from syslogd@bbeu at Fri Feb  9 08:53:03 2007 ...
bbeu kernel:   page->mapping = ffff810063d67688

Message from syslogd@bbeu at Fri Feb  9 08:53:03 2007 ...
bbeu kernel: invalid opcode: 0000 [1] SMP
Feb  9 08:53:02 bbeu kernel: postmaster[13807] general protection rip:43c230
rsp:7fff4c499c50 error:0
Feb  9 08:53:02 bbeu kernel: Eeek! page_mapcount(page) went negative! (-1)
Feb  9 08:53:03 bbeu kernel:   page->flags = 3808000000083c
Feb  9 08:53:03 bbeu kernel:   page->count = 2
Feb  9 08:53:03 bbeu kernel:   page->mapping = ffff810063d67688
Feb  9 08:53:03 bbeu kernel: ----------- [cut here ] --------- [please bite here
] ---------
Feb  9 08:53:03 bbeu kernel: Kernel BUG at mm/rmap.c:587
Feb  9 08:53:03 bbeu kernel: invalid opcode: 0000 [1] SMP
Feb  9 08:53:03 bbeu kernel: last sysfs file: /block/md0/stat
Feb  9 08:53:03 bbeu kernel: CPU 0
Feb  9 08:53:03 bbeu kernel: Modules linked in: ipv6 sunrpc dm_mirror dm_mod lp
parport_pc parport floppy snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq ohci1394 ieee1394 snd_seq_device snd_pcm_oss
snd_mixer_oss ehci_hcd ohci_hcd snd_pcm sg snd_timer snd serio_raw pcspkr
forcedeth ide_cd k8_edac soundcore snd_page_alloc edac_mc cdrom i2c_nforce2
shpchp i2c_core raid0 raid1 reiserfs sata_nv libata sd_mod scsi_mod
Feb  9 08:53:03 bbeu kernel: Pid: 13807, comm: postmaster Not tainted
2.6.18-1.2257.fc5 #1
Feb  9 08:53:03 bbeu kernel: RIP: 0010:[<ffffffff8020aafd>] 
[<ffffffff8020aafd>] page_remove_rmap+0x94/0xb7
Feb  9 08:53:03 bbeu kernel: RSP: 0018:ffff81002a4fbc18  EFLAGS: 00010246
Feb  9 08:53:03 bbeu kernel: RAX: 0000000000000000 RBX: ffff810001e6cf00 RCX:
000000000000000d
Feb  9 08:53:03 bbeu kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI:
ffffffff8054b4d0
Feb  9 08:53:03 bbeu kernel: RBP: ffff8100790b58c8 R08: 0000000000000000 R09:
00002aaab358f000
Feb  9 08:53:03 bbeu kernel: R10: 0000000000000010 R11: 0000000000000000 R12:
00002aaab34fb000
Feb  9 08:53:03 bbeu kernel: R13: ffff810044ec27d8 R14: ffff81000301a400 R15:
00002aaab358f000
Feb  9 08:53:03 bbeu kernel: FS:  00002aaaaaab91f0(0000)
GS:ffffffff805d8000(0000) knlGS:00000000f7ff66c0
Feb  9 08:53:03 bbeu kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb  9 08:53:03 bbeu kernel: CR2: 00002aaab359e0fb CR3: 0000000000201000 CR4:
00000000000006e0
Feb  9 08:53:03 bbeu kernel: Process postmaster (pid: 13807, threadinfo
ffff81002a4fa000, task ffff810037c1e080)
Feb  9 08:53:03 bbeu kernel: Stack:  0000000000000286 ffff810001e6cf00
0000000039afc020 ffffffff80207c76
Feb  9 08:53:03 bbeu kernel:  0000000000000000 ffff81002a4fbd08 ffffffffffffffff
0000000000000000
Feb  9 08:53:03 bbeu kernel:  ffff8100790b58c8 ffff81002a4fbd10 000000000035e94a
0000000000000000
Feb  9 08:53:03 bbeu kernel: Call Trace:
Feb  9 08:53:03 bbeu kernel:  [<ffffffff80207c76>] unmap_vmas+0x48e/0x786
Feb  9 08:53:03 bbeu kernel:  [<ffffffff80239ed3>] exit_mmap+0x73/0xee
Feb  9 08:53:03 bbeu kernel:  [<ffffffff8023c058>] mmput+0x41/0x96
Feb  9 08:53:03 bbeu kernel:  [<ffffffff802150f4>] do_exit+0x293/0x928
Feb  9 08:53:03 bbeu kernel:  [<ffffffff8024816e>] cpuset_exit+0x0/0x6c
Feb  9 08:53:03 bbeu kernel:  [<000000000000000b>]
Feb  9 08:53:03 bbeu kernel:
Feb  9 08:53:03 bbeu kernel:
Feb  9 08:53:03 bbeu kernel: Code: 0f 0b 68 86 cb 47 80 c2 4b 02 8b 73 18 48 89
df 41 58 5b 5d
Feb  9 08:53:03 bbeu kernel: RIP  [<ffffffff8020aafd>] page_remove_rmap+0x94/0xb7
Feb  9 08:53:03 bbeu kernel:  RSP <ffff81002a4fbc18>
Feb  9 08:53:03 bbeu kernel:  <1>Fixing recursive fault but reboot is needed!
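
When comparing oopses from several machines, it can help to reduce each log to a few identifying fields. The following is a rough Python sketch that pulls the BUG location, the reported map count, the process, and the faulting symbol out of a log like the one above. The function name `summarize_oops` and the regular expressions are assumptions based only on the line formats seen in this report.

```python
import re

# A few lines copied from the log above, used as sample input.
SAMPLE = """\
Feb  9 08:53:02 bbeu kernel: Eeek! page_mapcount(page) went negative! (-1)
Feb  9 08:53:03 bbeu kernel: Kernel BUG at mm/rmap.c:587
Feb  9 08:53:03 bbeu kernel: Pid: 13807, comm: postmaster Not tainted
Feb  9 08:53:03 bbeu kernel: RIP  [<ffffffff8020aafd>] page_remove_rmap+0x94/0xb7
"""


def summarize_oops(log_text):
    """Extract a few identifying fields; a field is None if it is not found."""
    patterns = {
        "bug_at": r"Kernel BUG at (\S+:\d+)",
        "mapcount": r"page_mapcount\(page\) went negative! \((-?\d+)\)",
        "pid": r"Pid: (\d+),",
        "comm": r"comm: (\S+)",
        "rip_symbol": r"RIP\s+\[<[0-9a-f]+>\]\s+(\w+)\+0x",
    }
    summary = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, log_text)
        summary[key] = match.group(1) if match else None
    return summary
```

Two oopses with the same `bug_at` and `rip_symbol` but different processes would suggest the problem is not tied to one application.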

Comment 1 Ronnie Mose 2007-02-09 08:37:21 UTC

I accidentally used the uname output from another (identical) server instead of
the one from bbeu. It is the same, though:

Linux bbeu 2.6.18-1.2257.fc5 #1 SMP Fri Dec 15 16:07:14 EST 2006 x86_64 x86_64
x86_64 GNU/Linux

Comment 2 Dave Jones 2007-02-13 16:42:28 UTC

This 'can't happen' situation has been seen a number of times, and every time
it has come down to some hardware problem. (I even experienced it myself, and it
turned out to be bulging/oozing capacitors on the motherboard.)

As a first step, I suggest giving the system a workout with memtest86+ to see
if that turns up anything.
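
For context on why this check is treated as something that should never fire: as far as I understand the 2.6.18 source, a page's mapping counter starts at -1, and page_remove_rmap() hits BUG() when removing a mapping would take the number of mappings below zero. The toy Python model below only illustrates that invariant. The class and method names are hypothetical, and it is not kernel code.

```python
class Page:
    """Toy model of a page's reverse-mapping counter (not kernel code)."""

    def __init__(self):
        # Assumption: like the kernel's _mapcount, the raw counter starts at -1,
        # so "no mappings" is -1 and the first mapping takes it to 0.
        self._mapcount = -1

    def mapcount(self):
        # Number of mappings currently recorded for the page.
        return self._mapcount + 1

    def add_rmap(self):
        self._mapcount += 1

    def remove_rmap(self):
        self._mapcount -= 1
        if self.mapcount() < 0:
            # Stands in for the printk plus BUG() seen in the log.
            raise RuntimeError(
                "page_mapcount(page) went negative! (%d)" % self.mapcount()
            )
```

With balanced add and remove calls the error cannot be reached. It takes an unmatched removal or a corrupted counter (for example, a flipped bit in RAM), which fits hardware being the usual suspect.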

Comment 3 Martin Burchell 2007-02-16 22:24:08 UTC

Created attachment 148250
/var/log/messages for kernel BUG at mm/rmap.c:587!

Comment 4 Martin Burchell 2007-02-16 22:30:41 UTC

FWIW I also encountered a different "kernel BUG at mm/rmap.c:587!" this week.
I was using kernel-2.6.18-1.2257.fc5 but I've since upgraded to
kernel-2.6.19-1.2288.fc5

I ran memtest86+ overnight and no errors were reported.

/var/log/messages attached above.

Comment 5 Bug Zapper 2008-04-04 06:09:50 UTC

Fedora apologizes that these issues have not been resolved yet. We're
sorry it's taken so long for your bug to be properly triaged and acted
on. We appreciate the time you took to report this issue and want to
make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6,
please note that Fedora no longer maintains these releases. We strongly
encourage you to upgrade to a current Fedora release. In order to
refocus our efforts as a project we are flagging all of the open bugs
for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6, thirty days
from now, it will be closed 'WONTFIX'. If you can reproduce this bug in
the latest Fedora version, please change to the respective version. If
you are unable to do this, please add a comment to this bug requesting
the change.

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

And if you'd like to join the bug triage team to help make things
better, check out http://fedoraproject.org/wiki/BugZappers

Comment 6 Bug Zapper 2008-05-06 19:12:02 UTC

This bug is open for a Fedora version that is no longer maintained and
will not be fixed by Fedora. Therefore we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora, please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
