Description of problem:
General protection fault. I am not sure, but I suspect this is somehow related to the sata_nv/nforce2 driver. We currently have three systems based on the nforce chipset, all of which suffer from the problem. After a system has been running for some time (most recently 72 days), the kernel suddenly panics and starts throwing disks off the arrays. A quick reboot and rebuild will save the arrays, but if that is not done fast enough, the kernel completely destroys them.

Version-Release number of selected component (if applicable):
Linux asp-a 2.6.18-1.2257.fc5 #1 SMP Fri Dec 15 16:07:14 EST 2006 x86_64 x86_64 x86_64 GNU/Linux
CPU: AMD Athlon(tm)64 X2 Dual Core Processor 3800+

/proc/mdstat
Personalities : [raid1] [raid0]
md1 : active raid0 sdb2[1] sda2[0]
      4096000 blocks 256k chunks
md0 : active raid1 sda1[0] sdb1[1]
      291001280 blocks [2/2] [UU]

How reproducible:
Difficult. Varying the kind of load has not helped me provoke it. If wanted, I can provide developers with access to the affected machines over the internet.

Steps to Reproduce:
1. Start a computer with sata_nv/nforce with raid0/1/5 and let it run for two to three months without a reboot...

Actual results:

Expected results:

Additional info:
Message from syslogd@bbeu at Fri Feb 9 08:53:02 2007 ...
bbeu kernel: Eeek! page_mapcount(page) went negative! (-1)
Message from syslogd@bbeu at Fri Feb 9 08:53:03 2007 ...
bbeu kernel: page->flags = 3808000000083c
Message from syslogd@bbeu at Fri Feb 9 08:53:03 2007 ...
bbeu kernel: page->count = 2
Message from syslogd@bbeu at Fri Feb 9 08:53:03 2007 ...
bbeu kernel: page->mapping = ffff810063d67688
Message from syslogd@bbeu at Fri Feb 9 08:53:03 2007 ...
bbeu kernel: invalid opcode: 0000 [1] SMP

Feb 9 08:53:02 bbeu kernel: postmaster[13807] general protection rip:43c230 rsp:7fff4c499c50 error:0
Feb 9 08:53:02 bbeu kernel: Eeek! page_mapcount(page) went negative! (-1)
Feb 9 08:53:03 bbeu kernel: page->flags = 3808000000083c
Feb 9 08:53:03 bbeu kernel: page->count = 2
Feb 9 08:53:03 bbeu kernel: page->mapping = ffff810063d67688
Feb 9 08:53:03 bbeu kernel: ----------- [cut here ] --------- [please bite here ] ---------
Feb 9 08:53:03 bbeu kernel: Kernel BUG at mm/rmap.c:587
Feb 9 08:53:03 bbeu kernel: invalid opcode: 0000 [1] SMP
Feb 9 08:53:03 bbeu kernel: last sysfs file: /block/md0/stat
Feb 9 08:53:03 bbeu kernel: CPU 0
Feb 9 08:53:03 bbeu kernel: Modules linked in: ipv6 sunrpc dm_mirror dm_mod lp parport_pc parport floppy snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq ohci1394 ieee1394 snd_seq_device snd_pcm_oss snd_mixer_oss ehci_hcd ohci_hcd snd_pcm sg snd_timer snd serio_raw pcspkr forcedeth ide_cd k8_edac soundcore snd_page_alloc edac_mc cdrom i2c_nforce2 shpchp i2c_core raid0 raid1 reiserfs sata_nv libata sd_mod scsi_mod
Feb 9 08:53:03 bbeu kernel: Pid: 13807, comm: postmaster Not tainted 2.6.18-1.2257.fc5 #1
Feb 9 08:53:03 bbeu kernel: RIP: 0010:[<ffffffff8020aafd>] [<ffffffff8020aafd>] page_remove_rmap+0x94/0xb7
Feb 9 08:53:03 bbeu kernel: RSP: 0018:ffff81002a4fbc18 EFLAGS: 00010246
Feb 9 08:53:03 bbeu kernel: RAX: 0000000000000000 RBX: ffff810001e6cf00 RCX: 000000000000000d
Feb 9 08:53:03 bbeu kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8054b4d0
Feb 9 08:53:03 bbeu kernel: RBP: ffff8100790b58c8 R08: 0000000000000000 R09: 00002aaab358f000
Feb 9 08:53:03 bbeu kernel: R10: 0000000000000010 R11: 0000000000000000 R12: 00002aaab34fb000
Feb 9 08:53:03 bbeu kernel: R13: ffff810044ec27d8 R14: ffff81000301a400 R15: 00002aaab358f000
Feb 9 08:53:03 bbeu kernel: FS: 00002aaaaaab91f0(0000) GS:ffffffff805d8000(0000) knlGS:00000000f7ff66c0
Feb 9 08:53:03 bbeu kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb 9 08:53:03 bbeu kernel: CR2: 00002aaab359e0fb CR3: 0000000000201000 CR4: 00000000000006e0
Feb 9 08:53:03 bbeu kernel: Process postmaster (pid: 13807, threadinfo ffff81002a4fa000, task ffff810037c1e080)
Feb 9 08:53:03 bbeu kernel: Stack: 0000000000000286 ffff810001e6cf00 0000000039afc020 ffffffff80207c76
Feb 9 08:53:03 bbeu kernel: 0000000000000000 ffff81002a4fbd08 ffffffffffffffff 0000000000000000
Feb 9 08:53:03 bbeu kernel: ffff8100790b58c8 ffff81002a4fbd10 000000000035e94a 0000000000000000
Feb 9 08:53:03 bbeu kernel: Call Trace:
Feb 9 08:53:03 bbeu kernel: [<ffffffff80207c76>] unmap_vmas+0x48e/0x786
Feb 9 08:53:03 bbeu kernel: [<ffffffff80239ed3>] exit_mmap+0x73/0xee
Feb 9 08:53:03 bbeu kernel: [<ffffffff8023c058>] mmput+0x41/0x96
Feb 9 08:53:03 bbeu kernel: [<ffffffff802150f4>] do_exit+0x293/0x928
Feb 9 08:53:03 bbeu kernel: [<ffffffff8024816e>] cpuset_exit+0x0/0x6c
Feb 9 08:53:03 bbeu kernel: [<000000000000000b>]
Feb 9 08:53:03 bbeu kernel:
Feb 9 08:53:03 bbeu kernel:
Feb 9 08:53:03 bbeu kernel: Code: 0f 0b 68 86 cb 47 80 c2 4b 02 8b 73 18 48 89 df 41 58 5b 5d
Feb 9 08:53:03 bbeu kernel: RIP [<ffffffff8020aafd>] page_remove_rmap+0x94/0xb7
Feb 9 08:53:03 bbeu kernel: RSP <ffff81002a4fbc18>
Feb 9 08:53:03 bbeu kernel: <1>Fixing recursive fault but reboot is needed!
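Since the arrays only survive if the reboot and rebuild happen quickly, a small check of /proc/mdstat could flag a dropped disk early. The following is only a minimal sketch, not something taken from the affected machines: the function name `degraded_arrays` and the `SAMPLE` text are made up for illustration, and it assumes the usual mdstat layout in which a member-status field such as [UU] marks a missing disk with an underscore (for example [U_]). raid0 arrays carry no such field, so they are never reported.

```python
import re


def degraded_arrays(mdstat_text):
    """Return names of md arrays whose status field (e.g. [U_]) shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            # Start of a new array entry, e.g. "md0 : active raid1 ..."
            current = header.group(1)
            continue
        # The member-status field consists only of 'U' (up) and '_' (missing).
        status = re.search(r"\[([U_]+)\]", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded


# Hypothetical sample modelled on the /proc/mdstat output in the report.
SAMPLE = """\
Personalities : [raid1] [raid0]
md1 : active raid0 sdb2[1] sda2[0]
      4096000 blocks 256k chunks
md0 : active raid1 sda1[0] sdb1[1]
      291001280 blocks [2/2] [UU]
"""

if __name__ == "__main__":
    # On a live system the text would come from open("/proc/mdstat").read().
    print(degraded_arrays(SAMPLE))
```

Run from cron or a similar scheduler against the real /proc/mdstat, a non-empty result would be the cue to reboot and rebuild before more members are dropped.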
I accidentally used the uname output from another (identical) server instead of the one from bbeu. It is the same, though: Linux bbeu 2.6.18-1.2257.fc5 #1 SMP Fri Dec 15 16:07:14 EST 2006 x86_64 x86_64 x86_64 GNU/Linux
This "can't happen" situation has been seen a number of times, and every time it has come down to some hardware problem. (I even experienced it myself, and it turned out to be bulging/oozing capacitors on the motherboard.) As a first step, I suggest giving the system a workout with memtest86+ to see if that turns up anything.
Created attachment 148250: /var/log/messages for kernel BUG at mm/rmap.c:587!
FWIW, I also encountered a different "kernel BUG at mm/rmap.c:587!" this week. I was using kernel-2.6.18-1.2257.fc5, but I've since upgraded to kernel-2.6.19-1.2288.fc5. I ran memtest86+ overnight and no errors were reported. /var/log/messages is attached above.
Fedora apologizes that these issues have not been resolved yet. We're sorry it's taken so long for your bug to be properly triaged and acted on. We appreciate the time you took to report this issue and want to make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6, please note that Fedora no longer maintains these releases. We strongly encourage you to upgrade to a current Fedora release. In order to refocus our efforts as a project, we are flagging all of the open bugs for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6 thirty days from now, it will be closed 'WONTFIX'. If you can reproduce this bug in the latest Fedora version, please change it to the respective version. If you are unable to do this, please add a comment to this bug requesting the change.

Thanks for your help, and we apologize again that we haven't handled these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
to ensure this doesn't happen again.

And if you'd like to join the bug triage team to help make things better, check out
http://fedoraproject.org/wiki/BugZappers
This bug is open for a Fedora version that is no longer maintained and will not be fixed by Fedora. Therefore we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version. Thank you for reporting this bug, and we are sorry it could not be fixed.