Bug 207022 - Kernel BUG at mm/highmem.c:561 and unable to handle kernel paging request at virtual address 4919f100
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 5
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Dave Jones
QA Contact: Brian Brock
Keywords: Reopened
Reported: 2006-09-18 15:57 EDT by J. Adam Hough
Modified: 2015-01-04 17:28 EST
Fixed In Version: 2.6.18-1.2200.fc5smp
Doc Type: Bug Fix
Last Closed: 2006-10-17 13:36:33 EDT

Attachments
lspci -v > lspci-verbose.txt (7.18 KB, text/plain)
2006-09-18 15:57 EDT, J. Adam Hough
A *.debug syslog file from the machine. (102.78 KB, application/octet-stream)
2006-09-18 16:00 EDT, J. Adam Hough
*.debug of syslog running the newer kernel (111.87 KB, application/octet-stream)
2006-09-19 09:43 EDT, J. Adam Hough
*.debug of syslog running the newer kernel (189.95 KB, application/octet-stream)
2006-09-19 15:44 EDT, J. Adam Hough
current fc5 kernel crash *.debug syslog.all (104.21 KB, application/octet-stream)
2006-09-28 10:38 EDT, J. Adam Hough
The 2.6.18-1.2189.fc5smp development kernel (123.80 KB, application/octet-stream)
2006-10-12 11:49 EDT, J. Adam Hough

Description J. Adam Hough 2006-09-18 15:57:24 EDT
Description of problem:
Crashes with different messages each time.
The most frequent message is "kernel BUG at mm/highmem.c:561!"
BUG: unable to handle kernel paging request at virtual address 4919f100

Modules linked in: ipv6 autofs4 ip_conntrack_netbios_ns ipt_REJECT xt_tcpudp
xt_state ip_conntrack nfnetlink iptable_filter ip_tables x_tables lp parport_pc
parport snd_intel8x0 snd_ac97_codec sg snd_ac97_bus ohci1394 snd_seq_dummy
snd_seq_oss snd_seq_midi_event snd_seq ieee1394 snd_seq_device tg3 snd_pcm_oss
snd_mixer_oss snd_pcm floppy e7xxx_edac snd_timer i2c_i801 snd edac_mc i2c_core
uhci_hcd soundcore ehci_hcd snd_page_alloc serio_raw dm_snapshot dm_zero
dm_mirror dm_mod raid1 ext3 jbd mptspi scsi_transport_spi mptscsih sd_mod
scsi_mod mptbase

I have tested it with and without many of the modules.

Version-Release number of selected component (if applicable):
2.6.17-1.2187_FC5smp
2.6.17-1.2174_FC5smp


How reproducible:
Boot the machine and wait up to an hour; sometimes, though rarely, more than an hour.

Steps to Reproduce:
1. Boot machine
2. Let machine idle
3. Crash / panic
  
Actual results:
System crashes because of a kernel panic / BUG.

Expected results:
System to continue to run without kernel errors.

Additional info:

Machine is a:
IBM 6221-67U
2 GB of ECC PC2100
2 Intel Xeon 3.2 GHz with Hyperthreading enabled.

I am working on getting other Linux distros onto the machine to test whether this
is just a Fedora bug.
I have booted from a Knoppix CD and DVD, which runs until I reboot the machine.
Comment 1 J. Adam Hough 2006-09-18 15:57:24 EDT
Created attachment 136579 [details]
lspci -v > lspci-verbose.txt
Comment 2 J. Adam Hough 2006-09-18 16:00:21 EDT
Created attachment 136580 [details]
A *.debug syslog file from the machine.

This file contains a *.debug syslog printout of the two most common errors that
are printed to the screen.
Comment 3 Dave Jones 2006-09-18 16:22:48 EDT
two things to try.
1. give memtest86+ a spin.  A lot of the time bugs reported in the mm subsystem
turn out to be bad ram, or insufficient power/cooling.

2. There's an experimental kernel at
http://people.redhat.com/davej/kernels/Fedora/FC5/
Comment 4 J. Adam Hough 2006-09-18 16:45:20 EDT
I also suspected bad memory at first.  I have let memtest86+ run for 15+ cycles
and it reported no memory errors.

By the way, I was testing which modules are needed to trigger a crash (though I
still think it is random):
# modprobe tg3
# BUG: unable to handle kernel paging request at virtual address 4919f100
 printing eip:
c0456a9f
*pde = 35c07001
Oops: 0000 [#1]
SMP
last sysfs file: /class/net/eth0/address
Modules linked in: tg3 i2c_dev i2c_core ipv6 dm_snapshot dm_zero dm_mirror dm_mod
raid1 ext3 jbd mptspi scsi_transport_spi mptscsih sd_mod scsi_mod mptbase
CPU:    2
EIP:    0060:[<c0456a9f>]    Not tainted VLI
EFLAGS: 00010a07   (2.6.17-1.2187_FC5smp #1)

(that was all the message it printed before hard locking)

I will install that new kernel now and see how it goes.
Comment 5 J. Adam Hough 2006-09-19 09:43:22 EDT
Created attachment 136634 [details]
*.debug of syslog running the newer kernel
Comment 6 J. Adam Hough 2006-09-19 09:52:00 EDT
Crashed again, but with a different address this time.

Should I try this again but with fewer modules? Those are the modules it autoloaded.

Sep 18 17:53:50 ahough kernel: BUG: unable to handle kernel paging request at
virtual address 7119f020
Sep 18 17:53:50 ahough kernel:  printing eip:
Sep 18 17:53:50 ahough kernel: c045d848
Sep 18 17:53:50 ahough kernel: *pde = 6b6b6b6b
Sep 18 17:53:50 ahough kernel: Oops: 0000 [#1]
Sep 18 17:53:50 ahough kernel: SMP
Sep 18 17:53:50 ahough kernel: last sysfs file: /block/hda/size
Sep 18 17:53:50 ahough kernel: Modules linked in: eeprom ibmasr hangcheck_timer
ipmi_devintf ipmi_watchdog ipmi_msghandler ipv6 ip_conntrack_netbios_ns
ipt_REJECT xt_tcpudp xt_state ip_conntrack nfnetlink iptable_filter ip_tables
x_tables loop video sbs i2c_ec button battery asus_acpi ac lp parport_pc parport
snd_intel8x0 snd_ac97_codec ohci1394 snd_ac97_bus snd_seq_dummy snd_seq_oss
snd_seq_midi_event ieee1394 uhci_hcd ehci_hcd floppy snd_seq sg snd_seq_device
snd_pcm_oss tg3 snd_mixer_oss snd_pcm serio_raw e7xxx_edac i2c_i801 snd_timer
edac_mc snd i2c_core ide_cd soundcore pcspkr snd_page_alloc cdrom dm_snapshot
dm_zero dm_mirror dm_mod raid1 ext3 jbd mptspi scsi_transport_spi mptscsih
sd_mod scsi_mod mptbase
Sep 18 17:53:50 ahough kernel: CPU:    3
Sep 18 17:53:50 ahough kernel: EIP:    0060:[<c045d848>]    Not tainted VLI
Sep 18 17:53:50 ahough kernel: EFLAGS: 00210286   (2.6.17-1.3001.fc5smp #1)
Sep 18 17:53:50 ahough kernel: EIP is at page_address+0xb/0x89
Sep 18 17:53:50 ahough kernel: eax: 7119f020   ebx: 7119f020   ecx: 00000008  
edx: 0cf81163
Sep 18 17:53:50 ahough kernel: esi: 000006a0   edi: c0007d40   ebp: c80d0ef8  
esp: c80d0ee8
Sep 18 17:53:50 ahough kernel: ds: 007b   es: 007b   ss: 0068
Sep 18 17:53:50 ahough kernel: Process sh (pid: 4663, ti=c80d0000 task=f707e030
task.ti=c80d0000)
Sep 18 17:53:50 ahough kernel: Stack: c0a0c080 7119f020 000006a0 c0007d40
c80d0f2c c045df76 c1f44900 00000000
Sep 18 17:53:50 ahough kernel:        00000000 00000044 00000000 000200d2
00000010 c1f44900 c1f44900 c1f44900
Sep 18 17:53:51 ahough kernel:        00000000 c80d0f38 c041d1ac 0000001f
c80d0f70 c047aa71 f751ed44 f751ed44
Sep 18 17:53:51 ahough kernel: Call Trace:
Sep 18 17:53:51 ahough kernel:  [<c045df76>] kmap_high+0x9c/0x212
Sep 18 17:53:51 ahough kernel:  [<c041d1ac>] kmap+0x45/0x48
Sep 18 17:53:51 ahough kernel:  [<c047aa71>] copy_strings+0xd4/0x1ad
Sep 18 17:53:51 ahough kernel:  [<c047ab66>] copy_strings_kernel+0x1c/0x2b
Sep 18 17:53:51 ahough kernel:  [<c047c523>] do_execve+0x113/0x1f9
Sep 18 17:53:51 ahough kernel:  [<c04022b4>] sys_execve+0x29/0x4d
Sep 18 17:53:51 ahough kernel:  [<c0403f5b>] syscall_call+0x7/0xb
Sep 18 17:53:51 ahough kernel: DWARF2 unwinder stuck at syscall_call+0x7/0xb
Sep 18 17:53:51 ahough kernel: Leftover inexact backtrace:
Sep 18 17:53:51 ahough kernel:  [<c0405331>] show_stack_log_lvl+0x8a/0x95
Sep 18 17:53:51 ahough kernel:  [<c0405468>] show_registers+0x12c/0x199
Sep 18 17:53:51 ahough kernel:  [<c0405665>] die+0x190/0x293
Sep 18 17:53:51 ahough kernel:  [<c0610275>] do_page_fault+0x4e9/0x5bb
Sep 18 17:53:52 ahough kernel:  [<c0404b8d>] error_code+0x39/0x40
Sep 18 17:53:52 ahough kernel:  [<c045df76>] kmap_high+0x9c/0x212
Sep 18 17:53:52 ahough kernel:  [<c041d1ac>] kmap+0x45/0x48
Sep 18 17:53:52 ahough kernel:  [<c047aa71>] copy_strings+0xd4/0x1ad
Sep 18 17:53:52 ahough kernel:  [<c047ab66>] copy_strings_kernel+0x1c/0x2b
Sep 18 17:53:52 ahough kernel:  [<c047c523>] do_execve+0x113/0x1f9
Sep 18 17:53:52 ahough kernel:  [<c04022b4>] sys_execve+0x29/0x4d
Sep 18 17:53:52 ahough kernel:  [<c0403f5b>] syscall_call+0x7/0xb
Sep 18 17:53:52 ahough kernel: Code: c0 6a 10 68 b5 51 63 c0 e8 50 7c fc ff 58
5a c9 31 c0 c3 55 83 c8 01 89 e5 e8 90 95 ff ff 5d c3 55 89 e5 57 56 53 89 c3 83
ec 04 <8b> 00 c1 e8 1e 8b 14 85 ac 76 72 c0 8b 82 0c 12 00 00 05 80 37
Sep 18 17:53:52 ahough kernel: EIP: [<c045d848>] page_address+0xb/0x89 SS:ESP
0068:c80d0ee8
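
For comparing crashes across kernels, the key fields of an oops like the one above can be pulled out of the syslog text with a short script. This is only a minimal sketch: the function name `summarize_oops` and the regular expressions are assumptions based on the log layout shown in this comment, not part of any kernel tool, and other kernel versions may format the oops differently.

```python
import re

# Patterns assumed from the 2.6-era i386 oops layout quoted in this bug.
FAULT_RE = re.compile(r"virtual address\s+([0-9a-f]{8})")
EIP_RE = re.compile(r"EIP is at (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")
FRAME_RE = re.compile(r"\[<([0-9a-f]{8})>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def summarize_oops(text):
    """Return the faulting address, the EIP symbol and the exact call trace."""
    # Long syslog lines are wrapped in the report, so collapse whitespace first.
    flat = " ".join(text.split())
    fault = FAULT_RE.search(flat)
    eip = EIP_RE.search(flat)

    # Keep only the frames between "Call Trace:" and the unwinder message,
    # which drops the "Leftover inexact backtrace" section.
    trace = ""
    start = flat.find("Call Trace:")
    if start != -1:
        end = flat.find("DWARF2 unwinder", start)
        trace = flat[start:] if end == -1 else flat[start:end]

    return {
        "fault_address": fault.group(1) if fault else None,
        "eip_symbol": eip.group(1) if eip else None,
        "eip_offset": int(eip.group(2), 16) if eip else None,
        "call_trace": [m.group(2) for m in FRAME_RE.finditer(trace)],
    }


# A few lines copied from the log above, including one wrapped line.
SAMPLE = """\
Sep 18 17:53:50 ahough kernel: BUG: unable to handle kernel paging request at
virtual address 7119f020
Sep 18 17:53:50 ahough kernel: EIP is at page_address+0xb/0x89
Sep 18 17:53:51 ahough kernel: Call Trace:
Sep 18 17:53:51 ahough kernel:  [<c045df76>] kmap_high+0x9c/0x212
Sep 18 17:53:51 ahough kernel:  [<c041d1ac>] kmap+0x45/0x48
Sep 18 17:53:51 ahough kernel: DWARF2 unwinder stuck at syscall_call+0x7/0xb
"""

if __name__ == "__main__":
    print(summarize_oops(SAMPLE))
```

Running the same function over each attached syslog would make it easier to see whether the faulting function stays the same while the address changes.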
Comment 7 J. Adam Hough 2006-09-19 15:44:01 EDT
Created attachment 136671 [details]
*.debug of syslog running the newer kernel

I am sorry, I seem to be breaking things lately.
Comment 8 J. Adam Hough 2006-09-21 09:46:42 EDT
After some digging I discovered that some BIOSes do not update their memory map
when legacy USB mouse / keyboard support is enabled.  I have turned those off and
was able to get ~20 hours of uptime on the kernel you linked me to earlier.  I am
currently testing the current FC5 release kernel to see if it is also stable.
Sorry for wasting your time.
Comment 9 J. Adam Hough 2006-09-26 10:33:47 EDT
I am reopening the bug as I am still getting the "kernel BUG at
mm/highmem.c:561!".  I have run memtest and it was not able to find any errors.
(http://ahough2.ocs.lsu.edu/memtest/)

The most consistent way I have been able to duplicate the bug has been to try
and recompile the kernel source rpm.

This is running on the kernel you linked me to earlier.
Comment 10 J. Adam Hough 2006-09-28 10:38:51 EDT
Created attachment 137313 [details]
current fc5 kernel crash *.debug syslog.all
Comment 11 J. Adam Hough 2006-09-28 20:39:31 EDT
I am replacing the motherboard and CPUs, as four red LEDs have lit up on the
motherboard: one by each CPU plus two undocumented LEDs.  I will reopen only if
the hardware replacement does not fix the issue.
Comment 12 J. Adam Hough 2006-10-04 13:44:33 EDT
Okay, the motherboard and both processors have been replaced.  Memory has checked
out clean with memtest86+.  I am still getting the kernel panics / BUGs.
Comment 13 J. Adam Hough 2006-10-04 15:17:54 EDT
For grins I installed the FC6 test kernel 2.6.18-1.2726.fc6 (both non-PAE and PAE).

Both of them gave me something like this on boot.  (This is also very similar to
what I see on the FC5 kernels when they crash in a way that does not write to
syslog; however, I have not been fast enough to write it all down.)

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.18-1.2726.fc6 #1
-------------------------------------------------------
init/1 is trying to acquire lock:
 (&bdev_part_lock_key){--..}, at: [<c06133a2>] mutex_lock+0x21/0x24

but task is already holding lock:
 (&new->reconfig_mutex){--..}, at: [<c0613071>] mutex_lock_interruptible+0x21/0x24

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&new->reconfig_mutex){--..}:
       [<c043bf70>] lock_acquire+0x4b/0x6b
       [<c0612eb0>] __mutex_lock_interruptible_slowpath+0xbc/0x25c
       [<c0613071>] mutex_lock_interruptible+0x21/0x24
       [<c05a6e8a>] md_open+0x28/0x5d
       [<c0479f15>] do_open+0x8b/0x2f3
       [<c047a316>] blkdev_open+0x1d/0x46
       [<c0471806>] __dentry_open+0xc8/0x1ab
       [<c0471957>] nameidata_to_filp+0x1c/0x2e
       [<c0471997>] do_filp_open+0x2e/0x35
       [<c04719de>] do_sys_open+0x40/0xb5
       [<c0471a7f>] sys_open+0x16/0x18
       [<c0403fb7>] syscall_call+0x7/0xb

-> #1 (&bdev->bd_mutex){--..}:
       [<c043bf70>] lock_acquire+0x4b/0x6b
       [<c0613233>] __mutex_lock_slowpath+0xbc/0x20a
       [<c06133a2>] mutex_lock+0x21/0x24
       [<c0479ee6>] do_open+0x5c/0x2f3
       [<c047a1ec>] blkdev_get+0x6f/0x7a
       [<c0479f93>] do_open+0x109/0x2f3
       [<c047a1ec>] blkdev_get+0x6f/0x7a
       [<c047a44b>] open_by_devnum+0x30/0x3c
       [<c05a10dd>] md_import_device+0x212/0x231
       [<c05a59ce>] md_ioctl+0xac/0x1540
       [<c04df5b0>] blkdev_driver_ioctl+0x49/0x5b
       [<c04dfcd6>] blkdev_ioctl+0x714/0x762
       [<c04796ef>] block_ioctl+0x16/0x1b
       [<c0482f3a>] do_ioctl+0x22/0x67
       [<c04831d7>] vfs_ioctl+0x258/0x26b
       [<c0483231>] sys_ioctl+0x47/0x62
       [<c0403fb7>] syscall_call+0x7/0xb
-> #0 (&bdev_part_lock_key){--..}:
       [<c043bf70>] lock_acquire+0x4b/0x6b
       [<c0613233>] __mutex_lock_slowpath+0xbc/0x20a
       [<c06133a2>] mutex_lock+0x21/0x24
       [<c0479a46>] bd_claim_by_disk+0x5f/0x169
       [<c05a1306>] bind_rdev_to_array+0x20a/0x228
       [<c05a31ad>] autorun_devices+0x1c8/0x29d
       [<c05a5a26>] md_ioctl+0x104/0x1540
       [<c04df5b0>] blkdev_driver_ioctl+0x49/0x5b
       [<c04dfcd6>] blkdev_ioctl+0x714/0x762
       [<c04796ef>] block_ioctl+0x16/0x1b
       [<c0482f3a>] do_ioctl+0x22/0x67
       [<c04831d7>] vfs_ioctl+0x258/0x26b
       [<c0483231>] sys_ioctl+0x47/0x62
       [<c0403fb7>] syscall_call+0x7/0xb

other info that might help us debug this:

1 lock held by init/1:
 #0:  (&new->reconfig_mutex){--..}, at: [<c0613071>]
mutex_lock_interruptible+0x21/0x24

stack backtrace:
 [<c04051ed>] show_trace_log_lvl+0x58/0x16a
 [<c04057fa>] show_trace+0xd/0x10
 [<c0405913>] dump_stack+0x19/0x1b
 [<c043b0e7>] print_circular_bug_tail+0x59/0x64
 [<c043b870>] __lock_acquire+0x77e/0x90d
 [<c043bf70>] lock_acquire+0x4b/0x6b
 [<c0613233>] __mutex_lock_slowpath+0xbc/0x20a
 [<c06133a2>] mutex_lock+0x21/0x24
 [<c0479a46>] bd_claim_by_disk+0x5f/0x169
 [<c05a1306>] bind_rdev_to_array+0x20a/0x228
 [<c05a31ad>] autorun_devices+0x1c8/0x29d
 [<c05a5a26>] md_ioctl+0x104/0x1540
 [<c04df5b0>] blkdev_driver_ioctl+0x49/0x5b
 [<c04dfcd6>] blkdev_ioctl+0x714/0x762
 [<c04796ef>] block_ioctl+0x16/0x1b
 [<c0482f3a>] do_ioctl+0x22/0x67
 [<c04831d7>] vfs_ioctl+0x258/0x26b
 [<c0483231>] sys_ioctl+0x47/0x62
 [<c0403fb7>] syscall_call+0x7/0xb
DWARF2 unwinder stuck at syscall_call+0x7/0xb
Leftover inexact backtrace:
 [<c04057fa>] show_trace+0xd/0x10
 [<c0405913>] dump_stack+0x19/0x1b
 [<c043b0e7>] print_circular_bug_tail+0x59/0x64
 [<c043b870>] __lock_acquire+0x77e/0x90d
 [<c043bf70>] lock_acquire+0x4b/0x6b
 [<c0613233>] __mutex_lock_slowpath+0xbc/0x20a
 [<c06133a2>] mutex_lock+0x21/0x24
 [<c0479a46>] bd_claim_by_disk+0x5f/0x169
 [<c05a1306>] bind_rdev_to_array+0x20a/0x228
 [<c05a31ad>] autorun_devices+0x1c8/0x29d
 [<c05a5a26>] md_ioctl+0x104/0x1540
 [<c04df5b0>] blkdev_driver_ioctl+0x49/0x5b
 [<c04dfcd6>] blkdev_ioctl+0x714/0x762
 [<c04796ef>] block_ioctl+0x16/0x1b
 [<c0482f3a>] do_ioctl+0x22/0x67
 [<c04831d7>] vfs_ioctl+0x258/0x26b
 [<c0483231>] sys_ioctl+0x47/0x62
 [<c0403fb7>] syscall_call+0x7/0xb
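
The report above says that taking a new lock would close a loop in the order in which locks have been acquired so far. As a rough illustration of that idea only, here is a minimal sketch of cycle detection on a lock-order graph. The function `find_cycle` and the edge list are assumptions made for illustration: the edges are one reading of the dependency chain printed above (each pair means "held first, then acquired"), and this is not how the kernel's lock validator is actually implemented.

```python
def find_cycle(edges):
    """Return one cycle in a directed lock-order graph, or None if acyclic."""
    graph = {}
    for held, wanted in edges:
        graph.setdefault(held, []).append(wanted)
        graph.setdefault(wanted, [])

    visiting = []   # locks on the current depth-first path, in order
    done = set()    # locks whose successors have all been explored

    def dfs(node):
        if node in visiting:
            # Reached a lock that is already on the path: that is a cycle.
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for nxt in graph[node]:
            cycle = dfs(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for node in graph:
        cycle = dfs(node)
        if cycle:
            return cycle
    return None


# Assumed reading of the chain above: (held lock, lock acquired while holding it).
EXISTING = [
    ("bd_mutex", "reconfig_mutex"),        # chain entry #2, via md_open
    ("bdev_part_lock_key", "bd_mutex"),    # chain entry #1, via nested do_open
]
NEW_EDGE = ("reconfig_mutex", "bdev_part_lock_key")  # entry #0, bd_claim_by_disk

if __name__ == "__main__":
    print(find_cycle(EXISTING))               # no cycle before the new edge
    print(find_cycle(EXISTING + [NEW_EDGE]))  # the new edge closes the loop
```

A cycle in this graph only means that a deadlock is possible under some interleaving; whether this particular report reflects a real deadlock is not established in this bug.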

Comment 14 J. Adam Hough 2006-10-09 10:47:34 EDT
I have installed Ubuntu and am running their 2.6.15-27-686 kernel.  The machine
is working fine.  I am going to recompile some older Fedora kernels to test them
out.
Comment 15 J. Adam Hough 2006-10-09 11:42:27 EDT
Interesting.  If I chroot into FC5 and try to recompile an FC kernel, I get the
same type of kernel panic (Unable to handle kernel paging request at virtual
address 80010004).  This makes me think that the problem is not with the kernel
but maybe with some I/O library inside Fedora.
Comment 16 J. Adam Hough 2006-10-12 11:49:23 EDT
Created attachment 138337 [details]
The 2.6.18-1.2189.fc5smp development kernel
Comment 17 Dave Jones 2006-10-16 20:53:29 EDT
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.
Comment 18 J. Adam Hough 2006-10-17 13:35:17 EDT
I believe that something in the newer kernel has fixed the issue that I was having.
