Bug 160135
Field | Value
---|---
Summary | kernel panic in ioremap with four 1GB DIMMs (2.6.9-11.ELsmp)
Product | Red Hat Enterprise Linux 4
Component | kernel
Version | 4.0
Hardware | x86_64
OS | Linux
Status | CLOSED ERRATA
Severity | medium
Priority | medium
Reporter | Nate Faerber <nfaerber>
Assignee | Jim Paradis <jparadis>
QA Contact | Brian Brock <bbrock>
CC | bilbrey, brett.morrow, jparadis, knweiss, netllama, peterm, ppokorny, wusel+rhbug
Fixed In Version | RHSA-2005-808
Doc Type | Bug Fix
Last Closed | 2005-10-27 15:06:56 UTC
Description
Nate Faerber
2005-06-11 03:13:26 UTC
Created attachment 115323 [details]
ioremap patch adapted from SLES9 SP2
This patch has been added to a testing kernel. Any feedback is much appreciated: http://people.redhat.com/~jbaron/rhel4/

Jason, the 2.6.9-11.20.ELsmp kernel that was in your rhel4 directory up until yesterday still exhibited the problem. Do you think the 11.21 version will work better?

Thanks for the feedback. No, 11.21 wouldn't make a difference. The patch from comment #1 is basically in 11.20, so I'm a bit surprised this is still failing. We'll have to dig deeper....

I looked at the Source RPM and the patch you included (linux-2.6.9-ioremap-fixes.patch) is not identical to the patch in comment #1. Was your patch supposed to fix other things as well as this problem? It will take me some time to figure out which part of the good patch is missing from yours. I can assure you that applying the patch in comment #1 to a 2.6.9-11.EL kernel will fix my problem.

We are getting the same kernel panic on a dual Opteron HP xw9300 with 4 GB on Red Hat Linux 4 Update 1 WS with kernel 2.6.9-11.ELsmp x86-64 when we boot with the kernel argument "acpi=off". (We'll get a different kernel panic without acpi=off - see my other bug):

audit(1121422223.323:0): initialized
Red Hat nash version 4.2.1.3 starting
File descriptor 3 left open
Reading all physical volumes. This may take a while...
Found volume group "vg01" using metadata type lvm2
Found volume group "vg00" using metadata type lvm2
File descriptor 3 left open
2 logical volume(s) in volume group "vg00" now active
INIT: version 2.85 booting
Welcome to Red Hat Enterprise Linux WS
Press 'I' to enter interactive startup.
udev starten: [ OK ]
Initialisiere Hardware...
Speicher NetzwerkUnable to handle kernel paging request at 00000000000018f0 RIP: <ffffffff80122f32>{ioremap_nocache+196} PML4 17ed1f067 PGD 17f29f067 PMD 0 Oops: 0000 [1] SMP CPU 1 Modules linked in: snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mp u401_uart snd_rawmidd Pid: 1069, comm: modprobe Not tainted 2.6.9-11.ELsmp RIP: 0010:[<ffffffff80122f32>] <ffffffff80122f32>{ioremap_nocache+196} RSP: 0018:0000010037cebd58 EFLAGS: 00010213 RAX: 00000100f2101000 RBX: 00000000f2101000 RCX: 0000000000000019 RDX: ffffffff7fffffff RSI: 0000010180000000 RDI: 0000000000000000 RBP: 0000000000001000 R08: 0000000000000008 R09: 0000000000000246 R10: 0000000000000000 R11: 0000000000000000 R12: ffffff0000018000 R13: dead4ead00000001 R14: dead4ead00000001 R15: 000001017fc14dc0 FS: 0000002a9557db00(0000) GS:ffffffff804c1780(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000000018f0 CR3: 0000000008032000 CR4: 00000000000006e0 Process modprobe (pid: 1069, threadinfo 0000010037cea000, task 0000010037d467f0) Stack: 0000000000000000 000001017eb81780 0000000000000004 ffffffffa0141a57 000001017fc64cc0 ffffffff801c6041 000001017eb186e0 0000000000000246 000001007ffd6400 000001017ffe9800 Call Trace:<ffffffffa0141a57>{:snd_intel8x0:snd_intel8x0_probe+527} <ffffffff801c6041>{selinux_inode_alloc_security+72} <ffffffff801aac65>{sysfs_new_dirent+26} <ffffffff80187a9e>{dput+55} <ffffffff801aaeaa>{create_dir+305} <ffffffff801e5894>{pci_device_probe+110} <ffffffff802388b1>{bus_match+57} <ffffffff802389af>{driver_attach+68} <ffffffff80238ccb>{bus_add_driver+143} <ffffffff801e5604>{pci_register_driver+119} <ffffffffa014b00e>{:snd_intel8x0:alsa_card_intel8x0_init+14} <ffffffff8014d52c>{sys_init_module+316} <ffffffff8011003e>{system_call+126} Code: 48 8b 8f f0 18 00 00 76 10 48 b8 00 00 00 80 00 01 00 00 48 RIP <ffffffff80122f32>{ioremap_nocache+196} RSP <0000010037cebd58> CR2: 00000000000018f0 <0>Kernel panic - not 
syncing: Oops Here's the panic we get without acpi=off (but still smp kernel): Unable to handle kernel paging request at 00000000000018f0 RIP: <ffffffff80122f32>{ioremap_nocache+196} PML4 0 Oops: 0000 [1] SMP CPU 0 Modules linked in: Pid: 1, comm: swapper Not tainted 2.6.9-11.ELsmp RIP: 0010:[<ffffffff80122f32>] <ffffffff80122f32>{ioremap_nocache+196} RSP: 0000:000001017ffb1f08 EFLAGS: 00010213 RAX: 00000100e0000000 RBX: 00000000e0000000 RCX: 0000000000000019 RDX: ffffffff7fffffff RSI: 0000010180000000 RDI: 0000000000000000 RBP: 0000000010000000 R08: 0000000000000008 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffff0000080000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffffffff804c1700(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00000000000018f0 CR3: 0000000000101000 CR4: 00000000000006e0 Process swapper (pid: 1, threadinfo 000001017ffb0000, task 0000010037e4a7f0) Stack: ffffffff804ee4c8 0000000000000000 0000000000000000 ffffffff804e4bea 0000000000000246 ffffffff8010c3eb 0000000000000246 0000000000000000 0000000000000000 ffffffff80110c8f Call Trace:<ffffffff804e4bea>{pci_mmcfg_init+32} <ffffffff8010c3eb>{init+474} <ffffffff80110c8f>{child_rip+8} <ffffffff8010c211>{init+0} <ffffffff80110c87>{child_rip+0} Code: 48 8b 8f f0 18 00 00 76 10 48 b8 00 00 00 80 00 01 00 00 48 RIP <ffffffff80122f32>{ioremap_nocache+196} RSP <000001017ffb1f08> CR2: 00000000000018f0 <0>Kernel panic - not syncing: Oops hmmm, i've dug into this one a bit...the crash here is happening in 'pfn_to_page' which is called by virt_to_page. It seems that NODE_DATA(nid) is NULL and thus node_start_pfn is NULL. The reason, i suspect, the patch in comment #1 works is b/c it no longer uses virt_to_page. However, i think the underlying problem is still present. This seems to be an issue with how NUMA is configured in RHEL4 x86_64. 
Also consistent with this is the fact that if I boot with 'numa=off', RHEL4 U1 seems to work fine. Thus, I would suggest that as a temporary workaround until we can get to the bottom of this. Thanks.

With

kernel /vmlinuz-2.6.9-11.ELsmp ro root=/dev/vg00/root rhgb quiet numa=off

the kernel still panics. Unfortunately, I can't show you the oops because the machine is at a remote location. I will try acpi=off and numa=off next.

Okay, I did another test with acpi=off *and* numa=off. Now the smp kernel finally boots. However, I get the following kernel message now:

Jul 18 17:00:03 esw0001 kernel: k8-bus.c: bus 128 has empty cpu mask
Jul 18 17:00:03 esw0001 kernel: k8-bus.c: bus 129 has empty cpu mask
. . .
Jul 18 17:00:05 esw0001 kernel: k8-bus.c: bus 254 has empty cpu mask
Jul 18 17:00:05 esw0001 kernel: k8-bus.c: bus 255 has empty cpu mask

Here's the boot log:

Jul 18 17:00:01 esw0001 syslogd 1.4.1: restart.
Jul 18 17:00:01 esw0001 syslog: Starten von syslogd succeeded
Jul 18 17:00:02 esw0001 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Jul 18 17:00:02 esw0001 kernel: Bootdata ok (command line is ro root=/dev/vg00/root rhgb quiet acpi=off numa=off) Jul 18 17:00:02 esw0001 kernel: Linux version 2.6.9-11.ELsmp (bhcompile.redhat.com) (gcc version 3.4.3 20050227 (Red Hat 3.4.3-22)) #1 SMP Fri M ay 20 18:25:30 EDT 2005 Jul 18 17:00:02 esw0001 kernel: BIOS-provided physical RAM map: Jul 18 17:00:02 esw0001 kernel: BIOS-e820: 0000000000000000 - 000000000009d400 (usable) Jul 18 17:00:02 esw0001 kernel: BIOS-e820: 000000000009d400 - 00000000000a0000 (reserved) Jul 18 17:00:02 esw0001 kernel: BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved) Jul 18 17:00:02 esw0001 kernel: BIOS-e820: 0000000000100000 - 000000007fff9500 (usable) Jul 18 17:00:02 esw0001 kernel: BIOS-e820: 000000007fff9500 - 0000000080000000 (reserved) Jul 18 17:00:02 esw0001 kernel: BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved) Jul 18 17:00:02 esw0001 kernel: BIOS-e820: 0000000100000000 - 0000000180000000 (usable) Jul 18 17:00:02 esw0001 syslog: Starten von klogd succeeded Jul 18 17:00:02 esw0001 irqbalance: Starten von irqbalance succeeded Jul 18 17:00:02 esw0001 kernel: Warning: acpi_table_parse(ACPI_SLIT) returned 0! Jul 18 17:00:02 esw0001 kernel: NUMA turned off Jul 18 17:00:02 esw0001 kernel: Faking a node at 0000000000000000-0000000180000000 Jul 18 17:00:02 esw0001 kernel: Bootmem setup node 0 0000000000000000-0000000180000000 Jul 18 17:00:02 esw0001 kernel: No mptable found. Jul 18 17:00:02 esw0001 kernel: Nvidia board detected. Ignoring ACPI timer override. Jul 18 17:00:02 esw0001 kernel: Intel MultiProcessor Specification v1.4 Jul 18 17:00:02 esw0001 kernel: Virtual Wire compatibility mode. Jul 18 17:00:02 esw0001 kernel: OEM ID: HP <6>Product ID: workstation <6>APIC at: 0xFEE00000 Jul 18 17:00:02 esw0001 kernel: Processor #0 15:5 APIC version 16 Jul 18 17:00:02 esw0001 kernel: Processor #1 15:5 APIC version 16 Jul 18 17:00:02 esw0001 kernel: I/O APIC #8 Version 17 at 0xFEC00000. 
Jul 18 17:00:02 esw0001 kernel: I/O APIC #9 Version 17 at 0xF2600000. Jul 18 17:00:02 esw0001 kernel: I/O APIC #10 Version 17 at 0xF2601000. Jul 18 17:00:02 esw0001 kernel: I/O APIC #11 Version 17 at 0xF2700000. Jul 18 17:00:02 esw0001 kernel: Processors: 2 Jul 18 17:00:02 esw0001 kernel: Checking aperture... Jul 18 17:00:02 esw0001 kernel: CPU 0: aperture @ 8000000 size 32 MB Jul 18 17:00:02 esw0001 kernel: Aperture from northbridge cpu 0 too small (32 MB) Jul 18 17:00:02 esw0001 kernel: No AGP bridge found Jul 18 17:00:02 esw0001 portmap: Starten von portmap succeeded Jul 18 17:00:02 esw0001 kernel: Your BIOS doesn't leave a aperture memory hole Jul 18 17:00:02 esw0001 kernel: Please enable the IOMMU option in the BIOS setup Jul 18 17:00:02 esw0001 kernel: This costs you 64 MB of RAM Jul 18 17:00:02 esw0001 kernel: Mapping aperture over 65536 KB of RAM @ 8000000 Jul 18 17:00:02 esw0001 kernel: Built 1 zonelists Jul 18 17:00:02 esw0001 kernel: Kernel command line: ro root=/dev/vg00/root rhgb quiet acpi=off numa=off console=tty0 Jul 18 17:00:02 esw0001 kernel: Initializing CPU#0 Jul 18 17:00:02 esw0001 kernel: PID hash table entries: 4096 (order: 12, 131072 bytes) Jul 18 17:00:02 esw0001 kernel: time.c: Using 1.193182 MHz PIT timer. Jul 18 17:00:02 esw0001 kernel: time.c: Detected 2593.109 MHz processor. Jul 18 17:00:02 esw0001 rpc.statd[2261]: Version 1.0.6 Starting Jul 18 17:00:02 esw0001 kernel: Console: colour VGA+ 80x25 Jul 18 17:00:02 esw0001 kernel: Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes) Jul 18 17:00:02 esw0001 kernel: Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes) Jul 18 17:00:02 esw0001 kernel: Memory: 4023244k/6291456k available (2033k kernel code, 0k reserved, 1252k data, 188k init) Jul 18 17:00:02 esw0001 kernel: Security Scaffold v1.0.0 initialized Jul 18 17:00:02 esw0001 kernel: SELinux: Initializing. 
Jul 18 17:00:02 esw0001 rpc.statd[2261]: gethostbyname error for esw0001 Jul 18 17:00:02 esw0001 kernel: SELinux: Starting in permissive mode Jul 18 17:00:02 esw0001 kernel: There is already a security framework initialized, register_security failed. Jul 18 17:00:02 esw0001 nfslock: Starten von rpc.statd succeeded Jul 18 17:00:02 esw0001 kernel: selinux_register_security: Registering secondary module capability Jul 18 17:00:02 esw0001 kernel: Capability LSM initialized as secondary Jul 18 17:00:02 esw0001 kernel: Mount-cache hash table entries: 256 (order: 0, 4096 bytes) Jul 18 17:00:02 esw0001 kernel: CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) Jul 18 17:00:02 esw0001 kernel: CPU: L2 Cache: 1024K (64 bytes/line) Jul 18 17:00:02 esw0001 kernel: Using local APIC NMI watchdog using perfctr0 Jul 18 17:00:02 esw0001 kernel: CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) Jul 18 17:00:02 esw0001 kernel: CPU: L2 Cache: 1024K (64 bytes/line) Jul 18 17:00:02 esw0001 kernel: CPU0: AMD Opteron(tm) Processor 252 stepping 01 Jul 18 17:00:02 esw0001 kernel: per-CPU timeslice cutoff: 1023.90 usecs. Jul 18 17:00:02 esw0001 kernel: task migration cache decay timeout: 2 msecs. Jul 18 17:00:02 esw0001 kernel: Booting processor 1/1 rip 6000 rsp 10006455f58 Jul 18 17:00:02 esw0001 kernel: Initializing CPU#1 Jul 18 17:00:02 esw0001 kernel: CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) Jul 18 17:00:02 esw0001 kernel: CPU: L2 Cache: 1024K (64 bytes/line) Jul 18 17:00:02 esw0001 kernel: AMD Opteron(tm) Processor 252 stepping 01 Jul 18 17:00:02 esw0001 kernel: Total of 2 processors activated (10305.53 BogoMIPS). Jul 18 17:00:02 esw0001 kernel: Using IO-APIC 8 Jul 18 17:00:02 esw0001 kernel: Using IO-APIC 9 Jul 18 17:00:02 esw0001 kernel: Using IO-APIC 10 Jul 18 17:00:02 esw0001 kernel: Using IO-APIC 11 Jul 18 17:00:02 esw0001 kernel: Using local APIC timer interrupts. Jul 18 17:00:02 esw0001 kernel: Detected 12.466 MHz APIC timer. 
Jul 18 17:00:02 esw0001 kernel: checking TSC synchronization across 2 CPUs: passed. Jul 18 17:00:02 esw0001 kernel: time.c: Using PIT/TSC based timekeeping. Jul 18 17:00:02 esw0001 kernel: Brought up 2 CPUs Jul 18 17:00:02 esw0001 kernel: checking if image is initramfs... it is Jul 18 17:00:02 esw0001 kernel: NET: Registered protocol family 16 Jul 18 17:00:02 esw0001 kernel: PCI: Using configuration type 1 Jul 18 17:00:02 esw0001 kernel: mtrr: v2.0 (20020519) Jul 18 17:00:02 esw0001 kernel: ACPI: Subsystem revision 20040816 Jul 18 17:00:02 esw0001 kernel: ACPI: Interpreter disabled. Jul 18 17:00:02 esw0001 kernel: usbcore: registered new driver usbfs Jul 18 17:00:02 esw0001 kernel: usbcore: registered new driver hub Jul 18 17:00:02 esw0001 kernel: PCI: Probing PCI hardware Jul 18 17:00:02 esw0001 kernel: PCI: Probing PCI hardware (bus 00) Jul 18 17:00:02 esw0001 rpcidmapd: Starten von rpc.idmapd succeeded Jul 18 17:00:02 esw0001 kernel: PCI: Transparent bridge - 0000:00:09.0 Jul 18 17:00:02 esw0001 kernel: PCI: Discovered primary peer bus 41 [IRQ] Jul 18 17:00:02 esw0001 kernel: PCI: Discovered primary peer bus 61 [IRQ] Jul 18 17:00:02 esw0001 kernel: PCI: Discovered primary peer bus 81 [IRQ] Jul 18 17:00:02 esw0001 kernel: PCI: Using IRQ router default [10de/0051] at 0000:00:01.0 Jul 18 17:00:02 esw0001 kernel: PCI->APIC IRQ transform: (B0,I1,P0) -> 11 Jul 18 17:00:02 esw0001 kernel: PCI->APIC IRQ transform: (B0,I2,P0) -> 5 Jul 18 17:00:02 esw0001 kernel: PCI->APIC IRQ transform: (B0,I2,P1) -> 10 Jul 18 17:00:02 esw0001 kernel: PCI->APIC IRQ transform: (B0,I4,P0) -> 11 Jul 18 17:00:02 esw0001 kernel: PCI->APIC IRQ transform: (B0,I7,P0) -> 5 Jul 18 17:00:02 esw0001 kernel: PCI->APIC IRQ transform: (B0,I8,P0) -> 10 Jul 18 17:00:02 esw0001 kernel: PCI->APIC IRQ transform: (B0,I10,P0) -> 11 Jul 18 17:00:02 esw0001 kernel: PCI->APIC IRQ transform: (B5,I5,P0) -> 10 Jul 18 17:00:02 esw0001 kernel: PCI->APIC IRQ transform: (B10,I0,P0) -> 5 Jul 18 17:00:02 esw0001 kernel: 
PCI->APIC IRQ transform: (B97,I6,P0) -> 5 Jul 18 17:00:02 esw0001 kernel: PCI->APIC IRQ transform: (B97,I6,P1) -> 10 Jul 18 17:00:02 esw0001 kernel: PCI-DMA: Disabling AGP. Jul 18 17:00:03 esw0001 kernel: PCI-DMA: aperture base @ 8000000 size 65536 KB Jul 18 17:00:03 esw0001 kernel: PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture Jul 18 17:00:03 esw0001 netfs: Andere Dateisysteme einhÃâ¬ngen: succeeded Jul 18 17:00:03 esw0001 kernel: k8-bus.c: bus 128 has empty cpu mask Jul 18 17:00:03 esw0001 kernel: k8-bus.c: bus 129 has empty cpu mask Jul 18 17:00:03 esw0001 kernel: k8-bus.c: bus 130 has empty cpu mask Jul 18 17:00:03 esw0001 kernel: k8-bus.c: bus 131 has empty cpu mask Jul 18 17:00:03 esw0001 kernel: k8-bus.c: bus 132 has empty cpu mask Jul 18 17:00:03 esw0001 kernel: k8-bus.c: bus 133 has empty cpu mask Jul 18 17:00:03 esw0001 rc: lm_sensors starten: succeeded Jul 18 17:00:03 esw0001 kernel: k8-bus.c: bus 134 has empty cpu mask Jul 18 17:00:03 esw0001 kernel: k8-bus.c: bus 135 has empty cpu mask Jul 18 17:00:03 esw0001 kernel: k8-bus.c: bus 136 has empty cpu mask Jul 18 17:00:03 esw0001 kernel: k8-bus.c: bus 137 has empty cpu mask Jul 18 17:00:03 esw0001 kernel: k8-bus.c: bus 138 has empty cpu mask Jul 18 17:00:03 esw0001 kernel: k8-bus.c: bus 139 has empty cpu mask etc. could you plese test the u2 beta. you should be able to boot with both acpi=off and numa=off *NOT* set. http://people.redhat.com/~jbaron/rhel4/ thanks. (In reply to comment #22) > could you plese test the u2 beta. you should be able to boot with both acpi=off > and numa=off *NOT* set. http://people.redhat.com/~jbaron/rhel4/ Hi Jason! Good news: I've tried your u2 beta kernel (x86_64). This kernel now boots fine without the acpi=off and numa=off kernel parameters. Bad news: It crashed as soon as loaded the rebuilt nvidia kernel module (NVIDIA-Linux-x86_64-1.0-7667-pkg2.run). Unfortunately, I can't give you more details right now because the machine is at a remote location. 
(BTW: This is a 3D graphics workstation and the NVIDIA driver is mandatory.)

This may be a separate issue, then. Please attach a console capture of the crash at your earliest convenience; we may recommend filing a separate issue.

I'll try to capture a new oops as soon as possible.

Regarding NVIDIA: Even with 2.6.9-11.ELsmp I get the following warnings:

NVRM: loading NVIDIA Linux x86_64 NVIDIA Kernel Module 1.0-7676 Fri Jul 29 13:15:16 PDT 2005
NVRM: WARNING: Your Linux kernel has problems in its implementation of
NVRM: the change_page_attr kernel interface. The NVIDIA kernel
NVRM: module will attempt to work around these problems, but
NVRM: system stability may be affected. It is recommended that
NVRM: you update to a 2.6.11 or newer kernel.
NVRM: bad caching on address 0x1014fe2f000: actual 0x163 != expected 0x173
NVRM: please see the README section on Cache Aliasing for more information
NVRM: bad caching on address 0x10150c59000: actual 0x163 != expected 0x173
NVRM: bad caching on address 0x1016deec000: actual 0x163 != expected 0x173
NVRM: bad caching on address 0x1016deed000: actual 0x163 != expected 0x173
NVRM: bad caching on address 0x101648c4000: actual 0x163 != expected 0x173
NVRM: bad caching on address 0x101648c5000: actual 0x163 != expected 0x173
NVRM: bad caching on address 0x10173d76000: actual 0x163 != expected 0x173
NVRM: bad caching on address 0x10173d77000: actual 0x163 != expected 0x173
NVRM: bad caching on address 0x10169de8000: actual 0x163 != expected 0x173
NVRM: bad caching on address 0x10169de9000: actual 0x163 != expected 0x173

The NVIDIA README says:

Cache Aliasing

Cache aliasing occurs when multiple mappings to a physical page of memory have conflicting caching states, such as cached and uncached. Due to these conflicting states, data in that physical page may become corrupted when the processor's cache is flushed. If that page is being used for dma by a driver such as NVIDIA's graphics driver, this can lead to hardware stability problems and system lockups.

NVIDIA has encountered bugs with some Linux kernel versions that lead to cache aliasing. Although some systems will run perfectly fine when cache aliasing occurs, other systems will experience severe stability problems, including random lockups. Users experiencing stability problems due to cache aliasing will benefit from updating to a kernel that does not cause cache aliasing to occur.

NVIDIA has added driver logic to detect cache aliasing and to print a warning with a message similar to the following:

NVRM: bad caching on address 0x1cdf000: actual 0x46 != expected 0x73

If you see this message in your log files and are experiencing stability problems, you should update your kernel to the latest version. If the message persists after updating your kernel, please send a bug report to NVIDIA.

Is there any chance that we'll get a kernel for RHEL4 without those Cache Aliasing issues?

Okay, here are some new kernel oopses:

=======================================================================

This is 2.6.9-15.ELsmp booted with the following grub settings:

title Red Hat Enterprise Linux WS (2.6.9-15.ELsmp) Update2 Beta Kernel
root (hd0,0)
kernel /vmlinuz-2.6.9-15.ELsmp ro root=/dev/vg00/root rhgb quiet console=tty0 console=ttyS0,38400n8
initrd /initrd-2.6.9-15.ELsmp.img

This one boots fine.
But when I recompile the NVIDIA-1.0-7676 kernel module and load it into the kernel I get the following oops: Unable to handle kernel paging request at 00000000000018f0 RIP: <ffffffff80123d00>{iounmap+304} PML4 1791fe067 PGD 177312067 PMD 0 Oops: 0000 [1] SMP CPU 1 Modules linked in: nvidia(U) nfs nfsd exportfs lockd md5 ipv6 i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core button battery ac ohci_hcd ehci_hcd sndd Pid: 4839, comm: X Tainted: P 2.6.9-15.ELsmp RIP: 0010:[<ffffffff80123d00>] <ffffffff80123d00>{iounmap+304} RSP: 0018:0000010177673ab8 EFLAGS: 00010213 RAX: 00000100e0000000 RBX: 000001017fd9df00 RCX: 0000000000000019 RDX: ffffffff7fffffff RSI: 0000000000002000 RDI: 00000000e0000000 RBP: ffffff0000030000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000001000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 0000002a9557f3e0(0000) GS:ffffffff804d3300(0000) knlGS:00000000f7fd66c0 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000000018f0 CR3: 0000000002c22000 CR4: 00000000000006e0 Process X (pid: 4839, threadinfo 0000010177672000, task 000001007f77c030) Stack: ffffffff803cbda8 ffffff0000030000 000001007f124800 ffffffffa04c56da 0000000000000080 ffffffffa02c60e0 000001006ec91500 00000000005e10de 0000000000000000 ffffffffa02c2d18 Call Trace:<ffffffffa04c56da>{:nvidia:os_unmap_kernel_space+9} <ffffffffa02c60e0>{:nvidia:_nv002012rm+42} <ffffffffa02c2d18>{:nvidia:_nv002316rm+208} <ffffffffa02c219b>{:nvidia:_nv002327rm+255} <ffffffffa02c2370>{:nvidia:_nv002284rm+100} <ffffffffa02ba06b>{:nvidia:_nv002166rm+39} <ffffffffa02c2458>{:nvidia:_nv002328rm+64} <ffffffffa02c8df7>{:nvidia:_nv003667rm+141} <ffffffffa02c8d3b>{:nvidia:_nv003623rm+275} <ffffffffa043c778>{:nvidia:_nv003247rm+126} <ffffffffa03ef948>{:nvidia:_nv004556rm+68} <ffffffffa03ef726>{:nvidia:_nv004385rm+104} <ffffffffa02c8b04>{:nvidia:_nv001453rm+96} <ffffffffa03a0338>{:nvidia:_nv000393rm+20} 
<ffffffffa03a04b3>{:nvidia:_nv000397rm+125} <ffffffffa02cb951>{:nvidia:_nv001426rm+141} <ffffffffa02c9542>{:nvidia:_nv001458rm+668} <ffffffffa02cc8f4>{:nvidia:rm_init_adapter+104} <ffffffffa04bf66f>{:nvidia:nv_kern_open+684} <ffffffff8017eb3c>{chrdev_open+412} <ffffffff801760a0>{dentry_open+223} <ffffffff801761db>{filp_open+62} <ffffffff801e9c85>{strncpy_from_user+74} <ffffffff801762cd>{get_unused_fd+230} <ffffffff801763bc>{sys_open+57} <ffffffff80110052>{system_call+126} Code: 49 8b 88 f0 18 00 00 76 1b 48 b8 00 00 00 80 00 01 00 00 48 RIP <ffffffff80123d00>{iounmap+304} RSP <0000010177673ab8> CR2: 00000000000018f0 <0>Kernel panic - not syncing: Oops ========================================================== This is 2.6.9-16.ELsmp booted with the following grub settings: title Red Hat Enterprise Linux WS (2.6.9-16.ELsmp) Update2 Beta Kernel root (hd0,0) kernel /vmlinuz-2.6.9-16.ELsmp ro root=/dev/vg00/root rhgb quiet console=tty0 console=ttyS0,38400n8 initrd /initrd-2.6.9-16.ELsmp.img (The line "Pid: 1, comm: swapper Not tainted 2.6.9-11.EL" really confuses me!) 
Unable to handle kernel NULL pointer dereference at 0000000000000002 RIP: <ffffffff80241567>{acpi_pci_root_add+296} PML4 7fdcc067 PGD 0 Oops: 0000 [1] CPU 0 Modules linked in: Pid: 1, comm: swapper Not tainted 2.6.9-11.EL RIP: 0010:[<ffffffff80241567>] <ffffffff80241567>{acpi_pci_root_add+296} RSP: 0018:000001007ff83e08 EFLAGS: 00010206 RAX: 0000000000ff0002 RBX: 000001007fdb72c0 RCX: ffffffff804d478c RDX: 0000000000000002 RSI: 0000000000000001 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 000001007ffde828 R10: 0000000000000000 R11: 0000000000000000 R12: 00000100065a6c00 R13: ffffffff80432320 R14: 0000000000000000 R15: 0000010037ff0f00 FS: 0000000000000000(0000) GS:ffffffff8051e980(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000002 CR3: 0000000000101000 CR4: 00000000000006e0 Process swapper (pid: 1, threadinfo 000001007ff82000, task 000001007ff81110) Stack: 000000000000001a 0000000000000000 0000000000000000 00000000000000ff ffffffff80431d00 00000100065a6c00 ffffffff80431d00 ffffffff80245f64 ffffffff80431d00 00000100065a6c00 Call Trace:<ffffffff80245f64>{acpi_bus_driver_init+49} <ffffffff802471be>{acpi_bus_add+2715} <ffffffff80226639>{acpi_os_wait_semaphore+133} <ffffffff8023c7e4>{acpi_ut_acquire_mutex+114} <ffffffff80539506>{acpi_scan_init+450} <ffffffff8010c3d9>{init+336} <ffffffff80111373>{child_rip+8} <ffffffff8010c289>{init+0} <ffffffff8011136b>{child_rip+0} Code: 48 8b 02 0f 18 08 48 81 fa 10 1e 43 80 74 68 8b 43 18 39 42 RIP <ffffffff80241567>{acpi_pci_root_add+296} RSP <000001007ff83e08> CR2: 0000000000000002 <0>Kernel panic - not syncing: Oops As you can see it crashes right at the beginning of the boot process. (The line "Pid: 1, comm: swapper Not tainted 2.6.9-11.EL" really confuses me!) It means that you're running the wrong kernel. 
On panic, the kernel prints the version string that was compiled in; if you see the wrong one then it means you're running the wrong kernel (you may have accidentally overwritten the 16.EL file or somesuch).

In the meantime, pull down the SMP test kernel from http://people.redhat.com/~jparadis/numa and tell me how that works. It's the .16 kernel with an additional fix for the "Unable to handle kernel paging request at 00000000000018f0" issue.

Hi Jim! The 2.6.9-16.EL.root.numafixsmp kernel boots fine but it still oopses immediately when I load the latest NVIDIA driver (1.0-7676).

BTW: I did the following to compile the NVIDIA driver because there was no devel package for your root.numafix kernel:

root@esw0001 # cd /lib/modules/2.6.9-16.EL.root.numafixsmp/
root@esw0001 # ln -s /usr/src/kernels/2.6.9-16.EL-smp-x86_64/ build
root@esw0001 # ln -s build/ source

Here's the oops:

Unable to handle kernel paging request at 00000000000018f0 RIP:
<ffffffff80123cb0>{iounmap+304}
PML4 7a4cf067 PGD 7c910067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: nvidia(U) nfs nfsd exportfs lockd md5 ipv6 i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core button battery ac ohci_hcd ehci_hcd sndd
Pid: 4873, comm: X Tainted: P 2.6.9-16.EL.root.numafixsmp
RIP: 0010:[<ffffffff80123cb0>] <ffffffff80123cb0>{iounmap+304}
RSP: 0018:000001017ed1fab8 EFLAGS: 00010213
RAX: 00000100e0000000 RBX: 00000100082b0dc0 RCX: 0000000000000019
RDX: ffffffff7fffffff RSI: 0000000000002000 RDI: 00000000e0000000
RBP: ffffff0000030000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000001000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000002a9557f3e0(0000) GS:ffffffff804d3480(0000) knlGS:00000000f7fd66c0
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000018f0 CR3: 0000000000101000 CR4: 00000000000006e0
Process X (pid: 4873, threadinfo 000001017ed1e000, task 000001017b2cc030)
Stack: ffffffff803cbfa8 ffffff0000030000
000001007ef3e800 ffffffffa04c66da 0000000000000080 ffffffffa02c70e0 0000010170e19500 00000000005e10de 0000000000000000 ffffffffa02c3d18 Call Trace:<ffffffffa04c66da>{:nvidia:os_unmap_kernel_space+9} <ffffffffa02c70e0>{:nvidia:_nv002012rm+42} <ffffffffa02c3d18>{:nvidia:_nv002316rm+208} <ffffffffa02c319b>{:nvidia:_nv002327rm+255} <ffffffffa02c3370>{:nvidia:_nv002284rm+100} <ffffffffa02bb06b>{:nvidia:_nv002166rm+39} <ffffffffa02c3458>{:nvidia:_nv002328rm+64} <ffffffffa02c9df7>{:nvidia:_nv003667rm+141} <ffffffffa02c9d3b>{:nvidia:_nv003623rm+275} <ffffffffa043d778>{:nvidia:_nv003247rm+126} <ffffffffa03f0948>{:nvidia:_nv004556rm+68} <ffffffffa03f0726>{:nvidia:_nv004385rm+104} <ffffffffa02c9b04>{:nvidia:_nv001453rm+96} <ffffffffa03a1338>{:nvidia:_nv000393rm+20} <ffffffffa03a14b3>{:nvidia:_nv000397rm+125} <ffffffffa02cc951>{:nvidia:_nv001426rm+141} <ffffffffa02ca542>{:nvidia:_nv001458rm+668} <ffffffffa02cd8f4>{:nvidia:rm_init_adapter+104} <ffffffffa04c066f>{:nvidia:nv_kern_open+684} <ffffffff8017eb50>{chrdev_open+412} <ffffffff801760b4>{dentry_open+223} <ffffffff801761ef>{filp_open+62} <ffffffff801e9cf5>{strncpy_from_user+74} <ffffffff801762e1>{get_unused_fd+230} <ffffffff801763d0>{sys_open+57} <ffffffff80110056>{system_call+126} Code: 49 8b 88 f0 18 00 00 76 1b 48 b8 00 00 00 80 00 01 00 00 48 RIP <ffffffff80123cb0>{iounmap+304} RSP <000001017ed1fab8> CR2: 00000000000018f0 <0>Kernel panic - not syncing: Oops Just for yuks, I put the kernel-smp-devel package up on my people page as well. Could you undo the symlinks you have above, install the devel package, and try again? I'd like to make sure we're using all the right bits before digging into this... This *looks* like Bug 160230, but it turns out to be the same issue as Bug 166785, for which a patch has been submitted. 
*** This bug has been marked as a duplicate of 166785 ***

For those who can't access Bug 166785, here's a summary: We discovered a bug in the computation of the address-hash function with which the kernel accesses the memnodemap[] table (this is a table that maps address ranges to NUMA nodes). This bug is benign if the highest physical memory address in the system is less than 4G. Once it exceeds this point (either due to large memory config or memory hoisting) we overrun the table and things get very bad. We pulled in a couple of fixes from upstream that fix the problem (specifically new implementations of compute_hash_shift() and pfn_valid()).

Jim, is there a new kernel we could test to make sure it really fixes our problem? (I was on holiday the last couple of days and couldn't do the test with the kernel-smp-devel package that you've requested yet.)

There is a test kernel at: people.redhat.com/~jparadis/numa/kernel-smp-2.6.9-18.EL.jparadis.x86_64.rpm that you can try. Please treat it as a test kernel only; it is *not* to be used for any production purpose. I have verified, however, that the fixes I have made are slated for release.

Could you please provide a -devel package for it, too?

-devel package has been uploaded to the same place.

Jim, is the -devel package that you posted the one that corresponds with the smp kernel? Thanks!

Oops... sorry. I uploaded the UP devel pkg by mistake. I just uploaded the smp -devel package. Try it now.

Thanks Jim. Unfortunately, this new kernel is still exhibiting the same Oops with the 1.0-7676 NVIDIA driver as with the -16 SMP kernel. Do you have any additional suggestions?

Is this a closed-source driver? It might need to be rebuilt to take advantage of the fixed macro in mmzone.h...

Yes, this is the closed source driver. By 'rebuilt' do you mean reinstalled so that its kernel module is rebuilt, or do you mean rebuilt by NVIDIA developers? I've reinstalled the driver, so the nvidia.ko kernel module is current with respect to the new kernel. Thanks.

I mean it must be recompiled by nVidia, or we need to come up with another solution...

The nvidia driver doesn't include any Linux kernel headers, so shouldn't any changes to mmzone.h be picked up automatically?

No, a driver picks up the Linux kernel headers from the system it was *built* on. Changing the headers on the runtime system does nothing.

The nvidia kernel module (nvidia.ko) is built on the system when it is installed. The driver package doesn't ship with a pre-compiled nvidia.ko for every kernel in existence. Can you elaborate on what was changed/fixed in mmzone.h to address this bug?

Jim, this is a known kernel bug that was fixed many months ago: http://linux.bkbits.net:8080/linux-2.6/diffs/arch/x86_64/mm/ioremap.c@1.23?nav=index.html|src/|src/arch|src/arch/x86_64|src/arch/x86_64/mm|hist/arch/x86_64/mm/ioremap.c Can you please merge this change into the kernel? Thanks, Lonni

Lonni, that patch makes sense, but we need to know: does that patch fix your problem?

Hi Jim, we just completed testing, and the short answer is yes, it resolves the problem. Details:
- Applied the patch to the 2.6.9-17.EL kernel and rebuilt it, but saw another crash in remap_page_range(); this crash also reproduced without the nvidia driver (i.e. with just 'nv') with Red Hat's 2.6.9-17.EL build, apparently trying to map VGA registers via /dev/mem.
- Checked 2.6.9-18.EL.jparadis and found that it doesn't crash in remap_page_range().
- Diff'd 2.6.9-18.EL.jparadis and 2.6.9-17.EL's asm-x86_64/mmzone.h (this is the file you stated held the fix for this bugzilla bug) and rebuilt 2.6.9-17.EL again with a NUMA related change to pfn_valid() included.
With the two patches applied, X comes up fine. I'll attach the two patches we generated.

Created attachment 119005 [details]
linux-2.6.9-x86_64-ioremap
Created attachment 119006 [details]
linux-2.6.9-x86_64-pfn_valid patch
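For anyone trying to picture the memnodemap[] overrun summarized earlier from Bug 166785, here is a rough sketch. It is illustrative Python, not the kernel's compute_hash_shift() or pfn_valid(). The table size (0xFF slots) and the starting shift (24, i.e. 16 MiB per slot) are assumptions picked so that the 4G boundary mentioned in the summary falls out, and all function names are hypothetical.

```python
# Illustrative model only: a fixed-size table indexed by (address >> shift).
# NODEMAP_SIZE and INITIAL_SHIFT are assumed values, not taken from the patch.
NODEMAP_SIZE = 0xFF    # number of slots in the (hypothetical) node map
INITIAL_SHIFT = 24     # each slot covers 1 << 24 bytes = 16 MiB


def slot(addr, shift):
    """Return the table index that a physical address hashes to."""
    return addr >> shift


def overruns(max_addr, shift, size=NODEMAP_SIZE):
    """True if the highest byte below max_addr indexes past the table end."""
    return slot(max_addr - 1, shift) >= size


def smallest_safe_shift(max_addr, size=NODEMAP_SIZE, start=INITIAL_SHIFT):
    """Grow the shift until every address below max_addr fits in the table."""
    shift = start
    while overruns(max_addr, shift, size):
        shift += 1
    return shift


if __name__ == "__main__":
    for gib in (2, 4, 5, 8):
        top = gib << 30
        print(gib, "GiB:",
              "overrun" if overruns(top, INITIAL_SHIFT) else "ok",
              "-> safe shift", smallest_safe_shift(top))
```

With these assumed numbers, 0xFF slots of 16 MiB cover just under 4 GiB, so a machine whose top physical address is pushed above that (more RAM, or memory hoisting) indexes past the end of the table unless the shift is chosen with the real top of memory in mind.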
Jim, Do you need anything else from me to integrate the patches I attached? Is there a kernel RPM available that already has them integrated that I could test? thanks, Lonni

I would like to test a new kernel with those fixes, too. BTW: Jim, do you have any comment regarding the Cache Aliasing Issue? See comment #26 above.

Jim, Do you need anything else from me to integrate the patches I attached? Is there a kernel RPM available that already has them integrated that I could test? thanks, Lonni

I have tried the .22 kernel and it fixed the other problems I have had on AMD systems. Any word on a test kernel with these patches to fix the NVIDIA problems? Anyone tried adding them to the .22 release?

Thank you for the patches. I got the .22 kernel source and applied the patches through the SPEC file. Built the new kernel and now the NVIDIA drivers work.

Anyone know if the new kernel 2.6.9-22 released in WS4 Update 2 has the fixes in it? I hope to test soon.

2.6.9-22 does not fix the bug that impacts the NVIDIA driver. I've been told, unofficially, that Redhat will not be fixing this bug prior to the RHEL4-U2 final release.

Well, just tried the RHEL4-U2 kernel (did a fresh install) and the bug is not fixed.
Here is the output:

NVRM: loading NVIDIA Linux x86_64 NVIDIA Kernel Module 1.0-8163 Wed Sep 21 12:54:25 PDT 2005
Unable to handle kernel paging request at 00000000000018f0 RIP:
<ffffffff80123c18>{iounmap+304}
PML4 2a090067 PGD 2b1ca067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: nvidia(U) md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core dm_mirror dm_multipath dm_mod joydev button battery ac ohci_hcd ehci_hcd shpchp snd_emu10k1 snd_rawmidi snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_seq_device snd_ac97_codec snd_page_alloc snd_util_mem snd_hwdep snd soundcore forcedeth floppy ext3 jbd sata_nv libata sd_mod scsi_mod
Pid: 4506, comm: X Tainted: P 2.6.9-22.ELsmp
RIP: 0010:[<ffffffff80123c18>] <ffffffff80123c18>{iounmap+304}
RSP: 0018:000001011f4afbf8 EFLAGS: 00010213
RAX: 00000100e0000000 RBX: 000001007feeea40 RCX: 0000000000000019
RDX: ffffffff7fffffff RSI: 0000000000002000 RDI: 00000000e0000000
RBP: ffffff000053c000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000001000
R13: 0000000000000000 R14: 0000000000000000 R15: 000001012812d680
FS: 0000002a95586920(0000) GS:ffffffff804d3100(0000) knlGS:00000000f7fcf6c0
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000018f0 CR3: 0000000000101000 CR4: 00000000000006e0
Process X (pid: 4506, threadinfo 000001011f4ae000, task 0000010037c45030)
Stack: ffffffff803cbba8 ffffff000053c000 0000000000000000 ffffffffa048e8c7
       0000000000000080 ffffffffa026affa 000001017fc20400 00000000005e10de
       0000000000000000 ffffffffa026529a
Call Trace:
<ffffffffa048e8c7>{:nvidia:os_unmap_kernel_space+9}
<ffffffffa026affa>{:nvidia:_nv002222rm+42}
<ffffffffa026529a>{:nvidia:_nv002554rm+208}
<ffffffffa0264687>{:nvidia:_nv002563rm+255}
<ffffffffa026485c>{:nvidia:_nv002520rm+100}
<ffffffffa0264944>{:nvidia:_nv002564rm+64}
<ffffffffa0257e9d>{:nvidia:_nv001643rm+351}
<ffffffffa026d9af>{:nvidia:_nv002285rm+45}
<ffffffffa026e4e0>{:nvidia:_nv001653rm+368}
<ffffffff80113555>{setup_irq+194}
<ffffffffa0272492>{:nvidia:rm_init_adapter+104}
<ffffffffa048869b>{:nvidia:nv_kern_open+697}
<ffffffff8017eed0>{chrdev_open+412}
<ffffffff80176434>{dentry_open+223}
<ffffffff8017656f>{filp_open+62}
<ffffffff801ea045>{strncpy_from_user+74}
<ffffffff80176661>{get_unused_fd+230}
<ffffffff80176750>{sys_open+57}
<ffffffff80110052>{system_call+126}
Code: 49 8b 88 f0 18 00 00 76 1b 48 b8 00 00 00 80 00 01 00 00 48
RIP <ffffffff80123c18>{iounmap+304} RSP <000001011f4afbf8>
CR2: 00000000000018f0
<0>Kernel panic - not syncing: Oops

I've got an H8DCE motherboard with 1xOpteron Dual Core and 2x2GB Dimms. I get intermittent kernel panics with the configuration. Dual Core Opteron. 1 processor on the box. If I remove 1 of the 2GB sticks the machine is stable for weeks. Never left it up longer than that. I'd like to put the kernel dumps on the list to help troubleshoot the core issue. Should I log the core here, or is there another bug ID that would be better? Also what is the best way to grab the kernel panic? Is it stored somewhere?

Brett reports the problem persists with 2.6.9-22 (which doesn't have the available patches). Can we please get this fixed?

The attached patches should apply on 2.6.9-22 as well. Do they not work for you?

this should be fixed in -22.3.EL, see: http://people.redhat.com/~jbaron/rhel4/

Just got the chance to install the new kernel. Rebuilt and loaded the NVIDIA drivers and "IT WORKS!!! :)" Are we going to see this in a release soon???? Please :)

I finally was able to test the latest beta kernel 2.6.9-22.3.ELsmp on our 64-bit hp xw9300 RHEL4-Update2 system, too. Here are the test results:
1. I was able to boot the kernel without acpi=off and numa=off.
2. It was possible to load the nvidia 1.0-7676 driver without the oops.
However, I immediately got the following kernel messages:

NVRM: loading NVIDIA Linux x86_64 NVIDIA Kernel Module 1.0-7676 Fri Jul 29 13:
NVRM: WARNING: Your Linux kernel has problems in its implementation of
NVRM: the change_page_attr kernel interface. The NVIDIA kernel
NVRM: module will attempt to work around these problems, but
NVRM: system stability may be affected. It is recommended that
NVRM: you update to a 2.6.11 or newer kernel.
NVRM: bad caching on address 0x10074b08000: actual 0x163 != expected 0x173
NVRM: please see the README section on Cache Aliasing for more information
NVRM: bad caching on address 0x10074a0b000: actual 0x163 != expected 0x173
NVRM: bad caching on address 0x10073d09000: actual 0x163 != expected 0x173
NVRM: bad caching on address 0x10074258000: actual 0x163 != expected 0x173
NVRM: bad caching on address 0x10074259000: actual 0x163 != expected 0x173

3. Then I tried to reload the nvidia kernel module with the option NVreg_UseCPA=1, which forces the module to use the kernel's change_page_attr() API. When I use this module option the nvidia module loads without those warnings.

We are using the machine with this setup now and we'll see how stable it runs. Does anybody know if the two nvidia driver warnings regarding change_page_attr() and cache aliasing are false warnings with the 2.6.9-22.3 kernel+1.0-7676 driver or if these kernel issues still persist?

The bad caching warnings occur automatically with 1.0-7xxx nvidia drivers and Redhat kernels. As far as I know, all the kernel issues with change_page_attr are resolved in 2.6.9-22.3, thus as long as the nvidia 7676 is given the NVreg_UseCPA=1 everything should work fine. My understanding is that the nvidia 8163+ drivers will automatically detect the change_page_attr changes and thus shouldn't require the command line options.

So when can we expect an official kernel errata rpm >= 2.6.9-22.3? Side note: *Please* make sure that the next official kernel errata rpm addresses all known kernel bugs mentioned in the very nice summary from NVIDIA which can be found at http://www.nvnews.net/vbulletin/showthread.php?t=58498.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-808.html

(In reply to comment #70)
> this should be fixed in -22.3.EL, see: http://people.redhat.com/~jbaron/rhel4/

Please re-open this bug. Configuration and circumstances follow...

Intel Server Motherboard SE7520JR2
Dual Xeon 3.4GHz
4 x 1GB RAM
RHEL4 ES Update 2

stock kernel-smp-2.6.9-22.EL: boot fails unless memory remap disabled in BIOS.
Recommended fix: Errata kernel-smp-2.6.9-22.0.1.EL: boot fails unless memory remap disabled in BIOS.
~jbaron kernel-smp-2.6.9-24.1.EL of 12-Dec-2005 16:40: boots fine with memory remap enabled, full 4G of RAM useable.
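
An aside on the NVRM "bad caching" messages quoted earlier in this thread, which compare an actual value 0x163 against an expected 0x173. The sketch below assumes those values are the low flag bits of an x86-64 page-table entry; the driver source is closed, so that reading is an assumption. The bit names follow the architecture's page-table entry layout, and the helper names are hypothetical.

```python
# Low flag bits of an x86-64 page-table entry (architectural layout).
# Treating the NVRM "actual"/"expected" values as such flags is an assumption.
PTE_FLAGS = {
    0x001: "PRESENT",
    0x002: "RW",
    0x004: "USER",
    0x008: "PWT",       # page-level write-through
    0x010: "PCD",       # page-level cache disable
    0x020: "ACCESSED",
    0x040: "DIRTY",
    0x080: "PSE/PAT",
    0x100: "GLOBAL",
}


def decode(value):
    """List the names of the flag bits set in value, lowest bit first."""
    return [name for bit, name in sorted(PTE_FLAGS.items()) if value & bit]


def differing_flags(actual, expected):
    """List the flag bits that differ between two attribute values."""
    return decode(actual ^ expected)


if __name__ == "__main__":
    print("actual  ", decode(0x163))
    print("expected", decode(0x173))
    print("differ  ", differing_flags(0x163, 0x173))
```

Under that assumption the two values differ only in bit 0x010 (PCD): the driver expected the page to be mapped with caching disabled and found a cacheable mapping instead, which would fit the README's "Cache Aliasing" wording and the change_page_attr() discussion above.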