Description of problem:
Kernel 2.6.22.1-27.fc7.x86_64 fails to boot on my eMachines m6805, aka Arima W720-K8. Shortly after beginning to boot the kernel, it reboots, and falls into an endless reboot loop. Booting with "quiet" removed, it seems to get down to "agpgart: Detected AGP bridge 0" before it reboots. It's only displayed for a fraction of a second so it's hard to be sure. Kernel 2.6.21-1.3228.fc7.x86_64 and earlier work fine.

Version-Release number of selected component (if applicable):
kernel-2.6.22.1-27.fc7.x86_64

How reproducible:
Always
Well, since it reboots so quickly, let's try disabling things and see if you can get it to come up. Add to your kernel line in your grub.conf acpi=off noagp and try to boot into that kernel. If that works, remove one of those options and see if it still boots. Post whichever option makes it play nicely to this ticket; it will help narrow down the problem.
Neither helps. And it seems sometimes it will just lock up rather than rebooting. It is in fact hanging/rebooting at "agpgart: Detected AGP bridge 0".
(In reply to comment #1)
> Well since it reboots so quickly lets try and disable things and see if you can
> get it to come up. Add to your kernel line in your grub.conf
>
> acpi=off noagp
>
That's     ^^^^^ agp=off
Oooh, agp=off worked. I need my DRI though. :)
I got hit by the same on SK8V from ASUSTeK Computer Inc. I have no idea how far this gets, as the screen blinks and the machine reboots before I have a chance to read anything at all. It boots with agp=off but, of course, DRI is killed. No problems with earlier F7 kernels. A long succession of rawhide kernels, including various "2.6.22" kernels, usually booted on the same hardware. The current rawhide 2.6.23-0.43.rc0.git16.fc8 is fine.
Can someone post the output of 'lspci' and also 'lspci -n' from the failing machines?
00:00.0 Host bridge: VIA Technologies, Inc. VT8385 [K8T800 AGP] Host Bridge (rev 01)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI bridge [K8T800/K8T890 South]
00:0a.0 CardBus bridge: ENE Technology Inc CB1410 Cardbus Controller
00:0c.0 Network controller: Broadcom Corporation BCM4306 802.11b/g Wireless LAN Controller (rev 03)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80)
00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 50)
00:11.6 Communication controller: VIA Technologies, Inc. AC'97 Modem Controller (rev 80)
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74)
00:13.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 80)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 VGA compatible controller: ATI Technologies Inc RV350 [Mobility Radeon 9600 M10]

00:00.0 0600: 1106:3188 (rev 01)
00:01.0 0604: 1106:b188
00:0a.0 0607: 1524:1410
00:0c.0 0280: 14e4:4320 (rev 03)
00:10.0 0c03: 1106:3038 (rev 80)
00:10.1 0c03: 1106:3038 (rev 80)
00:10.2 0c03: 1106:3038 (rev 80)
00:10.3 0c03: 1106:3104 (rev 82)
00:11.0 0601: 1106:3177
00:11.1 0101: 1106:0571 (rev 06)
00:11.5 0401: 1106:3059 (rev 50)
00:11.6 0780: 1106:3068 (rev 80)
00:12.0 0200: 1106:3065 (rev 74)
00:13.0 0c00: 1106:3044 (rev 80)
00:18.0 0600: 1022:1100
00:18.1 0600: 1022:1101
00:18.2 0600: 1022:1102
00:18.3 0600: 1022:1103
01:00.0 0300: 1002:4e50
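For anyone wanting to compare the 'lspci -n' listings from the failing boards side by side, here is a minimal Python sketch that parses lines in the format pasted in this bug. The regular expression and the helper name parse_lspci_n are assumptions made for illustration; they only cover the "slot class: vendor:device (rev NN)" shape seen in this report, not every format lspci can print.

```python
import re

# Matches one line of `lspci -n` output as pasted in this report, e.g.
# "00:0c.0 0280: 14e4:4320 (rev 03)". The "(rev NN)" part is optional.
LINE_RE = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+"
    r"(?P<cls>[0-9a-f]{4}):\s+"
    r"(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})"
    r"(?:\s+\(rev (?P<rev>[0-9a-f]{2})\))?$"
)

def parse_lspci_n(text):
    """Return one dict per recognised device line; other lines are skipped."""
    devices = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            devices.append(match.groupdict())
    return devices

# A few lines copied from the m6805 listing in this report.
SAMPLE = """\
00:00.0 0600: 1106:3188 (rev 01)
00:01.0 0604: 1106:b188
00:18.0 0600: 1022:1100
01:00.0 0300: 1002:4e50
"""

devices = parse_lspci_n(SAMPLE)
# Class 0600 is what the verbose listing labels "Host bridge".
host_bridges = [d for d in devices if d["cls"] == "0600"]
for d in host_bridges:
    print(d["slot"], d["vendor"] + ":" + d["device"], d["rev"])
```

Running the same function over each board's listing and intersecting the vendor:device pairs is one way to see which devices the failing machines have in common.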
This is for MSI board MS-6741 (x86_64). It also fails to boot starting with 2.6.22.1-27.fc7.x86_64, with multiple kernel exceptions, but I cannot get those details. It may be a different problem. 'lspci' shows there:

00:00.0 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.1 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.2 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.3 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.4 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.7 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI bridge [K8T800/K8T890 South]
00:06.0 Network controller: Broadcom Corporation BCM4306 802.11b/g Wireless LAN Controller (rev 03)
00:0e.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 80)
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 78)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 VGA compatible controller: VIA Technologies, Inc. S3 Unichrome Pro VGA Adapter (rev 01)

00:00.0 0600: 1106:0204
00:00.1 0600: 1106:1204
00:00.2 0600: 1106:2204
00:00.3 0600: 1106:3204
00:00.4 0600: 1106:4204
00:00.7 0600: 1106:7204
00:01.0 0604: 1106:b188
00:06.0 0280: 14e4:4320 (rev 03)
00:0e.0 0c00: 1106:3044 (rev 80)
00:0f.0 0104: 1106:3149 (rev 80)
00:0f.1 0101: 1106:0571 (rev 06)
00:10.0 0c03: 1106:3038 (rev 81)
00:10.1 0c03: 1106:3038 (rev 81)
00:10.2 0c03: 1106:3038 (rev 81)
00:10.3 0c03: 1106:3038 (rev 81)
00:10.4 0c03: 1106:3104 (rev 86)
00:11.0 0601: 1106:3227
00:11.5 0401: 1106:3059 (rev 60)
00:12.0 0200: 1106:3065 (rev 78)
00:18.0 0600: 1022:1100
00:18.1 0600: 1022:1101
00:18.2 0600: 1022:1102
00:18.3 0600: 1022:1103
01:00.0 0300: 1106:3108 (rev 01)

This is the same for SK8V from comment #5:

00:00.0 Host bridge: VIA Technologies, Inc. VT8385 [K8T800 AGP] Host Bridge (rev 01)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI bridge [K8T800/K8T890 South]
00:07.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 80)
00:08.0 RAID bus controller: Promise Technology, Inc. PDC20378 (FastTrak 378/SATA 378) (rev 02)
00:0a.0 Ethernet controller: 3Com Corporation 3c940 10/100/1000Base-T [Marvell] (rev 12)
00:0e.0 Ethernet controller: Intel Corporation 82557/8/9 Ethernet Pro 100 (rev 0c)
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
00:11.6 Communication controller: VIA Technologies, Inc. AC'97 Modem Controller (rev 80)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 VGA compatible controller: ATI Technologies Inc RV280 [Radeon 9200 PRO] (rev 01)
01:00.1 Display controller: ATI Technologies Inc RV280 [Radeon 9200 PRO] (Secondary) (rev 01)

00:00.0 0600: 1106:3188 (rev 01)
00:01.0 0604: 1106:b188
00:07.0 0c00: 1106:3044 (rev 80)
00:08.0 0104: 105a:3373 (rev 02)
00:0a.0 0200: 10b7:1700 (rev 12)
00:0e.0 0200: 8086:1229 (rev 0c)
00:0f.0 0104: 1106:3149 (rev 80)
00:0f.1 0101: 1106:0571 (rev 06)
00:10.0 0c03: 1106:3038 (rev 81)
00:10.1 0c03: 1106:3038 (rev 81)
00:10.2 0c03: 1106:3038 (rev 81)
00:10.3 0c03: 1106:3038 (rev 81)
00:10.4 0c03: 1106:3104 (rev 86)
00:11.0 0601: 1106:3227
00:11.5 0401: 1106:3059 (rev 60)
00:11.6 0780: 1106:3068 (rev 80)
00:18.0 0600: 1022:1100
00:18.1 0600: 1022:1101
00:18.2 0600: 1022:1102
00:18.3 0600: 1022:1103
01:00.0 0300: 1002:5960 (rev 01)
01:00.1 0380: 1002:5940 (rev 01)

-[0000:00]-+-00.0 VIA Technologies, Inc. VT8385 [K8T800 AGP] Host Bridge
           +-01.0-[0000:01]--+-00.0 ATI Technologies Inc RV280 [Radeon 9200 PRO]
           |                 \-00.1 ATI Technologies Inc RV280 [Radeon 9200 PRO] (Secondary)
           +-07.0 VIA Technologies, Inc. IEEE 1394 Host Controller
           +-08.0 Promise Technology, Inc. PDC20378 (FastTrak 378/SATA 378)
           +-0a.0 3Com Corporation 3c940 10/100/1000Base-T [Marvell]
           +-0e.0 Intel Corporation 82557/8/9 Ethernet Pro 100
           +-0f.0 VIA Technologies, Inc. VIA VT6420 SATA RAID Controller
           +-0f.1 VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE
           +-10.0 VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller
           +-10.1 VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller
           +-10.2 VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller
           +-10.3 VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller
           +-10.4 VIA Technologies, Inc. USB 2.0
           +-11.0 VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
           +-11.5 VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller
           +-11.6 VIA Technologies, Inc. AC'97 Modem Controller
           +-18.0 Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
           +-18.1 Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
           +-18.2 Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
           \-18.3 Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
It looks like we can slow down the boot messages by adding boot_delay=N to the kernel command line. Try values of N starting with 200, then try 400 etc. until it prints slowly enough to see the messages. Try taking a picture of the screen if there's anything relevant, then attach that to the bugzilla.
It was just immediately rebooting when I was trying that before. At this moment 2.6.22.1-27.fc7 on SK8V prints assorted things until it gets down to these lines:
.....
NetLabel: unlabeled traffic allowed by default
agpgart: Detected AGP bridge 0
and after that it just sits there, with the power switch the only remaining way to affect anything. No idea why that changed after powering down in the meantime. I did try to boot another working kernel, like 2.6.23-0.45.rc0.git16.fc8, and restart with 2.6.22.1-27.fc7. No visible influence. Like I wrote - with 'agp=off' I can boot.
Kernel 2.6.22.1-33.fc7 from updates-testing is stuck on SK8V in precisely the same way as described in comment #10.
There really doesn't seem to be much change in AGP to cause this.
Can you get me a dmesg from a previous kernel and attach it? My guess is it's a VIA chipset quirk missing or doing something different somewhere else...
Created attachment 160089 [details] dmesg from working kernel on m6805
I'm guessing it might be the e820 stuff in amd64-agp.c. b92e9fac400d4ae5bc7a75c568e9844ec53ea329 is the lk commit, I'm guessing. Chuck, I'm unsure of procedure, so can you follow up?
Created attachment 160133 [details]
a part of differences from dmesg for different kernels on two different boards

> I'm guessing it might be the e820 stuff in amd64-agp.c

Maybe. I am not sure if this will help, but attached is a diff between dmesg for a kernel which still boots on SK8V, i.e. 2.6.21-1.3228.fc7, and dmesg for 2.6.22.1-33.fc7 on another x86_64 board (ASUS A8V Deluxe) where the latter happens to work. That diff continues only to a section with "agpgart: Detected AGP bridge 0". Later stuff does disk detection and other things which are quite radically different in both cases. Once again, with 2.6.22.1-33.fc7 on SK8V the boot just sits there after printing the line "agpgart: Detected AGP bridge 0". Rawhide kernels are fine.
I'm getting the same hang after detecting the AGP bridge. Motherboard is a Gigabyte GA-K8VT800M, VIA chipset and a Matrox G550 video card. Same kernel versions apply. agp=off and it boots ok.
kernel-2.6.22.1-41.fc7 fails to boot the same way as 2.6.22.1-27.fc7 and 2.6.22.1-33.fc7.
*** Bug 251555 has been marked as a duplicate of this bug. ***
Kernel kernel-2.6.22.1-32.fc6 is affected by the same disease. The screenshot attached to bug 251555 shows the tail of an oops, but the whole thing stretches for something between two and three screenfuls (if it does not lock up silently).
Created attachment 161126 [details]
first screen from oops with 2.6.22.1-32.fc6

With the help of 'boot_delay=200' I took pictures of both exception screens for 2.6.22.1-32.fc6. It says "invalid opcode" at the beginning.
Created attachment 161127 [details] second screen from oops with 2.6.22.1-32.fc6
Same situation:
Last working kernel: 2.6.21-1.3228.fc7
Actual kernel: 2.6.22.1-41.fc7
Kernel hangs after printing the line (without the quiet parameter):
agpgart: Detected AGP bridge 0
What is interesting is that the debug version of the same kernel (2.6.22.1-41.fc7) is working without problems.
I tried rebooting with kernel-2.6.22.2-52.fc7 (at koji at this moment). SK8V used in testing is a single core and is recognized by this kernel as such. With 'boot_delay=200' I can see "agpgart: Detected AGP bridge 0" line followed by an immediate reboot. Without 'boot_delay' parameter I stare at BIOS boot screens in an instant. After such boot attempt on some unpredictable occasions a machine may hang in a reboot and requires a powerdown. kernel-2.6.23-0.101.rc2.git5.fc8 does boot on a test machine.
Something changed in the e820 map. Does kernel option "numa=off" make a difference?
(In reply to comment #25)
> Something changed in the e820 map. Does kernel option "numa=off" make a difference?

No difference for me, but I have a single-processor machine, so I think numa=off should not have any impact at all.
> Does kernel option "numa=off" make a difference?

Not the slightest one for me. Not that surprising. I already mentioned that SK8V is a single core. Does anybody see that on multi-core x86_64 machines?
(In reply to comment #22)
> Created an attachment (id=161127) [edit]
> second screen from oops with 2.6.22.1-32.fc6

The running kernel's code is corrupted. Somehow it has been overwritten during the AGP init phase, apparently. From the dump:

Code: ff ff ff ff 00 00 00 00 00 00 00 00 90 e2 47 0f 00 81 ff

But the kernel should have, at that address:

pci_read():
/usr/src/debug/kernel-2.6.22/linux-2.6.22.x86_64/arch/i386/pci/common.c:32
ffffffff811f6388: 48 8b 05 19 7f 35 00 mov 3505945(%rip),%rax # ffffffff8154e2a8 <raw_pci_ops>
/usr/src/debug/kernel-2.6.22/linux-2.6.22.x86_64/arch/i386/pci/common.c:31
ffffffff811f638f: 41 89 f2 mov %esi,%r10d
/usr/src/debug/kernel-2.6.22/linux-2.6.22.x86_64/arch/i386/pci/common.c:32
ffffffff811f6392: 0f b6 b7 98 00 00 00 movzbl 0x98(%rdi),%esi
ffffffff811f6399: 4d 89 c1 mov %r8,%r9
ffffffff811f639c: 31 ff xor %edi,%edi
ffffffff811f639e: 41 89 c8 mov %ecx,%r8d
ffffffff811f63a1: 89 d1 mov %edx,%ecx
ffffffff811f63a3: 44 89 d2 mov %r10d,%edx
ffffffff811f63a6: 4c 8b 18 mov (%rax),%r11
ffffffff811f63a9: 41 ff e3 jmpq *%r11
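The comparison in the comment above can be repeated mechanically. Here is a rough Python sketch, assuming the two byte sequences are copied exactly as quoted. Since the exact alignment of the oops "Code:" line relative to the start of pci_read() is not known here, the sketch does not compare byte by byte; it only looks for any 4-byte run the two sequences share. The window size of 4 and the helper name shared_runs are arbitrary choices made for illustration.

```python
# Bytes from the "Code:" line of the oops, as quoted in the comment above.
dumped = bytes.fromhex("ff ff ff ff 00 00 00 00 00 00 00 00 90 e2 47 0f 00 81 ff")

# Bytes of pci_read() from the disassembly quoted in the same comment.
expected = bytes.fromhex(
    "48 8b 05 19 7f 35 00 "  # mov 3505945(%rip),%rax
    "41 89 f2 "              # mov %esi,%r10d
    "0f b6 b7 98 00 00 00 "  # movzbl 0x98(%rdi),%esi
    "4d 89 c1 "              # mov %r8,%r9
    "31 ff "                 # xor %edi,%edi
    "41 89 c8 "              # mov %ecx,%r8d
    "89 d1 "                 # mov %edx,%ecx
    "44 89 d2 "              # mov %r10d,%edx
    "4c 8b 18 "              # mov (%rax),%r11
    "41 ff e3"               # jmpq *%r11
)

def shared_runs(a, b, window=4):
    """Return the set of `window`-byte sequences that occur in both a and b."""
    runs_a = {a[i:i + window] for i in range(len(a) - window + 1)}
    runs_b = {b[i:i + window] for i in range(len(b) - window + 1)}
    return runs_a & runs_b

common = shared_runs(dumped, expected)
print(len(expected), "expected bytes;", len(common), "shared 4-byte runs")
```

No shared run is consistent with the observation that what the CPU fetched was not the function's code, though it cannot say what overwrote it or whether the fetch simply went to the wrong place.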
(In reply to comment #27)
> > Does kernel option "numa=off" make a difference?
>
> Not the slightest one for me. Not that surprising. I already
> mentioned that SK8V is a single core.

It is doing "fake" numa for single-CPU machines. There are some e820 patches in 2.6.23 for bugs in the e820 code that make it use invalid addresses for the fake numa tables.
Comparing sources for 2.6.22.1-32.fc6, which fails to boot, and booting 2.6.23-0.101.rc2.git5.fc8 the only difference in drivers/char/agp/amd64-agp.c is that in the first case there is a call 'pci_read_config_byte(pdev, PCI_REVISION_ID, &rev_id);' to get rev_id of u8 type and in the second one pdev->revision is used instead of rev_id (in two places). That difference is due to patch-2.6.23-rc2.bz2. How significant is that I do not know; possibly not very as earlier rawhide kernels based on 2.6.22 were usually booting just fine.
Re comment 28: a picture posted by John Morris as https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=161005 from a panic on F7 (bug 251555) shows a saner looking code line. Unfortunately the preceding screen is not there.
kernel-2.6.22.1-41.fc7.x86_64 fails in the same way on my m6805. So it appears to be a problem with VIA chipsets. One wonders, do ANY x86_64 machines with VIA chipsets work? My desktop machine with a Gigabyte GA-K8U motherboard, ULi M1689 chipset, boots these kernels just fine. These two are the only x86_64 machines I have though. I'll try numa=off.
Nope, numa=off does not appear to help at all on my m6805.
I tried kernel-2.6.22.2-57.fc7.x86_64 from testing. Just booting, or booting with 'initcall_debug', and the machine is back to BIOS rebooting. With 'initcall_debug boot_delay=150' the last two lines on screen are
Calling initcall 0xffffffff814291d6: pci_iommu_init+0x0/0x17()
agpgart: Detected AGP bridge 0
and after that it sits there completely frozen. 'agp=off' allows that to boot, as expected. The next line after that missing fragment about AGP is in this case:
ACPI: RTC can wake from S4
(In reply to comment #31)
> Re comment 28: a picture posted by John Morris as
> https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=161005
> from a panic on F7 (bug 251555) shows a saner looking code line.

That's not code, it's ASCII text!
> That's not code, it's ASCII text!

Hm, indeed. You are right!
LOC: ERR: %10u ar
Not even reversed, as happens with little-endian. Maybe not so much "overwritten", but something reads from somewhere other than where it expects to read?
Kernel with a possible fix (e820 hole mapping) is in Koji: http://koji.fedoraproject.org/koji/buildinfo?buildID=13938
(In reply to comment #37)
> Kernel with a possible fix (e820 hole mapping) is in Koji:
>
> http://koji.fedoraproject.org/koji/buildinfo?buildID=13938
>

This kernel works for me. If you have any questions about configuration or something else, feel free to ask.
> Kernel with a possible fix (e820 hole mapping) is in Koji

Sorry! It dies for me the same way as before; i.e. it prints "agpgart: Detected AGP bridge 0" and nothing happens after that. Here is the top of dmesg output after booting this kernel with agp=off:

Linux version 2.6.22.3-61.fc7 (kojibuilder.redhat.com) (gcc version 4.1.2 20070502 (Red Hat 4.1.2-12)) #1 SMP Thu Aug 16 13:23:49 EDT 2007
Command line: ro root=/dev/Vols/Vol04 agp=off 3
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000001ff30000 (usable)
 BIOS-e820: 000000001ff30000 - 000000001ff40000 (ACPI data)
 BIOS-e820: 000000001ff40000 - 000000001fff0000 (ACPI NVS)
 BIOS-e820: 000000001fff0000 - 0000000020000000 (reserved)
 BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
Entering add_active_range(0, 0, 159) 0 entries of 3200 used
Entering add_active_range(0, 256, 130864) 1 entries of 3200 used
end_pfn_map = 1048576
DMI 2.3 present.
ACPI: RSDP 000FA870, 0021 (r2 ACPIAM)
ACPI: XSDT 1FF30100, 003C (r1 A M I OEMXSDT 9000323 MSFT 97)
ACPI: FACP 1FF30290, 00F4 (r3 A M I OEMFACP 9000323 MSFT 97)
ACPI: DSDT 1FF303E0, 35D8 (r1 SK8V_ SK8V_013 13 MSFT 100000D)
ACPI: FACS 1FF40000, 0040
ACPI: APIC 1FF30390, 004A (r1 A M I OEMAPIC 9000323 MSFT 97)
ACPI: OEMB 1FF40040, 003F (r1 A M I OEMBIOS 9000323 MSFT 97)
Scanning NUMA topology in Northbridge 24
No NUMA configuration found

Any other information which could be of help?
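As a cross-check on the dmesg above, the "usable" e820 entries can be turned into page frame ranges and compared with the add_active_range() lines. This is only a sketch: it assumes 4 KiB pages, and it assumes the start of each range is rounded up and the end rounded down to a page boundary. The regular expression and the helper name usable_pfn_ranges are made up for this illustration.

```python
import re

PAGE_SIZE = 4096  # assumed x86_64 base page size

# Matches e.g. "BIOS-e820: 0000000000100000 - 000000001ff30000 (usable)".
E820_RE = re.compile(r"BIOS-e820:\s+([0-9a-f]{16}) - ([0-9a-f]{16}) \(([^)]+)\)")

def usable_pfn_ranges(dmesg_text):
    """Return (start_pfn, end_pfn) for every e820 entry marked usable."""
    ranges = []
    for start, end, kind in E820_RE.findall(dmesg_text):
        if kind != "usable":
            continue
        start_pfn = (int(start, 16) + PAGE_SIZE - 1) // PAGE_SIZE  # round up
        end_pfn = int(end, 16) // PAGE_SIZE                        # round down
        ranges.append((start_pfn, end_pfn))
    return ranges

# e820 lines copied from the dmesg in the comment above.
SAMPLE = """\
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000001ff30000 (usable)
 BIOS-e820: 000000001ff30000 - 000000001ff40000 (ACPI data)
"""

print(usable_pfn_ranges(SAMPLE))  # [(0, 159), (256, 130864)]
```

The two tuples match the arguments printed by "Entering add_active_range(0, 0, 159)" and "Entering add_active_range(0, 256, 130864)", so the map itself looks consistent on this boot.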
I tried, just in case, 2.6.22.3-61.fc7 with 'numa=off'. This touches AGP and I am immediately back to BIOS rebooting; even with 'boot_delay=200'. Without 'boot_delay=...' and with 'numa=off' I cannot even see what really happened.
Nope, no luck with 2.6.22.3-61.fc7 on the m6805 either.
Checked kernel-2.6.22.4-66.fc7, which I found on koji. That one immediately reboots back to BIOS with or without the various "regular" option combinations I tried, with one exception: with 'boot_delay=200' it got to "agpgart: Detected AGP bridge 0" and it sits there. OTOH if I drop the amount of memory available (like "mem=64M") then pretty consistently I am getting panics which look like what is in the attachment (id=161127) (comment #22) picture. AFAICS results are the same with various values for 'mem=...' (all below 512M, which happens to be the amount of memory in my test box). Only this time RIP is pci_read+0x3/0x24 and the "Code" line always comes as
37 71 3d 33 27 57 f3 75 73 35 3d 27 0b 35 f3 57 77 35 33 3f
Not like the code from comment #28, but not ASCII text either. As an extra attraction, boot parameters of, say, 'mem=64M boot_delay=150' mean an instant reboot the moment AGP is touched.
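Whether a "Code" line is text or not can be checked quickly. A small Python sketch, assuming the bytes are copied exactly from the comment above; render_ascii is a hypothetical helper name, and "printable" here simply means the range 0x20 to 0x7e.

```python
def render_ascii(code_line):
    """Show printable ASCII bytes as characters and everything else as '.'."""
    raw = bytes.fromhex(code_line)
    is_printable = [0x20 <= b < 0x7F for b in raw]
    text = "".join(chr(b) if ok else "." for b, ok in zip(raw, is_printable))
    return text, is_printable.count(False)

# "Code" line as quoted in the comment above.
CODE_LINE = "37 71 3d 33 27 57 f3 75 73 35 3d 27 0b 35 f3 57 77 35 33 3f"

text, non_printable = render_ascii(CODE_LINE)
print(text, "| non-printable bytes:", non_printable)
```

Most of the bytes fall in the printable range but three do not (f3 twice and 0b), which fits the description "not an ASCII text either".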
Sigh, still no luck with 2.6.22.4-65.fc7
And no luck with 2.6.22.5-71.fc7 either.
Created attachment 174641 [details] dmesg from 'mem=510M' boot with 2.6.22.4-45.fc6 I get "Kernel panic - not syncing ..." right away with kernel-2.6.22.4-45.fc6. A "Code" line gets even more curious: 00 01 10 00 00 00 00 00 00 02 20 00 00 00 00 00 00 00 00 00 OTOH I can boot that kernel if I will specify 'mem=510M' or some smaller amount ('mem=64M' still boots fine). With 'mem=511M' I get panic, although a "Code" line does not resemble anything seen earlier. Higher than 511M is not accepted (I have 512M on board) and for results with no 'mem=xxx' - see above. dmesg from 'mem=510M' boot is attached. Keep in mind comment #42 where I tried similar trickery with kernel-2.6.22.4-66.fc7 and still was unable to boot.
I tried the same 'mem=510M' with kernel-2.6.22.5-71.fc7 grabbed from koji and that boots for me as well. 'mem=511M', or no such parameter, and with 2.6.22.5-71.fc7 I have an instant reboot. Does that mean something scribbles over memory it should not touch?
(In reply to comment #46)
> I tried the same 'mem=510M' with kernel-2.6.22.5-71.fc7 grabbed
> from koji and that boots for me as well. 'mem=511M', or no such
> parameter, and with 2.6.22.5-71.fc7 I have an instant reboot.
>

Using the boot_delay option, or serial console, can you get the first 30 lines of output when booting without a "mem=" parameter (from the e820 info to the "bootmem setup node 0...")?
> Using the boot_delay option, or serial console, can you get the
> first 30 lines of output ....

I am afraid that this turns out to be impossible. That SK8V board does not have a serial connector at all (or I would not bother with pictures), BIOS does not have an option to change the number of text lines on a screen and boot_delay is ineffective. By the latter I mean that the moment the first screen output shows up on a monitor we are way past the fragment you would want to see, so I cannot even take pictures. Adding options like 'earlyprintk=vga' and/or 'initcall_debug' does not help. All I can tell is that with 2.6.22.5-71.fc7 and with 'boot_delay=...' the moment the code reaches AGP there is an immediate reboot. Maybe somebody else has hardware which would allow catching more and can repeat something similar to my results?

BTW - dmesg produced by 2.6.22.4-45.fc6, as attached to comment #45, and by 2.6.22.5-71.fc7, both with 'mem=510M', do not differ that much where it counts. In particular the e820 map is the same. That map also does not differ from the one for 2.6.21-1.3228.fc7, which happens to be the last F7 kernel which boots on that board without any "heroic efforts". There are some differences in the initial setup though. Here:

--- dmesg.2.6.21-1.3228.fc7 2007-08-28 15:53:01.000000000 -0600
+++ dmesg.2.6.22.5-71.fc7 2007-08-27 16:08:42.000000000 -0600
@@ -21,26 +21,24 @@
 ACPI: APIC 1FF30390, 004A (r1 A M I OEMAPIC 9000323 MSFT 97)
 ACPI: OEMB 1FF40040, 003F (r1 A M I OEMBIOS 9000323 MSFT 97)
 Scanning NUMA topology in Northbridge 24
-Number of nodes 1
-Node 0 MemBase 0000000000000000 Limit 000000001ff30000
+No NUMA configuration found
+Faking a node at 0000000000000000-000000001fe00000
 Entering add_active_range(0, 0, 159) 0 entries of 3200 used
-Entering add_active_range(0, 256, 130864) 1 entries of 3200 used
-NUMA: Using 63 for the hash shift.
-Using node hash shift of 63
-Bootmem setup node 0 0000000000000000-000000001ff30000
+Entering add_active_range(0, 256, 130560) 1 entries of 3200 used
+Bootmem setup node 0 0000000000000000-000000001fe00000
.....

Current rawhide kernels, which boot, also have "No NUMA configuration found" but:

Faking a node at 0000000000000000-000000001ff30000
....
Bootmem setup node 0 0000000000000000-000000001ff30000

and these happen to be the same addresses as in 2.6.21-1.3228.fc7 and not in 2.6.22.5-71.fc7. Still, looking at results of booting 2.6.23-0.142.rc3.git10.fc8 with and without 'mem=510' that difference is really a result of this option. dmesg for 2.6.21-1.3228.fc7 was already attached to comment #14. Different board, but it looks very similar to what I see.
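The two node limits in the diff above are easier to compare once converted to page frames and MiB. A quick Python sketch, assuming 4 KiB pages; describe_limit is a hypothetical helper name.

```python
PAGE_SIZE = 4096   # assumed x86_64 base page size
MIB = 1 << 20

def describe_limit(limit_hex):
    """Return (end page frame number, size in MiB) for a node limit address."""
    limit = int(limit_hex, 16)
    return limit // PAGE_SIZE, limit / MIB

print(describe_limit("000000001fe00000"))  # (130560, 510.0)
print(describe_limit("000000001ff30000"))  # (130864, 511.1875)
```

So the 0x1fe00000 limit is exactly the 510M requested with 'mem=510M', and 130560 is the end page frame in the matching add_active_range() line, while 0x1ff30000 is the end of usable RAM reported in the e820 map.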
No changes with 2.6.22.5-49.fc6 and 2.6.22.5-76.fc7. That means that I can boot if I will use 'mem=510M' in boot parameters; otherwise a bomb if 'agp=off' is not there.
Okay, with recent kernels I seem to get a backtrace before it reboots. By setting mem=256M it will lock up instead, so I was able to get a picture of it. vga=6 got the entire backtrace on screen. mem=510M seems to work however! (The machine has 512mb in it) Hurray I can once again get stable wireless and DRI at the same time...
Created attachment 211951 [details] Screenshot of kernel-2.6.22.9-91.fc7.x86_64 backtrace
In comment #50 by Callum Lerwick: > By setting mem=256M it will lock up instead Did you try mem=254M or somewhat less?
See https://bugzilla.redhat.com/show_bug.cgi?id=336281#c6 for a possible workaround, which may give DRI, if you have to boot with 'agp=off'.
Does kernel option "numa=fake=1" make any difference?
> Does kernel option "numa=fake=1" make any difference?

When trying with kernel-2.6.23.1-31.fc8, which for me consistently gets stuck after showing "agpgart: Detected AGP bridge 0", the first test with "numa=fake=1" caused an instant reboot. But attempts to repeat that later with extra options like "boot_delay=150" were just getting stuck again in the same place.
This is the broken commit that causes this bug:

commit 2e1c49db4c640b35df13889b86b9d62215ade4b6
Author: Zou Nan hai <nanhai.zou>
Date: Fri Jun 1 00:46:28 2007 -0700

    x86_64: allocate sparsemem memmap above 4G

    On systems with huge amount of physical memory, VFS cache and memory memmap
    may eat all available system memory under 4G, then the system may fail to
    allocate swiotlb bounce buffer.

    There was a fix for this issue in arch/x86_64/mm/numa.c, but that fix dose
    not cover sparsemem model.

    This patch add fix to sparsemem model by first try to allocate memmap above
    4G.

    Signed-off-by: Zou Nan hai <nanhai.zou>
    Acked-by: Suresh Siddha <suresh.b.siddha>
    Cc: Andi Kleen <ak>
    Cc: <stable>
    Signed-off-by: Andrew Morton <akpm>
    Signed-off-by: Linus Torvalds <torvalds>
I got a chance to experiment:

512M = "Error 28: Selected item cannot fit into memory"
511M = no boot
510M = boot
509M = no boot
508M = no boot
507M = no boot
506M = boot
256M = no boot
255M = boot
128M = no boot
127M = boot
96M = no boot
95M = boot
64M = no boot
63M = boot
32M = no boot
31M = boots, but the OOM killer kills the initrd. :)
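To make any pattern in those results easier to spot, here is a small Python sketch that tabulates them. The data is copied from the list above; 512M is left out because GRUB rejected it before the kernel ran, and 31M is counted as booting. Grouping by multiples of 32M is only one arbitrary way to slice the data, not a claim about the cause.

```python
# mem= value in MiB -> whether the kernel booted, as reported above.
RESULTS = {
    511: False, 510: True, 509: False, 508: False, 507: False, 506: True,
    256: False, 255: True,
    128: False, 127: True,
    96: False, 95: True,
    64: False, 63: True,
    32: False, 31: True,
}

multiples_of_32 = sorted(m for m in RESULTS if m % 32 == 0)
# Tested multiples of 32M that failed to boot.
failing_multiples = [m for m in multiples_of_32 if not RESULTS[m]]
# Multiples of 32M where the value 1M lower was tested and booted.
one_below_boots = [m for m in multiples_of_32 if RESULTS.get(m - 1)]
# Failures that are not multiples of 32M.
other_failures = sorted(m for m, ok in RESULTS.items() if not ok and m % 32 != 0)

print("failing multiples of 32M:", failing_multiples)
print("multiples where 1M less boots:", one_below_boots)
print("other failures:", other_failures)
```

In this data set every tested multiple of 32M fails while the value 1M below it boots; the failures just under 512M (507M, 508M, 509M, 511M) do not fit that simple grouping.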
Created attachment 240991 [details] Patch to revert broken commit This is tested against 2.6.23.1.
I've rebuilt the latest Fedora 8 kernel with the above patch included. Works just great here. http://mebourne.fedorapeople.org/kernel-2.6.23.1-37.bz249174.src.rpm http://mebourne.fedorapeople.org/kernel-2.6.23.1-37.bz249174.x86_64.rpm Of course, none of these machines have huge amounts of RAM, mine only has 512MB. I suspect that the original 'fix' breaks Fedora for more people than it fixes anything for. (I for one have been unable to upgrade two of my machines from FC6 to Fedora 7, and certainly don't want to miss out on Fedora 8 as well.)
I can confirm that my test machine boots fine with the http://mebourne.fedorapeople.org/kernel-2.6.23.1-37.bz249174.x86_64.rpm kernel and that it will get stuck, unless agp=off is used, with 2.6.23.1-37.fc8. kernel-2.6.23.1-37.bz249174 was configured with debugging off, right? Looking at the patch in question it seems to me that it is pure dumb luck that various x86_64 boxes can boot with this patch when some workarounds are used (manipulating memory amounts, agp=off). OTOH just reverting the patch will break what it was supposed to fix in the first place ("... VFS cache and memory memmap may eat all available system memory under 4G, then the system may fail to allocate swiotlb bounce buffer").
We have a working confirmed patch. Adding it as a Fedora 8 blocker to review.
*** Bug 338551 has been marked as a duplicate of this bug. ***
*** Bug 336281 has been marked as a duplicate of this bug. ***
So, is there an F7 update I can test? :) 336281 is an AMD chipset. So it seems this isn't VIA only. I wonder if the reason my ULi M1689 based desktop works is because it has 2.25gb RAM.
Try the F8 kernel above, should work on F7.
My problem (bug #336281 / Fedora 7) was on an AMD Solo motherboard with AMD chipset. Is there an official Fedora project built test kernel with the fixes mentioned in comment #59 available? If so, I'd be happy to test it with Fedora 7 if there are no F8 userland deps. TIA
Patch added and will be building shortly...
kernel-2.6.23.1-41.fc8 from koji boots rawhide on my test machine without any extra options. Xorg also works, and it is using DRI, without a need to force the bus to PCI. I should note that the same kernel also works as above for an F7 installation. Not that surprising, as this is the same hardware, only different disk partitions, but I checked that just to be sure.
kernel-2.6.22.11-68.fc6 ('updates-testing' at this moment) boots for me as expected.