I get the following backtrace when booting 3.2.10-3.fc16.x86_64 (also 3.2.10-1 and 3.2.9-4 but not 3.2.9-2) as dom0 under xen. Some experimentation shows the crash occurs if the x86-ioapic-add-register-checks-for-bogus-io-apic-entries.patch patch is present in the kernel, but boots normally if it isn't.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
IP: [<ffffffff8134e51f>] xen_irq_init+0x1f/0xb0
PGD 0
Oops: 0002 [#1] SMP
CPU 0
Modules linked in:

Pid: 1, comm: swapper/0 Not tainted 3.2.10-3.fc16.x86_64 #1 Dell Inc. Inspiron 1525 /0U990C
RIP: e030:[<ffffffff8134e51f>] [<ffffffff8134e51f>] xen_irq_init+0x1f/0xb0
RSP: e02b: ffff8800d42cbb70 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00000000ffffffef RCX: 0000000000000001
RDX: 0000000000000040 RSI: 00000000ffffffef RDI: 0000000000000001
RBP: ffff8800d42cbb80 R08: ffff8800d6400000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffef
R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000010
FS: 0000000000000000(0000) GS:ffff8800df5fe000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0:000000008005003b
CR2: 0000000000000040 CR3: 0000000001a05000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/0 (pid: 1, threadinfo ffff8800d42ca000, task ffff8800d42d0000)
Stack:
 00000000ffffffef 0000000000000010 ffff8800d42cbbe0 ffffffff8134f157
 ffffffff8100a9b2 ffffffff8182ffd1 00000000000000a0 00000000829e7384
 0000000000000002 0000000000000010 00000000ffffffff 0000000000000000
Call Trace:
 [<ffffffff8134f157>] xen_bind_pirq_gsi_to_irq+0x87/0x230
 [<ffffffff8100a9b2>] ? check_events+0x12+0x20
 [<ffffffff814bab42>] xen_register_pirq+0x82/0xe0
 [<ffffffff814bac1a>] xen_register_gsi.part.2+0x4a/0xd0
 [<ffffffff814bacc0>] acpi_register_gsi_xen+0x20/0x30
 [<ffffffff8103036f>] acpi_register_gsi+0xf/0x20
 [<ffffffff8131abdb>] acpi_pci_irq_enable+0x12e/0x202
 [<ffffffff814bc849>] pcibios_enable_device+0x39/0x40
 [<ffffffff812dc7ab>] do_pci_enable_device+0x4b/0x70
 [<ffffffff812dc878>] __pci_enable_device_flags+0xa8/0xf0
 [<ffffffff812dc8d3>] pci_enable_device+0x13/0x20
 [<ffffffff812d7cf8>] pci_enable_bridges+0x48/0x90
 [<ffffffff81b1942e>] pci_assign_unassigned_resources+0x1f0/0x224
 [<ffffffff813914c7>] ? put_device+0x17/0x20
 [<ffffffff81165d0b>] ? kfree+0x3b/0x150
 [<ffffffff812dfb8a>] ? pci_get_subsys+0x8a/0xc0
 [<ffffffff81b2aa76>] ? pcibios_allocate_bus_resources+0x8d/0x8d
 [<ffffffff81b2aae8>] pcibios_assign_resources+0x72/0x76
 [<ffffffff81b271f3>] ? parse_pmtmr+0x56/0x56
 [<ffffffff81002042>] do_one_initcall+0x42/0x180
 [<ffffffff81aebce7>] kernel_init+0xde/0x158
 [<ffffffff81066537>] ? schedule_tail+0x27/0xb0
 [<ffffffff815eec34>] kernel_thread_helper+0x4/0x10
 [<ffffffff815ecce3>] ? int_ret_from_sys_call+0x7/0x1b
 [<ffffffff815e4f7c>] ? retint_restore_args+0x5/0x6
 [<ffffffff815eec30>] ? gs_change+0x13/0x13
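
For readers unfamiliar with the oops header, here is a minimal Python sketch of how the "Oops: 0002" value can be read. It assumes the standard x86 page-fault error code layout (bit 0 = page present, bit 1 = write access, bit 2 = user mode); the function name decode_page_fault_error is hypothetical and only for illustration.

```python
# Hypothetical helper: decode the low three bits of an x86 page-fault
# error code, as printed after "Oops:" in the backtrace above.
def decode_page_fault_error(code: int) -> dict:
    return {
        # False: the page was not present; True: protection violation
        "page_present": bool(code & 0x1),
        # True: the faulting access was a write
        "write": bool(code & 0x2),
        # True: the fault happened in user mode
        "user_mode": bool(code & 0x4),
    }


# Values copied from the backtrace: "Oops: 0002" and CR2 = 0x40.
oops = decode_page_fault_error(0x0002)
fault_address = 0x0000000000000040

print(oops)
print(hex(fault_address))
```

Read this way, the oops is a kernel-mode write to a not-present page at address 0x40. A fault address that small is what a member access through a NULL pointer typically looks like, which matches the "NULL pointer dereference" wording, although the trace alone does not say which structure was NULL.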
I did a bit of hacking to get the kernel warning without the crash. The context for the warning is

[ 0.000000] ACPI: PM-Timer IO Port: 0x1008
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] BIOS bug: APIC version is 0 for CPU 0/0x0, fixing up to 0x10
[ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
[ 0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[ 0.000000] I/O APIC 0xfec00000 regs return all ones, skipping!
[ 0.000000] IOAPIC[0]: apic_id 2, version 255, address 0xfec00000, GSI 0-255
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] ACPI: IRQ0 used by override.
[ 0.000000] ACPI: IRQ2 used by override.
[ 0.000000] ACPI: IRQ9 used by override.
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[ 0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
[ 0.000000] nr_irqs_gsi: 272

I suspect the crash occurs a bit later when the pirqs are mapped

[ 0.000000] NR_IRQS:16640 nr_irqs:512 16
[ 0.000000] xen: sci override: global_irq=9 trigger=0 polarity=0
[ 0.000000] xen: registering gsi 9 triggering 0 polarity 0
[ 0.000000] xen: --> pirq=9 -> irq=9 (gsi=9)
[ 0.000000] xen: acpi sci 9
[ 0.000000] xen: --> pirq=1 -> irq=1 (gsi=1)
[ 0.000000] xen: --> pirq=2 -> irq=2 (gsi=2)
[ 0.000000] xen: --> pirq=3 -> irq=3 (gsi=3)
[ 0.000000] xen: --> pirq=4 -> irq=4 (gsi=4)
[ 0.000000] xen: --> pirq=5 -> irq=5 (gsi=5)
[ 0.000000] xen: --> pirq=6 -> irq=6 (gsi=6)
[ 0.000000] xen: --> pirq=7 -> irq=7 (gsi=7)
[ 0.000000] xen: --> pirq=8 -> irq=8 (gsi=8)
[ 0.000000] xen_map_pirq_gsi: returning irq 9 for gsi 9
[ 0.000000] xen: --> pirq=9 -> irq=9 (gsi=9)
[ 0.000000] xen: --> pirq=10 -> irq=10 (gsi=10)
[ 0.000000] xen: --> pirq=11 -> irq=11 (gsi=11)
[ 0.000000] xen: --> pirq=12 -> irq=12 (gsi=12)
[ 0.000000] xen: --> pirq=13 -> irq=13 (gsi=13)
[ 0.000000] xen: --> pirq=14 -> irq=14 (gsi=14)
[ 0.000000] xen: --> pirq=15 -> irq=15 (gsi=15)
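
The "version 255" and "GSI 0-255" values in the log above are what one would expect when the I/O APIC version register reads back as all ones. The following Python sketch illustrates that, assuming the usual I/O APIC version register layout (version in bits 0-7, maximum redirection entry in bits 16-23). The function names are hypothetical and this is not the kernel's code.

```python
# Hypothetical helpers: decode an I/O APIC version register value
# and derive the GSI range it implies.
def decode_ioapic_version_register(raw: int) -> dict:
    version = raw & 0xFF                   # bits 0-7: APIC version
    max_redir_entry = (raw >> 16) & 0xFF   # bits 16-23: index of last entry
    return {"version": version, "entries": max_redir_entry + 1}


def gsi_range(gsi_base: int, entries: int) -> tuple:
    # First and last GSI covered by an I/O APIC with this many entries.
    return (gsi_base, gsi_base + entries - 1)


# A register read that returns all ones, as reported in the log.
info = decode_ioapic_version_register(0xFFFFFFFF)
first_gsi, last_gsi = gsi_range(0, info["entries"])

print(info)
print(first_gsi, last_gsi)
```

With an all-ones read the decoded version is 255 and the entry count is 256, so with gsi_base 0 the range is GSI 0-255, matching the IOAPIC[0] line. The reported nr_irqs_gsi of 272 would be consistent with 256 GSIs plus 16 legacy IRQs, but that is an inference from the numbers rather than something the log states.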
I tried a scratch build without the x86-ioapic-add-register-checks-for-bogus-io-apic-entries.patch patch and that boots successfully as dom0.
Adding Konrad to CC. This patch is already queued for upstream, so I've emailed Suresh as well.
Thanks. Responded on the email thread. Will instrument it a bit to see if my theory holds true.
Hi, Unfortunately the problem continues to happen. Looking at the changelog of the new kernel, I didn't see any reference to the patch cited by Michael Young. kernel-3.3.0-4.fc16.x86_64 xen-4.1.2-6.fc16.x86_64 Is there any estimate of when this problem will be solved? Thanks a lot.
Waiting on Ingo to ack these patches: https://lkml.org/lkml/2012/3/21/632
There is also the quick-n-dirty hack: https://lkml.org/lkml/2012/3/20/349
[mass update] kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository. Please retest with this update.
It crashes the same way with kernel-3.3.0-4.fc16 (as expected).
I have done another scratch build - http://koji.fedoraproject.org/koji/taskinfo?taskID=3923761 - with the patches from https://lkml.org/lkml/2012/3/21/632 - this boots successfully as a dom0.
*** Bug 806245 has been marked as a duplicate of this bug. ***
Created attachment 572774 [details] x86: add io_apic_ops to allow interception
Created attachment 572775 [details] x86/apic_ops: Replace apic_ops with x86_apic_ops
Created attachment 572776 [details] xen/x86: Implement x86_apic_ops
I've committed these patches to F16 now. Should be in the next update.
*** Bug 807401 has been marked as a duplicate of this bug. ***
kernel-3.3.0-8.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/kernel-3.3.0-8.fc16
kernel-3.3.0-8.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/kernel-3.3.0-8.fc17
Package kernel-3.3.0-8.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.3.0-8.fc17'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-4888/kernel-3.3.0-8.fc17
then log in and leave karma (feedback).
Did kernel-PAE also get submitted? On F16 I just did a 'yum --enablerepo=updates-testing list kernel-PAE' but do not see any newer kernel-PAE.
(In reply to comment #22)
> Did kernel-PAE also get submitted?
>
> On F16 I just did a 'yum --enablerepo=updates-testing list kernel-PAE' but do
> not see any newer kernel-PAE.

They are all built together, so yes. Maybe your mirror is a bit stale. It can be found here: http://dl.fedoraproject.org/pub/fedora/linux/updates/testing/17/i386/kernel-PAE-3.3.0-8.fc17.i686.rpm I don't believe the F16 version has been pushed yet.
kernel-3.3.0-8.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.
Created attachment 574485 [details] Xeon - xm dmesg
Created attachment 574486 [details] E5700 - xm dmesg
Hi, thanks a lot. Great job! I am attaching my dmesg and xm dmesg output for your review. Intel Xeon and E5700.
kernel-3.3.0-8.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.
This is still a problem with current Fedora 15 (which is still maintained). It looks like Suresh's patch didn't get reverted until 3.2.15, which we don't have. I can confirm that 2.6.43.2-6.fc15.x86_64 in testing boots OK, though I don't know if the decision to jump to 3.3 on F15 has been made yet.
I have the impression that the F15 3.3 kernels are ready but they don't get enough testing before they are obsoleted by the next update. If you want to give the 2.6.43.2-6.fc15 kernel positive karma you can do so at https://admin.fedoraproject.org/updates/FEDORA-2012-6406/kernel-2.6.43.2-6.fc15
(In reply to comment #30)
> I have the impression that the F15 3.3 kernels are ready but they don't get
> enough testing before they are obsoleted by the next update. If you want to
> give the 2.6.43.2-6.fc15 kernel positive karma you can do so at
> https://admin.fedoraproject.org/updates/FEDORA-2012-6406/kernel-2.6.43.2-6.fc15

Yes, that's exactly what is happening. It's rather frustrating.
Thanks for the explanation, guys.
>billmcgonigle - 2012-04-25 04:20:47
>stable for 24 hours on a Xen server. Fixes Xen crash on boot regression
>currently shipping in F15!
>bodhi - 2012-04-25 04:20:48
>Critical path update approved
>bodhi - 2012-04-25 04:20:50
>This update has reached the stable karma threshold and will be pushed to the
>stable updates repository
oh, so this one's gonna be my fault. ;)