Opening this as a kernel bug, but it may be a xen issue... I got this panic when booting a RHEL4 x86_64 FV xen guest. This is from -68.34.ELsmp, but I've seen the same panic on a -68.32 based kernel as well. Let me know if you need access to the machine...
Bootdata ok (command line is ro root=/dev/VolGroup00/LogVol00 console=ttyS0,115200 nohpet)
Linux version 2.6.9-68.34.ELsmp (brewbuilder.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-9)) #1 SMP Mon Apr 14 16:50:55 EDT 2008
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000001fffac00 (usable)
BIOS-e820: 000000001fffac00 - 0000000020000000 (reserved)
No NUMA configuration found
Faking a node at 0000000000000000-000000001fffa000
Bootmem setup node 0 0000000000000000-000000001fffa000
DMI 2.4 present.
ACPI: PM-Timer IO Port: 0x1f48
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:15 APIC version 16
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x02] enabled)
Processor #2 6:15 APIC version 16
Setting APIC routing to flat
ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-47
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 low level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 7 global_irq 7 low level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 low level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 low level)
ACPI: HPET id: 0x8086a201 base: 0xfed00000
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 30000000 (gap: 20000000:e0000000)
Checking aperture...
Built 1 zonelists
Kernel command line: ro root=/dev/VolGroup00/LogVol00 console=ttyS0,115200 nohpet
Initializing CPU#0
PID hash table entries: 2048 (order: 11, 65536 bytes)
time.c: Using 3.579545 MHz PM timer.
time.c: Detected 2992.653 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
Memory: 508516k/524264k available (2160k kernel code, 0k reserved, 1352k data, 208k init)
Calibrating delay using timer specific routine.. 5995.99 BogoMIPS (lpj=2997999)
Security Scaffold v1.0.0 initialized
SELinux: Initializing.
selinux_register_security: Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 256 (order: 0, 4096 bytes)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU0: Intel(R) Xeon(R) CPU 5160 @ 3.00GHz stepping 0b
per-CPU timeslice cutoff: 4096.17 usecs.
task migration cache decay timeout: 4 msecs.
Booting processor 1/2 rip 6000 rsp 1001fc05f58
Initializing CPU#1
Calibrating delay using timer specific routine.. 5984.27 BogoMIPS (lpj=2992135)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
Intel(R) Xeon(R) CPU 5160 @ 3.00GHz stepping 0b
Total of 2 processors activated (11980.26 BogoMIPS).
activating NMI Watchdog ... done.
testing NMI watchdog ... CPU#0: NMI appears to be stuck (0)!
Using local APIC timer interrupts.
Detected 6.250 MHz APIC timer.
checking TSC synchronization across 2 CPUs: passed.
Brought up 2 CPUs
time.c: Using PIT/TSC based timekeeping.
checking if image is initramfs... it is
NET: Registered protocol family 16
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20040816
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
ACPI: PCI Interrupt Link [LNKA] (IRQs *5 7 10 11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *7 10 11)
ACPI: PCI Interrupt Link [LNKC] (IRQs 5 7 *10 11)
ACPI: PCI Interrupt Link [LNKD] (IRQs 5 7 10 *11)
xen_mem: Initialising balloon driver.
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
GSI 16 sharing vector 0xA9 and IRQ 16
ACPI: PCI Interrupt 0000:00:01.2[A] -> GSI 20 (level, low) -> IRQ 169
GSI 17 sharing vector 0xB1 and IRQ 17
ACPI: PCI Interrupt 0000:00:01.3[D] -> GSI 23 (level, low) -> IRQ 177
GSI 18 sharing vector 0xB9 and IRQ 18
ACPI: PCI Interrupt 0000:00:03.0[A] -> GSI 28 (level, low) -> IRQ 185
GSI 19 sharing vector 0xC1 and IRQ 19
ACPI: PCI Interrupt 0000:00:04.0[A] -> GSI 32 (level, low) -> IRQ 193
PCI-DMA: Disabling IOMMU.
IA32 emulation $Id: sys_ia32.c,v 1.32 2002/03/24 13:02:28 ak Exp $
audit: initializing netlink socket (disabled)
audit(1208266359.835:1): initialized
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
Initializing Cryptographic API
ksign: Installing public key data
Loading keyring
- Added public key 9A2F7475FE63665B
- User ID: Red Hat, Inc. (Kernel Module GPG key)
Limiting direct PCI/PCI transfers.
PCI: PIIX3: Enabling Passive Release on 0000:00:01.0
Activating ISA DMA hang workarounds.
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
ACPI: Processor [PR00] (supports C1)
ACPI: Processor [PR01] (supports C1)
Real Time Clock Driver v1.12
Linux agpgart interface v0.100 (c) Dave Jones
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 68 ports, IRQ sharing enabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16450
RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
ACPI: PCI Interrupt 0000:00:03.0[A] -> GSI 28 (level, low) -> IRQ 185
Xen version 3.1.
Hypercall area is 1 pages.
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at gnttab:527
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.9-68.34.ELsmp
RIP: 0010:[<ffffffff8025f365>] <ffffffff8025f365>{gnttab_map+82}
RSP: 0018:000001001fc01d38 EFLAGS: 00010282
RAX: ffffffffffffffea RBX: 0000000000020000 RCX: 00000000ffffffff
RDX: 00000000ffffffff RSI: 000001001fc01d38 RDI: 0000000000000007
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000246
R10: 0000000000000246 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000001000 R14: 0000000000000000 R15: 00000000f3000000
FS: 0000000000000000(0000) GS:ffffffff8050c380(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000045efc0 CR3: 0000000000101000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo 000001001fc00000, task 00000100011e47f0)
Stack: 0000000100007ff0 00000000ffffffff 00000001000f3000 0000000000000020 0000000000000000 ffffffff8025f67e 0000000000001000 0000000000000000 00000000f3000000 ffffffff801763a9
Call Trace:
<ffffffff8025f67e>{gnttab_resume+68}
<ffffffff801763a9>{alloc_page_interleave+61}
<ffffffff8052f7ed>{gnttab_init+132}
<ffffffff8025edd2>{platform_pci_init+889}
<ffffffff801f8a0c>{pci_device_probe+110}
<ffffffff8024f075>{bus_match+57}
<ffffffff8024f173>{driver_attach+68}
<ffffffff8024f48f>{bus_add_driver+143}
<ffffffff801f877c>{pci_register_driver+119}
<ffffffff8052f741>{platform_pci_module_init+13}
<ffffffff8010c58a>{init+474}
<ffffffff80110f87>{child_rip+8}
<ffffffff8020d5f8>{acpi_ds_init_one_object+0}
<ffffffff8010c3b0>{init+0}
<ffffffff80110f7f>{child_rip+0}
Code: 0f 0b 27 47 34 80 ff ff ff ff 0f 02 ff c9 eb a7 48 83 c4 28
RIP <ffffffff8025f365>{gnttab_map+82} RSP <000001001fc01d38>
<0>Kernel panic - not syncing: Oops
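A note on reading the register dump: RAX is ffffffffffffffea, which as a signed 64-bit value is -22, and errno 22 is EINVAL on Linux. If RAX holds the return value that the BUG check in gnttab_map tested (an assumption; the trace alone does not prove it), that would suggest the failing call was rejected as an invalid argument. The sketch below only shows the arithmetic; the helper names to_signed64 and describe_return_value are made up for illustration and are not part of any kernel or Xen tooling.

```python
import errno


def to_signed64(value: int) -> int:
    """Interpret a 64-bit register value as a two's-complement signed integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value & (1 << 63) else value


def describe_return_value(reg_hex: str) -> str:
    """Render a register value; small negative values are shown as -ERRNO names.

    The -4095..-1 window mirrors the usual kernel convention for error
    returns; treating RAX as a return value is an assumption.
    """
    signed = to_signed64(int(reg_hex, 16))
    if -4095 <= signed < 0:
        name = errno.errorcode.get(-signed, "unknown errno")
        return f"{signed} (-{name})"
    return str(signed)


# RAX from the oops above
print(describe_return_value("ffffffffffffffea"))
```

Values that are not small negative numbers, such as RBX (0000000000020000), are printed as plain integers.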
-68.31.EL looks to be fine. I suspect the regression was introduced in -68.32.EL. The delta between 31 and 32 is mostly xen patches, so this does look more like a xen bug.
Jeff,
Yeah, I do believe this is a Xen bug, but I do not believe it is a dup of 442298. Don Dutile is aware of this problem and is working on it; I'll CC him here.
Chris Lalancette
Thanks Chris... Yep, confirmed. I just tested a kernel with the patch for 442298 and it still panics on boot. Let me know if you need me to test a patch or anything...
Jeff,
Found the source of the bug: the "> 3 VNIF" patch introduced new code in the pv-on-hvm code path, but that code was (obviously) never tested. RHEL4.7 has pv-on-hvm support built in, and thus crashes when booted as an FV guest. I have a fix for this (plus fixes for a few memory leaks I found along the way), which I will be posting today.
Created attachment 302504
Posted patch for 4.7
Created attachment 304205
screen shot of kernel panic with Xen
Can you confirm that the attached call trace is the same as the one in comment #0? It looks the same to me, but I get this when I try to install the guest. This is with RHEL4-U7-re20080424.0/x86_64/FV. Thanks.
Committed in 70.EL. RPMS are available at http://people.redhat.com/vgoyal/
Vivek,
I can confirm that the panic is not there. I was able to boot a guest with the 70.EL kernel to the installation program. However, I encountered a new bug and will need some help to triage it. Anaconda (stage2) can't open device /dev/xvda (the disk) while probing for available disk space (this is at the beginning of the install). This is a parted error. Trying to open the device with parted/fdisk from the console (while in the install environment) doesn't work either. One of the anaconda folks told me that this is probably the kernel's fault, as the device perms seemed OK to him.
Created attachment 305937
can't open /dev/xvda error
Additional info: this is RHEL4-U7-re20080515.0.
ia64 - FV guest works fine
x86_64 - FV guest leads to the above error
Something is wrong with the way you started that installation. The only way anaconda should be looking for /dev/xvda is if it was starting a PV installation, not a FV installation. What was the virt-install command-line you used, or, alternatively, what were the steps you went through in virt-manager to start this install?
Chris Lalancette
Ignore comments #12 and #13; that issue is filed as bug #447315. I can confirm that the kernel panic is gone.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2008-0665.html