Description of problem:
1. Boot a VM with an assigned NIC. The following messages are shown in the dmesg output, and the guest fails to get an IP address.
2. In the guest, ethtool eth0 reports the wrong speed. (If this is a separate issue, please let me know and I will open a new bug for it.)

# dmesg
eth0: Detected Tx Unit Hang:
  TDH <3>
  TDT <3>
  next_to_use <3>
  next_to_clean <0>
buffer_info[next_to_clean]:
  time_stamp <fffd6b62>
  next_to_watch <0>
  jiffies <fffd89f0>
  next_to_watch.status <0>
NETDEV WATCHDOG: eth0: transmit timed out
eth0: Link is Up 100 Mbps Full Duplex, Flow Control: None
eth0: 10/100 speed: disabling TSO

# ethtool eth0
Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 2
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbag
        Wake-on: g
        Current message level: 0x00000001 (1)
        Link detected: yes

Version-Release number of selected component (if applicable):
kvm-83-90.el5
Red Hat Enterprise Virtualization Hypervisor release 5.4-2.0.99 (11)

How reproducible:
100%

Steps to Reproduce:
1. Enable VT-d in the BIOS.
2. Unbind the NIC from the host.
3. Reload kvm and kvm-intel.
4. Start the guest:
/usr/libexec/qemu-kvm -no-hpet -usbdevice tablet -rtc-td-hack -drive file=/data/images/images/RHEL-Server-5.3-32.qcow2,media=disk,if=ide,cache=off,index=0 -smp 2 -m 2048 -cpu qemu64,+sse2 -vnc :12 -monitor stdio -net none -boot c -pcidevice host=00:19.0

Actual results:
The output of the ifconfig command in the guest shows that eth0 has no IP address.

Expected results:

Additional info:
1. NIC information (on the host, before unbinding):
# lspci | grep Ethernet
00:19.0 Ethernet controller: Intel Corporation 82567LM-3 Gigabit Network Connection (rev 02)
# ls -l /sys/bus/pci/devices/0000:00:19.0/driver
lrwxrwxrwx 1 root root 0 Jul 17 21:44 /sys/bus/pci/devices/0000:00:19.0/driver -> ../../../bus/pci/drivers/e1000e
# ls -l /sys/class/net/eth0/device
lrwxrwxrwx 1 root root 0 Jul 17 21:45 /sys/class/net/eth0/device -> ../../../devices/pci0000:00/0000:00:19.0
# grep eth0 /proc/interrupts
 74:         63          0          0    7708638   PCI-MSI  eth0

2. ethtool on the host (before unbinding):
# ethtool eth0
Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 2
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000001 (1)
        Link detected: yes
Can you reproduce this when running with virsh?
Tried virsh on the RHEV-H host. After modifying the guest, I cannot restart it.

virsh # start TestNIC
error : Failed to start domain TestNIC
error : server closed connection

# service libvirtd status
libvirtd dead but pid file exists

Searching Bugzilla, I found bug 513317.

lihuang -> Chris: is there any other way to debug this issue?
That was the same mistake as in bug 513317. After retesting, another issue blocks me from reproducing with virsh:

virsh # start TestNIC
error: Failed to start domain TestNIC
error: internal error unable to start guest: char device redirected to /dev/pts/1
char device redirected to /dev/pts/2
get_real_device: /sys/bus/pci/devices/0000:00:19.0/config: Permission denied
init_assigned_device: Error: Couldn't get real device (00:19.0)!
Failed to initialize assigned device host=00:19.0
Retested on a RHEL5u4 host (with both virsh and virt-manager):

[root@localhost ~]# rpm -q kernel
kernel-2.6.18-159.el5
[root@localhost ~]# rpm -q kvm
kvm-83-94.el5

The issue in comment #0 can be reproduced.
Can you collect "lspci -vvv -xxxx -s 00:19.0" and "dmesg" from the host before unbinding and assigning to the guest? From the guest can you also collect lspci -vvv -xxxx -s [pci device in guest, like 00:04.0] as well as dmesg in the guest after assigning the device to the guest and starting the guest?
1. lspci in host

# lspci -vvv -xxxx -s 00:19.0
00:19.0 Ethernet controller: Intel Corporation 82567LM-3 Gigabit Network Connection (rev 02)
        Subsystem: Dell Unknown device 027f
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0
        Interrupt: pin A routed to IRQ 82
        Region 0: Memory at febe0000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at febd9000 (32-bit, non-prefetchable) [size=4K]
        Region 2: I/O ports at ecc0 [size=32]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable+
                Address: 00000000fee02000 Data: 4052
        Capabilities: [e0] #13 [0306]
00: 86 80 de 10 07 01 10 00 02 00 00 02 00 00 00 00
10: 00 00 be fe 00 90 bd fe c1 ec 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 7f 02
30: 00 00 00 00 c8 00 00 00 00 00 00 00 03 01 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 01 d0 22 c8 00 20 00 0d
d0: 05 e0 81 00 00 20 e0 fe 00 00 00 00 52 40 00 00
e0: 13 00 06 03 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2. lspci in guest

# lspci -vvv -xxxx -s 00:05.0
00:05.0 Ethernet controller: Intel Corporation 82567LM-3 Gigabit Network Connection (rev 02)
        Subsystem: Dell Unknown device 027f
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0
        Interrupt: pin A routed to IRQ 217
        Region 0: Memory at c4020000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at c4040000 (32-bit, non-prefetchable) [size=4K]
        Region 2: I/O ports at c220 [size=32]
        Capabilities: [40] Message Signalled Interrupts: 64bit- Queue=0/0 Enable+
                Address: fee00000 Data: 40d9
00: 86 80 de 10 07 04 10 00 02 00 00 02 00 00 00 00
10: 00 00 02 c4 00 00 04 c4 21 c2 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 7f 02
30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 00 00
40: 05 00 01 00 00 00 e0 fe d9 40 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 01 d0 22 c8 00 20 00 0d
d0: 05 e0 81 00 00 00 e0 fe 00 00 00 00 6b 40 00 00
e0: 13 00 06 03 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
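For anyone comparing the two dumps by hand: the 00: row is the start of the standard PCI configuration header, stored little-endian, with the vendor ID at offset 0x00, the device ID at 0x02 and the command register at 0x04. A minimal Python sketch that decodes those three fields from the host and guest rows quoted above; the helper name decode_header is made up for illustration and is not part of any tool.

```python
def decode_header(row):
    """Decode vendor ID, device ID and command register from an lspci -x '00:' row."""
    # Drop the "00:" offset label, then turn the hex byte pairs into raw bytes.
    _, _, data = row.partition(":")
    raw = bytes.fromhex(data.replace(" ", ""))

    def word(offset):
        # PCI configuration space fields are little-endian 16-bit words here.
        return int.from_bytes(raw[offset:offset + 2], "little")

    return {"vendor": word(0x00), "device": word(0x02), "command": word(0x04)}


# The two rows below are copied from the host and guest dumps in this bug.
host = decode_header("00: 86 80 de 10 07 01 10 00 02 00 00 02 00 00 00 00")
guest = decode_header("00: 86 80 de 10 07 04 10 00 02 00 00 02 00 00 00 00")

for name, hdr in (("host", host), ("guest", guest)):
    print(name, " ".join(f"{k}={v:#06x}" for k, v in hdr.items()))
```

The vendor and device IDs are identical in both dumps; only the command register differs, which matches the SERR+ versus SERR- difference in the decoded Control lines.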
Created attachment 355490 [details] dmesg in host
Created attachment 355491 [details] dmesg in guest
Can you try a RHEL 5.4 guest? The RHEL 5.3 guest has an older e1000e driver in it, and it is recognizing the device incorrectly as an ich9lan, but it is ich10lan. You actually only need to update the kernel in the guest.
Created attachment 355932 [details]
dmesg in RHEL5u4 guest

The issue can be reproduced in a RHEL5u4 guest (kernel-2.6.18-160.el5.x86_64). The attachment is the dmesg output from the guest.

The following dmesg log appears after the guest starts up:

PCI: Enabling device 0000:00:19.0 (0100 -> 0103)
ACPI: PCI Interrupt 0000:00:19.0[A] -> GSI 21 (level, low) -> IRQ 58
PM: Writing back config space on device 0000:00:19.0 at offset 6 (was 1, writing ecc1)
PM: Writing back config space on device 0000:00:19.0 at offset 5 (was 0, writing febd9000)
PM: Writing back config space on device 0000:00:19.0 at offset 4 (was 0, writing febe0000)
PM: Writing back config space on device 0000:00:19.0 at offset 1 (was 100000, writing 100400)

It can also be reproduced in Fedora 11 (kernel-2.6.29.4-167.f11.x86_64).
(In reply to comment #11)
> following dmesg log is given after guest is start up.
> PCI: Enabling device 0000:00:19.0 (0100 -> 0103)
> ACPI: PCI Interrupt 0000:00:19.0[A] -> GSI 21 (level, low) -> IRQ 58
> PM: Writing back config space on device 0000:00:19.0 at offset 6 (was 1,
> writing ecc1)
> PM: Writing back config space on device 0000:00:19.0 at offset 5 (was 0,
> writing febd9000)
> PM: Writing back config space on device 0000:00:19.0 at offset 4 (was 0,
> writing febe0000)
> PM: Writing back config space on device 0000:00:19.0 at offset 1 (was 100000,
> writing 100400)

This looks like the host device driver is still active.
I should have noticed this earlier, but I think the problem is DMA related. The IOMMU (VT-d) is not enabled. Add intel_iommu=on to enable VT-d. Unfortunately, enabling VT-d on this machine causes a panic. Next step is to get the DMAR table, and try to figure out why the box is panicking. But this should probably be a new bug.
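A quick way to confirm whether the option was actually passed to a given boot is to look for it on the kernel command line (normally readable from /proc/cmdline on the host). A minimal Python sketch, assuming only that interface; the helper name iommu_requested and the sample command line are made up for illustration. It only shows that the option was requested, not that the IOMMU initialized successfully.

```python
def iommu_requested(cmdline):
    """Return True if a kernel command line asks for intel_iommu=on."""
    for token in cmdline.split():
        if token.startswith("intel_iommu="):
            # The option takes a comma-separated list of values, e.g. "on,igfx_off".
            values = token[len("intel_iommu="):].split(",")
            if "on" in values:
                return True
    return False


# Hypothetical command line, used here instead of reading /proc/cmdline
# so the sketch runs anywhere.
sample = "ro root=/dev/sda1 intel_iommu=on console=tty0"
print(iommu_requested(sample))
```

On a real host the argument would be the contents of /proc/cmdline rather than the sample string.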
Created attachment 356114 [details]
Panic with intel_iommu=on

This is panicking due to a failure to allocate the global iommu array. The failure jumps to a cleanup routine, which is causing the panic (NMI triggers due to lockup). It is unclear why the allocation is failing unless there is some issue with the DMAR tables; it is acting as if the number of IOMMUs is ridiculously large.
Created attachment 356202 [details] DMAR table
Hardware information:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
stepping        : 10
cpu MHz         : 2660.002
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx smx est tm2 cx16 xtpr lahf_lm
bogomips        : 5319.97
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

# dmidecode -t 0
# dmidecode 2.9
SMBIOS 2.5 present.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
        Vendor: Dell Inc.
        Version: A02
        Release Date: 02/18/2009
        Address: 0xF0000
        Runtime Size: 64 kB
        ROM Size: 4096 kB
        Characteristics:
                PCI is supported
                PNP is supported
                APM is supported
                BIOS is upgradeable
                BIOS shadowing is allowed
                ESCD support is available
                Boot from CD is supported
                Selectable boot is supported
                EDD is supported
                Japanese floppy for Toshiba 1.2 MB is supported (int 13h)
                3.5"/720 KB floppy services are supported (int 13h)
                Print screen service is supported (int 5h)
                8042 keyboard services are supported (int 9h)
                Serial services are supported (int 14h)
                Printer services are supported (int 17h)
                ACPI is supported
                USB legacy is supported
                BIOS boot specification is supported
                Function key-initiated network boot is supported
                Targeted content distribution is supported
        BIOS Revision: 3.0
Googled it. It seems Xen also suffers from this panic:
http://www.nabble.com/Xen-3.4.1-rc8-iommu-crash-on-Optiplex-760-(-Core2-E8400,-Q43---ICH10D-)-td24649271.html

Is this a hardware issue?
Created attachment 356323 [details]
host hang

Tested on another host: Nehalem E5504.
The host hangs (see the screenshot).

# dmidecode 2.9
SMBIOS 2.6 present.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
        Vendor: LENOVO
        Version: 60KT23AUS
        Release Date: 04/03/2009
        Address: 0xE1F80
        Runtime Size: 123008 bytes
        ROM Size: 2048 kB
        Characteristics:
                PCI is supported
                PNP is supported
                BIOS is upgradeable
                BIOS shadowing is allowed
                ESCD support is available
                Boot from CD is supported
                EDD is supported
                ACPI is supported
                USB legacy is supported
                Smart battery is supported
                BIOS boot specification is supported
                Targeted content distribution is supported
        BIOS Revision: 1.23

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU E5504 @ 2.00GHz
stepping        : 5
cpu MHz         : 1596.000
cache size      : 4096 KB
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 6
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc nonstop_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 3990.00
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: [8]

[root@intel-5504-12-1 ~]# lspci
00:00.0 Host bridge: Intel Corporation X58 I/O Hub to ESI Port (rev 13)
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 13)
00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 13)
00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 13)
00:14.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub System Management Registers (rev 13)
00:14.1 PIC: Intel Corporation 5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 13)
00:14.2 PIC: Intel Corporation 5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 13)
00:16.0 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.2 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.3 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.4 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.5 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.6 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.7 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:1a.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
00:1a.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
00:1a.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
00:1b.0 Audio device: Intel Corporation 82801JI (ICH10 Family) HD Audio Controller
00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Port 1
00:1c.4 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Port 5
00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
00:1d.3 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
00:1f.2 SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller
00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
03:00.0 VGA compatible controller: nVidia Corporation G96 [Quadro FX 380] (rev a1)
05:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5755 Gigabit Ethernet PCI Express (rev 02)
lihuang, can you disable the onboard NIC (the Intel Corporation 82567LM-3 Gigabit Network Connection) and try plugging in another NIC, such as a PCI e1000? Intel thinks this could be a case of the NIC not handling DMA properly, so trying another NIC is the first step.
On the OPTIPLEX 760 box:

Disabled the onboard NIC in the BIOS:
Settings -> System Configuration -> Integrated NIC -> Disable

[root@lihuang 105]# cu -l ttyS0 -s 115200
Connected.
Linux version 2.6.18-160.el5 (mockbuild.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Mon Jul 27 17:28:29 EDT 2009
Command line: ro root=/dev/HostVG/Root roottypefs=ext3 crashkernel=128M@16M elevator=deadline intel_iommu=on console=tty0 console=ttyS0,115200n8
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000010000 - 000000000009fc00 (usable)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000bfdffc00 (usable)
 BIOS-e820: 00000000bfdffc00 - 00000000bfe53c00 (ACPI NVS)
 BIOS-e820: 00000000bfe53c00 - 00000000bfe55c00 (ACPI data)
 BIOS-e820: 00000000bfe55c00 - 00000000c0000000 (reserved)
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fed00400 (reserved)
 BIOS-e820: 00000000fed20000 - 00000000feda0000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved)
 BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000238000000 (usable)
DMI 2.5 present.
>>> ERROR: Invalid checksum
No NUMA configuration found
Faking a node at 0000000000000000-0000000238000000
Bootmem setup node 0 0000000000000000-0000000238000000
ACPI: PM-Timer IO Port: 0x808
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 7:7 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 7:7 APIC version 20
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
Processor #2 7:7 APIC version 20
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
Processor #3 7:7 APIC version 20
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x00] disabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x01] disabled)
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x02] disabled)
ACPI: LAPIC (acpi_id[0x08] lapic_id[0x03] disabled)
ACPI: LAPIC_NMI (acpi_id[0xff] high level lint[0x1])
ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Setting APIC routing to physical flat
ACPI: HPET id: 0x8086a701 base: 0xfed00000
Using ACPI (MADT) for SMP configuration information
Nosave address range: 000000000009f000 - 00000000000f0000
Nosave address range: 00000000000f0000 - 0000000000100000
Nosave address range: 00000000bfdff000 - 00000000bfe00000
Nosave address range: 00000000bfe00000 - 00000000bfe53000
Nosave address range: 00000000bfe53000 - 00000000bfe54000
Nosave address range: 00000000bfe54000 - 00000000bfe55000
Nosave address range: 00000000bfe55000 - 00000000bfe56000
Nosave address range: 00000000bfe56000 - 00000000c0000000
Nosave address range: 00000000c0000000 - 00000000e0000000
Nosave address range: 00000000e0000000 - 00000000f0000000
Nosave address range: 00000000f0000000 - 00000000fec00000
Nosave address range: 00000000fec00000 - 00000000fed00000
Nosave address range: 00000000fed00000 - 00000000fed20000
Nosave address range: 00000000fed20000 - 00000000feda0000
Nosave address range: 00000000feda0000 - 00000000fee00000
Nosave address range: 00000000fee00000 - 00000000fef00000
Nosave address range: 00000000fef00000 - 00000000ffb00000
Nosave address range: 00000000ffb00000 - 0000000100000000
Allocating PCI resources starting at c2000000 (gap: c0000000:20000000)
SMP: Allowing 8 CPUs, 4 hotplug CPUs
Built 1 zonelists. Total pages: 2030658
Kernel command line: ro root=/dev/HostVG/Root roottypefs=ext3 crashkernel=128M@16M elevator=deadline intel_iommu=on console=tty0 console=ttyS0,115200n8
Intel-IOMMU: enabled
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
Checking aperture...
Memory: 7971072k/9306112k available (2547k kernel code, 283960k reserved, 1289k data, 208k init)
Calibrating delay loop (skipped), value calculated using timer frequency.. 5320.00 BogoMIPS (lpj=2660003)
Security Framework v1.0.0 initialized
SELinux: Initializing.
selinux_register_security: Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 256
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 3072K
using mwait in idle threads.
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
CPU0: Thermal monitoring enabled (TM2)
SMP alternatives: switching to UP code
ACPI: Core revision 20060707
Using local APIC timer interrupts.
result 20781263
Detected 20.781 MHz APIC timer.
SMP alternatives: switching to SMP code
Booting processor 1/4 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 5319.96 BogoMIPS (lpj=2659982)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 3072K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
CPU1: Thermal monitoring enabled (TM2)
Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz stepping 0a
CPU 1: Syncing TSC to CPU 0.
CPU 1: synchronized TSC with CPU 0 (last diff -2197 cycles, maxerr 320 cycles)
SMP alternatives: switching to SMP code
Booting processor 2/4 APIC 0x2
Initializing CPU#2
Calibrating delay using timer specific routine.. 5320.00 BogoMIPS (lpj=2660000)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 3072K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 2
CPU2: Thermal monitoring enabled (TM2)
Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz stepping 0a
CPU 2: Syncing TSC to CPU 0.
CPU 2: synchronized TSC with CPU 0 (last diff -1028 cycles, maxerr 568 cycles)
SMP alternatives: switching to SMP code
Booting processor 3/4 APIC 0x3
Initializing CPU#3
Calibrating delay using timer specific routine.. 5319.98 BogoMIPS (lpj=2659992)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 3072K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 3
CPU3: Thermal monitoring enabled (TM2)
Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz stepping 0a
CPU 3: Syncing TSC to CPU 0.
CPU 3: synchronized TSC with CPU 0 (last diff -1032 cycles, maxerr 560 cycles)
Brought up 4 CPUs
testing NMI watchdog ... OK.
time.c: Using 14.318180 MHz WALL HPET GTOD HPET timer.
time.c: Detected 2660.003 MHz processor.
migration_cost=24,2212
checking if image is initramfs... it is
Freeing initrd memory: 7508k freed
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using MMCONFIG at e0000000
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: ACPI Dock Station Driver: 1 docks/bays found
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Transparent bridge - 0000:00:1e.0
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 *11 12 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 *5 6 7 9 10 11 12 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 *9 10 11 12 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 *10 11 12 15)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 9 10 11 12 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 9 10 11 12 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 *5 6 7 9 10 11 12 15)
ACPI: PCI Interrupt Link [LNKH] (IRQs *3 4 5 6 7 9 10 11 12 15)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 9 devices
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
NetLabel: Initializing
NetLabel: domain hash size = 128
NetLabel: protocols = UNLABELED CIPSOv4
NetLabel: unlabeled traffic allowed by default
hpet0: at MMIO 0xfed00000 (virtual 0xffffffffff5fe000), IRQs 2, 8, 0, 0, 0, 0, 0, 0
hpet0: 8 64-bit timers, 14318180 Hz
DMAR:Host address width 36
DMAR:DRHD (flags: 0x00000000)base: 0x00000000fedc1000
DMAR:DRHD (flags: 0x00000000)base: 0x00000000fedc3000
DMAR:DRHD (flags: 0x00000001)base: 0x00000000fedc4000
DMAR:RMRR base: 0x00000000bfe58000 end: 0x00000000bfe6ffff
Allocating domain array failed
NMI Watchdog detected LOCKUP on CPU 0
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.18-160.el5 #1
RIP: 0010:[<ffffffff80064bdb>] [<ffffffff80064bdb>] .text.lock.spinlock+0x11/0x30
RSP: 0000:ffff810107d39e48 EFLAGS: 00000086
RAX: 0000000000000286 RBX: ffff810236926340 RCX: 0000000000000000
RDX: 000000000000002b RSI: 0000000007bf5840 RDI: ffff810236926390
RBP: ffff810236926340 R08: 0000000000000001 R09: ffff8100090503d4
R10: 0000000000000001 R11: ffffffff8016bbde R12: 0000000000040000
R13: 00000000fffffff4 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffffff803c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo ffff810107d38000, task ffff810107d317a0)
Stack: ffffffff80161f96 00000000000000d0 ffff810236926340 ffff810236926340
 000000000000000b 00000000fffffff4 ffffffff8015f229 ffff810236ab5ac0
 ffffffff804152fe 0000000000000000 0000000000000000 ffffffff8042bb28
Call Trace:
 [<ffffffff80161f96>] free_dmar_iommu+0x1aa/0x218
 [<ffffffff8015f229>] free_iommu+0xe/0x26
 [<ffffffff804152fe>] intel_iommu_init+0x8bc/0x99d
 [<ffffffff80403d93>] pci_iommu_init+0xe/0x21
 [<ffffffff803fba56>] init+0x1f9/0x2f7
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8017bac6>] acpi_ds_init_one_object+0x0/0x80
 [<ffffffff803fb85d>] init+0x0/0x2f7
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

Code: 7e f9 e9 f9 fe ff ff f3 90 83 3f 00 7e f9 e9 f8 fe ff ff f3
Kernel panic - not syncing: nmi watchdog
BUG: warning at kernel/panic.c:137/panic() (Not tainted)

Call Trace:
 <NMI> [<ffffffff80090a49>] panic+0x1da/0x1eb
 [<ffffffff8006ba4a>] _show_stack+0xdb/0xea
 [<ffffffff8006bb3d>] show_registers+0xe4/0x100
 [<ffffffff80065295>] die_nmi+0x66/0xa3
 [<ffffffff800659db>] nmi_watchdog_tick+0x157/0x1d3
 [<ffffffff800655f9>] default_do_nmi+0x81/0x225
 [<ffffffff80065866>] do_nmi+0x43/0x61
 [<ffffffff80064ebf>] nmi+0x7f/0x88
 [<ffffffff8016bbde>] vgacon_cursor+0x0/0x1a5
 [<ffffffff80064bdb>] .text.lock.spinlock+0x11/0x30
 <<EOE>> [<ffffffff80161f96>] free_dmar_iommu+0x1aa/0x218
 [<ffffffff8015f229>] free_iommu+0xe/0x26
 [<ffffffff804152fe>] intel_iommu_init+0x8bc/0x99d
 [<ffffffff80403d93>] pci_iommu_init+0xe/0x21
 [<ffffffff803fba56>] init+0x1f9/0x2f7
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8017bac6>] acpi_ds_init_one_object+0x0/0x80
 [<ffffffff803fb85d>] init+0x0/0x2f7
 [<ffffffff8005dfa7>] child_rip+0x0/0x11
This looks like buggy hardware. From the dmesg:

ddd: Number of Domains supportd <262144>

That suggests the hardware is reporting an illegal ND value in the IOMMU Capability Register. As per the spec:

"111b: Reserved."

Can we get the iommu->cap field dumped with a debug build to verify that's the case?
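To show where 262144 comes from: the ND field is the low three bits of the capability register, and the driver derives the domain count as 2^(4 + 2*ND), so the reserved encoding 111b yields 2^18 = 262144. A minimal Python sketch of that arithmetic; the function names are made up for illustration and are not the kernel's.

```python
def domains_supported(cap):
    """Domain count implied by the ND field (bits 2:0) of the capability register."""
    nd = cap & 0x7
    return 1 << (4 + 2 * nd)


def nd_is_reserved(cap):
    """True if the ND field holds the encoding the spec marks as reserved (111b)."""
    return (cap & 0x7) == 0b111


# ND = 111b is the only encoding that produces the value seen in dmesg.
for nd in range(8):
    note = " (reserved)" if nd_is_reserved(nd) else ""
    print(f"ND={nd:03b}: {domains_supported(nd)} domains{note}")
```

The largest legal encoding, 110b, gives 65536 domains, so any larger count in dmesg points at a bogus register value rather than a real hardware capability.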
Created attachment 357524 [details] dmidecode > dmidecode.txt
Created attachment 357526 [details] boot log v2
Created attachment 357527 [details] boot log v2 ( with debug )
There is something wrong with this hardware.

IOMMU fedc1000: ver 15:15 cap ffffffffffffffff ecap ffffffffffffffff
IOMMU fedc3000: ver 15:15 cap ffffffffffffffff ecap ffffffffffffffff
IOMMU fedc4000: ver 15:15 cap ffffffffffffffff ecap ffffffffffffffff

Register reads from each DMA Remapping Unit are returning all 1's. It looks like one of:

- the IOMMU is disabled in the BIOS, but the BIOS is still providing a DMAR table
- the DMAR table is pointing to an incorrect address for the Register Base Address
- the IOMMU is simply broken

In any of these cases, this looks like a hardware or BIOS problem. The best we can do is blacklist this platform and never enable the IOMMU.
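For checking other boot logs for the same symptom, a minimal Python sketch that scans debug lines of the form shown above and reports every remapping unit whose cap and ecap both read back as all 1's. The regular expression is an assumption based only on the three lines in this comment, and the function name all_ones_units is made up for illustration.

```python
import re

# Matches lines like:
#   IOMMU fedc1000: ver 15:15 cap ffffffffffffffff ecap ffffffffffffffff
IOMMU_LINE = re.compile(
    r"IOMMU (?P<base>[0-9a-f]+): ver \d+:\d+ "
    r"cap (?P<cap>[0-9a-f]+) ecap (?P<ecap>[0-9a-f]+)"
)

ALL_ONES_64 = 0xFFFFFFFFFFFFFFFF


def all_ones_units(log):
    """Return the base addresses of units whose cap and ecap are both all 1's."""
    bad = []
    for m in IOMMU_LINE.finditer(log):
        cap = int(m.group("cap"), 16)
        ecap = int(m.group("ecap"), 16)
        if cap == ALL_ONES_64 and ecap == ALL_ONES_64:
            bad.append(m.group("base"))
    return bad


boot_log = """\
IOMMU fedc1000: ver 15:15 cap ffffffffffffffff ecap ffffffffffffffff
IOMMU fedc3000: ver 15:15 cap ffffffffffffffff ecap ffffffffffffffff
IOMMU fedc4000: ver 15:15 cap ffffffffffffffff ecap ffffffffffffffff
"""
print(all_ones_units(boot_log))
```

A unit reported by this check is not usable, whichever of the three causes listed above applies.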
Closing this bug since it is a hardware issue (bad BIOS).

(It seems a little strange to close an urgent bug as NOTABUG, so I reset the priority. Correct me if this is wrong.)