Bug 521143
| Summary: | Hard lockup of igb network driver with kernel-xen dom0 | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Rich Graves <rgraves> |
| Component: | kernel | Assignee: | Red Hat Kernel Manager <kernel-mgr> |
| Status: | CLOSED DUPLICATE | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 5.4 | CC: | agospoda, bbraswel, clalance, ddugger, drjones, gasmith, gregory.v.rose, igeorgex, john.ronciak, jtluka, jvillalo, keve.a.gabbert, kraxel, Litton.Peng, pbonzini, rpacheco, sassmann, seanos, sibai.li, tao, xen-maint, zwlu |
| Target Milestone: | rc | ||
| Target Release: | 5.5 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2009-12-23 12:53:03 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Attachments: | |||
Description
Rich Graves
2009-09-03 20:33:28 UTC
> Was this exact NIC (quad-port 82575) working on 5.3 under Xen?
Yes. I have a (nearly) identical system using a quad-port 82575 under kernel-xen-2.6.18-128.4.1.el5 right now.
Being down a system, I can't make disruptive changes to the working 5.3 box, but could gather information.
On the other hand, I have a quite similarly configured Dell 1950 that works with kernel-xen-2.6.18-164.el5, xen, and 82575GB. The most visible difference is that xen1 (broken 2950) boots from software RAID1 over mptbase/mptscsi and xen2 (working 1950) boots from multipathed SAN over ISP2432. My third similar host, xen0, is a 2950 booting from software RAID1.

[root@xen2 ~]# lspci
00:00.0 Host bridge: Intel Corporation 5000X Chipset Memory Controller Hub (rev 12)
00:02.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 2 (rev 12)
00:03.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 3 (rev 12)
00:04.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 4-5 (rev 12)
00:05.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 5 (rev 12)
00:06.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 6-7 (rev 12)
00:07.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 7 (rev 12)
00:10.0 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 12)
00:10.1 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 12)
00:10.2 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 12)
00:11.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev 12)
00:13.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev 12)
00:15.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev 12)
00:16.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev 12)
00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 1 (rev 09)
00:1d.0 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 (rev 09)
00:1d.1 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 (rev 09)
00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09)
00:1d.7 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller (rev 09)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller (rev 09)
00:1f.1 IDE interface: Intel Corporation 631xESB/632xESB IDE Controller (rev 09)
01:00.0 PCI bridge: Intel Corporation 6702PXH PCI Express-to-PCI Bridge A (rev 09)
02:08.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 01)
03:00.0 PCI bridge: Broadcom EPB PCI-Express to PCI-X Bridge (rev c2)
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 11)
05:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01)
05:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01)
06:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01)
06:01.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E2 (rev 01)
07:00.0 PCI bridge: Broadcom EPB PCI-Express to PCI-X Bridge (rev c2)
08:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 11)
0b:00.0 Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA (rev 02)
0b:00.1 Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA (rev 02)
0d:00.0 PCI bridge: Integrated Device Technology, Inc. PES12N3A PCI Express Switch (rev 0e)
0e:02.0 PCI bridge: Integrated Device Technology, Inc. PES12N3A PCI Express Switch (rev 0e)
0e:04.0 PCI bridge: Integrated Device Technology, Inc. PES12N3A PCI Express Switch (rev 0e)
0f:00.0 Ethernet controller: Intel Corporation 82575GB Gigabit Network Connection (rev 02)
0f:00.1 Ethernet controller: Intel Corporation 82575GB Gigabit Network Connection (rev 02)
10:00.0 Ethernet controller: Intel Corporation 82575GB Gigabit Network Connection (rev 02)
10:00.1 Ethernet controller: Intel Corporation 82575GB Gigabit Network Connection (rev 02)
12:0d.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)

To clarify, I've got 3 similar systems, each with 2 on-board bnx2's, a qla24xx, and a quad GigE 82575GB (2 ports in use).

xen0: Dell 2950, RHEL 5.3, boot from sw RAID 1 (md). Working OK.
xen1: Dell 2950, worked OK under RHEL 5.3. Boot from sw RAID 1 (md). After upgrade to 5.4, loading the igb driver causes serious problems. Nearly identical config to xen0 -- in fact the OS was "installed" by breaking/reestablishing the RAID array.
xen2: Dell 1950, just upgraded to RHEL 5.4, boot from SAN (qla2xxx), appears to work. Except for boot device, appears *very* similar to xen1.

REMOVING IGB DRIVER CAUSES CRASH. OK, so xen2, unlike xen1, "works," but I can crash it by downing all interfaces and then "rmmod igb". Definitely something wrong with that driver and xen.

I rebuilt kernel-xen-2.6.18-164 with RHEL 5.3's 1.2.45-k2 version of the igb driver (installed SRPM, hacked the old drivers/net/igb into place, rebuilt; no other changes). Reverting to the old driver does *not* resolve the problem.

Disabling loading of igb in modprobe.conf, then booting in single-user mode, then manually loading igb, then /sbin/telinit 3 appears to work. So is it the order in which some unknown set of kernel modules load? Maddening. At the kernel command line, I've got dom0_mem=1024M. What more information can I usefully provide?

I ran across bug #508870 which includes some disturbing comments. Is there a kernel I can try post-128 but pre-154?
I've tried kernels 159, 162 and 165 from people.redhat.com/dzickus/el5 and 2.6.18-164.el5bz518338xen from clalance. They all lock up in the same way.

The last 5.3 kernel works. The system also works, or at least "works," with the 5.4 xen.gz, with grub.conf:

title Hack 2.6.18-128.7.1.el5xen with xen.gz-2.6.18-164
root (hd0,0)
kernel /xen.gz-2.6.18-164.el5 dom0_mem=1024M
module /vmlinuz-2.6.18-128.7.1.el5xen ro root=/dev/md2 elevator=noop
module /initrd-2.6.18-128.7.1.el5xen.img

pci=assign-busses as suggested at http://www.mail-archive.com/e1000-devel@lists.sourceforge.net/msg01763.html has no effect (I am not trying to use multiple VFs).

BUG: soft lockup - CPU#1 stuck for 183s! [irqbalance:8810]
CPU 1:
Modules linked in: netloop netbk blktap blkbk bridge ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables x_tables cpufreq_ondemand acpi_cpufreq freq_table dm_round_robin dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi ac parport_pc lp parport sg floppy ata_piix i5000_edac edac_mc libata pcspkr serial_core igb 8021q bnx2 ide_cd cdrom serio_raw dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod qla2xxx scsi_transport_fc shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 8810, comm: irqbalance Not tainted 2.6.18-164.el5bz518338xen #1
RIP: e030:[<ffffffff8020622a>] [<ffffffff8020622a>] hypercall_page+0x22a/0x1000
RSP: e02b:ffff8800307abe58 EFLAGS: 00000246
RAX: 0000000000030001 RBX: 0000000000000000 RCX: ffffffff8020622a
RDX: ffffffffff578000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88003cf708c0 R08: 00000000ffffffff R09: 000000000000003c
R10: ffffffff8072de60 R11: 0000000000000246 R12: 0000000000000010
R13: ffff88003e48bf40 R14: ffffffff805d34bc R15: 0000000000000000
FS: 00002b22077f16e0(0000) GS:ffffffff805ca080(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Call Trace:
[<ffffffff8031a148>] file_has_perm+0x94/0xa3
[<ffffffff803ae05d>] force_evtchn_callback+0xa/0xb
[<ffffffff8026dd3c>] show_interrupts+0x17f/0x28e
[<ffffffff80241228>] seq_read+0x1b8/0x28c
[<ffffffff8020bbaf>] vfs_read+0xcb/0x171
[<ffffffff802124ad>] sys_read+0x45/0x6e
[<ffffffff802602f9>] tracesys+0xab/0xb6

BUG: soft lockup - CPU#0 stuck for 184s! [swapper:0]
CPU 0:
Modules linked in: netloop netbk blktap blkbk bridge ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables x_tables cpufreq_ondemand acpi_cpufreq freq_table dm_round_robin dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi ac parport_pc lp parport sg floppy ata_piix i5000_edac edac_mc libata pcspkr serial_core igb 8021q bnx2 ide_cd cdrom serio_raw dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod qla2xxx scsi_transport_fc shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-164.el5bz518338xen #1
RIP: e030:[<ffffffff802063aa>] [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
RSP: e02b:ffffffff8063bf58 EFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff802063aa
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001
RBP: 0000000000000000 R08: ffffffff8063a000 R09: 0000000000000000
R10: ffff880000028b00 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 00002afaa2afedd0(0000) GS:ffffffff805ca000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Call Trace:
[<ffffffff8029992b>] rcu_pending+0x26/0x50
[<ffffffff8026f4d5>] raw_safe_halt+0x84/0xa8
[<ffffffff8026ca50>] xen_idle+0x38/0x4a
[<ffffffff8024afa1>] cpu_idle+0x97/0xba
[<ffffffff80644b05>] start_kernel+0x21f/0x224
[<ffffffff806441e5>] _sinittext+0x1e5/0x1eb

igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
mptscsih: ioc0: attempting task abort! (sc=ffff88002b25ab40)
sd 0:0:0:0: command: Write(10): 2a 00 00 87 15 1b 00 00 60 00
mptscsih: ioc0: task abort: SUCCESS (sc=ffff88002b25ab40)
mptscsih: ioc0: attempting task abort! (sc=ffff88002b25a9c0)
sd 0:0:1:0: command: Write(10): 2a 00 00 87 15 1b 00 00 60 00
mptscsih: ioc0: task abort: SUCCESS (sc=ffff88002b25a9c0)

(In reply to comment #6)
> I ran across bug #508870 which includes some disturbing comments. Is there a
> kernel I can try post-128 but pre-154?
>
> I've tried kernels 159, 162 and 165 from people.redhat.com/dzickus/el5 and
> 2.6.18-164.el5bz518338xen from clalance. They
> all lock up in the same way.
>
> The last 5.3 kernel works. The system also works, or at least "works," with the
> 5.4 xen.gz, with grub.conf:
>
> title Hack 2.6.18-128.7.1.el5xen with xen.gz-2.6.18-164
> root (hd0,0)
> kernel /xen.gz-2.6.18-164.el5 dom0_mem=1024M
> module /vmlinuz-2.6.18-128.7.1.el5xen ro root=/dev/md2
> elevator=noop
> module /initrd-2.6.18-128.7.1.el5xen.img

OK, thanks, this is good information. It clearly shows that the problem then is in the dom0 kernel (as opposed to the hypervisor). I can provide you with a pre-154 kernel, here: http://people.redhat.com/clalance/bz521143/kernel-xen-2.6.18-153.el5.x86_64.rpm. May I ask, however, what went into the -154 kernel that makes you suspect it?

Thanks,
Chris Lalancette

>> title Hack 2.6.18-128.7.1.el5xen with xen.gz-2.6.18-164
>> root (hd0,0)
>> kernel /xen.gz-2.6.18-164.el5 dom0_mem=1024M
>> module /vmlinuz-2.6.18-128.7.1.el5xen ro root=/dev/md2
>
> OK, thanks, this is good information. It clearly shows that the problem then
> is in the dom0 kernel (as opposed to the hypervisor).

Not necessarily. The reverse also "works."
| title Hack xen.gz-2.6.18-128.7.1 with 2.6.18-164.el5xen
| root (hd0,0)
| kernel /xen.gz-2.6.18-128.7.1.el5 dom0_mem=1024M
| module /vmlinuz-2.6.18-164.el5xen ro root=/dev/md2 elevator=noop
| module /initrd-2.6.18-164.el5xen.img

So it's neither the kernel nor the hypervisor. It's the intersection thereof. Both the (dom0-128 over xen-164) and (dom0-164 over xen-128) combinations succeed in configuring all bnx2 and igb network interfaces and booting a xen guest. Both combinations appear unable to save/restore or live-migrate guests, however, so they're not really usable.

One clear (?) difference is that in the "working" cases, both mixed-128 and all-128, igb is using legacy interrupts; and in the "not working" cases, igb is using MSI-X.

> May I ask, however, what went into the -154 kernel that makes you suspect it?

Bug #503818 was (in part) added for igb* drivers, and has some objections from engineers to the relatively untested change. 153 didn't help me, though. Same problem.

Bad:

Sep 10 10:08:42 xen1 kernel: igb 0000:0b:00.0: Intel(R) Gigabit Ethernet Network Connection
Sep 10 10:08:42 xen1 kernel: igb 0000:0b:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:1b:21:3c:40:40
Sep 10 10:08:42 xen1 kernel: igb 0000:0b:00.0: eth0: PBA No: d96950-006
Sep 10 10:08:42 xen1 kernel: igb 0000:0b:00.0: Using MSI-X interrupts. 4 rx queue(s), 1 tx queue(s)
Sep 10 10:08:42 xen1 kernel: ACPI: PCI Interrupt 0000:0b:00.1[B] -> GSI 16 (level, low) -> IRQ 169
Sep 10 10:08:42 xen1 kernel: igb 0000:0b:00.1: Disabling ASPM L0s upstream switch port 0000:0a:02.0
Sep 10 10:08:42 xen1 kernel: Floppy drive(s): fd0 is 1.44M
Sep 10 10:08:42 xen1 kernel: FDC 0 is a post-1991 82077 7c25b
Sep 10 10:08:42 xen1 kernel: ACPI: PCI Interrupt 0000:08:00.0[A] -> GSI 16 (level, low) -> IRQ 169
Sep 10 10:08:42 xen1 kernel: igb 0000:0b:00.1: Intel(R) Gigabit Ethernet Network Connection
Sep 10 10:08:42 xen1 kernel: igb 0000:0b:00.1: eth2: (PCIe:2.5Gb/s:Width x4) 00:1b:21:3c:40:41
Sep 10 10:08:42 xen1 kernel: igb 0000:0b:00.1: eth2: PBA No: d96950-006
Sep 10 10:08:42 xen1 kernel: igb 0000:0b:00.1: Using MSI-X interrupts. 4 rx queue(s), 1 tx queue(s)
Sep 10 10:08:42 xen1 kernel: GSI 21 sharing vector 0xD2 and IRQ 21
Sep 10 10:08:42 xen1 kernel: ACPI: PCI Interrupt 0000:0c:00.0[A] -> GSI 17 (level, low) -> IRQ 210
Sep 10 10:08:42 xen1 kernel: igb 0000:0c:00.0: Disabling ASPM L0s upstream switch port 0000:0a:04.0
Sep 10 10:08:42 xen1 kernel: igb 0000:0c:00.0: Intel(R) Gigabit Ethernet Network Connection
Sep 10 10:08:42 xen1 kernel: igb 0000:0c:00.0: eth3: (PCIe:2.5Gb/s:Width x4) 00:1b:21:3c:40:44
Sep 10 10:08:42 xen1 kernel: igb 0000:0c:00.0: eth3: PBA No: d96950-006
Sep 10 10:08:42 xen1 kernel: igb 0000:0c:00.0: Using MSI-X interrupts. 4 rx queue(s), 1 tx queue(s)
Sep 10 10:08:42 xen1 kernel: GSI 22 sharing vector 0x4B and IRQ 22
Sep 10 10:08:42 xen1 kernel: ACPI: PCI Interrupt 0000:0c:00.1[B] -> GSI 18 (level, low) -> IRQ 75
Sep 10 10:08:42 xen1 kernel: igb 0000:0c:00.1: Disabling ASPM L0s upstream switch port 0000:0a:04.0 7c25d
Sep 10 10:08:42 xen1 kernel: igb 0000:0c:00.1: Intel(R) Gigabit Ethernet Network Connection
Sep 10 10:08:42 xen1 kernel: igb 0000:0c:00.1: eth5: (PCIe:2.5Gb/s:Width x4) 00:1b:21:3c:40:45
Sep 10 10:08:42 xen1 kernel: igb 0000:0c:00.1: eth5: PBA No: d96950-006
Sep 10 10:08:42 xen1 kernel: igb 0000:0c:00.1: Using MSI-X interrupts. 4 rx queue(s), 1 tx queue(s)

"Not as Bad," kernel-xen-164 on top of xen.gz-128:

map irq failed
map irq failed
Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.9.3 (March 17, 2009)
ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 16 (level, low) -> IRQ 16
EDAC MC: Ver: 2.0.1 Sep 4 2009
EDAC MC0: Giving out device to i5000_edac.c I5000: DEV 0000:00:10.0
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
igb 0000:0b:00.0: Intel(R) Gigabit Ethernet Network Connection
igb 0000:0b:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:1b:21:3c:40:40
igb 0000:0b:00.0: eth0: PBA No: d96950-006
igb 0000:0b:00.0: Using legacy interrupts. 1 rx queue(s), 1 tx queue(s)
ACPI: PCI Interrupt 0000:0b:00.1[B] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:0b:00.1 to 64
map irq failed
map irq failed
intel_rng: FWH not detected
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
igb 0000:0b:00.1: Intel(R) Gigabit Ethernet Network Connection
igb 0000:0b:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 00:1b:21:3c:40:41
igb 0000:0b:00.1: eth1: PBA No: d96950-006
igb 0000:0b:00.1: Using legacy interrupts. 1 rx queue(s), 1 tx queue(s)
GSI 21 sharing vector 0xC8 and IRQ 21
ACPI: PCI Interrupt 0000:0c:00.0[A] -> GSI 17 (level, low) -> IRQ 21
PCI: Setting latency timer of device 0000:0c:00.0 to 64
map irq failed
map irq failed
libata version 3.00 loaded.
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
igb 0000:0c:00.0: Intel(R) Gigabit Ethernet Network Connection
igb 0000:0c:00.0: eth2: (PCIe:2.5Gb/s:Width x4) 00:1b:21:3c:40:44
igb 0000:0c:00.0: eth2: PBA No: d96950-006
igb 0000:0c:00.0: Using legacy interrupts. 1 rx queue(s), 1 tx queue(s)
GSI 22 sharing vector 0xD0 and IRQ 22
ACPI: PCI Interrupt 0000:0c:00.1[B] -> GSI 18 (level, low) -> IRQ 22
PCI: Setting latency timer of device 0000:0c:00.1 to 64
map irq failed
map irq failed
eth3: Broadcom NetXtreme II BCM5708 1000Base-T (B1) PCI-X 64-bit 133MHz found at mem da000000, IRQ 16, node addr 00188b47c25b
ACPI: PCI Interrupt 0000:08:00.0[A] -> GSI 16 (level, low) -> IRQ 16
igb 0000:0c:00.1: Intel(R) Gigabit Ethernet Network Connection
igb 0000:0c:00.1: eth4: (PCIe:2.5Gb/s:Width x4) 00:1b:21:3c:40:45
igb 0000:0c:00.1: eth4: PBA No: d96950-006
igb 0000:0c:00.1: Using legacy interrupts. 1 rx queue(s), 1 tx queue(s)
eth5: Broadcom NetXtreme II BCM5708 1000Base-T (B1) PCI-X 64-bit 133MHz found at mem d6000000, IRQ 16, node addr 00188b47c25d

(In reply to comment #8)
> >> title Hack 2.6.18-128.7.1.el5xen with xen.gz-2.6.18-164
> >> root (hd0,0)
> >> kernel /xen.gz-2.6.18-164.el5 dom0_mem=1024M
> >> module /vmlinuz-2.6.18-128.7.1.el5xen ro root=/dev/md2
> >
> > OK, thanks, this is good information. It clearly shows that the problem then
> > is in the dom0 kernel (as opposed to the hypervisor).
>
> Not necessarily. The reverse also "works."
>
> | title Hack xen.gz-2.6.18-128.7.1 with 2.6.18-164.el5xen
> | root (hd0,0)
> | kernel /xen.gz-2.6.18-128.7.1.el5 dom0_mem=1024M
> | module /vmlinuz-2.6.18-164.el5xen ro root=/dev/md2 elevator=noop
> | module /initrd-2.6.18-164.el5xen.img
>
> So it's neither the kernel nor the hypervisor. It's the intersection thereof.

Ah, OK. Then it's probably the MSI support; with old HV + new dom0, the HV doesn't support MSI-X, so the hypercall fails and we fall back to legacy. With new HV + old dom0, the dom0 never tries to enable MSI-X, so we are using legacy. If you boot up the bare-metal kernel, does the igb then use MSI-X, and does it work?

Chris Lalancette

> If you boot up the bare-metal kernel, does the igb then use MSI-X, and
> does it work?
Yes and yes.
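The legacy-vs-MSI-X distinction that keeps coming up in this thread can be checked from /proc/interrupts without reading boot logs. A small sketch (interface names and the heuristic are illustrative, not part of the bug report): an MSI-X multiqueue device registers several vectors (e.g. "eth0-TxRx-0" style lines), a device on legacy INTx or plain MSI shows a single line, and an inactive igb port shows none, since the driver only registers irq handlers for active devices.

```shell
#!/bin/sh
# Sketch: infer a NIC's interrupt mode from /proc/interrupts.
# The second argument lets the helper be pointed at a saved copy
# of /proc/interrupts instead of the live file.
irq_mode() {
    n=$(grep -c "$1" "${2:-/proc/interrupts}")
    if   [ "$n" -gt 1 ]; then echo "$1: likely MSI-X ($n vectors)"
    elif [ "$n" -eq 1 ]; then echo "$1: single vector (legacy INTx or MSI)"
    else                      echo "$1: no registered irq handler"
    fi
}
irq_mode eth0
```

Run against the snapshots attached later in this report, this reproduces the observed pattern: multiple Phys-irq lines per port in the broken MSI-X case, one per port in the working legacy case.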
I tried pci=nomsi and it no longer has any effect.
[root@xen1 ~]# grep -2 -i msi /var/log/dmesg
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
PCI: PXH quirk detected, disabling MSI for SHPC device
PCI: Transparent bridge - 0000:00:1e.0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
--
ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:02.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:02.0:pcie00]
Allocate Port Service[0000:00:02.0:pcie01]
ACPI: PCI Interrupt 0000:00:03.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:03.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:03.0:pcie00]
Allocate Port Service[0000:00:03.0:pcie01]
ACPI: PCI Interrupt 0000:00:04.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:04.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:04.0:pcie00]
Allocate Port Service[0000:00:04.0:pcie01]
PCI: Setting latency timer of device 0000:00:05.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:05.0:pcie00]
Allocate Port Service[0000:00:05.0:pcie01]
ACPI: PCI Interrupt 0000:00:06.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:06.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:06.0:pcie00]
Allocate Port Service[0000:00:06.0:pcie01]
PCI: Setting latency timer of device 0000:00:07.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:07.0:pcie00]
Allocate Port Service[0000:00:07.0:pcie01]
ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:1c.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:1c.0:pcie00]
Allocate Port Service[0000:00:1c.0:pcie03]
--
ACPI: PCI Interrupt 0000:06:00.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:06:00.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:06:00.0:pcie20]
Allocate Port Service[0000:06:00.0:pcie21]
ACPI: PCI Interrupt 0000:06:01.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:06:01.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:06:01.0:pcie20]
Allocate Port Service[0000:06:01.0:pcie21]
--
Allocate Port Service[0000:09:00.0:pcie13]
PCI: Setting latency timer of device 0000:0a:02.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:0a:02.0:pcie20]
Allocate Port Service[0000:0a:02.0:pcie21]
Allocate Port Service[0000:0a:02.0:pcie23]
PCI: Setting latency timer of device 0000:0a:04.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:0a:04.0:pcie20]
Allocate Port Service[0000:0a:04.0:pcie21]
--
igb 0000:0b:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:1b:21:3c:40:40
igb 0000:0b:00.0: eth0: PBA No: d96950-006
igb 0000:0b:00.0: Using MSI-X interrupts. 4 rx queue(s), 1 tx queue(s)
ACPI: PCI Interrupt 0000:0b:00.1[B] -> GSI 16 (level, low) -> IRQ 169
igb 0000:0b:00.1: Disabling ASPM L0s upstream switch port 0000:0a:02.0
--
igb 0000:0b:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 00:1b:21:3c:40:41
igb 0000:0b:00.1: eth1: PBA No: d96950-006
igb 0000:0b:00.1: Using MSI-X interrupts. 4 rx queue(s), 1 tx queue(s)
GSI 21 sharing vector 0xD2 and IRQ 21
ACPI: PCI Interrupt 0000:0c:00.0[A] -> GSI 17 (level, low) -> IRQ 210
--
igb 0000:0c:00.0: eth3: (PCIe:2.5Gb/s:Width x4) 00:1b:21:3c:40:44
igb 0000:0c:00.0: eth3: PBA No: d96950-006
igb 0000:0c:00.0: Using MSI-X interrupts. 4 rx queue(s), 1 tx queue(s)
GSI 22 sharing vector 0x4B and IRQ 22
ACPI: PCI Interrupt 0000:0c:00.1[B] -> GSI 18 (level, low) -> IRQ 75
--
igb 0000:0c:00.1: eth5: (PCIe:2.5Gb/s:Width x4) 00:1b:21:3c:40:45
igb 0000:0c:00.1: eth5: PBA No: d96950-006
igb 0000:0c:00.1: Using MSI-X interrupts. 4 rx queue(s), 1 tx queue(s)
The good news is that everything, including dynamically unloading/reloading the igb driver, works with the bare-metal kernel. The bad news is that it probably shouldn't be allowing me to remove the driver for a running interface, should it?
[root@xen1 ~]# date; rmmod igb; sleep 5; tail -36 /var/log/messages
Thu Sep 10 11:33:37 CDT 2009
Sep 10 11:33:38 xen1 kernel: ACPI: PCI interrupt for device 0000:0b:00.1 disabled
Sep 10 11:33:39 xen1 kernel: ACPI: PCI interrupt for device 0000:0b:00.0 disabled
Sep 10 11:33:39 xen1 kernel: Intel(R) Gigabit Ethernet Network Driver - version 1.3.16-k2
Sep 10 11:33:39 xen1 kernel: Copyright (c) 2007-2009 Intel Corporation.
Sep 10 11:33:39 xen1 kernel: PCI: Enabling device 0000:0b:00.0 (0140 -> 0142)
Sep 10 11:33:39 xen1 kernel: ACPI: PCI Interrupt 0000:0b:00.0[A] -> GSI 19 (level, low) -> IRQ 106
Sep 10 11:33:39 xen1 kernel: igb 0000:0b:00.0: Disabling ASPM L0s upstream switch port 0000:0a:02.0
Sep 10 11:33:39 xen1 kernel: igb 0000:0b:00.0: Intel(R) Gigabit Ethernet Network Connection
Sep 10 11:33:39 xen1 kernel: igb 0000:0b:00.0: eth2: (PCIe:2.5Gb/s:Width x4) 00:1b:21:3c:40:40
Sep 10 11:33:39 xen1 kernel: igb 0000:0b:00.0: eth2: PBA No: d96950-006
Sep 10 11:33:39 xen1 kernel: igb 0000:0b:00.0: Using MSI-X interrupts. 4 rx queue(s), 1 tx queue(s)
Sep 10 11:33:39 xen1 kernel: PCI: Enabling device 0000:0b:00.1 (0140 -> 0142)
Sep 10 11:33:39 xen1 kernel: ACPI: PCI Interrupt 0000:0b:00.1[B] -> GSI 16 (level, low) -> IRQ 169
Sep 10 11:33:39 xen1 kernel: igb 0000:0b:00.1: Disabling ASPM L0s upstream switch port 0000:0a:02.0
Sep 10 11:33:39 xen1 kernel: igb 0000:0b:00.1: Intel(R) Gigabit Ethernet Network Connection
Sep 10 11:33:39 xen1 kernel: igb 0000:0b:00.1: eth3: (PCIe:2.5Gb/s:Width x4) 00:1b:21:3c:40:41
Sep 10 11:33:39 xen1 kernel: igb 0000:0b:00.1: eth3: PBA No: d96950-006
Sep 10 11:33:39 xen1 kernel: igb 0000:0b:00.1: Using MSI-X interrupts. 4 rx queue(s), 1 tx queue(s)
Sep 10 11:33:39 xen1 kernel: PCI: Enabling device 0000:0c:00.0 (0140 -> 0142)
Sep 10 11:33:39 xen1 kernel: ACPI: PCI Interrupt 0000:0c:00.0[A] -> GSI 17 (level, low) -> IRQ 210
Sep 10 11:33:39 xen1 kernel: igb 0000:0c:00.0: Disabling ASPM L0s upstream switch port 0000:0a:04.0
Sep 10 11:33:39 xen1 kernel: igb 0000:0c:00.0: Intel(R) Gigabit Ethernet Network Connection
Sep 10 11:33:39 xen1 kernel: igb 0000:0c:00.0: eth4: (PCIe:2.5Gb/s:Width x4) 00:1b:21:3c:40:44
Sep 10 11:33:39 xen1 kernel: igb 0000:0c:00.0: eth4: PBA No: d96950-006
Sep 10 11:33:39 xen1 kernel: igb 0000:0c:00.0: Using MSI-X interrupts. 4 rx queue(s), 1 tx queue(s)
Sep 10 11:33:39 xen1 kernel: PCI: Enabling device 0000:0c:00.1 (0140 -> 0142)
Sep 10 11:33:39 xen1 kernel: ACPI: PCI Interrupt 0000:0c:00.1[B] -> GSI 18 (level, low) -> IRQ 75
Sep 10 11:33:39 xen1 kernel: igb 0000:0c:00.1: Disabling ASPM L0s upstream switch port 0000:0a:04.0
Sep 10 11:33:39 xen1 kernel: igb 0000:0c:00.1: Intel(R) Gigabit Ethernet Network Connection
Sep 10 11:33:39 xen1 kernel: igb 0000:0c:00.1: eth5: (PCIe:2.5Gb/s:Width x4) 00:1b:21:3c:40:45
Sep 10 11:33:39 xen1 kernel: igb 0000:0c:00.1: eth5: PBA No: d96950-006
Sep 10 11:33:39 xen1 kernel: igb 0000:0c:00.1: Using MSI-X interrupts. 4 rx queue(s), 1 tx queue(s)
Sep 10 11:33:40 xen1 kernel: device eth3 entered promiscuous mode
Sep 10 11:33:41 xen1 kernel: igb: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Sep 10 11:33:41 xen1 kernel: igb: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Sep 10 11:33:41 xen1 kernel: xenbr100: port 1(eth3) entering learning state
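The surprise above (rmmod succeeding under live interfaces) can at least be made safer from the admin side. A sketch that enumerates the interfaces bound to a driver via sysfs so they can be downed first; the helper's optional root argument is only there so it can be exercised against a fake tree, and is not anything the report prescribes:

```shell
#!/bin/sh
# Sketch: list interfaces bound to a given driver by following the
# standard /sys/class/net/<if>/device/driver symlink (2.6+ sysfs layout).
ifaces_for_driver() {
    drv_name="$1"; root="${2:-/sys/class/net}"
    for dev in "$root"/*; do
        [ -e "$dev" ] || continue
        target=$(readlink "$dev/device/driver" 2>/dev/null) || continue
        case "$target" in
            */"$drv_name") basename "$dev" ;;
        esac
    done
}
# Intended use (requires root):
#   for i in $(ifaces_for_driver igb); do ifconfig "$i" down; done
#   rmmod igb
```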
Hmm, the kernel parameter pci=nomsi appears to work consistently for kernel-xen-164. The (new, 5.4) igb driver loads and functions in legacy interrupt mode. I coulda swore I'd tried that before. Maybe I only tried device_nomsi=10d68086, which does *not* work.

Per a comment at http://downloadmirror.intel.com/13663/eng/README.txt, I also disabled irqbalance, but that (in itself) was not sufficient to resolve the issue. Does RHEL include backported patches to address this?

MSI-X Issues with Kernels between 2.6.19 - 2.6.21 (inclusive)
-------------------------------------------------------------
Kernel panics and instability may be observed on any MSI-X hardware if you use irqbalance with kernels between 2.6.19 and 2.6.21. If such problems are encountered, you may disable the irqbalance daemon or upgrade to a newer kernel.

So my systems are "working," but I cannot live-migrate Xen guests from RHEL 5.3 machines to RHEL 5.4 machines. Is this supposed to work? If not, that sucks, but it's unrelated to this bugzilla.

So, why is MSI a problem, and only in xen mode? I'm not running esoteric hardware here. Why doesn't the more specific device_nomsi=10d68086 work?

Same problem, same pci=nomsi workaround on a Dell R710 with 4 on-board bnx2 and an add-in 82575GB. Reminiscent of https://bugzilla.redhat.com/show_bug.cgi?id=506841

Event posted on 10-06-2009 05:40pm EDT by bbraswel

Per my conversation with Chris and similarities between the two problems, I am attaching IT340924 to this BZ. The customer's system sometimes hangs when booting the xen kernel, either after initializing the network adapter or after starting xen. Using the option "pci=nomsi" eliminates the problem and is currently being recommended to them as a workaround.

Internal Status set to 'Waiting on Engineering'
This event sent from IssueTracker by bbraswel, issue 340924

Can you please attach the following data for the broken system (aka xen1):
(1) native kernel
- linux kernel boot log (dmesg)
- content of /proc/interrupts
(2) kernel-xen with pci=nomsi:
- xen kernel boot log (xm dmesg)
- linux kernel boot log (dmesg)
- content of /proc/interrupts
(3) kernel-xen with msi enabled:
- xen kernel boot log (xm dmesg)
- linux kernel boot log (dmesg)
- content of /proc/interrupts
- output of 'lspci -v'
- output of 'lspci -t'
From the working system (aka xen0):
kernel-xen with msi enabled (which it probably runs anyway):
- content of /proc/interrupts
- output of 'lspci -v'
- output of 'lspci -t'
(all with rhel 5.4 kernels).
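The data-collection request above can be scripted so each boot configuration ends up in its own directory. A sketch; the directory and file names are illustrative (nothing here is prescribed by the report), and the lspci/xm steps are skipped when those tools aren't present:

```shell
#!/bin/sh
# Sketch: gather the requested diagnostics for one test case
# (e.g. "native", "xen-nomsi", "xen-msix") into bz521143-<case>/.
collect() {
    outdir="bz521143-$1"
    mkdir -p "$outdir"
    cat /proc/interrupts > "$outdir/interrupts.txt"
    dmesg > "$outdir/dmesg.txt" 2>/dev/null || true
    if command -v lspci >/dev/null 2>&1; then
        lspci -v > "$outdir/lspci-v.txt"
        lspci -t > "$outdir/lspci-t.txt"
    fi
    if command -v xm >/dev/null 2>&1; then
        xm dmesg > "$outdir/xm-dmesg.txt" 2>/dev/null || true
    fi
}
collect native
```

Rebooting between cases is still manual, of course; the script only standardizes what gets captured once the system is up in each configuration.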
Does the broken system (xen1) have VT-d support? If so: any change in behavior if you enable VT-d (using iommu=1 on the xen kernel command line)? You might have to enable VT-d in the BIOS too.
Created attachment 365397 [details]
xm dmesg for 2.6.18-164.2.1.el5xen pci=nomsi
Created attachment 365398 [details]
xm dmesg for 2.6.18-164.2.1.el5xen pci=nomsi
Replacing xen4's xm dmesg with xen0, which will be easier to reboot into non-xen kernel later.
Created attachment 365399 [details]
dmesg for 2.6.18-164.2.1.el5xen pci=nomsi
Created attachment 365401 [details]
/proc/interrupts for 2.6.18-164.2.1.el5xen pci=nomsi
After a little more scientific troubleshooting, all of my systems xen0-xen4 behave the same: some faster than others, but all die within a few minutes of enabling igb devices with MSI-X enabled.
> Has the broken system (xen1) VT-d support?
xen4 (only) has Nehalem L5520, so could possibly have VT-d. The rest are 5060 or 5160. While this is an interesting engineering question, I'm going to answer no for now, because I need both classes of system to work, and we already have plenty of variables to look at.
Rebooting xen0 into bare kernel (non-xen) now....
Created attachment 365406 [details]
dmesg for NON-XEN 2.6.18-164.2.1.el5
Created attachment 365407 [details]
/proc/interrupts for NON-XEN 2.6.18-164.2.1.el5
Created attachment 365415 [details]
xm dmesg for 2.6.18-164.2.1.el5xen WITH MSI-X
Created attachment 365416 [details]
dmesg for 2.6.18-164.2.1.el5xen WITH MSI-X
Created attachment 365417 [details]
/proc/interrupts for 2.6.18-164.2.1.el5xen WITH MSI-X
dmesg may vary slightly in the xen+msi-x case because I disabled the following to avoid lockup:
ifcfg-eth{1-5}
iscsi (runs over eth2)
iscsid
xend (started manually by removing eth2's ip from /etc/xen/xend-config.sxp)
xendomains
And that's all.
Oh, I'll attach lspci next.
Created attachment 365418 [details]
lspci -t and lspci -v for the xen+msi-x case
37 MSI / MSI-X vectors are allocated on the system: 11 MSI vectors for the PCI Express ports and PCI bridges, 2 MSI vectors for the Broadcom NICs, and 4*6 MSI-X vectors for the quad-port igb. Xen maps MSI/MSI-X vectors to dom0 event channels, starting from 255 and going downwards. So the PCIe ports + bridges most likely got event channels 255 -> 245 (because they are initialized early). Next are the MSI-X vectors for the quad-port igb (244 -> 221). Finally the MSI vectors for the Broadcom cards (220 + 219). The Broadcom cards have irq handlers registered and show up in /proc/interrupts:

219: [ cpu stats snipped ] Phys-irq peth1
220: [ cpu stats snipped ] Phys-irq peth0

So MSI is working fine for the Broadcom cards. igb isn't there because the driver doesn't register irq handlers for inactive devices.

hypercall_page+0x3aa/0x1000 (from the soft lockup in the initial comment) is the __HYPERVISOR_sched_op hypercall, most likely SCHEDOP_block (i.e. "I want to sleep now, but please wake me up if there is an event for me"). Having *that* in a soft lockup stacktrace looks very much like irqs are not being delivered.

Can you get some more information with one of the igb ports active please? I'd like to have:

(1) /proc/interrupts and 'lspci -vvv' *before* activating the igb port,
(2) /proc/interrupts and 'lspci -vvv' *after* activating the igb port,

then wait for the first soft lockup coming in, then:

(3) 3-4 /proc/interrupts snapshots, taken every few seconds.

Oh, and 'lspci -vvv' on a native kernel (with igb active) would be useful too. Thanks.

I have not been able to get (2) because the lockup is immediate and severe. Will try to get luckier at the physical console tomorrow.

I have encountered the same issue and found this bug report. I then downloaded the latest Intel igb driver 2.0.6, and compiled and installed it on the 2.6.18-164.2.1.el5xen kernel. Now xend seems to work just fine!

(In reply to comment #39)
> I have encountered the same issue and found this bug report. I then downloaded
> the latest Intel igb driver 2.0.6, compiled and installed on the
> 2.6.18-164.2.1.el5xen kernel. Now xend seems to work just fine!

I would guess the 2.0.6 igb driver from Intel does not use MSI-X. How many entries for your network device are listed in /proc/interrupts?

The 2.0.6 driver does support and use MSI-X interrupts. They may not be in use depending on how the driver loaded, but they are supported in that driver.

(In reply to comment #41)
> The 2.0.6 driver does support and use MSI-X interrupts. They may not be in use
> depending on how the driver loaded but they are supported in that driver.

I'm sure it supports MSI-X, but my guess is that support for MSI-X/multiqueue rx is not used when compiling and running on RHEL5. I was asking zwlu to find out if that was the case on his system, as I thought it might provide another nice data point for those working on this issue.

Here is the output from dmesg related to the igb driver; it appears that the system is using MSI-X interrupts.

Intel(R) Gigabit Ethernet Network Driver - version 2.0.6
Copyright (c) 2007-2009 Intel Corporation.
ACPI: PCI Interrupt 0000:08:00.0[A] -> GSI 56 (level, low) -> IRQ 20
PCI: Setting latency timer of device 0000:08:00.0 to 64
igb: eth0: igb_probe: Intel(R) Gigabit Ethernet Network Connection
igb: eth0: igb_probe: (PCIe:2.5Gb/s:Width x4) 00:30:48:c7:2d:c0
igb: eth0: igb_probe: Using MSI-X interrupts. 1 rx queue(s), 1 tx queue(s)
GSI 28 sharing vector 0x81 and IRQ 28
ACPI: PCI Interrupt 0000:08:00.1[B] -> GSI 70 (level, low) -> IRQ 28
PCI: Setting latency timer of device 0000:08:00.1 to 64
igb: eth1: igb_probe: Intel(R) Gigabit Ethernet Network Connection
igb: eth1: igb_probe: (PCIe:2.5Gb/s:Width x4) 00:30:48:c7:2d:c1
igb: eth1: igb_probe: Using MSI-X interrupts. 1 rx queue(s), 1 tx queue(s)
igb: eth0: igb_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
igb: peth0: igb_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

Does /proc/interrupts look the same as the info that was attached earlier to this BZ? I mean using the configuration as above in #43? I take it the driver works fine when not used in Xen? This could be something dealing with TX MQ not being supported in RHEL5. The backport to RHEL5 may have broken things, since the 2.0.6 driver seems to be working.

This is the result of /proc/interrupts for #43:

247: 17738193 0 0 0 0 0 0 385 Phys-irq peth0-TxRx-0
248: 1 0 0 0 0 0 0 0 Phys-irq peth0

RHEL 5.3 Dom0 didn't use MSI, i.e. CONFIG_PCI_MSI was turned off (pci=nomsi) at compile time. Below is a snippet from RHEL-5.3's drivers/pci/Kconfig. Here it can be seen that for XEN configs PCI_MSI was forced off.
config PCI_MSI
bool "Message Signaled Interrupts (MSI and MSI-X)"
depends on PCI
depends on (X86_LOCAL_APIC && X86_IO_APIC) || IA64 || PPC64
depends on !XEN
All the MSI support was added as part of the effort for 5.4. An entirely new source file, drivers/pci/msi-xen.c (not present in 5.3), was created and linked for xen config kernels.
This means two things.
1) we should concentrate on understanding how the latest Intel driver seems to fix things. It's possible that the problem is only in the driver, and it was uncovered when Dom0 started using MSI.
2) this bug isn't a regression from 5.3, so I'm removing the regression keyword.
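The compile-time difference described above can be verified against any installed kernel's config file. A minimal sketch — the sample config below is made up for illustration; on a live system you would grep /boot/config-$(uname -r) instead:

```shell
# Check whether a kernel config enables MSI support.
# Hypothetical sample config standing in for /boot/config-$(uname -r):
cat > sample-config <<'EOF'
CONFIG_PCI=y
# CONFIG_PCI_MSI is not set
EOF

# RHEL 5.3 xen kernels look like the sample: PCI_MSI forced off for XEN.
if grep -q '^CONFIG_PCI_MSI=y' sample-config; then
    echo "MSI enabled"
else
    echo "MSI disabled"
fi
# prints: MSI disabled
```

A 5.4 xen dom0 kernel would show CONFIG_PCI_MSI=y instead, which is the behavior change this comment describes.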
This bug needs some more testing with the old and new igb drivers to try to confirm whether the issue is in the driver. I tried to reserve a machine that I found with this NIC, but the system isn't currently letting me. Below is a link to a kernel with several updates to the driver done for bug 513710. Can someone who is able to reproduce this problem please test with this kernel?

http://people.redhat.com/drjones/virttest/x86_64/kernel-xen-2.6.18-175.el5.bug513710_01.x86_64.rpm

Thanks,
Andrew

Andrew,

I have installed this kernel today; it appears to work on my system so far.

Side notes: I installed the 2.6.18-164.9.1.el5xen kernel today on this same system, and I used my old trick of compiling/installing the latest Intel driver, igb-2.1.1. The resulting kernel worked for only a few minutes; in syslog, I got

Dec 22 10:28:47 brahma kernel: printk: 40 messages suppressed.
Dec 22 10:28:47 brahma kernel: xen_net: Memory squeeze in netback driver.
Dec 22 10:28:52 brahma kernel: printk: 19 messages suppressed.
Dec 22 10:28:52 brahma kernel: xen_net: Memory squeeze in netback driver.

By the way, the old trick of compiling the Intel driver did work for 2.6.18-164.6.1.el5xen.

(In reply to comment #48)
> Andrew,
>
> I have installed this kernel today, it appears to work on my system so far.

That is great news. We work extremely hard to use the latest, stable driver available when we ship a kernel, so we are glad to hear it is working well for users.

> Side notes: I installed the 2.6.18-164.9.1.el5xen kernel today on this
> same system, and I used my old trick of compiling/installing the latest
> Intel drive igb-2.1.1, the resulting kernel worked for only a few
> minutes, in syslog, I got
>
> Dec 22 10:28:47 brahma kernel: printk: 40 messages suppressed.
> Dec 22 10:28:47 brahma kernel: xen_net: Memory squeeze in netback driver.
> Dec 22 10:28:52 brahma kernel: printk: 19 messages suppressed.
> Dec 22 10:28:52 brahma kernel: xen_net: Memory squeeze in netback driver.
>
> By the way, the old trick of compiling Intel kernel did work for
> 2.6.18-164.6.1.el5xen.

Because of the effort we put into our driver, it is personally disappointing to hear that users don't even use it and just use Intel's SourceForge driver instead.

Because we have no control over any of Intel's SourceForge drivers, there isn't much we can do when a user encounters a problem with it that doesn't show up when using the RHEL driver.

If you ever have problems with a driver included in the latest RHEL that causes you to use the SourceForge driver instead, please open a bug and let us know (you can even assign it to me or add me to the cc-list). Without reports that there are problems, there is no guarantee we will fix the problem with the next update.

Thanks!

(In reply to comment #49)
> (In reply to comment #48)
> > Andrew,
> >
> > I have installed this kernel today, it appears to work on my system so far.
>
> Because of the effort we put into our driver, it is personally disappointing to
> hear that users don't even use it and just use Intel's sourceforge driver
> instead.
>
> Because we have no control over any of Intel's sourceforge drivers, there isn't
> much we can do when a user encounters a problem with it that doesn't show up
> when using the RHEL driver.

The fact is that I didn't use the RHEL driver because the RHEL driver locks up the system.

> If you ever have problems with a driver included in the latest RHEL that causes
> you to use the sourceforge driver instead, please open a bug and let us know
> (you can even assign it to me or add me to the cc-list). Without reports that
> there are problems, there is no guarantee we will fix the problem with the next
> update.
>
> Thanks!

Since this bug was reported by a user and we do not have hardware that exhibited the problem, closing as duplicate.

*** This bug has been marked as a duplicate of bug 513710 ***

I'm not sure, because the change that fixed it is not related to xen in the end.
I'm resetting the component. The main hurdle seems to be that it's not clear how the rebase fixed the problem.

I think I have met with a similar problem. My NIC is an 82576 on an ATCA server. The Red Hat distribution is RHEL 5.5 x86_64, booted with the xen kernel. When I unload the igb driver (modprobe -r igb) and then load it again (modprobe igb), the 82576 cannot work unless I reboot the machine. I have an 82567LM-2 NIC in the same system, and it works fine through the same unload/reload cycle, so I think the problem is caused by the igb driver, though I am not completely sure. In addition, if I boot with the normal kernel (without xen), all the NICs work fine in this case.

Please open another bug or issue tracker ticket.
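A quick way to confirm whether interrupts stop being delivered after such a reload is to diff two /proc/interrupts snapshots taken a few seconds apart, as requested earlier in this bug. A minimal sketch — the snapshot contents below are made-up sample data with a single summed counter column (real /proc/interrupts has one counter column per CPU plus a header line):

```shell
#!/bin/sh
# Print the per-IRQ counter delta between two /proc/interrupts-style
# snapshots. A vector whose delta stays 0 is receiving no interrupts,
# which matches the lockup symptom described in this bug.
compare_interrupts() {
    awk 'NR==FNR { split($0, f); prev[f[1]] = f[2]; next }
         { split($0, f); if (f[1] in prev) printf "%s delta=%d\n", f[1], f[2] - prev[f[1]] }' "$1" "$2"
}

# Made-up sample snapshots; on a live box you would instead do:
#   cat /proc/interrupts > snap1; sleep 5; cat /proc/interrupts > snap2
cat > snap1 <<'EOF'
219: 1000 Phys-irq peth1
220: 2000 Phys-irq peth0
EOF
cat > snap2 <<'EOF'
219: 1500 Phys-irq peth1
220: 2000 Phys-irq peth0
EOF

compare_interrupts snap1 snap2
# prints:
# 219: delta=500
# 220: delta=0
```

In the sample, IRQ 220 stays flat between snapshots, which is the pattern to look for on the failing igb vectors.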