Description of problem:
The network does not work with the RHEL 5.5 Snapshot 1 x64 server Xen kernel and the r8169 driver. The same driver works OK with the non-Xen kernel.

Version-Release number of selected component (if applicable):
RHEL 5.5 Snapshot 1 x64

How reproducible:
Every time

Steps to Reproduce:
1. Install RHEL 5.5 Snapshot 1 x64 with Xen on a machine with a Realtek NIC.
2. Boot the Xen kernel.
3. Try to access the network.

Actual results:
The network is not reachable. A DHCP address cannot be obtained; a static IP can be set, but other hosts cannot be pinged.

Expected results:
The network should be reachable: a DHCP address can be obtained, or a static IP can be set, and other hosts can be pinged.

Additional info:
Systems with NVIDIA or Intel cards work as expected with the Xen kernel. Systems with Realtek cards work with the non-Xen kernel.
This also fails with Snapshot 2. I will move this to Issue Tracker once I have a valid login.
I'm seeing this problem as well. The Realtek NIC with the r8169 driver no longer works in 5.5 with the Xen kernel. It worked fine in RHEL 5.4.
These are the changes since 5.4:

b2ce4b2 r8169: add missing hunk from frame length filtering fix
cf315a0 r8169: improved frame length filtering
4c7ce5e [net] r8169: update to latest upstream for rhel5.5
e2410cf [net] resolve issues with vlan creation and filtering
9083e21 [net] r8169: avoid losing MSI interrupts
9b097f2 [net] r8169: balance pci_map/unmap pair, use hw padding

The "update to latest upstream" is nearly a rewrite. It looks like this didn't get tested on a Xen kernel before it was posted? Cc'ing Ivan.
This is strange: I cannot reproduce this issue on my workstation with the Realtek-based NIC.

Info:
- eth1 is a Realtek NIC with the RTL8168b/8111b chipset
- it is configured by DHCP (without any problem)
- I tried transferring several megabytes in and out without any problem

[root@cera ~]# uname -r
2.6.18-191.el5xen

[root@cera ~]# ethtool -i eth1
driver: r8169
version: 2.3LK-NAPI
firmware-version:
bus-info: 0000:0c:02.0

[root@cera ~]# ethtool eth1
Settings for eth1:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbg
        Wake-on: g
        Current message level: 0x00000033 (51)
        Link detected: yes

[root@cera ~]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:E0:4C:69:21:B8
          inet addr:10.34.33.109  Bcast:10.34.35.255  Mask:255.255.252.0
          inet6 addr: fe80::2e0:4cff:fe69:21b8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:41553 errors:0 dropped:0 overruns:0 frame:0
          TX packets:32557 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:42367187 (40.4 MiB)  TX bytes:41780224 (39.8 MiB)
          Interrupt:19 Base address:0x8f00
Ivan, is this the only NIC in your system? Gary says in comment 9 that having another active NIC in the system makes the problem impossible to reproduce.
That's right: if I put another card in this system, the Realtek card appears to work, but it doesn't really. It's very strange. I'm going to try reinstalling this box with 5.4, then updating to the test kernel. I want to see whether 5.4's Xen kernel works with the serial port.
(In reply to comment #11)
> This is strange, I cannot reproduce this issue on my workstation with the
> Realtek based NIC.

Below is what I am seeing on the Xen and non-Xen kernels.

#eth0 info with xen kernel
[root@localhost ~]# uname -r
2.6.18-190.el5xen
[root@localhost ~]# ethtool eth0
Settings for eth0:
        Link detected: yes
[root@localhost ~]# ethtool -i eth0
Cannot get driver information: Operation not supported
[root@localhost ~]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr E0:CB:4E:7D:88:BD
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:48 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:360 (360.0 b)  TX bytes:10847 (10.5 KiB)

[root@localhost ~]#

#eth0 info with NON xen kernel
[root@localhost ~]# uname -r
2.6.18-190.el5
[root@localhost ~]# ethtool eth0
Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbg
        Wake-on: d
        Current message level: 0x00000033 (51)
        Link detected: yes
[root@localhost ~]# ethtool -i eth0
driver: r8169
version: 2.3LK-NAPI
firmware-version:
bus-info: 0000:02:00.0
[root@localhost ~]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr E0:CB:4E:7D:88:BD
          inet addr:192.168.9.21  Bcast:192.168.9.255  Mask:255.255.255.0
          inet6 addr: fe80::e2cb:4eff:fe7d:88bd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:146 errors:0 dropped:0 overruns:0 frame:0
          TX packets:150 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:16312 (15.9 KiB)  TX bytes:15876 (15.5 KiB)
          Interrupt:58 Base address:0x2000

[root@localhost ~]#
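The two captures above differ in ways that are easy to check mechanically: under the Xen kernel, eth0 is neither UP nor RUNNING and has no IPv4 address, while under the non-Xen kernel it is both and holds 192.168.9.21. A minimal shell sketch of that comparison follows. The helper names (kernel_flavor, if_is_up, if_ipv4) are hypothetical, and the parsing assumes the net-tools ifconfig format shown here ("inet addr:"), so it will not match newer ifconfig or ip output.

```shell
#!/bin/sh
# Sketch for comparing Xen and non-Xen captures. The helpers take text as
# arguments so they can be checked offline; on a live host, feed them
# "$(uname -r)" and "$(ifconfig eth0)".

# Print "xen" for a Xen kernel release (e.g. 2.6.18-190.el5xen), else "bare".
# Assumes the RHEL5 convention of a "xen" suffix on the release string.
kernel_flavor() {
    case "$1" in
        *xen) echo xen ;;
        *)    echo bare ;;
    esac
}

# Succeed if the ifconfig text in $1 shows the interface UP (administratively
# enabled) and RUNNING (operational); fail otherwise.
if_is_up() {
    printf '%s\n' "$1" | grep -q -E '(^|[[:space:]])UP[[:space:]].*RUNNING'
}

# Print the IPv4 address from ifconfig text in $1 (net-tools format
# "inet addr:A.B.C.D"); print nothing if no address is assigned.
if_ipv4() {
    printf '%s\n' "$1" | sed -n 's/.*inet addr:\([0-9.]*\).*/\1/p'
}
```

Run against Gary's two captures, the Xen one fails if_is_up and yields no address, which matches "cannot obtain dhcp address" from the original report.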
Hi Gary,

Sorry for my slow update to this BZ; I buried myself in it and forgot to dig myself out to update you. I'm on a machine that Ivan found (amd-ma78gm-02.lab.bos.redhat.com) right now. It has 2 NICs, but he blacklisted the other one. We've started some bisecting, and here are our current results:

- -191 Xen HV & -180 dom0: the NIC works with xend starting on boot and creating the bridge; in other words, it works completely.
- -191 Xen HV & -181 dom0: the NIC works without xend starting on boot, but after manually starting xend (which runs the bridging script), we lose eth0 and can't ifup it.

The rebase (commit 4c7ce5e) was done between -180 and -181, so it has to be that patch. However, since it was a big change, we need to gather more clues on how the bridging that xend does is involved.

Another point of interest: when booting the working 191xen/180kern hybrid, or an older working hybrid (e.g. 191/164), I see this unusual console message after xend runs the bridging script:

peth0: Promiscuous mode enabled.

So possibly things aren't perfect for the lower revs either?
Heh! I'm on the other box I found, trying to reproduce this myself. It only has the one NIC, so it might be valuable to you if things don't behave on the system with the blacklisted second NIC. I'll log myself off that one in case you need to reserve it.
(In reply to comment #19)
> peth0: Promiscuous mode enabled.

This message is caused by the module parameter 'debug', which was set on the testing machine.
Finally, I found the problem :-). This strange behavior was caused by some missing assignments in the net_device_ops structure. Interestingly, the issue arose only in the Xen environment.
Created attachment 399115
The patch that fixes the issue

Here is the patch that fixes this issue.
Note: this still fails with RHEL 5.5 x64 Snapshot 4. Will the fix be in GA?
Note: this was also reported in Issue Tracker #576063.
Dale,

We're working to get a fix into RHEL 5.5, but it's not in a released build just yet.

-Gary
This fix is in kernel-2.6.18-194.el5.

You can download this test kernel from http://people.redhat.com/jwilson/el5

Please update the appropriate value in the Verified field (cf_verified) to indicate that this fix has been successfully verified. Include a comment with verification details.
Dale,

Kernel -194 works for me on my Intel DG31PR motherboard. The network comes up as expected under the Xen kernel. Can you verify that this new test kernel also works for you?

-Gary
Gary,

I installed the -194 Xen kernel and the NIC works now. I will check all the other systems and report back if it doesn't work on them.

Note: I do not see the ethtool output Ivan saw in comment #11, but otherwise it works. I checked ethtool on a system with an NVIDIA NIC and the -191 Xen kernel, and its output is similar to what I get with -194.

Thanks,
Dale Sykora

[root@localhost ~]# uname -r
2.6.18-194.el5xen
[root@localhost ~]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:1B:78:B2:F9:30
          inet addr:192.168.9.15  Bcast:192.168.9.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:78ff:feb2:f930/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:133 errors:0 dropped:0 overruns:0 frame:0
          TX packets:97 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:22000 (21.4 KiB)  TX bytes:11550 (11.2 KiB)

[root@localhost ~]# ethtool eth0
Settings for eth0:
        Link detected: yes
[root@localhost ~]# ethtool -i eth0
Cannot get driver information: Operation not supported
[root@localhost ~]# lspci | grep Ethernet
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet controller (rev 02)
[root@localhost ~]#
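On the ethtool difference: one plausible explanation, which is an assumption and not confirmed in this thread, is that xend's network-bridge script renames the physical NIC from eth0 to peth0 (the name in the "peth0: Promiscuous mode enabled." message from comment #19) and leaves a virtual device named eth0 behind. The txqueuelen:0 and the missing Interrupt line in the Xen ifconfig output fit that picture. If so, ethtool should be pointed at peth0 instead. Below is a small sketch that locates the physical interface and its driver. NET_SYSFS, phys_if and nic_driver are hypothetical names, and the sysfs layout (a device/driver symlink under each interface) is assumed.

```shell
#!/bin/sh
# Sketch: find the physical NIC behind "ethN" in a Xen dom0.
# Assumption: the xend network-bridge script renamed the real NIC to pethN.
# NET_SYSFS is a hypothetical override so the logic can be tested offline;
# on a real host it defaults to /sys/class/net.
NET_SYSFS=${NET_SYSFS:-/sys/class/net}

# Print "p$1" (e.g. peth0) if that interface exists, otherwise "$1" unchanged.
phys_if() {
    if [ -e "$NET_SYSFS/p$1" ]; then
        echo "p$1"
    else
        echo "$1"
    fi
}

# Print the kernel driver bound to interface $1 (e.g. r8169) by following the
# device/driver symlink. Fails and prints nothing when there is no bound
# driver, as for a virtual device.
nic_driver() {
    link="$NET_SYSFS/$1/device/driver"
    [ -L "$link" ] || return 1
    basename "$(readlink "$link")"
}
```

On an affected host, ethtool -i "$(phys_if eth0)" or nic_driver "$(phys_if eth0)" should then name the r8169 driver, provided the assumption about the bridge script holds.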
Tested on RC2; the NIC works. Thanks for the fix.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html