Bug 568040 - network does not work with rhel 5.5 snap1 x64 server, xen kernel, and r8169 driver
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-02-24 16:36 UTC by dale sykora
Modified: 2010-04-08 15:51 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-03-30 06:54:59 UTC
Target Upstream Version:
Embargoed:


Attachments
The patch that fixes the issue (1.79 KB, patch)
2010-03-10 15:41 UTC, Ivan Vecera


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2010:0178 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 5.5 kernel security and bug fix update 2010-03-29 12:18:21 UTC

Description dale sykora 2010-02-24 16:36:46 UTC
Description of problem:
network does not work with rhel 5.5 snap1 x64 server, xen kernel, and r8169 driver
works ok with non xen kernel

Version-Release number of selected component (if applicable):
rhel 5.5 snap1 x64

How reproducible:
every time

Steps to Reproduce:
1. install rhel 5.5 snap1 x64 with xen on a machine with a realtek nic
2. boot xen kernel
3. try to access network
  
Actual results:
network is not reachable
cannot obtain dhcp address, static ip can be set but cannot ping other hosts
Expected results:
network should be reachable
can obtain dhcp address or set static ip and ping other hosts

Additional info:
systems with nvidia or intel cards work as expected with xen kernel
systems with realtek cards work with non xen kernel

Comment 1 dale sykora 2010-02-26 16:50:28 UTC
this also fails with snapshot 2
will move to issue-tracker once I have a valid login

Comment 2 Gary Case 2010-03-08 17:14:55 UTC
I'm seeing this problem as well. Realtek NIC with r8169 driver no longer works in 5.5 with Xen kernel. It worked fine in RHEL5.4.

Comment 6 Andrew Jones 2010-03-08 17:46:00 UTC
These are the changes since 5.4

b2ce4b2 r8169: add missing hunk from frame length filtering fix
cf315a0 r8169: improved frame length filtering
4c7ce5e [net] r8169: update to latest upstream for rhel5.5
e2410cf [net] resolve issues with vlan creation and filtering
9083e21 [net] r8169: avoid losing MSI interrupts
9b097f2 [net] r8169: balance pci_map/unmap pair, use hw padding

The "update to latest upstream" is nearly a re-write. It looks like this didn't
get tested on a xen kernel before it was posted? cc'ing Ivan.

Comment 11 Ivan Vecera 2010-03-09 13:12:14 UTC
This is strange, I cannot reproduce this issue on my workstation with the Realtek based NIC.

Info:
- eth1 is Realtek NIC with RTL8168b/8111b chipset
- it is configured by DHCP (without any problem)
- I tried transferring several megabytes IN/OUT without any problem

[root@cera ~]# uname -r
2.6.18-191.el5xen

[root@cera ~]# ethtool -i eth1
driver: r8169
version: 2.3LK-NAPI
firmware-version: 
bus-info: 0000:0c:02.0

[root@cera ~]# ethtool eth1
Settings for eth1:
	Supported ports: [ TP MII ]
	Supported link modes:   10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Half 1000baseT/Full 
	Supports auto-negotiation: Yes
	Advertised link modes:  10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Half 1000baseT/Full 
	Advertised auto-negotiation: Yes
	Speed: 1000Mb/s
	Duplex: Full
	Port: MII
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
	Supports Wake-on: pumbg
	Wake-on: g
	Current message level: 0x00000033 (51)
	Link detected: yes

[root@cera ~]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:E0:4C:69:21:B8  
          inet addr:10.34.33.109  Bcast:10.34.35.255  Mask:255.255.252.0
          inet6 addr: fe80::2e0:4cff:fe69:21b8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:41553 errors:0 dropped:0 overruns:0 frame:0
          TX packets:32557 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:42367187 (40.4 MiB)  TX bytes:41780224 (39.8 MiB)
          Interrupt:19 Base address:0x8f00

Comment 12 Andrew Jones 2010-03-09 14:27:56 UTC
Ivan, is this the only nic in your system? Gary says in comment 9 that having another active NIC in the system makes it unreproducible.

Comment 13 Gary Case 2010-03-09 15:56:50 UTC
That's right, if I put another card in this system, the realtek card appears to work, but it doesn't really. It's very strange. I'm going to try reinstalling this box with 5.4, then updating to the test kernel. I want to see if 5.4's xen kernel works with serial port.

Comment 16 dale sykora 2010-03-09 16:55:29 UTC
(In reply to comment #11)

Below is what I am seeing on xen/nonxen kernel

#eth0 info with xen kernel

[root@localhost ~]# uname -r
2.6.18-190.el5xen
[root@localhost ~]# ethtool eth0
Settings for eth0:
        Link detected: yes
[root@localhost ~]# ethtool -i eth0
Cannot get driver information: Operation not supported
[root@localhost ~]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr E0:CB:4E:7D:88:BD  
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:48 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:360 (360.0 b)  TX bytes:10847 (10.5 KiB)

[root@localhost ~]# 


#eth0 info with NON xen kernel

[root@localhost ~]# uname -r
2.6.18-190.el5
[root@localhost ~]# ethtool eth0
Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Half 1000baseT/Full 
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Half 1000baseT/Full 
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbg
        Wake-on: d
        Current message level: 0x00000033 (51)
        Link detected: yes
[root@localhost ~]# ethtool -i eth0
driver: r8169
version: 2.3LK-NAPI
firmware-version: 
bus-info: 0000:02:00.0
[root@localhost ~]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr E0:CB:4E:7D:88:BD  
          inet addr:192.168.9.21  Bcast:192.168.9.255  Mask:255.255.255.0
          inet6 addr: fe80::e2cb:4eff:fe7d:88bd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:146 errors:0 dropped:0 overruns:0 frame:0
          TX packets:150 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:16312 (15.9 KiB)  TX bytes:15876 (15.5 KiB)
          Interrupt:58 Base address:0x2000 

[root@localhost ~]#

Comment 19 Andrew Jones 2010-03-09 18:29:57 UTC
Hi Gary,

Sorry for my slow update to this BZ, I buried myself in it and forgot to dig myself out to update you. I'm on a machine that Ivan found (amd-ma78gm-02.lab.bos.redhat.com) right now. It has 2 nics, but he blacklisted the other one. We've started some bisecting and here are our current results:

-191 Xen HV & -180 dom0:
the nic works with xend starting on boot, creating the bridging, or in other words, it completely works

-191 Xen HV & -181 dom0:
the nic works without xend starting on boot, but then after manually starting xend (and it running the bridging script), we lose eth0 and can't ifup it

The rebase (commit 4c7ce5e) was done between -180 and -181, so it has to be that patch. However, since it was a big change we need to gather some more clues on how the bridging that xend does is involved.
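
The bisection described above can be sketched as a search for the first failing dom0 kernel build between a known-good and a known-bad one. This is only an illustration in C: first_bad_build and fails_after_rebase are hypothetical names, and the predicate simply encodes the outcome reported here (-180 works, -181 does not) in place of a real boot-and-test cycle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate type: nonzero means the given build fails. */
typedef int (*build_is_bad_fn)(int build);

/*
 * Return the first bad build in the range (good, bad].
 * Assumes `good` passes, `bad` fails, and every build after the
 * first bad one also fails.
 */
int first_bad_build(int good, int bad, build_is_bad_fn is_bad)
{
    assert(good < bad && is_bad != NULL);

    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;

        if (is_bad(mid))
            bad = mid;    /* failure is at mid or earlier */
        else
            good = mid;   /* failure is after mid */
    }
    return bad;
}

/*
 * Stand-in for "boot dom0 kernel -N under the -191 hypervisor,
 * start xend, then try to ifup eth0". It hard-codes the result
 * reported in this comment.
 */
int fails_after_rebase(int build)
{
    return build >= 181;
}
```

Starting from the working -164 and the failing -191, the search narrows to -181 after five test boots.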

Another note of interest is that when booting the working 191xen/180kern hybrid or a lower rev working hybrid (e.g. 191/164), I see this unusual console message after xend runs the bridging script

peth0: Promiscuous mode enabled.

So possibly things aren't perfect for the lower revs either?

Comment 20 Gary Case 2010-03-09 18:44:22 UTC
Heh! I'm on the other box I found, trying to reproduce this myself. It only has the one NIC, so it might be valuable for you if things don't behave on the system with the blacklisted second NIC. I'll log myself off of that one in case you need to reserve it.

Comment 21 Ivan Vecera 2010-03-10 15:37:02 UTC
(In reply to comment #19)
> 
> peth0: Promiscuous mode enabled.
> 
This message is caused by the module parameter 'debug', which was set on the testing machine.

Comment 22 Ivan Vecera 2010-03-10 15:39:56 UTC
Finally, I found the problem :-). This strange behavior was caused by some missing assignments from the net_device_ops structure. It is interesting that this issue arose only in the Xen environment.
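
To illustrate the class of bug, here is a small self-contained C sketch. It is not the attached patch, and this report does not say which assignments were missing. It assumes a backport in which callbacks grouped in an ops table must each be copied onto function pointers stored directly on the device. All mock_* types, functions, and callback names below are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

struct mock_dev;

/* Newer style: callbacks grouped in one ops table (mock, illustrative). */
struct mock_ops {
    int  (*open)(struct mock_dev *dev);
    int  (*stop)(struct mock_dev *dev);
    int  (*change_mtu)(struct mock_dev *dev, int new_mtu);
    void (*set_rx_mode)(struct mock_dev *dev);
};

/* Older style: the same callbacks live directly on the device. */
struct mock_dev {
    int  (*open)(struct mock_dev *dev);
    int  (*stop)(struct mock_dev *dev);
    int  (*change_mtu)(struct mock_dev *dev, int new_mtu);
    void (*set_rx_mode)(struct mock_dev *dev);
    int  mtu;
};

/* Copy every callback. Forgetting one line here leaves a NULL hook. */
void mock_apply_ops(struct mock_dev *dev, const struct mock_ops *ops)
{
    assert(dev != NULL && ops != NULL);
    dev->open        = ops->open;
    dev->stop        = ops->stop;
    dev->change_mtu  = ops->change_mtu;
    dev->set_rx_mode = ops->set_rx_mode;
}

/* Count hooks that were never assigned; useful as a probe-time check. */
int mock_count_unset(const struct mock_dev *dev)
{
    assert(dev != NULL);
    return (dev->open == NULL) + (dev->stop == NULL) +
           (dev->change_mtu == NULL) + (dev->set_rx_mode == NULL);
}

/* Trivial stub callbacks so the sketch can be exercised. */
static int  mock_open(struct mock_dev *dev) { (void)dev; return 0; }
static int  mock_stop(struct mock_dev *dev) { (void)dev; return 0; }
static int  mock_change_mtu(struct mock_dev *dev, int new_mtu)
{
    dev->mtu = new_mtu;
    return 0;
}
static void mock_set_rx_mode(struct mock_dev *dev) { (void)dev; }

const struct mock_ops mock_full_ops = {
    .open        = mock_open,
    .stop        = mock_stop,
    .change_mtu  = mock_change_mtu,
    .set_rx_mode = mock_set_rx_mode,
};
```

A hook that is only exercised by one code path can stay NULL unnoticed until something calls it, which would be consistent with a failure that shows up in one environment and not another.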

Comment 23 Ivan Vecera 2010-03-10 15:41:33 UTC
Created attachment 399115
The patch that fixes the issue

Here is the patch that fixes this issue.

Comment 31 dale sykora 2010-03-16 15:45:18 UTC
Note: This still fails with rhel 5.5 x64 snapshot 4.  Will the fix be in the GA?

Comment 32 dale sykora 2010-03-16 17:34:42 UTC
Note : This was also reported on Issue Tracker # 576063

Comment 33 Gary Case 2010-03-16 18:57:24 UTC
Dale, 

We're working to get a fix into RHEL5.5, but it's not in a released build just yet. 

-Gary

Comment 34 Jarod Wilson 2010-03-17 15:53:21 UTC
in kernel-2.6.18-194.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Please update the appropriate value in the Verified field
(cf_verified) to indicate this fix has been successfully
verified. Include a comment with verification details.

Comment 36 Gary Case 2010-03-17 17:29:22 UTC
Dale,

Kernel -194 works for me on my Intel DG31PR motherboard. The network comes up as expected under the Xen kernel. Can you verify that this new test kernel also works for you?

-Gary

Comment 37 dale sykora 2010-03-17 18:08:18 UTC
Gary,
  I installed the 194 xen kernel and the nic works now.  I will check all other systems and report back if it doesn't work on those systems.  Note, I do not see the ethtool output Ivan saw in Comment#11 but otherwise it works.  I checked ethtool on a system with Nvidia nic and 191 xen kernel and it is similar to the 194.

Thanks,

Dale Sykora

[root@localhost ~]# uname -r
2.6.18-194.el5xen
[root@localhost ~]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:1B:78:B2:F9:30  
          inet addr:192.168.9.15  Bcast:192.168.9.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:78ff:feb2:f930/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:133 errors:0 dropped:0 overruns:0 frame:0
          TX packets:97 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:22000 (21.4 KiB)  TX bytes:11550 (11.2 KiB)

[root@localhost ~]# ethtool eth0
Settings for eth0:
	Link detected: yes
[root@localhost ~]# ethtool -i eth0
Cannot get driver information: Operation not supported
[root@localhost ~]# lspci | grep Ethernet
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101E/RTL8102E PCI Express Fast Ethernet controller (rev 02)
[root@localhost ~]#

Comment 39 dale sykora 2010-03-25 17:29:52 UTC
Tested on RC2, nic works.  Thanks for the fix.

Comment 40 errata-xmlrpc 2010-03-30 06:54:59 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

