Bug 637440 - RTL8169sc/8110sc network devices mac address changes after soft reboot.
Summary: RTL8169sc/8110sc network devices mac address changes after soft reboot.
Keywords:
Status: CLOSED DUPLICATE of bug 581654
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: i686
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Ivan Vecera
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-09-25 20:48 UTC by Tru Huynh
Modified: 2010-09-29 07:16 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 573201
Environment:
Last Closed: 2010-09-29 07:16:31 UTC
Target Upstream Version:
Embargoed:


Links
System  ID    Private  Priority  Status  Summary  Last Updated
CentOS  4317  0        None      None    None     Never

Description Tru Huynh 2010-09-25 20:48:31 UTC
+++ This bug was initially created as a clone of Bug #573201 +++

Description of problem:
When I soft-reboot my Fedora host, the MAC addresses change. This machine is:
Jetway J7F2 Fanless 1.2GHz Eden C7 Mini-ITX Motherboard &
Jetway 3xGigaLAN Daughterboard Module

When cold booted, host mac addresses:
eth0 00:30:18:AC:C8:D5
eth1 00:30:18:AC:C8:D6
eth2 00:30:18:AC:C8:D7
eth3 00:30:18:A6:11:E0
(the labels are as discovered on first install)

When soft-rebooted, mac addresses are:
00:00:00:00:C8:D5
00:00:00:00:C8:D6
00:00:00:00:C8:D7
00:30:18:A6:11:E0

OS history: started life as FC10 last April, and has since been upgraded to FC11, then FC12.
First noticed with kernel-2.6.32.9-70 (2010-03-10), but kernel-2.6.32.9-67 also exhibited it once I knew to look.

I first thought it was bug 557771, but I now think the NICs are not being reset/initialised/queried properly following a soft reboot.

udev (145-15) went a bit mad and added pertinent rules to 70-persistent-net.rules that match the MAC addresses seen, but with these rules the driver seems ineffective: it can't see link state changes and tcpdump can't see a thing.

lspci, dmesg, udevinfo attached (cold and soft booted)

Version-Release number of selected component (if applicable):
Pertinent package history:
Jan 15 19:45:53 Updated: kernel-firmware-2.6.27.41-170.2.117.fc10.noarch
Jan 15 19:47:48 Installed: kernel-2.6.27.41-170.2.117.fc10.i686
Jan 15 19:47:57 Installed: kernel-2.6.27.41-170.2.117.fc10.i686
Feb 26 22:04:41 Erased: libudev0
Feb 26 22:48:05 Updated: kernel-firmware-2.6.30.10-105.2.23.fc11.noarch
Feb 26 22:50:10 Installed: libudev0-141-8.fc11.i586
Feb 26 22:54:58 Installed: udev-extras-20090226-0.5.20090302git.fc11.i586
Feb 26 22:55:37 Updated: udev-141-8.fc11.i586
Feb 26 23:00:05 Installed: kernel-2.6.30.10-105.2.23.fc11.i586
Feb 26 23:13:08 Installed: kernel-2.6.30.10-105.2.23.fc11.i586
Mar 03 20:24:47 Updated: kernel-firmware-2.6.31.12-174.2.22.fc12.noarch
Mar 03 20:26:10 Installed: libudev-145-15.fc12.i686
Mar 03 20:30:02 Installed: libgudev1-145-15.fc12.i686
Mar 03 20:33:39 Installed: udev-145-15.fc12.i686
Mar 03 20:46:43 Erased: libudev0
Mar 03 20:48:01 Erased: udev-extras
Mar 03 20:59:36 Installed: kernel-2.6.31.12-174.2.22.fc12.i686
Mar 06 09:08:15 Updated: kernel-firmware-2.6.32.9-67.fc12.noarch
Mar 06 09:12:28 Installed: kernel-2.6.32.9-67.fc12.i686
Mar 10 19:37:10 Updated: kernel-firmware-2.6.32.9-70.fc12.noarch
Mar 10 19:39:36 Installed: kernel-2.6.32.9-70.fc12.i686

How reproducible:
On demand.

Steps to Reproduce:
1. Reboot host - fails to recognise or initialise network devices.
  
Actual results:
NICs are discovered, but renumbered and not working.

Expected results:
No change.

Additional info:

--- Additional comment from gavdav.co.uk on 2010-03-13 06:56:32 EST ---

Created attachment 399829 [details]
dmesg of machine in fault condition

--- Additional comment from gavdav.co.uk on 2010-03-13 06:57:02 EST ---

Created attachment 399830 [details]
lspci -vvnn of machine in fault condition

--- Additional comment from gavdav.co.uk on 2010-03-13 06:57:34 EST ---

Created attachment 399832 [details]
eth3 udevinfo in fault condition

--- Additional comment from gavdav.co.uk on 2010-03-13 06:57:50 EST ---

Created attachment 399833 [details]
eth4 udevinfo in fault condition

--- Additional comment from gavdav.co.uk on 2010-03-13 06:58:06 EST ---

Created attachment 399834 [details]
eth5 udevinfo in fault condition

--- Additional comment from gavdav.co.uk on 2010-03-13 06:58:23 EST ---

Created attachment 399835 [details]
eth6 udevinfo in fault condition

--- Additional comment from gavdav.co.uk on 2010-03-13 06:58:54 EST ---

Created attachment 399836 [details]
dmesg of machine after cold boot

--- Additional comment from gavdav.co.uk on 2010-03-13 06:59:14 EST ---

Created attachment 399837 [details]
lspci -vvnn of machine after cold boot

--- Additional comment from gavdav.co.uk on 2010-03-13 06:59:39 EST ---

Created attachment 399838 [details]
eth0 udevinfo of machine after cold boot

--- Additional comment from gavdav.co.uk on 2010-03-13 07:00:04 EST ---

Created attachment 399839 [details]
eth1 udevinfo of machine after cold boot

--- Additional comment from gavdav.co.uk on 2010-03-13 07:00:26 EST ---

Created attachment 399840 [details]
eth2 udevinfo of machine after cold boot

--- Additional comment from gavdav.co.uk on 2010-03-13 07:00:44 EST ---

Created attachment 399841 [details]
eth3 udevinfo of machine after cold boot

--- Additional comment from mschmidt on 2010-03-15 18:16:11 EDT ---

So the low 4 bytes of the MAC address get zeroed on warm reboot for some
reason.
2.6.32 writes the MAC address to the card on shutdown. It calls rtl_rar_set()
which does:
...
        RTL_W32(MAC0, low);
        RTL_W32(MAC4, high);
...

And the card loses the 'low' part. Wild guess - perhaps the card wants to have
the high half written first.

Anyway, the writing of the MAC address on shutdown was introduced by the
commit:

commit cc098dc705895f6b0109b7e8e026ac2b8ae1c0a1
Author: Ivan Vecera
Date:   Sun Nov 29 23:12:52 2009 -0800

    r8169: restore mac addr in rtl8169_remove_one and rtl_shutdown

Gavin, could you test if reverting the commit would fix your problem?

Putting Ivan to CC... Ivan, any ideas?
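
For illustration, the low/high split that rtl_rar_set() writes to MAC0 and MAC4 can be modelled outside the kernel. The following is a minimal Python sketch, not driver code: it assumes the six address bytes are packed little-endian into a 32-bit `low` word (bytes 0-3) and a 16-bit `high` word (bytes 4-5), and the helper names `split_mac` and `join_mac` are hypothetical. Under that assumption, a lost `low` write reproduces the addresses reported after a soft reboot.

```python
from typing import Tuple


def split_mac(mac: str) -> Tuple[int, int]:
    """Pack a colon-separated MAC into (low, high) register words.

    Assumption: bytes 0-3 form the little-endian 32-bit 'low' word and
    bytes 4-5 form the little-endian 'high' word.
    """
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError("expected a 6-byte MAC address: %r" % mac)
    low = int.from_bytes(raw[0:4], "little")
    high = int.from_bytes(raw[4:6], "little")
    return low, high


def join_mac(low: int, high: int) -> str:
    """Rebuild the MAC string from the two register words."""
    raw = low.to_bytes(4, "little") + high.to_bytes(2, "little")
    return ":".join("%02X" % b for b in raw)


if __name__ == "__main__":
    low, high = split_mac("00:30:18:AC:C8:D5")  # eth0 after a cold boot
    print(join_mac(low, high))  # both words intact
    print(join_mac(0, high))    # 'low' word lost, 'high' word kept
```

With `low` forced to zero, the cold-boot address of eth0 becomes 00:00:00:00:C8:D5, which matches the soft-reboot address in the original description.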

--- Additional comment from gavdav.co.uk on 2010-03-15 18:36:59 EDT ---

Sure, which kernel do you need me to test ?

--- Additional comment from mschmidt on 2010-03-16 05:23:22 EDT ---

I suggest you download the latest stable vanilla kernel, build it from source, and see if the bug is still reproducible with it. Then we can add some test patches to it.

But before you do that, let's try a couple of easy experiments first with your current Fedora kernel:
 1. What happens when you remove the module (rmmod r8169) and then load it back (modprobe r8169)? Does the MAC address get corrupted in this case also?
 2. What if you rmmod the module and then reboot? Is the MAC address read correctly then?

And just in case - you are not using kexec to reboot quickly, are you?
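
The two experiments above amount to comparing each interface's MAC address before and after the driver is reloaded. A minimal sketch for recording that comparison follows, assuming a Linux host that exposes addresses under /sys/class/net/<iface>/address. The helper names and the "first four octets zero" heuristic are assumptions derived from the symptom in this report, not an official check.

```python
import os

SYSFS_NET = "/sys/class/net"  # sysfs directory with one entry per interface


def read_macs(root=SYSFS_NET):
    """Return {interface: mac} for every interface that has an address file."""
    macs = {}
    if not os.path.isdir(root):
        return macs
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name, "address")
        try:
            with open(path) as handle:
                macs[name] = handle.read().strip().lower()
        except OSError:
            continue  # entry is not an interface, or is not readable
    return macs


def is_truncated(mac):
    """True if the first four octets are zero but the last two are not.

    This is the pattern seen in this report (00:00:00:00:c8:d5). An
    all-zero address, as on loopback, is deliberately not flagged.
    """
    octets = mac.lower().split(":")
    if len(octets) != 6:
        return False
    return octets[:4] == ["00"] * 4 and octets[4:] != ["00", "00"]


if __name__ == "__main__":
    for name, mac in read_macs().items():
        status = "SUSPECT" if is_truncated(mac) else "ok"
        print("%-8s %s %s" % (name, mac, status))
```

Running it once after a cold boot and again after `rmmod r8169` followed by `modprobe r8169` gives two listings that can be compared directly.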

--- Additional comment from gavdav.co.uk on 2010-03-21 10:21:09 EDT ---

1. If I unload the driver and reload it, it comes back with the invalid MAC address (whether the driver was unloaded from a working state, or after a soft boot that already showed the invalid MACs).

2. If I unload the driver and reboot, it comes back with the invalid mac addresses.

Going to try and compile and build this: http://www.kernel.org/pub/linux/kernel/v2.6/testing/linux-2.6.34-rc2.tar.bz2

--- Additional comment from mschmidt on 2010-03-22 12:21:40 EDT ---

Assuming you will manage to reproduce the problem with 2.6.34-rc2 ...

I'm going to attach two short patches for r8169 against 2.6.34-rc2 and I would like to know if any one of them helps. Do not try to apply both of them at the same time, they are mutually exclusive.

--- Additional comment from mschmidt on 2010-03-22 12:26:36 EDT ---

Created attachment 401815 [details]
write the high half of the MAC address first

With this patch I expect one of the following results after a cold start+rmmod+modprobe cycle:
1. The MAC address will appear correct.
2. The MAC address will be invalid, but in a different way than before. Perhaps the first 4 bytes will be good and the other 2 bytes will be zeros.
3. The MAC address will be invalid in the same way as before.

--- Additional comment from mschmidt on 2010-03-22 12:29:22 EDT ---

Created attachment 401816 [details]
flush after each write

This is the other patch that might help.

--- Additional comment from gavdav.co.uk on 2010-03-23 15:45:25 EDT ---

Urgh - I'm rusty. Can't get what I've built to boot yet and I'm not getting much time to look at it. Will persevere (the reason I use FC12 is so I don't need to build kernels by hand any more)

--- Additional comment from ivecera on 2010-03-29 06:10:03 EDT ---

Recent update in Dave's tree should fix this issue.

http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commitdiff;h=78f1cd02457252e1ffbc6caa44a17424a45286b8

--- Additional comment from mschmidt on 2010-03-29 06:17:43 EDT ---

Great, so it seems my original "wild guess" was correct.
Ivan, do you know if the patch is considered for -stable ?

--- Additional comment from ivecera on 2010-03-29 06:36:16 EDT ---

AFAIK not yet but IMHO it should be

--- Additional comment from christian.robert07 on 2010-04-09 19:47:31 EDT ---

(In reply to comment #23)
> AFAIK not yet but IMHO it should be    

I have the exact same bug with Red Hat 5.5 (Tikanga)
and the latest kernel, i.e. 2.6.18-194.el5

Xtian.

--- Additional comment from ivecera on 2010-04-12 05:39:33 EDT ---

For RHEL, a new Bugzilla report should be opened. Christian, could you do it?

--- Additional comment from jflorian on 2010-05-02 13:41:17 EDT ---

I also have the exact same problem using an RTL8169sc/8110sc on an Asus A7V-333X mainboard and first noticed this with 2.6.32.11-99.fc12.i686.PAE.  I tried updating with updates-testing enabled, which resulted in 2.6.32.12-114.fc12.i686.PAE being installed, but I see no improvement.  I haven't reviewed the changelog to see if that should be expected or not.

--- Additional comment from gavdav.co.uk on 2010-05-04 16:09:13 EDT ---

I finally read the instructions :)

I have built and booted a 2.6.34-rc6 kernel, which still exhibits the problem.

The line numbers have changed, so attempts to apply the patches above fail:

# cat patch.401815 | patch -p1
patching file drivers/net/r8169.c
Hunk #1 FAILED at 2821.
1 out of 1 hunk FAILED -- saving rejects to file drivers/net/r8169.c.rej

Can you re-issue the same patches (and how I should apply them properly) for 2.6.34-rc6 ?

Dmesg information for the cards with this kernel soft booted:
kernel: r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
kernel: r8169 0000:00:09.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
kernel: r8169 0000:00:09.0: (unregistered net_device): no PCI Express capability
kernel: r8169 0000:00:09.0: eth0: RTL8169sc/8110sc at 0xf805e000, 00:00:00:00:c8:d5, XID 18000000 IRQ 18
kernel: r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
kernel: r8169 0000:00:0b.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
kernel: r8169 0000:00:0b.0: (unregistered net_device): no PCI Express capability
kernel: r8169 0000:00:0b.0: eth1: RTL8169sc/8110sc at 0xf80ac000, 00:00:00:00:c8:d6, XID 18000000 IRQ 19
kernel: r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
kernel: r8169 0000:00:0c.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
kernel: r8169 0000:00:0c.0: (unregistered net_device): no PCI Express capability
kernel: r8169 0000:00:0c.0: eth2: RTL8169sc/8110sc at 0xf8110000, 00:00:00:00:c8:d7, XID 18000000 IRQ 16
kernel: via-rhine.c:v1.10-LK1.4.3 2007-03-06 Written by Donald Becker
kernel: via-rhine: Broken BIOS detected, avoid_D3 enabled.
kernel: via-rhine 0000:00:12.0: PCI INT A -> Link[ALKD] -> GSI 23 (level, low) -> IRQ 23
kernel: eth3: VIA Rhine II at 0xfdffa000, 00:30:18:a6:11:e0, IRQ 23.
kernel: eth3: MII PHY found at address 1, status 0x7849 advertising 05e1 Link 0000.
kernel: udev: renamed network interface eth0 to eth6
kernel: udev: renamed network interface eth1 to eth5
kernel: udev: renamed network interface eth2 to eth4

--- Additional comment from gavdav.co.uk on 2010-05-04 17:43:21 EDT ---

ok, so I made the same edits as per http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commitdiff;h=78f1cd02457252e1ffbc6caa44a17424a45286b8

(Because I couldn't work out how to generate a patch myself)

This has fixed the problem.

-Cold boot into 2.6.34-rc6. Devices report predicted mac addresses (and work)
-Soft reboot into 2.6.34-rc6. Devices report predicted mac addresses (and work)

dmesg of discovery:
r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:00:09.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
r8169 0000:00:09.0: (unregistered net_device): no PCI Express capability
r8169 0000:00:09.0: eth1: RTL8169sc/8110sc at 0xf810e000, 00:30:18:ac:c8:d5, XID 18000000 IRQ 18
r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:00:0b.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
r8169 0000:00:0b.0: (unregistered net_device): no PCI Express capability
r8169 0000:00:0b.0: eth2: RTL8169sc/8110sc at 0xf8112000, 00:30:18:ac:c8:d6, XID 18000000 IRQ 19
r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:00:0c.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
r8169 0000:00:0c.0: (unregistered net_device): no PCI Express capability
r8169 0000:00:0c.0: eth3: RTL8169sc/8110sc at 0xf8116000, 00:30:18:ac:c8:d7, XID 18000000 IRQ 16
r8169 0000:00:09.0: eth0: link up
r8169 0000:00:09.0: eth0: link up
r8169 0000:00:0b.0: eth1: link up
r8169 0000:00:0b.0: eth1: link up
r8169 0000:00:0c.0: eth2: link up

This appears to have fixed it - what other tests should I do ?

--- Additional comment from mschmidt on 2010-05-13 02:41:00 EDT ---

The patch "r8169: fix broken register writes" and one more related patch ("r8169: more broken register writes workaround") are now included in the -stable release 2.6.32.13.

You can download an RPM for F-12 from Koji:
http://koji.fedoraproject.org/koji/buildinfo?buildID=173221

The build has not yet been submitted as a Fedora update, but eventually it (or a later one) will be.

--- Additional comment from ipilcher on 2010-05-19 10:05:47 EDT ---

I just ran into this on my Asterisk box.  (The wife loves it when the
phones stop working!)  Seems to be fixed by 2.6.32.13-118.fc12.i686.PAE.

More info available on request.

--- Additional comment from yves on 2010-08-01 02:46:11 EDT ---

*** Bug 583223 has been marked as a duplicate of this bug. ***

--- Additional comment from yves on 2010-08-01 02:55:26 EDT ---

I too am affected by this bug, but on Fedora 13, with kernel:
2.6.33.6-147.fc13.i686.PAE

Is there a way to flag this bug for Fedora 13 as well as 12?

--- Additional comment from cebbert on 2010-08-02 11:46:18 EDT ---

(In reply to comment #32)
> I too am affected by this bug, but on Fedora 13, with kernel:
> 2.6.33.6-147.fc13.i686.PAE
> 

All of the r8169 patches that went into 2.6.32.13 also went into 2.6.33.4 at the same time. So why they would fix one kernel and not the other is a mystery.

> Is there a way to flag this bug for Fedora 13 as well as 12?    

The only way to do that is to report the bug separately against each release, but you just closed the Fedora 13 bug as a duplicate of this one.
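
As a rough aid for readers comparing kernels in this thread, the stable releases named here as first carrying the r8169 fixes (2.6.32.13 and 2.6.33.4) can be checked against a `uname -r` string. This is a minimal sketch: the function name `has_stable_fix` and the table are assumptions built only from the versions quoted in this thread, and distribution kernels with backported patches (RHEL, for example) are reported as unknown rather than guessed.

```python
import re
from typing import Optional

# First stable point release per series said in this thread to include
# the r8169 register-write fixes.
FIXED_IN = {(2, 6, 32): 13, (2, 6, 33): 4}


def has_stable_fix(release: str) -> Optional[bool]:
    """Return True/False for a known series, or None when it is not listed."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    series = tuple(int(part) for part in match.group(1, 2, 3))
    point = int(match.group(4) or 0)
    if series not in FIXED_IN:
        return None  # e.g. RHEL 2.6.18 or a 2.6.34 release candidate
    return point >= FIXED_IN[series]


if __name__ == "__main__":
    for rel in ("2.6.32.9-70.fc12.i686",
                "2.6.32.13-118.fc12.i686.PAE",
                "2.6.33.6-147.fc13.i686.PAE",
                "2.6.18-194.el5"):
        print(rel, has_stable_fix(rel))
```

A `True` result only says the release is at or past the point release named above. As the Fedora 13 report shows, it does not prove that a given machine is unaffected.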

--- Additional comment from yves on 2010-08-02 12:01:04 EDT ---

>> Is there a way to flag this bug for Fedora 13 as well as 12?

> The only way to do that is to report the bug separately against each release,
> but you just closed the Fedora 13 bug as a duplicate of this one.

Should I re-open it (or re-create another one)?

> All of the r8169 patches that went into 2.6.32.13 also went into 2.6.33.4 at
> the same time. So why they would fix one kernel and not the other is a mystery.

What can we do to make sure that the patches are applied to 2.6.33.*, as well as 2.6.34 and 2.6.35 (and the future new ones)?

--- Additional comment from gavdav.co.uk on 2010-08-04 15:03:51 EDT ---

This problem has re-appeared.

It's not exactly the same: it happened after a cold boot, and I had a fight getting it working. The interfaces have unique MAC addresses, but one or all of them remained unresponsive after a boot or a down/up.

Uname now: 2.6.32.16-141.fc12.i686
Picked this up in dmesg:
Aug  4 19:35:18 turnstile kernel: ------------[ cut here ]------------
Aug  4 19:35:18 turnstile kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0xc6/0x158()
Aug  4 19:35:18 turnstile kernel: Hardware name:
Aug  4 19:35:18 turnstile kernel: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Aug  4 19:35:18 turnstile kernel: Modules linked in: xt_multiport ipt_LOG act_nat ebt_dnat ebt_snat ebtable_nat ebtables iptable_nat nf_nat_snmp_basic nf_nat_proto_udplite nf_nat_sip nf_nat_irc nf_nat_ftp nf_nat_pptp nf_nat_proto_gre nf_nat_proto_dccp nf_nat_amanda nf_nat_proto_sctp libcrc32c nf_nat_tftp nf_nat_h323 nf_nat nf_conntrack_irc nf_conntrack_ftp nf_conntrack_netbios_ns ts_kmp nf_conntrack_amanda nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_sip nf_conntrack_netlink nfnetlink nf_conntrack_tftp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_sane nf_conntrack_proto_dccp nf_conntrack_h323 nf_conntrack_ipv6 ip6t_LOG ip6table_filter ip6_tables cpufreq_ondemand acpi_cpufreq sit tunnel4 ipv6 dm_multipath uinput i2c_viapro i2c_core via_rhine r8169 mii pata_acpi ata_generic firewire_ohci sata_via firewire_core crc_itu_t [last unloaded: scsi_wait_scan]
Aug  4 19:35:18 turnstile kernel: Pid: 0, comm: swapper Not tainted 2.6.32.16-141.fc12.i686 #1
Aug  4 19:35:18 turnstile kernel: Call Trace:
Aug  4 19:35:18 turnstile kernel: [<c043a779>] warn_slowpath_common+0x6a/0x81
Aug  4 19:35:18 turnstile kernel: [<c071b7e1>] ? dev_watchdog+0xc6/0x158
Aug  4 19:35:18 turnstile kernel: [<c043a7ce>] warn_slowpath_fmt+0x29/0x2c
Aug  4 19:35:18 turnstile kernel: [<c071b7e1>] dev_watchdog+0xc6/0x158
Aug  4 19:35:18 turnstile kernel: [<c04475ae>] ? __mod_timer+0x115/0x120
Aug  4 19:35:18 turnstile kernel: [<c04410f8>] ? local_bh_enable_ip+0xd/0xf
Aug  4 19:35:18 turnstile kernel: [<c0795303>] ? _spin_unlock_bh+0x13/0x15
Aug  4 19:35:18 turnstile kernel: [<f7e423cd>] ? fib6_run_gc+0xb7/0xbe [ipv6]
Aug  4 19:35:18 turnstile kernel: [<c0447277>] run_timer_softirq+0x16d/0x1f0
Aug  4 19:35:18 turnstile kernel: [<c071b71b>] ? dev_watchdog+0x0/0x158
Aug  4 19:35:18 turnstile kernel: [<c0440e72>] __do_softirq+0xb1/0x157
Aug  4 19:35:18 turnstile kernel: [<c0440f4e>] do_softirq+0x36/0x41
Aug  4 19:35:18 turnstile kernel: [<c0441041>] irq_exit+0x2e/0x61
Aug  4 19:35:18 turnstile kernel: [<c0404e3d>] do_IRQ+0x86/0x9a
Aug  4 19:35:18 turnstile kernel: [<c0403c90>] common_interrupt+0x30/0x38
Aug  4 19:35:18 turnstile kernel: [<c0617074>] ? acpi_idle_enter_simple+0x10f/0x142
Aug  4 19:35:18 turnstile kernel: [<c0616da5>] acpi_idle_enter_bm+0xc7/0x287
Aug  4 19:35:18 turnstile kernel: [<c06e785c>] cpuidle_idle_call+0x73/0xcb
Aug  4 19:35:18 turnstile kernel: [<c0402728>] cpu_idle+0x96/0xb2
Aug  4 19:35:18 turnstile kernel: [<c0782a1c>] rest_init+0x58/0x5a
Aug  4 19:35:18 turnstile kernel: [<c09da8dd>] start_kernel+0x33c/0x341
Aug  4 19:35:18 turnstile kernel: [<c09da09e>] i386_start_kernel+0x9e/0xa5
Aug  4 19:35:18 turnstile kernel: ---[ end trace 58bb370737c8d652 ]---

What's going on ?

Comment 1 Tru Huynh 2010-09-25 20:51:29 UTC
fix provided as http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commitdiff;h=78f1cd02457252e1ffbc6caa44a17424a45286b8 works fine to fix cold/warm boot issue

Comment 2 Akemi Yagi 2010-09-26 08:35:29 UTC
(In reply to comment #1)
> fix provided as
> http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commitdiff;h=78f1cd02457252e1ffbc6caa44a17424a45286b8
> works fine to fix cold/warm boot issue

Just for info: the fix is _not_ in the RHEL6beta2refresh kernel (2.6.32-44.2).

Comment 3 Akemi Yagi 2010-09-26 13:33:41 UTC
This report seems to be a duplicate of bugzilla #581654. If so, can we expect the patch to be in RHEL 5.6?

Comment 4 Ivan Vecera 2010-09-29 07:16:31 UTC
Yes, Akemi. The mentioned commit is already part of the patch that solves bug #581654 and _will_ be in RHEL 5.6. So I am closing this one as a duplicate.

*** This bug has been marked as a duplicate of bug 581654 ***

