+++ This bug was initially created as a clone of Bug #573201 +++

Description of problem:
When I soft-reboot my Fedora host, the MAC addresses change.

This machine is: Jetway J7F2 Fanless 1.2GHz Eden C7 Mini-ITX Motherboard & Jetway 3xGigaLAN Daughterboard Module

When cold booted, host MAC addresses:
eth0 00:30:18:AC:C8:D5
eth1 00:30:18:AC:C8:D6
eth2 00:30:18:AC:C8:D7
eth3 00:30:18:A6:11:E0
(the labels are as discovered on first install)

When soft-rebooted, MAC addresses are:
00:00:00:00:C8:D5
00:00:00:00:C8:D6
00:00:00:00:C8:D7
00:30:18:A6:11:E0

OS history - started life as an FC10 install last April, has since been upgraded to FC11, then FC12. First noticed with kernel-2.6.32.9-70 (2010-03-10), but kernel-2.6.32.9-67 also exhibited it once I knew to look.

I first thought it was bug 557771, but I now think the NICs are not being reset/initialised/queried properly following a soft reboot.

udev (145-15) went a bit mad and added pertinent rules to 70-persistent-net.rules which match the MAC addresses seen, but with these the driver seems ineffective: it can't see link state changes and tcpdump can't see a thing.
lspci, dmesg, udevinfo attached (cold and soft booted)

Version-Release number of selected component (if applicable):
Pertinent package history:
Jan 15 19:45:53 Updated: kernel-firmware-2.6.27.41-170.2.117.fc10.noarch
Jan 15 19:47:48 Installed: kernel-2.6.27.41-170.2.117.fc10.i686
Jan 15 19:47:57 Installed: kernel-2.6.27.41-170.2.117.fc10.i686
Feb 26 22:04:41 Erased: libudev0
Feb 26 22:48:05 Updated: kernel-firmware-2.6.30.10-105.2.23.fc11.noarch
Feb 26 22:50:10 Installed: libudev0-141-8.fc11.i586
Feb 26 22:54:58 Installed: udev-extras-20090226-0.5.20090302git.fc11.i586
Feb 26 22:55:37 Updated: udev-141-8.fc11.i586
Feb 26 23:00:05 Installed: kernel-2.6.30.10-105.2.23.fc11.i586
Feb 26 23:13:08 Installed: kernel-2.6.30.10-105.2.23.fc11.i586
Mar 03 20:24:47 Updated: kernel-firmware-2.6.31.12-174.2.22.fc12.noarch
Mar 03 20:26:10 Installed: libudev-145-15.fc12.i686
Mar 03 20:30:02 Installed: libgudev1-145-15.fc12.i686
Mar 03 20:33:39 Installed: udev-145-15.fc12.i686
Mar 03 20:46:43 Erased: libudev0
Mar 03 20:48:01 Erased: udev-extras
Mar 03 20:59:36 Installed: kernel-2.6.31.12-174.2.22.fc12.i686
Mar 06 09:08:15 Updated: kernel-firmware-2.6.32.9-67.fc12.noarch
Mar 06 09:12:28 Installed: kernel-2.6.32.9-67.fc12.i686
Mar 10 19:37:10 Updated: kernel-firmware-2.6.32.9-70.fc12.noarch
Mar 10 19:39:36 Installed: kernel-2.6.32.9-70.fc12.i686

How reproducible:
On demand.

Steps to Reproduce:
1. Reboot host - fails to recognise or initialise network devices.

Actual results:
NICs are discovered, but renumbered and not working.

Expected results:
No change.
Additional info:

--- Additional comment from gavdav.co.uk on 2010-03-13 06:56:32 EST ---

Created attachment 399829
dmesg of machine in fault condition

--- Additional comment from gavdav.co.uk on 2010-03-13 06:57:02 EST ---

Created attachment 399830
lspci -vvnn of machine in fault condition

--- Additional comment from gavdav.co.uk on 2010-03-13 06:57:34 EST ---

Created attachment 399832
eth3 udevinfo in fault condition

--- Additional comment from gavdav.co.uk on 2010-03-13 06:57:50 EST ---

Created attachment 399833
eth4 udevinfo in fault condition

--- Additional comment from gavdav.co.uk on 2010-03-13 06:58:06 EST ---

Created attachment 399834
eth5 udevinfo in fault condition

--- Additional comment from gavdav.co.uk on 2010-03-13 06:58:23 EST ---

Created attachment 399835
eth6 udevinfo in fault condition

--- Additional comment from gavdav.co.uk on 2010-03-13 06:58:54 EST ---

Created attachment 399836
dmesg of machine after cold boot

--- Additional comment from gavdav.co.uk on 2010-03-13 06:59:14 EST ---

Created attachment 399837
lspci -vvnn of machine after cold boot

--- Additional comment from gavdav.co.uk on 2010-03-13 06:59:39 EST ---

Created attachment 399838
eth0 udevinfo of machine after cold boot

--- Additional comment from gavdav.co.uk on 2010-03-13 07:00:04 EST ---

Created attachment 399839
eth1 udevinfo of machine after cold boot

--- Additional comment from gavdav.co.uk on 2010-03-13 07:00:26 EST ---

Created attachment 399840
eth2 udevinfo of machine after cold boot

--- Additional comment from gavdav.co.uk on 2010-03-13 07:00:44 EST ---

Created attachment 399841
eth3 udevinfo of machine after cold boot

--- Additional comment from mschmidt on 2010-03-15 18:16:11 EDT ---

So the low 4 bytes of the MAC address get zeroed on warm reboot for some reason.
2.6.32 writes the MAC address to the card on shutdown.
It calls rtl_rar_set() which does:
...
RTL_W32(MAC0, low);
RTL_W32(MAC4, high);
...
And the card loses the 'low' part. Wild guess - perhaps the card wants to have the high half written first.

Anyway, the writing of the MAC address on shutdown was introduced by the commit:

commit cc098dc705895f6b0109b7e8e026ac2b8ae1c0a1
Author: Ivan Vecera
Date: Sun Nov 29 23:12:52 2009 -0800

    r8169: restore mac addr in rtl8169_remove_one and rtl_shutdown

Gavin, could you test if reverting the commit would fix your problem?

Putting Ivan to CC... Ivan, any ideas?

--- Additional comment from gavdav.co.uk on 2010-03-15 18:36:59 EDT ---

Sure, which kernel do you need me to test?

--- Additional comment from mschmidt on 2010-03-16 05:23:22 EDT ---

I suggest you download the latest stable vanilla kernel, build it from source, and see if the bug is still reproducible with it. Then we can add some test patches to it.

But before you do that, let's try a couple of easy experiments first with your current Fedora kernel:
1. What happens when you remove the module (rmmod r8169) and then load it back (modprobe r8169)? Does the MAC address get corrupted in this case also?
2. What if you rmmod the module and then reboot? Is the MAC address read correctly then?

And just in case - you are not using kexec to reboot quickly, are you?

--- Additional comment from gavdav.co.uk on 2010-03-21 10:21:09 EDT ---

1. If I unload the driver and reload it, it comes back with the invalid MAC address (whether the machine was brought up in a working state, or soft-booted with the invalid MACs, before unloading the driver).
2. If I unload the driver and reboot, it comes back with the invalid MAC addresses.

Going to try to compile and build this:
http://www.kernel.org/pub/linux/kernel/v2.6/testing/linux-2.6.34-rc2.tar.bz2

--- Additional comment from mschmidt on 2010-03-22 12:21:40 EDT ---

Assuming you will manage to reproduce the problem with 2.6.34-rc2 ...
I'm going to attach two short patches for r8169 against 2.6.34-rc2 and I would like to know if any one of them helps. Do not try to apply both of them at the same time, they are mutually exclusive.

--- Additional comment from mschmidt on 2010-03-22 12:26:36 EDT ---

Created attachment 401815
write the high half of the MAC address first

With this patch I expect one of the following results after a cold start+rmmod+modprobe cycle:
1. The MAC address will appear correct.
2. The MAC address will be invalid, but in a different way than before. Perhaps the first 4 bytes will be good and the other 2 bytes will be zeros.
3. The MAC address will be invalid in the same way as before.

--- Additional comment from mschmidt on 2010-03-22 12:29:22 EDT ---

Created attachment 401816
flush after each write

This is the other patch that might help.

--- Additional comment from gavdav.co.uk on 2010-03-23 15:45:25 EDT ---

Urgh - I'm rusty. Can't get what I've built to boot yet and I'm not getting much time to look at it. Will persevere (the reason I use FC12 is so I don't need to build kernels by hand any more).

--- Additional comment from ivecera on 2010-03-29 06:10:03 EDT ---

Recent update in Dave's tree should fix this issue.
http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commitdiff;h=78f1cd02457252e1ffbc6caa44a17424a45286b8

--- Additional comment from mschmidt on 2010-03-29 06:17:43 EDT ---

Great, so it seems my original "wild guess" was correct.
Ivan, do you know if the patch is considered for -stable?

--- Additional comment from ivecera on 2010-03-29 06:36:16 EDT ---

AFAIK not yet but IMHO it should be

--- Additional comment from christian.robert07 on 2010-04-09 19:47:31 EDT ---

(In reply to comment #23)
> AFAIK not yet but IMHO it should be

I have the exact same bug with RedHat 5.5 (Tikanga) and the latest kernel, namely 2.6.18-194.el5

Xtian.
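
For readers following the register discussion above, here is a minimal sketch (in Python, not driver C) of how a six-byte MAC address splits into the two words that rtl_rar_set() writes to MAC0 and MAC4. The packing shown (first four octets in the low word, last two in the high word, both little-endian) is an assumption based on the r8169 source of that era, and `split_mac` and `join_mac` are hypothetical helper names used only for illustration. It shows that losing the low word reproduces the addresses reported after a soft reboot, and what outcome 2 of the "high half first" patch would look like.

```python
from typing import Tuple


def split_mac(mac: str) -> Tuple[int, int]:
    """Split 'aa:bb:cc:dd:ee:ff' into the (low, high) register words."""
    octets = bytes.fromhex(mac.replace(":", ""))
    if len(octets) != 6:
        raise ValueError("expected six octets, got %d" % len(octets))
    low = int.from_bytes(octets[:4], "little")   # value written to MAC0 (assumed layout)
    high = int.from_bytes(octets[4:], "little")  # value written to MAC4 (assumed layout)
    return low, high


def join_mac(low: int, high: int) -> str:
    """Rebuild the colon-separated MAC string from the two register words."""
    octets = low.to_bytes(4, "little") + (high & 0xFFFF).to_bytes(2, "little")
    return ":".join("%02x" % o for o in octets)


if __name__ == "__main__":
    low, high = split_mac("00:30:18:AC:C8:D5")
    print(hex(low), hex(high))
    print(join_mac(0, high))  # low word lost: the fault seen after soft reboot
    print(join_mac(low, 0))   # high word lost: the "different way" in outcome 2
```

With the cold-boot address of eth0 from this report, dropping the low word gives 00:00:00:00:c8:d5, which matches the soft-reboot value; dropping the high word instead would give 00:30:18:ac:00:00.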
--- Additional comment from ivecera on 2010-04-12 05:39:33 EDT ---

For RHEL, a new Bugzilla report should be opened. Christian, could you do it?

--- Additional comment from jflorian on 2010-05-02 13:41:17 EDT ---

I also have the exact same problem using an RTL8169sc/8110sc on an Asus A7V-333X mainboard and first noticed this with 2.6.32.11-99.fc12.i686.PAE. I tried updating with updates-testing enabled, which resulted in 2.6.32.12-114.fc12.i686.PAE being installed, but I see no improvement. I haven't reviewed the changelog to see if that should be expected or not.

--- Additional comment from gavdav.co.uk on 2010-05-04 16:09:13 EDT ---

I finally read the instructions :)

I have built and booted a 2.6.34-rc6 kernel, which still exhibits the problem.

The line numbers have changed, so applying the patches above fails:

# cat patch.401815 | patch -p1
patching file drivers/net/r8169.c
Hunk #1 FAILED at 2821.
1 out of 1 hunk FAILED -- saving rejects to file drivers/net/r8169.c.rej

Can you re-issue the same patches for 2.6.34-rc6 (and tell me how I should apply them properly)?
Dmesg information for the cards with this kernel soft booted:

kernel: r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
kernel: r8169 0000:00:09.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
kernel: r8169 0000:00:09.0: (unregistered net_device): no PCI Express capability
kernel: r8169 0000:00:09.0: eth0: RTL8169sc/8110sc at 0xf805e000, 00:00:00:00:c8:d5, XID 18000000 IRQ 18
kernel: r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
kernel: r8169 0000:00:0b.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
kernel: r8169 0000:00:0b.0: (unregistered net_device): no PCI Express capability
kernel: r8169 0000:00:0b.0: eth1: RTL8169sc/8110sc at 0xf80ac000, 00:00:00:00:c8:d6, XID 18000000 IRQ 19
kernel: r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
kernel: r8169 0000:00:0c.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
kernel: r8169 0000:00:0c.0: (unregistered net_device): no PCI Express capability
kernel: r8169 0000:00:0c.0: eth2: RTL8169sc/8110sc at 0xf8110000, 00:00:00:00:c8:d7, XID 18000000 IRQ 16
kernel: via-rhine.c:v1.10-LK1.4.3 2007-03-06 Written by Donald Becker
kernel: via-rhine: Broken BIOS detected, avoid_D3 enabled.
kernel: via-rhine 0000:00:12.0: PCI INT A -> Link[ALKD] -> GSI 23 (level, low) -> IRQ 23
kernel: eth3: VIA Rhine II at 0xfdffa000, 00:30:18:a6:11:e0, IRQ 23.
kernel: eth3: MII PHY found at address 1, status 0x7849 advertising 05e1 Link 0000.
kernel: udev: renamed network interface eth0 to eth6
kernel: udev: renamed network interface eth1 to eth5
kernel: udev: renamed network interface eth2 to eth4

--- Additional comment from gavdav.co.uk on 2010-05-04 17:43:21 EDT ---

OK, so I made the same edits by hand as per
http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commitdiff;h=78f1cd02457252e1ffbc6caa44a17424a45286b8
(because I couldn't work out how to generate a patch myself).

This has fixed the problem.

- Cold boot into 2.6.34-rc6. Devices report predicted MAC addresses (and work)
- Soft reboot into 2.6.34-rc6. Devices report predicted MAC addresses (and work)

dmesg of discovery:

r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:00:09.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
r8169 0000:00:09.0: (unregistered net_device): no PCI Express capability
r8169 0000:00:09.0: eth1: RTL8169sc/8110sc at 0xf810e000, 00:30:18:ac:c8:d5, XID 18000000 IRQ 18
r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:00:0b.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
r8169 0000:00:0b.0: (unregistered net_device): no PCI Express capability
r8169 0000:00:0b.0: eth2: RTL8169sc/8110sc at 0xf8112000, 00:30:18:ac:c8:d6, XID 18000000 IRQ 19
r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:00:0c.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
r8169 0000:00:0c.0: (unregistered net_device): no PCI Express capability
r8169 0000:00:0c.0: eth3: RTL8169sc/8110sc at 0xf8116000, 00:30:18:ac:c8:d7, XID 18000000 IRQ 16
r8169 0000:00:09.0: eth0: link up
r8169 0000:00:09.0: eth0: link up
r8169 0000:00:0b.0: eth1: link up
r8169 0000:00:0b.0: eth1: link up
r8169 0000:00:0c.0: eth2: link up

This appears to have fixed it - what other tests should I do?

--- Additional comment from mschmidt on 2010-05-13 02:41:00 EDT ---

The patch "r8169: fix broken register writes" and one more related patch ("r8169: more broken register writes workaround") are now included in the -stable release 2.6.32.13.

You can download an RPM for F-12 from Koji:
http://koji.fedoraproject.org/koji/buildinfo?buildID=173221
The build has not yet been submitted as a Fedora update, but eventually it (or a later one) will be.

--- Additional comment from ipilcher on 2010-05-19 10:05:47 EDT ---

I just ran into this on my Asterisk box. (The wife loves it when the phones stop working!) Seems to be fixed by 2.6.32.13-118.fc12.i686.PAE. More info available on request.

--- Additional comment from yves on 2010-08-01 02:46:11 EDT ---

*** Bug 583223 has been marked as a duplicate of this bug. ***

--- Additional comment from yves on 2010-08-01 02:55:26 EDT ---

I too am affected by this bug, but on Fedora 13, with kernel:
2.6.33.6-147.fc13.i686.PAE

Is there a way to flag this bug for Fedora 13 as well as 12?

--- Additional comment from cebbert on 2010-08-02 11:46:18 EDT ---

(In reply to comment #32)
> I too am affected by this bug, but on Fedora 13, with kernel:
> 2.6.33.6-147.fc13.i686.PAE
>

All of the r8169 patches that went into 2.6.32.13 also went into 2.6.33.4 at the same time. So why they would fix one kernel and not the other is a mystery.

> Is there a way to flag this bug for Fedora 13 as well as 12?

The only way to do that is to report the bug separately against each release, but you just closed the Fedora 13 bug as a duplicate of this one.

--- Additional comment from yves on 2010-08-02 12:01:04 EDT ---

>> Is there a way to flag this bug for Fedora 13 as well as 12?
>> The only way to do that is to report the bug separately against each release, but you just closed the Fedora 13 bug as a duplicate of this one.

Should I re-open it (or re-create another one)?

> All of the r8169 patches that went into 2.6.32.13 also went into 2.6.33.4 at the same time. So why they would fix one kernel and not the other is a mystery.

What can we do to make sure that the patches are applied to 2.6.33.*, as well as 2.6.34 and 2.6.35 (and the future new ones)?

--- Additional comment from gavdav.co.uk on 2010-08-04 15:03:51 EDT ---

This problem has re-appeared. It's not exactly the same: it happened after a cold boot and I had a fight getting it working. Interfaces have unique MAC addresses, but one or all of them remained unresponsive after a boot or a down/up.
Uname now: 2.6.32.16-141.fc12.i686

Picked this up in dmesg:

Aug 4 19:35:18 turnstile kernel: ------------[ cut here ]------------
Aug 4 19:35:18 turnstile kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0xc6/0x158()
Aug 4 19:35:18 turnstile kernel: Hardware name:
Aug 4 19:35:18 turnstile kernel: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Aug 4 19:35:18 turnstile kernel: Modules linked in: xt_multiport ipt_LOG act_nat ebt_dnat ebt_snat ebtable_nat ebtables iptable_nat nf_nat_snmp_basic nf_nat_proto_udplite nf_nat_sip nf_nat_irc nf_nat_ftp nf_nat_pptp nf_nat_proto_gre nf_nat_proto_dccp nf_nat_amanda nf_nat_proto_sctp libcrc32c nf_nat_tftp nf_nat_h323 nf_nat nf_conntrack_irc nf_conntrack_ftp nf_conntrack_netbios_ns ts_kmp nf_conntrack_amanda nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_sip nf_conntrack_netlink nfnetlink nf_conntrack_tftp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_sane nf_conntrack_proto_dccp nf_conntrack_h323 nf_conntrack_ipv6 ip6t_LOG ip6table_filter ip6_tables cpufreq_ondemand acpi_cpufreq sit tunnel4 ipv6 dm_multipath uinput i2c_viapro i2c_core via_rhine r8169 mii pata_acpi ata_generic firewire_ohci sata_via firewire_core crc_itu_t [last unloaded: scsi_wait_scan]
Aug 4 19:35:18 turnstile kernel: Pid: 0, comm: swapper Not tainted 2.6.32.16-141.fc12.i686 #1
Aug 4 19:35:18 turnstile kernel: Call Trace:
Aug 4 19:35:18 turnstile kernel: [<c043a779>] warn_slowpath_common+0x6a/0x81
Aug 4 19:35:18 turnstile kernel: [<c071b7e1>] ? dev_watchdog+0xc6/0x158
Aug 4 19:35:18 turnstile kernel: [<c043a7ce>] warn_slowpath_fmt+0x29/0x2c
Aug 4 19:35:18 turnstile kernel: [<c071b7e1>] dev_watchdog+0xc6/0x158
Aug 4 19:35:18 turnstile kernel: [<c04475ae>] ? __mod_timer+0x115/0x120
Aug 4 19:35:18 turnstile kernel: [<c04410f8>] ? local_bh_enable_ip+0xd/0xf
Aug 4 19:35:18 turnstile kernel: [<c0795303>] ? _spin_unlock_bh+0x13/0x15
Aug 4 19:35:18 turnstile kernel: [<f7e423cd>] ? fib6_run_gc+0xb7/0xbe [ipv6]
Aug 4 19:35:18 turnstile kernel: [<c0447277>] run_timer_softirq+0x16d/0x1f0
Aug 4 19:35:18 turnstile kernel: [<c071b71b>] ? dev_watchdog+0x0/0x158
Aug 4 19:35:18 turnstile kernel: [<c0440e72>] __do_softirq+0xb1/0x157
Aug 4 19:35:18 turnstile kernel: [<c0440f4e>] do_softirq+0x36/0x41
Aug 4 19:35:18 turnstile kernel: [<c0441041>] irq_exit+0x2e/0x61
Aug 4 19:35:18 turnstile kernel: [<c0404e3d>] do_IRQ+0x86/0x9a
Aug 4 19:35:18 turnstile kernel: [<c0403c90>] common_interrupt+0x30/0x38
Aug 4 19:35:18 turnstile kernel: [<c0617074>] ? acpi_idle_enter_simple+0x10f/0x142
Aug 4 19:35:18 turnstile kernel: [<c0616da5>] acpi_idle_enter_bm+0xc7/0x287
Aug 4 19:35:18 turnstile kernel: [<c06e785c>] cpuidle_idle_call+0x73/0xcb
Aug 4 19:35:18 turnstile kernel: [<c0402728>] cpu_idle+0x96/0xb2
Aug 4 19:35:18 turnstile kernel: [<c0782a1c>] rest_init+0x58/0x5a
Aug 4 19:35:18 turnstile kernel: [<c09da8dd>] start_kernel+0x33c/0x341
Aug 4 19:35:18 turnstile kernel: [<c09da09e>] i386_start_kernel+0x9e/0xa5
Aug 4 19:35:18 turnstile kernel: ---[ end trace 58bb370737c8d652 ]---

What's going on?
The fix provided as http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commitdiff;h=78f1cd02457252e1ffbc6caa44a17424a45286b8 works fine to fix the cold/warm boot issue.
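
One way to check a saved dmesg log for the symptom, before and after applying the fix, is to scan the r8169 probe lines for a MAC address whose first four octets are zero. The following is a minimal sketch under that assumption; `PROBE_RE` and `find_corrupted` are hypothetical names, and the pattern is written against the probe-line format quoted in this report, so it may not match other driver versions.

```python
import re
from typing import Iterable, List, Tuple

# Written against probe lines as quoted in this report, for example:
#   r8169 0000:00:09.0: eth0: RTL8169sc/8110sc at 0xf805e000, 00:00:00:00:c8:d5, XID 18000000 IRQ 18
PROBE_RE = re.compile(
    r"r8169 (?P<pci>[0-9a-f:.]+): (?P<ifname>eth\d+): .*?, "
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}),",
    re.IGNORECASE,
)


def find_corrupted(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (pci_address, interface, mac) for probe lines with a zeroed prefix."""
    hits = []
    for line in lines:
        match = PROBE_RE.search(line)
        if match is None:
            continue
        mac = match.group("mac").lower()
        # The fault described here leaves the first four octets as zero.
        if mac.startswith("00:00:00:00:"):
            hits.append((match.group("pci"), match.group("ifname"), mac))
    return hits


if __name__ == "__main__":
    sample = [
        "kernel: r8169 0000:00:09.0: eth0: RTL8169sc/8110sc at 0xf805e000, "
        "00:00:00:00:c8:d5, XID 18000000 IRQ 18",
        "r8169 0000:00:09.0: eth1: RTL8169sc/8110sc at 0xf810e000, "
        "00:30:18:ac:c8:d5, XID 18000000 IRQ 18",
    ]
    for pci, ifname, mac in find_corrupted(sample):
        print("%s (%s) reports suspicious MAC %s" % (ifname, pci, mac))
```

Only the first sample line is flagged; the second carries the full cold-boot address and passes.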
(In reply to comment #1)
> fix provided as
> http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commitdiff;h=78f1cd02457252e1ffbc6caa44a17424a45286b8
> works fine to fix cold/warm boot issue

Just for info: the fix is _not_ in the RHEL6beta2refresh kernel (2.6.32-44.2).
This report seems to be a duplicate of bugzilla #581654. If so, can we expect the patch to be in RHEL 5.6?
Yes, Akemi. The mentioned commit is already part of the patch that solves bug #581654 and _will_ be in RHEL 5.6. So I am closing this one as a duplicate.

*** This bug has been marked as a duplicate of bug 581654 ***