Description of problem:

At boot, the Intel igb card fails with:

[ 35.883590] igb 0000:04:00.0 enp4s0: PCIe link lost, device now detached
[ 35.891333] br0: port 1(enp4s0) entered blocking state
[ 35.891338] br0: port 1(enp4s0) entered disabled state
[ 35.891645] device enp4s0 entered promiscuous mode
[ 35.904155] igb 0000:04:00.0 enp4s0: failed to initialize vlan filtering on this port
[ 35.915012] br0: port 1(enp4s0) entered blocking state
[ 35.915017] br0: port 1(enp4s0) entered disabled state
[ 35.931059] igb 0000:04:00.0 enp4s0: failed to initialize vlan filtering on this port

It was suggested to me that this indicates a hardware failure. However this is unlikely, as simply reloading the igb module fixes the problem. I now have a script which does this after boot:

modprobe -r igb
sleep 1
modprobe igb
sleep 1
systemctl restart network

So it looks much more likely that the driver is just broken.

Version-Release number of selected component (if applicable):
Currently 4.11.0-0.rc4.git1.1.fc27.x86_64, but this has been happening since I bought the machine a year ago.

How reproducible: 100%

Steps to Reproduce:
1. Boot.
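For anyone reusing the reload workaround above, here is a minimal sketch that wraps the same five commands in a function. The driver name and service name are taken from the report; the `DRY_RUN` switch and the `run` and `reload_driver` names are hypothetical additions for illustration, so the sequence can be previewed without touching the system.

```shell
#!/bin/sh
# Sketch of the reload workaround from the report above.
# DRIVER and NET_SERVICE default to the values used in the report.
# DRY_RUN, run and reload_driver are hypothetical names for illustration.

DRIVER="${DRIVER:-igb}"
NET_SERVICE="${NET_SERVICE:-network}"

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Unload and reload the driver, then restart networking.
# Stops early if the module cannot be removed or loaded.
reload_driver() {
    run modprobe -r "$DRIVER" || return 1
    run sleep 1
    run modprobe "$DRIVER" || return 1
    run sleep 1
    run systemctl restart "$NET_SERVICE"
}
```

Calling `reload_driver` as root performs the reload; setting `DRY_RUN=1` first only prints the commands it would run.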
Hi Richard, is that the only output you got, or do you also have a splat like:

[ 471.537833] ------------[ cut here ]------------
[ 471.537849] igb: Failed to read reg 0x8!
[ 471.537904] WARNING: CPU: 1 PID: 9497 at drivers/net/ethernet/intel/igb/igb_main.c:756 igb_rd32.cold+0x30/0x3b [igb]
[...]
[ 471.538638] Call Trace:
[ 471.538654] igb_get_link_ksettings+0x20/0x200 [igb]
[ 471.538674] duplex_show+0x6e/0xc0
[ 471.538689] dev_attr_show+0x19/0x40
[ 471.538704] sysfs_kf_seq_show+0x9b/0xf0
[ 471.538720] seq_read+0xcd/0x400
[ 471.538734] vfs_read+0x9d/0x150
[ 471.538746] ksys_read+0x5f/0xe0
[ 471.538761] do_syscall_64+0x5f/0x1a0
[ 471.538776] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 471.538795] RIP: 0033:0x7ff5a09383c2
[ 471.538808] Code: c0 e9 c2 fe ff ff 50 48 8d 3d c2 0d 0a 00 e8 b5 f1 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[ 471.538862] RSP: 002b:00007ffe3e6fd9d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 471.538887] RAX: ffffffffffffffda RBX: 00000000021442e0 RCX: 00007ff5a09383c2
[ 471.538910] RDX: 0000000000001000 RSI: 000000000215a350 RDI: 0000000000000004
[ 471.538932] RBP: 00007ff5a0a0a300 R08: 0000000000000004 R09: 0000000000000070
[ 471.538955] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000021442e0
[ 471.538977] R13: 00007ff5a0a09700 R14: 0000000000000d68 R15: 0000000000000d68
[ 471.539000] ---[ end trace 0aea06ceef9e275e ]---

Have you already had the opportunity to try kernel 5.3.7-301.fc31 without your workaround? I found this commit, which touched that part of the code: 94bc1e522b32c866d85b5af0ede55026b585ae73. It may be relevant for you as well.
It still happens on this same hardware with every kernel I've tried since around 2016. This machine is using the Rawhide kernel. I don't know if there's something particular about 5.3.7-301.fc31, but nothing has changed with the latest Rawhide kernel (5.4.0-0.rc6.git0.1.fc32.x86_64). In case I missed something, I will attach the complete log.
Created attachment 1633038 [details] dmesg
Relevant dmesg output:

[31370.350858] ------------[ cut here ]------------
[31370.350859] igc: Failed to read reg 0xc030!
[31370.350888] WARNING: CPU: 1 PID: 76852 at drivers/net/ethernet/intel/igc/igc_main.c:6641 igc_rd32+0x8d/0xa0 [igc]
[31370.350897] Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc binfmt_misc vfat fat iwlmvm snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg mac80211 snd_usb_audio snd_intel_sdw_acpi intel_rapl_msr snd_hda_codec intel_rapl_common snd_usbmidi_lib snd_ump libarc4 edac_mce_amd snd_rawmidi snd_hda_core btusb mc xfs btrtl snd_hwdep iwlwifi kvm_amd btintel snd_seq btbcm asus_nb_wmi eeepc_wmi snd_seq_device asus_wmi btmtk ledtrig_audio snd_pcm kvm uas cfg80211 sparse_keymap bluetooth irqbypass snd_timer platform_profile usb_storage pcspkr rapl wmi_bmof joydev snd i2c_piix4 k10temp rfkill soundcore gpio_amdpt gpio_generic loop zram amdgpu i2c_algo_bit drm_ttm_helper ttm crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni drm_exec polyval_generic
[31370.350949] drm_suballoc_helper amdxcp nvme drm_buddy ghash_clmulni_intel gpu_sched sha512_ssse3 sha256_ssse3 sha1_ssse3 ccp nvme_core drm_display_helper sp5100_tco igc cec nvme_common video wmi ip6_tables ip_tables fuse
[31370.350964] CPU: 1 PID: 76852 Comm: kworker/1:0 Not tainted 6.6.9-200.fc39.x86_64 #1
[31370.350966] Hardware name: ASUS System Product Name/ROG STRIX X670E-E GAMING WIFI, BIOS 1709 09/28/2023
[31370.350968] Workqueue: events igc_watchdog_task [igc]
[31370.350974] RIP: 0010:igc_rd32+0x8d/0xa0 [igc]
[31370.350979] Code: 48 c7 c6 58 29 3c c0 e8 f1 5b 9c c4 48 8b bb 28 ff ff ff e8 b5 52 55 c4 84 c0 74 bc 89 ee 48 c7 c7 80 29 3c c0 e8 63 08 d7 c3 <0f> 0b eb aa b8 ff ff ff ff e9 15 83 c5 c4 0f 1f 44 00 00 90 90 90
[31370.350981] RSP: 0018:ffffc90021affdc8 EFLAGS: 00010286
[31370.350983] RAX: 0000000000000000 RBX: ffff88810fc3ccb8 RCX: 0000000000000027
[31370.350984] RDX: ffff88883e461588 RSI: 0000000000000001 RDI: ffff88883e461580
[31370.350985] RBP: 000000000000c030 R08: 0000000000000000 R09: ffffc90021affc50
[31370.350986] R10: 0000000000000003 R11: ffffffff86346508 R12: ffff88810fc3c000
[31370.350988] R13: 0000000000000000 R14: ffff88810bdb8d40 R15: 000000000000c030
[31370.350989] FS: 0000000000000000(0000) GS:ffff88883e440000(0000) knlGS:0000000000000000
[31370.350990] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[31370.350992] CR2: 00001ad003e86000 CR3: 0000000375222000 CR4: 0000000000f50ee0
[31370.350993] PKRU: 55555554
[31370.350994] Call Trace:
[31370.350996] <TASK>
[31370.350997] ? igc_rd32+0x8d/0xa0 [igc]
[31370.351003] ? __warn+0x81/0x130
[31370.351008] ? igc_rd32+0x8d/0xa0 [igc]
[31370.351015] ? report_bug+0x171/0x1a0
[31370.351018] ? prb_read_valid+0x1b/0x30
[31370.351021] ? srso_alias_return_thunk+0x5/0x7f
[31370.351025] ? handle_bug+0x3c/0x80
[31370.351027] ? exc_invalid_op+0x17/0x70
[31370.351029] ? asm_exc_invalid_op+0x1a/0x20
[31370.351034] ? igc_rd32+0x8d/0xa0 [igc]
[31370.351039] ? igc_rd32+0x8d/0xa0 [igc]
[31370.351044] igc_update_stats+0x8a/0x6d0 [igc]
[31370.351050] igc_watchdog_task+0x9d/0x4a0 [igc]
[31370.351056] process_one_work+0x171/0x340
[31370.351060] worker_thread+0x27b/0x3a0
[31370.351063] ? __pfx_worker_thread+0x10/0x10
[31370.351064] kthread+0xe5/0x120
[31370.351068] ? __pfx_kthread+0x10/0x10
[31370.351070] ret_from_fork+0x31/0x50
[31370.351074] ? __pfx_kthread+0x10/0x10
[31370.351076] ret_from_fork_asm+0x1b/0x30
[31370.351081] </TASK>
[31370.351082] ---[ end trace 0000000000000000 ]---

This is on an AMD Ryzen 7 7700 8-Core Processor running Fedora 39 (6.6.9-200.fc39.x86_64).

As said previously, reloading the driver via a PCI remove and rescan (echo 1 > /sys/bus/pci/devices/<deviceId>/remove && sleep 1 && echo 1 > /sys/bus/pci/rescan) "fixes" the problem, and it does not seem to happen again afterwards (although it is seemingly random, so I cannot be 100% sure that it does not recur).
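The remove-and-rescan workaround from this comment can be sketched as a small function. This is an illustration under stated assumptions, not a tested fix: `pci_reset_device` and the `SYSFS_ROOT` override are hypothetical names (the override exists only so the logic can be tried against a fake directory tree), and the sketch uses the bus-wide `/sys/bus/pci/rescan` file because the per-device directory disappears once the device is removed.

```shell
#!/bin/sh
# Sketch: detach a PCI device and ask the kernel to rediscover it.
# SYSFS_ROOT and pci_reset_device are hypothetical names for illustration.
# On a real system SYSFS_ROOT stays at /sys and the function needs root.

SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

pci_reset_device() {
    addr="$1"    # full PCI address, e.g. 0000:0a:00.0
    dev="$SYSFS_ROOT/bus/pci/devices/$addr"

    if [ ! -e "$dev/remove" ]; then
        echo "no such PCI device: $addr" >&2
        return 1
    fi

    # Writing 1 to "remove" detaches the device from the kernel.
    echo 1 > "$dev/remove" || return 1
    sleep 1
    # The per-device directory is gone after removal, so trigger the
    # bus-wide rescan to have the device detected and probed again.
    echo 1 > "$SYSFS_ROOT/bus/pci/rescan"
}
```

With the address from the log above this would be called as `pci_reset_device 0000:0a:00.0`; the address differs per machine and can be read from the "PCIe link lost" message.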
The vendor (ASUS in my case) has published new firmware, which seems to have resolved the issue.
Just ran into this myself; it seems random. Oddly, I'm running the same AM5 motherboard as Adrian14:

Jun 24 16:44:29 fedora kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
Jun 24 16:44:29 fedora kernel: ------------[ cut here ]------------
Jun 24 16:44:29 fedora kernel: igc: Failed to read reg 0xc030!
Jun 24 16:44:29 fedora kernel: WARNING: CPU: 30 PID: 39020 at drivers/net/ethernet/intel/igc/igc_main.c:6644 igc_rd32+0x88/0xa0 [igc]
Jun 24 16:44:29 fedora kernel: Modules linked in: rfcomm snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chai>
Jun 24 16:44:29 fedora kernel: amdxcp i2c_algo_bit drm_ttm_helper ttm crct10dif_pclmul drm_exec crc32_pclmul crc32c_intel gpu_sched polyval_clmulni polyval_generic drm_suballoc_helper drm_buddy nvme ghash_clmulni_intel drm_display_helper sha512_sss>
Jun 24 16:44:29 fedora kernel: CPU: 30 PID: 39020 Comm: kworker/30:0 Tainted: P OEL 6.9.4-200.fc40.x86_64 #1
Jun 24 16:44:29 fedora kernel: Hardware name: ASUS System Product Name/ROG STRIX X670E-E GAMING WIFI, BIOS 1807 12/14/2023
Jun 24 16:44:29 fedora kernel: Workqueue: events igc_watchdog_task [igc]
Jun 24 16:44:29 fedora kernel: RIP: 0010:igc_rd32+0x88/0xa0 [igc]
Jun 24 16:44:29 fedora kernel: Code: 48 c7 c6 d8 28 7d c1 e8 86 ed a0 d7 48 8b bd 28 ff ff ff e8 ba e4 21 d7 84 c0 74 c5 89 de 48 c7 c7 00 29 7d c1 e8 a8 58 9b d6 <0f> 0b eb b3 83 c8 ff e9 d7 69 b9 d7 66 66 2e 0f 1f 84 00 00 00 00
Jun 24 16:44:29 fedora kernel: RSP: 0018:ffffa6934fb5fdc8 EFLAGS: 00010282
Jun 24 16:44:29 fedora kernel: RAX: 0000000000000000 RBX: 000000000000c030 RCX: 0000000000000027
Jun 24 16:44:29 fedora kernel: RDX: ffff97683e7218c8 RSI: 0000000000000001 RDI: ffff97683e7218c0
Jun 24 16:44:29 fedora kernel: RBP: ffff97514c0c2ce8 R08: 0000000000000000 R09: 6765722064616572
Jun 24 16:44:29 fedora kernel: R10: ffffa6934fb5fb88 R11: 696146203a636769 R12: 0000000000000000
Jun 24 16:44:29 fedora kernel: R13: 0000000000000000 R14: ffff975158a4b3c0 R15: 000000000000c030
Jun 24 16:44:29 fedora kernel: FS: 0000000000000000(0000) GS:ffff97683e700000(0000) knlGS:0000000000000000
Jun 24 16:44:29 fedora kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 24 16:44:29 fedora kernel: CR2: 00007fbe96604000 CR3: 0000001473428000 CR4: 0000000000f50ef0
Jun 24 16:44:29 fedora kernel: PKRU: 55555554
Jun 24 16:44:29 fedora kernel: Call Trace:
Jun 24 16:44:29 fedora kernel: <TASK>
Jun 24 16:44:29 fedora kernel: ? igc_rd32+0x88/0xa0 [igc]
Jun 24 16:44:29 fedora kernel: ? __warn.cold+0x8e/0xe8
Jun 24 16:44:29 fedora kernel: ? igc_rd32+0x88/0xa0 [igc]
Jun 24 16:44:29 fedora kernel: ? report_bug+0xff/0x140
Jun 24 16:44:29 fedora kernel: ? console_unlock+0x84/0x130
Jun 24 16:44:29 fedora kernel: ? handle_bug+0x3c/0x80
Jun 24 16:44:29 fedora kernel: ? exc_invalid_op+0x17/0x70
Jun 24 16:44:29 fedora kernel: ? asm_exc_invalid_op+0x1a/0x20
Jun 24 16:44:29 fedora kernel: ? igc_rd32+0x88/0xa0 [igc]
Jun 24 16:44:29 fedora kernel: ? igc_rd32+0x88/0xa0 [igc]
Jun 24 16:44:29 fedora kernel: igc_update_stats+0x97/0x760 [igc]
Jun 24 16:44:29 fedora kernel: igc_watchdog_task+0xa3/0x2d0 [igc]
Jun 24 16:44:29 fedora kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Jun 24 16:44:29 fedora kernel: process_one_work+0x186/0x340
Jun 24 16:44:29 fedora kernel: worker_thread+0x278/0x3b0
Jun 24 16:44:29 fedora kernel: ? __pfx_worker_thread+0x10/0x10
Jun 24 16:44:29 fedora kernel: kthread+0xcf/0x100
Jun 24 16:44:29 fedora kernel: ? __pfx_kthread+0x10/0x10
Jun 24 16:44:29 fedora kernel: ret_from_fork+0x31/0x50
Jun 24 16:44:29 fedora kernel: ? __pfx_kthread+0x10/0x10
Jun 24 16:44:29 fedora kernel: ret_from_fork_asm+0x1a/0x30
Jun 24 16:44:29 fedora kernel: </TASK>
Jun 24 16:44:29 fedora kernel: ---[ end trace 0000000000000000 ]---

Currently on Fedora 40, with an AMD Ryzen 9 7950x. I'm on firmware 1807, the version after the one in Adrian14's report (1709). Removing the device, doing a PCIe rescan, and then reloading the NIC tends to bring it back up, but it's super annoying.

@adrian14 what firmware did you move to that solved this for you?
(In reply to Tim Montgomery from comment #6)
> [...]
> @adrian14 what firmware did you move to that solved this for you?

It doesn't. It *appeared* to work, but the problem started again. (I think there's new firmware available; I'm on 2007, from 04/12/2024, and yes, it still happens, though less frequently.) Now it seems to happen at most once per session (i.e., between power off and power on), and only when the machine has been on for an undefined "while". I can go days (wayyyy!!!) without the issue, but it does happen.

As said above, I just reload the module when the message appears in the kernel log (dmesg), so I am checking dmesg every minute (I could make the interval shorter, but meh).

Some Google-fu suggests this is related to returning from a power-saving state on the PCI bus, so I was thinking of using udev to "blacklist" PCI power saving for all Intel devices on the PCI bus. I haven't done so: the log scraping is generally working for me, and I'd rather not waste power. Sorry.
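The two ideas in this comment, scraping the kernel log for the failure message and keeping Intel PCI devices out of power saving, might look roughly like the sketch below. Everything here is an assumption for illustration: the function names and the `SYSFS_ROOT` override are hypothetical, and writing `on` to a device's `power/control` file only disables runtime power management. Whether runtime PM is actually the mechanism behind the lost link is not established in this thread, and PCIe ASPM is a separate setting that this sketch does not touch.

```shell
#!/bin/sh
# Sketch of the log-scraping and power-saving ideas from the comment above.
# Function names and SYSFS_ROOT are hypothetical; run as root on a real system.

SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Read kernel log text on stdin and print the PCI address of every igb/igc
# device reported as "PCIe link lost, device now detached", once each.
detached_devices() {
    sed -n 's/.*ig[bc] \([0-9a-f:.]*\) [^ ]*: PCIe link lost, device now detached.*/\1/p' | sort -u
}

# Set runtime power management to "on" (never runtime-suspend) for every
# PCI device whose vendor ID is Intel's (0x8086).
disable_intel_runtime_pm() {
    for dev in "$SYSFS_ROOT"/bus/pci/devices/*; do
        [ -r "$dev/vendor" ] || continue
        if [ "$(cat "$dev/vendor")" = "0x8086" ]; then
            echo on > "$dev/power/control"
        fi
    done
}
```

A once-a-minute check could then pipe `dmesg` into `detached_devices` and act on any address it prints. Note that old messages stay in the kernel log, so a real watcher would need to remember which occurrences it has already handled.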