Bug 1442638 - igb ... PCIe link lost, device now detached, but reloading the igb module fixes things
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2017-04-16 21:59 UTC by Richard W.M. Jones
Modified: 2019-11-05 18:16 UTC (History)

Type: Bug


Attachments
dmesg (91.89 KB, text/plain)
2019-11-05 18:16 UTC, Richard W.M. Jones

Description Richard W.M. Jones 2017-04-16 21:59:06 UTC
Description of problem:

At boot, the Intel igb card fails with:

[   35.883590] igb 0000:04:00.0 enp4s0: PCIe link lost, device now detached
[   35.891333] br0: port 1(enp4s0) entered blocking state
[   35.891338] br0: port 1(enp4s0) entered disabled state
[   35.891645] device enp4s0 entered promiscuous mode
[   35.904155] igb 0000:04:00.0 enp4s0: failed to initialize vlan filtering on this port
[   35.915012] br0: port 1(enp4s0) entered blocking state
[   35.915017] br0: port 1(enp4s0) entered disabled state
[   35.931059] igb 0000:04:00.0 enp4s0: failed to initialize vlan filtering on this port

It was suggested to me that this indicates a hardware failure.
However, that seems unlikely, since simply reloading the igb module
fixes the problem.  I now have a script which does this after boot:

modprobe -r igb
sleep 1
modprobe igb
sleep 1
systemctl restart network

So it looks much more likely that the driver is just broken.
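
A slightly more defensive variant of the script above, as a sketch only: it
reloads the module only when the detach message from the log is present.
The names igb_link_lost, run and reload_igb and the DRY_RUN variable are
made up for illustration; the commands are the ones from the workaround.

```shell
#!/bin/sh
# Hypothetical helpers; the matched text is the message quoted in this report.

# Succeeds if the kernel log text on stdin contains the igb detach message.
igb_link_lost() {
    grep -q 'igb .*PCIe link lost, device now detached'
}

# Runs a command, or only prints it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Same sequence as the workaround: unload, reload, restart networking.
reload_igb() {
    run modprobe -r igb
    run sleep 1
    run modprobe igb
    run sleep 1
    run systemctl restart network
}

# Example use at boot (needs root):
#   dmesg | igb_link_lost && reload_igb
```

Setting DRY_RUN=1 before calling reload_igb prints the commands instead of
running them, which is handy for checking the script on a working machine.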

Version-Release number of selected component (if applicable):

Currently 4.11.0-0.rc4.git1.1.fc27.x86_64, but this has
been happening since I bought the machine a year ago.

How reproducible:

100%

Steps to Reproduce:
1. Boot.

Comment 1 Andrea Perotti 2019-11-05 15:44:43 UTC
Hi Richard, is that the only output you got, or do you also have a splat like:

[  471.537833] ------------[ cut here ]------------
[  471.537849] igb: Failed to read reg 0x8!
[  471.537904] WARNING: CPU: 1 PID: 9497 at drivers/net/ethernet/intel/igb/igb_main.c:756 igb_rd32.cold+0x30/0x3b [igb]
[...]
[  471.538638] Call Trace:
[  471.538654]  igb_get_link_ksettings+0x20/0x200 [igb]
[  471.538674]  duplex_show+0x6e/0xc0
[  471.538689]  dev_attr_show+0x19/0x40
[  471.538704]  sysfs_kf_seq_show+0x9b/0xf0
[  471.538720]  seq_read+0xcd/0x400
[  471.538734]  vfs_read+0x9d/0x150
[  471.538746]  ksys_read+0x5f/0xe0
[  471.538761]  do_syscall_64+0x5f/0x1a0
[  471.538776]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  471.538795] RIP: 0033:0x7ff5a09383c2
[  471.538808] Code: c0 e9 c2 fe ff ff 50 48 8d 3d c2 0d 0a 00 e8 b5 f1 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[  471.538862] RSP: 002b:00007ffe3e6fd9d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[  471.538887] RAX: ffffffffffffffda RBX: 00000000021442e0 RCX: 00007ff5a09383c2
[  471.538910] RDX: 0000000000001000 RSI: 000000000215a350 RDI: 0000000000000004
[  471.538932] RBP: 00007ff5a0a0a300 R08: 0000000000000004 R09: 0000000000000070
[  471.538955] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000021442e0
[  471.538977] R13: 00007ff5a0a09700 R14: 0000000000000d68 R15: 0000000000000d68
[  471.539000] ---[ end trace 0aea06ceef9e275e ]---

Have you already had the opportunity to try kernel 5.3.7-301.fc31 without your workaround?
I found a commit that touched that part of the code, 94bc1e522b32c866d85b5af0ede55026b585ae73,
which may be relevant for you as well.
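
To check a saved kernel log for that warning, something like the following
could be used. It is only a sketch: the function name igb_failed_regs is
made up here, and the pattern is based solely on the "Failed to read reg"
line quoted above.

```shell
#!/bin/sh
# Hypothetical helper: prints the register offset from every
# "igb: Failed to read reg 0x...!" line found on stdin.
igb_failed_regs() {
    sed -n 's/.*igb: Failed to read reg \(0x[0-9a-fA-F]*\)!.*/\1/p'
}

# Example use:
#   dmesg | igb_failed_regs
```

No output means the warning did not appear in the log that was checked.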

Comment 2 Richard W.M. Jones 2019-11-05 18:15:32 UTC
It still happens on this same hardware with every kernel I've tried since around 2016.
This machine is using the Rawhide kernel.  I don't know if there's something
particular about 5.3.7-301.fc31, but there is no change with the latest Rawhide
kernel (5.4.0-0.rc6.git0.1.fc32.x86_64).  In case I missed something, I will attach
the complete log.

Comment 3 Richard W.M. Jones 2019-11-05 18:16:13 UTC
Created attachment 1633038
dmesg
