Bug 1412426
Summary: | Ethernet driver e1000e gets confused on ThinkPad T450s on suspend/resume | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Bojan Smojver <bojan> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Priority: | unspecified |
Version: | 25 | Hardware: | x86_64 |
OS: | Linux | Type: | Bug |
Last Closed: | 2017-02-04 06:54:04 UTC | CC: | cz172638, gansalmon, ichavero, itamar, jonathan, kernel-maint, labbott, madhu.chinakonda, mchehab |
Description
Bojan Smojver
2017-01-12 02:00:08 UTC
This only happens with 4.9.2, but not with 4.8.16:

Jan 11 19:18:31 <machine> kernel: e1000e 0000:00:19.0 enp0s25: 10/100 speed: disabling TSO

Ditto 4.9.3.

Can you do a bisect to determine which kernel commit broke suspend for you? That's going to be the fastest way to find the issue.

(In reply to Laura Abbott from comment #3)
> Can you do a bisect to determine which kernel commit broke suspend for you?
> That's going to be the fastest way to find the issue.

I can give it a try. May take a while. Keep you posted.

(In reply to Laura Abbott from comment #3)
> Can you do a bisect to determine which kernel commit broke suspend for you?

Bisect says:

--------------------------
commit e6dce825fba05f447bd22c865e27233182ab3d79
Merge: 9929780 08bf215
Author: Linus Torvalds <torvalds>
Date: Mon Oct 3 20:11:49 2016 -0700

    Merge tag 'tty-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

    Pull tty and serial updates from Greg KH:
    "Here is the big tty and serial patch set for 4.9-rc1.

    It also includes some drivers/dma/ changes, as those were needed by some serial drivers, and they were all acked by the DMA maintainer.

    Also in here is the long-suffering ACPI SPCR patchset, which was passed around from maintainer to maintainer like a hot-potato. Seems I was the sucker^Wlucky one. All of those patches have been acked by the various subsystem maintainers as well.
    All of this has been in linux-next with no reported issues"

    * tag 'tty-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (111 commits)
      Revert "serial: pl011: add console matching function"
      MAINTAINERS: update entry for atmel_serial driver
      serial: pl011: add console matching function
      ARM64: ACPI: enable ACPI_SPCR_TABLE
      ACPI: parse SPCR and enable matching console
      of/serial: move earlycon early_param handling to serial
      Revert "drivers/tty: Explicitly pass current to show_stack"
      tty: amba-pl011: Don't complain on -EPROBE_DEFER when no irq
      nios2: dts: 10m50: Add tx-threshold parameter
      serial: 8250: Set Altera 16550 TX FIFO Threshold
      serial: 8250: of: Load TX FIFO Threshold from DT
      Documentation: dt: serial: Add TX FIFO threshold parameter
      drivers/tty: Explicitly pass current to show_stack
      serial: imx: Fix DCD reading
      serial: stm32: mark symbols static where possible
      serial: xuartps: Add some register initialisation to cdns_early_console_setup()
      serial: xuartps: Removed unwanted checks while reading the error conditions
      serial: xuartps: Rewrite the interrupt handling logic
      serial: stm32: use mapbase instead of membase for DMA
      tty/serial: atmel: fix fractional baud rate computation
      ...
--------------------------

No idea what tty would have to do with e1000e, but I don't know kernel...

Anyhow, just to be sure, I double checked that revision and indeed my ethernet adapter is stuck in 10 Mb/s mode after resume and the network stops working.

In better news, I also compiled the tip as of about a couple of hours ago (i.e. just before 4.10-rc4), at c92816275674c1491ce228ee49aa030a5fa1be04 and that seems to have this issue fixed. What fixed it, I don't know yet.

That's good to hear the tip is working. Unfortunately I agree that a tty merge commit is weirdly unrelated to your e1000e suspend. It may just be a timing issue. If you can find a single commit from 4.9 to 4.10-rc4 that fixes it for you, we can evaluate if a backport is appropriate.
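A note for anyone scripting a bisect like this one: `git bisect run` judges each revision by a script's exit status (0 means good/old, 1 to 127 except 125 means bad/new, and 125 means the revision cannot be tested). The sketch below is a hypothetical helper and not something used in this report. It assumes the link speed was sampled over several suspend/resume cycles, that 1000 Mb/s is the expected speed, and that any slower or missing reading counts as a failure. The names `bisect_exit_code`, `expected_mbps` and `hunting_fix` are invented for illustration.

```python
GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def bisect_exit_code(speeds_mbps, expected_mbps=1000, hunting_fix=False):
    """Map link speeds seen after several resume cycles to a bisect verdict.

    speeds_mbps: one reading per suspend/resume cycle; None means no link.
    hunting_fix: True when searching for the commit that FIXED the problem,
                 so the meaning of good and bad is swapped.
    """
    if not speeds_mbps:
        return SKIP  # nothing measured, so this revision cannot be judged
    # A single slow or missing reading is enough to call the revision broken.
    broken = any(s is None or s < expected_mbps for s in speeds_mbps)
    if hunting_fix:
        return GOOD if broken else BAD
    return BAD if broken else GOOD


if __name__ == "__main__":
    # Hypothetical readings: the third resume fell back to 10 Mb/s.
    print(bisect_exit_code([1000, 1000, 10]))
    print(bisect_exit_code([1000, 1000, 10], hunting_fix=True))
```

Sampling more than one cycle per revision is a design choice for the case where the failure does not show up on every resume. A revision can still be misjudged if the fault happens not to appear in any of the sampled cycles.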
(In reply to Laura Abbott from comment #7)
> If you can find a single commit from 4.9 to 4.10-rc4 that
> fixes it for you, we can evaluate if a backport is appropriate.

I will give it a try by reversing good/bad meaning in bisect, but I don't have a lot of confidence in that process given my last attempt, to be perfectly honest.

In the meantime, I'll run F25 4.9.x series on my T450s, just to make sure everything else is working as expected. I can live without suspend for a few months.

In the meantime, if I had to guess, I'd say maybe:

-----------------
commit 311191297125156319be8f86d546ea1c569f7e95
Author: WANG Cong <xiyou.wangcong>
Date: Sat Dec 10 14:22:42 2016 -0800

    e1000: use disable_hardirq() for e1000_netpoll()

    In commit 02cea3958664 ("genirq: Provide disable_hardirq()") Peter introduced disable_hardirq() for netpoll, but it is forgotten to use it for e1000.

    This patch changes disable_irq() to disable_hardirq() for e1000.

    Reported-by: Dave Jones <davej.uk>
    Suggested-by: Sabrina Dubroca <sd>
    Cc: Peter Zijlstra (Intel) <peterz>
    Cc: Jeff Kirsher <jeffrey.t.kirsher>
    Signed-off-by: Cong Wang <xiyou.wangcong>
    Signed-off-by: David S. Miller <davem>
-----------------

Absolutely no idea whether it could even be related to my problems, but this commit is not in stable.

The only other one not in stable is:

-----------------
commit 91c527a55664ddf4bee26673a35f91748dae4142
Author: Jarod Wilson <jarod>
Date: Mon Oct 17 15:54:05 2016 -0400

    ethernet/intel: use core min/max MTU checking

    e100: min_mtu 68, max_mtu 1500
    - remove e100_change_mtu entirely, is identical to old eth_change_mtu, and no longer serves a purpose. No need to set min_mtu or max_mtu explicitly, as ether_setup() will already set them to 68 and 1500.
    e1000: min_mtu 46, max_mtu 16110

    e1000e: min_mtu 68, max_mtu varies based on adapter

    fm10k: min_mtu 68, max_mtu 15342
    - remove fm10k_change_mtu entirely, does nothing now

    i40e: min_mtu 68, max_mtu 9706

    i40evf: min_mtu 68, max_mtu 9706

    igb: min_mtu 68, max_mtu 9216
    - There are two different "max" frame sizes claimed and both checked in the driver, the larger value wasn't relevant though, so I've set max_mtu to the smaller of the two values here to retain identical behavior.

    igbvf: min_mtu 68, max_mtu 9216
    - Same issue as igb duplicated

    ixgb: min_mtu 68, max_mtu 16114
    - Also remove pointless old == new check, as that's done in dev_set_mtu

    ixgbe: min_mtu 68, max_mtu 9710

    ixgbevf: min_mtu 68, max_mtu dependent on hardware/firmware
    - Some hw can only handle up to max_mtu 1504 on a vf, others 9710

    CC: netdev.org
    CC: intel-wired-lan.org
    CC: Jeff Kirsher <jeffrey.t.kirsher>
    Signed-off-by: Jarod Wilson <jarod>
    Signed-off-by: David S. Miller <davem>
-----------------

Neither commit from comment #9 fixes it, so red herring. It must be some other part of the kernel.

According to my totally untrustworthy bisect between 4.9 and 4.10-rc4, this is what fixed the problem (note: bad means good here):

----------------------
a5bc01949e3b19d8a23b5eabc6fc71bb50dc820e is the first bad commit
commit a5bc01949e3b19d8a23b5eabc6fc71bb50dc820e
Author: Johan Hovold <johan>
Date: Tue Jan 3 16:39:58 2017 +0100

    USB: serial: omninet: fix NULL-derefs at open and disconnect

    Fix NULL-pointer dereferences at open() and disconnect() should the device lack the expected bulk-out endpoints:

    Unable to handle kernel NULL pointer dereference at virtual address 000000b4
    ...
    [<c0170ff0>] (__lock_acquire) from [<c0172f00>] (lock_acquire+0x108/0x264)
    [<c0172f00>] (lock_acquire) from [<c06a5090>] (_raw_spin_lock_irqsave+0x58/0x6c)
    [<c06a5090>] (_raw_spin_lock_irqsave) from [<c0470684>] (tty_port_tty_set+0x28/0xa4)
    [<c0470684>] (tty_port_tty_set) from [<bf08d384>] (omninet_open+0x30/0x40 [omninet])
    [<bf08d384>] (omninet_open [omninet]) from [<bf07c118>] (serial_port_activate+0x68/0x98 [usbserial])

    Unable to handle kernel NULL pointer dereference at virtual address 00000234
    ...
    [<bf01f418>] (omninet_disconnect [omninet]) from [<bf0016c0>] (usb_serial_disconnect+0xe4/0x100 [usbserial])

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Cc: stable <stable.org>
    Signed-off-by: Johan Hovold <johan>

:040000 040000 4558327df56b19a98b8e2a23fc4928017f069855 b9b4e443b280b510e6d3f890912729752cfdcf57 M drivers
----------------------

I completely don't believe it, but I have no idea what I'm talking about anyway. So, hope it helps, but I won't hold my breath. :-)

(In reply to Bojan Smojver from comment #11)
> According to my totally untrustworthy bisect between 4.9 and 4.10-rc4, this
> is what fixed the problem (note: bad means good here):
> ----------------------
> a5bc01949e3b19d8a23b5eabc6fc71bb50dc820e is the first bad commit
> commit a5bc01949e3b19d8a23b5eabc6fc71bb50dc820e
> Author: Johan Hovold <johan>
> Date: Tue Jan 3 16:39:58 2017 +0100

Absolutely no chance of this being the fix, given it's already in 4.9.3.

I'm thinking this must be a weirdly intermittent issue on my hardware and my testing efforts after each bisect were just not hitting the issue consistently. So, my claim that this was fixed in the tip is also probably false.

Have you tried installing the latest firmware updates from koji? The last fedora firmware updates were September '16 and are missing some drivers now required by the 4.9 kernels. The latest, dated December '16, have been pushed to stable.
Download from here: https://koji.fedoraproject.org/koji/buildinfo?buildID=822926

Install, then remove and reinstall the latest kernel packages. Not guaranteed to fix this bug but worked for me on a missing amdgpu driver.

(In reply to SP from comment #13)
> Have you tried installing the latest firmware updates from koji?

Yes. Made no difference that I could detect.

(In reply to Bojan Smojver from comment #14)
> (In reply to SP from comment #13)
> > Have you tried installing the latest firmware updates from koji?
>
> Yes. Made no difference that I could detect.

Have you run a dnf update today? I noticed a lot of wireless driver firmware among the updates.

(In reply to SP from comment #15)
> Have you run a dnf update today? I noticed a lot of wireless driver
> firmware among the updates.

I did, but this is not a wireless adapter.

BTW, the problem is still there in 4.9.4. We'll see whether 4.9.5 is any better.

(In reply to Bojan Smojver from comment #16)
> We'll see whether 4.9.5 is any better.

Nope, same.

I'm going to close this, because nobody else appears to be able to replicate it.

With 4.9.7, if I disconnect the cable, wait a while and reconnect, the connection comes back at 1 Gb/s. So, maybe some kind of weird interaction between my switch (Asus RT-AC68U) and the ethernet hardware in T450s.
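
For anyone trying to confirm the same symptom, the negotiated speed can be read from sysfs after each resume instead of waiting for the network to stall. This is a minimal sketch, assuming the standard `/sys/class/net/<iface>/speed` attribute (link speed in Mb/s) and the interface name `enp0s25` taken from the log line in this report. The names `link_speed_mbps` and `sysfs_root` are invented for this example.

```python
from pathlib import Path


def link_speed_mbps(iface, sysfs_root="/sys/class/net"):
    """Return the negotiated link speed in Mb/s, or None if it is unknown."""
    try:
        raw = (Path(sysfs_root) / iface / "speed").read_text().strip()
        speed = int(raw)
    except (OSError, ValueError):
        # Interface missing, or the read failed because the link is down.
        return None
    # Some drivers report -1 when the speed is unknown.
    return speed if speed > 0 else None


if __name__ == "__main__":
    # A healthy gigabit link should print 1000; the failure described in
    # this report would show up as 10.
    print(link_speed_mbps("enp0s25"))
```

The `sysfs_root` parameter exists only so the function can be pointed at a scratch directory when trying it out on a machine without that interface.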