Bug 2053729
Summary: | RPi4 wired network fails - eth0 (bcmgenet): transmit queue 1 timed out, due to gcc12 inline assembly miscompilation | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Paul Whalen <pwhalen> |
Component: | kernel | Assignee: | Peter Robinson <pbrobinson> |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | | |
Version: | 36 | CC: | acaringi, adscvr, airlied, alciregi, awilliam, bskeggs, fzatlouk, gmarr, hdegoede, jarodwilson, jeremy, jeremy.linton, jforbes, jglisse, jonathan, josef, kernel-maint, lgoncalv, linville, masami256, mchehab, mmuehlfe, pbrobinson, ptalbert, robatino, sallyahaj, steved, vtq-gnome |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | --- | | |
Hardware: | aarch64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | kernel-5.17.2-300.fc36 kernel-5.17.3-302.fc36 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2022-04-19 22:04:23 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 245418 | | |
Description
Paul Whalen
2022-02-11 21:05:14 UTC
The 5.17-rc4 crash on the rpi400:

[173353.733114] bcmgenet fd580000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[173353.741855] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[173356.452622] ------------[ cut here ]------------
[173356.457442] NETDEV WATCHDOG: eth0 (bcmgenet): transmit queue 4 timed out
[173356.464418] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:529 dev_watchdog+0x234/0x240
[173356.472915] Modules linked in: tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink rtl8xxxu rtl8192cu rtl_usb rtl8192c_common rtlwifi btsdio mac80211 bluetooth cpufreq_dt libarc4 ecdh_generic brcmfmac brcmutil cfg80211 raspberrypi_cpufreq rfkill broadcom snd_soc_hdmi_codec bcm_phy_lib bcm2711_thermal genet iproc_rng200 mdio_bcm_unimac leds_gpio nvmem_rmem vfat fat fuse zram mmc_block vc4 snd_soc_core crct10dif_ce snd_compress gpio_raspberrypi_exp raspberrypi_hwmon dwc2 clk_bcm2711_dvp ac97_bus snd_pcm_dmaengine bcm2835_wdt snd_pcm udc_core snd_timer bcm2835_dma sdhci_iproc snd sdhci_pltfm pcie_brcmstb sdhci phy_generic soundcore drm_cma_helper i2c_dev aes_neon_bs
[173356.546454] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.17.0-0.rc4.96.pbr1.fc36.aarch64 #1
[173356.554931] Hardware name: Unknown Unknown Product/Unknown Product, BIOS 2022.04-rc1 04/01/2022
[173356.563843] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[173356.570997] pc : dev_watchdog+0x234/0x240
[173356.575157] lr : dev_watchdog+0x234/0x240
[173356.579314] sp : ffff8000080b3a40
[173356.582758] x29: ffff8000080b3a40 x28: ffffc6e8711b7000 x27: ffff8000080b3b20
[173356.590096] x26: ffffc6e870c30000 x25: 0000000000000000 x24: ffffc6e8711bec08
[173356.597432] x23: 0000000000000100 x22: ffffc6e8711b7000 x21: ffff38a4c2250000
[173356.604767] x20: 0000000000000004 x19: ffff38a4c22504c8 x18: ffffffffffffffff
[173356.612102] x17: ffff71bd0ab21000 x16: ffff80000801c000 x15: 0000000000000006
[173356.619436] x14: 0000000000000000 x13: 205d323434373534 x12: ffffc6e8712ad5f0
[173356.626769] x11: 712074696d736e61 x10: ffffc6e8712ad5f0 x9 : ffffc6e86edfc78c
[173356.634104] x8 : 00000000ffffdfff x7 : 000000000000003f x6 : 0000000000000000
[173356.641438] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000001000
[173356.648773] x2 : 0000000000001000 x1 : 0000000000000005 x0 : 000000000000003c
[173356.656107] Call trace:
[173356.658685] dev_watchdog+0x234/0x240
[173356.662493] call_timer_fn+0x3c/0x15c
[173356.666304] __run_timers.part.0+0x288/0x310
[173356.670728] run_timer_softirq+0x48/0x80
[173356.674799] __do_softirq+0x128/0x360
[173356.678601] __irq_exit_rcu+0x138/0x140
[173356.682661] irq_exit_rcu+0x1c/0x30
[173356.686291] el1_interrupt+0x38/0x54
[173356.690007] el1h_64_irq_handler+0x18/0x24
[173356.694251] el1h_64_irq+0x7c/0x80
[173356.697787] arch_cpu_idle+0x18/0x2c
[173356.701502] default_idle_call+0x4c/0x140
[173356.705657] cpuidle_idle_call+0x14c/0x1a0
[173356.709905] do_idle+0xb0/0x100
[173356.713182] cpu_startup_entry+0x34/0x8c
[173356.717252] secondary_start_kernel+0xe4/0x110
[173356.721852] __secondary_switched+0x94/0x98
[173356.726188] ---[ end trace 0000000000000000 ]---

So the following upstream commit is what regressed it; with a revert it works again. Digging deeper.

commit 9deb48b53e7f4056c2eaa2dc2ee3338df619e4f6
Author: Sergey Shtylyov <s.shtylyov>
Date: Thu Jan 13 22:46:07 2022 +0300

bcmgenet: add WOL IRQ check

The driver neglects to check the result of platform_get_irq_optional()'s call and blithely passes the negative error codes to devm_request_irq() (which takes *unsigned* IRQ #), causing it to fail with -EINVAL. Stop calling devm_request_irq() with the invalid IRQ #s.

Fixes: 8562056f267d ("net: bcmgenet: request Wake-on-LAN interrupt")
Signed-off-by: Sergey Shtylyov <s.shtylyov>
Acked-by: Florian Fainelli <f.fainelli>
Signed-off-by: David S. Miller <davem>

Fix posted upstream for review: https://lore.kernel.org/netdev/20220222095348.2926536-1-pbrobinson@gmail.com/T/#u

Hmm, the real question here is: do the power/wakeup sysfs fields show up? Because this device should not be wakeable if the WoL interrupt isn't set. That's fundamentally the bug, I think. Your patch shouldn't do anything if wakeup.capable isn't somehow set.

*** Bug 2065678 has been marked as a duplicate of this bug. ***

So the final fix for this has landed upstream in the 5.18 merge window. It's tagged as a fix, so it should hopefully land in 5.17.x RSN.

commit 8d3ea3d402db94b61075617e71b67459a714a502
Author: Jeremy Linton <jeremy.linton>
Date: Wed Mar 9 22:53:58 2022 -0600

net: bcmgenet: Use stronger register read/writes to assure ordering

GCC12 appears to be much smarter about its dependency tracking and is aware that the relaxed variants are just normal loads and stores and this is causing problems like:

This should be a blocker, no? We would usually block for busted networking on a supported platform. It's a conditional violation of every criterion that needs networking (browser, updates...)

(In reply to Adam Williamson from comment #9)
> This should be a blocker, no? We would usually block for busted networking
> on a supported platform. It's a conditional violation of every criterion
> that needs networking (browser, updates...)

The RPi4 isn't supported. It is expected to be fixed in 5.17.2 this week.

Discussed during the 2022-04-04 blocker review meeting: [1]

The decision to classify this bug as an AcceptedFreezeException was made: "We are delaying the classification of this bug as a blocker but accepting it as an FE. The fix in the kernel should land later in the week, which should make this a moot point."

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2022-04-04/f36-blocker-review.2022-04-04-16.00.log.html

There is a workaround for this specific problem merged to the kernel, but deeper investigation resulted in discovering that gcc12 is failing to inline/call volatile assembly routines. This is a very serious bug, and could be causing other problems (including the seattle boot failure which only appears with gcc12) as well.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105160

FEDORA-2022-af492757d9 has been submitted as an update to Fedora 36. https://bodhi.fedoraproject.org/updates/FEDORA-2022-af492757d9

Working with 5.17.2-300.fc36.aarch64.

Tested and verified working with kernel-5.17.2-300.fc36.aarch64.

FEDORA-2022-af492757d9 has been pushed to the Fedora 36 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-af492757d9` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-af492757d9 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

It's working with 5.17.2-300.fc36.aarch64, thank you.

FEDORA-2022-af492757d9 has been pushed to the Fedora 36 stable repository. If problem still persists, please make note of it in this bug report.

Given that I'm about to post a revert for the patch that fixes this, I'm not sure this should be closed. The correct fix is here: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=aabb9a261ef060cf24fd626713f1d7d9df81aa57 and it appears to be fixing the seattle boot defect as well.

Happy to keep it open, but the patch that was pushed here at least seems to make things usable until a proper gcc fix is added and pushed out to build new kernels with, right?

Yes, we just need to ensure that this patch remains in place until a gcc that can build it lands in fedora/build infra, which I'm not sure of, given that a kernel reversion could happen fairly fast depending on whether it gets picked up for a 5.18rc.

Given that fedora-5.17 is a separate branch, I just need to pay attention to stable to make sure the revert doesn't get pulled in before the compiler is ready. If it does, I can handle the tree. Rawhide may still have an issue, but it should be brief.

The 5.17.3-302 build is a build of 5.17.3 with this patch backed out, but built against the new gcc 12 build. If someone could please verify that it works before I file an update, I would appreciate it.

(In reply to Justin M. Forbes from comment #23)
> The 5.17.3-302 build is a build of 5.17.3 with this patch backed out, but
> built against the new gcc 12 build. If someone could please verify that it
> works before I file an update, I would appreciate it.

Tested 5.17.3-302.fc36.aarch64 on the RPi4-4GB, ethernet working as expected.

FEDORA-2022-c38066128d has been submitted as an update to Fedora 36. https://bodhi.fedoraproject.org/updates/FEDORA-2022-c38066128d

FEDORA-2022-c38066128d has been pushed to the Fedora 36 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-c38066128d` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-c38066128d See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

FEDORA-2022-c38066128d has been pushed to the Fedora 36 stable repository. If problem still persists, please make note of it in this bug report.
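
For readers following the "bcmgenet: add WOL IRQ check" commit message quoted earlier in this report, here is a minimal user-space sketch of the pattern it describes: a lookup that returns a negative error code, and a request function that takes an unsigned IRQ number. The names `lookup_optional_irq`, `request_irq_unsigned`, `setup_wol_irq`, and the limit `TOY_MAX_IRQ` are hypothetical stand-ins invented for illustration. This is not the kernel API and not the bcmgenet driver code.

```c
#include <errno.h>

/* Hypothetical upper bound on valid IRQ numbers, for this toy model only. */
#define TOY_MAX_IRQ 1024u

/*
 * Hypothetical stand-in for an optional IRQ lookup: returns a
 * non-negative IRQ number when present, or a negative errno when absent.
 */
static int lookup_optional_irq(int present)
{
    return present ? 42 : -ENXIO;
}

/*
 * Hypothetical stand-in for a request function that takes an *unsigned*
 * IRQ number. A negative errno converted to unsigned becomes a very
 * large value, which this toy model rejects with -EINVAL.
 */
static int request_irq_unsigned(unsigned int irq)
{
    return irq < TOY_MAX_IRQ ? 0 : -EINVAL;
}

/*
 * Checked variant: because the interrupt is optional, a failed lookup
 * is not treated as an error. The request is simply skipped.
 */
static int setup_wol_irq(int present)
{
    int irq = lookup_optional_irq(present);

    if (irq < 0)
        return 0; /* no wake-up interrupt available, nothing to request */

    return request_irq_unsigned((unsigned int)irq);
}
```

In this sketch, passing the unchecked result of a failed lookup straight to `request_irq_unsigned()` yields `-EINVAL`, while `setup_wol_irq()` returns 0 whether or not the optional interrupt exists. As the thread goes on to show, the networking failure itself was traced to the gcc12 issue rather than to this check.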