Bug 2053729 - RPi4 wired network fails - eth0 (bcmgenet): transmit queue 1 timed out, due to gcc12 inline assembly miscompilation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 36
Hardware: aarch64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Peter Robinson
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 2065678
Depends On:
Blocks: ARMTracker
Reported: 2022-02-11 21:05 UTC by Paul Whalen
Modified: 2022-04-19 22:04 UTC (History)

Fixed In Version: kernel-5.17.2-300.fc36 kernel-5.17.3-302.fc36
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-04-19 22:04:23 UTC
Type: Bug
Embargoed:


Description Paul Whalen 2022-02-11 21:05:14 UTC
1. Please describe the problem:

RPi4 wired network fails to come up on boot.

2. What is the Version-Release number of the kernel:

kernel-5.17.0-0.rc3.89.fc36

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Working in 5.17.0-0.rc0.20220112gitdaadb3bd0e8d.63.fc36.aarch64

Fails in kernel-5.17.0-0.rc2.83.fc36


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Boot 5.17rc2+ on the RPi4


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

Yes

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No


7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

[   41.111495] ------------[ cut here ]------------
[   41.115055] NETDEV WATCHDOG: eth0 (bcmgenet): transmit queue 1 timed out
[   41.118681] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:529 dev_watchdog+0x234/0x240
[   41.122257] Modules linked in: nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep bcm2835_v4l2(C) bcm2835_mmal_vchiq(C) videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev cpufreq_dt snd_soc_hdmi_codec snd_bcm2835(C) mc hci_uart vfat btsdio btqca fat btrtl btbcm btintel brcmfmac brcmutil joydev bluetooth cfg80211 ecdh_generic raspberrypi_cpufreq rfkill vchiq(C) vc4 bcm2711_thermal iproc_rng200 snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine leds_gpio zram mmc_block dwc2 gpio_raspberrypi_exp raspberrypi_hwmon crct10dif_ce broadcom bcm_phy_lib udc_core bcm2835_wdt clk_bcm2711_dvp i2c_bcm2835 pwm_bcm2835 genet bcm2835_dma snd_pcm sdhci_iproc sdhci_pltfm mdio_bcm_unimac snd_timer sdhci pcie_brcmstb phy_generic snd soundcore drm_cma_helper sunrpc
[   41.122594]  be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls cxgb3i cxgb3 mdio libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf ipmi_msghandler fuse aes_neon_bs
[   41.147038] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G         C       --------- ---  5.17.0-0.rc3.89.fc36.aarch64 #1
[   41.151465] Hardware name: Unknown Unknown Product/Unknown Product, BIOS 2022.04-rc1 04/01/2022
[   41.155900] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   41.160331] pc : dev_watchdog+0x234/0x240
[   41.164743] lr : dev_watchdog+0x234/0x240
[   41.169123] sp : ffffd2399c3b39e0
[   41.173465] x29: ffffd2399c3b39e0 x28: ffffd2399c3b7000 x27: ffffd2399c3b3ac0
[   41.177852] x26: ffffd2399be30000 x25: 0000000000000000 x24: ffffd2399c3bec08
[   41.182204] x23: 0000000000000100 x22: ffffd2399c3b7000 x21: ffff205a8eec0000
[   41.186551] x20: 0000000000000001 x19: ffff205a8eec04c8 x18: ffffffffffffffff
[   41.190887] x17: ffff4e219f8de000 x16: 0000000000000001 x15: 0000000000000006
[   41.195231] x14: 0000000000000000 x13: ffffd2399b0500f8 x12: ffffd2399c4ad5f0
[   41.199557] x11: 00000000ffffffff x10: ffffd2399c4ad5f0 x9 : ffffd23999fe7bb4
[   41.203865] x8 : 00000000ffffdfff x7 : ffffd2399c4ad5f0 x6 : 0000000000000001
[   41.208151] x5 : ffff205b3b711408 x4 : 0000000000000000 x3 : 0000000000000027
[   41.212418] x2 : 0000000000000023 x1 : 00000000ffffffd8 x0 : 000000000000003c
[   41.216680] Call trace:
[   41.220887]  dev_watchdog+0x234/0x240
[   41.225084]  call_timer_fn+0x3c/0x15c
[   41.229281]  __run_timers.part.0+0x288/0x310
[   41.233489]  run_timer_softirq+0x48/0x80
[   41.237677]  __do_softirq+0x128/0x360
[   41.241840]  __irq_exit_rcu+0x138/0x140
[   41.246009]  irq_exit_rcu+0x1c/0x30
[   41.250172]  el1_interrupt+0x38/0x54
[   41.254307]  el1h_64_irq_handler+0x18/0x24
[   41.258424]  el1h_64_irq+0x7c/0x80
[   41.262517]  arch_cpu_idle+0x18/0x2c
[   41.266595]  default_idle_call+0x4c/0x140
[   41.270654]  cpuidle_idle_call+0x14c/0x1a0
[   41.274688]  do_idle+0xb0/0x100
[   41.278685]  cpu_startup_entry+0x34/0x8c
[   41.282671]  rest_init+0xd0/0xe0
[   41.286637]  arch_call_rest_init+0x1c/0x28
[   41.290599]  start_kernel+0x480/0x49c
[   41.294529]  __primary_switched+0xc0/0xc8
[   41.298448] ---[ end trace 0000000000000000 ]---

Comment 2 Peter Robinson 2022-02-21 14:47:22 UTC
The 5.17-rc4 crash on the RPi 400:

[173353.733114] bcmgenet fd580000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[173353.741855] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[173356.452622] ------------[ cut here ]------------
[173356.457442] NETDEV WATCHDOG: eth0 (bcmgenet): transmit queue 4 timed out
[173356.464418] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:529 dev_watchdog+0x234/0x240
[173356.472915] Modules linked in: tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink rtl8xxxu rtl8192cu rtl_usb rtl8192c_common rtlwifi btsdio mac80211 bluetooth cpufreq_dt libarc4 ecdh_generic brcmfmac brcmutil cfg80211 raspberrypi_cpufreq rfkill broadcom snd_soc_hdmi_codec bcm_phy_lib bcm2711_thermal genet iproc_rng200 mdio_bcm_unimac leds_gpio nvmem_rmem vfat fat fuse zram mmc_block vc4 snd_soc_core crct10dif_ce snd_compress gpio_raspberrypi_exp raspberrypi_hwmon dwc2 clk_bcm2711_dvp ac97_bus snd_pcm_dmaengine bcm2835_wdt snd_pcm udc_core snd_timer bcm2835_dma sdhci_iproc snd sdhci_pltfm pcie_brcmstb sdhci phy_generic soundcore drm_cma_helper i2c_dev aes_neon_bs
[173356.546454] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.17.0-0.rc4.96.pbr1.fc36.aarch64 #1
[173356.554931] Hardware name: Unknown Unknown Product/Unknown Product, BIOS 2022.04-rc1 04/01/2022
[173356.563843] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[173356.570997] pc : dev_watchdog+0x234/0x240
[173356.575157] lr : dev_watchdog+0x234/0x240
[173356.579314] sp : ffff8000080b3a40
[173356.582758] x29: ffff8000080b3a40 x28: ffffc6e8711b7000 x27: ffff8000080b3b20
[173356.590096] x26: ffffc6e870c30000 x25: 0000000000000000 x24: ffffc6e8711bec08
[173356.597432] x23: 0000000000000100 x22: ffffc6e8711b7000 x21: ffff38a4c2250000
[173356.604767] x20: 0000000000000004 x19: ffff38a4c22504c8 x18: ffffffffffffffff
[173356.612102] x17: ffff71bd0ab21000 x16: ffff80000801c000 x15: 0000000000000006
[173356.619436] x14: 0000000000000000 x13: 205d323434373534 x12: ffffc6e8712ad5f0
[173356.626769] x11: 712074696d736e61 x10: ffffc6e8712ad5f0 x9 : ffffc6e86edfc78c
[173356.634104] x8 : 00000000ffffdfff x7 : 000000000000003f x6 : 0000000000000000
[173356.641438] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000001000
[173356.648773] x2 : 0000000000001000 x1 : 0000000000000005 x0 : 000000000000003c
[173356.656107] Call trace:
[173356.658685]  dev_watchdog+0x234/0x240
[173356.662493]  call_timer_fn+0x3c/0x15c
[173356.666304]  __run_timers.part.0+0x288/0x310
[173356.670728]  run_timer_softirq+0x48/0x80
[173356.674799]  __do_softirq+0x128/0x360
[173356.678601]  __irq_exit_rcu+0x138/0x140
[173356.682661]  irq_exit_rcu+0x1c/0x30
[173356.686291]  el1_interrupt+0x38/0x54
[173356.690007]  el1h_64_irq_handler+0x18/0x24
[173356.694251]  el1h_64_irq+0x7c/0x80
[173356.697787]  arch_cpu_idle+0x18/0x2c
[173356.701502]  default_idle_call+0x4c/0x140
[173356.705657]  cpuidle_idle_call+0x14c/0x1a0
[173356.709905]  do_idle+0xb0/0x100
[173356.713182]  cpu_startup_entry+0x34/0x8c
[173356.717252]  secondary_start_kernel+0xe4/0x110
[173356.721852]  __secondary_switched+0x94/0x98
[173356.726188] ---[ end trace 0000000000000000 ]---

Comment 3 Peter Robinson 2022-02-21 17:25:20 UTC
The following upstream commit is what caused the regression; with it reverted, networking works again. Digging deeper.

commit 9deb48b53e7f4056c2eaa2dc2ee3338df619e4f6
Author: Sergey Shtylyov <s.shtylyov>
Date:   Thu Jan 13 22:46:07 2022 +0300

    bcmgenet: add WOL IRQ check
    
    The driver neglects to check the result of platform_get_irq_optional()'s
    call and blithely passes the negative error codes to devm_request_irq()
    (which takes *unsigned* IRQ #), causing it to fail with -EINVAL.
    Stop calling devm_request_irq() with the invalid IRQ #s.
    
    Fixes: 8562056f267d ("net: bcmgenet: request Wake-on-LAN interrupt")
    Signed-off-by: Sergey Shtylyov <s.shtylyov>
    Acked-by: Florian Fainelli <f.fainelli>
    Signed-off-by: David S. Miller <davem>
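
For readers unfamiliar with the failure mode that commit message describes, here is a minimal user-space sketch, not the actual driver patch. It assumes a lookup that returns a negative errno when no IRQ is described and a request function that takes an unsigned IRQ number. The names get_optional_irq_sketch, request_irq_sketch, setup_wol_unchecked, setup_wol_checked and the SKETCH_NR_IRQS bound are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_NR_IRQS 1024u /* hypothetical upper bound on valid IRQ numbers */

/* Hypothetical stand-in for platform_get_irq_optional(): a positive IRQ
 * number on success, a negative errno when the platform describes none. */
static int get_optional_irq_sketch(int present)
{
    return present ? 42 : -ENXIO;
}

/* Hypothetical stand-in for devm_request_irq(): the IRQ number is unsigned,
 * so a negative errno arrives here as a huge value and is rejected. */
static int request_irq_sketch(unsigned int irq)
{
    return irq < SKETCH_NR_IRQS ? 0 : -EINVAL;
}

/* Unchecked: the lookup result is passed straight through, error or not. */
static int setup_wol_unchecked(int present)
{
    int irq = get_optional_irq_sketch(present);

    return request_irq_sketch((unsigned int)irq);
}

/* Checked: only request the IRQ when one was actually found, and only
 * mark the device wakeup-capable when that request succeeded. */
static int setup_wol_checked(int present, int *wakeup_capable)
{
    int irq = get_optional_irq_sketch(present);

    assert(wakeup_capable != NULL);
    *wakeup_capable = 0;
    if (irq > 0 && request_irq_sketch((unsigned int)irq) == 0)
        *wakeup_capable = 1;
    return 0;
}
```

With no IRQ present, the unchecked path fails with -EINVAL, while the checked path skips the request and leaves the wakeup flag clear.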

Comment 4 Peter Robinson 2022-02-22 10:04:44 UTC
Fix posted upstream for review:
https://lore.kernel.org/netdev/20220222095348.2926536-1-pbrobinson@gmail.com/T/#u

Comment 6 Jeremy Linton 2022-03-02 04:00:45 UTC
Hmm, the real question here is: do the power/wakeup sysfs fields show up? This device should not be wakeable if the WoL interrupt isn't set.

That's fundamentally the bug, I think. Your patch shouldn't do anything if wakeup.capable isn't somehow set.

Comment 7 Sally 2022-03-19 02:36:12 UTC
*** Bug 2065678 has been marked as a duplicate of this bug. ***

Comment 8 Peter Robinson 2022-03-27 10:01:48 UTC
The final fix for this has landed upstream in the 5.18 merge window. It's tagged as a fix, so it should hopefully land in 5.17.x soon.

commit 8d3ea3d402db94b61075617e71b67459a714a502
Author: Jeremy Linton <jeremy.linton>
Date:   Wed Mar 9 22:53:58 2022 -0600

    net: bcmgenet: Use stronger register read/writes to assure ordering
    
    GCC12 appears to be much smarter about its dependency tracking and is
    aware that the relaxed variants are just normal loads and stores and
    this is causing problems like:

Comment 9 Adam Williamson 2022-04-01 21:42:20 UTC
This should be a blocker, no? We would usually block for busted networking on a supported platform. It's a conditional violation of every criterion that needs networking (browser, updates...)

Comment 10 Paul Whalen 2022-04-04 16:19:37 UTC
(In reply to Adam Williamson from comment #9)
> This should be a blocker, no? We would usually block for busted networking
> on a supported platform. It's a conditional violation of every criterion
> that needs networking (browser, updates...)

The RPi4 isn't supported. It is expected to be fixed in 5.17.2 this week.

Comment 11 František Zatloukal 2022-04-04 19:01:54 UTC
Discussed during the 2022-04-04 blocker review meeting: [1]

The decision to classify this bug as an AcceptedFreezeException was made:

"We are delaying the classification of this bug as a blocker but accepting it as an FE. The fix in the kernel should land later in the week, which should make this a moot point."

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2022-04-04/f36-blocker-review.2022-04-04-16.00.log.html

Comment 12 Jeremy Linton 2022-04-06 16:57:48 UTC
A workaround for this specific problem has been merged into the kernel, but deeper investigation found that gcc12 is failing to inline/call volatile assembly routines. This is a very serious bug and could be causing other problems as well, including the Seattle boot failure, which only appears with gcc12.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105160
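
As a rough illustration of the difference between the relaxed and the "stronger" register accessors named in the commit title in comment 8, here is a user-space analogy in C11. It is only a sketch under stated assumptions: the kernel accessors are built from inline assembly and architecture barriers, whereas this uses volatile accesses and atomic_thread_fence() as stand-ins. The ring_sketch structure and all function names are hypothetical and do not come from the bcmgenet driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SLOTS 4u

/* Hypothetical device model: descriptors in memory plus a doorbell register. */
struct ring_sketch {
    uint32_t desc[RING_SLOTS];
    volatile uint32_t prod_index;
};

/* Relaxed write: just a store, with no ordering against other accesses. */
static inline void reg_write_relaxed(volatile uint32_t *reg, uint32_t val)
{
    *reg = val;
}

/* Ordered write: earlier stores are fenced before the register store. */
static inline void reg_write_ordered(volatile uint32_t *reg, uint32_t val)
{
    atomic_thread_fence(memory_order_release);
    reg_write_relaxed(reg, val);
}

/* Ordered read: the register load is fenced before later accesses. */
static inline uint32_t reg_read_ordered(const volatile uint32_t *reg)
{
    uint32_t val = *reg;

    atomic_thread_fence(memory_order_acquire);
    return val;
}

/* Fill a descriptor, then ring the doorbell with the ordered accessor so
 * the descriptor store cannot be moved after the doorbell store. */
static inline void ring_publish(struct ring_sketch *r, uint32_t slot,
                                uint32_t payload)
{
    assert(slot < RING_SLOTS);
    r->desc[slot] = payload;
    reg_write_ordered(&r->prod_index, slot + 1);
}
```

If ring_publish() used reg_write_relaxed() instead, nothing in this sketch would stop the compiler from reordering the descriptor store and the doorbell store, which is the kind of dependency the relaxed variants leave unexpressed.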

Comment 13 Fedora Update System 2022-04-08 17:48:18 UTC
FEDORA-2022-af492757d9 has been submitted as an update to Fedora 36. https://bodhi.fedoraproject.org/updates/FEDORA-2022-af492757d9

Comment 14 Paul Whalen 2022-04-08 17:55:36 UTC
Working with 5.17.2-300.fc36.aarch64.

Comment 15 Geoffrey Marr 2022-04-08 18:08:08 UTC
Tested and verified working with kernel-5.17.2-300.fc36.aarch64.

Comment 16 Fedora Update System 2022-04-08 18:57:46 UTC
FEDORA-2022-af492757d9 has been pushed to the Fedora 36 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-af492757d9`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-af492757d9

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 17 Sally 2022-04-08 20:08:28 UTC
It's working with 5.17.2-300.fc36.aarch64, thank you.

Comment 18 Fedora Update System 2022-04-11 03:33:46 UTC
FEDORA-2022-af492757d9 has been pushed to the Fedora 36 stable repository.
If the problem still persists, please make a note of it in this bug report.

Comment 19 Jeremy Linton 2022-04-11 15:22:49 UTC
Given that I'm about to post a revert for the patch that fixes this, I'm not sure this should be closed.

The correct fix is here: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=aabb9a261ef060cf24fd626713f1d7d9df81aa57

It appears to fix the Seattle boot defect as well.

Comment 20 Justin M. Forbes 2022-04-11 16:18:36 UTC
Happy to keep it open, but the patch that was pushed here at least seems to make things usable until a proper gcc fix is added and pushed out to build new kernels with, right?

Comment 21 Jeremy Linton 2022-04-11 17:57:06 UTC
Yes, we just need to ensure that this patch remains in place until a gcc that can build it lands in Fedora/build infra. I'm not sure it will, given that a kernel revert could happen fairly fast, depending on whether it gets picked up for a 5.18 rc.

Comment 22 Justin M. Forbes 2022-04-11 19:04:07 UTC
Given that fedora-5.17 is a separate branch, I just need to pay attention to stable to make sure the revert doesn't get pulled in before the compiler is ready. If it does, I can handle the tree. Rawhide may still have an issue, but it should be brief.

Comment 23 Justin M. Forbes 2022-04-18 00:38:39 UTC
The 5.17.3-302 build is a build of 5.17.3 with this patch backed out, but built against the new gcc 12 build. If someone could please verify that it works before I file an update, I would appreciate it.

Comment 24 Paul Whalen 2022-04-19 13:52:19 UTC
(In reply to Justin M. Forbes from comment #23)
> The 5.17.3-302 build is a build of 5.17.3 with this patch backed out, but
> built against the new gcc 12 build. If someone could please verify that it
> works before I file an update, I would appreciate it.

Tested 5.17.3-302.fc36.aarch64 on the RPi4-4GB, ethernet working as expected.

Comment 25 Fedora Update System 2022-04-19 14:09:38 UTC
FEDORA-2022-c38066128d has been submitted as an update to Fedora 36. https://bodhi.fedoraproject.org/updates/FEDORA-2022-c38066128d

Comment 26 Fedora Update System 2022-04-19 17:28:07 UTC
FEDORA-2022-c38066128d has been pushed to the Fedora 36 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-c38066128d`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-c38066128d

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 27 Fedora Update System 2022-04-19 22:04:23 UTC
FEDORA-2022-c38066128d has been pushed to the Fedora 36 stable repository.
If the problem still persists, please make a note of it in this bug report.

