wpa_supplicant 2.11 (Fedora package wpa_supplicant-2.11-7.fc43) intermittently fails to reassociate to an 802.11be Multi-Link Operation (MLO) access point due to a state race between wpa_supplicant and mac80211. When the race fires, wpa_supplicant logs the same pair of errors three times in a row, "nl80211: kernel reports: link ID must for MLO group key" followed by "set_key failed; err=-22 Invalid argument", once each for group key slots 0, 1, and 4 (GTK, rotating GTK, IGTK). The association fails, the mac80211 link timeout fires after ~17 seconds, and NetworkManager retries every ~90 seconds. Each retry takes the same code path and hits the same race, resulting in a permanent disconnect loop until the user intervenes.

Reliability: the race reliably fires after a fresh boot when NetworkManager auto-connects to the MLO SSID (one morning session on the affected box produced 42 errors across 14 retry cycles). In controlled nmcli cycling it is intermittent, roughly 1 in 5 attempts. See Steps to Reproduce for two reproducer variants (reboot-based for reliability, nmcli-based for speed).

Root cause: wpa_clear_keys() in wpa_supplicant/wpa_supplicant.c hardcodes link_id=-1 when issuing DEL_KEY during pre-authentication cleanup in sme_send_authentication(). During reassociation, the function is called in a window where wpa_s->valid_links has already been cleared by wpas_reset_mlo_info(), but the kernel's wdev->valid_links is still set from the previous MLO connection. The cfg80211 check nl80211_validate_key_link_id() (commit e7a7b84e3317 in v6.1, stable since 2022) rejects the mismatch with -EINVAL. Whether the window is open depends on when mac80211 processes the deauth event relative to the new auth, which is why the hit rate depends on wpa_supplicant's in-memory state, NM's connection-switch path, and the driver's event ordering.
The bug is fixed in hostap.git master (v2.12-devel, ~1750 commits past the 2.11 tag). I verified this at runtime by building master from source, installing it via a systemd override, and running the reproducer 9 times with 0 errors (compared to intermittent failure on the same test sequence with stock 2.11). The wpa_clear_keys() function body itself is unchanged between hostap_2_11 and master; the fix is somewhere in adjacent state-coordination code that I did not bisect to pin down.

The bug is hardware-agnostic. I observed it on a Realtek RTL8922AU Wi-Fi 7 adapter, but the error string has been reported on Intel BE200 (Unix StackExchange #791474) and MediaTek mt7925 (lore.kernel.org/linux-wireless discussions by Zac Bowling, January 2026). It lives in generic wpa_supplicant + cfg80211 state coordination, not in any vendor driver.

All currently active Fedora branches ship the same affected upstream version, 2.11 (F42: 2.11-6.fc42, F43: 2.11-7.fc43, F44: 2.11-9.fc44, Rawhide: 2.11-9.fc44). No version bump is scheduled in any branch. A backport or version bump prepared for one branch would apply to all of them. This is the third wpa_supplicant 2.11 nl80211 key-validation regression reported on Fedora (see "See Also" for the other two); together they suggest the 2.11 package is due for an update.

Workaround: use a non-MLO (WPA2-PSK or SAE single-link) SSID on the same router. Most Wi-Fi 7 APs broadcast a parallel legacy SSID alongside their MLO SSID. The legacy SSID uses a cleanup path that does not hit the race, and still provides full 802.11be PHY at 160 MHz / ~1441 Mbps; only multi-link bonding is lost.

Intermittent: it is a state race. Reliably happens after a fresh boot when NetworkManager auto-connects to the MLO SSID from its saved profile (42 errors across 14 retry cycles observed in one morning session on an affected box). Under manual nmcli cycling, it happens roughly 1 in 5 attempts.
Once it does trigger, it cascades: NetworkManager retries every ~90 seconds, and each retry hits it again, producing a permanent disconnect loop from the user's perspective.

Reproducible: Sometimes

Steps to Reproduce:

Prerequisites:
  (a) an 802.11be Wi-Fi 7 AP with MLO enabled on at least two links
  (b) an MLO-capable client adapter on a Fedora 43 box
  (c) a NetworkManager connection profile for the MLO SSID using WPA3-Personal / SAE, named "<MLO-SSID>" below
  (d) a second NetworkManager profile for a non-MLO SSID on the same router (legacy WPA2 or separate 5 GHz / 6 GHz SSID), named "<LegacySSID>" below. Most Wi-Fi 7 routers broadcast a parallel legacy SSID alongside the MLO one.

There are two ways to reproduce. Reproducer A (reboot-based) is 100% reliable but disruptive. Reproducer B (nmcli loop) hits on 15-25% of attempts but can be run repeatedly in a few minutes.

--- Reproducer A (reliable, after fresh boot) ---

1. Ensure the NetworkManager profile "<MLO-SSID>" has autoconnect=yes and has been successfully connected at least once in the past (which populates wpa_supplicant's PMKSA cache via NM's saved state).

2. Reboot the system.

3. Wait for NetworkManager to auto-connect to "<MLO-SSID>" on boot.

4. Within 30-60 seconds, observe the disconnect loop in the journal:

    journalctl -b -u wpa_supplicant | grep -c "link ID must for MLO group key"
    # Expected on buggy 2.11: many errors (morning session on our box
    # produced 42 across 14 NM retry cycles)
    # Expected after a fix: 0

The reason this is reliable: after a fresh boot, wpa_supplicant starts with no in-memory state, and NetworkManager's first connection attempt goes through a code path that naturally hits the race window.

--- Reproducer B (fast, intermittent) ---

1. Enable msgdump debug logging via DBus (non-disruptive, no service restart needed):

    sudo dbus-send --system --print-reply \
        --dest=fi.w1.wpa_supplicant1 /fi/w1/wpa_supplicant1 \
        org.freedesktop.DBus.Properties.Set \
        string:fi.w1.wpa_supplicant1 string:DebugLevel \
        variant:string:msgdump

2. Run this loop, which switches between the MLO and non-MLO SSIDs and counts "link ID must" errors per cycle. Expect 1-2 cycles out of 10 to trigger the failure:

    for n in $(seq 1 10); do
        nmcli connection up "<LegacySSID>"
        sleep 3
        T=$(date +%s)
        nmcli connection up "<MLO-SSID>"
        sleep 5
        ERRS=$(journalctl --since "@$T" -u wpa_supplicant \
               | grep -c "link ID must for MLO group key")
        if [ "$ERRS" -gt 0 ]; then
            echo "cycle $n: FAIL (errors=$ERRS)"
        else
            echo "cycle $n: ok"
        fi
    done

3. Restore info debug level:

    sudo dbus-send --system --print-reply \
        --dest=fi.w1.wpa_supplicant1 /fi/w1/wpa_supplicant1 \
        org.freedesktop.DBus.Properties.Set \
        string:fi.w1.wpa_supplicant1 string:DebugLevel \
        variant:string:info

Expected result: at least one "FAIL (errors=3)" out of 10 cycles (usually 1-3). On a fixed build, all 10 cycles should report "ok".

GOTCHA 1: Do NOT use "rfkill block wifi; rfkill unblock wifi" as a state reset. rfkill wipes the PMKSA cache as a side effect, and the next auth then runs the fresh-SAE path which does NOT trigger the bug. Use clean nmcli commands only.

GOTCHA 2: A single "nmcli connection up <MLO-SSID>" followed by "down" then "up" of the same profile does NOT reliably reproduce: the clean "down" propagates a deauth event to the kernel, which clears wdev->valid_links before the next auth begins. The switch-through-second-profile pattern in Reproducer B is the minimum reliable nmcli-level reproducer we found.
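To turn the Reproducer B output into a hit rate, a small helper along these lines can be piped onto the loop. This is a minimal sketch and was not part of my original test setup: the function name summarize_cycles is made up here, and it assumes the exact "cycle N: ok" / "cycle N: FAIL (errors=K)" line format printed by the loop in step 2.

```shell
# Hypothetical helper (not part of the original test setup).
# Reads Reproducer B output on stdin and prints how many cycles failed.
# Lines that do not start with "cycle N: " (e.g. nmcli output) are ignored.
summarize_cycles() {
    awk '
        /^cycle [0-9]+: FAIL/ { fail++ }
        /^cycle [0-9]+: /     { total++ }
        END {
            if (total == 0) { print "no cycles seen"; exit 1 }
            printf "%d/%d cycles failed (%.0f%%)\n", fail, total, 100 * fail / total
        }
    '
}
```

For example, ending the loop with "done | summarize_cycles" prints a single summary line instead of the per-cycle results, which makes it easier to compare runs against the 15-25% figure quoted above.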
Actual Results:

wpa_supplicant emits the following pair of errors three times back-to-back on each reassociation attempt that hits the race:

    wpa_supplicant: nl80211: kernel reports: link ID must for MLO group key
    wpa_supplicant: nl80211: set_key failed; err=-22 Invalid argument

(three times, one per group key slot: 0 = GTK, 1 = rotating GTK, 4 = IGTK)

The connection never transitions past the 4-way handshake into GROUP_HANDSHAKE. mac80211 reports "link timed out" after ~17 seconds. NetworkManager retries every ~90 seconds. The disconnect loop is effectively permanent until the user manually switches to a non-MLO SSID.

Captured msgdump excerpt from the failing sequence (RTL8922AU + kernel 6.17.7-ba29.fc43, BSSIDs anonymized):

    SME: Trying to authenticate with XX:XX:XX:XX:XX:XX (SSID='<MLO-SSID>' freq=6295 MHz)
    EAPOL: External notification - portValid=0
    EAPOL: SUPP_PAE entering state DISCONNECTED
    EAPOL: Supplicant port status: Unauthorized
    EAPOL: SUPP_BE entering state INITIALIZE
    EAPOL: SUPP_PAE entering state CONNECTING
    wpa_driver_nl80211_set_key: ifindex=3 alg=0 addr=(nil) key_idx=0 key_flag=0x10 link_id=-1
    nl80211: DEL_KEY broadcast key
    nl80211: kernel reports: link ID must for MLO group key
    nl80211: set_key failed; err=-22 Invalid argument
    wpa_driver_nl80211_set_key: ifindex=3 alg=0 addr=(nil) key_idx=1 key_flag=0x10 link_id=-1
    nl80211: DEL_KEY broadcast key
    nl80211: kernel reports: link ID must for MLO group key
    nl80211: set_key failed; err=-22 Invalid argument
    wpa_driver_nl80211_set_key: ifindex=3 alg=0 addr=(nil) key_idx=4 key_flag=0x10 link_id=-1
    nl80211: DEL_KEY broadcast key
    nl80211: kernel reports: link ID must for MLO group key
    nl80211: set_key failed; err=-22 Invalid argument
    State: COMPLETED -> AUTHENTICATING

The three failing calls are DEL_KEY operations on group key slots 0, 1, and 4, all with link_id=-1 and key_flag=0x10 (KEY_FLAG_GROUP). Once the race fires, the failure signature is deterministic: the same three calls fail every cycle.
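When sifting through a longer msgdump capture, it can help to pull out just the key slots whose DEL_KEY was rejected. The following is a minimal sketch under stated assumptions: failing_key_slots is a hypothetical helper name, and it relies on the log layout shown in the excerpt above, where each wpa_driver_nl80211_set_key line carries a key_idx= field and the kernel error follows it.

```shell
# Hypothetical helper: reads wpa_supplicant msgdump output on stdin and
# prints the key_idx of every set_key call that was followed by the
# "link ID must for MLO group key" kernel error, one index per line.
failing_key_slots() {
    awk '
        # Remember the key_idx of the most recent set_key call.
        /wpa_driver_nl80211_set_key:/ {
            idx = ""
            for (i = 1; i <= NF; i++)
                if ($i ~ /^key_idx=/) idx = substr($i, 9)
        }
        # Attribute the kernel error to that call, then forget it.
        /link ID must for MLO group key/ && idx != "" {
            print idx
            idx = ""
        }
    '
}
```

A possible invocation is "journalctl -b -u wpa_supplicant | failing_key_slots | sort -n | uniq -c", which should show equal counts for slots 0, 1, and 4 if the capture matches the pattern described in this report.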
Full capture (both failing 2.11 run and passing hostap-master run) is available on request.

Expected Results:

Reassociation should complete cleanly without any "link ID must for MLO group key" errors. The wpa_supplicant state machine should transition through 4WAY_HANDSHAKE -> GROUP_HANDSHAKE -> COMPLETED, followed by CTRL-EVENT-CONNECTED with the AP MLD address in the event payload (ap_mld_addr=XX:XX:XX:XX:XX:XX). NetworkManager should report "Connected" on the first attempt. Data traffic should flow within a second of the connection event.

Verified at runtime using hostap.git master (v2.12-devel-hostap_2_11-1750-ge747e30a1) as a drop-in replacement: 9 successive reproductions with the exact trigger sequence above, all 9 connected cleanly on the first attempt with 0 "link ID must" errors.

Additional Information:

=== WHERE THE BUG LIVES ===

Offending caller: wpa_clear_keys() in wpa_supplicant/wpa_supplicant.c (~line 888 in v2.11), invoked from sme_send_authentication() in wpa_supplicant/sme.c (~line 1828 in v2.11) as pre-authentication cleanup:

    void wpa_clear_keys(struct wpa_supplicant *wpa_s, const u8 *addr)
    {
        int i, max = 6;

        for (i = 0; i < max; i++) {
            if (wpa_s->keys_cleared & BIT(i))
                continue;
            /* second argument is link_id: hardcoded to -1 */
            wpa_drv_set_key(wpa_s, -1, WPA_ALG_NONE, NULL, i, 0, NULL,
                            0, NULL, 0, KEY_FLAG_GROUP);
        }
        ...
    }

At this moment wpa_s->valid_links has already been cleared by wpas_reset_mlo_info() on the preceding disassoc, but the kernel's wdev->valid_links is still non-zero from the previous MLO connection. The cfg80211 check in net/wireless/nl80211.c::nl80211_validate_key_link_id() rejects the mismatch with -EINVAL. That check was added by commit e7a7b84e3317 in v6.1 (2022) and has been stable ever since.

The bug only triggers via the PMKSA-caching path. A fresh SAE handshake takes a different cleanup route and does not hit this bug, which is why users typically hit it starting with the second reconnection to the same AP, not the first one after boot.
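The mismatch can be illustrated with a toy model of the decision the kernel makes for group keys. This is only a sketch of the behavior described in this report, not the kernel's code: the function name and return convention are invented for illustration, only the one rejection branch seen in the logs is modeled, and valid_links is passed as a decimal bitmap.

```shell
# Toy model, NOT kernel code. Hypothetical function name.
#   $1 = kernel-side valid_links bitmap (decimal; 0 means "not an MLO wdev")
#   $2 = link_id passed by userspace (-1 means "no link ID set")
# Returns 0 if the group-key request would pass the modeled check,
# 22 (EINVAL) if it would be rejected.
group_key_link_id_ok() {
    kernel_valid_links=$1
    link_id=$2
    if [ "$kernel_valid_links" -ne 0 ] && [ "$link_id" -eq -1 ]; then
        # Same string the kernel reports in the failing logs.
        echo "link ID must for MLO group key" >&2
        return 22
    fi
    return 0
}
```

In the race window the kernel side still holds a non-zero bitmap (say 3 for links 0 and 1) while wpa_supplicant passes -1, so "group_key_link_id_ok 3 -1" returns 22. Once the kernel has cleared its state, "group_key_link_id_ok 0 -1" succeeds, which matches the observation that a clean "down" before "up" avoids the error.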
=== UPSTREAM FIX STATUS ===

Fixed in hostap.git master. The fix is NOT in wpa_clear_keys() itself (that function is unchanged between hostap_2_11 and HEAD); it is in adjacent state-coordination code. I did not run git bisect to identify the exact commit, but I did narrow the search to these three candidates, any of which may be the fix or part of a fix set:

  7a1893fd3aa8805f4733493f20e2acac2d67ab50
    Kavita Kavita, 2025-05-09
    "MLD: Handle link reconfiguration updates from the driver"
    - Most plausible main fix candidate; adds NL80211_CMD_ASSOC_MLO_RECONF handling and installs group keys via the per-link path.

  c7139cc28a07803cf1ce97134b16fbbdbdf744cb
    Kavita Kavita, 2025-05-09
    "MLD: Clear group keys for removed links"
    - Companion to the above; handles NL80211_CMD_LINKS_REMOVED cleanup.

  b807ddd8ec7c66c0e8c15eb8060b06bc74515a80
    Benjamin Berg (Intel), 2025-06-18
    "BSS: Set valid_links for all links and return usable links"
    - Makes valid_links parsing more consistent in wpa_bss_update().

A git-bisect run against the reproducer in "Steps to Reproduce" would pin the exact commit set in approximately one hour of test cycles.

Relevant upstream context: in his January 2026 hostap series "Fix address confusion after MLO re-authentication" ( https://lists.infradead.org/pipermail/hostap/2026-January/044376.html ), Benjamin Berg explicitly noted as a known gap: "we will also need to do something similar when the client switches between using MLO and not using it". That is exactly the scenario in this bug report (switching from a non-MLO fallback SSID to an MLO SSID on the same router).

=== ALL ACTIVE FEDORA BRANCHES SHIP THE SAME AFFECTED 2.11 ===

  Fedora 42:  wpa_supplicant-2.11-6.fc42
  Fedora 43:  wpa_supplicant-2.11-7.fc43   <-- verified reproducible (this report)
  Fedora 44:  wpa_supplicant-2.11-9.fc44
  Rawhide:    wpa_supplicant-2.11-9.fc44

No version bump is present in any branch. A backport to 2.11 or a version bump in Rawhide would benefit all currently active versions at once.
=== RELATED EXISTING FEDORA BUGS (NOT DUPLICATES) ===

Collectively these suggest a pattern of wpa_supplicant 2.11 nl80211 key-validation regressions:

  Bug 2328551 (2024-11-24): "wpa_supplicant 2.11 breaks Wi-Fi hotspot with 'key setting validation failed'"
    - Different error string, hotspot AP mode (not STA MLO), but same root area.
    - Reported against F40/F41, still NEW in F43.

  Bug 2437182 (2026-02-05): "Keep getting 'key not allowed' error in syslogs"
    - Different error string, Intel AX210 (Wi-Fi 6E, not MLO-capable), no clear trigger.
    - Same package version 2.11-7.fc43.

This bug (the third in the same family) brings visibility to the broader pattern.

=== REQUESTED ACTION ===

Either of the following would resolve this, maintainer's choice:

Option A (preferred): backport the specific fix commit(s) to the 2.11 package. The candidate commits above are a starting point; a git bisect against the reproducer would pin the exact set.

Option B: schedule a version bump when hostap has a new tagged release. There has been no upstream hostap release since 2.11 in July 2024; master has ~1750 unreleased commits. If a version bump is preferred, the non-MLO fallback SSID workaround can be documented as a known issue for affected users in the meantime.

I am happy to help test a backport if one is produced. I have the failing version, the verified-fixed master build on the same hardware, and a clean reproduction procedure.

=== RACE-CONDITION NATURE OF THE BUG ===

This is a state race between wpa_supplicant and the kernel/mac80211, not a deterministic failure. Specifically: wpa_supplicant's wpa_s->valid_links is cleared by wpas_reset_mlo_info() during disassoc processing, but the kernel's wdev->valid_links is cleared only when mac80211 finishes processing its own deauth event. The two can be out of sync for a brief window during reassociation.
If sme_send_authentication() calls wpa_clear_keys() in that window, the DEL_KEY calls pass link_id=-1 while the kernel still thinks the wdev is MLO, and cfg80211 rejects them.

The exact timing of this race depends on wpa_supplicant's in-memory state (fresh-from-boot vs long-running), NetworkManager's connection switch implementation, rtw89/driver event ordering, and the AP's MLO beacon/link advertisement. After a fresh boot, the first auto-connect to the MLO SSID reliably hits the race because everything is cold-path. In controlled nmcli cycling, the hit rate is about 15-25% per attempt.

Once the bug is triggered, NetworkManager retries every ~90 seconds. Each retry takes the same code path and hits the same race, producing a permanent disconnect loop for the user until they intervene (e.g., switch to a non-MLO SSID or reboot into a state that happens not to hit the race on first try).

On the affected box, a single morning session between 11:56 and 12:20 local time produced 42 "link ID must for MLO group key" errors across 14 NetworkManager retry cycles (3 per retry).

=== RELATED KERNEL-SIDE COSMETIC FIX (UNRELATED BUT SAME STRING) ===

The in-source kernel error string "link ID must for MLO group key" has a grammar bug (missing "be set"). Patch submitted to linux-wireless on 2026-04-14, message-ID <20260414122728.92234-1-loukot>, targeting wireless-next via Johannes Berg. This is a pure string change with no functional impact and is unrelated to the wpa_supplicant bug in this report; it just makes the error string grep-friendly for future users hitting similar issues.

See also:
  https://bugzilla.redhat.com/show_bug.cgi?id=2328551
  https://bugzilla.redhat.com/show_bug.cgi?id=2437182