2343174 – (CVE-2025-21681) CVE-2025-21681 kernel: openvswitch: fix lockup on tx to unregistering netdev with carrier

Bug 2343174 (CVE-2025-21681) - CVE-2025-21681 kernel: openvswitch: fix lockup on tx to unregistering netdev with carrier

Summary: CVE-2025-21681 kernel: openvswitch: fix lockup on tx to unregistering netdev ...

Keywords:
Status:	NEW
Alias:	CVE-2025-21681
Product:	Security Response
Classification:	Other
Component:	vulnerability
Sub Component:
Version:	unspecified
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Product Security DevOps Team
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2025-01-31 12:01 UTC by OSIDB Bzimport
Modified:	2025-01-31 18:37 UTC (History)
CC List:	4 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:
Embargoed:

Attachments	(Terms of Use)

Description OSIDB Bzimport 2025-01-31 12:01:56 UTC

In the Linux kernel, the following vulnerability has been resolved:

openvswitch: fix lockup on tx to unregistering netdev with carrier

Commit in a fixes tag attempted to fix the issue in the following
sequence of calls:

    do_output
    -> ovs_vport_send
       -> dev_queue_xmit
          -> __dev_queue_xmit
             -> netdev_core_pick_tx
                -> skb_tx_hash

When device is unregistering, the 'dev->real_num_tx_queues' goes to
zero and the 'while (unlikely(hash >= qcount))' loop inside the
'skb_tx_hash' becomes infinite, locking up the core forever.

But unfortunately, checking just the carrier status is not enough to
fix the issue, because some devices may still be in unregistering
state while reporting carrier status OK.

One example of such device is a net/dummy.  It sets carrier ON
on start, but it doesn't implement .ndo_stop to set the carrier off.
And it makes sense, because dummy doesn't really have a carrier.
Therefore, while this device is unregistering, it's still easy to hit
the infinite loop in the skb_tx_hash() from the OVS datapath.  There
might be other drivers that do the same, but dummy by itself is
important for the OVS ecosystem, because it is frequently used as a
packet sink for tcpdump while debugging OVS deployments.  And when the
issue is hit, the only way to recover is to reboot.

Fix that by also checking if the device is running.  The running
state is handled by the net core during unregistering, so it covers
unregistering case better, and we don't really need to send packets
to devices that are not running anyway.

While only checking the running state might be enough, the carrier
check is preserved.  The running and the carrier states seem disjoined
throughout the code and different drivers.  And other core functions
like __dev_direct_xmit() check both before attempting to transmit
a packet.  So, it seems safer to check both flags in OVS as well.

Comment 1 Pedro Sampaio 2025-01-31 18:26:16 UTC

Upstream advisory:
https://lore.kernel.org/linux-cve-announce/2025013103-CVE-2025-21681-ed9d@gregkh/T

Note You need to log in before you can comment on or make changes to this bug.