Bug 2301519 (CVE-2024-42152)

Summary:	CVE-2024-42152 kernel: nvmet: fix a possible leak when destroy a ctrl during qp establishment
Product:	[Other] Security Response	Reporter:	OSIDB Bzimport <bzimport>
Component:	vulnerability	Assignee:	Product Security DevOps Team <prodsec-dev>
Status:	NEW ---	QA Contact:
Severity:	medium	Docs Contact:
Priority:	medium
Version:	unspecified	CC:	dfreiber, drow, jburrell, vkumar
Target Milestone:	---	Keywords:	Security
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	kernel 5.10.222, kernel 5.15.163, kernel 6.1.98, kernel 6.6.39, kernel 6.9.9, kernel 6.10	Doc Type:	If docs needed, set a value
Doc Text:	A vulnerability was found in the Linux kernel's NVMe target (nvmet) driver. A race between submission queue destruction and controller allocation during queue pair establishment can prevent the final reference on the controller from being dropped, leaking the controller. This issue can lead to system instability or crashes.	Story Points:	---
Bug Depends On:	2302019
Bug Blocks:

Description OSIDB Bzimport 2024-07-30 08:40:28 UTC

In the Linux kernel, the following vulnerability has been resolved:

nvmet: fix a possible leak when destroy a ctrl during qp establishment

In nvmet_sq_destroy we capture sq->ctrl early and if it is non-NULL we
know that a ctrl was allocated (in the admin connect request handler)
and we need to release pending AERs, clear ctrl->sqs and sq->ctrl
(for nvme-loop primarily), and drop the final reference on the ctrl.

However, there is a small window in which nvmet_sq_destroy starts (as
a result of the client giving up and disconnecting) concurrently with
the nvme admin connect cmd (which may be in an early stage), but *before*
kill_and_confirm of sq->ref (i.e. the admin connect managed to get an sq
live reference). In this case, sq->ctrl is allocated only after it was
captured in a local variable in nvmet_sq_destroy, so the final reference
on the ctrl is never dropped.

Solve this by re-capturing sq->ctrl after all inflight requests have
completed, at which point the sq->ctrl reference is guaranteed to be
final, and proceed based on that.
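
The ordering problem and the fix can be illustrated with a minimal,
single-threaded C model. This is only a sketch, not the kernel code: the
struct layouts, the pending_ctrl field, and the helpers ctrl_put,
wait_for_inflight and ctrl_leaks are hypothetical stand-ins, and the
concurrent admin connect is simulated by having it complete while the
destroy path waits for inflight requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; not the real nvmet structs. */
struct ctrl {
	int refs;   /* reference count */
	bool freed; /* set once the final reference is dropped */
};

struct sq {
	struct ctrl *ctrl;         /* NULL until admin connect installs a ctrl */
	struct ctrl *pending_ctrl; /* ctrl a still-inflight connect will install */
};

static void ctrl_put(struct ctrl *c)
{
	assert(c->refs > 0);
	if (--c->refs == 0)
		c->freed = true;
}

/*
 * Models waiting for inflight requests to complete: a racing admin
 * connect that already holds a live sq reference finishes here and
 * sets sq->ctrl.
 */
static void wait_for_inflight(struct sq *sq)
{
	if (sq->pending_ctrl) {
		sq->ctrl = sq->pending_ctrl;
		sq->pending_ctrl = NULL;
	}
}

/* Problematic ordering: sq->ctrl is read before inflight requests finish. */
static void sq_destroy_early_capture(struct sq *sq)
{
	struct ctrl *ctrl = sq->ctrl; /* may still be NULL at this point */

	wait_for_inflight(sq);
	if (ctrl) {
		sq->ctrl = NULL;
		ctrl_put(ctrl);
	}
}

/* Fixed ordering: sq->ctrl is read only after inflight requests finish. */
static void sq_destroy_recapture(struct sq *sq)
{
	struct ctrl *ctrl;

	wait_for_inflight(sq);
	ctrl = sq->ctrl; /* final value: nothing can set it any more */
	if (ctrl) {
		sq->ctrl = NULL;
		ctrl_put(ctrl);
	}
}

/* Runs one destroy variant against the racy scenario; true means leak. */
static bool ctrl_leaks(void (*destroy)(struct sq *))
{
	struct ctrl c = { .refs = 1, .freed = false };
	struct sq q = { .ctrl = NULL, .pending_ctrl = &c };

	destroy(&q);
	return !c.freed;
}
```

In this model, ctrl_leaks(sq_destroy_early_capture) reports a leak because
the local variable was read while sq->ctrl was still NULL, whereas
ctrl_leaks(sq_destroy_recapture) does not, since the pointer is read only
once no inflight request can change it.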

This issue was observed in an environment with many hosts connecting
multiple ctrls simultaneously, creating a delay in allocating a ctrl
that led to this race window.

Comment 1 Mauro Matteo Cascella 2024-07-31 11:06:26 UTC

Upstream advisory:
https://lore.kernel.org/linux-cve-announce/2024073034-CVE-2024-42152-c248@gregkh/T

Comment 2 Mauro Matteo Cascella 2024-07-31 11:06:46 UTC

Created kernel tracking bugs for this issue:

Affects: fedora-all [bug 2302019]

Comment 10 errata-xmlrpc 2024-08-28 12:21:17 UTC

This issue has been addressed in the following products:

  Red Hat Enterprise Linux 9

Via RHSA-2024:5928 https://access.redhat.com/errata/RHSA-2024:5928

Comment 11 errata-xmlrpc 2024-09-24 00:39:57 UTC

This issue has been addressed in the following products:

  Red Hat Enterprise Linux 8

Via RHSA-2024:7001 https://access.redhat.com/errata/RHSA-2024:7001

Comment 12 errata-xmlrpc 2024-09-24 02:35:27 UTC

This issue has been addressed in the following products:

  Red Hat Enterprise Linux 8

Via RHSA-2024:7000 https://access.redhat.com/errata/RHSA-2024:7000

Comment 14 errata-xmlrpc 2024-10-30 00:13:07 UTC

This issue has been addressed in the following products:

  Red Hat Enterprise Linux 9.2 Extended Update Support

Via RHSA-2024:8614 https://access.redhat.com/errata/RHSA-2024:8614

Comment 15 errata-xmlrpc 2024-10-30 00:32:21 UTC

This issue has been addressed in the following products:

  Red Hat Enterprise Linux 9.2 Extended Update Support

Via RHSA-2024:8613 https://access.redhat.com/errata/RHSA-2024:8613