Bug 2301519 (CVE-2024-42152)

Summary: CVE-2024-42152 kernel: nvmet: fix a possible leak when destroy a ctrl during qp establishment
Product: [Other] Security Response Reporter: OSIDB Bzimport <bzimport>
Component: vulnerabilityAssignee: Product Security DevOps Team <prodsec-dev>
Status: NEW --- QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: unspecifiedCC: dfreiber, drow, jburrell, vkumar
Target Milestone: ---Keywords: Security
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: kernel 5.10.222, kernel 5.15.163, kernel 6.1.98, kernel 6.6.39, kernel 6.9.9, kernel 6.10 Doc Type: If docs needed, set a value
Doc Text:
A vulnerability was found in the Linux kernel's nvme driver. A lack of proper checks can lead to a race condition during the destruction of a queue pair when a controller is being established. This issue can lead to system instability or crashes.
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2302019    
Bug Blocks:    

Description OSIDB Bzimport 2024-07-30 08:40:28 UTC
In the Linux kernel, the following vulnerability has been resolved:

nvmet: fix a possible leak when destroy a ctrl during qp establishment

In nvmet_sq_destroy we capture sq->ctrl early and if it is non-NULL we
know that a ctrl was allocated (in the admin connect request handler)
and we need to release pending AERs, clear ctrl->sqs and sq->ctrl
(for nvme-loop primarily), and drop the final reference on the ctrl.

However, a small window is possible where nvmet_sq_destroy starts (as
a result of the client giving up and disconnecting) concurrently with
the nvme admin connect cmd (which may be in an early stage). But *before*
kill_and_confirm of sq->ref (i.e. the admin connect managed to get an sq
live reference). In this case, sq->ctrl was allocated however after it was
captured in a local variable in nvmet_sq_destroy.
This prevented the final reference drop on the ctrl.

Solve this by re-capturing the sq->ctrl after all inflight request has
completed, where for sure sq->ctrl reference is final, and move forward
based on that.

This issue was observed in an environment with many hosts connecting
multiple ctrls simoutanuosly, creating a delay in allocating a ctrl
leading up to this race window.

Comment 1 Mauro Matteo Cascella 2024-07-31 11:06:26 UTC
Upstream advisory:
https://lore.kernel.org/linux-cve-announce/2024073034-CVE-2024-42152-c248@gregkh/T

Comment 2 Mauro Matteo Cascella 2024-07-31 11:06:46 UTC
Created kernel tracking bugs for this issue:

Affects: fedora-all [bug 2302019]

Comment 10 errata-xmlrpc 2024-08-28 12:21:17 UTC
This issue has been addressed in the following products:

  Red Hat Enterprise Linux 9

Via RHSA-2024:5928 https://access.redhat.com/errata/RHSA-2024:5928

Comment 11 errata-xmlrpc 2024-09-24 00:39:57 UTC
This issue has been addressed in the following products:

  Red Hat Enterprise Linux 8

Via RHSA-2024:7001 https://access.redhat.com/errata/RHSA-2024:7001

Comment 12 errata-xmlrpc 2024-09-24 02:35:27 UTC
This issue has been addressed in the following products:

  Red Hat Enterprise Linux 8

Via RHSA-2024:7000 https://access.redhat.com/errata/RHSA-2024:7000

Comment 14 errata-xmlrpc 2024-10-30 00:13:07 UTC
This issue has been addressed in the following products:

  Red Hat Enterprise Linux 9.2 Extended Update Support

Via RHSA-2024:8614 https://access.redhat.com/errata/RHSA-2024:8614

Comment 15 errata-xmlrpc 2024-10-30 00:32:21 UTC
This issue has been addressed in the following products:

  Red Hat Enterprise Linux 9.2 Extended Update Support

Via RHSA-2024:8613 https://access.redhat.com/errata/RHSA-2024:8613