Bug 1992700

Summary: blk-mq: fix kernel panic when iterating over flush request
Product: Red Hat Enterprise Linux 8
Reporter: Ming Lei <minlei>
Component: kernel
Assignee: Ming Lei <minlei>
kernel sub component: Block Layer
QA Contact: ChanghuiZhong <czhong>
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
CC: bgoncalv, brdeoliv, coughlan, czhong, jmoyer, lmiksik, nyewale, revers
Version: 8.5
Keywords: Bugfix, Triaged
Target Milestone: rc
Target Release: 8.5
Hardware: Unspecified
OS: Unspecified
Fixed In Version: kernel-4.18.0-340.el8
Doc Type: If docs needed, set a value
Last Closed: 2021-11-09 19:27:01 UTC
Type: Bug

Description Ming Lei 2021-08-11 15:25:21 UTC
Description of problem:

https://lore.kernel.org/linux-block/20210811142624.618598-1-ming.lei@redhat.com/T/#u

    blk-mq: fix kernel panic during iterating over flush request
    
    To fix a use-after-free while iterating over requests, we grab the
    request's refcount before calling ->fn in commit 2e315dc07df0 ("blk-mq:
    grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter").
    It turns out this approach can cause a kernel panic when iterating over
    a flush request:
    
    1) the old flush request's tag has just been released and reused by a
    new request, but ->rqs[] has not been updated yet
    
    2) the flush request can be reused to submit a new flush command, so
    blk_rq_init() is called on it at the same time
    
    3) meanwhile, blk_mq_queue_tag_busy_iter() is called and retrieves the
    old flush request from ->rqs[tag]; when blk_mq_put_rq_ref() is called,
    flush_rq->end_io may not have been set yet, so a NULL pointer
    dereference is triggered in blk_mq_put_rq_ref().
    
    Fix the issue by calling refcount_set(&flush_rq->ref, 1) only after
    flush_rq->end_io has been set, instead of in blk_rq_init(). The only
    other current caller of blk_rq_init() is scsi_ioctl_reset(), where the
    request never enters the block I/O stack and its reference count is
    unused, so the change is safe.
    
    Fixes: 2e315dc07df0 ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter")
    Reported-by: "Blank-Burian, Markus, Dr." <blankburian>
    Tested-by: "Blank-Burian, Markus, Dr." <blankburian>
    Signed-off-by: Ming Lei <ming.lei>
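The three-step race above can be replayed sequentially in a small userspace C model. This is a sketch under stated assumptions, not kernel code. `struct model_rq`, `model_get_ref()`, `model_put_flush_ref()`, `model_old_rq_init()` and `replay_old_order()` are all hypothetical stand-ins. `model_get_ref()` mimics the semantics of the kernel's `refcount_inc_not_zero()`. Following the commit text, the model assumes two things: before the fix, `blk_rq_init()` left the reference count at 1, and dropping the iterator's reference on a flush request goes through `->end_io`. The model runs on one thread and ignores memory ordering. It only shows why a reference count that is already nonzero while `->end_io` is still NULL lets the iterator reach a NULL call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct request; not kernel code. */
struct model_rq {
    atomic_int ref;
    void (*end_io)(struct model_rq *rq);
};

static int end_io_calls; /* how often the modeled ->end_io ran */

/* Modeled flush completion: drops one reference and counts the call. */
static void model_flush_end_io(struct model_rq *rq)
{
    atomic_fetch_sub(&rq->ref, 1);
    end_io_calls++;
}

/* Like refcount_inc_not_zero(): take a reference only if ref > 0. */
static bool model_get_ref(struct model_rq *rq)
{
    int old = atomic_load(&rq->ref);
    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&rq->ref, &old, old + 1))
            return true;
    }
    return false;
}

/* Assumption from the commit text: a put on a flush request goes through
 * ->end_io. Returns false where the kernel would dereference NULL. */
static bool model_put_flush_ref(struct model_rq *rq)
{
    if (rq->end_io == NULL)
        return false;
    rq->end_io(rq);
    return true;
}

/* Assumed pre-fix init: the request is cleared and ref is set to 1
 * before the flush path has assigned ->end_io. */
static void model_old_rq_init(struct model_rq *rq)
{
    rq->end_io = NULL;
    atomic_store(&rq->ref, 1);
}

/* Sequential replay of the bad interleaving. Step 1 (tag reuse with a
 * stale ->rqs[] entry) is assumed: the iterator already holds a pointer
 * to the old flush request. Returns true if the put hits NULL ->end_io. */
static bool replay_old_order(void)
{
    struct model_rq rq = {0};
    bool crashed = false;

    model_old_rq_init(&rq);                 /* step 2: flush rq reinitialized */
    if (model_get_ref(&rq))                 /* step 3: ref 1 -> 2, succeeds */
        crashed = !model_put_flush_ref(&rq); /* ->end_io is still NULL */
    rq.end_io = model_flush_end_io;         /* flush path sets it too late */
    return crashed;
}
```

In this model `replay_old_order()` returns true and the modeled `->end_io` never runs. The real kernel dereferences the NULL pointer at that point instead of returning.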

Commit 2e315dc07df0 ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter") has already been merged into RHEL 8.5.
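The ordering used by the upstream fix can be sketched with a similar single-threaded userspace model. In that ordering, the reference count becomes 1 only after `flush_rq->end_io` is assigned. Again, `struct model_rq` and every `model_*`/`replay_*` function are hypothetical illustrations, not kernel APIs. `model_get_ref()` mimics `refcount_inc_not_zero()`. The sketch does not model the memory ordering that a concurrent implementation must also guarantee between the `->end_io` store and the reference-count store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct request; not kernel code. */
struct model_rq {
    atomic_int ref;
    void (*end_io)(struct model_rq *rq);
};

static int end_io_calls; /* how often the modeled ->end_io ran */

/* Modeled flush completion: drops one reference and counts the call. */
static void model_flush_end_io(struct model_rq *rq)
{
    atomic_fetch_sub(&rq->ref, 1);
    end_io_calls++;
}

/* Like refcount_inc_not_zero(): take a reference only if ref > 0. */
static bool model_get_ref(struct model_rq *rq)
{
    int old = atomic_load(&rq->ref);
    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&rq->ref, &old, old + 1))
            return true;
    }
    return false;
}

/* Put on a flush request goes through ->end_io (assumed from the commit
 * text). Returns false where the kernel would dereference NULL. */
static bool model_put_flush_ref(struct model_rq *rq)
{
    if (rq->end_io == NULL)
        return false;
    rq->end_io(rq);
    return true;
}

/* Fixed order: init leaves ref at 0; ref becomes 1 only after ->end_io.
 * Returns true if any put would have hit a NULL ->end_io. */
static bool replay_fixed_order(bool *early_get_succeeded)
{
    struct model_rq rq = {0};
    bool crashed = false;

    rq.end_io = NULL;                   /* modeled init: ref stays 0 */

    /* Iterator runs inside the window: ref is 0, so it skips the rq. */
    *early_get_succeeded = model_get_ref(&rq);
    if (*early_get_succeeded)
        crashed = !model_put_flush_ref(&rq);

    rq.end_io = model_flush_end_io;     /* flush path assigns ->end_io */
    atomic_store(&rq.ref, 1);           /* then the refcount_set(..., 1) step */

    /* Iterator runs again: a reference is only obtainable now. */
    if (model_get_ref(&rq))
        crashed |= !model_put_flush_ref(&rq);

    return crashed;
}
```

In this model the early lookup fails because the count is still 0. The later lookup succeeds with `->end_io` already set, so no put ever reaches a NULL callback.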


How reproducible:

About once after running container workloads for 30 minutes.


Steps to Reproduce:

N/A


Actual results:

Kernel panic when running container workloads via OpenStack.

Expected results:

No kernel panic; the workloads run successfully.


Comment 19 ChanghuiZhong 2021-09-08 03:53:35 UTC
Sanity tests passed with kernel-4.18.0-340.el8:
https://beaker.engineering.redhat.com/jobs/5775992
https://beaker.engineering.redhat.com/jobs/5776549

Cannot reproduce this issue with blktests.

All patches have been included in the kernel tree:
$ git log kernel-4.18.0-340.el8 --oneline --grep=1992700
7e74656663d7 Merge: blk-mq: fix kernel panic when iterating over flush request
fd9ee21126cb blk-mq: fix is_flush_rq
964bb31688ac blk-mq: fix kernel panic during iterating over flush request


Moving to VERIFIED + SanityOnly.

Comment 29 errata-xmlrpc 2021-11-09 19:27:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4356