Bug 1992700 - blk-mq: fix kernel panic when iterating over flush request
Summary: blk-mq: fix kernel panic when iterating over flush request
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: kernel
Version: 8.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.5
Assignee: Ming Lei
QA Contact: ChanghuiZhong
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-08-11 15:25 UTC by Ming Lei
Modified: 2021-11-10 06:27 UTC (History)

Fixed In Version: kernel-4.18.0-340.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-09 19:27:01 UTC
Type: Bug
Target Upstream Version:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Gitlab | redhat/rhel/src/kernel rhel-8 merge_requests 1220 | 0 | None | None | None | 2021-08-24 00:37:40 UTC
Red Hat Issue Tracker | RHELPLAN-93250 | 0 | None | None | None | 2021-08-11 15:27:42 UTC
Red Hat Product Errata | RHSA-2021:4356 | 0 | None | None | None | 2021-11-09 19:27:18 UTC

Description Ming Lei 2021-08-11 15:25:21 UTC
Description of problem:

https://lore.kernel.org/linux-block/20210811142624.618598-1-ming.lei@redhat.com/T/#u

    blk-mq: fix kernel panic during iterating over flush request
    
    For fixing use-after-free during iterating over requests, we grabbed
    request's refcount before calling ->fn in commit 2e315dc07df0 ("blk-mq:
    grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter").
    Turns out this way may cause kernel panic when iterating over one flush
    request:
    
    1) old flush request's tag is just released, and this tag is reused by
    one new request, but ->rqs[] isn't updated yet
    
    2) the flush request can be re-used for submitting one new flush command,
    so blk_rq_init() is called at the same time
    
    3) meantime blk_mq_queue_tag_busy_iter() is called, and old flush request
    is retrieved from ->rqs[tag]; when blk_mq_put_rq_ref() is called,
    flush_rq->end_io may not be updated yet, so NULL pointer dereference
    is triggered in blk_mq_put_rq_ref().
    
    Fix the issue by calling refcount_set(&flush_rq->ref, 1) after
    flush_rq->end_io is set. So far the only other caller of blk_rq_init() is
    scsi_ioctl_reset() in which the request doesn't enter block IO stack and
    the request reference count isn't used, so the change is safe.
    
    Fixes: 2e315dc07df0 ("blk-mq: grab rq->refcount before calling ->fn in
    blk_mq_tagset_busy_iter")
    Reported-by: "Blank-Burian, Markus, Dr." <blankburian>
    Tested-by: "Blank-Burian, Markus, Dr." <blankburian>
    Signed-off-by: Ming Lei <ming.lei>

Commit 2e315dc07df0 ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter") has already been merged into RHEL 8.5, so RHEL 8.5 needs this fix as well.
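
The ordering the commit message relies on can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: `struct model_rq` and every `model_*` function are hypothetical stand-ins for `struct request`, `blk_rq_init()`, the flush setup path, and `blk_mq_put_rq_ref()`. The put path is simplified to "call `end_io`", as the description above implies for a flush request. The point it shows is that an iterator which only touches a request after taking a non-zero reference can never see a NULL `end_io`, as long as the reference count is set to 1 only after `end_io` has been written.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the flush request; not struct request. */
struct model_rq {
    atomic_int ref;                       /* models rq->ref */
    void (*end_io)(struct model_rq *rq);  /* models rq->end_io */
    int end_io_calls;                     /* bookkeeping for this sketch only */
};

/* Models re-initialising the flush request: ref is left at 0, end_io cleared. */
static void model_rq_init(struct model_rq *rq)
{
    atomic_store(&rq->ref, 0);
    rq->end_io = NULL;
    rq->end_io_calls = 0;
}

/* Models the flush completion callback dropping the iterator's reference. */
static void model_flush_end_io(struct model_rq *rq)
{
    atomic_fetch_sub(&rq->ref, 1);
    rq->end_io_calls++;
}

/* The fix: write end_io first, then make the request visible via ref = 1. */
static void model_publish_flush(struct model_rq *rq)
{
    rq->end_io = model_flush_end_io;
    /* Release ordering: the end_io write is visible before ref becomes 1. */
    atomic_store_explicit(&rq->ref, 1, memory_order_release);
}

/* Models an "increment unless zero" reference grab by the tag iterator. */
static bool model_get_ref(struct model_rq *rq)
{
    int old = atomic_load(&rq->ref);

    while (old != 0) {
        /* On failure, old is reloaded and the loop re-checks it. */
        if (atomic_compare_exchange_weak(&rq->ref, &old, old + 1))
            return true;
    }
    return false;
}

/* Simplified put path for a flush request: it dereferences end_io. */
static void model_put_ref(struct model_rq *rq)
{
    assert(rq->end_io != NULL);  /* the NULL dereference in the bug report */
    rq->end_io(rq);
}

/* One iterator visit: skip the request unless a reference can be taken. */
static bool model_iter_visit(struct model_rq *rq)
{
    if (!model_get_ref(rq))
        return false;  /* ref == 0: request is not published yet */
    model_put_ref(rq);
    return true;
}
```

In this model, calling `model_iter_visit()` between `model_rq_init()` and `model_publish_flush()` returns false without touching `end_io`; after `model_publish_flush()` it returns true and invokes the callback once. If the initialiser instead set the count to 1 while `end_io` was still NULL, the same visit would trip the assertion, which corresponds to the panic described above.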


Version-Release number of selected component (if applicable):


How reproducible:

Reproduces about once after running container workloads for 30 minutes.


Steps to Reproduce:

N/A


Actual results:

Kernel panic when running container workloads via OpenStack.

Expected results:

No kernel panic; the workloads run successfully.


Additional info:

Comment 19 ChanghuiZhong 2021-09-08 03:53:35 UTC
Sanity tests passed with kernel-4.18.0-340.el8:
https://beaker.engineering.redhat.com/jobs/5775992
https://beaker.engineering.redhat.com/jobs/5776549

Could not reproduce this issue with blktests.

All patches are included in the kernel tree:
$ git log kernel-4.18.0-340.el8 --oneline --grep=1992700
7e74656663d7 Merge: blk-mq: fix kernel panic when iterating over flush request
fd9ee21126cb blk-mq: fix is_flush_rq
964bb31688ac blk-mq: fix kernel panic during iterating over flush request


Moving to VERIFIED + SanityOnly.

Comment 29 errata-xmlrpc 2021-11-09 19:27:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4356
