Bug 2007890 - [RHEL9] blktests block/009 failure
Summary: [RHEL9] blktests block/009 failure
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: kernel
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Ming Lei
QA Contact: Zhang Yi
URL:
Whiteboard:
Depends On: 2001733
Blocks:
Reported: 2021-09-26 00:14 UTC by Ming Lei
Modified: 2023-08-08 03:07 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2001733
Environment:
Last Closed: 2022-04-17 06:17:49 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+




Links
System                 ID              Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker  RHELPLAN-98188  0        None      None    None     2021-09-26 00:18:26 UTC

Description Ming Lei 2021-09-26 00:14:55 UTC
+++ This bug was initially created as a clone of Bug #2001733 +++

Description of problem:
[RHEL8] blktests block/009 failure

Version-Release number of selected component (if applicable):
4.18.0-340.el8.x86_64

How reproducible:
10%

Steps to Reproduce:
1. Run ./check block/009 from the blktests directory.

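Actual results:
block/009 intermittently fails: after the discard, the first dump of the device still shows 0xaa data from offset 0x1ffe000 to 0x1fff000.

Expected results:
block/009 passes, with the discarded device reading back as all zeroes.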

Additional info:

[root@storageqe-62 blktests]# ./check block/009
block/009 (check page-cache coherency after BLKDISCARD)      [failed]
    runtime  0.727s  ...  0.722s
    --- tests/block/009.out	2021-09-06 09:01:00.942957774 -0400
    +++ /mnt/tests/kernel/storage/SSD/nvme_blktest/blktests/results/nodev/block/009.out.bad	2021-09-06 23:52:03.038467698 -0400
    @@ -1,6 +1,10 @@
     Running block/009
     0000000 0000 0000 0000 0000 0000 0000 0000 0000
     *
    +1ffe000 aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa
    +*
    +1fff000 0000 0000 0000 0000 0000 0000 0000 0000
    +*
    ...
    (Run 'diff -u tests/block/009.out /mnt/tests/kernel/storage/SSD/nvme_blktest/blktests/results/nodev/block/009.out.bad' to see the entire diff)

[root@storageqe-62 blktests]# cat results/nodev/block/009.out.bad 
Running block/009
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
1ffe000 aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa
*
1fff000 0000 0000 0000 0000 0000 0000 0000 0000
*
2000000
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
2000000
Test complete
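
The two dumps in 009.out.bad are easier to read once the squeezed lines are turned into byte ranges. The sketch below is a minimal Python helper under stated assumptions: it assumes the hexdump-style layout shown above (a hex offset, eight 16-bit words per 16-byte line, `*` for repeated lines, and a final line holding only the end offset), and the name `nonzero_ranges` is hypothetical, not part of blktests. For the first dump it reports one 4096-byte range of 0xaa data from 0x1ffe000 to 0x1fff000, i.e. the second-to-last 4 KiB page below the 0x2000000 (32 MiB) end offset; the second dump contains no non-zero data and yields an empty list.

```python
from typing import List, Optional, Tuple


def nonzero_ranges(dump: str) -> List[Tuple[int, int]]:
    """Return [start, end) byte ranges that hold non-zero data in a hexdump-style dump.

    Assumed layout: each data line is '<hex offset> <16-bit hex word> ...',
    a lone '*' means the previous line repeats up to the next offset, and the
    last line carries only the end offset.
    """
    ranges: List[Tuple[int, int]] = []
    prev_offset: Optional[int] = None
    prev_nonzero = False
    for raw in dump.splitlines():
        line = raw.strip()
        if not line or line == "*":
            # '*' needs no extra bookkeeping: a data line always extends to the next offset.
            continue
        fields = line.split()
        offset = int(fields[0], 16)
        if prev_offset is not None and prev_nonzero:
            # The previous line, plus any '*' repeats, covers [prev_offset, offset).
            if ranges and ranges[-1][1] == prev_offset:
                ranges[-1] = (ranges[-1][0], offset)  # merge with the adjacent range
            else:
                ranges.append((prev_offset, offset))
        prev_offset = offset
        prev_nonzero = any(int(word, 16) != 0 for word in fields[1:])
    return ranges


# First dump from 009.out.bad (the "Running"/"Test complete" lines are not part of it).
first_dump = """\
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
1ffe000 aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa
*
1fff000 0000 0000 0000 0000 0000 0000 0000 0000
*
2000000
"""

for start, end in nonzero_ranges(first_dump):
    print(f"{start:#x}-{end:#x} ({end - start} bytes)")  # prints 0x1ffe000-0x1fff000 (4096 bytes)
```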

--- Additional comment from Ming Lei on 2021-09-15 01:45:44 UTC ---


v5.14 adds ->invalidate_lock to avoid the race, and most of those changes are in mm/fs.

Also, blkdev_fallocate() isn't covered by Jan's patchset, so I will prepare a patch to fix RHEL 9 first.

Then we can discuss whether this kind of issue needs to be fixed in RHEL 8, given that it has existed in the Linux kernel for a long time.

It seems no one has complained about it before?

Thanks

--- Additional comment from Ming Lei on 2021-09-15 01:52:38 UTC ---

(In reply to Ming Lei from comment #1)
> v5.14 adds ->invalidate_lock to avoid the race, and most of those changes are in mm/fs.

Oops, ->invalidate_lock isn't included in v5.14; it is in the latest Linus tree and will be released in v5.15.

I will test the recent Linus tree and see if the issue exists.

--- Additional comment from Ming Lei on 2021-09-15 09:29:54 UTC ---

(In reply to Ming Lei from comment #2)
> I will test the recent Linus tree and see if the issue exists.

The issue can be reproduced on the latest Linus tree (v5.15-rc1+), after fixing one kernel panic so that block/009 can run.

--- Additional comment from Ming Lei on 2021-09-15 12:45:57 UTC ---

(In reply to Ming Lei from comment #3)
> The issue can be reproduced on the latest Linus tree (v5.15-rc1+), after fixing one kernel panic so that block/009 can run.

The following patch fixes the same issue on RHEL 9:

https://lore.kernel.org/linux-block/20210915123545.1000534-1-ming.lei@redhat.com/T/#u

Comment 1 Ming Lei 2021-09-26 00:16:25 UTC
The following patch aimed at v5.15 should address the issue:

https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=block-5.15&id=f278eb3d8178f9c31f8dfad7e91440e603dd7f1a
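
The patch linked above (commit f278eb3d8178, "block: hold ->invalidate_lock in blkdev_fallocate") makes the fallocate path hold the mapping's invalidate_lock. The toy model below is a hedged illustration of the kind of interleaving such a lock excludes, not kernel code and not a claim about the exact failure path in block/009. It assumes a single cached page, a fallocate path that drops the cached page and then discards the device, and a racing buffered read that refills the cache in between. All names (`ToyBdev`, `punch_hole`, `simulate`) are hypothetical, and the lock is modelled by deferring the racing read until release rather than with real threads.

```python
from typing import Callable, List, Optional

PAGE_SIZE = 4096


class ToyBdev:
    """One page of a block device plus an optional page-cache copy (toy model)."""

    def __init__(self, data: bytes) -> None:
        self.disk = data
        self.cache: Optional[bytes] = None  # None: page not present in the page cache
        self.invalidate_locked = False

    def buffered_read(self) -> bytes:
        # A cache miss fills the page cache from whatever is on disk right now.
        if self.cache is None:
            self.cache = self.disk
        return self.cache


def punch_hole(bdev: ToyBdev, use_lock: bool, racing_read: Callable[[], None]) -> None:
    """Toy fallocate(): drop the cached page, then discard the device range."""
    if use_lock:
        bdev.invalidate_locked = True  # taken exclusively for the whole operation
    bdev.cache = None                  # truncate the page cache
    racing_read()                      # another task runs at this point
    bdev.disk = bytes(PAGE_SIZE)       # discard: the device now returns zeroes
    if use_lock:
        bdev.invalidate_locked = False


def simulate(use_lock: bool) -> bytes:
    bdev = ToyBdev(b"\xaa" * PAGE_SIZE)
    deferred: List[Callable[[], bytes]] = []

    def racing_read() -> None:
        if bdev.invalidate_locked:
            deferred.append(bdev.buffered_read)  # would block until the lock is dropped
        else:
            bdev.buffered_read()                 # caches the pre-discard 0xaa data

    punch_hole(bdev, use_lock, racing_read)
    for read in deferred:
        read()
    return bdev.buffered_read()  # what a later buffered read sees


print(simulate(use_lock=False)[:2].hex())  # prints aaaa: stale page-cache data
print(simulate(use_lock=True)[:2].hex())   # prints 0000: coherent after the discard
```

In the kernel, invalidate_lock is a read/write semaphore: paths that add pages to the page cache take it shared, while the fallocate path takes it exclusively. The model collapses that into a single flag.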

Comment 2 Masayoshi Mizuma (Fujitsu) 2021-11-24 15:51:35 UTC
Hi Ming,

This issue may cause data corruption, because stale page cache can be read after fallocate(), right?
So it would be better if the patch could be backported to the RHEL 9 kernel before GA.

I believe the patch has already been merged in v5.15.

    f278eb3d8178 block: hold ->invalidate_lock in blkdev_fallocate

$ git describe --contain f278eb3d8178
v5.15-rc3~10^2
$

Thanks,
Masa

Comment 3 Ming Lei 2021-11-26 02:44:03 UTC
(In reply to Masayoshi Mizuma (Fujitsu) from comment #2)
> Hi Ming,
> 
> This issue may cause data corruption, because stale page cache can be read after fallocate(), right?
> So it would be better if the patch could be backported to the RHEL 9 kernel before GA.
> 
> I believe the patch has already been merged in v5.15.
> 
>     f278eb3d8178 block: hold ->invalidate_lock in blkdev_fallocate
> 
> $ git describe --contain f278eb3d8178
> v5.15-rc3~10^2
> $
> 

That patch depends on 730633f0b7f9 ("mm: Protect operations adding pages to page cache with invalidate_lock") and the related patch series.

Cc'ing our mm and VFS folks.

Thanks,

Comment 4 Rafael Aquini 2021-11-29 19:48:19 UTC
(In reply to Ming Lei from comment #3)
[...]
> That patch depends on 730633f0b7f9 ("mm: Protect operations adding pages to page cache with invalidate_lock") and the related patch series.
> 

We'll be pulling in commit 730633f0b7f9 (and its MM friends) through Bug 2023396.

Work on the MR for that BZ is wrapping up, and we intend to push it to GitLab soon. This BZ can be set to depend on 2023396 if commit 730633f0b7f9 is its only requirement.

-- Rafael

Comment 5 Ming Lei 2021-12-14 03:23:01 UTC

BTW, all fixes have been folded into MR148, which addresses BZ2018403.

Once MR148 is merged, this BZ can be moved to ON_QA.

