Bug 2221314 - Metadata CRC error detected when mounting xfs zram device on ppc64le
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: ppc64le
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Blocks: PPCTracker
Reported: 2023-07-07 22:06 UTC by Michael Armijo
Modified: 2023-08-19 01:15 UTC (History)

Fixed In Version: kernel-6.4.11-100.fc37 kernel-6.4.11-200.fc38
Last Closed: 2023-08-19 00:48:07 UTC
Type: Bug

Description Michael Armijo 2023-07-07 22:06:24 UTC
1. Please describe the problem:

Mounting an XFS file system on a zram device on ppc64le fails with:
"can't read superblock on /dev/zram0".

dmesg output shows several errors related to metadata:

[ 3247.206007] XFS (zram0): Mounting V5 Filesystem 0b7d6149-614c-4f4c-9a1f-a80a9810f58f
[ 3247.210781] XFS (zram0): Metadata CRC error detected at xfs_agf_read_verify+0x108/0x150 [xfs], xfs_agf block 0x80008 
[ 3247.211121] XFS (zram0): Unmount and run xfs_repair
[ 3247.211198] XFS (zram0): First 128 bytes of corrupted metadata buffer:
[ 3247.211293] 00000000: fe ed ba be 00 00 00 00 00 00 00 02 00 00 00 00 ................
[ 3247.211405] 00000010: 00 00 00 00 00 00 00 18 00 00 00 01 00 00 00 00  ................
[ 3247.211515] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3247.211625] 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3247.211735] 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3247.211842] 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3247.211951] 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3247.212063] 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3247.212171] XFS (zram0): metadata I/O error in "xfs_read_agf+0xb4/0x180 [xfs]" at daddr 0x80008 len 8 error 74
[ 3247.212485] XFS (zram0): Error -117 reserving per-AG metadata reserve pool.
[ 3247.212497] XFS (zram0): Corruption of in-memory data (0x8) detected at xfs_fs_reserve_ag_blocks+0x1e0/0x220 [xfs] (fs/xfs/xfs_fsops.c:587).  Shutting down filesystem.
[ 3247.212828] XFS (zram0): Please unmount the filesystem and rectify the problem(s)
[ 3247.212943] XFS (zram0): Ending clean mount
[ 3247.212970] XFS (zram0): Error -5 reserving per-AG metadata reserve pool.
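
For reading the log above, it may help to decode the error numbers and the failing disk address. The sketch below is not part of the original report. It assumes the standard Linux errno values (5 = EIO, 74 = EBADMSG, 117 = EUCLEAN), that XFS reuses EBADMSG and EUCLEAN for its EFSBADCRC and EFSCORRUPTED codes, that daddr is counted in 512-byte units, and that the ppc64le kernel uses 64 KiB pages. Both function names are hypothetical helpers, not part of any existing tool.

```shell
#!/bin/sh
# Hypothetical helpers for reading the XFS messages in the dmesg output.

# Map an error number from the log (sign ignored) to its errno name.
xfs_errno_name() {
    case "${1#-}" in
        5)   echo "EIO" ;;
        74)  echo "EBADMSG (EFSBADCRC)" ;;
        117) echo "EUCLEAN (EFSCORRUPTED)" ;;
        *)   echo "unknown" ;;
    esac
}

# Print "<page index> <byte offset within that page>" for a daddr.
# $1 = daddr in 512-byte units (decimal or 0x-prefixed hex)
# $2 = page size in bytes (assumption: 65536 on ppc64le, 4096 on x86_64)
daddr_to_page_offset() {
    byte_off=$(( $1 * 512 ))
    echo "$(( byte_off / $2 )) $(( byte_off % $2 ))"
}
```

Under those assumptions, `daddr_to_page_offset 0x80008 65536` prints `4096 4096`: the failing AGF read starts 4096 bytes into a 64 KiB device page. With 4 KiB pages, `daddr_to_page_offset 0x80008 4096` prints `65537 0`, so the same read is page-aligned.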

2. What is the Version-Release number of the kernel:

[core@cosa-devsh ~]$ rpm -qa | grep kernel
kernel-modules-core-6.5.0-0.rc0.20230705gitd528014517f2.10.fc39.ppc64le
kernel-core-6.5.0-0.rc0.20230705gitd528014517f2.10.fc39.ppc64le
kernel-modules-6.5.0-0.rc0.20230705gitd528014517f2.10.fc39.ppc64le
kernel-6.5.0-0.rc0.20230705gitd528014517f2.10.fc39.ppc64le

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

This was first seen in kernel: 6.4.0-0.rc0.20230428git33afd4b76393.7.fc39. There were no errors in previous kernel versions.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Create and mount an XFS-formatted zram device on rawhide ppc64le. The steps I used to recreate this were:

1. modprobe zram num_devices=0
2. read dev < /sys/class/zram-control/hot_add
3. echo 10G > /sys/block/zram"${dev}"/disksize    # any disksize causes the error
4. mkfs.xfs /dev/zram"${dev}"
5. mount -t xfs /dev/zram"${dev}" /tmp

These steps succeed on x86_64.
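
The steps above can be collected into a small script. This is a sketch, not the exact script used for the report: the `run` wrapper, the `DRY_RUN` switch, the `MNT` variable and its default `/mnt/zram-test` are assumptions added here so the commands can be previewed without root and so that nothing is mounted over /tmp. A real run needs root and should only be done on a disposable machine.

```shell
#!/bin/sh
# Sketch of the reproducer. Set DRY_RUN=1 to print the commands only.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = disk size for the zram device (default 10G, as in the report).
# Returns the exit status of the final mount command.
repro_zram_xfs() {
    size=${1:-10G}
    mnt=${MNT:-/mnt/zram-test}    # hypothetical mount point

    run modprobe zram num_devices=0
    if [ "${DRY_RUN:-0}" = 1 ]; then
        dev=0                     # placeholder; a real run reads it from hot_add
        echo "+ echo $size > /sys/block/zram${dev}/disksize"
    else
        # Reading hot_add creates a new device and returns its index.
        read -r dev < /sys/class/zram-control/hot_add
        echo "$size" > "/sys/block/zram${dev}/disksize"
    fi
    run mkdir -p "$mnt"
    run mkfs.xfs "/dev/zram${dev}"
    run mount -t xfs "/dev/zram${dev}" "$mnt"
}
```

To preview the commands, call `DRY_RUN=1 repro_zram_xfs 1G`. On an affected kernel, a real run is expected to fail at the mount step with the superblock error quoted in the description.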

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

Yes

6. Are you running any modules that are not shipped directly with Fedora's kernel?:


7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Comment 1 Dusty Mabe 2023-08-02 03:19:07 UTC
We did a kernel bisect with the reproducer developed by @marmijo above and found the offending commit:

```
[root@ibm-p8-kvm-03-guest-02 linux]# git bisect good
af8b04c63708fa730c0257084fab91fb2a9cecc4 is the first bad commit
commit af8b04c63708fa730c0257084fab91fb2a9cecc4
Author: Christoph Hellwig <hch>
Date:   Tue Apr 11 19:14:46 2023 +0200

    zram: simplify bvec iteration in __zram_make_request
    
    bio_for_each_segment synthetize bvecs that never cross page boundaries, so
    don't duplicate that work in an inner loop.
    
    Link: https://lkml.kernel.org/r/20230411171459.567614-5-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch>
    Reviewed-by: Sergey Senozhatsky <senozhatsky>
    Acked-by: Minchan Kim <minchan>
    Cc: Jens Axboe <axboe>
    Signed-off-by: Andrew Morton <akpm>

 drivers/block/zram/zram_drv.c | 42 +++++++++++-------------------------------
 1 file changed, 11 insertions(+), 31 deletions(-)
```
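
For anyone repeating a bisect like this, the per-commit verdict can be expressed with the exit codes that `git bisect run` understands: 0 for good, 125 for "cannot test, skip", and any other code from 1 to 127 for bad. The helper below is a hypothetical sketch; the comment above does not say whether the original bisect was automated, and each step still requires building and booting the candidate kernel before the reproducer can be run.

```shell
#!/bin/sh
# Hypothetical helper: turn build and reproducer results into a
# git-bisect-run exit code.
# $1 = exit status of the kernel build
# $2 = exit status of the reproducer's mount step
bisect_verdict() {
    if [ "$1" -ne 0 ]; then
        echo 125    # build failed: ask git bisect to skip this commit
    elif [ "$2" -ne 0 ]; then
        echo 1      # mount failed: mark the commit as bad
    else
        echo 0      # mount succeeded: mark the commit as good
    fi
}
```

A test script driven by `git bisect run` could end with `exit "$(bisect_verdict "$build_status" "$mount_status")"`, where both variables are placeholders for values collected earlier in that script.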

Comment 2 Dusty Mabe 2023-08-02 03:35:58 UTC
Cross-referencing:

- Fedora CoreOS issue tracker:
    - https://github.com/coreos/fedora-coreos-tracker/issues/1489
- Linux kernel mailing list (LKML) post:
    - https://lkml.org/lkml/2023/8/1/1629

Comment 4 Dusty Mabe 2023-08-11 20:27:05 UTC
The fix landed upstream in:

```
commit 95848dcb9d676738411a8ff70a9704039f1b3982
Refs: v6.4-11516-g95848dcb9d67
Author:     Christoph Hellwig <hch>
AuthorDate: Sat Aug 5 07:55:37 2023 +0200
Commit:     Jens Axboe <axboe>
CommitDate: Sat Aug 5 16:13:15 2023 -0600

    zram: take device and not only bvec offset into account

    Commit af8b04c63708 ("zram: simplify bvec iteration in
    __zram_make_request") changed the bio iteration in zram to rely on the
    implicit capping to page boundaries in bio_for_each_segment.  But it
    failed to care for the fact zram not only care about the page alignment
    of the bio payload, but also the page alignment into the device.  For
    buffered I/O and swap those are the same, but for direct I/O or kernel
    internal I/O like XFS log buffer writes they can differ.

    Fix this by open coding bio_for_each_segment and limiting the bvec len
    so that it never crosses over a page alignment boundary in the device
    in addition to the payload boundary already taken care of by
    bio_iter_iovec.

    Cc: stable.org
    Fixes: af8b04c63708 ("zram: simplify bvec iteration in __zram_make_request")
    Reported-by: Dusty Mabe <dusty>
    Signed-off-by: Christoph Hellwig <hch>
    Acked-by: Sergey Senozhatsky <senozhatsky>
    Link: https://lore.kernel.org/r/20230805055537.147835-1-hch@lst.de
    Signed-off-by: Jens Axboe <axboe>
---
 drivers/block/zram/zram_drv.c | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)
```
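
As a rough user-space model of the capping that the commit message describes (not the kernel code itself), the sketch below splits an I/O range so that no chunk crosses a page boundary in the device. `split_by_device_page` is a hypothetical helper. It takes a start sector in 512-byte units, a length in bytes that is assumed to be a multiple of 512, and a page size. It ignores the payload-side boundary, which the commit message says bio_iter_iovec already handles.

```shell
#!/bin/sh
# Hypothetical model: split a request at device page boundaries.
# $1 = start sector (512-byte units)
# $2 = length in bytes (assumed to be a multiple of 512)
# $3 = page size in bytes
# Prints one line per chunk: "<page index> <offset in page> <chunk length>"
split_by_device_page() {
    sector=$1
    remaining=$2
    page_size=$3
    while [ "$remaining" -gt 0 ]; do
        byte_off=$(( sector * 512 ))
        index=$(( byte_off / page_size ))
        offset=$(( byte_off % page_size ))
        # Cap the chunk so it ends at the device page boundary.
        len=$(( page_size - offset ))
        if [ "$len" -gt "$remaining" ]; then
            len=$remaining
        fi
        echo "$index $offset $len"
        sector=$(( sector + len / 512 ))
        remaining=$(( remaining - len ))
    done
}
```

With 64 KiB pages, `split_by_device_page 120 8192 65536` prints `0 61440 4096` followed by `1 0 4096`: an 8 KiB request that starts 4 KiB before a device page boundary is handled as two pieces. With 4 KiB pages, `split_by_device_page 8 4096 4096` prints the single line `1 0 4096`.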

Comment 5 Fedora Update System 2023-08-16 20:34:48 UTC
FEDORA-2023-1ccaad9e2e has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2023-1ccaad9e2e

Comment 6 Fedora Update System 2023-08-16 20:34:49 UTC
FEDORA-2023-cb2ef9c22c has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2023-cb2ef9c22c

Comment 7 Fedora Update System 2023-08-17 01:30:50 UTC
FEDORA-2023-1ccaad9e2e has been pushed to the Fedora 38 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-1ccaad9e2e`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-1ccaad9e2e

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 8 Fedora Update System 2023-08-17 01:32:34 UTC
FEDORA-2023-cb2ef9c22c has been pushed to the Fedora 37 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-cb2ef9c22c`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-cb2ef9c22c

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 9 Fedora Update System 2023-08-19 00:48:07 UTC
FEDORA-2023-cb2ef9c22c has been pushed to the Fedora 37 stable repository.
If the problem still persists, please make a note of it in this bug report.

Comment 10 Fedora Update System 2023-08-19 01:15:00 UTC
FEDORA-2023-1ccaad9e2e has been pushed to the Fedora 38 stable repository.
If the problem still persists, please make a note of it in this bug report.

