1. Please describe the problem:

Mounting an XFS file system on a zram device on ppc64le fails with: "can't read superblock on /dev/zram0". dmesg output shows several errors related to metadata:

    [ 3247.206007] XFS (zram0): Mounting V5 Filesystem 0b7d6149-614c-4f4c-9a1f-a80a9810f58f
    [ 3247.210781] XFS (zram0): Metadata CRC error detected at xfs_agf_read_verify+0x108/0x150 [xfs], xfs_agf block 0x80008
    [ 3247.211121] XFS (zram0): Unmount and run xfs_repair
    [ 3247.211198] XFS (zram0): First 128 bytes of corrupted metadata buffer:
    [ 3247.211293] 00000000: fe ed ba be 00 00 00 00 00 00 00 02 00 00 00 00 ................
    [ 3247.211405] 00000010: 00 00 00 00 00 00 00 18 00 00 00 01 00 00 00 00 ................
    [ 3247.211515] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    [ 3247.211625] 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    [ 3247.211735] 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    [ 3247.211842] 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    [ 3247.211951] 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    [ 3247.212063] 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    [ 3247.212171] XFS (zram0): metadata I/O error in "xfs_read_agf+0xb4/0x180 [xfs]" at daddr 0x80008 len 8 error 74
    [ 3247.212485] XFS (zram0): Error -117 reserving per-AG metadata reserve pool.
    [ 3247.212497] XFS (zram0): Corruption of in-memory data (0x8) detected at xfs_fs_reserve_ag_blocks+0x1e0/0x220 [xfs] (fs/xfs/xfs_fsops.c:587). Shutting down filesystem.
    [ 3247.212828] XFS (zram0): Please unmount the filesystem and rectify the problem(s)
    [ 3247.212943] XFS (zram0): Ending clean mount
    [ 3247.212970] XFS (zram0): Error -5 reserving per-AG metadata reserve pool.

2. What is the Version-Release number of the kernel:

    [core@cosa-devsh ~]$ rpm -qa | grep kernel
    kernel-modules-core-6.5.0-0.rc0.20230705gitd528014517f2.10.fc39.ppc64le
    kernel-core-6.5.0-0.rc0.20230705gitd528014517f2.10.fc39.ppc64le
    kernel-modules-6.5.0-0.rc0.20230705gitd528014517f2.10.fc39.ppc64le
    kernel-6.5.0-0.rc0.20230705gitd528014517f2.10.fc39.ppc64le

3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

This was first seen in kernel 6.4.0-0.rc0.20230428git33afd4b76393.7.fc39. There were no errors in previous kernel versions.

4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:

Create and mount a zram XFS device in rawhide ppc64le. The steps I used to recreate this were:

    1. modprobe zram num_devices=0
    2. read dev < /sys/class/zram-control/hot_add
    3. echo 10G > /sys/block/zram"${dev}"/disksize #(any disksize causes errors)
    4. mkfs.xfs /dev/zram"${dev}"
    5. mount -t xfs /dev/zram"${dev}" /tmp

These steps succeed on x86_64.

5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:

Yes

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.
We did a kernel bisect with the reproducer developed by @marmijo above and found the offending commit:

```
[root@ibm-p8-kvm-03-guest-02 linux]# git bisect good
af8b04c63708fa730c0257084fab91fb2a9cecc4 is the first bad commit
commit af8b04c63708fa730c0257084fab91fb2a9cecc4
Author: Christoph Hellwig <hch>
Date:   Tue Apr 11 19:14:46 2023 +0200

    zram: simplify bvec iteration in __zram_make_request

    bio_for_each_segment synthetize bvecs that never cross page boundaries, so
    don't duplicate that work in an inner loop.

    Link: https://lkml.kernel.org/r/20230411171459.567614-5-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch>
    Reviewed-by: Sergey Senozhatsky <senozhatsky>
    Acked-by: Minchan Kim <minchan>
    Cc: Jens Axboe <axboe>
    Signed-off-by: Andrew Morton <akpm>

 drivers/block/zram/zram_drv.c | 42 +++++++++++------------------------------
 1 file changed, 11 insertions(+), 31 deletions(-)
```
Cross-referencing:

- Fedora CoreOS issue tracker:
  - https://github.com/coreos/fedora-coreos-tracker/issues/1489
- linux-kernel mailing list post:
  - https://lkml.org/lkml/2023/8/1/1629
Proposed upstream fix: https://lore.kernel.org/all/20230805055537.147835-1-hch@lst.de/
The fix landed upstream in:

```
commit 95848dcb9d676738411a8ff70a9704039f1b3982
Refs: v6.4-11516-g95848dcb9d67
Author:     Christoph Hellwig <hch>
AuthorDate: Sat Aug 5 07:55:37 2023 +0200
Commit:     Jens Axboe <axboe>
CommitDate: Sat Aug 5 16:13:15 2023 -0600

    zram: take device and not only bvec offset into account

    Commit af8b04c63708 ("zram: simplify bvec iteration in
    __zram_make_request") changed the bio iteration in zram to rely on the
    implicit capping to page boundaries in bio_for_each_segment. But it
    failed to care for the fact zram not only care about the page alignment
    of the bio payload, but also the page alignment into the device. For
    buffered I/O and swap those are the same, but for direct I/O or kernel
    internal I/O like XFS log buffer writes they can differ.

    Fix this by open coding bio_for_each_segment and limiting the bvec len
    so that it never crosses over a page alignment boundary in the device
    in addition to the payload boundary already taken care of by
    bio_iter_iovec.

    Cc: stable.org
    Fixes: af8b04c63708 ("zram: simplify bvec iteration in __zram_make_request")
    Reported-by: Dusty Mabe <dusty>
    Signed-off-by: Christoph Hellwig <hch>
    Acked-by: Sergey Senozhatsky <senozhatsky>
    Link: https://lore.kernel.org/r/20230805055537.147835-1-hch@lst.de
    Signed-off-by: Jens Axboe <axboe>
---
 drivers/block/zram/zram_drv.c | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)
```
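For anyone following along, here is a rough shell sketch of the capping rule the commit message describes: each chunk of an I/O is limited so that it never crosses a page boundary in the device. This is only an illustration of the arithmetic, not the kernel code. The function name `split_io` is made up for this example, sectors are assumed to be 512 bytes, the length is assumed to be a multiple of 512, and the 64K page size used for ppc64le below is an assumption that is not stated anywhere in this report.

```shell
#!/bin/sh
# split_io START_SECTOR LENGTH_BYTES PAGE_SIZE   (hypothetical helper)
# Prints one "page_index offset_in_page length" line per chunk. Each chunk
# is capped at the bytes left in the current device page, so no chunk
# crosses a page boundary in the device.
split_io() {
    sector=$1
    remaining=$2
    page_size=$3
    sectors_per_page=$((page_size / 512))

    while [ "$remaining" -gt 0 ]; do
        index=$((sector / sectors_per_page))
        offset=$(( (sector % sectors_per_page) * 512 ))
        len=$((page_size - offset))          # bytes left in this device page
        if [ "$len" -gt "$remaining" ]; then
            len=$remaining
        fi
        echo "$index $offset $len"
        sector=$((sector + len / 512))
        remaining=$((remaining - len))
    done
}

# The failing read from dmesg: daddr 0x80008, len 8 sectors (4096 bytes).
split_io $((0x80008)) 4096 65536   # assumed 64K pages: 4096 4096 4096
split_io $((0x80008)) 4096 4096    # 4K pages:          65537 0 4096

# An 8 KiB I/O starting 4 KiB before the end of a 64K page gets split:
split_io 120 8192 65536            # 0 61440 4096, then 1 0 4096
```

With 4K pages the dmesg read starts at offset 0 of a device page, while with 64K pages the same read starts 4096 bytes into a page. That would be consistent with the problem showing up on ppc64le and not on x86_64, if the page-size assumption holds.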
FEDORA-2023-1ccaad9e2e has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2023-1ccaad9e2e
FEDORA-2023-cb2ef9c22c has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2023-cb2ef9c22c
FEDORA-2023-1ccaad9e2e has been pushed to the Fedora 38 testing repository.

Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-1ccaad9e2e`

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-1ccaad9e2e

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2023-cb2ef9c22c has been pushed to the Fedora 37 testing repository.

Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-cb2ef9c22c`

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-cb2ef9c22c

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2023-cb2ef9c22c has been pushed to the Fedora 37 stable repository. If the problem still persists, please make note of it in this bug report.
FEDORA-2023-1ccaad9e2e has been pushed to the Fedora 38 stable repository. If the problem still persists, please make note of it in this bug report.