Bug 1698057
| Summary: | Level 0 xfsdump of filesystem on drbd partition cannot be restored with xfsrestore [rhel-7.9.z] | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | S Massey <smassey> |
| Component: | xfsdump | Assignee: | Eric Sandeen <esandeen> |
| Status: | CLOSED ERRATA | QA Contact: | Murphy Zhou <xzhou> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.5 | CC: | chorn, ddouwsma, dwysocha, esandeen, fsorenso, hmatsumo, jamesb, kpfleming, masanari.iida, tnagata, xzhou, yoguma, zlang |
| Target Milestone: | rc | Keywords: | Reopened, Triaged, ZStream |
| Target Release: | --- | Flags: | pm-rhel: mirror+ |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | xfsdump-3.1.7-2.el7_9 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 2116962 2168000 (view as bug list) | Environment: | |
| Last Closed: | 2022-09-20 09:01:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2116962, 2168000 | | |
| Attachments: | | | |
Description  S Massey  2019-04-09 14:15:38 UTC
Right, this sounds like the bug fixed by https://git.kernel.org/pub/scm/fs/xfs/xfsdump-dev.git/commit/?id=25195ebf107dc81b1b7cea1476764950e1d6cc9d but that should be present in the xfsdump-3.1.7 package in RHEL7.

Do you have bind mounts involved in your setup?

bug https://bugzilla.redhat.com/show_bug.cgi?id=1405285 has a lot of debugging steps shown.

Can you try again with "-v trace" and include that output? Can you provide the core file from the coredump above?

Just to verify version, here's the first few lines of logging from the originally posted example:

xfsdump: using file dump (drive_simple) strategy
xfsdump: version 3.1.7 (dump format 3.0)
xfsdump: level 0 dump of ##########:/data
xfsdump: dump date: Tue Apr 9 10:21:49 2019
xfsdump: session id: f21054e9-49f0-4ed8-96e9-bb2f3f18f1a3
xfsdump: session label: "##########"
xfsdump: NOTE: root ino 64 differs from mount dir ino 128, bind mount?

*** CORRECTION *** drbd partition has external metadata (in case that matters)

Created attachment 1553897 [details]
xfsrestore trace
Attached trace (-v 5) of failed xfsrestore
(In reply to Eric Sandeen from comment #2)
> Right, this sounds like the bug fixed by
> https://git.kernel.org/pub/scm/fs/xfs/xfsdump-dev.git/commit/?id=25195ebf107dc81b1b7cea1476764950e1d6cc9d
> but that should be present in the xfsdump-3.1.7 package in RHEL7.
>
> Do you have bind mounts involved in your setup?
>
> bug https://bugzilla.redhat.com/show_bug.cgi?id=1405285 has a lot of
> debugging steps shown.
>
> Can you try again with "-v trace" and include that output? Can you provide
> the core file from the coredump above?

There are no explicit bind mounts.

(In reply to smassey from comment #6)
> (In reply to Eric Sandeen from comment #2)
> > Do you have bind mounts involved in your setup?
> There are no explicit bind mounts.

and yet:

> xfsdump: NOTE: root ino 64 differs from mount dir ino 128, bind mount?

providing /proc/mounts would be helpful too, I think. I'll look over the trace.

Created attachment 1553903 [details]
Debug of xfsrestore w/symbols
Debug session of xfsrestore w/symbols, showing backtrace and values of variables at assert failure.
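The attached session isn't inlined in this report. For anyone who needs to capture the same information from a core file, a session roughly like the following is enough; the core path and frame number are placeholders, and the variable names come from the assert quoted later in this bug:

# debuginfo-install xfsdump
# gdb /usr/sbin/xfsrestore /path/to/core
(gdb) bt full
(gdb) frame <n>              # select the tree_begindir frame
(gdb) print ino
(gdb) print persp->p_rootino
(gdb) print hardh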
Just to double check, was the dump itself (which fails to restore) created with xfsdump v3.1.7 as well, or with a prior version?

(In reply to Eric Sandeen from comment #7)
> (In reply to smassey from comment #6)
> > (In reply to Eric Sandeen from comment #2)
> > > Do you have bind mounts involved in your setup?
> > There are no explicit bind mounts.
>
> and yet:
>
> > xfsdump: NOTE: root ino 64 differs from mount dir ino 128, bind mount?
>
> providing /proc/mounts would be helpful too, I think. I'll look over the trace.

# cat /proc/mounts
rootfs / rootfs rw 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
devtmpfs /dev devtmpfs rw,nosuid,size=74171572k,nr_inodes=18542893,mode=755 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,nodev,mode=755 0 0
tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0
cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_prio,net_cls 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpuacct,cpu 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
configfs /sys/kernel/config configfs rw,relatime 0 0
/dev/sdb2 / xfs rw,relatime,attr2,inode64,noquota 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=25,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=21595 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,relatime 0 0
mqueue /dev/mqueue mqueue rw,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
/dev/sdb3 /tmp xfs rw,relatime,attr2,inode64,noquota 0 0
/dev/sdb1 /boot xfs rw,relatime,attr2,inode64,noquota 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
/etc/auto.misc /misc autofs rw,relatime,fd=6,pgrp=16069,timeout=300,minproto=5,maxproto=5,indirect,pipe_ino=32134 0 0
-hosts /net autofs rw,relatime,fd=12,pgrp=16069,timeout=300,minproto=5,maxproto=5,indirect,pipe_ino=61375 0 0
/etc/auto.xxx /xxx autofs rw,relatime,fd=18,pgrp=16069,timeout=300,minproto=5,maxproto=5,indirect,pipe_ino=61379 0 0
/dev/drbd0 /data xfs rw,relatime,attr2,inode64,noquota 0 0
tmpfs /run/user/500 tmpfs rw,nosuid,nodev,relatime,size=14836788k,mode=700,uid=500,gid=500 0 0
tmpfs /run/user/1012 tmpfs rw,nosuid,nodev,relatime,size=14836788k,mode=700,uid=1012,gid=1013 0 0
tmpfs /run/user/1002 tmpfs rw,nosuid,nodev,relatime,size=14836788k,mode=700,uid=1002,gid=1003 0 0

(In reply to Eric Sandeen from comment #9)
> Just to double check, was the dump itself (which fails to restore) created
> with xfsdump v3.1.7 as well, or with a prior version?

yes, ref comment 3

Maybe not drbd related. Since we have external metadata, I thought I would lvm-snapshot the underlying partition on the secondary and try making a backup from that. File system looks fine from the command line; mounts fine, traversable, etc. Same xfsdump/restore behavior:

# vgs
  VG   #PV #LV #SN Attr   VSize VFree
  data   1   3   1 wz--n- 1.39t <3.72g
# lvs
  LV       VG   Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert
  data     data owi-aos---   1.00t
  databk   data swi-aos--- 400.00g      data   0.19
  metadata data -wi-ao---- 256.00m
# grep databk /proc/mounts
/dev/mapper/data-databk /mnt/data xfs ro,relatime,attr2,inode64,noquota 0 0

# xfsdump -l 0 -L sample -M sample - /mnt/data | xfsrestore -v trace -t -
xfsrestore: using file dump (drive_simple) strategy
xfsrestore: version 3.1.7 (dump format 3.0)
xfsdump: using file dump (drive_simple) strategy
xfsdump: version 3.1.7 (dump format 3.0)
xfsrestore: searching media for dump
xfsdump: WARNING: most recent level 0 dump was interrupted, but not resuming that dump since resume (-R) option not specified
xfsdump: level 0 dump of ##########:/mnt/data
xfsdump: dump date: Tue Apr 9 14:16:56 2019
xfsdump: session id: f144ade1-628d-4b75-bd1f-6636618d4337
xfsdump: session label: "sample"
xfsdump: NOTE: root ino 64 differs from mount dir ino 128, bind mount?
xfsdump: ino map phase 1: constructing initial dump list
xfsdump: ino map phase 2: skipping (no pruning necessary)
xfsdump: ino map phase 3: skipping (only one dump stream)
xfsdump: ino map construction complete
xfsdump: estimated dump size: 720882449792 bytes
xfsdump: creating dump session media file 0 (media 0, file 0)
xfsdump: dumping ino map
xfsdump: dumping directories
xfsrestore: examining media file 0
xfsrestore: file 0 in object 0 of stream 0
xfsrestore: file 0 in stream, file 0 of dump 0 on object
xfsrestore: dump description:
xfsrestore: hostname: ##########
xfsrestore: mount point: /mnt/data
xfsrestore: volume: /dev/mapper/data-databk
xfsrestore: session time: Tue Apr 9 14:16:56 2019
xfsrestore: level: 0
xfsrestore: session label: "sample"
xfsrestore: media label: "sample"
xfsrestore: file system id: 019ff618-2141-4007-a765-fa7027e42d9a
xfsrestore: session id: f144ade1-628d-4b75-bd1f-6636618d4337
xfsrestore: media id: 8a29e1bd-14b1-4b7b-98ed-478617790071
xfsrestore: searching media for directory dump
xfsrestore: dump session label: "sample"
xfsrestore: dump session id: f144ade1-628d-4b75-bd1f-6636618d4337
xfsrestore: stream 0, object 0, file 0
xfsrestore: initializing directory attributes registry
xfsrestore: initializing directory entry name registry
xfsrestore: initializing directory hierarchy image
xfsrestore: reading directories
xfsrestore: reading the ino map
xfsrestore: reading the directories
xfsrestore: tree.c:757: tree_begindir: Assertion `ino != persp->p_rootino || hardh == persp->p_rooth' failed.
xfsdump: ending media file
xfsdump: media file size 327680 bytes
xfsdump: dump size (non-dir files) : 0 bytes
xfsdump: NOTE: dump interrupted: 1 seconds elapsed: may resume later using -R option
xfsdump: Dump Status: INTERRUPT
Aborted

What does:

# ls -id /mnt/data

and

# xfs_db -r -c "sb 0" -p "rootino" /dev/mapper/data-databk

say?

The root cause in this case seems to be that
/* lookup head of hardlink list
*/
hardh = link_hardh( ino, gen );
yielded NH_NULL, so
assert( ino != persp->p_rootino || hardh == persp->p_rooth );
the (hardh == persp->p_rooth) part of the test failed and tripped the assert in this case.
I'm still stumped by the "bind mount" warning you got from dump, though.
(In reply to Eric Sandeen from comment #13)
> What does:
>
> # ls -id /mnt/data
>
> and
>
> # xfs_db -r -c "sb 0" -p "rootino" /dev/mapper/data-databk
>
> say?

# ls -id /mnt/data
128 /mnt/data
# xfs_db -r -c "sb 0" -p "rootino" /dev/mapper/data-databk
# xfs_db -r -c "sb 0" -c blockget -c blockuse /dev/data/databk
block 0 (0/0) type sb

Sorry, typo'd that first one:

# xfs_db -r -c "sb 0" -c "p rootino" /dev/mapper/data-databk

(should also be 128)

(In reply to Eric Sandeen from comment #16)
> Sorry, typo'd that first one:
>
> # xfs_db -r -c "sb 0" -c "p rootino" /dev/mapper/data-databk
>
> (should also be 128)

ah, ok:

# ls -id /mnt/data; xfs_db -r -c "sb 0" -c "p rootino" /dev/mapper/data-databk
128 /mnt/data
rootino = 128

Thanks - and yet dump thinks the root inode is 64. Putting together another test now - thanks for all the info!

Created attachment 1553947 [details]
bulkstat test
Can you do:
# yum install xfsprogs-devel
# gcc -o test_bulkstat test_bulkstat.c
# ./test_bulkstat /dev/device
(for the data device you're having trouble with) and see what it says?
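The attachment itself isn't reproduced in this report. What follows is a hypothetical reconstruction of such a probe, not the actual test_bulkstat.c: it compares the stat(2) inode of the given path with the first inode returned by the XFS_IOC_FSBULKSTAT ioctl, which is the value the patched xfsdump treats as the filesystem root inode.

/*
 * Sketch of a bulkstat root-inode probe (the real test_bulkstat.c may differ).
 * Build: gcc -o test_bulkstat test_bulkstat.c   (needs xfsprogs-devel)
 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <xfs/xfs.h>

int main(int argc, char **argv)
{
    struct stat st;
    struct xfs_bstat bstat;
    struct xfs_fsop_bulkreq req;
    __u64 lastino = 0;
    __s32 ocount = 0;
    int fd;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <path>\n", argv[0]);
        return 1;
    }

    /* The fd must be an open file inside the XFS filesystem (e.g. the
     * mount point); on a bare block device the ioctl fails with ENOTTY. */
    fd = open(argv[1], O_RDONLY);
    if (fd < 0 || fstat(fd, &st) < 0) {
        perror(argv[1]);
        return 1;
    }

    req.lastip = &lastino;   /* start from the lowest inode number */
    req.icount = 1;          /* fetch a single record */
    req.ubuffer = &bstat;
    req.ocount = &ocount;

    if (ioctl(fd, XFS_IOC_FSBULKSTAT, &req) < 0 || ocount < 1) {
        perror("bulkstat");
        fprintf(stderr, "Failed to bulkstat %s\n", argv[1]);
        return 1;
    }

    printf("Stat root ino %llu; bulkstat root ino %llu\n",
           (unsigned long long)st.st_ino,
           (unsigned long long)bstat.bs_ino);
    close(fd);
    return 0;
}

As the xfsctl(3) excerpt further down notes, the ioctl has to be issued against an open file inside the mounted filesystem, which is why the runs against the block devices below fail.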
# ./test_bulkstat /dev/data/databk
bulkstat: Inappropriate ioctl for device
Failed to bulkstat /dev/data/databk
# ./test_bulkstat /dev/mapper/data-databk
bulkstat: Inappropriate ioctl for device
Failed to bulkstat /dev/mapper/data-databk
# man xfsctl...
Filesystem Operations
In order to effect one of the following operations, the pathname and descriptor arguments passed to xfsctl() can be any open file in the XFS
filesystem in question.
...
XFS_IOC_FSBULKSTAT
...
# ./test_bulkstat /mnt/data
Stat root ino 128; bulkstat root ino 64
Does that work?
We have several other clusters, set up following a similar procedure (fresh xfs created on drbd partition on lv), and all give:

Stat root ino 64; bulkstat root ino 64

Sorry for all the thinkos. Yes, it needs to be pointed at the mount point not the device, my mistake.

Ok, so - the code added in the commit mentioned above to deal with bind mounts tries to use the bulkstat interface to determine the root inode (aka the lowest inode number on the filesystem), rather than stat-ing the mount point (which may not be the root directory). For some reason, this filesystem is coming up with a root inode of 64 from the bulkstat interface, whereas statfs & xfs_db (correctly) identify inode 128.

Thanks for being so helpful with all my requests, can you also do:

# xfs_info /mnt/data

and

# xfs_db -r -c "sb 0" -c "p" /dev/mapper/data-databk

to print out the entire superblock? Hopefully w/ the geometry I can sort out why bulkstat is returning inode 64.

Thanks,
-Eric

# xfs_info /mnt/data
meta-data=/dev/mapper/data-databk isize=256 agcount=4, agsize=67108864 blks
= sectsz=512 attr=2, projid32bit=1
= crc=0 finobt=0 spinodes=0
data = bsize=4096 blocks=268435456, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=0
log =internal bsize=4096 blocks=131072, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
# xfs_db -r -c "sb 0" -c p /dev/mapper/data-databk
magicnum = 0x58465342
blocksize = 4096
dblocks = 268435456
rblocks = 0
rextents = 0
uuid = 019ff618-2141-4007-a765-fa7027e42d9a
logstart = 134217732
rootino = 128
rbmino = 129
rsumino = 130
rextsize = 1
agblocks = 67108864
agcount = 4
rbmblocks = 0
logblocks = 131072
versionnum = 0xb4a4
sectsize = 512
inodesize = 256
inopblock = 16
fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
blocklog = 12
sectlog = 9
inodelog = 8
inopblog = 4
agblklog = 26
rextslog = 0
inprogress = 0
imax_pct = 5
icount = 78656
ifree = 38467
fdblocks = 104339553
frextents = 0
uquotino = null
gquotino = null
qflags = 0
flags = 0
shared_vn = 0
inoalignmt = 2
unit = 0
width = 0
dirblklog = 0
logsectlog = 0
logsectsize = 0
logsunit = 1
features2 = 0x8a
bad_features2 = 0x8a
features_compat = 0
features_ro_compat = 0
features_incompat = 0
features_log_incompat = 0
crc = 0 (unchecked)
spino_align = 0
pquotino = 0
lsn = 0
meta_uuid = 00000000-0000-0000-0000-000000000000
Very strange. I'll look more at the bulkstat interface, I don't see how it can possibly be returning an inode with a lower number than the root inode ...

I'm not sure how sensitive the data on the filesystem is, but an xfs_metadump image should allow me to reproduce the problem quickly if you are willing to provide it. A metadump obfuscates file names by default and contains no file data.

# xfs_metadump /dev/data/databk - | strings

...shows significant amounts of what appears to be file content.

# find /mnt/data -print0 | xargs -0 ls -id | gawk '$1<128{print $1}' | sort -n | pr -16at
64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111
112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
These are ordinary files and directories, created over the course of the last 8 years, some in the last few days.
(In reply to S Massey from comment #27)
> # find /mnt/data -print0 | xargs -0 ls -id | gawk '$1<128{print $1}' | sort -n | pr -16at
> 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
> 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
> 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111
> 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
>
> These are ordinary files and directories, created over the course of the
> last 8 years, some in the last few days.

** last 2 years **

Newer xfs_metadump does a better job of scrubbing out leakage. Short file names & attrs come through unobfuscated. No data blocks are included. In any case, that's fine.

Ok, so - you actually have inodes allocated < 128. Thanks, I was going to ask that as well. I'm stumped on how a filesystem gets into this state, but apparently it can happen. I've never seen it before. But, in this case, the heuristic in xfsdump is clearly incorrect.

FWIW, a prior RHEL7 version of xfsdump doesn't have this bulkstat heuristic, and without bind mounts, it should work just fine for you as a short-term workaround.

Any idea if this filesystem once had stripe geometry set? (it doesn't today) - that can affect inode allocation.

In any case, I guess the root cause for file inode numbers < root inode number isn't important at this point, it clearly can happen, and the test in xfsdump is problematic as a result.

(In reply to Eric Sandeen from comment #30)
> Any idea if this filesystem once had stripe geometry set? (it doesn't
> today) - that can affect inode allocation.
>
> In any case, I guess the root cause for file inode numbers < root inode
> number isn't important at this point, it clearly can happen, and the test in
> xfsdump is problematic as a result.

Too bad there's not an ioctl that will just get superblock info (rootino, at least) from the underlying device.

Re bind-mounts: Is it likely to be intuitive to most that xfsdump - /mnt/bindMountedNonRootSmallDir will dump everything in the (possibly far larger) underlying filesystem, and that restored paths will not match the bind-mounted paths? Or should there be a warning and prompt? Or error out: if it's a bind-mounted non-root directory, won't the entire filesystem be mounted somewhere?

The trick is finding the actual filesystem root inode when the mount point is a bind-mounted subdir. Before the commit mentioned above we'd just stat the mount point and get its inode. It's not just that it led to unexpected results, it actually tripped this same assert (but for different reasons...) We thought the bulkstat trick would work, under the assumption that the lowest inode on the filesystem should be the first-allocated root inode, and we can obtain that with bulkstat. It looks like we'll need a different mechanism for detecting a bind mount mountpoint and obtaining the proper filesystem root inode. Yes, a dedicated ioctl would be nice...

Retrieved 3.1.7 source and reverted just https://git.kernel.org/pub/scm/fs/xfs/xfsdump-dev.git/commit/?id=25195ebf107dc81b1b7cea1476764950e1d6cc9d to build a minimally modified xfsdump. Works for me on the fs at issue.

Good to hear.
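As background on how files can end up at inode numbers below the root inode: an XFS inode number packs together the AG number, the block within the AG, and the slot within that block. The small decoder below is illustrative only (not from this thread); it uses the geometry printed earlier (agblklog = 26, inopblog = 4) to show where inodes 64 and 128 physically sit.

/*
 * Illustrative sketch: decode an XFS inode number into (AG, AG block, slot)
 * using the superblock geometry printed above (agblklog = 26, inopblog = 4).
 */
#include <stdio.h>

int main(void)
{
    const unsigned agblklog = 26;   /* log2(agblocks), from the superblock */
    const unsigned inopblog = 4;    /* log2(inodes per block) */
    unsigned long long inos[] = { 64, 128 };
    int i;

    for (i = 0; i < 2; i++) {
        unsigned long long ino = inos[i];
        unsigned long long agno  = ino >> (agblklog + inopblog);
        unsigned long long agbno = (ino >> inopblog) & ((1ULL << agblklog) - 1);
        unsigned long long slot  = ino & ((1ULL << inopblog) - 1);
        printf("ino %llu -> AG %llu, block %llu, slot %llu\n",
               ino, agno, agbno, slot);
    }
    return 0;
}

This prints "ino 64 -> AG 0, block 4, slot 0" and "ino 128 -> AG 0, block 8, slot 0": the files at inodes 64-127 are consistent with a 64-inode chunk occupying blocks 4-7, allocated at a lower block (and hence lower inode numbers) than the chunk holding root inode 128 at block 8.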
dchinner has a theory about how a filesystem could possibly get inodes allocated below the root inode number - it's a real edge case you hit :(

Finally stumbled on a better way to obtain the root inode - bulkstat retrieves inodes in order starting with the lowest one, but the root inode is the only one that should have a generation "0" - so this should be a way to ensure that the heuristic finds the proper root inode, not just the lowest-numbered inode.

I'm hitting this same problem. Is there any movement on a solution to this problem, or an upstream revert of the bind mount commit?

Details: xfsdump on a freshly formatted filesystem (xfsprogs 5.6.0, xfsdump 3.1.9, kernel 5.7.0) is getting the wrong root inode number from bulkstat. Results of the debug output:

$ ls -id /mnt/source; sudo xfs_db -r -c "sb 0" -c "p rootino" /dev/storage/source
8192 /mnt/source
rootino = 8192
$ sudo ./test_bulkstat /mnt/source
Stat root ino 8192; bulkstat root ino 128

Restoring this dump doesn't crash xfsrestore, but seems to create a corrupt restore with many messages of this type:

xfsrestore: NOTE: ino 17874 salvaging file, placing in orphanage/8192.0/XXXX
xfsrestore: NOTE: ino 17875 salvaging file, placing in orphanage/8192.0/XXXX
xfsrestore: NOTE: ino 17876 salvaging file, placing in orphanage/8192.0/XXXX
xfsrestore: NOTE: ino 17877 salvaging file, placing in orphanage/8192.0/XXXX
xfsrestore: NOTE: ino 17878 salvaging file, placing in orphanage/8192.0/XXXX
xfsrestore: NOTE: ino 17879 salvaging file, placing in orphanage/8192.0/XXXX
xfsrestore: NOTE: ino 17880 salvaging file, placing in orphanage/8192.0/XXXX

I've reverted commit https://git.kernel.org/pub/scm/fs/xfs/xfsdump-dev.git/commit/?id=25195ebf107dc81b1b7cea1476764950e1d6cc9d and am re-running the xfsdump/xfsrestore now.

Background: Using xfsdump - /mnt/source | xfsrestore - /mnt/dest to do data migration from an older filesystem to a temp location, then back to the original hardware after reformatting with rmapbt=1.

/mnt/source is a freshly created filesystem which was created by dumping another, older XFS filesystem.
/mnt/dest is a freshly created filesystem.
/mnt/source is created with mkfs.xfs without any additional options.
/mnt/dest is created with mkfs.xfs -m rmapbt=1.

Ok, I almost think we need to just revert

25195eb xfsdump: handle bind mount targets

to resolve this; it was trying to fix a corner case of bind mounts, and broke a common case of non-bind-mounts. Upstream we have a better way to handle this, but it won't be possible in RHEL7.

Hm, on second thought, reverting it won't fix existing dumps. The problem is that for unique geometries, xfsdump 3.1.7 put the wrong root inode number in the dumpfile, I think.

Red Hat Enterprise Linux 7 shipped its final minor release on September 29th, 2020. 7.9 was the last minor release scheduled for RHEL 7. From initial triage it does not appear the remaining Bugzillas meet the inclusion criteria for Maintenance Phase 2 and will now be closed. From the RHEL life cycle page: https://access.redhat.com/support/policy/updates/errata#Maintenance_Support_2_Phase

"During Maintenance Support 2 Phase for Red Hat Enterprise Linux version 7, Red Hat defined Critical and Important impact Security Advisories (RHSAs) and selected (at Red Hat discretion) Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available."
If this BZ was closed in error and meets the above criteria, please re-open it, flag it for 7.9.z, provide suitable business and technical justifications, and follow the process for Accelerated Fixes: https://source.redhat.com/groups/public/pnt-cxno/pnt_customer_experience_and_operations_wiki/support_delivery_accelerated_fix_release_handbook

Feature Requests can be re-opened and moved to RHEL 8 if the desired functionality is not already present in the product.

Please reach out to the applicable Product Experience Engineer[0] if you have any questions or concerns.

[0] https://bugzilla.redhat.com/page.cgi?id=agile_component_mapping.html&product=Red+Hat+Enterprise+Linux+7

Apologies for the inadvertent closure.

I apologize for this being open for so long, but at this point can you please let me know if the customer is still waiting for a fix, and whether this is something we need to address in RHEL7, or if it's now too late or irrelevant? Thanks, -Eric

Per comment 33, we are using a modified version of xfsdump on the affected servers. We can continue to do that until eol/upgrade, barring some future important update to xfsdump that makes the reversion difficult -- something I am guessing is unlikely at this point?

Regarding your comment 43, it occurs to me that whereas pre-existing dumps may be unusable, folks are continuing to create new dumps and those may also be unusable, unbeknownst to them. Some fix might at least allow their next backup to be usable (assuming they keep up with updates). I am curious whether these dumps with the wrong rootino can be patched -- is it as simple as changing that field (some short sequence(s) of bytes)?

Upstream xfsdump has been fixed to no longer generate these broken dumps. Unfortunately there is not yet any well-tested method to recover from a pre-existing broken dump.

We can at least get the fix for the root cause into RHEL7, now.

-Eric

(In reply to Eric Sandeen from comment #74)
> Upstream xfsdump has been fixed to no longer generate these broken dumps.
> Unfortunately there is not yet any well-tested method to recover from a
> pre-existing broken dump.
>
> We can at least get the fix for the root cause into RHEL7, now.
>
> -Eric

Ok I think we need zstream=+ to do an errata, and QE to ack.

Zorro, a zstream update of rhel7 xfsdump is proposed, please help to review. Thanks!

Created attachment 1867699 [details]
Test cases for xfsrestore
I've created some test cases for the different causes of the xfsrestore problem. I'm also looking at Gao's work upstream with a view to auto-detecting the problem and using his method of finding the root inode to resolve it. This will likely end up in a new bug that targets only the restore side.
Hm, I might be confusing the new-ish tape problem with the original root inode detection problem....?

Oh. Ok; Donald's testcases seem to be already-broken dumps. The bugfix so far simply stops creating these broken dumps in the first place.

These bugzillas have gotten a bit confusing, but I think it is well worth fixing the root cause now, and if we think a workaround in xfsrestore for existing dumps may be possible, that should be a new bug with a new errata.

-Eric

xfs/544 and xfs/545 should verify that the root cause is fixed.

Thanks Eric for the clarification!
Newer xfstests can't be built on rhel7 and xfs/54{4,5} are not in older xfstests.
Manually ran some xfstests quick tests on the new build; looks fine.
[root@xzhou-rhel-79-updates-202208020_x86_64 xfstests]# ./check -g xfs/quick
FSTYP -- xfs (non-debug)
PLATFORM -- Linux/x86_64 xzhou-rhel-79-updates-202208020_x86_64 3.10.0-1160.76.1.el7.x86_64 #1 SMP Tue Jul 26 14:15:37 UTC 2022
MKFS_OPTIONS -- -f -f -b size=4096 /dev/loop1
MOUNT_OPTIONS -- -o context=system_u:object_r:nfs_t:s0 /dev/loop1 /loopsch
xfs/001 7s
...
Failures: xfs/096 xfs/122 xfs/148 xfs/263 xfs/289 xfs/491 xfs/492 xfs/493 xfs/499 xfs/500 xfs/514 xfs/515
Failed 12 of 255 tests
[root@xzhou-rhel-79-updates-202208020_x86_64 xfstests]# rpm -qf /usr/sbin/xfsrestore
xfsdump-3.1.7-2.el7_9.x86_64
[root@xzhou-rhel-79-updates-202208020_x86_64 xfstests]#
#baseline
Failures: xfs/096 xfs/122 xfs/148 xfs/263 xfs/289 xfs/491 xfs/492 xfs/493 xfs/499 xfs/500 xfs/514 xfs/515
Failed 12 of 255 tests
[root@xzhou-rhel-79-updates-202208020_x86_64 xfstests]# rpm -qf /usr/sbin/xfsrestore
xfsdump-3.1.7-1.el7.x86_64
[root@xzhou-rhel-79-updates-202208020_x86_64 xfstests]#
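Since xfs/544 and xfs/545 aren't runnable here, a targeted manual spot-check of the specific heuristic fixed by this update could look like the following; the paths follow the reporter's setup and are placeholders, and all commands appear earlier in this bug:

# ls -id /mnt/data
# xfs_db -r -c "sb 0" -c "p rootino" /dev/mapper/data-databk
# xfsdump -l 0 -L test -M test - /mnt/data | xfsrestore -t -

With the fixed package, the two inode numbers should agree, xfsrestore -t should list the table of contents without tripping the tree.c assert, and on a plain (non-bind) mount xfsdump should no longer print the "root ino X differs from mount dir ino Y, bind mount?" NOTE.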
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (xfsdump bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:6573

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.