I shut down the system with a USB drive attached; XFS recovered, but I thought I'd run xfs_repair to check the disk. I found that each time xfs_repair runs, it lists multiple different errors, such as:

block (7,1302-1303) multiply claimed by bno space tree, state - 2
clearing needsrepair flag and regenerating metadata
block (1,5468-5469) multiply claimed by bno space tree, state - 2
block (0,12921436-12921436) multiply claimed by bno space tree, state - 2
block (3,2173-2174) multiply claimed by bno space tree, state - 2
block (6,3451-3452) multiply claimed by bno space tree, state - 2
block (4,527038-527039) multiply claimed by bno space tree, state - 2
block (5,2686-2687) multiply claimed by bno space tree, state - 2
block (7,1302-1303) multiply claimed by cnt space tree, state - 2
Metadata CRC error detected at 0x5640aac626ad, xfs_cntbt block 0x100184740/0x1000
btree block 2/198890 is suspect, error -74
bad magic # 0 in btcnt block 2/198890

Each time I run it I get different errors. When I check with smartmontools it says no errors, drive is OK. I even ran the long self-test on 2 different drives and they both reported no errors, yet xfs_repair has issues every time it is run. I've run Seagate's Linux tools, and they also report the drive is OK.

This is also happening on a drive I purchased just last week. I suppose it could be defective, but having this problem on 4 different USB drives at the same time seems a bit too much of a coincidence.

Reproducible: Always

Steps to Reproduce: See above.
Actual Results: See above.
Expected Results: See above.

I'm at a bit of a loss as to whether this is a filesystem bug or whether xfs_repair is broken. Shouldn't 1 or 2 passes of xfs_repair fix the issues if they actually exist? It seems odd that it just finds another problem every time it is run.
I am aware of the XFS bug: https://www.phoronix.com/news/Linux-6.3.5-Released
I am running 6.3.5, but for a time was running 6.3.3 and 6.3.4. I didn't encounter any errors in the journal at that time, so I thought I wasn't affected by the bug. Is it ill-advised to run xfs_repair unless you're in a last-resort situation, i.e. is it a tool that can cause damage? Thanks!
I'm doing some more testing, moving the USB drives to another system using another cable, and am now getting different results. It will take some time for me to do the different tests, so I would like to keep this open just in case I can't figure it out. In the meantime, if you have any tips or ideas of other things to try, please advise. Lowering priority and severity.
OK, I purchased a new USB 3.2 card and new cables, and the problem still occurs on my workstation. If I plug the drive into the USB port on my laptop, it works fine. On the workstation there are no errors in the logs about the USB drive, yet for some reason xfs_repair is detecting issues which do not exist. Any ideas for figuring out what is going on?
For starters, this has nothing to do with the kernel bug reported by Phoronix.

Please provide an xfs_metadump of the filesystem in question [1] (this will let us recreate the problem from a filesystem image), and let us know which version of xfs_repair you are using (xfs_repair -V).

If the metadump image is too big to attach to the bug, feel free to reach out to me via email.

[1]
# umount /dev/whatever
# xfs_metadump /dev/whatever filename.meta
# bzip2 filename.meta
Thanks Eric, I have found some additional information which may help. Please review and then let me know what additional information I should provide to assist.

I am using xfs_repair version 6.1.0.

First of all, to try to resolve the issue I purchased a new USB card: an Inateck PCIe to USB 3.2 Gen 2 card with 20 Gbps bandwidth, 3 USB Type-A and 2 USB Type-C ports (RedComets U21). I then purchased all new USB cables. The problem still occurs.

I found, however, that if I unplug the drive from the workstation and run xfs_repair on my laptop, it runs clean and finds no errors. To me, that seems to imply that XFS itself is running fine on the workstation, but that xfs_repair is reporting false positives for some reason.

It's not causing a problem for me, in that these USB drives are used for backup purposes and I can just restore the drive by running another rsync. The concerning thing is that, at least on some systems, it appears that if you run xfs_repair against USB drives you'll lose data, because xfs_repair will be moving files unnecessarily to lost+found. I suppose the data is still actually there in lost+found, but IMO it would be a PITA to get the files renamed, etc.

Below is a sample of the errors I receive on my workstation, followed by the clean run on my laptop. I've deleted quite a bit of the error output; I just wanted to give you an idea of the type of issues being reported.
>>>>>> HERE ARE THE ERRORS SHOWN ON THE WORKSTATION

xfs_repair -n /dev/sdh

xfs_repair reported a lot of issues. I deleted quite a bit; here is a small sample:

Phase 1 - find and verify superblock…
Phase 2 - using internal log
        - zero log…
        - scan filesystem freespace and inode maps…
        - found root inode chunk
Phase 3 - for each AG…
        - scan (but don’t clear) agi unlinked lists…
        - process known inodes and perform inode discovery…
        - agno = 0
        - agno = 1
inode identifier 2147862912 mismatch on inode 2147869056
would have cleared inode 2147869056
inode identifier 2147862913 mismatch on inode 2147869057
would have cleared inode 2147869057
        - agno = 2
inode identifier 4295556032 mismatch on inode 4295565184
would have cleared inode 4295565184
inode identifier 4295556033 mismatch on inode 4295565185
would have cleared inode 4295565185
        - agno = 3
inode identifier 6453851908 mismatch on inode 6453931524
would have cleared inode 6453931524
inode identifier 6453851909 mismatch on inode 6453931525
would have cleared inode 6453931525
        - agno = 4
        - agno = 5
        - agno = 6
inode identifier 12885557312 mismatch on inode 12885563840
would have cleared inode 12885563840
        - agno = 7
        - process newly discovered inodes…
Phase 4 - check for duplicate blocks…
        - setting up duplicate extent list…
        - check for inodes claiming duplicate blocks…
        - agno = 0
        - agno = 4
        - agno = 1
        - agno = 7
        - agno = 6
        - agno = 3
        - agno = 2
        - agno = 5
entry “background-43.jpg” at block 1 offset 96 in directory inode 2147622656 references free inode 2147869056
        would clear inode number in entry at offset 96…
entry “04_last_night.opus” at block 0 offset 200 in directory inode 2147864530 references free inode 2147869057
        would clear inode number in entry at offset 200…
inode identifier 2147862912 mismatch on inode 2147869056
would have cleared inode 2147869056
inode identifier 2147862913 mismatch on inode 2147869057
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity…
        - traversing filesystem …
Metadata CRC error detected at 0x55f6d4375b70, xfs_dir3_block block 0x347e8/0x1000
expected owner inode 152592, got 146784, directory block 215016
would rebuild directory inode 152592
would create missing “.” entry in dir ino 152592
entry “background-43.jpg” in directory inode 2147622656 points to free inode 2147869056, would junk entry
bad hash table for directory inode 2147622656 (no data entry): would rebuild
would rebuild directory inode 2147622656
Metadata CRC error detected at 0x55f6d4377460, xfs_dir3_leaf1 block 0x284019d48/0x1000
leaf block 8388608 for directory inode 10737498380 bad CRC
would rebuild directory inode 10737498380
        - traversal finished …
        - moving disconnected inodes to lost+found …
disconnected inode 751929, would move to lost+found
disconnected inode 751930, would move to lost+found
disconnected inode 751931, would move to lost+found
disconnected inode 751932, would move to lost+found
Phase 7 - verify link counts…
would have reset inode 4304567810 nlinks from 6708 to 6706
No modify flag set, skipping filesystem flush and exiting.
>>>>>> HERE IS THE CLEAN RUN ON MY LAPTOP

And here is the same drive a few moments later with xfs_repair on my laptop:

xfs_repair /dev/sdb
Phase 1 - find and verify superblock…
Phase 2 - using internal log
        - zero log…
        - scan filesystem freespace and inode maps…
        - found root inode chunk
Phase 3 - for each AG…
        - scan and clear agi unlinked lists…
        - process known inodes and perform inode discovery…
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes…
Phase 4 - check for duplicate blocks…
        - setting up duplicate extent list…
        - check for inodes claiming duplicate blocks…
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
Phase 5 - rebuild AG headers and trees…
        - reset superblock…
Phase 6 - check inode connectivity…
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem …
        - traversal finished …
        - moving disconnected inodes to lost+found …
Phase 7 - verify and correct link counts…
done
Please provide the metadump as requested; then we can quickly differentiate between "this set of on-disk metadata is in fact repairable in one pass by xfs_repair" and "something about the metadata on *this hardware* does not stay 'repaired' after an xfs_repair run."

I do tend to suspect hardware errors as a possible culprit. For example:

> inode identifier 2147862912 mismatch on inode 2147869056

Those 2 numbers in binary are:

10000000000001011100100110000000
10000000000001011110000110000000
                  ^

> inode identifier 2147862913 mismatch on inode 2147869057

10000000000001011100100110000001
10000000000001011110000110000001
                  ^

Those look suspiciously like bit-flips.

> inode identifier 4295556033 mismatch on inode 4295565185

100000000000010001111101111000001
100000000000010010001111110000001

That one is more than one bit off, though, so not sure.

This is almost never the culprit, but is there any chance you have memory errors? Is the laptop flaky in any other way? Can you run a memory tester, just for fun?
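The comparison above can be automated. A small POSIX-shell sketch (the inode numbers are taken from the repair output earlier in this thread; count_flips is a hypothetical helper name) that XORs each mismatched pair and counts how many bits differ:

```shell
# XOR each "inode identifier X mismatch on inode Y" pair and count the
# differing bits; a count of 1-2 hints at memory or bus corruption.
count_flips() {
    x=$(( $1 ^ $2 ))
    n=0
    while [ "$x" -ne 0 ]; do
        n=$(( n + (x & 1) ))
        x=$(( x >> 1 ))
    done
    echo "$1 vs $2: $n bit(s) differ"
}

count_flips 2147862912 2147869056   # first pair from the repair output
count_flips 2147862913 2147869057
count_flips 4295556033 4295565185   # the "more than one bit off" pair
```

(Note the first two pairs each differ in two adjacent bit positions, as the aligned binary strings above show.)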
Hi Eric, thanks for the quick reply. The metadump file is too large for attachment, so I'll send a link to your email. If you have any issues downloading, please let me know.

Also, just to make sure you understand: the error is happening on my workstation. The same drive, when attached to my laptop, finishes xfs_repair with no errors. That is why I'm thinking it is a false positive. My workstation doesn't have any errors in the journal, nor warnings regarding memory. I'll look into running a memory tester.

Here is the info from neofetch, FYI (logo art omitted):

OS: Fedora release 38 (Thirty Eight) x86_64
Kernel: 6.3.8-200.fc38.x86_64
Uptime: 15 hours, 7 mins
Packages: 4935 (rpm), 1 (flatpak)
Shell: bash 5.2.15
Resolution: 1920x1080
DE: Plasma
WM: kwin
Theme: Breeze [GTK2], Adwaita [GTK3]
Icons: breeze [GTK2], Adwaita [GTK3]
Terminal: konsole
CPU: AMD FX-8350 (8) @ 3.740GHz
GPU: AMD ATI Radeon HD 7850 / R7 265 / R9 270 1024SP
Memory: 10599MiB / 31975MiB
Ah ok so I had the laptop & workstation backwards - in any case, it does seem to point to hardware. The metadump you provided (thanks) contains no metadata inconsistencies, xfs_repair runs clean. Did you gather it from the laptop or from the workstation?
(In reply to Eric Sandeen from comment #7)
> Ah ok so I had the laptop & workstation backwards - in any case, it does
> seem to point to hardware.
>
> The metadump you provided (thanks) contains no metadata inconsistencies,
> xfs_repair runs clean. Did you gather it from the laptop or from the
> workstation?

Hey Eric, again thanks for the quick response. I gathered it from the system that is getting the errors, the workstation. So to rehash:

1. xfs_repair gets a lot of errors when running against USB drives on my workstation.
2. xfs_metadump creates a file with no errors on the workstation.
3. When running xfs_repair against the drive on the laptop, it finds no errors, which I suppose makes sense, since the file created by xfs_metadump on the workstation has no errors.

The following may or may not mean anything, because I know nothing about XFS; I'm just hacking around with bard.google.com helping me, but I thought I'd share. I was curious, so I ran:

xfs_db -f sea8000_2.meta

and received this reply:

xfs_db: sea8000_2.meta is not a valid XFS filesystem (unexpected SB magic number 0x5846534d)
Use -F to force a read attempt.

Then I used -F:

xfs_db: sea8000_2.meta is not a valid XFS filesystem (unexpected SB magic number 0x5846534d)
xfs_db: V1 inodes unsupported. Please try an older xfsprogs.

I then found this: you can easily check what on-disk format you are using by running xfs_info /mount/point. It will say crc=0 if you are using v4 and crc=1 if you are using v5.
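A side note on that xfs_db error: the "unexpected SB magic number 0x5846534d" is not itself a sign of corruption. Decoded as ASCII, those four bytes spell "XFSM", the metadump file's own magic number (a real XFS superblock starts with "XFSB", 0x58465342), which is why xfs_db refuses to treat the .meta file as a filesystem. A quick shell sketch to decode it:

```shell
# Print the four bytes of the magic 0x5846534d as ASCII characters.
m=0x5846534d
for shift in 24 16 8 0; do
    byte=$(( (m >> shift) & 0xff ))
    printf '%b' "$(printf '\\0%o' "$byte")"   # emit the byte as a character
done
echo   # the loop prints: XFSM
```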
So then I ran xfs_info:

xfs_info /dev/sdh
meta-data=/dev/sdh               isize=512    agcount=8, agsize=268435455 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=0 inobtcount=0 nrext64=0
data     =                       bsize=4096   blocks=1953506645, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

My understanding is that crc=1 means I'm using the V5 format, so I shouldn't be getting this message.
P.S. Another data point that may or may not be helpful: these drives do not have a partition table. I created the filesystem with mkfs.xfs /dev/xxx.
The xfs metadump is not a filesystem image; it is more like a file that contains a filesystem image. You can use xfs_mdrestore to turn it back into a proper image that xfs_db can look at.

Whether or not you have a partition table doesn't really matter at all here.

At this point I think you have a hardware issue somewhere. If you have a disposable drive that you can overwrite, perhaps you can try some block device integrity checking or read/write IO to see what you get. Or go back to an older kernel, just to see if there is some regression in a hardware driver. But I'm just throwing darts now... this does not look like an xfsprogs bug to me, so I'm not sure how much further help I can provide.
Thanks for the reply, Eric. I'm trying to understand what type of hardware issue it could be. When I remove the drive from my workstation and plug it into the laptop, it works fine, so to me that would eliminate the drive itself as having a hardware issue. I purchased a new USB card for my computer and the issue still occurs; wouldn't that eliminate the USB port as a cause?

When I run xfs_metadump, it creates the image file cleanly with no errors. Since no errors are found on the laptop when I run xfs_repair, and the xfs_metadump I create on the workstation has no errors, I would conclude that XFS itself is not having any issues on the workstation. Is that not the correct assumption? The only issue is xfs_repair running on this particular machine, apparently detecting false positives for whatever reason. Why would it be a hardware issue if XFS itself and xfs_metadump are working properly? The thing that is failing is xfs_repair.
Are the workstation & the laptop the same architecture? Do the kernel version or xfsprogs version differ? It might be interesting to boot the laptop's kernel version on the workstation, and install the same xfsprogs, if that is at all possible. If it's the same arch, same kernel, and same xfsprogs, but it finds errors on one machine and not the other, I'm kind of out of ideas aside from a hardware difference (or problem).

Is there /any/ chance that when you're running repair on the workstation, the device has been magically mounted somewhere else by $SOMETHING (systemd-fu or gnome-fu), and xfs_repair is trying to repair a live, mounted device?
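One way to rule that out is to check /proc/mounts for the device right before invoking repair. A sketch, assuming the device is /dev/sdh as earlier in this thread (is_mounted is a hypothetical helper, not part of xfsprogs):

```shell
# Return success if DEVICE appears as a mount source in MOUNTS_FILE
# (defaults to /proc/mounts). Guards against udisks/systemd automounts.
is_mounted() {
    grep -q "^$1 " "${2:-/proc/mounts}"
}

if is_mounted /dev/sdh; then
    echo "/dev/sdh is mounted somewhere -- do not run xfs_repair"
else
    echo "/dev/sdh not mounted; safe to run: xfs_repair -n /dev/sdh"
fi
```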
Hi Eric, thanks again for trying to help. Much appreciated. Yes, it's definitely weird: the workstation and laptop are the same architecture, using the same kernel version and the same version of xfsprogs. I don't believe the drive is mounted anywhere else.

I just purchased yet another USB drive and am getting the same result, so that is two new drives experiencing the same issue. I agree that there is something on my workstation causing the issue with xfs_repair, but I don't have a clue as to what it could be. Since running xfs_repair on the drive from the laptop comes back clean, I'm assuming that the data on the drive is good. If you can think of some additional way to trace xfs_repair to find out what could be happening, let me know and I'll try it. My workstation was built in 2015, so maybe it's some BIOS quirk that xfs_repair doesn't like for some reason.
The only other thing I can think of is to test your memory, I'm afraid. Or, if you have space, dd the drive (without mounting it on either machine) to a file image on both systems, and compare the results. If they differ, the problem lies well outside xfs_repair.
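That dd comparison could look like the sketch below: hash the raw device on each machine and compare the digests; identical hashes mean both machines read identical bytes, which would point the finger away from the on-disk data. The device names (/dev/sdh on the workstation, /dev/sdb on the laptop) are taken from earlier in this thread; the sketch is demonstrated on a scratch file so it runs without a spare drive.

```shell
# On real hardware you would use if=/dev/sdh (unmounted!) instead of the
# scratch file, run once per machine, and compare the two digest lines.
scratch=$(mktemp)
dd if=/dev/zero of="$scratch" bs=1024 count=64 2>/dev/null

h1=$(dd if="$scratch" bs=1024 2>/dev/null | sha256sum | cut -d' ' -f1)
h2=$(dd if="$scratch" bs=1024 2>/dev/null | sha256sum | cut -d' ' -f1)

if [ "$h1" = "$h2" ]; then
    echo "reads match"
else
    echo "reads differ: suspect cable/controller/memory"
fi
rm -f "$scratch"
```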