Bug 2208553 - xfs metadata corruption after upgrade to 6.3.3 kernel
Summary: xfs metadata corruption after upgrade to 6.3.3 kernel
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 38
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-05-19 14:00 UTC by Chris Caudle
Modified: 2023-06-02 19:19 UTC
CC List: 38 users

Fixed In Version: kernel-6.3.4-101.fc37 kernel-6.3.4-201.fc38
Clone Of:
Environment:
Last Closed: 2023-05-31 01:44:55 UTC
Type: ---
Embargoed:


Attachments

Description Chris Caudle 2023-05-19 14:00:28 UTC
Updated to kernel 6.3.3 from updates testing repository.
Several minutes after logging in again I began to get errors indicating that some files could not be written.  I shut down with systemctl, and while rebooting a filesystem error was detected and the system dropped to single-user rescue mode.
This message was in the journal:

kernel: XFS (sdd1): Metadata corruption detected at xfs_buf_ioend+0x154/0x5f0 [xfs], xfs_inode block 0x897ed60 xfs_inode_buf_verify
kernel: XFS (sdd1): Unmount and run xfs_repair
kernel: XFS (sdd1): First 128 bytes of corrupted metadata buffer:
kernel: 00000000: 41 75 74 6f 6d 61 74 69 6f 6e 43 6f 6e 74 72 6f  AutomationContro
kernel: 00000010: 6c 45 45 53 61 49 53 34 5f 45 45 43 34 45 4f 53  lEESaIS4_EEC4EOS
kernel: 00000020: 61 49 53 74 31 30 5f 4c 69 73 74 5f 6e 6f 64 65  aISt10_List_node
kernel: 00000030: 49 53 34 5f 45 45 00 4e 6f 42 61 73 65 48 6f 6f  IS4_EE.NoBaseHoo
kernel: 00000040: 6b 49 64 00 72 65 76 65 72 73 65 5f 69 74 65 72  kId.reverse_iter
kernel: 00000050: 61 74 6f 72 3c 73 74 64 3a 3a 5f 4c 69 73 74 5f  ator<std::_List_
kernel: 00000060: 63 6f 6e 73 74 5f 69 74 65 72 61 74 6f 72 3c 73  const_iterator<s
kernel: 00000070: 69 67 63 3a 3a 69 6e 74 65 72 6e 61 6c 3a 3a 74  igc::internal::t
kernel: XFS (sdd1): metadata I/O error in "xfs_imap_to_bp+0x5c/0x80 [xfs]" at daddr 0x897ed60 len 32 error 117

I attempted to run xfs_repair on the device (it happened to be the device mounted as /home, so the system files all loaded correctly from /), but xfs_repair reported that there were dirty entries in the journal and that I should mount to let the journal entries replay.  Mount failed with an error, so I had to run xfs_repair -L to zero the log instead of replaying it.  

After running xfs_repair -L I ran xfs_repair with no additional options to verify that the filesystem was marked as clean, then rebooted.
After logging in, the same error occurred within a few minutes.

Did the same thing: xfs_repair complained about a dirty journal/log, tried to mount, couldn't mount, ran xfs_repair -L, then xfs_repair two more times to make sure the filesystem was OK, and rebooted again.
After logging in, the same error occurred within a few minutes.

Ran xfs_repair -L etc. from rescue mode, rebooted back to previous 6.2.14 kernel.
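
For reference, the recovery sequence each time was roughly the following (run from rescue mode with the filesystem unmounted; /dev/sdd1 is the /home device from the log above):

  xfs_repair /dev/sdd1      # refuses: dirty log, says to mount so the log can replay
  mount /dev/sdd1 /home     # fails with an error
  xfs_repair -L /dev/sdd1   # zero the log and repair
  xfs_repair /dev/sdd1      # second pass to confirm the filesystem is clean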

It has been running for around 12 hours with no errors so far, so it seems highly likely that something in 6.3.3 is choking on my XFS filesystem.  
I have three devices formatted as XFS: /, a data volume on /mnt/data, and /home.
It is possible that /home was the only filesystem getting much write traffic right after boot, but I would assume that /var has a lot of writes as well (the only volume not XFS is /boot, formatted as ext4).

Reproducible: Always

Steps to Reproduce:
1. Upgrade to kernel 6.3.3 from updates-testing on system with xfs filesystems
2. Reboot to 6.3.3
3.
Actual Results:  
Filesystem errors

Expected Results:  
Filesystem which works with 6.2.14 continues to work with 6.3.3

Comment 1 Justin M. Forbes 2023-05-19 14:24:50 UTC
A couple of questions.

A) What mount options are you using for home?
B) What is the drive health like? smartctl should be able to help here.
C) Do you recall when the filesystem was made? Fedora version / kernel version / year, some ballpark.
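
For (A) and (B), output from something like the following would be enough (device name is just a placeholder):

  findmnt /home
  smartctl -a /dev/sdX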

Comment 2 Chris Caudle 2023-05-19 15:04:20 UTC
(In reply to Justin M. Forbes from comment #1)
> A) What mount options are you using for home?

UUID=60cf5755-8094-46d2-a297-c0b4c1ce03d9  /home    xfs     defaults        0 2

> B) What is the drive health like? smartctl should be able to help here.

The "drive" is actually a logical volume on an 8-drive RAID 5 array.
All the physical drives and logical volumes are reported OK by the array controller (HP SmartArray P420)
=> ctrl slot=2 pd all show status

   physicaldrive 1I:2:1 (port 1I:box 2:bay 1, 300 GB): OK
   physicaldrive 1I:2:2 (port 1I:box 2:bay 2, 300 GB): OK
   physicaldrive 1I:2:3 (port 1I:box 2:bay 3, 300 GB): OK
   physicaldrive 1I:2:4 (port 1I:box 2:bay 4, 300 GB): OK
   physicaldrive 2I:2:5 (port 2I:box 2:bay 5, 300 GB): OK
   physicaldrive 2I:2:6 (port 2I:box 2:bay 6, 300 GB): OK
   physicaldrive 2I:2:7 (port 2I:box 2:bay 7, 300 GB): OK
   physicaldrive 2I:2:8 (port 2I:box 2:bay 8, 300 GB): OK


=> ctrl slot=2 ld all show status

   logicaldrive 1 (597 MB, RAID 5): OK  <-- this is /boot ext4
   logicaldrive 2 (4.0 GB, RAID 5): OK  <-- this is swap
   logicaldrive 3 (100.0 GB, RAID 5): OK <-- this is / xfs
   logicaldrive 4 (100.0 GB, RAID 5): OK <-- this is /home xfs
   logicaldrive 5 (1.7 TB, RAID 5): OK <-- this is /mnt/data xfs


> c) Do you recall when the filesystem was made? Fedora version/kernel
> version/year some ballpark.

The /etc/fstab file was created in June 2018, so I would expect that is when all the filesystems were first formatted.

Comment 3 Chris Caudle 2023-05-19 15:16:35 UTC
Minor correction: my currently running kernel is 6.2.15, as I had already updated from 6.2.14, so the note above about going from 6.3.3 back to 6.2.14 should actually say I went from 6.3.3 back to 6.2.15.

Comment 4 Justin M. Forbes 2023-05-23 12:24:05 UTC
Mind giving https://koji.fedoraproject.org/koji/taskinfo?taskID=101466330 a try to see if that fixes things for you?

Comment 5 Chris Caudle 2023-05-23 14:55:07 UTC
(In reply to Justin M. Forbes from comment #4)
> Mind giving https://koji.fedoraproject.org/koji/taskinfo?taskID=101466330 a
> try to see if that fixes things for you?

Sorry, I don't follow.  That link seems to be to a src rpm for the kernel.  Are you asking me to rebuild the kernel rpms with a configuration change?

Comment 6 Justin M. Forbes 2023-05-23 15:11:59 UTC
Sorry, that was the top-level build; clicking on the x86_64 task takes you to https://koji.fedoraproject.org/koji/taskinfo?taskID=101466371 which links to the actual built rpms.

Comment 7 Justin M. Forbes 2023-05-23 21:08:15 UTC
So it seems that my backport was not the culprit, and people are still able to reproduce after a revert. The next question is whether this reproduces on 6.4-rc3 or not.  Is it already fixed upstream, so that the fix just needs to be backported?  Please try:

https://koji.fedoraproject.org/koji/buildinfo?buildID=2204155

Comment 8 Justin M. Forbes 2023-05-23 21:31:49 UTC
Just for additional discussion on this issue:

https://www.spinics.net/lists/linux-xfs/msg71084.html

and 

https://bodhi.fedoraproject.org/updates/FEDORA-2023-514965dd8a

Comment 9 Dave Chinner 2023-05-24 06:03:44 UTC
We need to know the full hardware and storage config of the machines hitting this problem - CPUs, memory, storage devices, storage device configs, storage stack layout, filesystem configs (xfs_info output), mount options, etc. This will help us find commonalities between the environments that are seeing the issue.

We need the full dmesg output from boot time when a corruption is hit. Please do not chop bits out of the logs that you think we don't need, that always results in the bits we actually need getting chopped out.

Can you please turn /proc/sys/fs/xfs/error_level up to 11 before reproducing the issue. This will dump more information in the logs when certain XFS errors are hit.
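
Something along these lines should do it (the setting does not persist across reboots, so set it again after each boot):

  sysctl -w fs.xfs.error_level=11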

If the problem does not take down the root filesystem, can you please unmount the filesystem that was shut down and take a metadump of it. Please compress the metadump and provide a link for us to access it. This will allow us to examine the state of the filesystem on disk at the time of the corruption.
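
For the metadump, roughly (device and output path here are placeholders):

  umount /dev/sdXN
  xfs_metadump -g /dev/sdXN /tmp/fs.metadump
  gzip /tmp/fs.metadump

Note that xfs_metadump obfuscates most file names by default.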

We also need a description of the workload that is being run when these problems are hit. That will tell us how the filesystem is being modified in memory when the corruption is hit. It will also give us an idea of what the difference is between the on-disk state in the metadump and the in-memory state when the corruption is detected. This will give us some idea where the corrupted metadata is coming from.

-Dave.

Comment 10 Rune Kleveland 2023-05-24 14:31:46 UTC
Hi!

All our "crawling servers" seem to crash consistently when using this kernel, and it seems to always be the root partition that gets corrupted. The crawling servers have high load and basically run many instances of Google Chrome.

Here is an attempt to provide information about a particular crash. This is a server installed as Fedora 31 and upgraded to Fedora 38 step by step. It has worked fine till now.

After booting to kernel 6.3.3, the file system crashed almost immediately; we did not even have to start the processes. On other occasions it has taken longer, up to a day, and on those servers the file system was created more recently. It's just a hunch, as we do not have that many observations. I believe we have 8 crashes in total now across 12 servers that have been running this kernel for some time.

Here are links to what I hope is the information you requested, for this particular crash:

https://pascal.opoint.com/kernel6.3/fstab
https://pascal.opoint.com/kernel6.3/hardwareinfo
https://pascal.opoint.com/kernel6.3/kernellog
https://pascal.opoint.com/kernel6.3/metadump

Please let me know if we can do anything else to help.

/Rune

Comment 11 Eric Sandeen 2023-05-24 22:16:32 UTC
I think I noticed that Rune & Chris both have HP 420 RAID controllers. Is this correct? I assume these are independent reports?

Comment 12 Eric Sandeen 2023-05-24 22:21:54 UTC
More questions - Rune, what was the previous kernel you were running without this problem?

Chris's report said the last good kernel was 6.2.14. Narrowing that down might help.

Talking to Dave, we also wonder how these RAID controllers are configured. They advertise no write cache; can you confirm that the drives behind these controllers also have their write caches disabled, and/or that the battery backup (if they have one) is working correctly?

Comment 13 Dave Chinner 2023-05-25 06:33:25 UTC
(In reply to Rune Kleveland from comment #10)
> Hi!
> 
> All our "crawling servers" seems to crash consistently when using this
> kernel, and it seems to always be the root partition crashing. The crawling
> servers has high load and basically runs many instances of Google Chrome.
> 
> Here is an attempt to provide information about a particular crash. This is
> a server installed as Fedora 31 and upgraded to Fedora 38 step by step. It
> has worked fine till now.
> 
> After booting to kernel 6.3.3, the file system crashed almost immediately.
> We did not have to start the processes. At other occasions it has taken
> longer, up to a day, and on those the file system is created more recently.
> It's just a hunch, we do not have that many observations. I believe we have
> 8 crashes totally now of 12 servers running this kernel for some time.
> 
> Here are links to what you hopefully requested for the particular crash:
> 
> https://pascal.opoint.com/kernel6.3/fstab
> https://pascal.opoint.com/kernel6.3/hardwareinfo
> https://pascal.opoint.com/kernel6.3/kernellog
> https://pascal.opoint.com/kernel6.3/metadump

So this has failed at runtime because it tripped over an inode block full of garbage:

May 24 12:04:12 newton59.opoint.com kernel: XFS (sda3): Metadata corruption detected at xfs_buf_ioend+0x154/0x5f0 [xfs], xfs_inode block 0x112ac740 xfs_inode_buf_verify
May 24 12:04:12 newton59.opoint.com kernel: XFS (sda3): Unmount and run xfs_repair
May 24 12:04:12 newton59.opoint.com kernel: XFS (sda3): First 128 bytes of corrupted metadata buffer:
May 24 12:04:12 newton59.opoint.com kernel: 00000000: 31 45 20 24 41 41 46 46 32 44 39 46 38 44 43 43  1E $AAFF2D9F8DCC
May 24 12:04:12 newton59.opoint.com kernel: 00000010: 46 41 30 32 33 31 33 33 36 36 41 38 36 41 46 46  FA02313366A86AFF
May 24 12:04:12 newton59.opoint.com kernel: 00000020: 38 37 35 33 42 41 38 31 41 38 39 46 20 24 41 43  8753BA81A89F $AC
May 24 12:04:12 newton59.opoint.com kernel: 00000030: 31 34 44 37 37 37 33 42 46 41 31 44 32 35 45 34  14D7773BFA1D25E4
May 24 12:04:12 newton59.opoint.com kernel: 00000040: 43 46 42 39 34 36 34 38 46 30 42 43 38 39 33 44  CFB94648F0BC893D
May 24 12:04:12 newton59.opoint.com kernel: 00000050: 44 31 39 45 33 37 20 24 41 44 41 44 41 31 44 45  D19E37 $ADADA1DE
May 24 12:04:12 newton59.opoint.com kernel: 00000060: 46 30 34 44 45 44 37 41 36 42 42 33 32 43 38 30  F04DED7A6BB32C80
May 24 12:04:12 newton59.opoint.com kernel: 00000070: 44 36 44 43 35 39 33 42 35 42 43 30 38 39 45 46  D6DC593B5BC089EF

Note the address of this block: 0x112ac740

It is also tripping over a directory data block full of garbage:

May 24 12:04:12 newton59.opoint.com kernel: XFS (sda3): metadata I/O error in "xfs_imap_to_bp+0x5c/0x80 [xfs]" at daddr 0x112ac740 len 32 error 117
May 24 12:06:11 newton59.opoint.com kernel: XFS (sda3): Metadata CRC error detected at xfs_dir3_block_read_verify+0xd4/0x110 [xfs], xfs_dir3_block block 0x112a9be0 
May 24 12:06:11 newton59.opoint.com kernel: XFS (sda3): Unmount and run xfs_repair
May 24 12:06:11 newton59.opoint.com kernel: XFS (sda3): First 128 bytes of corrupted metadata buffer:
May 24 12:06:11 newton59.opoint.com kernel: 00000000: 35 44 33 43 31 39 30 35 42 43 20 24 46 45 41 46  5D3C1905BC $FEAF
May 24 12:06:11 newton59.opoint.com kernel: 00000010: 30 41 42 30 39 30 38 35 33 31 35 43 44 33 39 35  0AB09085315CD395
May 24 12:06:12 newton59.opoint.com kernel: 00000020: 46 35 31 31 43 44 45 45 39 31 30 32 34 36 32 46  F511CDEE9102462F
May 24 12:06:12 newton59.opoint.com kernel: 00000030: 35 38 39 35 0a 70 20 61 63 63 65 70 74 20 34 33  5895.p accept 43
May 24 12:06:12 newton59.opoint.com kernel: 00000040: 2c 35 33 2c 38 30 2c 31 31 30 2c 31 34 33 2c 31  ,53,80,110,143,1
May 24 12:06:12 newton59.opoint.com kernel: 00000050: 39 34 2c 32 32 30 2c 34 34 33 2c 38 37 33 2c 39  94,220,443,873,9
May 24 12:06:12 newton59.opoint.com kernel: 00000060: 39 31 2c 39 39 33 2d 39 39 35 2c 31 31 39 34 2c  91,993-995,1194,
May 24 12:06:12 newton59.opoint.com kernel: 00000070: 31 32 39 33 2c 33 36 39 30 2c 34 33 32 31 2c 35  1293,3690,4321,5

Note the address of this block: 0x112a9be0

And when I try to replay the log from the metadump to mount the image, it trips over a freespace btree block full of garbage at block 0x112a9b08 and goes no further.

Let's put those three corruptions together: 0x112ac740, 0x112a9be0 and 0x112a9b08.

They are all within a range of ~5.8MB of disk space.

That's highly suspicious. So to do a quick scan of the state of all the metadata, I ran 'xfs_repair -n dump.img' to see what active metadata blocks it found that were corrupt:

$ grep Metadata t.t
Metadata CRC error detected at 0x5625c07feecd, xfs_bnobt block 0x112a9b08/0x1000
Metadata CRC error detected at 0x5625c07feecd, xfs_bnobt block 0x112ac150/0x1000
Metadata CRC error detected at 0x5625c07feecd, xfs_bnobt block 0x112a9bc0/0x1000
Metadata CRC error detected at 0x5625c07feecd, xfs_bnobt block 0x112a9b90/0x1000
Metadata CRC error detected at 0x5625c07feecd, xfs_bnobt block 0x112a9b40/0x1000
Metadata CRC error detected at 0x5625c07feecd, xfs_cntbt block 0x112ac140/0x1000
Metadata CRC error detected at 0x5625c07feecd, xfs_cntbt block 0x112ab920/0x1000
Metadata CRC error detected at 0x5625c07feecd, xfs_cntbt block 0x112aa3c0/0x1000
Metadata CRC error detected at 0x5625c07feecd, xfs_cntbt block 0x112a9c80/0x1000
Metadata CRC error detected at 0x5625c07feecd, xfs_cntbt block 0x112ac158/0x1000
Metadata CRC error detected at 0x5625c07feecd, xfs_cntbt block 0x112a9828/0x1000
Metadata CRC error detected at 0x5625c07feecd, xfs_cntbt block 0x112aad30/0x1000
Metadata corruption detected at 0x5625c07eb846, xfs_inode block 0x112aa3a0/0x4000
Metadata corruption detected at 0x5625c07eb846, xfs_inode block 0x112aa920/0x4000
Metadata corruption detected at 0x5625c07eb846, xfs_inode block 0x112aac00/0x4000
Metadata corruption detected at 0x5625c07eb846, xfs_inode block 0x112aac20/0x4000
Metadata corruption detected at 0x5625c07eb846, xfs_inode block 0x112abac0/0x4000
Metadata corruption detected at 0x5625c07eb846, xfs_inode block 0x112ab980/0x4000
Metadata corruption detected at 0x5625c07eb846, xfs_inode block 0x112ac1c0/0x4000
Metadata corruption detected at 0x5625c07eb846, xfs_inode block 0x112ac2c0/0x4000
Metadata corruption detected at 0x5625c07eb846, xfs_inode block 0x112ac2e0/0x4000
Metadata corruption detected at 0x5625c07eb846, xfs_inode block 0x112ac740/0x4000
Metadata corruption detected at 0x5625c07eb846, xfs_inode block 0x112ac760/0x4000
Metadata CRC error detected at 0x5625c08176c3, xfs_dir3_block block 0x112a9b20/0x1000
Metadata CRC error detected at 0x5625c08176c3, xfs_dir3_block block 0x112a9d50/0x1000
Metadata CRC error detected at 0x5625c08176c3, xfs_dir3_block block 0x112a9d58/0x1000
Metadata CRC error detected at 0x5625c08176c3, xfs_dir3_block block 0x112a9d48/0x1000
Metadata CRC error detected at 0x5625c08176c3, xfs_dir3_block block 0x112a9be0/0x1000
Metadata CRC error detected at 0x5625c08176c3, xfs_dir3_block block 0x112ab468/0x1000
Metadata CRC error detected at 0x5625c08177a3, xfs_dir3_data block 0x112a97d0/0x1000
Metadata CRC error detected at 0x5625c08177a3, xfs_dir3_data block 0x112a97e8/0x1000
Metadata CRC error detected at 0x5625c08176c3, xfs_dir3_block block 0x112aa998/0x1000
Metadata CRC error detected at 0x5625c08176c3, xfs_dir3_block block 0x112ab918/0x1000
Metadata CRC error detected at 0x5625c08176c3, xfs_dir3_block block 0x112ab900/0x1000

Yup, there we are. There are lots of corrupt metadata blocks in the range from daddr 0x112a9820 to 0x112ab920. Metadata corruption is limited to a range of ~2100 sectors, or just on 11MB, of the entire block device space (of 300GB!). All the metadata blocks in the range have exactly the same sort of ascii data in them, which indicates it was probably a single corruption event that occurred. Note that the range of the corruption is likely larger than 2100 sectors - metadump only captures the metadata blocks, not the file data or empty space in the filesystem, so the corruption likely extends further. There's a good chance that file data was also corrupted by whatever occurred.

I don't recognise what the data in the metadata blocks is - does anyone else?

As it stands, I'm having trouble seeing how this is being caused by XFS. All of these metadata blocks are allocated and written independently at different times, so nothing XFS does would cause them all to be overwritten at the same time. I also can't see how it is a failure to persist data/metadata, because that leaves previous versions of the metadata behind, and that's clearly not the case here. This looks much more like a misdirected user data write than anything else.

Now, XFS could map a data write incorrectly and submit it to the wrong place, but that implies metadata in memory has already been corrupted and we haven't tripped over it in any way before the user data write occurs. This could happen if there was some kind of kernel memory corruption occurring outside XFS, too. However, given the absence of any evidence that XFS is actually corrupted in memory, or that any other sort of memory corruption has been occurring, there's no real evidence to point to XFS mapping data writes over the top of its own metadata.

So I'm not ruling out that it is an XFS problem; it just looks unlikely at this point. We'll see as more data comes in.

Next question: does this problem still occur on a 6.4-rc3 kernel? Has anyone tested that yet?

-Dave.

Comment 14 Rune Kleveland 2023-05-25 06:43:37 UTC
My report is independent.

We do have P420 controllers with firmware 8.32. It also happened on a P830i controller with firmware 5.02. The latest good kernel is 6.2.15 for sure; we run it on 80 servers now.

One of the servers with this problem had caching disabled due to bad battery, the others had caching enabled.

Here is the output from ssacli 5.10 on two servers:

------------------------------------------
ssacli ctrl all show config detail

Smart Array P420i in Slot 0 (Embedded)
   Bus Interface: PCI
   Slot: 0
   Serial Number: 00143802604ACA0
   Cache Serial Number: PBKUD0BRH6WF7H
   RAID 6 Status: Enabled
   Controller Status: OK
   Hardware Revision: B
   Firmware Version: 8.32
   Firmware Supports Online Firmware Activation: False
   Rebuild Priority: Low
   Expand Priority: Medium
   Surface Scan Delay: 15 secs
   Surface Scan Mode: Idle
   Parallel Surface Scan Supported: No
   Queue Depth: Automatic
   Monitor and Performance Delay: 60  min
   Elevator Sort: Enabled
   Degraded Performance Optimization: Disabled
   Inconsistency Repair Policy: Disabled
   Wait for Cache Room: Disabled
   Surface Analysis Inconsistency Notification: Disabled
   Post Prompt Timeout: 0 secs
   Cache Board Present: True
   Cache Status: Permanently Disabled
   Cache Status Details: Cache disabled; backup power source failed to charge to an acceptable level
   Cache Disable Reason: Permanent disable condition. The posted write cache has been disabled because the backup power source attached to the flash-backed write cache module has failed to charge.
   Drive Write Cache: Disabled
   Total Cache Size: 2.0
   Total Cache Memory Available: 1.8
   Battery Backed Cache Size: 1.8
   No-Battery Write Cache: Disabled
   SSD Caching RAID5 WriteBack Enabled: False
   SSD Caching Version: 1
   Cache Backup Power Source: Capacitors
   Battery/Capacitor Count: 1
   Battery/Capacitor Status: Failed (Replace Batteries)
   SATA NCQ Supported: True
   Spare Activation Mode: Activate on physical drive failure (default)
   Controller Temperature (C): 77
   Cache Module Temperature (C): 56
   Capacitor Temperature  (C): 23
   Number of Ports: 2 Internal only
   Encryption: Not Set
   Driver Name: hpsa
   Driver Version: 3.4.20
   Driver Supports SSD Smart Path: True
   PCI Address (Domain:Bus:Device.Function): 0000:02:00.0
   Port Max Phy Rate Limiting Supported: False
   Host Serial Number: CZJ3170HPG
   Sanitize Erase Supported: False
   Primary Boot Volume: None
   Secondary Boot Volume: None



   Internal Drive Cage at Port 1I, Box 1, OK

      Drive Bays: 4
      Port: 1I
      Box: 1
      Location: Internal

   Physical Drives
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS HDD, 300 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS HDD, 300 GB, OK)



   Internal Drive Cage at Port 2I, Box 0, OK

      Drive Bays: 4
      Port: 2I
      Box: 0
      Location: Internal

   Physical Drives
      None attached


   Port Name: 1I
         Port ID: 0
         Port Connection Number: 0
         SAS Address: 500143802604ACA0
         Port Location: Internal

   Port Name: 2I
         Port ID: 1
         Port Connection Number: 1
         SAS Address: 500143802604ACA4
         Port Location: Internal

   Array: A
      Interface Type: SAS
      Unused Space: 0 MB (0.00%)
      Used Space: 558.73 GB (100.00%)
      Status: OK
      Array Type: Data 
      Smart Path: disable


      Logical Drive: 1
         Size: 279.37 GB
         Fault Tolerance: 1
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 256 KB
         Status: OK
         Unrecoverable Media Errors: None
         Caching:  Disabled
         Unique Identifier: 600508B1001CCB753BAE91CE7C5952BB
         Disk Name: /dev/sda 
         Mount Points: 278.3 GiB Partition   2, 1024 MiB Partition   1 /, /boot
         Disk Partition Information
            Partition   2: Basic, 278.3 GiB, /
            Partition   1: Basic, 1024 MiB, /boot
         Logical Drive Label: ADA98A1400143802604ACA047C0
         Mirror Group 1:
            physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS HDD, 300 GB, OK)
         Mirror Group 2:
            physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS HDD, 300 GB, OK)
         Drive Type: Data
         LD Acceleration Method: All disabled


      physicaldrive 1I:1:1
         Port: 1I
         Box: 1
         Bay: 1
         Status: OK
         Drive Type: Data Drive
         Interface Type: SAS
         Size: 300 GB
         Drive exposed to OS: False
         Logical/Physical Block Size: 512/512
         Rotational Speed: 10000
         Firmware Revision: SB2B
         Serial Number: EB71PCB1ALNW
         WWID: 500003946818356A
         Model: IBM-ESXSMBF2300RC
         Current Temperature (C): 28
         PHY Count: 2
         PHY Transfer Rate: 6.0Gbps, Unknown
         PHY Physical Link Rate: Unknown, Unknown
         PHY Maximum Link Rate: Unknown, Unknown
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False
         Shingled Magnetic Recording Support: None

      physicaldrive 1I:1:2
         Port: 1I
         Box: 1
         Bay: 2
         Status: OK
         Drive Type: Data Drive
         Interface Type: SAS
         Size: 300 GB
         Drive exposed to OS: False
         Logical/Physical Block Size: 512/512
         Rotational Speed: 10000
         Firmware Revision: SB2B
         Serial Number: EB71PCB1ALNB
         WWID: 50000394681830A2
         Model: IBM-ESXSMBF2300RC
         Current Temperature (C): 28
         PHY Count: 2
         PHY Transfer Rate: 6.0Gbps, Unknown
         PHY Physical Link Rate: Unknown, Unknown
         PHY Maximum Link Rate: Unknown, Unknown
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False
         Shingled Magnetic Recording Support: None


   SEP (Vendor ID PMCSIERA, Model SRCv8x6G) 380 
      Device Number: 380
      Firmware Version: RevB
      WWID: 500143802604ACAF
      Vendor ID: PMCSIERA
      Model: SRCv8x6G


-------------------------------------------

[root@newton62 mrexec]# ssacli ctrl all show config detail

Smart Array P420i in Slot 0 (Embedded)
   Bus Interface: PCI
   Slot: 0
   Serial Number: 001438022023B20
   Cache Serial Number: PBKUC0BRHA909B
   RAID 6 Status: Enabled
   Controller Status: OK
   Hardware Revision: B
   Firmware Version: 8.32
   Firmware Supports Online Firmware Activation: False
   Rebuild Priority: Low
   Expand Priority: Medium
   Surface Scan Delay: 15 secs
   Surface Scan Mode: Idle
   Parallel Surface Scan Supported: No
   Queue Depth: Automatic
   Monitor and Performance Delay: 60  min
   Elevator Sort: Enabled
   Degraded Performance Optimization: Disabled
   Inconsistency Repair Policy: Disabled
   Wait for Cache Room: Disabled
   Surface Analysis Inconsistency Notification: Disabled
   Post Prompt Timeout: 0 secs
   Cache Board Present: True
   Cache Status: OK
   Cache Ratio: 10% Read / 90% Write
   Drive Write Cache: Disabled
   Total Cache Size: 1.0
   Total Cache Memory Available: 0.8
   Battery Backed Cache Size: 0.8
   No-Battery Write Cache: Disabled
   SSD Caching RAID5 WriteBack Enabled: False
   SSD Caching Version: 1
   Cache Backup Power Source: Capacitors
   Battery/Capacitor Count: 1
   Battery/Capacitor Status: OK
   SATA NCQ Supported: True
   Spare Activation Mode: Activate on physical drive failure (default)
   Controller Temperature (C): 81
   Cache Module Temperature (C): 55
   Capacitor Temperature  (C): 24
   Number of Ports: 2 Internal only
   Encryption: Not Set
   Driver Name: hpsa
   Driver Version: 3.4.20
   Driver Supports SSD Smart Path: True
   PCI Address (Domain:Bus:Device.Function): 0000:02:00.0
   Port Max Phy Rate Limiting Supported: False
   Host Serial Number: CZJ"400Dc
   Sanitize Erase Supported: False
   Primary Boot Volume: None
   Secondary Boot Volume: None



   Internal Drive Cage at Port 1I, Box 1, OK

      Drive Bays: 4
      Port: 1I
      Box: 1
      Location: Internal

   Physical Drives
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS HDD, 300 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS HDD, 300 GB, OK)



   Internal Drive Cage at Port 2I, Box 0, OK

      Drive Bays: 4
      Port: 2I
      Box: 0
      Location: Internal

   Physical Drives
      None attached


   Port Name: 1I
         Port ID: 0
         Port Connection Number: 0
         SAS Address: 5001438022023B20
         Port Location: Internal

   Port Name: 2I
         Port ID: 1
         Port Connection Number: 1
         SAS Address: 5001438022023B24
         Port Location: Internal

   Array: A
      Interface Type: SAS
      Unused Space: 0 MB (0.00%)
      Used Space: 558.73 GB (100.00%)
      Status: OK
      Array Type: Data 
      Smart Path: disable


      Logical Drive: 1
         Size: 279.37 GB
         Fault Tolerance: 1
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 256 KB
         Status: OK
         Unrecoverable Media Errors: None
         Caching:  Enabled
         Unique Identifier: 600508B1001CFEA859297B81728FCD49
         Disk Name: /dev/sda 
         Mount Points: 274.3 GiB Partition   3, 1024 MiB Partition   1 /, /boot
         Disk Partition Information
            Partition   3: Basic, 274.3 GiB, /
            Partition   1: Basic, 1024 MiB, /boot
         Logical Drive Label: ADA8464D5001438022023B20 9070
         Mirror Group 1:
            physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS HDD, 300 GB, OK)
         Mirror Group 2:
            physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS HDD, 300 GB, OK)
         Drive Type: Data
         LD Acceleration Method: Controller Cache


      physicaldrive 1I:1:1
         Port: 1I
         Box: 1
         Bay: 1
         Status: OK
         Drive Type: Data Drive
         Interface Type: SAS
         Size: 300 GB
         Drive exposed to OS: False
         Logical/Physical Block Size: 512/512
         Rotational Speed: 10000
         Firmware Revision: SB28
         Serial Number: EB71PB90J2KH
         WWID: 50000393881275F6
         Model: IBM-ESXSMBF2300RC
         Current Temperature (C): 30
         PHY Count: 2
         PHY Transfer Rate: 6.0Gbps, Unknown
         PHY Physical Link Rate: Unknown, Unknown
         PHY Maximum Link Rate: Unknown, Unknown
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False
         Shingled Magnetic Recording Support: None

      physicaldrive 1I:1:2
         Port: 1I
         Box: 1
         Bay: 2
         Status: OK
         Drive Type: Data Drive
         Interface Type: SAS
         Size: 300 GB
         Drive exposed to OS: False
         Logical/Physical Block Size: 512/512
         Rotational Speed: 10000
         Firmware Revision: HPDA
         Serial Number: PMVV8RBB
         WWID: 5000CCA0212FC98D
         Model: HP      EG0300FBDBR
         Current Temperature (C): 28
         Maximum Temperature (C): 45
         PHY Count: 2
         PHY Transfer Rate: 6.0Gbps, Unknown
         PHY Physical Link Rate: Unknown, Unknown
         PHY Maximum Link Rate: Unknown, Unknown
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False
         Shingled Magnetic Recording Support: None


   SEP (Vendor ID PMCSIERA, Model SRCv8x6G) 380 
      Device Number: 380
      Firmware Version: RevB
      WWID: 5001438022023B2F
      Vendor ID: PMCSIERA
      Model: SRCv8x6G

------------------------------------------

Comment 15 Justin M. Forbes 2023-05-25 17:27:55 UTC
Rune,  mind giving that rawhide kernel a try to see if 6.4-rc3 shows the issue too?  There is a link to the koji build in comment 7.

Comment 16 Rune Kleveland 2023-05-25 19:05:34 UTC
I tested the 6.4 kernel for an hour, and nothing bad was observed in the kernel log. Everything seems to work after a reboot also.

Comment 17 Eric Sandeen 2023-05-25 19:53:50 UTC
Rune - just for what it's worth, there's file data in the metadata, for example:

00000b20  2d 2d 2d 2d 42 45 47 49  4e 20 52 53 41 20 50 55  |----BEGIN RSA PU|
00000b30  42 4c 49 43 20 4b 45 59  2d 2d 2d 2d 2d 0a 4d 49  |BLIC KEY-----.MI|
00000b40  47 4a 41 6f 47 42 41 4e  30 55 2f 68 44 6d 77 79  |GJAoGBAN0U/hDmwy|

as well as maybe some Tor onion keys?

(might take down the URL if any of that concerns you)

Comment 18 Eric Sandeen 2023-05-25 20:24:46 UTC
Just for fun, I grabbed a HP ProLiant DL360 with a P420i RAID, migrated from the 6.2 kernel to the 6.3.3 kernel with various writes along the way and saw no problems after the upgrade.

Comment 19 Rune Kleveland 2023-05-25 20:51:03 UTC
Thanks for the info about file data. I took down the URL, but I am not concerned. If anyone needs the metadump, please let me know.

It's weird that I can reproduce so easily and you can't. Please contact me directly if you want access to a server where I think you can reproduce easily, and I will try to arrange that.

Comment 20 Gary Buhrmaster 2023-05-25 21:14:44 UTC
Another data point.

After about 14 days of uptime with kernel 6.3.1:

May 24 15:14:59 dvr0.lan kernel: XFS (sdc1): Reverse Mapping BTree record corruption in AG 9 detected!
May 24 15:14:59 dvr0.lan kernel: XFS (sdc1): Owner 0x3c0000082, flags 0x4, start block 0x558f681 block count 0x0
May 24 15:14:59 dvr0.lan kernel: XFS (sdc1): Corruption of in-memory data (0x8) detected at xfs_defer_finish_noroll+0x285/0x690 [xfs] (fs/xfs/libxfs/xfs_defer.c:573).  Shutting down filesystem.
May 24 15:14:59 dvr0.lan kernel: XFS (sdc1): Please unmount the filesystem and rectify the problem(s)

xfs_repair failed (said to mount to replay log)

Attempt to mount:

May 24 19:21:19 dvr0.lan kernel: XFS (sdc1): Mounting V5 Filesystem c80e09b1-f667-4d9f-b6c7-0a8989574238
May 24 19:21:19 dvr0.lan kernel: XFS (sdc1): Starting recovery (logdev: internal)
May 24 19:21:19 dvr0.lan kernel: 00000000: 40 12 01 00 01 00 00 00 30 98 41 94 3b 99 ff ff  @.......0.A.;...
May 24 19:21:19 dvr0.lan kernel: XFS (sdc1): Internal error xfs_rui_item_recover at line 502 of file fs/xfs/xfs_rmap_item.c.  Caller xlog_recover_process_intents+0xa3/0x300 [xfs]
May 24 19:21:19 dvr0.lan kernel: CPU: 3 PID: 165795 Comm: mount Not tainted 6.3.1-200.fc38.x86_64 #1
May 24 19:21:19 dvr0.lan kernel: Hardware name:  /DZ68BC, BIOS BCZ6810H.86A.0039.2012.1018.1115 10/18/2012
May 24 19:21:19 dvr0.lan kernel: Call Trace:
May 24 19:21:19 dvr0.lan kernel:  <TASK>
May 24 19:21:19 dvr0.lan kernel:  dump_stack_lvl+0x47/0x60
May 24 19:21:19 dvr0.lan kernel:  xfs_corruption_error+0x94/0xa0 [xfs]
May 24 19:21:19 dvr0.lan kernel:  ? xlog_recover_process_intents+0xa3/0x300 [xfs]
May 24 19:21:19 dvr0.lan kernel:  xfs_rui_item_recover+0x142/0x410 [xfs]
May 24 19:21:19 dvr0.lan kernel:  ? xlog_recover_process_intents+0xa3/0x300 [xfs]
May 24 19:21:19 dvr0.lan kernel:  ? xfs_read_agf+0x94/0x130 [xfs]
May 24 19:21:19 dvr0.lan kernel:  ? xfs_buf_read_map+0x62/0x2a0 [xfs]
May 24 19:21:19 dvr0.lan kernel:  ? xlog_recover_process_intents+0xa3/0x300 [xfs]
May 24 19:21:19 dvr0.lan kernel:  ? __pfx_xfs_rui_item_recover+0x10/0x10 [xfs]
May 24 19:21:19 dvr0.lan kernel:  xlog_recover_process_intents+0xa3/0x300 [xfs]
May 24 19:21:19 dvr0.lan kernel:  ? xfs_read_agf+0x94/0x130 [xfs]
May 24 19:21:19 dvr0.lan kernel:  ? preempt_count_add+0x6e/0xa0
May 24 19:21:19 dvr0.lan kernel:  ? lock_timer_base+0x61/0x80
May 24 19:21:19 dvr0.lan kernel:  xlog_recover_finish+0x31/0x340 [xfs]
May 24 19:21:19 dvr0.lan kernel:  xfs_log_mount_finish+0x106/0x180 [xfs]
May 24 19:21:19 dvr0.lan kernel:  xfs_mountfs+0x51f/0x900 [xfs]
May 24 19:21:19 dvr0.lan kernel:  xfs_fs_fill_super+0x4ca/0x920 [xfs]
May 24 19:21:19 dvr0.lan kernel:  ? __pfx_xfs_fs_fill_super+0x10/0x10 [xfs]
May 24 19:21:19 dvr0.lan kernel:  get_tree_bdev+0x174/0x260
May 24 19:21:19 dvr0.lan kernel:  vfs_get_tree+0x29/0xd0
May 24 19:21:19 dvr0.lan kernel:  path_mount+0x4a1/0xae0
May 24 19:21:19 dvr0.lan kernel:  __x64_sys_mount+0x11a/0x150
May 24 19:21:19 dvr0.lan kernel:  do_syscall_64+0x60/0x90
May 24 19:21:19 dvr0.lan kernel:  ? syscall_exit_to_user_mode+0x1b/0x40
May 24 19:21:19 dvr0.lan kernel:  ? do_syscall_64+0x6c/0x90
May 24 19:21:19 dvr0.lan kernel:  ? syscall_exit_to_user_mode+0x1b/0x40
May 24 19:21:19 dvr0.lan kernel:  ? do_syscall_64+0x6c/0x90
May 24 19:21:19 dvr0.lan kernel:  entry_SYSCALL_64_after_hwframe+0x72/0xdc
May 24 19:21:19 dvr0.lan kernel: RIP: 0033:0x7f97037d3a9e
May 24 19:21:19 dvr0.lan kernel: Code: 48 8b 0d 6d 03 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3a 03 0c 00 f7 d8 64 89 01 48
May 24 19:21:19 dvr0.lan kernel: RSP: 002b:00007ffe575b90a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
May 24 19:21:19 dvr0.lan kernel: RAX: ffffffffffffffda RBX: 000055abcf56f570 RCX: 00007f97037d3a9e
May 24 19:21:19 dvr0.lan kernel: RDX: 000055abcf575ee0 RSI: 000055abcf56f7c0 RDI: 000055abcf56f7a0
May 24 19:21:19 dvr0.lan kernel: RBP: 00007ffe575b91d0 R08: 0000000000000000 R09: 0000000000000007
May 24 19:21:19 dvr0.lan kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
May 24 19:21:19 dvr0.lan kernel: R13: 000055abcf56f7a0 R14: 000055abcf575ee0 R15: 00007f9703903076
May 24 19:21:19 dvr0.lan kernel:  </TASK>
May 24 19:21:19 dvr0.lan kernel: XFS (sdc1): Corruption detected. Unmount and run xfs_repair
May 24 19:21:19 dvr0.lan kernel: XFS (sdc1): Failed to recover intents
May 24 19:21:19 dvr0.lan kernel: XFS (sdc1): Filesystem has been shut down due to log error (0x2).
May 24 19:21:19 dvr0.lan kernel: XFS (sdc1): Please unmount the filesystem and rectify the problem(s).
May 24 19:21:19 dvr0.lan kernel: XFS (sdc1): Ending recovery (logdev: internal)
May 24 19:21:19 dvr0.lan kernel: XFS (sdc1): log mount finish failed

xfs_repair with -L was required

---

(A very old) desktop system, without heavy disk activity, running on a LSI MegaRaid 9260 using hardware RAID-6 (6 4TB drives in the RAID-6 virtual drive).

Previously the system was running 6.2.14 without issue(s).

---

Results from xfs_info

meta-data=/dev/sdc1              isize=512    agcount=32, agsize=122070336 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=4096   blocks=3906249216, imaxpct=5
         =                       sunit=64     swidth=256 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

---

I note that all these reported failures seem to be using hardware RAID with striping (whether either is relevant is unknown).

I am also using XFS (without issue) on a number of other systems with a 6.3 kernel, but none of them are using hardware RAID or stripes.

---

I downgraded to the latest 6.2 kernel for now as my workaround.

Comment 21 Justin M. Forbes 2023-05-25 21:21:34 UTC
Other things of note:

6.3 has never made it past updates-testing, due to this issue. That means we don't really know how widespread it is beyond the people who have enabled updates-testing.
The default filesystem for Fedora Workstation and cloud spins is btrfs, so I would expect that even the majority of users with updates-testing enabled are not running xfs.
I have not seen this on any of my machines, though they are all single-disk partitions for xfs.

Comment 22 Chris Caudle 2023-05-25 22:18:25 UTC
I have been busy with work the last couple of days; I'm glad Rune has also seen this same problem on a much larger number of machines.  This occurred on my primary workstation, and as you can probably imagine I am not eager to repeatedly try to duplicate something which results in corruption of the filesystem.  Of course I will if necessary to help track down the bug, but I am hoping Rune's 50 guinea pigs are sufficient for the time being.

> The default filesystem for fedora workstation and cloud spins is btrfs

Yeah, I first set up this machine back before btrfs was considered stable enough to use in production.  Red Hat had decided to make XFS the default for RHEL, and I had been using XFS on some data storage volumes in the past (i.e. not / and not /home), so I decided to set up my machine with XFS like RHEL was using as default.

I will try to fill in the info Dave C requested in case it helps find some common thread.
I am not sure what was meant by "storage stack layout"; if @dchinner can explain what was meant, I can provide additional info.
Unfortunately I do not have the full dmesg info available from the time of the crash, hopefully Rune is better able to grab that from one of his machines.

Did anyone recognize any of that output that got jammed into the metadata block in my first post?  Could it be from gcc?  I think I was compiling Ardour when the crash happened.
Does the preponderance of the evidence point to some other process stomping on filesystem memory, and not a filesystem problem per se?

This is an old tower server that I am using as a workstation, so it is what would have been a basic server about 14 years ago.  Ugh, I had to go look up that release date.  Time for a refresh, although hard to beat the core count and memory size you can get for just a few hundred dollars of obsolete datacenter gear.

Anyway...this info is current, not from the time of the filesystem corruption.

Dual sockets (both populated).

CPU: 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Intel(R) Xeon(R) CPU X5650  @ 2.67GHz
stepping        : 2
microcode       : 0x1f

Memory:
MemTotal:       198035844 kB
MemFree:        186282764 kB
MemAvailable:   189951908 kB
Buffers:           10716 kB
Cached:          4872220 kB
SwapCached:            0 kB
Active:          7719416 kB
Inactive:        1964496 kB
Active(anon):    4901080 kB
Inactive(anon):    22592 kB
Active(file):    2818336 kB
Inactive(file):  1941904 kB
Unevictable:         492 kB
Mlocked:             492 kB
SwapTotal:      12580088 kB
SwapFree:       12580088 kB
Zswap:                 0 kB
Zswapped:              0 kB
Dirty:                 4 kB
Writeback:             0 kB
AnonPages:       4801604 kB
Mapped:           916436 kB
Shmem:            122676 kB
KReclaimable:     144676 kB
Slab:             362636 kB
SReclaimable:     144676 kB
SUnreclaim:       217960 kB
KernelStack:       30576 kB
PageTables:        73212 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    111598008 kB
Committed_AS:   15729108 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      935428 kB
VmallocChunk:          0 kB
Percpu:            33280 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:      9872 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      604284 kB
DirectMap2M:    17211392 kB
DirectMap1G:    183500800 kB

Storage:
Smart Array P410 in Slot 1
Smart Array P420 in Slot 2

The controller in slot 1 is a pseudo-backup device, it has a single large logical volume with a single XFS filesystem.  It gets mounted to copy archive files, then is unmounted again for safety.


Smart Array P420 in Slot 2
   Bus Interface: PCI
   Slot: 2
   RAID 6 (ADG) Status: Enabled
   Controller Status: OK
   Hardware Revision: B
   Firmware Version: 8.32-0
   Rebuild Priority: High
   Expand Priority: Medium
   Surface Scan Delay: 3 secs
   Surface Scan Mode: Idle
   Parallel Surface Scan Supported: No
   Queue Depth: Automatic
   Monitor and Performance Delay: 60  min
   Elevator Sort: Enabled
   Degraded Performance Optimization: Disabled
   Inconsistency Repair Policy: Disabled
   Wait for Cache Room: Disabled
   Surface Analysis Inconsistency Notification: Disabled
   Post Prompt Timeout: 15 secs
   Cache Board Present: True
   Cache Status: OK
   Cache Ratio: 50% Read / 50% Write
   Drive Write Cache: Disabled
   Total Cache Size: 1024 MB
   Total Cache Memory Available: 816 MB
   No-Battery Write Cache: Disabled
   SSD Caching RAID5 WriteBack Enabled: False
   SSD Caching Version: 1
   Cache Backup Power Source: Capacitors
   Battery/Capacitor Count: 1
   Battery/Capacitor Status: OK
   SATA NCQ Supported: True
   Spare Activation Mode: Activate on physical drive failure (default)
   Controller Temperature (C): 74
   Cache Module Temperature (C): 42
   Capacitor Temperature  (C): 28
   Number of Ports: 2 Internal only
   Driver Name: hpsa
   Driver Version: 3.4.20
   Driver Supports SSD Smart Path: True
   PCI Address (Domain:Bus:Device.Function): 0000:11:00.0
   Sanitize Erase Supported: False
   Primary Boot Volume: None
   Secondary Boot Volume: None

Physical drive info:

Smart Array P420 in Slot 2
   Array A
      physicaldrive 1I:2:1 (port 1I:box 2:bay 1, SAS HDD, 300 GB, OK)
      physicaldrive 1I:2:2 (port 1I:box 2:bay 2, SAS HDD, 300 GB, OK)
      physicaldrive 1I:2:3 (port 1I:box 2:bay 3, SAS HDD, 300 GB, OK)
      physicaldrive 1I:2:4 (port 1I:box 2:bay 4, SAS HDD, 300 GB, OK)
      physicaldrive 2I:2:5 (port 2I:box 2:bay 5, SAS HDD, 300 GB, OK)
      physicaldrive 2I:2:6 (port 2I:box 2:bay 6, SAS HDD, 300 GB, OK)
      physicaldrive 2I:2:7 (port 2I:box 2:bay 7, SAS HDD, 300 GB, OK)
      physicaldrive 2I:2:8 (port 2I:box 2:bay 8, SAS HDD, 300 GB, OK)

Logical volume info:
Smart Array P420 in Slot 2
   Array A
      logicaldrive 1 (597 MB, RAID 5, OK)       [ this is /boot ]
      logicaldrive 2 (4.0 GB, RAID 5, OK)       [ this is swap  ]
      logicaldrive 3 (100.0 GB, RAID 5, OK)     [ this is /     ]
      logicaldrive 4 (100.0 GB, RAID 5, OK)     [ this is /home, this one fell over ]
      logicaldrive 5 (1.7 TB, RAID 5, OK)       [ this is /mnt/data  ]


Logical drive 4 was the volume which failed previously (which is mounted on  /home).
Smart Array P420 in Slot 2
   Array A
      Logical Drive: 4
         Size: 100.0 GB
         Fault Tolerance: 5
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 25700
         Strip Size: 256 KB
         Full Stripe Size: 1792 KB
         Status: OK
         Caching:  Enabled
         Parity Initialization Status: Initialization Completed
         Unique Identifier: 600508B1001C21B685B97DA3A8E264FF
         Disk Name: /dev/sdd
         Mount Points: /home 100.0 GB Partition Number 1
         OS Status: LOCKED
         Logical Drive Label: home
         Drive Type: Data
         LD Acceleration Method: Controller Cache


xfs_info /home
meta-data=/dev/sdd1              isize=512    agcount=16, agsize=1638272 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0
         =                       reflink=0    bigtime=0 inobtcount=0 nrext64=0
data     =                       bsize=4096   blocks=26212352, imaxpct=25
         =                       sunit=64     swidth=448 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=12800, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


Mount options:
#
# /etc/fstab
# Created by anaconda on Wed Dec 21 18:20:55 2016
# edited by chris on Fri Dec 23 2016
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=f89dff8e-749a-4bfa-a42a-0144acc00820  /                 xfs     defaults        0 1
UUID=b0efc850-2966-44af-90a5-7de8bf605093  /boot             ext4    defaults        1 2
UUID=60cf5755-8094-46d2-a297-c0b4c1ce03d9  /home             xfs     defaults        0 2
UUID=7213c283-cdc2-421d-a840-f28eb8716c7e  swap              swap    defaults        0 0
UUID=f3add08f-39f8-4727-a2d5-25f3a1c9c300  /mnt/data         xfs     defaults        0 2

# Extra array of mid-line disks
UUID=4885b76e-fb84-43e6-bd27-557a1a9462ea  /mnt/bigdata      xfs     defaults,noauto 0 2

Comment 23 Justin M. Forbes 2023-05-25 23:02:49 UTC
(In reply to Chris Caudle from comment #22)
> > The default filesystem for fedora workstation and cloud spins is btrfs
> 
> Yeah, I first setup this machine back before btrfs was considered stable
> enough to use in production.  Red Hat had decided to make XFS the default
> for RHEL, and I had been using XFS on some data storage volumes in the past
> (i.e. not / and not /home), so I decided to setup my machine with XFS like
> RHEL was using as default.


This was not meant as "you shouldn't be using xfs" - I use it here, and only have btrfs on some test VMs. It was more meant to point out that, of the limited number of users willing to test updates-testing, a much smaller number would be running xfs at all.  I am rather worried about how widespread this issue would be if we pushed 6.3 to stable updates right now.

Comment 24 Rich 2023-05-26 06:10:34 UTC
Not a Red Hat user but I have the same issue and thought it might be helpful for you to have some details. It's a Debian system (currently on bookworm) using XanMod. On the XanMod 6.3.3 kernel I get the error reliably; on 6.3.1 I do not. I'm afraid I'm not a kernel expert. Posting this here as it seems to be a fairly different hardware configuration to others, but very similar issue in other respects. I'm in no way seeking support - purely in case it helps with your own diagnostics and apologies if it muddies the water.

Some info pulled from syslog before I switched back to 6.3.1-x64v2-xanmod1:

2023-05-19T09:51:36.149820+01:00 localhost kernel: [595667.735185] XFS (md1): Corruption detected. Unmount and run xfs_repair
2023-05-19T09:51:36.179608+01:00 localhost kernel: [595667.737796] md1: writeback error on inode 8651095188, offset 0, sector 5920535552
2023-05-19T09:51:36.179615+01:00 localhost kernel: [595667.765370] XFS (md1): Corruption of in-memory data (0x8) detected at xfs_defer_finish_noroll+0x23b/0x480 [xfs] (fs/xfs/libxfs/xfs_defer.c:573).  Shutting down filesystem.
2023-05-19T09:51:36.179616+01:00 localhost kernel: [595667.765390] XFS (md1): Please unmount the filesystem and rectify the problem(s)


2023-05-19T10:26:30.649329+01:00 localhost kernel: [    6.535940] XFS (md1): Metadata corruption detected at xfs_buf_ioend+0x4f/0x380 [xfs], xfs_inode block 0x160e42c80 xfs_inode_buf_verify
2023-05-19T10:26:30.649331+01:00 localhost kernel: [    6.535961] XFS (md1): Unmount and run xfs_repair


Also
2023-05-21T15:55:01.742635+01:00 localhost kernel: [192412.030060] XFS (md1): Internal error bno + len > gtbno at line 1908 of file fs/xfs/libxfs/xfs_alloc.c.  Caller xfs_free_ag_extent.constprop.0+0x114/0x6f0 [xfs]
2023-05-21T15:55:01.758627+01:00 localhost kernel: [192412.045969] XFS (md1): Corruption of in-memory data (0x8) detected at xfs_defer_finish_noroll+0x23b/0x480 [xfs] (fs/xfs/libxfs/xfs_defer.c:573).  Shutting down filesystem.


Specification:
AMD Ryzen 5 5600X
4*12TB WD Gold RAID10 (mdraid, far=2 layout, 512k chunk)
XFS filesystem was created on the mdraid device April 2020
No LVM in use
No hardware RAID
smartctl happy for all physical devices in the system
xfs_repair -L required


mdadm -Q /dev/md1
/dev/md1: 21.83TiB raid10 4 devices, 0 spares. Use mdadm --detail for more detail.


xfs_info /home
meta-data=/dev/md1              isize=512    agcount=32, agsize=183105408 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0    bigtime=0 inobtcount=0 nrext64=0
data     =                       bsize=4096   blocks=5859373056, imaxpct=5
         =                       sunit=128    swidth=512 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


Mount options:
/dev/md1 /home xfs defaults 0 0

Comment 25 Rune Kleveland 2023-05-26 06:59:28 UTC
I could reproduce on 6.3.4 kernel. The 6.4 kernel still runs fine on the same type of hardware.

Comment 26 Justin M. Forbes 2023-05-26 13:42:38 UTC
So this came in with a patch backported to stable in either 6.3.2 or 6.3.3.  The only XFS-specific patch there is xfs-don-t-consider-future-format-versions-valid, but there are about 1300 patches total in those updates.

Comment 27 Justin M. Forbes 2023-05-26 13:46:41 UTC
Rune, can you give 6.3.2 a try? That would narrow it down a good bit more.  There is a build at https://koji.fedoraproject.org/koji/buildinfo?buildID=2199126
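
If it is easier than clicking through the web UI, the koji client can pull the rpms directly; something like the following, with the exact NVR taken from that build page:

  koji download-build --arch=x86_64 kernel-6.3.2-<release>.fc38
  sudo dnf install ./kernel-*.rpm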

Comment 28 Eric Sandeen 2023-05-26 15:27:56 UTC
(In reply to Justin M. Forbes from comment #27)
> Rune, can you give 6.3.2 a try? That would narrow it down a good bit more. 
> There is a build at
> https://koji.fedoraproject.org/koji/buildinfo?buildID=2199126

I think that's a good idea. (Was wondering about testing 6.3.0 to eliminate all -stable backport variables?)

Comment 29 Justin M. Forbes 2023-05-26 15:48:28 UTC
(In reply to Eric Sandeen from comment #28)
> (In reply to Justin M. Forbes from comment #27)
> > Rune, can you give 6.3.2 a try? That would narrow it down a good bit more. 
> > There is a build at
> > https://koji.fedoraproject.org/koji/buildinfo?buildID=2199126
> 
> I think that's a good idea. (Was wondering about testing 6.3.0 to eliminate
> all -stable backport variables?)

We have at least one verification that 6.3.1 is good.  We know 6.3.3 is bad. I built 6.3.2 for kernel test week (where no one reported this, but it could just be that no one impacted tested with that kernel).  A proper test from someone who can reliably reproduce would tell us whether this came in with 6.3.2 (694 patches) or 6.3.3 (245 patches).  Also of note, if 6.2.15 did not see this, that was the release at the same time as 6.3.2.  It too had over 600 patches, most of them the same ones. 6.2.16 was the 6.3.3 equivalent, which I built but never pushed as an update, so we have no data on it.

Comment 30 Gary Buhrmaster 2023-05-26 15:59:05 UTC
(In reply to Justin M. Forbes from comment #29)

> We have at least one verification that 6.3.1 is good.

And the report in comment #20 failed with 6.3.1.

Given that no one has a trivial reproducer (although Rune seems to be far in the lead), I would be reluctant to believe that a lack of failures demonstrates a good kernel level.

Comment 31 Justin M. Forbes 2023-05-26 16:10:09 UTC
We do also have 6.3.1 kernels:
https://koji.fedoraproject.org/koji/buildinfo?buildID=2196477

and 6.3.0 kernels (built against rawhide, but should run fine for testing as long as you don't need an out of tree module)
https://koji.fedoraproject.org/koji/buildinfo?buildID=2191871

Should anyone need the link to it for testing, this was 6.2.16:
https://koji.fedoraproject.org/koji/buildinfo?buildID=2202061

Comment 32 Eric Sandeen 2023-05-26 17:13:35 UTC
I might have a reproducer here.

I took Rune's metadump, xfs_mdrestore'd it to an image, and used that in a loopback mount as the TEST_DEV in xfstests.

Running generic/013 (which runs fsstress on TEST_DEV) seems to reliably re-corrupt the image under the 6.3.3 kernel and succeeds under the 6.2.15 kernel.
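
Roughly, the setup is (paths and devices here are illustrative, not the exact ones I used):

  xfs_mdrestore rune.metadump test.img
  xfs_repair -f test.img               # one repair pass, as was done on the real systems
  losetup -f --show test.img           # say it comes back as /dev/loop0
  # xfstests local.config: TEST_DEV=/dev/loop0, TEST_DIR=/mnt/test
  ./check generic/013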

(It's remotely possible that there is latent corruption in the image even after the xfs_repair, and only 6.3 is catching it, but that seems unlikely.)

I'll keep you posted...

Comment 33 Rune Kleveland 2023-05-26 17:58:42 UTC
I can confirm that 6.2.2 also corrupts the file systems. It happened just after reboot; I couldn't log in and run a command before it died.

Comment 34 Rune Kleveland 2023-05-26 18:00:01 UTC
Sorry, typo, 6.3.2.

Comment 35 Christian Labisch 2023-05-26 18:15:08 UTC
@Justin : I just stumbled upon this bug report and want to share my experience. I have been running Fedora Linux 38 Workstation with kernel 6.3 on an XFS file system since you provided the first point release.
My partition setup -> 1. partition /boot/efi - format FAT32 | 2. partition /boot - format XFS | 3. partition (LVM) / - format XFS ... All 6.3 kernel versions worked as expected without reported issues.
Today I installed kernel 6.3.4 and nothing changed, everything works as expected. Below you can see that no metadata issue gets reported. Maybe the problem occurs only on RAID setups? Hope this helps.
 
journalctl | grep metadata
May 26 11:00:34 ******** systemd[2174]: Starting gvfs-metadata.service - Virtual filesystem metadata service...
May 26 11:00:34 ******** systemd[2174]: Started gvfs-metadata.service - Virtual filesystem metadata service.
May 26 11:08:52 ******** systemd[2174]: Starting tracker-extract-3.service - Tracker metadata extractor...
May 26 11:08:52 ******** systemd[2174]: Started tracker-extract-3.service - Tracker metadata extractor.

journalctl | grep xfs
May 26 11:00:21 ******** systemd-fsck[651]: /usr/sbin/fsck.xfs: XFS file system.
May 26 11:00:23 ******** kernel: xfs filesystem being remounted at / supports timestamps until 2038 (0x7fffffff)
May 26 11:00:23 ******** kernel: xfs filesystem being mounted at /boot supports timestamps until 2038 (0x7fffffff)

Comment 36 Joseph C. Sible 2023-05-26 19:29:53 UTC
(In reply to Christian Labisch from comment #35)
> Maybe the problem occurs only on RAID setups ?

No, I encountered this problem despite not using any form of RAID.

Comment 37 Eric Sandeen 2023-05-27 02:18:49 UTC
I landed on 74c36a8689d ("xfs: use xfs_alloc_vextent_this_ag() where appropriate") as the first bad commit between 6.2 and 6.3 which triggers the reproducer for me.

Haven't looked for problems in the commit yet; 6.4-rc seems OK again, so I'm going to see if I can bisect to what fixed it and see whether the fix was intentional...

Comment 38 Gary Buhrmaster 2023-05-27 02:38:12 UTC
(In reply to Eric Sandeen from comment #37)
> I landed on 74c36a8689d ("xfs: use xfs_alloc_vextent_this_ag() where
> appropriate") as the first bad commit between 6.2 and 6.3 which triggers the
> reproducer for me.
> 
> Haven't looked for problems in the commit yet; 6.4-rc seems ok again, I'm
> going to see if I can bisect to what fixed it, and see whether it was
> intentional...

That is great detective work.  Thanks.

I now see the light at the end of the tunnel (and I don't think it is another filesystem corrupting train).

Comment 39 Dave Chinner 2023-05-27 02:43:33 UTC
Please backport this commit to a test 6.3.3 kernel build and see if that fixes the problem:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/xfs/libxfs/xfs_bmap.c?id=9419092fb2630c30e4ffeb9ef61007ef0c61827a

This is a bug fix that we thought just fixed a livelock on stripe aligned filesystems.

I'm guessing that in certain circumstances instead of livelocking on repeated failed allocations, it results in a broken mapping being returned to the writeback code and hence misdirecting the writeback IO.

-Dave.
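
For a local test build, one way to apply that commit on top of a 6.3.3 tree is sketched below (assumptions: a linux-stable checkout, and that a mainline ref containing the fix is fetched first; the cherry-pick may need minor context fix-ups on 6.3):

  git checkout v6.3.3
  git fetch https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master   # brings in the mainline objects
  git cherry-pick 9419092fb2630c30e4ffeb9ef61007ef0c61827a                              # the fix referenced above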

Comment 40 Eric Sandeen 2023-05-27 02:45:08 UTC
Thanks Dave, I'll try that right now.

Comment 41 Gary Buhrmaster 2023-05-27 02:46:47 UTC
(In reply to Joseph C. Sible from comment #36)
> (In reply to Christian Labisch from comment #35)
> > Maybe the problem occurs only on RAID setups ?
> 
> No, I encountered this problem despite not using any form of RAID.

Out of curiosity, are you using stripes?  (sunit != 0 and/or swidth != 0)?  That might suggest a common case.
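
For anyone checking their own filesystems, the stripe geometry shows up in the data section of xfs_info output; a quick check might look like this (the mount point is only an example):

  xfs_info /home | grep -E 'sunit|swidth'
  # a non-striped filesystem reports sunit=0 and swidth=0 blks in the data section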

Comment 42 Eric Sandeen 2023-05-27 03:12:22 UTC
(In reply to Dave Chinner from comment #39)
> Please backport this commit to a test 6.3.3 kernel build and see if that
> fixes the problem:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/
> xfs/libxfs/xfs_bmap.c?id=9419092fb2630c30e4ffeb9ef61007ef0c61827a
> 
> This is a bug fix that we thought just fixed a livelock on stripe aligned
> filesystems.
> 
> I'm guessing that in certain circumstances instead of livelocking on
> repeated failed allocations, it results in a broken mapping being returned
> to the writeback code and hence misdirecting the writeback IO.
> 
> -Dave.

Looks like that fixes it, Dave, thank you. @jforbes if you want to build a test 6.3.x kernel with that commit ...?

Comment 43 Justin M. Forbes 2023-05-27 16:32:47 UTC
https://koji.fedoraproject.org/koji/buildinfo?buildID=2206484 has that commit. Please test and let me know; it is an official build, so I will push it to updates-testing once I have confirmation.

Comment 44 Eric Sandeen 2023-05-27 19:27:10 UTC
Thanks, Justin! This passes the reproducer I had; I'll loop it for a while and then run through the full xfstests a few times.

Comment 45 Rune Kleveland 2023-05-27 20:24:48 UTC
Build 2206484 has been stable for 90 minutes on the same type of hardware on which all the other 6.3 kernels crashed within a couple of minutes after boot. So this seems to fix the issue for me.

Comment 46 Fedora Update System 2023-05-28 01:30:01 UTC
FEDORA-2023-2f35633034 has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2023-2f35633034

Comment 47 Fedora Update System 2023-05-28 01:31:38 UTC
FEDORA-2023-514965dd8a has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2023-514965dd8a

Comment 48 Joseph C. Sible 2023-05-28 17:01:56 UTC
(In reply to Gary Buhrmaster from comment #41)
> (In reply to Joseph C. Sible from comment #36)
> > (In reply to Christian Labisch from comment #35)
> > > Maybe the problem occurs only on RAID setups ?
> > 
> > No, I encountered this problem despite not using any form of RAID.
> 
> Out of curiosity, are you using stripes?  (sunit != 0 and/or swidth != 0)? 
> That might suggest a common case.

Nope, those are both 0 on the filesystem where I encountered this.

Comment 49 Fedora Update System 2023-05-29 02:08:44 UTC
FEDORA-2023-514965dd8a has been pushed to the Fedora 38 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-514965dd8a`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-514965dd8a

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 50 Fedora Update System 2023-05-29 02:12:53 UTC
FEDORA-2023-2f35633034 has been pushed to the Fedora 37 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-2f35633034`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-2f35633034

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 51 Fedora Update System 2023-05-31 01:44:55 UTC
FEDORA-2023-2f35633034 has been pushed to the Fedora 37 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 52 Chris Caudle 2023-05-31 02:14:13 UTC
No repro after only about an hour running kernel-6.3.4-201.fc38.x86_64.
I will keep it running for the next few days; hopefully that fully fixed the problem.

Comment 53 Fedora Update System 2023-05-31 17:29:56 UTC
FEDORA-2023-514965dd8a has been pushed to the Fedora 38 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 54 Eric Sandeen 2023-06-01 17:15:12 UTC
(In reply to Joseph C. Sible from comment #48)
> (In reply to Gary Buhrmaster from comment #41)
> > (In reply to Joseph C. Sible from comment #36)
> > > (In reply to Christian Labisch from comment #35)
> > > > Maybe the problem occurs only on RAID setups ?
> > > 
> > > No, I encountered this problem despite not using any form of RAID.
> > 
> > Out of curiosity, are you using stripes?  (sunit != 0 and/or swidth != 0)? 
> > That might suggest a common case.
> 
> Nope, those are both 0 on the filesystem where I encountered this.

I don't see any details of the problem you encountered; it's important not to lump different problems together. Have you reported details of this anywhere?

Thanks,
-Eric

Comment 55 Joseph C. Sible 2023-06-02 19:19:18 UTC
(In reply to Eric Sandeen from comment #54)
> I don't see any details of the problem you encountered; it's important not
> to lump different problems together. Have you reported details of this
> anywhere?

I have not. Unfortunately, I didn't save enough information from when I encountered this to make a report. It was the first time I had ever encountered XFS metadata corruption, and it happened on a 6.3.x kernel the very same day this bug was mentioned on news sites like Phoronix, so I just assumed it was the same issue affecting me. I'm on 6.3.5 now, so if it does recur, I'll save enough information for a report next time.

