Bug 1857996 - BTRFS - FS corruption when using rpm-ostree
Summary: BTRFS - FS corruption when using rpm-ostree
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 32
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: fedora-kernel-btrfs
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-07-16 20:08 UTC by Stefano Figura
Modified: 2020-07-20 19:23 UTC (History)
CC List: 21 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-20 19:23:24 UTC
Type: Bug
Embargoed:


Attachments
latest dmesg (121.01 KB, text/plain)
2020-07-16 20:08 UTC, Stefano Figura
dmesg snippet when system becomes RO (4.15 KB, text/plain)
2020-07-16 20:11 UTC, Stefano Figura
journal when system becomes RO (19.48 KB, text/plain)
2020-07-16 20:11 UTC, Stefano Figura
btrfs check --repair weirdness (4.89 MB, image/jpeg)
2020-07-16 21:08 UTC, Stefano Figura
memtest fail (219.18 KB, image/jpeg)
2020-07-17 19:15 UTC, Stefano Figura

Description Stefano Figura 2020-07-16 20:08:49 UTC
Created attachment 1701463 [details]
latest dmesg

1. Please describe the problem:

When using the `rpm-ostree` command to install, uninstall, or clean up, the FS becomes RO.

2. What is the Version-Release number of the kernel:

5.7.8-200.fc32.x86_64


3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Kernel version unchanged AFAIK

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
 - rpm-ostree cleanup -b -m

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

N/A, as rpm-ostree operations would make the system RO

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Comment 1 Stefano Figura 2020-07-16 20:11:17 UTC
Created attachment 1701464 [details]
dmesg snippet when system becomes RO

Comment 2 Stefano Figura 2020-07-16 20:11:52 UTC
Created attachment 1701465 [details]
journal when system becomes RO

Comment 3 Josef Bacik 2020-07-16 20:32:30 UTC
Can you fsck the volume?  Run btrfs check <device> and capture the output.  Seems like something's gone kinda wrong.

Comment 4 Stefano Figura 2020-07-16 20:44:25 UTC
WARNING: filesystem mounted, continuing because of --force
Checking filesystem on /dev/mapper/luks-057fc359-f23a-44da-add9-cbe4dcb88e94
UUID: 625524dd-9148-4443-9803-e639b1fad368
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
	unresolved ref dir 1052986 index 6 namelen 38 name 3fdff3c8570e63c7c4db61f76c0d9ab6d0556 filetype 0 errors 3, no dir item, no dir index
	unresolved ref dir 1052986 index 6 namelen 38 name 3fdff3c8570e63c7c4db61f76c0d9ab6d05596 filetype 7 errors 4, no inode ref
ERROR: errors found in fs roots
found 71413518336 bytes used, error(s) found
total csum bytes: 60156704
total tree bytes: 856981504
total fs tree bytes: 752533504
total extent tree bytes: 34177024
btree space waste bytes: 155156570
file data blocks allocated: 84847075328
 referenced 78786220032

Comment 5 Stefano Figura 2020-07-16 21:08:46 UTC
Created attachment 1701470 [details]
btrfs check --repair weirdness

Comment 6 Josef Bacik 2020-07-16 21:22:19 UTC
Ok, I was wrong; apparently fsck doesn't repair reference mistakes.  If you can, please go back into rescue mode, mount a USB disk, and run the following:

btrfs-image -c 4 -t 4 <device that btrfs is on> /mnt/pnt/for/usb/disk/dump.img

And then upload it somewhere so I can get btrfs check to fix the problem.  This only pulls metadata, no data.  However I will be able to see filenames, so if you have sensitive filenames you can use the -s option, which will generate garbage for every filename.

If this isn't possible then no worries, I have enough information to hand craft a corrupt fs to test against.

Comment 7 Stefano Figura 2020-07-16 22:03:18 UTC
(In reply to Josef Bacik from comment #6)
> Ok I was wrong, apparently fsck doesn't repair reference mistakes.  If you
> can, will you go back into rescue mode and mount a usb disk and do the
> following
> 
> btrfs-image -c 4 -t 4 <device that btrfs is on>
> /mnt/pnt/for/usb/disk/dump.img
> 
> And then upload it somewhere so I can get btrfs check to fix the problem. 
> This only pulls metadata, no data.  However I will be able to see filenames,
> so if you have sensitive filenames you can use the -s option, which will
> generate garbage for every filename.
> 
> If this isn't possible then no worries, I have enough information to hand
> craft a corrupt fs to test against.

Please check DM in IRC.
Sent link to dump.img with scrambled filenames.

Comment 8 Josef Bacik 2020-07-17 15:16:15 UTC
Update with where we're at right now.

OSTree hardlinks to the original files, presumably with the same filename.  However, somehow one of the filenames got corrupted.  The command

btrfs inspect-internal inode-resolve 517456

spit out

/ostree/repo/objects/87/31770bf3ba31449494a6696688bf0752f2db6d46a216e32d030d739bfdf7a2.file
//ostree/deploy/fedora/deploy/025ef23a7912bcbb04576d58fead0851867090b708ca7d7f83cdc62846023f0f.0/usr/lib/.build-id/55/3fdff3c8570e63c7c4db61f76c0d9ab6d0556
//ostree/deploy/fedora/deploy/b728d4423a29b348eff87b9f173e8a09ee1974b25fc413a25fb6ab104f58be15.0/usr/lib/.build-id/55/3fdff3c8570e63c7c4db61f76c0d9ab6d05596
//ostree/deploy/fedora/deploy/889ca94abd67c1013be571901f17544eaf19031d781bc2f8138629ef66374590.0/usr/lib/.build-id/55/3fdff3c8570e63c7c4db61f76c0d9ab6d05596
//ostree/repo/extensions/rpmostree/private/commit/usr/lib/.build-id/55/3fdff3c8570e63c7c4db61f76c0d9ab6d05596

Here you notice we have '3fdff3c8570e63c7c4db61f76c0d9ab6d05596' and '3fdff3c8570e63c7c4db61f76c0d9ab6d0556'.  It appears the '9' got cut out of that one file, but dumping the contents of the leaf we see this

item 109 key (1052986 DIR_ITEM 3585598111) itemoff 12578 itemsize 68
        location key (517456 INODE_ITEM 0) type SYMLINK
        transid 12549 data_len 0 name_len 38
        name: 3fdff3c8570e63c7c4db61f76c0d9ab6d055^Y6

That '^Y' is hex 0x19, which is an unprintable ASCII character.  '9' is 0x39.  So we have

'^Y' => 0x19 => 0001 1001
'9'  => 0x39 => 0011 1001
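
As a quick check of the two bit patterns above, XOR-ing the bytes shows how many bits differ. This is a minimal sketch; the variable names are illustrative only.

```python
# Compare the byte found on disk with the byte that was expected.
corrupt = 0x19       # '^Y', the byte found in the on-disk name
expected = ord("9")  # 0x39, the byte in the correct name

diff = corrupt ^ expected             # bits that differ between the two bytes
flipped_bits = bin(diff).count("1")   # how many bits differ

print(f"xor = {diff:#04x}, bits flipped = {flipped_bits}")
# prints "xor = 0x20, bits flipped = 1"
```

Exactly one bit (0x20) separates the two bytes.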

So there's a bitflip in the name.  This is suspicious, but the item is

1052986 DIR_ITEM 3585598111

The 3585598111 is the hash of the name, which matches the corrupted name.  This means we were handed the corrupted name.  The way Btrfs works when adding a link is

->insert inode ref - this inserts an item that is [$INODE_NUMBER INODE_REF $INODE_NUMBER_OF_THE_DIRECTORY_CONTAINING_THE_REF], and then in here it copies the name that we were given.
->insert a DIR_ITEM - this is what is shown above, is in the form of [$DIR_INODE_NUMBER DIR_ITEM $HASH_OF_NAME], the name is copied into this item
->insert a DIR_INDEX - this is for readdir, just [$DIR_INODE_NUMBER DIR_INDEX $INCREASING_KEY], the name is copied into this item
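
The three inserts above can be sketched as a small Python model. This is an illustration under stated assumptions, not the kernel's implementation: `link_items` and `raw_crc32c` are hypothetical helper names, and `name_hash` assumes the DIR_ITEM hash is a CRC-32C of the name seeded with `(u32)~1`, which is how I understand the kernel's `btrfs_name_hash()` to work. The inode numbers and index are taken from this report.

```python
def raw_crc32c(crc, data):
    """Bitwise CRC-32C (reflected polynomial 0x82F63B78), no pre/post inversion."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc & 0xFFFFFFFF


def name_hash(name):
    # Assumption: mirrors btrfs_name_hash(), i.e. crc32c seeded with (u32)~1.
    return raw_crc32c(0xFFFFFFFE, name)


def link_items(inode, dir_inode, dir_index, name):
    """Return the three (key, name) pairs created for one link; each holds a copy of the name."""
    return [
        ((inode, "INODE_REF", dir_inode), name),
        ((dir_inode, "DIR_ITEM", name_hash(name)), name),
        ((dir_inode, "DIR_INDEX", dir_index), name),
    ]


good = b"3fdff3c8570e63c7c4db61f76c0d9ab6d05596"
bad = b"3fdff3c8570e63c7c4db61f76c0d9ab6d055\x196"  # 0x19 where '9' should be

for key, name in link_items(517456, 1052986, 6, bad):
    print(key, name)
print("hashes differ:", name_hash(good) != name_hash(bad))
```

Because the DIR_ITEM key is derived from whatever name is passed in, a DIR_ITEM whose key hash matches the corrupted name suggests the name was already corrupted when the item was inserted.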

Now I made a mistake here telling the user to run --repair, because it likely made changes to the fs so now I'm not entirely sure what it looked like before, but I can guess from previous clues.  What we know is

1) The DIR_ITEM and DIR_INDEX have the same corrupt name.
2) We were expecting the correct name to be found.  The original error was

[  241.670548] BTRFS info (device dm-0): failed to delete reference to 3fdff3c8570e63c7c4db61f76c0d9ab6d05596, inode 517456 parent 1052986

Which means we were trying to remove the reference to the correct name '3fdff3c8570e63c7c4db61f76c0d9ab6d05596' from the directory that clearly has a corrupt name inside of it.  We get this name from the dentry->d_name (same as for the original link), so at unlink time we had the correct name in the dentry.  We couldn't find the INODE_REF item, but the code here does this

        di = btrfs_lookup_dir_item(trans, root, path, dir_ino,
                                    name, name_len, -1);
        if (IS_ERR_OR_NULL(di)) {
                ret = di ? PTR_ERR(di) : -ENOENT;
                goto err;
        }
        ret = btrfs_delete_one_dir_name(trans, root, path, di);
        if (ret)
                goto err;
        btrfs_release_path(path);

<snip>
        ret = btrfs_del_inode_ref(trans, root, name, name_len, ino,
                                  dir_ino, &index);
        if (ret) {
                btrfs_info(fs_info,
                        "failed to delete reference to %.*s, inode %llu parent %llu",
                        name_len, name, ino, dir_ino);
                btrfs_abort_transaction(trans, ret);
                goto err;
        }

So clearly we found the DIR_ITEM we were looking for, which is strange: we can see that the DIR_ITEM on the disk has the wrong name, so we should have gotten an ENOENT from btrfs_lookup_dir_item().
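
To illustrate why ENOENT was the expected result, here is a toy Python model of a directory-item lookup. It is a sketch under assumptions, not the kernel's logic: `lookup_dir_item` and `stand_in_hash` are hypothetical names, and `zlib.crc32` is only a stand-in for the real name hash.

```python
import errno
import zlib


def stand_in_hash(name):
    # Stand-in only: btrfs uses its own name hash, not zlib's CRC-32.
    return zlib.crc32(name)


def lookup_dir_item(tree, dir_ino, name):
    """Look up a DIR_ITEM by the hash of `name`, then confirm the stored name matches."""
    item = tree.get((dir_ino, "DIR_ITEM", stand_in_hash(name)))
    if item is None or item["name"] != name:
        return -errno.ENOENT
    return item


good = b"3fdff3c8570e63c7c4db61f76c0d9ab6d05596"
bad = b"3fdff3c8570e63c7c4db61f76c0d9ab6d055\x196"

# The directory as it appears on disk: only the corrupt name is present.
tree = {(1052986, "DIR_ITEM", stand_in_hash(bad)): {"name": bad, "inode": 517456}}

print(lookup_dir_item(tree, 1052986, good))  # lookup with the correct name
print(lookup_dir_item(tree, 1052986, bad))   # lookup with the corrupt name
```

In this model the lookup with the correct name fails and only the corrupt name resolves, which is why a successful lookup at unlink time does not fit the on-disk state.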

Unfortunately right now I'm out of ideas.  It could be bad memory, but it could also just be a memory corruption bug somewhere inside the kernel that just happened to nail us in this weird way.

Obviously we are still digging in, and I've advised a memtest to be run while we're investigating.  This is just a dump of the status of the investigation so far.

Comment 9 Stefano Figura 2020-07-17 19:15:17 UTC
Created attachment 1701579 [details]
memtest fail

Please see attached memory test failure.
Happy BTRFS was not at fault. Looking forward to getting better fault tolerance in BTRFS.

Comment 10 Michel Lind 2020-07-17 21:53:27 UTC
Confirmed -- I installed Silverblue 32 on a VM and exercised rpm-ostree by switching to Rawhide. No issue (apart from the known problem of having to add rootflags=subvol=root at each boot).

Comment 11 Josef Bacik 2020-07-20 19:23:24 UTC
The initial btrfs check --repair did something wonky with the original inode, so that will warrant further investigation to fix that.  We had dangling references to the inode, and we were able to use btrfs inspect-internal inode-resolve to determine that it was just some build artifact that wasn't needed.  I rigged up code on this branch

https://github.com/josefbacik/btrfs-progs/tree/for-returntrip

to go through and rip out the dangling references, and he was able to replace his bad memory and carry on.

The TODOs from this bugzilla are

1) Add tests for this specific failure case.
2) Figure out why btrfs check --repair couldn't fix this initially and fix that part. (It should have just linked in an empty inode)
3) Instead of rebuilding the inode in some cases we want to just remove it, so we need to provide this as an option to the user.

I'm going to close this BZ out as the original issue has been diagnosed and fixed, and I'm tracking the followup tasks internally.

