Created attachment 1983737 [details]
picture of the error screen

Description of problem:
After a power failure I get this error:

BTRFS error: Device nvme0n1p6: state ....error=-n17 Object already exists (Failed to recover log tree)

How can I fix the file system without losing data?

Additional info:
See attached picture.
$ sudo btrfsck /dev/nvme0n1p6
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p6
UUID: 8476540f-ac0e-41a1-9ef9-7f833de63382
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
root 257 inode 1686811 errors 200, dir isize wrong
root 257 inode 2722042 errors 1, no inode item
    unresolved ref dir 1686811 index 418215 namelen 15 name imjournal.state filetype 1 errors 5, no dir item, no inode ref
root 257 inode 2722043 errors 1, no inode item
    unresolved ref dir 1686811 index 418217 namelen 15 name imjournal.state filetype 1 errors 5, no dir item, no inode ref
ERROR: errors found in fs roots
found 984442568704 bytes used, error(s) found
total csum bytes: 826128480
total tree bytes: 3355852800
total fs tree bytes: 2182397952
total extent tree bytes: 187236352
btree space waste bytes: 631685011
file data blocks allocated: 4400592211968
 referenced 1015906455552
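For anyone reading a report like the one above, a small script can tally which inodes and directory entries the check complained about. The following is only a sketch: the sample lines are copied from the output above, while the function name `summarize`, the two regular expressions, and the assumption that file names contain no spaces are illustrative choices, not part of btrfs-progs.

```python
import re

# Sample lines copied from the read-only check output in this report.
SAMPLE = """\
root 257 inode 1686811 errors 200, dir isize wrong
root 257 inode 2722042 errors 1, no inode item
    unresolved ref dir 1686811 index 418215 namelen 15 name imjournal.state filetype 1 errors 5, no dir item, no inode ref
root 257 inode 2722043 errors 1, no inode item
    unresolved ref dir 1686811 index 418217 namelen 15 name imjournal.state filetype 1 errors 5, no dir item, no inode ref
"""

# Hypothetical patterns matching the two line shapes seen above.
# Assumption: the file name contains no whitespace.
INODE_RE = re.compile(r"^root (\d+) inode (\d+) errors (\w+), (.+)$")
REF_RE = re.compile(
    r"^unresolved ref dir (\d+) index (\d+) namelen \d+ "
    r"name (\S+) filetype \d+ errors (\w+), (.+)$"
)


def summarize(text):
    """Return (inode_errors, unresolved_refs) parsed from check output."""
    inode_errors = []
    unresolved_refs = []
    for raw in text.splitlines():
        line = raw.strip()
        match = INODE_RE.match(line)
        if match:
            inode_errors.append({
                "root": int(match[1]),
                "inode": int(match[2]),
                "code": match[3],  # kept as printed, not decoded
                "problems": match[4].split(", "),
            })
            continue
        match = REF_RE.match(line)
        if match:
            unresolved_refs.append({
                "dir": int(match[1]),
                "index": int(match[2]),
                "name": match[3],
                "code": match[4],
                "problems": match[5].split(", "),
            })
    return inode_errors, unresolved_refs


inodes, refs = summarize(SAMPLE)
for ref in refs:
    print(ref["dir"], ref["index"], ref["name"], ref["problems"])
```

On this sample, every unresolved reference points at the same directory inode (1686811) and the same file name, which suggests the damage is confined to a couple of directory entries rather than spread across the tree.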
$ sudo btrfsck --repair /dev/nvme0n1p6
enabling repair mode
WARNING:

    Do not use --repair unless you are advised to do so by a developer
    or an experienced user, and then only after having accepted that no
    fsck can successfully repair all types of filesystem corruption. E.g.
    some software or hardware bugs can fatally damage a volume.
    The operation will start in 10 seconds.
    Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting repair.
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p6
UUID: 8476540f-ac0e-41a1-9ef9-7f833de63382
repair mode will force to clear out log tree, are you sure? [y/N]: n

*** Please advise as to what is the next step.
Because I cannot use my computer or access my user data, I've increased the severity of this ticket.
Switching to the right component...
Actually switch it to the kernel btrfs, since this is a kernel-space thing.
To get your machine back, answer yes when it asks whether it's OK to clear the log. At most you'll lose the last 30 seconds' worth of changes to the disk.

What kernel was this on? We had a bug in this area, and the fix was sent back to stable; it should have made it to all the relevant Fedora kernels a while ago.
OK, thank you. Here are the results so far (after this post I'll reboot and check that it is really back to life, then I'll report some more):

$ sudo btrfsck --repair /dev/nvme0n1p6
enabling repair mode
WARNING:

    Do not use --repair unless you are advised to do so by a developer
    or an experienced user, and then only after having accepted that no
    fsck can successfully repair all types of filesystem corruption. E.g.
    some software or hardware bugs can fatally damage a volume.
    The operation will start in 10 seconds.
    Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting repair.
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p6
UUID: 8476540f-ac0e-41a1-9ef9-7f833de63382
repair mode will force to clear out log tree, are you sure? [y/N]: y
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
super bytes used 984442552320 mismatches actual used 984442568704
No device size related problem found
[3/7] checking free space cache
cache and super generation don't match, space cache will be invalidated
[4/7] checking fs roots
Deleting bad dir index [1686811,96,418215] root 257
Deleting bad dir index [1686811,96,418217] root 257
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 1968885121024 bytes used, no error found
total csum bytes: 1652256960
total tree bytes: 6711689216
total fs tree bytes: 4364795904
total extent tree bytes: 374456320
btree space waste bytes: 1263353830
file data blocks allocated: 8801184423936
 referenced 2031812911104

Then I checked the file system once more:

$ sudo btrfsck /dev/nvme0n1p6
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p6
UUID: 8476540f-ac0e-41a1-9ef9-7f833de63382
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
cache and super generation don't match, space cache will be invalidated
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 984442568704 bytes used, no error found
total csum bytes: 826128480
total tree bytes: 3355852800
total fs tree bytes: 2182397952
total extent tree bytes: 187219968
btree space waste bytes: 631683789
file data blocks allocated: 4400592211968
 referenced 1015906455552

All of this was done from the F38 Workstation live CD (updated on 1 August, as linked in the #fedora IRC channel).
OK, the computer is back. THANK YOU! Here's the kernel:

$ uname -a
Linux fedora 6.4.7-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jul 27 20:01:18 UTC 2023 x86_64 GNU/Linux

Probably nothing was lost, since it was early in the morning and I had done no real work by then.
Hmmm, maybe I need some time to adapt to the new reality, but is it possible that more than the last 30 seconds was lost? My local git repos seem to be in a rather old state. Tomorrow (Bucharest time) I'll compare them to upstream and let you know whether my current perception is correct.
Created attachment 1983930 [details] screenshot of bad colours in gnome terminal
Sorry about this, it seems that attaching a file discards the current comment :/ Here's the discarded comment:

"
Before anything else, I made a mistake: seeing that I was behind with updates, I ran a dnf upgrade immediately after recovering the filesystem. I'll attach the updated package list and one screenshot.

Then, the git repo looks good (just too many untracked files made me see red :)).

However, a number of applications have display problems: the font color is wrong in gnome terminal, and the application bar colors are wrong in, for example, the screenshot application, gnome terminal, firefox and chrome. Because I did that upgrade, I cannot tell whether this is caused by the upgrade, by the btrfs issue, or by something else. After sending this update I'll create a new user and check whether everything is OK there.
"

I'll come back with some more picture(s), and with a status report for the new user.
Created attachment 1983932 [details] pic1
Created attachment 1983933 [details] pic2
Created attachment 1983934 [details]
pic3

These three pictures show the problematic behaviour for the user who was logged in when the btrfs filesystem was hit by the power failure, and the correct behaviour for the user who was created after btrfs was fixed.
How can I check whether I've really lost only the last 30 seconds of changes and not the last three days?

I believe the problem here is bigger than '#SomeUser has lost his files'. For one thing, this may happen to anyone, and if a large enough number of users hit this problem, the pressure on the Fedora project may be much greater.

Then, can the messages from the btrfs tools be more user friendly and less scary? For example, is there any documentation where a user can learn what it means for the log tree to be 'cleared out'? If a user decides to wait for a '(btrfs?) developer or an experienced (btrfs?) user' to provide feedback, but such feedback never comes, what are the user's alternatives?

I understand that, despite Red Hat giving up on btrfs as a technology preview, btrfs has certain qualities that convinced the Fedora project to use it as the default filesystem. And it certainly served me well until I ran into the reported problem. It would be useful if this were accompanied not only by better tools but also by better (or more visible) documentation. Another example: when installing Fedora, provide an inline summary of why to choose btrfs and why to choose some other filesystem. Also provide some external links, usable mostly when installing from the live CD.

Besides reporting this problem and offering my two cents, what else can I do to improve this situation?
The messages are a bit unfriendly; I will send patches to make the tooling less scary. Additionally, fsck with --check should indeed allow the log to be cleared without asking first. I will update this as well.

Unfortunately you got hit by a bug in the logging code. The bug was short-lived upstream, but it was still there if you didn't upgrade your kernel after an update, which is a common occurrence.

As for the rest of your symptoms, those are unlikely to be related to the file system, just an unhappy coincidence. The fsck did the correct thing in fixing your file system: it simply removed the incorrect directory index entries, which don't affect actual files and are simply a readdir optimization. The tree log will only hold what would have happened in up to the last 30 seconds, so that's all you would truly have lost.

I agree, the recovery tools for btrfs are relatively scary; the hope is that they only have to be brought out in extreme cases. We will put some effort into documenting this and making it less terrifying when they do have to be used.
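One rough way to sanity-check the "last 30 seconds" claim on a recovered machine is to look at the newest modification times that survived under the home directory and compare them with the time of the power failure. The following is only a sketch under stated assumptions: `newest_files` is a hypothetical helper, not a btrfs tool, and modification times only show what is still on disk, so they cannot prove what was lost.

```python
import heapq
import os
import time


def newest_files(top, limit=10):
    """Return up to `limit` (mtime, path) pairs under `top`, newest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that dangling symlinks do not raise
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            found.append((mtime, path))
    return heapq.nlargest(limit, found)


def format_report(entries):
    """Render (mtime, path) pairs as human-readable lines in local time."""
    return [
        time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime)) + "  " + path
        for mtime, path in entries
    ]
```

For example, `print("\n".join(format_report(newest_files(os.path.expanduser("~")))))` lists the ten most recently modified files. If the newest timestamps fall shortly before the power failure, that is consistent with only the final moments of writes being dropped; if they are days old in directories that were in active use, that would be worth reporting here.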