| Summary: | [abrt] kernel: kernel BUG at fs/btrfs/extent-tree.c:1401!: TAINTED Die |
|---|---|
| Product: | [Fedora] Fedora |
| Component: | kernel |
| Version: | 15 |
| Hardware: | x86_64 |
| OS: | Unspecified |
| Status: | CLOSED INSUFFICIENT_DATA |
| Severity: | unspecified |
| Priority: | unspecified |
| Reporter: | Milan Bouchet-Valat <nalimilan> |
| Assignee: | Kernel Maintainer List <kernel-maint> |
| QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| CC: | dwmw2, gansalmon, itamar, jbacik, jonathan, kernel-maint, madhu.chinakonda |
| URL: | https://bugzilla.kernel.org/show_bug.cgi?id=34292 |
| Whiteboard: | abrt_hash:0b414105677dfec81abc94555f7021ae1c6bcb80 |
| Doc Type: | Bug Fix |
| Last Closed: | 2011-12-31 11:37:56 UTC |

Description
Milan Bouchet-Valat
2011-04-30 13:00:54 UTC
Created attachment 495963 [details]
File: backtrace
Very funny: I installed the experimental 64-bit Flash plugin, and the oops reappeared when loading a page with a Flash video. I removed /usr/lib64/mozilla/plugins/libflashplayer.so, and it stopped. I could also reproduce it under Ubuntu 10.10, kernel 2.6.37 (still Firefox+Flash).

[129170.603442] kernel BUG at fs/btrfs/extent-tree.c:1401!
RIP: 0010:[<ffffffffa049414b>] [<ffffffffa049414b>] lookup_inline_extent_backref+0xa4/0x31e [btrfs]

	ret = btrfs_search_slot(trans, root, &key, path, extra_size, 1);
	if (ret < 0) {
		err = ret;
		goto out;
	}
	BUG_ON(ret);

It bugged because ret == 1
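
For readers unfamiliar with the convention behind "ret == 1": the quoted code treats a negative return as an error and anything else non-zero as fatal. A search of this kind commonly returns 0 when the key is found and 1 when it is absent, leaving the slot at the insertion point. The sketch below is a toy model of that convention over a sorted array; toy_search_slot and toy_lookup are hypothetical names and this is not the btrfs implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a tree search (not btrfs code).
 * Returns 0 if target is present, 1 if it is absent (slot then holds
 * the position where it would be inserted), negative on error. */
static int toy_search_slot(const uint64_t *keys, size_t nr,
                           uint64_t target, size_t *slot)
{
    size_t lo = 0, hi = nr;

    if (keys == NULL || slot == NULL)
        return -1;                  /* error path, caught by "ret < 0" */

    while (lo < hi) {               /* binary search over sorted keys */
        size_t mid = lo + (hi - lo) / 2;

        if (keys[mid] < target)
            lo = mid + 1;
        else
            hi = mid;
    }
    *slot = lo;
    return (lo < nr && keys[lo] == target) ? 0 : 1;
}

/* A caller that reports "not found" instead of asserting on it,
 * in contrast to the BUG_ON(ret) in the quoted snippet. */
static int toy_lookup(const uint64_t *keys, size_t nr,
                      uint64_t target, size_t *slot)
{
    int ret = toy_search_slot(keys, nr, target, slot);

    if (ret < 0)
        return ret;                 /* hard error */
    if (ret > 0)
        return -2;                  /* key missing: hand back an error code */
    return 0;
}
```

With keys {10, 20, 30}, searching for 25 returns 1 with the slot set to 2, which is the case the BUG_ON in the report turns into a panic.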
Given it also happens in Ubuntu, I've filed it upstream as https://bugzilla.kernel.org/show_bug.cgi?id=34292

Ok, if you can reproduce it, will you pull from my tree git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work.git, build it, and see if it reproduces there? If it does (it still should, I don't think we've fixed this) I can give you debug patches to try and figure out what's going on. Thanks.

OK, I've tried it and it still crashes. So I'm eager to test your patches! ;-)

Ping! :-) I'd really like to test these patches quickly so that I can get rid of the bug by removing those files - not being able to use Flash is painful (videos...).

Right sorry, you are the first thing on my list tomorrow morning.

Created attachment 499666 [details]
debug patch
Ok apply this patch and run it. It will still panic, just attach dmesg so I can see whats going on.
Created attachment 499873 [details]
Relevant output from dmesg with first patch
Wow! I didn't think it would cost me so much. I've been fighting with my wireless for more than two hours: for some reason, the kernel crash blocked it off, and the switch and rfkill wouldn't get it up. I tried everything until I found out the ugly hp tool under Windows had a button to turn it back on.
But eventually I'm back here, with the logs. Hope it really helps! ;-)
Created attachment 500059 [details]
Another debug patch
Well thats weird, we're trying to modify byte 0, which shouldn't ever happen really, this will catch the guy trying to do it. This will panic your box again, so please provide me with the dmesg again after running this.
Created attachment 500174 [details]
Backtrace with second patch
Not sure this is what you want, as I can't find where the new information is... ;-)
Anyway, if that's not correct, just ask and I'll retry.
Created attachment 500461 [details]
Yet another debug patch
Here's another one. The last one was perfect, it looks like it isn't necessarily a bug in the delayed ref stuff but probably some sort of disk corruption. This debug patch will be a bit more verbose, but again will still panic. I'll need the entire bit, because I'll print out a bunch of stuff before it panics.
Created attachment 500522 [details]
/var/log/messages with third patch
Here's the new delivery... ;-)
Created attachment 500659 [details]
More debugging
Hrm sorry it looks like its a node thats corrupt, not a leaf, so this will print out the right information and not make 10000 different panics happen, just the normal one we expect.
Created attachment 500674 [details]
/var/log/messages with fourth patch
Just need to ask!
key 41 (72537 84 1075557500) block 72660422656
key 42 (72537 84 4135981418) block 1061535744
key 43 (72537 96 54) block 0
key 44 (72537 84 1226029824) block 39102320640

Ok so something went a little sideways here, these keys are not in the right order at all. Since you aren't getting checksum errors or anything else and it appears you have barriers enabled I can't imagine how this happened other than we have a horrible bug somewhere. First things first will be to fix your file system, is your entire fs on btrfs, or is it just your home directory? Also has anything happened in the past that would possibly corrupt your fs? Please say there was :).

Only /home is btrfs (/ is ext4). I have another btrfs partition that contains an Ubuntu root, but it's not even mounted. I've found something weird in the 'mount' output though: this line is duplicated:
/dev/sda6 on /home type btrfs (rw,relatime,seclabel)
My /etc/fstab line for it is
UUID=6f5d42c3-52fd-474b-88ec-756bbf64dd1b /home btrfs defaults,subvol=home.snapshot 1 2
FWIW, the btrfs partition originally contained my /home in the root subvolume, and I created a snapshot of it when moving to Fedora 15, to avoid messing with my old settings in case I had to switch back.

I'm not aware of a special breakage that should have corrupted my partition. It was created during the clean install of Ubuntu 10.10 back in March, and I've never had problems until I did that snapshot and got this bug. Maybe I experienced a few kernel crashes, I don't precisely remember, but I can tell you I didn't touch that partition when installing Fedora 15 afterwards (no resize, etc.).

Ah, maybe one explanation: Anaconda crashed badly when trying to resize another partition, and left my partition table completely destroyed, so I had to recover it using TestDisk. It went really well, and I recovered all my old partitions. AFAIK, this shouldn't have affected at all the actual content of the partitions.
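
As an aside on the "keys are not in the right order" diagnosis: the four keys quoted for slots 41 to 44 can be checked mechanically. The sketch below assumes each printed triple is (objectid type offset) and that keys are ordered by objectid, then type, then offset; toy_key, toy_comp_keys and toy_first_unsorted are hypothetical names for illustration, not functions from the kernel or btrfs-progs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified key: the triple printed in the debug output. */
struct toy_key {
    uint64_t objectid;
    uint8_t  type;
    uint64_t offset;
};

/* Three-way comparison: objectid first, then type, then offset. */
static int toy_comp_keys(const struct toy_key *a, const struct toy_key *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/* Index of the first key that is not strictly greater than its
 * predecessor, or -1 if the whole array is sorted. */
static long toy_first_unsorted(const struct toy_key *keys, size_t nr)
{
    for (size_t i = 1; i < nr; i++) {
        if (toy_comp_keys(&keys[i - 1], &keys[i]) >= 0)
            return (long)i;
    }
    return -1;
}
```

Fed the four keys from the dump, the check stops at the last one: key 44 has type 84, which sorts before key 43's type 96 under the same objectid. Key 43 is also the entry pointing at block 0.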
Is it possible that the partition was recreated with an offset of a few blocks, without preventing it from being mounted correctly, and that it led to weird bugs?

So I don't think this is accidental corruption, it really feels like we have a bug somewhere. Unfortunately without knowing how to reproduce it the only thing I can do is review the code and hope the bug jumps out at me. Now as for your file system, it being separate from your / fs is perfect. I think I can put it mostly back together, but since your fs is the only one that's broken like this, I only have one shot to get it right and not make it a lot worse :). So I need you to clone this git tree
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git
and run
make btrfs-image
and then make sure your /home isn't mounted, and run (as root of course)
./btrfs-image -c 9 -t <number of cpus you have> /dev/sda6 home.img
and then upload home.img somewhere that I can pull it down. Don't worry it's just a metadata dump, I will only have data that is small enough to fit into a leaf and file names. I need this so I can write a program to put your tree back together properly, this will give me something to run my program against so I can make sure it does the right thing. Now if you don't feel comfortable giving me this, the other option is to back everything up except what's in your mozilla folder (I think that's where the corruption is) and then I will give you a program to fix it and hope it doesn't make things worse.

I'm sending you an e-mail with the link to the image. Do you think there's a chance you'll find the bug, or just fix the partition?

I will try and find the bug, but I'm going to focus on fixing your partition first.

Any news on that? Can you just help me to remove the guilty directory? ;-)

Created attachment 510495 [details]
Repair program
Sorry about that, I got distracted with other things :). So I can't test this on your image because of the way we create images. So you can
a) trust that i didn't make any mistakes and just run this on your fs
b) take backups first
Once you choose one of those, clone the btrfs-progs-unstable git tree from git.kernel.org and then apply this patch and run make. Once you do that you can run
./repair /your/device
Note that your disk has to be unmounted, so you may have to boot into rescue mode to do this. Once it's done you should be good to go.
Created attachment 510496 [details]
Fixed repair program
Hah see I did make a mistake, so really go for option b :).
Sorry, doesn't seem to work... :-/ Thanks for your work!
Program received signal SIGABRT, Aborted.
0x00000034b4a352d5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
#0 0x00000034b4a352d5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00000034b4a36beb in abort () at abort.c:93
#2 0x00000034b4a2dc5e in __assert_fail_base (fmt=<optimized out>, assertion=0x4139d1 "!(ret)", file=0x414495 "extent-tree.c", line=<optimized out>,
function=<optimized out>) at assert.c:96
#3 0x00000034b4a2dd02 in __assert_fail (assertion=0x4139d1 "!(ret)", file=0x414495 "extent-tree.c", line=1042, function=0x414960 "lookup_inline_extent_backref")
at assert.c:105
#4 0x0000000000407f60 in lookup_inline_extent_backref (trans=<optimized out>, root=0x61b1b0, path=0x1c293d0, ref_ret=0x7fffffffe1e0, bytenr=0,
num_bytes=<optimized out>, parent=71470424064, root_objectid=5, owner=0, offset=0, insert=1) at extent-tree.c:1062
#5 0x00000000004095d8 in insert_inline_extent_backref (offset=0, owner=0, root_objectid=5, parent=71470424064, num_bytes=4096, bytenr=0, path=0x1c293d0, root=0x61b1b0,
trans=0x6f7850, refs_to_add=<optimized out>) at extent-tree.c:1308
#6 btrfs_inc_extent_ref (trans=0x6f7850, root=0x6f6640, bytenr=0, num_bytes=4096, parent=71470424064, root_objectid=5, owner=0, offset=0) at extent-tree.c:1382
#7 0x0000000000407abd in __btrfs_mod_ref (trans=0x6f7850, root=0x6f6640, buf=0x17f44b0, record_parent=<optimized out>, inc=<optimized out>) at extent-tree.c:1602
#8 0x000000000040270f in update_ref_for_cow (cow=0x1b1f5e0, buf=0x17f44b0, root=0x6f6640, trans=0x6f7850) at ctree.c:217
#9 __btrfs_cow_block (trans=0x6f7850, root=0x6f6640, buf=0x17f44b0, parent=0x17e49c0, parent_slot=3, cow_ret=0x7fffffffe430, search_start=70866960384, empty_size=0)
at ctree.c:305
#10 0x00000000004029e7 in btrfs_cow_block (trans=<optimized out>, root=<optimized out>, buf=<optimized out>, parent=<optimized out>, parent_slot=<optimized out>,
cow_ret=<optimized out>) at ctree.c:371
#11 0x00000000004038b4 in btrfs_search_slot (trans=0x6f7850, root=0x6f6640, key=0x7fffffffe4de, p=0x61b920, ins_len=-1, cow=1) at ctree.c:1214
#12 0x000000000040145f in main (argc=<optimized out>, argv=<optimized out>) at repair.c:81
Heh great, while trying to fix the problem it trips over the corrupt area and blows up. I'm on vacation this week but I will try to get to this tonight and just rig up a manual search that won't actually blow up.

FWIW, the corruption has now extended to other files without any apparent reason: some Evolution and Firefox config files are now broken too. One possibly interesting thing is that even the guest user account I was using to work around the problem got corrupted after I crashed the machine: my normal account and the guest account were open at the same time, and the former crashed because of the btrfs corruption. So it's possible that the breakage comes from a kernel crash happening when those files are being written to. Maybe we can find a procedure to check this idea. Anyway, I'm moving the non-broken files to another partition before it blows up completely. ;-)

I'd really like a way to fix the partition, but if you prefer, just give me a way to identify the corrupt files, and I'll copy everything except them, and erase the partition. The big problem ATM is that I cannot do full backups since the system crashes when broken files are read! :-/ The workaround I found is to remove read permissions, but I don't know exactly what files are affected.

The crash I'm getting when trying to copy the files that broke recently (cf. comment #27) is different from the original one. See the attached excerpt from dmesg. There are four different successive traces:
WARNING: at fs/btrfs/ctree.c:2297 leaf_space_used+0x6f/0x7f [btrfs]()
WARNING: at fs/btrfs/ctree.c:2297 leaf_space_used+0x6f/0x7f [btrfs]()
BUG: scheduling while atomic: cp/1447/0x10000001
BUG: sleeping function called from invalid context at kernel/rwsem.c:21
The partition has become so broken now that I can't copy most toplevel directories from my home dir, they all contain a few corrupt files that I couldn't identify.

Created attachment 518152 [details]
Dmesg for new crashes
Ping? Would you give me a clue about how to identify the corrupt files? I really need to remove that partition that takes up all of my disk space... ;-)
I'm reattaching the trace I spoke about in my last comment, looks like it didn't work last time.
Created attachment 518553 [details]
An update repair patch
Ok apply this and rebuild and re-run the repair program. It will still blow up but it will spit out what it blew up on so I can figure out how to work around/fix whats broken.
The only output is:
repair: extent-tree.c:1042: lookup_inline_extent_backref: Assertion `!(ret)' failed.
But it doesn't seem to provide more information than before... (I checked I really applied the correct patch, the debugging lines in extent-tree.c are present and I ran make clean to be sure.) I can run other tests if you need them. Thanks again for your attention!

Created attachment 518668 [details]
Incremental patch
Just apply this over the top of what you already have, it's just incremental.
Created attachment 518733 [details]
Output of repair with patch from comment #32

Seems to write something interesting now... ;-)

Ok so I was going to write my own search function to just get to what we need and not do all the safety checks, but I think that if I just use the normal search function and tell it I'm not going to modify the tree and then modify it anyway it should work fine. So will you open repair.c, and look for the line
ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
and change it to
ret = btrfs_search_slot(trans, root, &key, path, 0, 0);
and rebuild and try again. It's horribly evil, but I think it will work out just fine.
Well, it fails somewhere else:
(gdb) run /dev/sda6
getting key from slot 39
getting key from slot 40
getting key from slot 41
getting key from slot 42
getting key from slot 43
Found node with bytner of 0, deleting
getting key from slot 43
getting key from slot 43
getting key from slot 44
parent transid verify failed on 71470424064 wanted 130995 found 77431
Program received signal SIGABRT, Aborted.
0x0000003f61c352d5 in raise () from /lib64/libc.so.6
(gdb) ba
#0 0x0000003f61c352d5 in raise () from /lib64/libc.so.6
#1 0x0000003f61c36beb in abort () from /lib64/libc.so.6
#2 0x0000000000405d5e in write_tree_block (trans=0x701170, root=0x6fff60,
eb=0x702220) at disk-io.c:238
#3 0x0000000000405e90 in __commit_transaction (trans=0x701170, root=0x6fff60)
at disk-io.c:354
#4 0x0000000000405f85 in btrfs_commit_transaction (trans=0x701170,
root=0x6fff60) at disk-io.c:385
#5 0x0000000000401655 in main (argc=<optimized out>, argv=<optimized out>)
at repair.c:158
Ah yes that's just a safety check to make sure we copy-on-writed the block we are writing out. But we know we aren't doing that, we're just modifying the block in place. Normally this is dangerous, but really your fs is screwed up anyway, how much worse could it get :). So just go into disk-io.c in write_tree_block and comment out this area
	if (!btrfs_buffer_uptodate(eb, trans->transid))
		BUG();
and then you should be a-ok.
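
The two shortcuts described above (searching with cow set to 0, then disabling the check in write_tree_block) can be illustrated with a toy copy-on-write model. This is only a sketch under an assumption: that the check compares the generation stored in the block with the running transaction id, which is what the "wanted 130995 found 77431" message suggests. The names toy_block, toy_prepare_write and toy_write_allowed are hypothetical and do not come from btrfs-progs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block: a generation stamp plus some payload. */
struct toy_block {
    uint64_t generation;
    int payload;
};

/* With cow set, modifications go to a fresh copy stamped with the
 * current transaction id. Without it, the caller gets the original
 * block back and edits it in place, keeping the old generation. */
static struct toy_block *toy_prepare_write(struct toy_block *orig,
                                           struct toy_block *copy,
                                           uint64_t transid, int cow)
{
    if (!cow)
        return orig;
    *copy = *orig;
    copy->generation = transid;
    return copy;
}

/* Assumed shape of the safety check: only blocks stamped by the
 * current transaction may be written out. */
static int toy_write_allowed(const struct toy_block *b, uint64_t transid)
{
    return b->generation == transid;
}
```

In this model a block edited in place still carries generation 77431 while the transaction is 130995, so the check refuses the write unless it is removed, which mirrors the abort in write_tree_block shown in the backtrace.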
OK, now it works, but that doesn't fix the problem... :-/ It says:
./repair /dev/sda6
getting key from slot 39
getting key from slot 40
getting key from slot 41
getting key from slot 42
getting key from slot 43
getting key from slot 44
(There was another line the first time about fixing something, but I lost it after the kernel crash.) If I run 'find' on the mounted partition, I still get the lookup_inline_extent_backref bug.

Can you give me the output when you hit that bug, there may be other parts in your fs that are broken and we're just hitting that. You'll want the patches that I gave you that print out the leaf when it hits that bug in place so I can see whats going on.

Here they are:
Aug 19 17:21:38 milan kernel: [ 2338.818690] Call Trace:
Aug 19 17:21:38 milan kernel: [ 2338.821679] [<ffffffff811118ab>] ? kmem_cache_alloc+0x90/0x105
Aug 19 17:21:38 milan kernel: [ 2338.824726] [<ffffffffa00e5774>] __btrfs_free_extent+0xc0/0x55d [btrfs]
Aug 19 17:21:38 milan kernel: [ 2338.827764] [<ffffffff8146f92d>] ? __slab_free+0x27/0xeb
Aug 19 17:21:38 milan kernel: [ 2338.830829] [<ffffffffa0126365>] ? btrfs_delayed_ref_lock+0x3f/0x9d [btrfs]
Aug 19 17:21:38 milan kernel: [ 2338.833904] [<ffffffffa00e8771>] run_clustered_refs+0x615/0x672 [btrfs]
Aug 19 17:21:38 milan kernel: [ 2338.836994] [<ffffffffa0126400>] ? btrfs_find_ref_cluster+0x3d/0x145 [btrfs]
Aug 19 17:21:38 milan kernel: [ 2338.840088] [<ffffffffa00e889f>] btrfs_run_delayed_refs+0xd1/0x193 [btrfs]
Aug 19 17:21:38 milan kernel: [ 2338.843184] [<ffffffffa00f5026>] __btrfs_end_transaction+0x6f/0x1f6 [btrfs]
Aug 19 17:21:38 milan kernel: [ 2338.846283] [<ffffffffa00f51f0>] btrfs_end_transaction+0x15/0x17 [btrfs]
Aug 19 17:21:38 milan kernel: [ 2338.849396] [<ffffffffa00fdd0b>] btrfs_dirty_inode+0xfe/0x107 [btrfs]
Aug 19 17:21:38 milan kernel: [ 2338.852479] [<ffffffff8113eb8c>] __mark_inode_dirty+0x2e/0x167
Aug 19 17:21:38 milan kernel: [ 2338.855551] [<ffffffff8113483f>] touch_atime+0x10e/0x131
Aug 19 17:21:38 milan kernel: [ 2338.858597] [<ffffffff8112f5d3>] ? filldir+0x0/0xc7
Aug 19 17:21:38 milan kernel: [ 2338.861639] [<ffffffff8112f8a8>] vfs_readdir+0x8c/0xac
Aug 19 17:21:38 milan kernel: [ 2338.864675] [<ffffffff8112f9ae>] sys_getdents+0x7e/0xce
Aug 19 17:21:38 milan kernel: [ 2338.867721] [<ffffffff81009bc2>] system_call_fastpath+0x16/0x1b

Created attachment 519091 [details]
Dmesg for remaining crashes (patched kernel)
Ah, sorry, I didn't read your comment carefully enough. Here's the output you requested, obtained with the patched kernel.
Created attachment 520409 [details]
New and improved repair tool
Ok so I spent all last week writing a more comprehensive repair tool for a different bug that I'd like you to run. Please run it with -d which means dry-run and attach the output to this bz so I can see what it's going to do, then I can decide if it's going to be safe to let it try and fix stuff for you :).
Nice, another one, and better! ;-)
It crashes after running for a while:
Starting program: /home/milan/Dev/btrfs-progs-unstable/repair /dev/sda6 -d
Checking extent root
Bad item end value, attempting to fix
Bad item end value, attempting to fix
Neighbor is bad too, will come back and try again
leaf 112747028480 items 18 free space 2627 generation 130677 owner 2
fs uuid 6f5d42c3-52fd-474b-88ec-756bbf64dd1b
chunk uuid f7373b86-6f91-4c96-b4e2-df5859a92c1e
item 0 key (112428765184 EXTENT_ITEM 4096) itemoff 3944 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (1413240 6c 17547264) level 0
tree block backref root 256
item 1 key (112428822528 EXTENT_ITEM 4096) itemoff 3893 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98660622336 a8 8192) level 0
tree block backref root 2
item 2 key (112429154304 EXTENT_ITEM 4096) itemoff 3842 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98152087552 a8 4096) level 0
tree block backref root 2
item 3 key (112429563904 EXTENT_ITEM 4096) itemoff 3791 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (104854753280 a8 8192) level 0
tree block backref root 2
item 4 key (112429604864 EXTENT_ITEM 4096) itemoff 3740 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98147524608 a8 8192) level 0
tree block backref root 2
item 5 key (112430071808 EXTENT_ITEM 12297828649465286656) itemoff -1431655120 itemsize -1431655936
Program received signal SIGSEGV, Segmentation fault.
print_extent_item (slot=5, eb=0x1e3e6d0) at print-tree.c:190
190 flags = btrfs_extent_flags(eb, ei);
Missing separate debuginfos, use: debuginfo-install glibc-2.14-4.x86_64 libuuid-2.19.1-1.4.fc15.x86_64
(gdb) ba
#0 print_extent_item (slot=5, eb=0x1e3e6d0) at print-tree.c:190
#1 btrfs_print_leaf (root=<optimized out>, l=0x1e3e6d0) at print-tree.c:529
#2 0x0000000000402429 in check_leaf (path=0x61c920, root=0x61c1b0)
at repair.c:530
#3 check_children (root=0x61c1b0, path=0x61c920, level=1) at repair.c:596
#4 0x000000000040219f in check_children (root=0x61c1b0, path=0x61c920,
level=2) at repair.c:590
#5 0x000000000040219f in check_children (root=0x61c1b0, path=0x61c920,
level=3) at repair.c:590
#6 0x0000000000401572 in main (argc=<optimized out>, argv=<optimized out>)
at repair.c:778
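
In the leaf dump above, items 0 to 4 step down by exactly 51 bytes (itemoff 3944, 3893, 3842, 3791, 3740, each with itemsize 51), so each item's data ends where the previous item's data begins. Item 5 breaks that pattern with a negative itemoff and itemsize. A minimal sketch of that kind of consistency check follows; toy_item, toy_first_bad_item and the data_end parameter are hypothetical names for illustration and are not taken from the repair tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one leaf item as printed in the dump. */
struct toy_item {
    int64_t offset;   /* "itemoff" in the dump */
    int64_t size;     /* "itemsize" in the dump */
};

/* Returns the index of the first item whose data region is implausible,
 * or -1 if all items pass. data_end is where the first item's data is
 * expected to end; each later item must end where the previous one
 * starts, so the data is packed without gaps or overlaps. */
static long toy_first_bad_item(const struct toy_item *items, size_t nr,
                               int64_t data_end)
{
    int64_t expected_end = data_end;

    for (size_t i = 0; i < nr; i++) {
        const struct toy_item *it = &items[i];

        if (it->offset < 0 || it->size < 0)
            return (long)i;               /* garbage values */
        if (it->size > expected_end ||
            it->offset != expected_end - it->size)
            return (long)i;               /* gap or overlap */
        expected_end = it->offset;
    }
    return -1;
}
```

For the dump above, data_end can be taken as 3995 (item 0's itemoff 3944 plus its itemsize 51, a value read off this dump rather than derived from the on-disk format). The first five items then pass and item 5 is flagged.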
Argh ok I finally got the image stuff working so I can run the repair tool locally against an image (at least it appears to be working anyway). I cannot find the image you sent me originally anywhere, can you recreate the image and put it somewhere for me to pull down so I can run the repair tool and make sure it's working right?

Like I said in my comment 27, the partition is now even in a worse state than before. I can't even run btrfs-image now, it crashes:
Program received signal SIGSEGV, Segmentation fault.
create_metadump (compress_level=<optimized out>, num_threads=<optimized out>, out=0x61d010, input=<optimized out>) at btrfs-image.c:537
537 if (btrfs_extent_flags(leaf, ei) &
Missing separate debuginfos, use: debuginfo-install glibc-2.14-4.x86_64 libuuid-2.19.1-1.4.fc15.x86_64
(gdb) ba
#0 create_metadump (compress_level=<optimized out>, num_threads=<optimized out>, out=0x61d010, input=<optimized out>) at btrfs-image.c:537
#1 main (argc=<optimized out>, argv=<optimized out>) at btrfs-image.c:875

Created attachment 521232 [details]
Next iteration of the repair tool
Ugh sorry about that. Here's a new repair tool that should only print out the bad parts and hopefully not segfault. Make sure to run it with -d, once I get a full look at everything that's wrong I will double check all of my code and then you can run without the -d and hopefully it will fix everything.
No need to feel sorry... Anyways, now it works:
Bad item end value, attempting to fix
Bad item end value, attempting to fix
Neighbor is bad too, will come back and try again
Bad item end value, attempting to fix
Bad item end value, attempting to fix
Previous neighbor is bad, will come back and try again later
Bad item end value, attempting to fix
Bad item end value, attempting to fix
Neighbor is bad too, will come back and try again
Bad item end value, attempting to fix
Bad item end value, attempting to fix
Previous neighbor is bad, will come back and try again later
Couldn't fixup leaf
Checking extent root
bytenr 112747028480 item 5 key (112430071808 EXTENT_ITEM 12297828649465286656) itemoff -1431655120 itemsize -1431655936
bytenr 112747028480 item 6 key (12297828652328593920 UNKNOWN 48038393173158570) itemoff 11185626 itemsize 51
bytenr 112747028480 item 5 key (112430071808 EXTENT_ITEM 12297828649465286656) itemoff -1431655120 itemsize -1431655936
bytenr 112747028480 item 6 key (12297828652328593920 UNKNOWN 48038393173158570) itemoff 11185626 itemsize 51

Ping again? :-)

Created attachment 523624 [details]
A new repair program
Ok here's a new one, and at this point I think it's time to create a public git tree you can just pull instead of me constantly putting patches up here :). Run again with the dry-run. This will try to fix things but won't actually write the fixes, and will print out the fixed leaf to make sure we fixed it properly. We're getting there.
Here's the new output. Looks like it's not happy yet - my partition must be a hard case, but I can't really tell... :-)
Checking extent root
Bad key offset, deleting 12297828649465286656
Bad item end value, attempting to fix
bytenr 112747028480 item 5 key (12297828652328593920 UNKNOWN 48038393173158570) itemoff 3638 itemsize 102
Keys in the wrong order [objectid], swapping 5
Keys are out of order in a leaf, this program cant fix that yet, tell the author so he can get off his lazy ass and fix that
leaf 112747028480 items 17 free space 2652 generation 130677 owner 2
fs uuid 6f5d42c3-52fd-474b-88ec-756bbf64dd1b
chunk uuid f7373b86-6f91-4c96-b4e2-df5859a92c1e
item 0 key (112428765184 EXTENT_ITEM 4096) itemoff 3944 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (1413240 6c 17547264) level 0
tree block backref root 256
item 1 key (112428822528 EXTENT_ITEM 4096) itemoff 3893 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98660622336 a8 8192) level 0
tree block backref root 2
item 2 key (112429154304 EXTENT_ITEM 4096) itemoff 3842 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98152087552 a8 4096) level 0
tree block backref root 2
item 3 key (112429563904 EXTENT_ITEM 4096) itemoff 3791 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (104854753280 a8 8192) level 0
tree block backref root 2
item 4 key (112429604864 EXTENT_ITEM 4096) itemoff 3740 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98147524608 a8 8192) level 0
tree block backref root 2
item 5 key (12297828652328593920 UNKNOWN 48038393173158570) itemoff 3638 itemsize 102
item 6 key (112430448640 EXTENT_ITEM 4096) itemoff 3587 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98100707328 a8 12288) level 0
tree block backref root 2
item 7 key (112430551040 EXTENT_ITEM 4096) itemoff 3536 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110543069184) level 0
tree block backref root 7
item 8 key (112430637056 EXTENT_ITEM 4096) itemoff 3485 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (96611688448 a8 131072) level 0
tree block backref root 2
item 9 key (112430641152 EXTENT_ITEM 4096) itemoff 3434 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 96621641728) level 0
tree block backref root 7
item 10 key (112430645248 EXTENT_ITEM 4096) itemoff 3383 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 96619077632) level 0
tree block backref root 7
item 11 key (112430706688 EXTENT_ITEM 4096) itemoff 3332 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110681280512) level 0
tree block backref root 7
item 12 key (112430710784 EXTENT_ITEM 4096) itemoff 3281 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110673211392) level 0
tree block backref root 7
item 13 key (112430809088 EXTENT_ITEM 4096) itemoff 3230 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110719135744) level 0
tree block backref root 7
item 14 key (112430813184 EXTENT_ITEM 4096) itemoff 3179 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110718173184) level 0
tree block backref root 7
item 15 key (112430989312 EXTENT_ITEM 4096) itemoff 3128 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (505416 c 504678) level 0
tree block backref root 256
item 16 key (112431054848 EXTENT_ITEM 4096) itemoff 3077 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (507404 1 0) level 0
tree block backref root 256

Oops I had a check in there that was wrong, here is my git tree
git://github.com/josefbacik/btrfs-progs.git
you can clone that and just pull from it when I fix stuff and hopefully that will make this process a little simpler.

Yeah, it will be simpler for everybody. I've tried it, and it gives the very same result despite commit 74e7c57. Was it intended to change something?

Ugh sorry, pushed an update, give that a shot.

Still the same with commit 399173... :-/ I get warnings, if that's of any help:
repair.c: In function ‘fix_leaf_item’:
repair.c:361:6: attention : variable ‘did_cow’ set but not used [-Wunused-but-set-variable]
repair.c: In function ‘verify_extent_item’:
repair.c:466:6: attention : unused variable ‘ret’ [-Wunused-variable]

Oh I see, that key has a bad type, I just pushed a fix that should catch that.

There's some progress!
Checking extent root
Bad key offset, deleting 12297828649465286656
Invalid key type, deleting key (12297828652328593920 0 48038393173158570)
Fixed something, dumping leaf to make sure it looks right
leaf 112747028480 items 16 free space 2677 generation 130677 owner 2
fs uuid 6f5d42c3-52fd-474b-88ec-756bbf64dd1b
chunk uuid f7373b86-6f91-4c96-b4e2-df5859a92c1e
item 0 key (112428765184 EXTENT_ITEM 4096) itemoff 3944 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (1413240 6c 17547264) level 0
tree block backref root 256
item 1 key (112428822528 EXTENT_ITEM 4096) itemoff 3893 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98660622336 a8 8192) level 0
tree block backref root 2
item 2 key (112429154304 EXTENT_ITEM 4096) itemoff 3842 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98152087552 a8 4096) level 0
tree block backref root 2
item 3 key (112429563904 EXTENT_ITEM 4096) itemoff 3791 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (104854753280 a8 8192) level 0
tree block backref root 2
item 4 key (112429604864 EXTENT_ITEM 4096) itemoff 3740 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98147524608 a8 8192) level 0
tree block backref root 2
item 5 key (112430448640 EXTENT_ITEM 4096) itemoff 3587 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98100707328 a8 12288) level 0
tree block backref root 2
item 6 key (112430551040 EXTENT_ITEM 4096) itemoff 3536 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110543069184) level 0
tree block backref root 7
item 7 key (112430637056 EXTENT_ITEM 4096) itemoff 3485 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (96611688448 a8 131072) level 0
tree block backref root 2
item 8 key (112430641152 EXTENT_ITEM 4096) itemoff 3434 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 96621641728) level 0
tree block backref root 7
item 9 key (112430645248 EXTENT_ITEM 4096) itemoff 3383 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 96619077632) level 0
tree block backref root 7
item 10 key (112430706688 EXTENT_ITEM 4096) itemoff 3332 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110681280512) level 0
tree block backref root 7
item 11 key (112430710784 EXTENT_ITEM 4096) itemoff 3281 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110673211392) level 0
tree block backref root 7
item 12 key (112430809088 EXTENT_ITEM 4096) itemoff 3230 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110719135744) level 0
tree block backref root 7
item 13 key (112430813184 EXTENT_ITEM 4096) itemoff 3179 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110718173184) level 0
tree block backref root 7
item 14 key (112430989312 EXTENT_ITEM 4096) itemoff 3128 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (505416 c 504678) level 0
tree block backref root 256
item 15 key (112431054848 EXTENT_ITEM 4096) itemoff 3077 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (507404 1 0) level 0
tree block backref root 256
Finding fs roots
Checking fs roots
Checking root 5
Checking root 5 refs
Checking root 256
Checking root 256 refs
Couldn't find an extent ref for bytenr 112430080000

Ok that's perfect, now we just need to fix the leaf to have the data packed properly, I will work on that now.

Ok give the new batch a run, this should delete the keys and reorder the items properly. Once that is all working right the next step will be to run without -d, so we're almost there :).

It crashes... :-/
(gdb) run -d /dev/sda6
Checking extent root
Leaf items aren't quite in the right order, fixing
Fixed something, dumping leaf to make sure it looks right
leaf 114049122304 items 40 free space 266 generation 131008 owner 2
fs uuid 6f5d42c3-52fd-474b-88ec-756bbf64dd1b
chunk uuid f7373b86-6f91-4c96-b4e2-df5859a92c1e
item 0 key (0 BLOCK_GROUP_ITEM 4194304) itemoff 3971 itemsize 24
block group used 0 chunk_objectid 256 flags 4
item 1 key (4194304 BLOCK_GROUP_ITEM 8388608) itemoff 3947 itemsize 24
block group used 1142461300736 chunk_objectid 0 flags 4294967296
item 2 key (12582912 EXTENT_ITEM 446464) itemoff 3865 itemsize 82
extent refs 8388608 gen 256 flags 1
Program received signal SIGABRT, Aborted.
0x0000003f61c352d5 in raise () from /lib64/libc.so.6
(gdb) ba
#0 0x0000003f61c352d5 in raise () from /lib64/libc.so.6
#1 0x0000003f61c36beb in abort () from /lib64/libc.so.6
#2 0x000000000040d17c in btrfs_extent_inline_ref_size (type=<optimized out>)
at ctree.h:1208
#3 print_extent_item (slot=2, eb=0x62ee40) at print-tree.c:244
#4 btrfs_print_leaf (root=<optimized out>, l=0x62ee40) at print-tree.c:529
#5 0x0000000000402a07 in check_leaf (path=0x61d920, root=0x61d1b0)
at repair.c:726
#6 check_children (root=0x61d1b0, path=0x61d920, level=1) at repair.c:764
#7 0x00000000004022d3 in check_children (root=0x61d1b0, path=0x61d920,
level=2) at repair.c:758
#8 0x00000000004022d3 in check_children (root=0x61d1b0, path=0x61d920,
level=3) at repair.c:758
#9 0x000000000040148a in main (argc=<optimized out>, argv=<optimized out>)
at repair.c:944
bah sorry about that, this one should be right.

No worries, here's the new log:
Bad key offset, deleting 12297828649465286656
Invalid key type, deleting key (12297828652328593920 0 48038393173158570)
Leaf items aren't quite in the right order, fixing
Checking extent root
Fixed something, dumping leaf to make sure it looks right
leaf 112747028480 items 16 free space 2677 generation 130677 owner 2
fs uuid 6f5d42c3-52fd-474b-88ec-756bbf64dd1b
chunk uuid f7373b86-6f91-4c96-b4e2-df5859a92c1e
item 0 key (112428765184 EXTENT_ITEM 4096) itemoff 3944 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (1413240 6c 17547264) level 0
tree block backref root 256
item 1 key (112428822528 EXTENT_ITEM 4096) itemoff 3893 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98660622336 a8 8192) level 0
tree block backref root 2
item 2 key (112429154304 EXTENT_ITEM 4096) itemoff 3842 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98152087552 a8 4096) level 0
tree block backref root 2
item 3 key (112429563904 EXTENT_ITEM 4096) itemoff 3791 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (104854753280 a8 8192) level 0
tree block backref root 2
item 4 key (112429604864 EXTENT_ITEM 4096) itemoff 3740 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98147524608 a8 8192) level 0
tree block backref root 2
item 5 key (112430448640 EXTENT_ITEM 4096) itemoff 3587 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (96611688448 a8 131072) level 0
tree block backref root 2
item 6 key (112430551040 EXTENT_ITEM 4096) itemoff 3536 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 96621641728) level 0
tree block backref root 7
item 7 key (112430637056 EXTENT_ITEM 4096) itemoff 3485 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 96619077632) level 0
tree block backref root 7
item 8 key (112430641152 EXTENT_ITEM 4096) itemoff 3434 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110681280512) level 0
tree block backref root 7
item 9 key (112430645248 EXTENT_ITEM 4096) itemoff 3383 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110673211392) level 0
tree block backref root 7
item 10 key (112430706688 EXTENT_ITEM 4096) itemoff 3332 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110719135744) level 0
tree block backref root 7
item 11 key (112430710784 EXTENT_ITEM 4096) itemoff 3281 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110718173184) level 0
tree block backref root 7
item 12 key (112430809088 EXTENT_ITEM 4096) itemoff 3230 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (505416 c 504678) level 0
tree block backref root 256
item 13 key (112430813184 EXTENT_ITEM 4096) itemoff 3179 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (507404 1 0) level 0
tree block backref root 256
item 14 key (112430989312 EXTENT_ITEM 4096) itemoff 3128 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (505416 c 504678) level 0
tree block backref root 256
item 15 key (112431054848 EXTENT_ITEM 4096) itemoff 3077 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (507404 1 0) level 0
tree block backref root 256
Couldn't find an extent ref for bytenr 112430080000
Finding fs roots
Checking fs roots
Checking root 5
Checking root 5 refs
Checking root 256
Checking root 256 refs

Man thank God I don't work on anything important like a file system or anything. I just pushed a fix that will actually update the items so they point to the right damn data, give that a run and hopefully that will work out right.

You mean I'm really a fool to trust you? Maybe you're right, there's another issue now... :-D
(gdb) run -d /dev/sda6
Checking extent root
Bad key offset, deleting 12297828649465286656
Invalid key type, deleting key (12297828652328593920 0 48038393173158570)
Leaf items aren't quite in the right order, fixing
Keys in the wrong order [objectid], swapping 1
Keys are out of order in a leaf, this program cant fix that yet, tell the author so he can get off his lazy ass and fix that
repair: ctree.c:1688: leaf_space_used: Assertion `!(data_len < 0)' failed.
Program received signal SIGABRT, Aborted.
0x0000003f61c352d5 in raise () from /lib64/libc.so.6
(gdb) ba
#0 0x0000003f61c352d5 in raise () from /lib64/libc.so.6
#1 0x0000003f61c36beb in abort () from /lib64/libc.so.6
#2 0x0000003f61c2dc5e in __assert_fail_base () from /lib64/libc.so.6
#3 0x0000003f61c2dd02 in __assert_fail () from /lib64/libc.so.6
#4 0x0000000000403417 in leaf_space_used (l=<optimized out>,
start=<optimized out>, nr=<optimized out>) at ctree.c:1688
#5 leaf_space_used (l=0x1e3e8b0, start=<optimized out>, nr=8) at ctree.c:1677
#6 0x0000000000403eae in btrfs_leaf_free_space (root=<optimized out>,
leaf=<optimized out>) at ctree.c:1701
#7 0x000000000040cbfd in btrfs_print_leaf (root=<optimized out>, l=0x1e3e8b0)
at print-tree.c:460
#8 0x000000000040270c in verify_extent_item (path=<optimized out>,
root=<optimized out>) at repair.c:508
#9 verify_leaf (path=<optimized out>, root=<optimized out>) at repair.c:598
#10 check_leaf (root=0x61d1b0, path=0x61d920) at repair.c:724
#11 0x0000000000402a73 in check_children (root=0x61d1b0, path=0x61d920,
level=1) at repair.c:770
#12 0x0000000000402a62 in check_children (root=0x61d1b0, path=0x61d920,
level=2) at repair.c:764
#13 0x0000000000402a62 in check_children (root=0x61d1b0, path=0x61d920,
level=3) at repair.c:764
#14 0x000000000040149f in main (argc=<optimized out>, argv=<optimized out>)
at repair.c:953
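For readers following the leaf dumps, the "free space" figures and the `data_len` assertion both come down to arithmetic on the `itemoff` and `itemsize` columns. Below is a minimal Python sketch of that arithmetic, not the repair tool's code. The constants (a 4096-byte leaf, a 101-byte block header, a 25-byte header per item) are assumptions that are not stated in this thread; they are used here because they reproduce the logged values 2677 and 2779. The function names are hypothetical.

```python
# Assumed layout constants (not given in the thread).
LEAF_SIZE = 4096
HEADER_SIZE = 101
ITEM_HEADER_SIZE = 25


def leaf_free_space(items):
    """items: list of (itemoff, itemsize) pairs in slot order.

    Item data is expected to be packed downward, so the used data span runs
    from the last item's offset up to the end of the first item's data.
    """
    if not items:
        return LEAF_SIZE - HEADER_SIZE
    first_off, first_size = items[0]
    last_off, _ = items[-1]
    data_len = (first_off + first_size) - last_off
    if data_len < 0:
        # Mirrors the situation behind the failed `!(data_len < 0)` assertion.
        raise ValueError("items out of order: negative data length")
    used = data_len + ITEM_HEADER_SIZE * len(items)
    return LEAF_SIZE - HEADER_SIZE - used


def find_gaps(items):
    """Slots whose data does not end where the previous item's data starts."""
    return [slot for slot in range(1, len(items))
            if items[slot][0] + items[slot][1] != items[slot - 1][0]]


# 16 items of 51 bytes, tightly packed from offset 3944 downward.
packed = [(3944 - 51 * i, 51) for i in range(16)]
# Same leaf, but item 5 starts at 3587, leaving a 102-byte hole after item 4.
gapped = packed[:5] + [(3587 - 51 * i, 51) for i in range(11)]

print(leaf_free_space(packed), find_gaps(packed))  # 2779 []
print(leaf_free_space(gapped), find_gaps(gapped))  # 2677 [5]
```

Under these assumptions, the 102-byte difference between the two reported free space values is exactly the data of the two deleted items that had not yet been packed out of the leaf.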
Try that.

Right:
Bad key offset, deleting 12297828649465286656
Invalid key type, deleting key (12297828652328593920 0 48038393173158570)
Leaf items aren't quite in the right order, fixing
Checking extent root
Fixed something, dumping leaf to make sure it looks right
leaf 112747028480 items 16 free space 2779 generation 130677 owner 2
fs uuid 6f5d42c3-52fd-474b-88ec-756bbf64dd1b
chunk uuid f7373b86-6f91-4c96-b4e2-df5859a92c1e
item 0 key (112428765184 EXTENT_ITEM 4096) itemoff 3944 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (1413240 6c 17547264) level 0
tree block backref root 256
item 1 key (112428822528 EXTENT_ITEM 4096) itemoff 3893 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98660622336 a8 8192) level 0
tree block backref root 2
item 2 key (112429154304 EXTENT_ITEM 4096) itemoff 3842 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98152087552 a8 4096) level 0
tree block backref root 2
item 3 key (112429563904 EXTENT_ITEM 4096) itemoff 3791 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (104854753280 a8 8192) level 0
tree block backref root 2
item 4 key (112429604864 EXTENT_ITEM 4096) itemoff 3740 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98147524608 a8 8192) level 0
tree block backref root 2
item 5 key (112430448640 EXTENT_ITEM 4096) itemoff 3689 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98100707328 a8 12288) level 0
tree block backref root 2
item 6 key (112430551040 EXTENT_ITEM 4096) itemoff 3638 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110543069184) level 0
tree block backref root 7
item 7 key (112430637056 EXTENT_ITEM 4096) itemoff 3587 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (96611688448 a8 131072) level 0
tree block backref root 2
item 8 key (112430641152 EXTENT_ITEM 4096) itemoff 3536 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 96621641728) level 0
tree block backref root 7
item 9 key (112430645248 EXTENT_ITEM 4096) itemoff 3485 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 96619077632) level 0
tree block backref root 7
item 10 key (112430706688 EXTENT_ITEM 4096) itemoff 3434 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110681280512) level 0
tree block backref root 7
item 11 key (112430710784 EXTENT_ITEM 4096) itemoff 3383 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110673211392) level 0
tree block backref root 7
item 12 key (112430809088 EXTENT_ITEM 4096) itemoff 3332 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110719135744) level 0
tree block backref root 7
item 13 key (112430813184 EXTENT_ITEM 4096) itemoff 3281 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110718173184) level 0
tree block backref root 7
item 14 key (112430989312 EXTENT_ITEM 4096) itemoff 3230 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (505416 c 504678) level 0
tree block backref root 256
item 15 key (112431054848 EXTENT_ITEM 4096) itemoff 3179 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (507404 1 0) level 0
tree block backref root 256
Couldn't find an extent ref for bytenr 112430080000
Finding fs roots
Checking fs roots
Checking root 5
Checking root 5 refs
Checking root 256
Checking root 256 refs

Ok I pushed another update to make sure the path gets COWed properly. Go ahead and pull that and then run
./repair /dev/sda6
Now this is likely to bring a whole host of other problems since I'm not able to test this part of the code, but once we get through all those you should have a fully fixed system, or a fs that is so totally broken there is no hope of return :).

Well, it ran once correctly with -d, but it crashed without. For the record, the log until then is:
leaf 112747028480 items 18 free space 2627 generation 130677 owner 2
fs uuid 6f5d42c3-52fd-474b-88ec-756bbf64dd1b
chunk uuid f7373b86-6f91-4c96-b4e2-df5859a92c1e
item 0 key (112428765184 EXTENT_ITEM 4096) itemoff 3944 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (1413240 6c 17547264) level 0
tree block backref root 256
item 1 key (112428822528 EXTENT_ITEM 4096) itemoff 3893 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98660622336 a8 8192) level 0
tree block backref root 2
item 2 key (112429154304 EXTENT_ITEM 4096) itemoff 3842 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98152087552 a8 4096) level 0
tree block backref root 2
item 3 key (112429563904 EXTENT_ITEM 4096) itemoff 3791 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (104854753280 a8 8192) level 0
tree block backref root 2
item 4 key (112429604864 EXTENT_ITEM 4096) itemoff 3740 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98147524608 a8 8192) level 0
tree block backref root 2
I'm running it again in gdb (which I should have done before).
And here's the second real attempt in gdb (I assume it crashed at the same place):
Checking extent root
Bad key offset, deleting 12297828649465286656
Started transid 131241
Invalid key type, deleting key (12297828652328593920 0 48038393173158570)
Bad item end value, attempting to fix
Bad item end value, attempting to fix
Bad item end value, attempting to fix
Bad item end value, attempting to fix
Couldn't fixup leaf
Fixed something, dumping leaf to make sure it looks right
leaf 112747028480 items 18 free space 2627 generation 130677 owner 2
fs uuid 6f5d42c3-52fd-474b-88ec-756bbf64dd1b
chunk uuid f7373b86-6f91-4c96-b4e2-df5859a92c1e
item 0 key (112428765184 EXTENT_ITEM 4096) itemoff 3944 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (1413240 6c 17547264) level 0
tree block backref root 256
item 1 key (112428822528 EXTENT_ITEM 4096) itemoff 3893 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98660622336 a8 8192) level 0
tree block backref root 2
item 2 key (112429154304 EXTENT_ITEM 4096) itemoff 3842 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98152087552 a8 4096) level 0
tree block backref root 2
item 3 key (112429563904 EXTENT_ITEM 4096) itemoff 3791 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (104854753280 a8 8192) level 0
tree block backref root 2
item 4 key (112429604864 EXTENT_ITEM 4096) itemoff 3740 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98147524608 a8 8192) level 0
tree block backref root 2
item 5 key (112430071808 EXTENT_ITEM 12297828649465286656) itemoff -1431655120 itemsize -1431655936
Program received signal SIGSEGV, Segmentation fault.
print_extent_item (slot=5, eb=0x1e3e8b0) at print-tree.c:190
190 flags = btrfs_extent_flags(eb, ei);
(gdb) ba
#0 print_extent_item (slot=5, eb=0x1e3e8b0) at print-tree.c:190
#1 btrfs_print_leaf (root=<optimized out>, l=0x1e3e8b0) at print-tree.c:529
#2 0x00000000004027d4 in check_leaf (root=0x61d1b0, path=0x61d920)
at repair.c:740
#3 0x0000000000402a92 in check_children (root=0x61d1b0, path=0x61d920,
level=1) at repair.c:778
#4 0x0000000000402a81 in check_children (root=0x61d1b0, path=0x61d920,
level=2) at repair.c:772
#5 0x0000000000402a81 in check_children (root=0x61d1b0, path=0x61d920,
level=3) at repair.c:772
#6 0x000000000040149f in main (argc=<optimized out>, argv=<optimized out>)
at repair.c:961
Yeah these would be those bugs that I haven't shaken out yet :). Just pushed a couple of fixes, let me know how that works out.

OK, it didn't crash, but now I'm unable to mount the partition because the superblock seems corrupted.

./repair output:
Bad key offset, deleting 12297828649465286656
Invalid key type, deleting key (12297828652328593920 0 48038393173158570)
Leaf items aren't quite in the right order, fixing
Checking extent root
Started transid 131253
Fixed something, dumping leaf to make sure it looks right
leaf 112638033920 items 16 free space 2779 generation 131253 owner 2
fs uuid 6f5d42c3-52fd-474b-88ec-756bbf64dd1b
chunk uuid f7373b86-6f91-4c96-b4e2-df5859a92c1e
item 0 key (112428765184 EXTENT_ITEM 4096) itemoff 3944 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (1413240 6c 17547264) level 0
tree block backref root 256
item 1 key (112428822528 EXTENT_ITEM 4096) itemoff 3893 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98660622336 a8 8192) level 0
tree block backref root 2
item 2 key (112429154304 EXTENT_ITEM 4096) itemoff 3842 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98152087552 a8 4096) level 0
tree block backref root 2
item 3 key (112429563904 EXTENT_ITEM 4096) itemoff 3791 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (104854753280 a8 8192) level 0
tree block backref root 2
item 4 key (112429604864 EXTENT_ITEM 4096) itemoff 3740 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98147524608 a8 8192) level 0
tree block backref root 2
item 5 key (112430448640 EXTENT_ITEM 4096) itemoff 3689 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (98100707328 a8 12288) level 0
tree block backref root 2
item 6 key (112430551040 EXTENT_ITEM 4096) itemoff 3638 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110543069184) level 0
tree block backref root 7
item 7 key (112430637056 EXTENT_ITEM 4096) itemoff 3587 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (96611688448 a8 131072) level 0
tree block backref root 2
item 8 key (112430641152 EXTENT_ITEM 4096) itemoff 3536 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 96621641728) level 0
tree block backref root 7
item 9 key (112430645248 EXTENT_ITEM 4096) itemoff 3485 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 96619077632) level 0
tree block backref root 7
item 10 key (112430706688 EXTENT_ITEM 4096) itemoff 3434 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110681280512) level 0
tree block backref root 7
item 11 key (112430710784 EXTENT_ITEM 4096) itemoff 3383 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110673211392) level 0
tree block backref root 7
item 12 key (112430809088 EXTENT_ITEM 4096) itemoff 3332 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110719135744) level 0
tree block backref root 7
item 13 key (112430813184 EXTENT_ITEM 4096) itemoff 3281 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (18446744073709551606 80 110718173184) level 0
tree block backref root 7
item 14 key (112430989312 EXTENT_ITEM 4096) itemoff 3230 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (505416 c 504678) level 0
tree block backref root 256
item 15 key (112431054848 EXTENT_ITEM 4096) itemoff 3179 itemsize 51
extent refs 1 gen 129941 flags 2
tree block key (507404 1 0) level 0
tree block backref root 256
Couldn't find an extent ref for bytenr 112430080000
extent buffer leak: start 112638033920 len 4096
extent buffer leak: start 113980211200 len 4096
extent buffer leak: start 113980215296 len 4096
extent buffer leak: start 113980219392 len 4096
Finding fs roots
Checking fs roots
Checking root 5
Checking root 5 refs
Checking root 256
Checking root 256 refs
writing out a block [x30]

mount output:
mount : /dev/sda6 : can't read superblock

Relevant dmesg excerpt:
[439452.065960] device fsid 4b47fd52c3425d6f-1bdd64bf6b75ec88 devid 1 transid 131253 /dev/sda6
[439453.645594] parent transid verify failed on 114248884224 wanted 131253 found 131241
[439453.645971] parent transid verify failed on 114248884224 wanted 131253 found 131241
[439453.650210] parent transid verify failed on 114248884224 wanted 131253 found 131241
[439453.650226] parent transid verify failed on 114248884224 wanted 131253 found 131241
[439453.666257] btrfs warning page private not zero on page 29380608
[439453.671747] btrfs: open_ctree failed

I have a patch for that, I just need to dig it out and clean it up, I'll attach it shortly.

Actually that won't help you, can you re-run the repair with -d and see if it complains the same way?

It aborts now... :-)
Program received signal SIGABRT, Aborted.
0x00000035dd6352d5 in __GI_raise (sig=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
(gdb) ba
#0 0x00000035dd6352d5 in __GI_raise (sig=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00000035dd636beb in __GI_abort () at abort.c:93
#2 0x00000035dd62dc5e in __assert_fail_base (fmt=<optimized out>,
assertion=0x415af5 "!(!root->node)", file=0x4159fc "disk-io.c",
line=<optimized out>, function=<optimized out>) at assert.c:96
#3 0x00000035dd62dd02 in __GI___assert_fail (
assertion=0x415af5 "!(!root->node)", file=0x4159fc "disk-io.c", line=419,
function=0x415d30 "find_and_setup_root") at assert.c:105
#4 0x00000000004078d0 in find_and_setup_root (tree_root=0x61d010,
fs_info=<optimized out>, objectid=5, root=0x7021f0) at disk-io.c:419
#5 0x0000000000407934 in btrfs_read_fs_root_no_cache (
fs_info=<optimized out>, location=0x7fffffffe08f) at disk-io.c:494
#6 0x0000000000407b3f in btrfs_read_fs_root (fs_info=0x61f180,
location=0x7fffffffe08f) at disk-io.c:564
#7 0x000000000040816f in open_ctree_fd (fp=7,
path=0x2003f <Address 0x2003f out of bounds>, sb_bytenr=<optimized out>,
writes=<optimized out>) at disk-io.c:769
#8 0x00000000004081ff in open_ctree (filename=0x7fffffffe59a "/dev/sda6",
sb_bytenr=0, writes=1) at disk-io.c:590
#9 0x00000000004013c8 in main (argc=3, argv=0x7fffffffe278) at repair.c:937
Alright give that a whirl, again with -d to make sure it's working. Then I'll figure out a way to unscrew the fs from there.

Created attachment 525645
repair output
It prints a lot of warnings, and seems to enter an infinite loop. I stopped it in gdb a few times, waiting a minute or two between each stop, to check, and the pointers seem to be the same (see the end of the log). There was no disk activity, and CPU usage was at 100%.
Ok so before I screw your file system up any more than I already have, I've written a basic restore program that will go through and dump out all of your data into a directory. So pull from the git tree and run
./restore /your/dev /some/dir
and sit back and relax. This should work because it seems like your fs trees are a-ok, it's just your extent tree that's broken. If you mounted with compress=lzo at all then let me know because I only added zlib support, but it will error out if it runs into anything it can't handle. Let me know how that goes.

Ah, great! That will make both of us more relaxed...
I've run it for about one hour (and 30 min CPU time), and while it was quite fast at the beginning (3GB in a few minutes), it seems to have stalled, eating 100% CPU and making no progress (the file count doesn't increase). There was a warning at the beginning:
parent transid verify failed on 114248884224 wanted 131253 found 131241
But that is probably unrelated, since it appeared while files were actually being copied. Now it seems to be working very hard on a single file, always at the same position, with slightly different parameters:
Program received signal SIGINT, Interrupt.
0x000000000040fbb8 in read_extent_buffer (eb=<optimized out>,
dst=0x7fff36b7866f, start=226, len=16) at /usr/include/bits/string3.h:52
52 return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
(gdb) ba
#0 0x000000000040fbb8 in read_extent_buffer (eb=<optimized out>,
dst=0x7fff36b7866f, start=226, len=16) at /usr/include/bits/string3.h:52
#1 0x0000000000401b52 in btrfs_item_key (nr=<optimized out>,
disk_key=0x7fff36b7866f, eb=0x42c1190) at ctree.h:1321
#2 btrfs_item_key_to_cpu (nr=<optimized out>, key=read_sleb128: Corrupted DWARF expression.
) at ctree.h:1398
#3 copy_file (key=0x7fff36b7865e, fd=3, root=0x25381f0) at restore.c:263
#4 search_dir (root=0x25381f0, key=0x7fff36b7884e,
dir=0x3f41230 "/media/WD Passport/Milan/home-sda6/invite/.cache/dconf")
at restore.c:378
#5 0x0000000000401ffc in search_dir (root=0x25381f0, key=0x7fff36b78a3e,
dir=0x3f47670 "/media/WD Passport/Milan/home-sda6/invite/.cache")
at restore.c:427
#6 0x0000000000401ffc in search_dir (root=0x25381f0, key=0x7fff36b78c2e,
dir=0x253c550 "/media/WD Passport/Milan/home-sda6/invite") at restore.c:427
#7 0x0000000000401ffc in search_dir (root=0x25381f0, key=0x7fff36b78d2f,
dir=0x7fff36b78caf "/media/WD Passport/Milan/home-sda6") at restore.c:427
#8 0x0000000000401588 in main (argc=<optimized out>, argv=0x7fff36b78e38)
at restore.c:501
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
copy_file (key=0x7fff36b7865e, fd=3, root=0x25381f0) at restore.c:264
264 if (found_key.objectid != key->objectid)
(gdb) ba
#0 copy_file (key=0x7fff36b7865e, fd=3, root=0x25381f0) at restore.c:264
#1 search_dir (root=0x25381f0, key=0x7fff36b7884e,
dir=0x3f41230 "/media/WD Passport/Milan/home-sda6/invite/.cache/dconf")
at restore.c:378
#2 0x0000000000401ffc in search_dir (root=0x25381f0, key=0x7fff36b78a3e,
dir=0x3f47670 "/media/WD Passport/Milan/home-sda6/invite/.cache")
at restore.c:427
#3 0x0000000000401ffc in search_dir (root=0x25381f0, key=0x7fff36b78c2e,
dir=0x253c550 "/media/WD Passport/Milan/home-sda6/invite") at restore.c:427
#4 0x0000000000401ffc in search_dir (root=0x25381f0, key=0x7fff36b78d2f,
dir=0x7fff36b78caf "/media/WD Passport/Milan/home-sda6") at restore.c:427
#5 0x0000000000401588 in main (argc=<optimized out>, argv=0x7fff36b78e38)
at restore.c:501
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x000000000040fbb8 in read_extent_buffer (eb=<optimized out>,
dst=0x7fff36b7866f, start=226, len=4) at /usr/include/bits/string3.h:52
52 return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
(gdb) ba
#0 0x000000000040fbb8 in read_extent_buffer (eb=<optimized out>,
dst=0x7fff36b7866f, start=226, len=4) at /usr/include/bits/string3.h:52
#1 0x0000000000401b52 in btrfs_item_key (nr=<optimized out>,
disk_key=0x7fff36b7866f, eb=0x42c1190) at ctree.h:1321
#2 btrfs_item_key_to_cpu (nr=<optimized out>, key=read_sleb128: Corrupted DWARF expression.
) at ctree.h:1398
#3 copy_file (key=0x7fff36b7865e, fd=3, root=0x25381f0) at restore.c:263
#4 search_dir (root=0x25381f0, key=0x7fff36b7884e,
dir=0x3f41230 "/media/WD Passport/Milan/home-sda6/invite/.cache/dconf")
at restore.c:378
#5 0x0000000000401ffc in search_dir (root=0x25381f0, key=0x7fff36b78a3e,
dir=0x3f47670 "/media/WD Passport/Milan/home-sda6/invite/.cache")
at restore.c:427
#6 0x0000000000401ffc in search_dir (root=0x25381f0, key=0x7fff36b78c2e,
dir=0x253c550 "/media/WD Passport/Milan/home-sda6/invite") at restore.c:427
#7 0x0000000000401ffc in search_dir (root=0x25381f0, key=0x7fff36b78d2f,
dir=0x7fff36b78caf "/media/WD Passport/Milan/home-sda6") at restore.c:427
#8 0x0000000000401588 in main (argc=<optimized out>, argv=0x7fff36b78e38)
at restore.c:501
A couple of shots in the dark: I had a bug where hitting a prealloced extent would just loop forever, so hopefully that's it; otherwise there is a problem getting the leaf node and we're not getting an error back. Give that a whirl and let me know how it works. If it does the same thing, each time you stop it do
p path->nodes[0]
p leaf
p path->slots[0]
so I can get an idea if it's looping on the same slot or not. Thanks.

Yes, it works. :-) But it stopped later with an error:
Error mkdiring /media/WD Passport/Milan/home-sda6/milan.sav/.local/share/evolution/mail/imap/nalimilan@imap.sfr.fr.old/folders/`"O�: 84
errno 84 is "Invalid or incomplete multibyte or wide character". It seems that the path includes weird chars coming from elsewhere... If that helps, the dir /media/WD Passport/Milan/home-sda6/milan.sav/ is still completely empty. I can actually skip that dir if needed, as it's of absolutely no interest. And there was also a note about a snapshot not being copied, but that snapshot actually contains the interesting data... :-/

Ok so I had a -s option for restoring snapshots but it was broken. I've fixed it, so run
./restore -is /dev/whatever /mnt/wherever
This will make restore ignore errors: it will still complain about them but it will keep going and try to restore other things. So in the case of your file it will just move on to the next file. The -s option will restore snapshots. This means that if that snapshot has links to any of your other snapshots or subvolumes you are going to end up with duplicates of stuff restored, so use with caution.

It copied about 80GB, but now it's progressing at the pace of a few MB per hour. It's still running, but if the trace can help, it's:
#0 0x00000035dd6d18a3 in __pread_nocancel ()
at ../sysdeps/unix/syscall-template.S:82
#1 0x0000000000401e14 in pread (__offset=<optimized out>,
__nbytes=<optimized out>, __buf=0x7fb2a4bb0010, __fd=4)
at /usr/include/bits/unistd.h:100
#2 copy_one_extent (pos=35913728, fi=<optimized out>, leaf=0x2d177d0, fd=3,
root=0x1afccd0) at restore.c:184
#3 copy_file (key=0x7fff902de27e, fd=3, root=0x1afccd0) at restore.c:302
#4 search_dir (root=0x1afccd0, key=0x7fff902de46e,
dir=0x1b1ffd0 "/media/WD Passport/Milan/home-sda6/home.snapshot/milan/P2P/Fedora-15-Beta-x86_64-Live-Desktop") at restore.c:398
#5 0x00000000004020ed in search_dir (root=0x1afccd0, key=0x7fff902de65e,
dir=0x34f7230 "/media/WD Passport/Milan/home-sda6/home.snapshot/milan/P2P")
at restore.c:466
#6 0x00000000004020ed in search_dir (root=0x1afccd0, key=0x7fff902de84e,
dir=0x34fd670 "/media/WD Passport/Milan/home-sda6/home.snapshot/milan")
at restore.c:466
#7 0x00000000004020ed in search_dir (root=0x1afccd0, key=0x7fff902dea3e,
dir=0x1af2550 "/media/WD Passport/Milan/home-sda6/home.snapshot")
at restore.c:466
#8 0x00000000004020ed in search_dir (root=0x1aee1f0, key=0x7fff902deb3f,
dir=0x7fff902deabf "/media/WD Passport/Milan/home-sda6") at restore.c:466
#9 0x00000000004015a8 in main (argc=<optimized out>, argv=0x7fff902dec48)
And five minutes later it was:
#0 0x00000035dd6d18a3 in __pread_nocancel ()
at ../sysdeps/unix/syscall-template.S:82
#1 0x0000000000401e14 in pread (__offset=<optimized out>,
__nbytes=<optimized out>, __buf=0x7fb2a4bb0010, __fd=4)
at /usr/include/bits/unistd.h:100
#2 copy_one_extent (pos=37486592, fi=<optimized out>, leaf=0x2d177d0, fd=3,
root=0x1afccd0) at restore.c:184
#3 copy_file (key=0x7fff902de27e, fd=3, root=0x1afccd0) at restore.c:302
#4 search_dir (root=0x1afccd0, key=0x7fff902de46e,
dir=0x1b1ffd0 "/media/WD Passport/Milan/home-sda6/home.snapshot/milan/P2P/Fedora-15-Beta-x86_64-Live-Desktop") at restore.c:398
If I had known that stale ISO would be so annoying! ;-) Looks like it doesn't like big files, but it will eventually succeed I guess.
Did you happen to download that ISO via a torrent? Some torrent programs don't preallocate the space for what they're downloading, so the files end up super fragmented, which is going to _suck_ for this restore program, since it looks for an extent, allocates a buffer the size of the extent, reads in the data, writes it out, and moves on to the next extent. So in the worst case you have a 650 MB file that's broken up into 160k extents, and if it takes half a second to deal with each extent, you are looking at about 24 hours :(.

That's it. I downloaded it via Transmission, and there are probably other files in that folder too. Now it's at pos=69M, which took a few hours to reach. :-/ Maybe I can trick it from gdb to skip that folder?

OK, I've added a little hack to skip these files, let's see how it goes...

Ah good, sorry I didn't see this till this morning. Let me know if it doesn't work; I'm thinking about adding a timer that will pause after say 5 minutes on the same file and ask if the user wants to skip it.

Yes, that could be useful in the future. Anyway, I think I have been able to back up everything that was needed, so you can go on destroying my filesystem now. ;-)

Ping? I'd really like to get this 400GB partition (4/5 of my hard disk) usable again... :-)

Yup sorry, this restore program has been hugely popular so I've had my attention on a bunch of users who were trying to get it to work for them. That's all winding down now so I'll get back to work on the repair program.

Glad to know the tool is useful to others. btrfs will end up with the most resistant repair tool on the market! Oh, and if you plan to further improve the restore tool, one useful feature would be to preserve [mca]time. This is definitely valuable, especially when you do incremental backups. (End of wishlist... ;-)

I'm not sure if I can do that but I will definitely look into it.
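The worst-case estimate given earlier for the fragmented ISO can be checked with a few lines of arithmetic. This is a rough sketch only: the 4096-byte extent size is an assumption (the thread only says "160k extents"), the half second per extent is the figure quoted in the comment, and the function name is hypothetical.

```python
def worst_case_restore(file_bytes, extent_bytes, seconds_per_extent):
    """Return (extent count, hours) if every extent is handled one by one."""
    extents = -(-file_bytes // extent_bytes)  # ceiling division
    hours = extents * seconds_per_extent / 3600
    return extents, hours


# 650 MB file, assumed to be split into single 4096-byte extents,
# with 0.5 s spent on each extent.
extents, hours = worst_case_restore(650 * 1024 * 1024, 4096, 0.5)
print(extents, round(hours, 1))  # 166400 23.1
```

With these inputs the count lands near the quoted 160k extents and the total near the quoted 24 hours, which is consistent with the slow progress seen in the traces above.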
I'm going to be in Prague next week for kernel summit but when I get back I will refocus on the repair tool, I got caught up again helping somebody with the restore tool.

Today, I noticed some restored files are somewhat corrupt. For example, a source file had a few hundred \00 bytes at its end (the rest of the contents were OK). Any idea what to do about that?

Yeah sorry, Chris found a problem where I needed to truncate the file to its actual size, so delete your copy of my tree and repull; it will have the new fixes. Re-run the restore and it should give you non-corrupt files.

Looks good now! git is a pretty good data consistency checking tool. ;-) Are you still interested in fixing my partition? If you can't find the time, or think it would be a waste of time at this point, I can just format it and put the restored files on it. If there was a way to restore the timestamps, it would be perfect!

OK, I just wiped that broken partition, because I couldn't live in the few remaining GB (this prevented me from upgrading to F16). Thanks for the kind help you provided, at least I could get back all of my files. :-) Hope the original problem is gone... |
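The trailing \00 bytes reported above, and the fix of truncating each file to its actual size, can be illustrated with a small sketch. It assumes the padding came from writing whole blocks for the last extent, which the thread does not confirm; `BLOCK_SIZE`, `restore_file`, and the sample data are hypothetical and this is not the restore tool's code.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed; the thread does not give the block size


def restore_file(path, extents, inode_size):
    """Write each extent padded to whole blocks, then trim to the inode size."""
    with open(path, "wb") as out:
        for data in extents:
            padded_len = -(-len(data) // BLOCK_SIZE) * BLOCK_SIZE
            out.write(data.ljust(padded_len, b"\0"))
        # Without this call the file keeps the zero padding of the last block.
        out.truncate(inode_size)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "source.c")
    restore_file(target, [b"x" * BLOCK_SIZE, b"int main;\n"], BLOCK_SIZE + 10)
    print(os.path.getsize(target))  # 4106
```

Dropping the `truncate` call in this sketch leaves a file of 8192 bytes whose tail is all zero bytes, which matches the symptom described.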