Bug 701027 - [abrt] kernel: kernel BUG at fs/btrfs/extent-tree.c:1401!: TAINTED Die
Summary: [abrt] kernel: kernel BUG at fs/btrfs/extent-tree.c:1401!: TAINTED Die
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 15
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL: https://bugzilla.kernel.org/show_bug....
Whiteboard: abrt_hash:0b414105677dfec81abc94555f7...
Depends On:
Blocks:
 
Reported: 2011-04-30 13:00 UTC by Milan Bouchet-Valat
Modified: 2011-12-31 11:37 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-31 11:37:56 UTC
Type: ---


Attachments
File: backtrace (4.06 KB, text/plain)
2011-04-30 13:00 UTC, Milan Bouchet-Valat
debug patch (474 bytes, patch)
2011-05-18 18:48 UTC, Josef Bacik
Relevant output from dmesg with first patch (3.69 KB, text/plain)
2011-05-19 15:25 UTC, Milan Bouchet-Valat
Another debug patch (499 bytes, patch)
2011-05-20 13:46 UTC, Josef Bacik
Backtrace with second patch (5.48 KB, text/plain)
2011-05-21 08:36 UTC, Milan Bouchet-Valat
Yet another debug patch (887 bytes, patch)
2011-05-23 16:38 UTC, Josef Bacik
/var/log/messages with third patch (120.70 KB, text/plain)
2011-05-23 22:58 UTC, Milan Bouchet-Valat
More debugging (2.33 KB, patch)
2011-05-24 17:06 UTC, Josef Bacik
/var/log/messages with fourth patch (97.48 KB, text/plain)
2011-05-24 18:59 UTC, Milan Bouchet-Valat
Repair program (6.13 KB, patch)
2011-06-29 15:53 UTC, Josef Bacik
Fixed repair program (6.11 KB, patch)
2011-06-29 15:57 UTC, Josef Bacik
Dmesg for new crashes (20.09 KB, text/plain)
2011-08-13 15:56 UTC, Milan Bouchet-Valat
An update repair patch (6.38 KB, patch)
2011-08-16 18:34 UTC, Josef Bacik
Incremental patch (474 bytes, patch)
2011-08-17 12:41 UTC, Josef Bacik
Output of repair with patch from comment #32 (7.89 KB, text/plain)
2011-08-17 18:48 UTC, Milan Bouchet-Valat
Dmesg for remaining crashes (patched kernel) (4.28 KB, text/plain)
2011-08-19 19:02 UTC, Milan Bouchet-Valat
New and improved repair tool (24.14 KB, patch)
2011-08-29 14:22 UTC, Josef Bacik
Next iteration of the repair tool (26.83 KB, patch)
2011-09-02 14:02 UTC, Josef Bacik
A new repair program (28.99 KB, patch)
2011-09-16 20:55 UTC, Josef Bacik
repair output (101.11 KB, text/plain)
2011-09-29 21:01 UTC, Milan Bouchet-Valat

Description Milan Bouchet-Valat 2011-04-30 13:00:54 UTC
abrt version: 2.0.1
cmdline: BOOT_IMAGE=/boot/vmlinuz-2.6.38.2-9.fc15.x86_64 root=/dev/sda7
component: kernel
reported_to: kerneloops: URL=http://submit.kerneloops.org/submitoops.php
kernel_tainted: 129
kernel: 2.6.38.2-9.fc15.x86_64
reason: [129170.603442] kernel BUG at fs/btrfs/extent-tree.c:1401!
architecture: x86_64
package: kernel
os_release: Fedora release 15 (Lovelock)
time: 1304165324

Text file: backtrace, 4162 bytes

comment
-----
Reproducible, happens a few seconds after starting Firefox. My /home/ is on a BTRFS partition, and my home folder is specifically on a subvolume of it.

It didn't happen until I installed the Flash plugin, and it stopped happening when I removed nspluginwrapper, but I suspect Flash itself isn't the cause.

event_log
-----
2011-04-30-14:48:54> Submitting oops report to http://submit.kerneloops.org/submitoops.php
2011-04-30-14:48:56  Kernel oops report was uploaded

Comment 1 Milan Bouchet-Valat 2011-04-30 13:00:58 UTC
Created attachment 495963 [details]
File: backtrace

Comment 2 Milan Bouchet-Valat 2011-04-30 13:37:38 UTC
Very funny: I installed the experimental 64-bit Flash plugin, and the oops reappeared when loading a page with a Flash video. I removed /usr/lib64/mozilla/plugins/libflashplayer.so, and it stopped.

I could also reproduce it under Ubuntu 10.10, kernel 2.6.37 (still Firefox+Flash).

Comment 3 Chuck Ebbert 2011-05-03 02:05:52 UTC
[129170.603442] kernel BUG at fs/btrfs/extent-tree.c:1401!

RIP: 0010:[<ffffffffa049414b>]  [<ffffffffa049414b>] lookup_inline_extent_backref+0xa4/0x31e [btrfs]

        ret = btrfs_search_slot(trans, root, &key, path, extra_size, 1);
        if (ret < 0) {
                err = ret;
                goto out;
        }
        BUG_ON(ret);

The BUG_ON fired because ret == 1, i.e. the search did not find the key.

Comment 4 Milan Bouchet-Valat 2011-05-03 13:34:11 UTC
Given it also happens in Ubuntu, I've filed it upstream as
https://bugzilla.kernel.org/show_bug.cgi?id=34292

Comment 5 Josef Bacik 2011-05-03 14:23:02 UTC
OK, since you can reproduce it, will you pull from my tree

git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work.git

and build it to see if it reproduces there?  If it does (it still should, I
don't think we've fixed this), I can give you debug patches to try to figure
out what's going on.  Thanks.

Comment 6 Milan Bouchet-Valat 2011-05-03 20:42:46 UTC
OK, I've tried it and it still crashes. So I'm eager to test your patches! ;-)

Comment 7 Milan Bouchet-Valat 2011-05-17 18:49:59 UTC
Ping! :-)

I'd really like to test these patches quickly so that I can get rid of the bug by removing those files - not being able to use Flash is painful (videos...).

Comment 8 Josef Bacik 2011-05-17 20:09:53 UTC
Right sorry, you are the first thing on my list tomorrow morning.

Comment 9 Josef Bacik 2011-05-18 18:48:27 UTC
Created attachment 499666 [details]
debug patch

OK, apply this patch and run it.  It will still panic; just attach dmesg so I can see what's going on.

Comment 10 Milan Bouchet-Valat 2011-05-19 15:25:24 UTC
Created attachment 499873 [details]
Relevant output from dmesg with first patch

Wow! I didn't think it would cost me so much. I've been fighting with my wireless for more than two hours: for some reason, the kernel crash disabled it, and neither the switch nor rfkill would bring it back up. I tried everything until I found out that the ugly HP tool under Windows had a button to turn it back on.

But eventually I'm back here, with the logs. Hope it really helps! ;-)

Comment 11 Josef Bacik 2011-05-20 13:46:15 UTC
Created attachment 500059 [details]
Another debug patch

Well, that's weird: we're trying to modify byte 0, which really shouldn't ever happen. This patch will catch whoever is trying to do it. It will panic your box again, so please send me the dmesg again after running it.

Comment 12 Milan Bouchet-Valat 2011-05-21 08:36:42 UTC
Created attachment 500174 [details]
Backtrace with second patch

Not sure this is what you want, as I can't find where the new information is... ;-)

Anyway, if that's not correct, just ask and I'll retry.

Comment 13 Josef Bacik 2011-05-23 16:38:16 UTC
Created attachment 500461 [details]
Yet another debug patch

Here's another one.  The last one was perfect; it looks like this isn't necessarily a bug in the delayed ref code but probably some sort of disk corruption.  This debug patch will be a bit more verbose, but again it will still panic.  I'll need the entire output, because I'll print out a bunch of stuff before it panics.

Comment 14 Milan Bouchet-Valat 2011-05-23 22:58:58 UTC
Created attachment 500522 [details]
/var/log/messages with third patch

Here's the new delivery... ;-)

Comment 15 Josef Bacik 2011-05-24 17:06:48 UTC
Created attachment 500659 [details]
More debugging

Hrm, sorry, it looks like it's a node that's corrupt, not a leaf, so this will print out the right information and won't cause 10000 different panics, just the normal one we expect.

Comment 16 Milan Bouchet-Valat 2011-05-24 18:59:17 UTC
Created attachment 500674 [details]
/var/log/messages with fourth patch

Just need to ask!

Comment 17 Josef Bacik 2011-05-24 20:53:51 UTC
key 41 (72537 84 1075557500) block 72660422656
key 42 (72537 84 4135981418) block 1061535744
key 43 (72537 96 54) block 0
key 44 (72537 84 1226029824) block 39102320640

OK, so something went a little sideways here: these keys are not in the right order at all.  Since you aren't getting checksum errors or anything else, and it appears you have barriers enabled, I can't imagine how this happened other than a horrible bug somewhere.  The first thing to do is fix your file system.  Is your entire fs on btrfs, or just your home directory?  Also, has anything happened in the past that could have corrupted your fs?  Please say there was :).

Comment 18 Milan Bouchet-Valat 2011-05-24 21:13:50 UTC
Only /home is btrfs (/ is ext4). I have another btrfs partition that contains an Ubuntu root, but it's not even mounted.

I've found something weird in the 'mount' output though: this line is duplicated:
/dev/sda6 on /home type btrfs (rw,relatime,seclabel)

My /etc/fstab line for it is
UUID=6f5d42c3-52fd-474b-88ec-756bbf64dd1b /home btrfs defaults,subvol=home.snapshot 1 2

FWIW, the btrfs partition originally contained my /home in the root subvolume, and I created a snapshot of it when moving to Fedora 15, to avoid messing with my old settings in case I had to switch back.

I'm not aware of a special breakage that should have corrupted my partition. It was created during the clean install of Ubuntu 10.10 back in March, and I've never had problems until I did that snapshot and got this bug. Maybe I experienced a few kernel crashes, I don't precisely remember, but I can tell you I didn't touch that partition when installing Fedora 15 afterwards (no resize, etc.).

Ah, maybe one explanation: Anaconda crashed badly when trying to resize another partition and left my partition table completely destroyed, so I had to recover it using TestDisk. That went really well, and I recovered all my old partitions. AFAIK, this shouldn't have affected the actual content of the partitions at all. Is it possible that the partition was recreated with an offset of a few blocks, still mountable but leading to weird bugs?

Comment 19 Josef Bacik 2011-05-25 00:32:46 UTC
So I don't think this is accidental corruption, it really feels like we have a bug somewhere.  Unfortunately without knowing how to reproduce it the only thing I can do is review the code and hope the bug jumps out at me.

Now as for your file system, it being separate from your / fs is perfect.  I think I can put it mostly back together, but since your fs is the only one that's broken like this, I only have one shot to get it right and not make it a lot worse :).  So I need you to clone this git tree

git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git

and run

make btrfs-image

and then make sure your /home isn't mounted, and run (as root of course)

./btrfs-image -c 9 -t <number of cpus you have> /dev/sda6 home.img

and then upload home.img somewhere that I can pull it down.  Don't worry it's just a metadata dump, I will only have data that is small enough to fit into a leaf and file names.  I need this so I can write a program to put your tree back together properly, this will give me something to run my program against so I can make sure it does the right thing.

Now if you don't feel comfortable giving me this, the other option is to back up everything except what's in your mozilla folder (I think that's where the corruption is), and then I will give you a program to fix it and hope it doesn't make things worse.

Comment 20 Milan Bouchet-Valat 2011-05-25 15:18:48 UTC
I'm sending you an e-mail with the link to the image. Do you think there's a chance you'll find the bug, or just fix the partition?

Comment 21 Josef Bacik 2011-05-25 18:06:32 UTC
I will try and find the bug, but I'm going to focus on fixing your partition first.

Comment 22 Milan Bouchet-Valat 2011-06-08 16:25:20 UTC
Any news on that? Can you just help me to remove the guilty directory? ;-)

Comment 23 Josef Bacik 2011-06-29 15:53:05 UTC
Created attachment 510495 [details]
Repair program

Sorry about that, I got distracted with other things :).  So I can't test this on your image because of the way we create images.  So you can

a) trust that I didn't make any mistakes and just run this on your fs
b) take backups first

Once you choose one of those, clone the btrfs-progs-unstable git tree from git.kernel.org, apply this patch, and run make.  Once you do that you can run

./repair /your/device

Note your disk has to be unmounted, so you may have to boot into rescue mode to do this.  Once it's done you should be good to go.

Comment 24 Josef Bacik 2011-06-29 15:57:47 UTC
Created attachment 510496 [details]
Fixed repair program

Hah, see, I did make a mistake, so really go for option b :).

Comment 25 Milan Bouchet-Valat 2011-07-05 09:29:16 UTC
Sorry, doesn't seem to work... :-/ Thanks for your work!

Program received signal SIGABRT, Aborted.
0x00000034b4a352d5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
#0  0x00000034b4a352d5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00000034b4a36beb in abort () at abort.c:93
#2  0x00000034b4a2dc5e in __assert_fail_base (fmt=<optimized out>, assertion=0x4139d1 "!(ret)", file=0x414495 "extent-tree.c", line=<optimized out>, 
    function=<optimized out>) at assert.c:96
#3  0x00000034b4a2dd02 in __assert_fail (assertion=0x4139d1 "!(ret)", file=0x414495 "extent-tree.c", line=1042, function=0x414960 "lookup_inline_extent_backref")
    at assert.c:105
#4  0x0000000000407f60 in lookup_inline_extent_backref (trans=<optimized out>, root=0x61b1b0, path=0x1c293d0, ref_ret=0x7fffffffe1e0, bytenr=0, 
    num_bytes=<optimized out>, parent=71470424064, root_objectid=5, owner=0, offset=0, insert=1) at extent-tree.c:1062
#5  0x00000000004095d8 in insert_inline_extent_backref (offset=0, owner=0, root_objectid=5, parent=71470424064, num_bytes=4096, bytenr=0, path=0x1c293d0, root=0x61b1b0, 
    trans=0x6f7850, refs_to_add=<optimized out>) at extent-tree.c:1308
#6  btrfs_inc_extent_ref (trans=0x6f7850, root=0x6f6640, bytenr=0, num_bytes=4096, parent=71470424064, root_objectid=5, owner=0, offset=0) at extent-tree.c:1382
#7  0x0000000000407abd in __btrfs_mod_ref (trans=0x6f7850, root=0x6f6640, buf=0x17f44b0, record_parent=<optimized out>, inc=<optimized out>) at extent-tree.c:1602
#8  0x000000000040270f in update_ref_for_cow (cow=0x1b1f5e0, buf=0x17f44b0, root=0x6f6640, trans=0x6f7850) at ctree.c:217
#9  __btrfs_cow_block (trans=0x6f7850, root=0x6f6640, buf=0x17f44b0, parent=0x17e49c0, parent_slot=3, cow_ret=0x7fffffffe430, search_start=70866960384, empty_size=0)
    at ctree.c:305
#10 0x00000000004029e7 in btrfs_cow_block (trans=<optimized out>, root=<optimized out>, buf=<optimized out>, parent=<optimized out>, parent_slot=<optimized out>, 
    cow_ret=<optimized out>) at ctree.c:371
#11 0x00000000004038b4 in btrfs_search_slot (trans=0x6f7850, root=0x6f6640, key=0x7fffffffe4de, p=0x61b920, ins_len=-1, cow=1) at ctree.c:1214
#12 0x000000000040145f in main (argc=<optimized out>, argv=<optimized out>) at repair.c:81

Comment 26 Josef Bacik 2011-07-05 13:15:44 UTC
Heh, great: while trying to fix the problem, it trips over the corrupt area and blows up.  I'm on vacation this week, but I will try to get to this tonight and just rig up a manual search that won't actually blow up.

Comment 27 Milan Bouchet-Valat 2011-07-17 08:14:31 UTC
FWIW, the corruption has now extended to other files without any apparent reason: Some Evolution and Firefox config files are now broken too.

One possibly interesting thing is that even the guest user account I was using to work around the problem got corrupted after I crashed the machine: my normal account and the guest account were open at the same time, and the former crashed because of the btrfs corruption. So it's possible that the breakage comes from a kernel crash happening while those files are being written to. Maybe we can find a procedure to test this idea.

Anyway, I'm moving the non-broken files to another partition before it blows up completely. ;-) I'd really like a way to fix the partition, but if you prefer, just give me a way to identify the corrupt files, and I'll copy everything except them, and erase the partition. The big problem ATM is that I cannot do full backups since the system crashes when broken files are read! :-/ The workaround I found is to remove read permissions, but I don't know exactly what files are affected.

Comment 28 Milan Bouchet-Valat 2011-07-21 16:54:49 UTC
The crash I'm getting when trying to copy the files that broke recently (cf. comment #27) is different from the original one. See the attached excerpt from dmesg. There are four different successive traces:
WARNING: at fs/btrfs/ctree.c:2297 leaf_space_used+0x6f/0x7f [btrfs]()
WARNING: at fs/btrfs/ctree.c:2297 leaf_space_used+0x6f/0x7f [btrfs]()
BUG: scheduling while atomic: cp/1447/0x10000001
BUG: sleeping function called from invalid context at kernel/rwsem.c:21

The partition has become so broken now that I can't copy most toplevel directories from my home dir, they all contain a few corrupt files that I couldn't identify.

Comment 29 Milan Bouchet-Valat 2011-08-13 15:56:52 UTC
Created attachment 518152 [details]
Dmesg for new crashes

Ping? Would you give me a clue about how to identify the corrupt files? I really need to remove that partition that takes up all of my disk space... ;-)

I'm reattaching the trace I spoke about in my last comment, looks like it didn't work last time.

Comment 30 Josef Bacik 2011-08-16 18:34:34 UTC
Created attachment 518553 [details]
An update repair patch

OK, apply this, rebuild, and re-run the repair program.  It will still blow up, but it will spit out what it blew up on so I can figure out how to work around or fix what's broken.

Comment 31 Milan Bouchet-Valat 2011-08-17 11:23:12 UTC
The only output is:
repair: extent-tree.c:1042: lookup_inline_extent_backref: Assertion `!(ret)' failed.

But it doesn't seem to provide more information than before... (I checked I really applied the correct patch, the debugging lines in extent-tree.c are present and I ran make clean to be sure.)

I can run other tests if you need them. Thanks again for your attention!

Comment 32 Josef Bacik 2011-08-17 12:41:19 UTC
Created attachment 518668 [details]
Incremental patch

Just apply this over the top of what you already have, it's just incremental.

Comment 33 Milan Bouchet-Valat 2011-08-17 18:48:04 UTC
Created attachment 518733 [details]
Output of repair with patch from comment #32

Seems to write something interesting now... ;-)

Comment 34 Josef Bacik 2011-08-18 14:33:42 UTC
OK, so I was going to write my own search function to get to what we need without all the safety checks, but I think that if I just use the normal search function, tell it I'm not going to modify the tree, and then modify it anyway, it should work fine.  So please open repair.c and look for the line

        ret = btrfs_search_slot(trans, root, &key, path, -1, 1);

and change it to

        ret = btrfs_search_slot(trans, root, &key, path, 0, 0);

and rebuild and try again.  It's horribly evil, but I think it will work out just fine.

Comment 35 Milan Bouchet-Valat 2011-08-19 08:55:48 UTC
Well, it fails somewhere else:

(gdb) run /dev/sda6
getting key from slot 39
getting key from slot 40
getting key from slot 41
getting key from slot 42
getting key from slot 43
Found node with bytner of 0, deleting
getting key from slot 43
getting key from slot 43
getting key from slot 44
parent transid verify failed on 71470424064 wanted 130995 found 77431

Program received signal SIGABRT, Aborted.
0x0000003f61c352d5 in raise () from /lib64/libc.so.6
(gdb) ba
#0  0x0000003f61c352d5 in raise () from /lib64/libc.so.6
#1  0x0000003f61c36beb in abort () from /lib64/libc.so.6
#2  0x0000000000405d5e in write_tree_block (trans=0x701170, root=0x6fff60, 
    eb=0x702220) at disk-io.c:238
#3  0x0000000000405e90 in __commit_transaction (trans=0x701170, root=0x6fff60)
    at disk-io.c:354
#4  0x0000000000405f85 in btrfs_commit_transaction (trans=0x701170, 
    root=0x6fff60) at disk-io.c:385
#5  0x0000000000401655 in main (argc=<optimized out>, argv=<optimized out>)
    at repair.c:158

Comment 36 Josef Bacik 2011-08-19 13:31:12 UTC
Ah yes, that's just a safety check to make sure we COWed the block we are writing out.  But we know we aren't doing that; we're just modifying the block in place.  Normally this is dangerous, but really your fs is screwed up anyway, how much worse could it get :).  So just go into disk-io.c in write_tree_block and comment out this area

if (!btrfs_buffer_uptodate(eb, trans->transid))
        BUG();

and then you should be a-ok.

Comment 37 Milan Bouchet-Valat 2011-08-19 14:49:08 UTC
OK, now it works, but that doesn't fix the problem... :-/

It says:
./repair /dev/sda6
getting key from slot 39
getting key from slot 40
getting key from slot 41
getting key from slot 42
getting key from slot 43
getting key from slot 44

(There was another line the first time about fixing something, but I lost it after the kernel crash.)

If I run 'find' on the mounted partition, I still get the lookup_inline_extent_backref bug.

Comment 38 Josef Bacik 2011-08-19 15:04:33 UTC
Can you give me the output when you hit that bug, there may be other parts in your fs that are broken and we're just hitting that.  You'll want the patches that I gave you that print out the leaf when it hits that bug in place so I can see whats going on.

Comment 39 Milan Bouchet-Valat 2011-08-19 15:22:27 UTC
Here they are:
Aug 19 17:21:38 milan kernel: [ 2338.818690] Call Trace:
Aug 19 17:21:38 milan kernel: [ 2338.821679]  [<ffffffff811118ab>] ? kmem_cache_alloc+0x90/0x105
Aug 19 17:21:38 milan kernel: [ 2338.824726]  [<ffffffffa00e5774>] __btrfs_free_extent+0xc0/0x55d [btrfs]
Aug 19 17:21:38 milan kernel: [ 2338.827764]  [<ffffffff8146f92d>] ? __slab_free+0x27/0xeb
Aug 19 17:21:38 milan kernel: [ 2338.830829]  [<ffffffffa0126365>] ? btrfs_delayed_ref_lock+0x3f/0x9d [btrfs]
Aug 19 17:21:38 milan kernel: [ 2338.833904]  [<ffffffffa00e8771>] run_clustered_refs+0x615/0x672 [btrfs]
Aug 19 17:21:38 milan kernel: [ 2338.836994]  [<ffffffffa0126400>] ? btrfs_find_ref_cluster+0x3d/0x145 [btrfs]
Aug 19 17:21:38 milan kernel: [ 2338.840088]  [<ffffffffa00e889f>] btrfs_run_delayed_refs+0xd1/0x193 [btrfs]
Aug 19 17:21:38 milan kernel: [ 2338.843184]  [<ffffffffa00f5026>] __btrfs_end_transaction+0x6f/0x1f6 [btrfs]
Aug 19 17:21:38 milan kernel: [ 2338.846283]  [<ffffffffa00f51f0>] btrfs_end_transaction+0x15/0x17 [btrfs]
Aug 19 17:21:38 milan kernel: [ 2338.849396]  [<ffffffffa00fdd0b>] btrfs_dirty_inode+0xfe/0x107 [btrfs]
Aug 19 17:21:38 milan kernel: [ 2338.852479]  [<ffffffff8113eb8c>] __mark_inode_dirty+0x2e/0x167
Aug 19 17:21:38 milan kernel: [ 2338.855551]  [<ffffffff8113483f>] touch_atime+0x10e/0x131
Aug 19 17:21:38 milan kernel: [ 2338.858597]  [<ffffffff8112f5d3>] ? filldir+0x0/0xc7
Aug 19 17:21:38 milan kernel: [ 2338.861639]  [<ffffffff8112f8a8>] vfs_readdir+0x8c/0xac
Aug 19 17:21:38 milan kernel: [ 2338.864675]  [<ffffffff8112f9ae>] sys_getdents+0x7e/0xce
Aug 19 17:21:38 milan kernel: [ 2338.867721]  [<ffffffff81009bc2>] system_call_fastpath+0x16/0x1b

Comment 40 Milan Bouchet-Valat 2011-08-19 19:02:51 UTC
Created attachment 519091 [details]
Dmesg for remaining crashes (patched kernel)

Ah, sorry, I didn't read your comment carefully enough. Here's the output you requested, obtained with the patched kernel.

Comment 41 Josef Bacik 2011-08-29 14:22:28 UTC
Created attachment 520409 [details]
New and improved repair tool

OK, so I spent all last week writing a more comprehensive repair tool for a different bug, and I'd like you to run it.  Please run it with -d, which means dry-run, and attach the output to this bz so I can see what it's going to do; then I can decide whether it's safe to let it try to fix stuff for you :).

Comment 42 Milan Bouchet-Valat 2011-08-30 21:20:57 UTC
Nice, another one, and better! ;-)

It crashes after running for a while:

Starting program: /home/milan/Dev/btrfs-progs-unstable/repair /dev/sda6 -d
Checking extent root
Bad item end value, attempting to fix
Bad item end value, attempting to fix
Neighbor is bad too, will come back and try again
leaf 112747028480 items 18 free space 2627 generation 130677 owner 2
fs uuid 6f5d42c3-52fd-474b-88ec-756bbf64dd1b
chunk uuid f7373b86-6f91-4c96-b4e2-df5859a92c1e
	item 0 key (112428765184 EXTENT_ITEM 4096) itemoff 3944 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (1413240 6c 17547264) level 0
		tree block backref root 256
	item 1 key (112428822528 EXTENT_ITEM 4096) itemoff 3893 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98660622336 a8 8192) level 0
		tree block backref root 2
	item 2 key (112429154304 EXTENT_ITEM 4096) itemoff 3842 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98152087552 a8 4096) level 0
		tree block backref root 2
	item 3 key (112429563904 EXTENT_ITEM 4096) itemoff 3791 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (104854753280 a8 8192) level 0
		tree block backref root 2
	item 4 key (112429604864 EXTENT_ITEM 4096) itemoff 3740 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98147524608 a8 8192) level 0
		tree block backref root 2
	item 5 key (112430071808 EXTENT_ITEM 12297828649465286656) itemoff -1431655120 itemsize -1431655936

Program received signal SIGSEGV, Segmentation fault.
print_extent_item (slot=5, eb=0x1e3e6d0) at print-tree.c:190
190		flags = btrfs_extent_flags(eb, ei);
Missing separate debuginfos, use: debuginfo-install glibc-2.14-4.x86_64 libuuid-2.19.1-1.4.fc15.x86_64
(gdb) ba
#0  print_extent_item (slot=5, eb=0x1e3e6d0) at print-tree.c:190
#1  btrfs_print_leaf (root=<optimized out>, l=0x1e3e6d0) at print-tree.c:529
#2  0x0000000000402429 in check_leaf (path=0x61c920, root=0x61c1b0)
    at repair.c:530
#3  check_children (root=0x61c1b0, path=0x61c920, level=1) at repair.c:596
#4  0x000000000040219f in check_children (root=0x61c1b0, path=0x61c920, 
    level=2) at repair.c:590
#5  0x000000000040219f in check_children (root=0x61c1b0, path=0x61c920, 
    level=3) at repair.c:590
#6  0x0000000000401572 in main (argc=<optimized out>, argv=<optimized out>)
    at repair.c:778

Comment 43 Josef Bacik 2011-08-31 13:02:42 UTC
Argh, OK, I finally got the image stuff working, so I can run the repair tool locally against an image (at least it appears to be working anyway).  I cannot find the image you sent me originally anywhere; can you recreate the image and put it somewhere for me to pull down, so I can run the repair tool and make sure it's working right?

Comment 44 Milan Bouchet-Valat 2011-09-01 10:57:34 UTC
As I said in comment 27, the partition is now in an even worse state than before. I can't even run btrfs-image now; it crashes:
Program received signal SIGSEGV, Segmentation fault.
create_metadump (compress_level=<optimized out>, num_threads=<optimized out>, 
    out=0x61d010, input=<optimized out>) at btrfs-image.c:537
537				if (btrfs_extent_flags(leaf, ei) &
Missing separate debuginfos, use: debuginfo-install glibc-2.14-4.x86_64 libuuid-2.19.1-1.4.fc15.x86_64
(gdb) ba
#0  create_metadump (compress_level=<optimized out>, 
    num_threads=<optimized out>, out=0x61d010, input=<optimized out>)
    at btrfs-image.c:537
#1  main (argc=<optimized out>, argv=<optimized out>) at btrfs-image.c:875

Comment 45 Josef Bacik 2011-09-02 14:02:04 UTC
Created attachment 521232 [details]
Next iteration of the repair tool

Ugh sorry about that.  Here's a new repair tool that should only print out the bad parts and hopefully not segfault.  Make sure to run it with -d, once I get a full look at everything that's wrong I will double check all of my code and then you can run without the -d and hopefully it will fix everything.

Comment 46 Milan Bouchet-Valat 2011-09-02 15:08:19 UTC
No need to feel sorry... Anyway, now it works:

Bad item end value, attempting to fix
Bad item end value, attempting to fix
Neighbor is bad too, will come back and try again
Bad item end value, attempting to fix
Bad item end value, attempting to fix
Previous neighbor is bad, will come back and try again later
Bad item end value, attempting to fix
Bad item end value, attempting to fix
Neighbor is bad too, will come back and try again
Bad item end value, attempting to fix
Bad item end value, attempting to fix
Previous neighbor is bad, will come back and try again later
Couldn't fixup leaf
Checking extent root
bytenr 112747028480 item 5 key (112430071808 EXTENT_ITEM 12297828649465286656) itemoff -1431655120 itemsize -1431655936
bytenr 112747028480 item 6 key (12297828652328593920 UNKNOWN 48038393173158570) itemoff 11185626 itemsize 51
bytenr 112747028480 item 5 key (112430071808 EXTENT_ITEM 12297828649465286656) itemoff -1431655120 itemsize -1431655936
bytenr 112747028480 item 6 key (12297828652328593920 UNKNOWN 48038393173158570) itemoff 11185626 itemsize 51

Comment 47 Milan Bouchet-Valat 2011-09-12 19:41:59 UTC
Ping again? :-)

Comment 48 Josef Bacik 2011-09-16 20:55:31 UTC
Created attachment 523624 [details]
A new repair program

Ok here's a new one, and at this point I think it's time to create a public git tree you can just pull instead of me constantly putting patches up here :).  Run again with the dry-run.  This will try to fix things but won't actually write the fixes, and will print out the fixed leaf to make sure we fixed it properly.  We're getting there.

Comment 49 Milan Bouchet-Valat 2011-09-17 15:43:00 UTC
Here's the new output. Looks like it's not happy yet - my partition must be a hard case, but I can't really tell... :-)

Checking extent root
Bad key offset, deleting 12297828649465286656
Bad item end value, attempting to fix
bytenr 112747028480 item 5 key (12297828652328593920 UNKNOWN 48038393173158570) itemoff 3638 itemsize 102
Keys in the wrong order [objectid], swapping 5
Keys are out of order in a leaf, this program cant fix that yet, tell the author so he can get off his lazy ass and fix that
leaf 112747028480 items 17 free space 2652 generation 130677 owner 2
fs uuid 6f5d42c3-52fd-474b-88ec-756bbf64dd1b
chunk uuid f7373b86-6f91-4c96-b4e2-df5859a92c1e
	item 0 key (112428765184 EXTENT_ITEM 4096) itemoff 3944 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (1413240 6c 17547264) level 0
		tree block backref root 256
	item 1 key (112428822528 EXTENT_ITEM 4096) itemoff 3893 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98660622336 a8 8192) level 0
		tree block backref root 2
	item 2 key (112429154304 EXTENT_ITEM 4096) itemoff 3842 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98152087552 a8 4096) level 0
		tree block backref root 2
	item 3 key (112429563904 EXTENT_ITEM 4096) itemoff 3791 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (104854753280 a8 8192) level 0
		tree block backref root 2
	item 4 key (112429604864 EXTENT_ITEM 4096) itemoff 3740 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98147524608 a8 8192) level 0
		tree block backref root 2
	item 5 key (12297828652328593920 UNKNOWN 48038393173158570) itemoff 3638 itemsize 102
	item 6 key (112430448640 EXTENT_ITEM 4096) itemoff 3587 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98100707328 a8 12288) level 0
		tree block backref root 2
	item 7 key (112430551040 EXTENT_ITEM 4096) itemoff 3536 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110543069184) level 0
		tree block backref root 7
	item 8 key (112430637056 EXTENT_ITEM 4096) itemoff 3485 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (96611688448 a8 131072) level 0
		tree block backref root 2
	item 9 key (112430641152 EXTENT_ITEM 4096) itemoff 3434 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 96621641728) level 0
		tree block backref root 7
	item 10 key (112430645248 EXTENT_ITEM 4096) itemoff 3383 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 96619077632) level 0
		tree block backref root 7
	item 11 key (112430706688 EXTENT_ITEM 4096) itemoff 3332 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110681280512) level 0
		tree block backref root 7
	item 12 key (112430710784 EXTENT_ITEM 4096) itemoff 3281 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110673211392) level 0
		tree block backref root 7
	item 13 key (112430809088 EXTENT_ITEM 4096) itemoff 3230 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110719135744) level 0
		tree block backref root 7
	item 14 key (112430813184 EXTENT_ITEM 4096) itemoff 3179 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110718173184) level 0
		tree block backref root 7
	item 15 key (112430989312 EXTENT_ITEM 4096) itemoff 3128 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (505416 c 504678) level 0
		tree block backref root 256
	item 16 key (112431054848 EXTENT_ITEM 4096) itemoff 3077 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (507404 1 0) level 0
		tree block backref root 256
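In the dump above, item 5's objectid (12297828652328593920) is far larger than item 6's (112430448640). That is what the "Keys in the wrong order [objectid]" check trips on. Btrfs orders keys by objectid, then type, then offset, and the keys in a leaf must be strictly increasing. The following is a minimal C sketch of such an ordering check. It assumes a simplified `struct key` in CPU byte order, and `key_cmp` and `first_misordered_slot` are hypothetical helpers, not the repair tool's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified key in CPU byte order (an assumption; not btrfs-progs' struct). */
struct key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Compare two keys: objectid first, then type, then offset. */
int key_cmp(const struct key *a, const struct key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * Keys in a leaf must be strictly increasing. Return the first slot whose
 * key is not smaller than the next one, or -1 if the leaf is ordered.
 */
int first_misordered_slot(const struct key *keys, size_t nr)
{
	assert(keys != NULL || nr == 0);
	for (size_t i = 0; i + 1 < nr; i++)
		if (key_cmp(&keys[i], &keys[i + 1]) >= 0)
			return (int)i;
	return -1;
}
```

Fed items 0 through 6 of this leaf, the sketch reports slot 5, which is the same slot the tool tried to swap.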

Comment 50 Josef Bacik 2011-09-19 15:09:04 UTC
Oops, I had a check in there that was wrong. Here is my git tree:

git://github.com/josefbacik/btrfs-progs.git

You can clone that and just pull from it when I fix stuff; hopefully that will make this process a little simpler.

Comment 51 Milan Bouchet-Valat 2011-09-19 16:53:51 UTC
Yeah, it will be simpler for everybody.

I've tried it, and it gives exactly the same result despite commit 74e7c57. Was that commit supposed to change something?

Comment 52 Josef Bacik 2011-09-19 17:51:39 UTC
Ugh, sorry. I pushed an update; give that a shot.

Comment 53 Milan Bouchet-Valat 2011-09-19 18:09:09 UTC
Still the same with commit 399173... :-/

I get warnings, if that's of any help:
repair.c: In function ‘fix_leaf_item’:
repair.c:361:6: warning: variable ‘did_cow’ set but not used [-Wunused-but-set-variable]
repair.c: In function ‘verify_extent_item’:
repair.c:466:6: warning: unused variable ‘ret’ [-Wunused-variable]

Comment 54 Josef Bacik 2011-09-19 19:12:01 UTC
Oh, I see: that key has a bad type. I just pushed a fix that should catch that.
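The offending key has type 0 (printed as UNKNOWN), which is not a type the extent tree stores. A check along the following lines would catch it. This is a sketch assuming the key type codes of the on-disk format at the time of this thread (later kernels added more, such as METADATA_ITEM, 169); `valid_extent_tree_key_type` is a hypothetical name, not the repair tool's function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Key type codes that may appear in the extent tree (2011-era format). */
enum {
	EXTENT_ITEM_KEY      = 168,
	TREE_BLOCK_REF_KEY   = 176,
	EXTENT_DATA_REF_KEY  = 178,
	EXTENT_REF_V0_KEY    = 180,
	SHARED_BLOCK_REF_KEY = 182,
	SHARED_DATA_REF_KEY  = 184,
	BLOCK_GROUP_ITEM_KEY = 192,
};

/* Return true if a key of this type belongs in the extent tree. */
bool valid_extent_tree_key_type(uint8_t type)
{
	switch (type) {
	case EXTENT_ITEM_KEY:
	case TREE_BLOCK_REF_KEY:
	case EXTENT_DATA_REF_KEY:
	case EXTENT_REF_V0_KEY:
	case SHARED_BLOCK_REF_KEY:
	case SHARED_DATA_REF_KEY:
	case BLOCK_GROUP_ITEM_KEY:
		return true;
	default:
		return false; /* e.g. type 0, printed as UNKNOWN */
	}
}
```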

Comment 55 Milan Bouchet-Valat 2011-09-19 20:04:11 UTC
There's some progress!

Checking extent root
Bad key offset, deleting 12297828649465286656
Invalid key type, deleting key (12297828652328593920 0 48038393173158570)
Fixed something, dumping leaf to make sure it looks right
leaf 112747028480 items 16 free space 2677 generation 130677 owner 2
fs uuid 6f5d42c3-52fd-474b-88ec-756bbf64dd1b
chunk uuid f7373b86-6f91-4c96-b4e2-df5859a92c1e
	item 0 key (112428765184 EXTENT_ITEM 4096) itemoff 3944 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (1413240 6c 17547264) level 0
		tree block backref root 256
	item 1 key (112428822528 EXTENT_ITEM 4096) itemoff 3893 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98660622336 a8 8192) level 0
		tree block backref root 2
	item 2 key (112429154304 EXTENT_ITEM 4096) itemoff 3842 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98152087552 a8 4096) level 0
		tree block backref root 2
	item 3 key (112429563904 EXTENT_ITEM 4096) itemoff 3791 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (104854753280 a8 8192) level 0
		tree block backref root 2
	item 4 key (112429604864 EXTENT_ITEM 4096) itemoff 3740 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98147524608 a8 8192) level 0
		tree block backref root 2
	item 5 key (112430448640 EXTENT_ITEM 4096) itemoff 3587 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98100707328 a8 12288) level 0
		tree block backref root 2
	item 6 key (112430551040 EXTENT_ITEM 4096) itemoff 3536 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110543069184) level 0
		tree block backref root 7
	item 7 key (112430637056 EXTENT_ITEM 4096) itemoff 3485 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (96611688448 a8 131072) level 0
		tree block backref root 2
	item 8 key (112430641152 EXTENT_ITEM 4096) itemoff 3434 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 96621641728) level 0
		tree block backref root 7
	item 9 key (112430645248 EXTENT_ITEM 4096) itemoff 3383 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 96619077632) level 0
		tree block backref root 7
	item 10 key (112430706688 EXTENT_ITEM 4096) itemoff 3332 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110681280512) level 0
		tree block backref root 7
	item 11 key (112430710784 EXTENT_ITEM 4096) itemoff 3281 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110673211392) level 0
		tree block backref root 7
	item 12 key (112430809088 EXTENT_ITEM 4096) itemoff 3230 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110719135744) level 0
		tree block backref root 7
	item 13 key (112430813184 EXTENT_ITEM 4096) itemoff 3179 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110718173184) level 0
		tree block backref root 7
	item 14 key (112430989312 EXTENT_ITEM 4096) itemoff 3128 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (505416 c 504678) level 0
		tree block backref root 256
	item 15 key (112431054848 EXTENT_ITEM 4096) itemoff 3077 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (507404 1 0) level 0
		tree block backref root 256
Finding fs roots
Checking fs roots
Checking root 5
Checking root 5 refs
Checking root 256
Checking root 256 refs
Couldn't find an extent ref for bytenr 112430080000

Comment 56 Josef Bacik 2011-09-19 20:40:34 UTC
OK, that's perfect. Now we just need to fix the leaf so that its item data is packed properly; I will work on that now.
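The dumps show what "packed" means here. A 4096-byte leaf has a 101-byte header, leaving 3995 bytes. Item 0's data ends exactly at 3995 (3944 + 51), and each later item's data ends where the previous item's data begins. After item 5 was deleted, its 102 bytes at offset 3638 were left as a hole: item 4 starts at 3740, but the new item 5 ends at 3638. The following is a minimal sketch of recomputing packed offsets, assuming a simplified in-memory `struct item`. The name `repack_offsets` is hypothetical, and the sketch only computes offsets; it does not move any data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NODESIZE       4096u                     /* leaf size on this fs */
#define HEADER_SIZE     101u                     /* sizeof(struct btrfs_header) */
#define LEAF_DATA_SIZE (NODESIZE - HEADER_SIZE)  /* 3995 */

/* Simplified item header: data offset (relative to the end of the leaf
 * header) and data size. */
struct item {
	uint32_t offset;
	uint32_t size;
};

/*
 * Recompute offsets so item data is packed back to back at the end of the
 * leaf: item 0's data ends at LEAF_DATA_SIZE and every later item's data
 * ends where the previous item's data starts. Only the offsets are
 * computed; a real repair must also memmove each item's bytes to its new
 * offset before writing these values back.
 */
void repack_offsets(struct item *items, size_t nr)
{
	uint32_t end = LEAF_DATA_SIZE;

	for (size_t i = 0; i < nr; i++) {
		assert(items[i].size <= end); /* data must not run below offset 0 */
		items[i].offset = end - items[i].size;
		end = items[i].offset;
	}
}
```

With 16 items of 51 bytes, this produces offsets 3944, 3893, ..., 3179. These match the offsets printed later in the thread, once the repaired leaf was dumped.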

Comment 57 Josef Bacik 2011-09-21 14:14:25 UTC
OK, give the new batch a run; this should delete the keys and reorder the items properly. Once that is all working right, the next step will be to run without -d, so we're almost there :).

Comment 58 Milan Bouchet-Valat 2011-09-21 14:30:26 UTC
It crashes... :-/

(gdb) run -d /dev/sda6
Checking extent root
Leaf items aren't quite in the right order, fixing
Fixed something, dumping leaf to make sure it looks right
leaf 114049122304 items 40 free space 266 generation 131008 owner 2
fs uuid 6f5d42c3-52fd-474b-88ec-756bbf64dd1b
chunk uuid f7373b86-6f91-4c96-b4e2-df5859a92c1e
	item 0 key (0 BLOCK_GROUP_ITEM 4194304) itemoff 3971 itemsize 24
		block group used 0 chunk_objectid 256 flags 4
	item 1 key (4194304 BLOCK_GROUP_ITEM 8388608) itemoff 3947 itemsize 24
		block group used 1142461300736 chunk_objectid 0 flags 4294967296
	item 2 key (12582912 EXTENT_ITEM 446464) itemoff 3865 itemsize 82
		extent refs 8388608 gen 256 flags 1

Program received signal SIGABRT, Aborted.
0x0000003f61c352d5 in raise () from /lib64/libc.so.6
(gdb) ba
#0  0x0000003f61c352d5 in raise () from /lib64/libc.so.6
#1  0x0000003f61c36beb in abort () from /lib64/libc.so.6
#2  0x000000000040d17c in btrfs_extent_inline_ref_size (type=<optimized out>)
    at ctree.h:1208
#3  print_extent_item (slot=2, eb=0x62ee40) at print-tree.c:244
#4  btrfs_print_leaf (root=<optimized out>, l=0x62ee40) at print-tree.c:529
#5  0x0000000000402a07 in check_leaf (path=0x61d920, root=0x61d1b0)
    at repair.c:726
#6  check_children (root=0x61d1b0, path=0x61d920, level=1) at repair.c:764
#7  0x00000000004022d3 in check_children (root=0x61d1b0, path=0x61d920, 
    level=2) at repair.c:758
#8  0x00000000004022d3 in check_children (root=0x61d1b0, path=0x61d920, 
    level=3) at repair.c:758
#9  0x000000000040148a in main (argc=<optimized out>, argv=<optimized out>)
    at repair.c:944

Comment 59 Josef Bacik 2011-09-21 14:48:17 UTC
Bah, sorry about that. This one should be right.

Comment 60 Milan Bouchet-Valat 2011-09-21 15:40:20 UTC
No worries, here's the new log:
Bad key offset, deleting 12297828649465286656
Invalid key type, deleting key (12297828652328593920 0 48038393173158570)
Leaf items aren't quite in the right order, fixing
Checking extent root
Fixed something, dumping leaf to make sure it looks right
leaf 112747028480 items 16 free space 2677 generation 130677 owner 2
fs uuid 6f5d42c3-52fd-474b-88ec-756bbf64dd1b
chunk uuid f7373b86-6f91-4c96-b4e2-df5859a92c1e
	item 0 key (112428765184 EXTENT_ITEM 4096) itemoff 3944 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (1413240 6c 17547264) level 0
		tree block backref root 256
	item 1 key (112428822528 EXTENT_ITEM 4096) itemoff 3893 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98660622336 a8 8192) level 0
		tree block backref root 2
	item 2 key (112429154304 EXTENT_ITEM 4096) itemoff 3842 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98152087552 a8 4096) level 0
		tree block backref root 2
	item 3 key (112429563904 EXTENT_ITEM 4096) itemoff 3791 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (104854753280 a8 8192) level 0
		tree block backref root 2
	item 4 key (112429604864 EXTENT_ITEM 4096) itemoff 3740 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98147524608 a8 8192) level 0
		tree block backref root 2
	item 5 key (112430448640 EXTENT_ITEM 4096) itemoff 3587 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (96611688448 a8 131072) level 0
		tree block backref root 2
	item 6 key (112430551040 EXTENT_ITEM 4096) itemoff 3536 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 96621641728) level 0
		tree block backref root 7
	item 7 key (112430637056 EXTENT_ITEM 4096) itemoff 3485 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 96619077632) level 0
		tree block backref root 7
	item 8 key (112430641152 EXTENT_ITEM 4096) itemoff 3434 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110681280512) level 0
		tree block backref root 7
	item 9 key (112430645248 EXTENT_ITEM 4096) itemoff 3383 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110673211392) level 0
		tree block backref root 7
	item 10 key (112430706688 EXTENT_ITEM 4096) itemoff 3332 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110719135744) level 0
		tree block backref root 7
	item 11 key (112430710784 EXTENT_ITEM 4096) itemoff 3281 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110718173184) level 0
		tree block backref root 7
	item 12 key (112430809088 EXTENT_ITEM 4096) itemoff 3230 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (505416 c 504678) level 0
		tree block backref root 256
	item 13 key (112430813184 EXTENT_ITEM 4096) itemoff 3179 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (507404 1 0) level 0
		tree block backref root 256
	item 14 key (112430989312 EXTENT_ITEM 4096) itemoff 3128 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (505416 c 504678) level 0
		tree block backref root 256
	item 15 key (112431054848 EXTENT_ITEM 4096) itemoff 3077 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (507404 1 0) level 0
		tree block backref root 256
Couldn't find an extent ref for bytenr 112430080000
Finding fs roots
Checking fs roots
Checking root 5
Checking root 5 refs
Checking root 256
Checking root 256 refs

Comment 61 Josef Bacik 2011-09-21 17:22:25 UTC
Man, thank God I don't work on anything important like a file system or anything. I just pushed a fix that will actually update the items so they point to the right damn data; give that a run and hopefully it will work out right.

Comment 62 Milan Bouchet-Valat 2011-09-21 17:34:00 UTC
You mean I'm a fool to trust you? Maybe you're right: there's another issue now... :-D

(gdb) run -d /dev/sda6
Checking extent root
Bad key offset, deleting 12297828649465286656
Invalid key type, deleting key (12297828652328593920 0 48038393173158570)
Leaf items aren't quite in the right order, fixing
Keys in the wrong order [objectid], swapping 1
Keys are out of order in a leaf, this program cant fix that yet, tell the author so he can get off his lazy ass and fix that
repair: ctree.c:1688: leaf_space_used: Assertion `!(data_len < 0)' failed.

Program received signal SIGABRT, Aborted.
0x0000003f61c352d5 in raise () from /lib64/libc.so.6
(gdb) ba
#0  0x0000003f61c352d5 in raise () from /lib64/libc.so.6
#1  0x0000003f61c36beb in abort () from /lib64/libc.so.6
#2  0x0000003f61c2dc5e in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003f61c2dd02 in __assert_fail () from /lib64/libc.so.6
#4  0x0000000000403417 in leaf_space_used (l=<optimized out>, 
    start=<optimized out>, nr=<optimized out>) at ctree.c:1688
#5  leaf_space_used (l=0x1e3e8b0, start=<optimized out>, nr=8) at ctree.c:1677
#6  0x0000000000403eae in btrfs_leaf_free_space (root=<optimized out>, 
    leaf=<optimized out>) at ctree.c:1701
#7  0x000000000040cbfd in btrfs_print_leaf (root=<optimized out>, l=0x1e3e8b0)
    at print-tree.c:460
#8  0x000000000040270c in verify_extent_item (path=<optimized out>, 
    root=<optimized out>) at repair.c:508
#9  verify_leaf (path=<optimized out>, root=<optimized out>) at repair.c:598
#10 check_leaf (root=0x61d1b0, path=0x61d920) at repair.c:724
#11 0x0000000000402a73 in check_children (root=0x61d1b0, path=0x61d920, 
    level=1) at repair.c:770
#12 0x0000000000402a62 in check_children (root=0x61d1b0, path=0x61d920, 
    level=2) at repair.c:764
#13 0x0000000000402a62 in check_children (root=0x61d1b0, path=0x61d920, 
    level=3) at repair.c:764
#14 0x000000000040149f in main (argc=<optimized out>, argv=<optimized out>)
    at repair.c:953
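The assertion in leaf_space_used() concerns how the space taken by a run of items is computed. It is roughly the distance from the end of the first item's data down to the start of the last item's data, plus 25 bytes of item header per item (a 17-byte key plus 32-bit offset and size fields). A leaf's free space is 3995 bytes minus that total. The leaf dumps in this thread fit the formula. Sixteen packed 51-byte items give 3995 - 816 - 400 = 2779. The same 16 items with the deleted item's 102-byte hole still inside the span give 2677. The original 17-item leaf gives 2652. If the item offsets stop decreasing, as they can mid-reorder, the span goes negative, and the `!(data_len < 0)` assertion guards against that condition. The following is a minimal sketch under those assumptions; `space_used` and `leaf_free_space` are hypothetical names, not btrfs-progs' functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LEAF_DATA_SIZE 3995  /* 4096-byte leaf minus 101-byte header */
#define ITEM_SIZE        25  /* item header: 17-byte key + 4-byte offset + 4-byte size */

struct item {
	uint32_t offset; /* relative to the end of the leaf header */
	uint32_t size;
};

/*
 * Bytes used by items [start, start + nr): the span from the end of the
 * first item's data down to the start of the last item's data, plus one
 * item header per item. Returns -1 if the result is negative, i.e. the
 * offsets are inconsistent (the case the data_len < 0 assertion rejects).
 */
int64_t space_used(const struct item *items, size_t start, size_t nr)
{
	if (nr == 0)
		return 0;
	int64_t data_len = (int64_t)items[start].offset + items[start].size
	                 - (int64_t)items[start + nr - 1].offset;
	data_len += (int64_t)nr * ITEM_SIZE;
	return data_len < 0 ? -1 : data_len;
}

/* Free space in a leaf holding nr items, or -1 if the leaf is inconsistent. */
int64_t leaf_free_space(const struct item *items, size_t nr)
{
	int64_t used = space_used(items, 0, nr);

	return used < 0 ? -1 : LEAF_DATA_SIZE - used;
}
```

Because the span is measured end to end, a hole left by a deleted item still counts as used. That is why the free space only grew back to 2779 once the leaf was repacked.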

Comment 63 Josef Bacik 2011-09-21 18:03:23 UTC
Try that.

Comment 64 Milan Bouchet-Valat 2011-09-21 18:50:53 UTC
Right:

Bad key offset, deleting 12297828649465286656
Invalid key type, deleting key (12297828652328593920 0 48038393173158570)
Leaf items aren't quite in the right order, fixing
Checking extent root
Fixed something, dumping leaf to make sure it looks right
leaf 112747028480 items 16 free space 2779 generation 130677 owner 2
fs uuid 6f5d42c3-52fd-474b-88ec-756bbf64dd1b
chunk uuid f7373b86-6f91-4c96-b4e2-df5859a92c1e
	item 0 key (112428765184 EXTENT_ITEM 4096) itemoff 3944 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (1413240 6c 17547264) level 0
		tree block backref root 256
	item 1 key (112428822528 EXTENT_ITEM 4096) itemoff 3893 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98660622336 a8 8192) level 0
		tree block backref root 2
	item 2 key (112429154304 EXTENT_ITEM 4096) itemoff 3842 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98152087552 a8 4096) level 0
		tree block backref root 2
	item 3 key (112429563904 EXTENT_ITEM 4096) itemoff 3791 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (104854753280 a8 8192) level 0
		tree block backref root 2
	item 4 key (112429604864 EXTENT_ITEM 4096) itemoff 3740 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98147524608 a8 8192) level 0
		tree block backref root 2
	item 5 key (112430448640 EXTENT_ITEM 4096) itemoff 3689 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98100707328 a8 12288) level 0
		tree block backref root 2
	item 6 key (112430551040 EXTENT_ITEM 4096) itemoff 3638 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110543069184) level 0
		tree block backref root 7
	item 7 key (112430637056 EXTENT_ITEM 4096) itemoff 3587 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (96611688448 a8 131072) level 0
		tree block backref root 2
	item 8 key (112430641152 EXTENT_ITEM 4096) itemoff 3536 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 96621641728) level 0
		tree block backref root 7
	item 9 key (112430645248 EXTENT_ITEM 4096) itemoff 3485 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 96619077632) level 0
		tree block backref root 7
	item 10 key (112430706688 EXTENT_ITEM 4096) itemoff 3434 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110681280512) level 0
		tree block backref root 7
	item 11 key (112430710784 EXTENT_ITEM 4096) itemoff 3383 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110673211392) level 0
		tree block backref root 7
	item 12 key (112430809088 EXTENT_ITEM 4096) itemoff 3332 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110719135744) level 0
		tree block backref root 7
	item 13 key (112430813184 EXTENT_ITEM 4096) itemoff 3281 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110718173184) level 0
		tree block backref root 7
	item 14 key (112430989312 EXTENT_ITEM 4096) itemoff 3230 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (505416 c 504678) level 0
		tree block backref root 256
	item 15 key (112431054848 EXTENT_ITEM 4096) itemoff 3179 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (507404 1 0) level 0
		tree block backref root 256
Couldn't find an extent ref for bytenr 112430080000
Finding fs roots
Checking fs roots
Checking root 5
Checking root 5 refs
Checking root 256
Checking root 256 refs

Comment 65 Josef Bacik 2011-09-21 19:41:18 UTC
OK, I pushed another update to make sure the path gets COWed properly. Go ahead and pull that and then run

./repair /dev/sda6

Now, this is likely to bring up a whole host of other problems, since I'm not able to test this part of the code, but once we get through all of those you should have a fully fixed system, or a fs that is so totally broken there is no hope of return :).

Comment 66 Milan Bouchet-Valat 2011-09-21 20:50:49 UTC
Well, it ran correctly once with -d, but it crashed without it. For the record, the log up to the crash is:
leaf 112747028480 items 18 free space 2627 generation 130677 owner 2
fs uuid 6f5d42c3-52fd-474b-88ec-756bbf64dd1b
chunk uuid f7373b86-6f91-4c96-b4e2-df5859a92c1e
        item 0 key (112428765184 EXTENT_ITEM 4096) itemoff 3944 itemsize 51
                extent refs 1 gen 129941 flags 2
                tree block key (1413240 6c 17547264) level 0
                tree block backref root 256
        item 1 key (112428822528 EXTENT_ITEM 4096) itemoff 3893 itemsize 51
                extent refs 1 gen 129941 flags 2
                tree block key (98660622336 a8 8192) level 0
                tree block backref root 2
        item 2 key (112429154304 EXTENT_ITEM 4096) itemoff 3842 itemsize 51
                extent refs 1 gen 129941 flags 2
                tree block key (98152087552 a8 4096) level 0
                tree block backref root 2
        item 3 key (112429563904 EXTENT_ITEM 4096) itemoff 3791 itemsize 51
                extent refs 1 gen 129941 flags 2
                tree block key (104854753280 a8 8192) level 0
                tree block backref root 2
        item 4 key (112429604864 EXTENT_ITEM 4096) itemoff 3740 itemsize 51
                extent refs 1 gen 129941 flags 2
                tree block key (98147524608 a8 8192) level 0
                tree block backref root 2

I'm running it again in gdb (which I should have done before).

Comment 67 Milan Bouchet-Valat 2011-09-21 20:57:03 UTC
And here's the second real attempt in gdb (I assume it crashed at the same place):

Checking extent root
Bad key offset, deleting 12297828649465286656
Started transid 131241
Invalid key type, deleting key (12297828652328593920 0 48038393173158570)
Bad item end value, attempting to fix
Bad item end value, attempting to fix
Bad item end value, attempting to fix
Bad item end value, attempting to fix
Couldn't fixup leaf
Fixed something, dumping leaf to make sure it looks right
leaf 112747028480 items 18 free space 2627 generation 130677 owner 2
fs uuid 6f5d42c3-52fd-474b-88ec-756bbf64dd1b
chunk uuid f7373b86-6f91-4c96-b4e2-df5859a92c1e
	item 0 key (112428765184 EXTENT_ITEM 4096) itemoff 3944 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (1413240 6c 17547264) level 0
		tree block backref root 256
	item 1 key (112428822528 EXTENT_ITEM 4096) itemoff 3893 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98660622336 a8 8192) level 0
		tree block backref root 2
	item 2 key (112429154304 EXTENT_ITEM 4096) itemoff 3842 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98152087552 a8 4096) level 0
		tree block backref root 2
	item 3 key (112429563904 EXTENT_ITEM 4096) itemoff 3791 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (104854753280 a8 8192) level 0
		tree block backref root 2
	item 4 key (112429604864 EXTENT_ITEM 4096) itemoff 3740 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98147524608 a8 8192) level 0
		tree block backref root 2
	item 5 key (112430071808 EXTENT_ITEM 12297828649465286656) itemoff -1431655120 itemsize -1431655936

Program received signal SIGSEGV, Segmentation fault.
print_extent_item (slot=5, eb=0x1e3e8b0) at print-tree.c:190
190		flags = btrfs_extent_flags(eb, ei);
(gdb) ba
#0  print_extent_item (slot=5, eb=0x1e3e8b0) at print-tree.c:190
#1  btrfs_print_leaf (root=<optimized out>, l=0x1e3e8b0) at print-tree.c:529
#2  0x00000000004027d4 in check_leaf (root=0x61d1b0, path=0x61d920)
    at repair.c:740
#3  0x0000000000402a92 in check_children (root=0x61d1b0, path=0x61d920, 
    level=1) at repair.c:778
#4  0x0000000000402a81 in check_children (root=0x61d1b0, path=0x61d920, 
    level=2) at repair.c:772
#5  0x0000000000402a81 in check_children (root=0x61d1b0, path=0x61d920, 
    level=3) at repair.c:772
#6  0x000000000040149f in main (argc=<optimized out>, argv=<optimized out>)
    at repair.c:961
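Item 5 in this dump has a nonsensical offset and size (both print as large negative 32-bit values), yet the leaf printer dereferenced them anyway. A bounds check on each item before its data is touched would turn that segfault into a reported error. The following is a sketch assuming a 4096-byte leaf (3995 bytes after the 101-byte header) and 25-byte item headers. It enforces two rules: an item's data must end where the previous item's data starts (the "Bad item end value" condition), and it must not overlap the item header array. `item_bounds_ok` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LEAF_DATA_SIZE 3995u  /* 4096-byte leaf minus 101-byte header */
#define ITEM_SIZE        25u  /* size of one item header */

struct item {
	uint32_t offset; /* relative to the end of the leaf header */
	uint32_t size;
};

/*
 * Check item `slot` of a leaf holding nr items before anything reads its
 * data. 64-bit arithmetic keeps offset + size from wrapping around.
 */
bool item_bounds_ok(const struct item *items, size_t nr, size_t slot)
{
	assert(slot < nr);
	uint64_t expected_end = slot == 0 ? LEAF_DATA_SIZE : items[slot - 1].offset;
	uint64_t end = (uint64_t)items[slot].offset + items[slot].size;
	uint64_t headers_end = (uint64_t)nr * ITEM_SIZE;

	if (end != expected_end)
		return false; /* "Bad item end value" */
	if (items[slot].offset < headers_end)
		return false; /* data would overlap the item header array */
	return true;
}
```

Run against this leaf, the check accepts items 0 through 4 and rejects item 5. It also rejects the unpacked layout seen earlier, where an item ended at 3638 while its predecessor started at 3740.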

Comment 68 Josef Bacik 2011-09-22 16:10:47 UTC
Yeah, these would be the bugs I haven't shaken out yet :). I just pushed a couple of fixes; let me know how that works out.

Comment 69 Milan Bouchet-Valat 2011-09-22 18:52:03 UTC
OK, it didn't crash, but now I'm unable to mount the partition because the superblock seems corrupted.

./repair output:
Bad key offset, deleting 12297828649465286656
Invalid key type, deleting key (12297828652328593920 0 48038393173158570)
Leaf items aren't quite in the right order, fixing
Checking extent root
Started transid 131253
Fixed something, dumping leaf to make sure it looks right
leaf 112638033920 items 16 free space 2779 generation 131253 owner 2
fs uuid 6f5d42c3-52fd-474b-88ec-756bbf64dd1b
chunk uuid f7373b86-6f91-4c96-b4e2-df5859a92c1e
	item 0 key (112428765184 EXTENT_ITEM 4096) itemoff 3944 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (1413240 6c 17547264) level 0
		tree block backref root 256
	item 1 key (112428822528 EXTENT_ITEM 4096) itemoff 3893 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98660622336 a8 8192) level 0
		tree block backref root 2
	item 2 key (112429154304 EXTENT_ITEM 4096) itemoff 3842 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98152087552 a8 4096) level 0
		tree block backref root 2
	item 3 key (112429563904 EXTENT_ITEM 4096) itemoff 3791 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (104854753280 a8 8192) level 0
		tree block backref root 2
	item 4 key (112429604864 EXTENT_ITEM 4096) itemoff 3740 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98147524608 a8 8192) level 0
		tree block backref root 2
	item 5 key (112430448640 EXTENT_ITEM 4096) itemoff 3689 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (98100707328 a8 12288) level 0
		tree block backref root 2
	item 6 key (112430551040 EXTENT_ITEM 4096) itemoff 3638 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110543069184) level 0
		tree block backref root 7
	item 7 key (112430637056 EXTENT_ITEM 4096) itemoff 3587 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (96611688448 a8 131072) level 0
		tree block backref root 2
	item 8 key (112430641152 EXTENT_ITEM 4096) itemoff 3536 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 96621641728) level 0
		tree block backref root 7
	item 9 key (112430645248 EXTENT_ITEM 4096) itemoff 3485 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 96619077632) level 0
		tree block backref root 7
	item 10 key (112430706688 EXTENT_ITEM 4096) itemoff 3434 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110681280512) level 0
		tree block backref root 7
	item 11 key (112430710784 EXTENT_ITEM 4096) itemoff 3383 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110673211392) level 0
		tree block backref root 7
	item 12 key (112430809088 EXTENT_ITEM 4096) itemoff 3332 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110719135744) level 0
		tree block backref root 7
	item 13 key (112430813184 EXTENT_ITEM 4096) itemoff 3281 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (18446744073709551606 80 110718173184) level 0
		tree block backref root 7
	item 14 key (112430989312 EXTENT_ITEM 4096) itemoff 3230 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (505416 c 504678) level 0
		tree block backref root 256
	item 15 key (112431054848 EXTENT_ITEM 4096) itemoff 3179 itemsize 51
		extent refs 1 gen 129941 flags 2
		tree block key (507404 1 0) level 0
		tree block backref root 256
Couldn't find an extent ref for bytenr 112430080000
extent buffer leak: start 112638033920 len 4096
extent buffer leak: start 113980211200 len 4096
extent buffer leak: start 113980215296 len 4096
extent buffer leak: start 113980219392 len 4096
Finding fs roots
Checking fs roots
Checking root 5
Checking root 5 refs
Checking root 256
Checking root 256 refs
writing out a block [x30]

mount output:
mount: /dev/sda6: can't read superblock

Relevant dmesg excerpt:
[439452.065960] device fsid 4b47fd52c3425d6f-1bdd64bf6b75ec88 devid 1 transid 131253 /dev/sda6
[439453.645594] parent transid verify failed on 114248884224 wanted 131253 found 131241
[439453.645971] parent transid verify failed on 114248884224 wanted 131253 found 131241
[439453.650210] parent transid verify failed on 114248884224 wanted 131253 found 131241
[439453.650226] parent transid verify failed on 114248884224 wanted 131253 found 131241
[439453.666257] btrfs warning page private not zero on page 29380608
[439453.671747] btrfs: open_ctree failed
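Two details in this dmesg excerpt can be checked by hand. First, the fsid 4b47fd52c3425d6f-1bdd64bf6b75ec88 is the filesystem UUID from the repair output (6f5d42c3-52fd-474b-88ec-756bbf64dd1b). It is printed as two 64-bit words, read in this x86_64 machine's little-endian byte order, so the kernel is looking at the right filesystem. Second, "parent transid verify failed on 114248884224 wanted 131253 found 131241" means a parent node records generation 131253 for that block, but the block's own header says 131241. That is the transid at which the earlier, crashed repair run started. The kernel rejects such a block, and open_ctree then fails. The following is a minimal C sketch of both checks; `fsid_to_dmesg` and `parent_transid_ok` are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decode 8 bytes as a little-endian 64-bit integer. */
static uint64_t le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Render a 16-byte fsid as two 64-bit hex words, the same form as the
 * dmesg line above (zero-padded here). out must hold 34 bytes.
 */
void fsid_to_dmesg(const uint8_t fsid[16], char out[34])
{
	snprintf(out, 34, "%016llx-%016llx",
	         (unsigned long long)le64(fsid),
	         (unsigned long long)le64(fsid + 8));
}

/*
 * A block read through a parent pointer is accepted only if the
 * generation in its own header equals the one the parent recorded.
 */
bool parent_transid_ok(uint64_t wanted, uint64_t found)
{
	return wanted == found;
}
```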

Comment 70 Josef Bacik 2011-09-26 18:32:25 UTC
I have a patch for that; I just need to dig it out and clean it up. I'll attach it shortly.

Comment 71 Josef Bacik 2011-09-26 19:00:20 UTC
Actually, that won't help you. Can you re-run the repair with -d and see if it complains the same way?

Comment 72 Milan Bouchet-Valat 2011-09-26 19:51:38 UTC
It aborts now... :-)

Program received signal SIGABRT, Aborted.
0x00000035dd6352d5 in __GI_raise (sig=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
(gdb)  ba
#0  0x00000035dd6352d5 in __GI_raise (sig=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00000035dd636beb in __GI_abort () at abort.c:93
#2  0x00000035dd62dc5e in __assert_fail_base (fmt=<optimized out>, 
    assertion=0x415af5 "!(!root->node)", file=0x4159fc "disk-io.c", 
    line=<optimized out>, function=<optimized out>) at assert.c:96
#3  0x00000035dd62dd02 in __GI___assert_fail (
    assertion=0x415af5 "!(!root->node)", file=0x4159fc "disk-io.c", line=419, 
    function=0x415d30 "find_and_setup_root") at assert.c:105
#4  0x00000000004078d0 in find_and_setup_root (tree_root=0x61d010, 
    fs_info=<optimized out>, objectid=5, root=0x7021f0) at disk-io.c:419
#5  0x0000000000407934 in btrfs_read_fs_root_no_cache (
    fs_info=<optimized out>, location=0x7fffffffe08f) at disk-io.c:494
#6  0x0000000000407b3f in btrfs_read_fs_root (fs_info=0x61f180, 
    location=0x7fffffffe08f) at disk-io.c:564
#7  0x000000000040816f in open_ctree_fd (fp=7, 
    path=0x2003f <Address 0x2003f out of bounds>, sb_bytenr=<optimized out>, 
    writes=<optimized out>) at disk-io.c:769
#8  0x00000000004081ff in open_ctree (filename=0x7fffffffe59a "/dev/sda6", 
    sb_bytenr=0, writes=1) at disk-io.c:590
#9  0x00000000004013c8 in main (argc=3, argv=0x7fffffffe278) at repair.c:937

Comment 73 Josef Bacik 2011-09-29 18:55:19 UTC
Alright, give that a whirl, again with -d to make sure it's working. Then I'll figure out a way to unscrew the fs from there.

Comment 74 Milan Bouchet-Valat 2011-09-29 21:01:18 UTC
Created attachment 525645
repair output

It prints a lot of warnings and seems to enter an infinite loop. I interrupted it in gdb a few times, waiting a minute or two between stops, and the pointers seem to be the same each time (see the end of the log). There was no disk activity, and CPU usage was at 100%.

Comment 75 Josef Bacik 2011-10-04 19:54:21 UTC
OK, so before I screw your file system up any more than I already have, I've written a basic restore program that will go through and dump all of your data into a directory. So pull from the git tree and run

./restore /your/dev /some/dir

and sit back and relax. This should work because your fs trees seem to be A-OK; it's just your extent tree that's broken. If you ever mounted with compress=lzo, let me know, because I only added zlib support; it will error out if it runs into anything it can't handle. Let me know how that goes.
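Because restore only understands zlib, it has to check each file extent's compression type before copying data out. The following is a hypothetical sketch of that decision, not restore.c's code. It assumes the on-disk compression codes 0 = none, 1 = zlib and 2 = lzo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Compression codes stored in a file extent item (assumed values). */
enum {
	COMPRESS_NONE = 0,
	COMPRESS_ZLIB = 1,
	COMPRESS_LZO  = 2,
};

/*
 * Plain and zlib-compressed extents can be copied out. Anything else
 * (e.g. LZO, if the fs was ever mounted with compress=lzo) should be
 * reported as an error instead of being written out as garbage.
 */
bool restore_can_handle(uint8_t compression)
{
	return compression == COMPRESS_NONE || compression == COMPRESS_ZLIB;
}
```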

Comment 76 Milan Bouchet-Valat 2011-10-04 21:19:15 UTC
Ah, great! That will make both of us more relaxed...

I've run it for about one hour (30 minutes of CPU time), and while it was quite fast at the beginning (3 GB in a few minutes), it seems to have stalled, eating 100% CPU and making no progress (the file count doesn't increase). There was a warning at the beginning:
parent transid verify failed on 114248884224 wanted 131253 found 131241

But it is probably unrelated, since it appeared while files were actually being copied. Now it seems to be working very hard on a single file, always at the same position, though the parameters change slightly:

Program received signal SIGINT, Interrupt.
0x000000000040fbb8 in read_extent_buffer (eb=<optimized out>, 
    dst=0x7fff36b7866f, start=226, len=16) at /usr/include/bits/string3.h:52
52	  return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
(gdb) ba
#0  0x000000000040fbb8 in read_extent_buffer (eb=<optimized out>, 
    dst=0x7fff36b7866f, start=226, len=16) at /usr/include/bits/string3.h:52
#1  0x0000000000401b52 in btrfs_item_key (nr=<optimized out>, 
    disk_key=0x7fff36b7866f, eb=0x42c1190) at ctree.h:1321
#2  btrfs_item_key_to_cpu (nr=<optimized out>, key=read_sleb128: Corrupted DWARF expression.
) at ctree.h:1398
#3  copy_file (key=0x7fff36b7865e, fd=3, root=0x25381f0) at restore.c:263
#4  search_dir (root=0x25381f0, key=0x7fff36b7884e, 
    dir=0x3f41230 "/media/WD Passport/Milan/home-sda6/invite/.cache/dconf")
    at restore.c:378
#5  0x0000000000401ffc in search_dir (root=0x25381f0, key=0x7fff36b78a3e, 
    dir=0x3f47670 "/media/WD Passport/Milan/home-sda6/invite/.cache")
    at restore.c:427
#6  0x0000000000401ffc in search_dir (root=0x25381f0, key=0x7fff36b78c2e, 
    dir=0x253c550 "/media/WD Passport/Milan/home-sda6/invite") at restore.c:427
#7  0x0000000000401ffc in search_dir (root=0x25381f0, key=0x7fff36b78d2f, 
    dir=0x7fff36b78caf "/media/WD Passport/Milan/home-sda6") at restore.c:427
#8  0x0000000000401588 in main (argc=<optimized out>, argv=0x7fff36b78e38)
    at restore.c:501
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
copy_file (key=0x7fff36b7865e, fd=3, root=0x25381f0) at restore.c:264
264			if (found_key.objectid != key->objectid)
(gdb) ba
#0  copy_file (key=0x7fff36b7865e, fd=3, root=0x25381f0) at restore.c:264
#1  search_dir (root=0x25381f0, key=0x7fff36b7884e, 
    dir=0x3f41230 "/media/WD Passport/Milan/home-sda6/invite/.cache/dconf")
    at restore.c:378
#2  0x0000000000401ffc in search_dir (root=0x25381f0, key=0x7fff36b78a3e, 
    dir=0x3f47670 "/media/WD Passport/Milan/home-sda6/invite/.cache")
    at restore.c:427
#3  0x0000000000401ffc in search_dir (root=0x25381f0, key=0x7fff36b78c2e, 
    dir=0x253c550 "/media/WD Passport/Milan/home-sda6/invite") at restore.c:427
#4  0x0000000000401ffc in search_dir (root=0x25381f0, key=0x7fff36b78d2f, 
    dir=0x7fff36b78caf "/media/WD Passport/Milan/home-sda6") at restore.c:427
#5  0x0000000000401588 in main (argc=<optimized out>, argv=0x7fff36b78e38)
    at restore.c:501
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x000000000040fbb8 in read_extent_buffer (eb=<optimized out>, 
    dst=0x7fff36b7866f, start=226, len=4) at /usr/include/bits/string3.h:52
52	  return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
(gdb) ba
#0  0x000000000040fbb8 in read_extent_buffer (eb=<optimized out>, 
    dst=0x7fff36b7866f, start=226, len=4) at /usr/include/bits/string3.h:52
#1  0x0000000000401b52 in btrfs_item_key (nr=<optimized out>, 
    disk_key=0x7fff36b7866f, eb=0x42c1190) at ctree.h:1321
#2  btrfs_item_key_to_cpu (nr=<optimized out>, key=read_sleb128: Corrupted DWARF expression.
) at ctree.h:1398
#3  copy_file (key=0x7fff36b7865e, fd=3, root=0x25381f0) at restore.c:263
#4  search_dir (root=0x25381f0, key=0x7fff36b7884e, 
    dir=0x3f41230 "/media/WD Passport/Milan/home-sda6/invite/.cache/dconf")
    at restore.c:378
#5  0x0000000000401ffc in search_dir (root=0x25381f0, key=0x7fff36b78a3e, 
    dir=0x3f47670 "/media/WD Passport/Milan/home-sda6/invite/.cache")
    at restore.c:427
#6  0x0000000000401ffc in search_dir (root=0x25381f0, key=0x7fff36b78c2e, 
    dir=0x253c550 "/media/WD Passport/Milan/home-sda6/invite") at restore.c:427
#7  0x0000000000401ffc in search_dir (root=0x25381f0, key=0x7fff36b78d2f, 
    dir=0x7fff36b78caf "/media/WD Passport/Milan/home-sda6") at restore.c:427
#8  0x0000000000401588 in main (argc=<optimized out>, argv=0x7fff36b78e38)
    at restore.c:501

Comment 77 Josef Bacik 2011-10-05 13:40:00 UTC
A couple of shots in the dark: I had a bug where hitting a preallocated extent would make it loop forever, so hopefully that's it; otherwise there is a problem getting the leaf node and we're not getting an error back.  Give that a whirl and let me know how it works.  If it does the same thing, each time you stop it, do

p path->nodes[0]
p leaf
p path->slots[0]

so I can get an idea if it's looping on the same slot or not.  Thanks.
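The "loop forever" failure mode is typical of an item-walking loop that skips an item (here, a preallocated extent) with `continue` before advancing the slot, so the same slot is visited again and again; that is also why printing `path->slots[0]` at each interrupt would reveal it. A minimal sketch over a plain array standing in for a leaf, assuming the btrfs extent-type values (0 = inline, 1 = regular, 2 = prealloc); the struct and function are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

enum { EXTENT_INLINE = 0, EXTENT_REG = 1, EXTENT_PREALLOC = 2 };

/* Hypothetical stand-in for one file extent item in a leaf. */
struct fake_extent {
    int type;
    unsigned long long len;
};

/* Sum the bytes that need copying. Preallocated (unwritten) extents
 * read back as zeros, so this sketch skips them, but the slot index
 * must still advance: "continue" without "slot++" spins forever. */
static unsigned long long bytes_to_copy(const struct fake_extent *items,
                                        size_t nritems)
{
    unsigned long long total = 0;
    size_t slot = 0;

    while (slot < nritems) {
        if (items[slot].type == EXTENT_PREALLOC) {
            slot++;         /* the step a buggy version would miss */
            continue;
        }
        total += items[slot].len;
        slot++;
    }
    assert(slot == nritems);
    return total;
}
```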

Comment 78 Milan Bouchet-Valat 2011-10-05 17:30:37 UTC
Yes, it works. :-)

But it stopped later with an error:
Error mkdiring /media/WD Passport/Milan/home-sda6/milan.sav/.local/share/evolution/mail/imap/nalimilan@imap.sfr.fr.old/folders/`"O�: 84

errno 84 is "Invalid or incomplete multibyte or wide character". It seems that the path includes weird chars coming from elsewhere... If that helps, the dir /media/WD Passport/Milan/home-sda6/milan.sav/ is still completely empty. I can actually skip that dir if needed, as it's of absolutely no interest.

And there was also a note about a snapshot not being copied, but that snapshot actually contains the interesting data... :-/

Comment 79 Josef Bacik 2011-10-05 20:14:46 UTC
OK, so I had a -s option for restoring snapshots, but it was broken.  I've fixed it, so run

./restore -is /dev/whatever /mnt/wherever

The -i option makes restore ignore errors: it will still complain about them, but it will keep going and try to restore other things.  So in the case of your file it will just move on to the next file.  The -s option will restore snapshots.  This means that if that snapshot has links to any of your other snapshots or subvolumes, you are going to end up with duplicates of stuff restored, so use it with caution.
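With error-ignoring enabled, a failure like the "Error mkdiring ...: 84" above is reported and the walk moves on instead of aborting. A minimal sketch of that policy, assuming a hypothetical global `ignore_errors` flag (standing in for -i) and a hypothetical helper name; restore.c's actual code may be structured differently:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

static int ignore_errors;   /* hypothetical: set when -i is given */

/* Create one output directory. Returns 0 on success or when the error
 * is being ignored, and -errno when the caller should abort. */
static int make_output_dir(const char *path)
{
    assert(path != NULL);
    if (mkdir(path, 0755) == 0 || errno == EEXIST)
        return 0;

    int err = errno;        /* save it before fprintf can clobber it */
    fprintf(stderr, "Error mkdiring %s: %d (%s)\n", path, err, strerror(err));
    return ignore_errors ? 0 : -err;
}
```

On Linux, errno 84 is `EILSEQ`, which `strerror` renders as the "Invalid or incomplete multibyte or wide character" message quoted above.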

Comment 80 Milan Bouchet-Valat 2011-10-06 13:43:47 UTC
It copied about 80GB, but now it's progressing at a pace of a few MB per hour. It's still running; in case the trace helps, it's:

#0  0x00000035dd6d18a3 in __pread_nocancel ()
    at ../sysdeps/unix/syscall-template.S:82
#1  0x0000000000401e14 in pread (__offset=<optimized out>, 
    __nbytes=<optimized out>, __buf=0x7fb2a4bb0010, __fd=4)
    at /usr/include/bits/unistd.h:100
#2  copy_one_extent (pos=35913728, fi=<optimized out>, leaf=0x2d177d0, fd=3, 
    root=0x1afccd0) at restore.c:184
#3  copy_file (key=0x7fff902de27e, fd=3, root=0x1afccd0) at restore.c:302
#4  search_dir (root=0x1afccd0, key=0x7fff902de46e, 
    dir=0x1b1ffd0 "/media/WD Passport/Milan/home-sda6/home.snapshot/milan/P2P/Fedora-15-Beta-x86_64-Live-Desktop") at restore.c:398
#5  0x00000000004020ed in search_dir (root=0x1afccd0, key=0x7fff902de65e, 
    dir=0x34f7230 "/media/WD Passport/Milan/home-sda6/home.snapshot/milan/P2P")
    at restore.c:466
#6  0x00000000004020ed in search_dir (root=0x1afccd0, key=0x7fff902de84e, 
    dir=0x34fd670 "/media/WD Passport/Milan/home-sda6/home.snapshot/milan")
    at restore.c:466
#7  0x00000000004020ed in search_dir (root=0x1afccd0, key=0x7fff902dea3e, 
    dir=0x1af2550 "/media/WD Passport/Milan/home-sda6/home.snapshot")
    at restore.c:466
#8  0x00000000004020ed in search_dir (root=0x1aee1f0, key=0x7fff902deb3f, 
    dir=0x7fff902deabf "/media/WD Passport/Milan/home-sda6") at restore.c:466
#9  0x00000000004015a8 in main (argc=<optimized out>, argv=0x7fff902dec48)

And five minutes later it was:
#0  0x00000035dd6d18a3 in __pread_nocancel ()
    at ../sysdeps/unix/syscall-template.S:82
#1  0x0000000000401e14 in pread (__offset=<optimized out>, 
    __nbytes=<optimized out>, __buf=0x7fb2a4bb0010, __fd=4)
    at /usr/include/bits/unistd.h:100
#2  copy_one_extent (pos=37486592, fi=<optimized out>, leaf=0x2d177d0, fd=3, 
    root=0x1afccd0) at restore.c:184
#3  copy_file (key=0x7fff902de27e, fd=3, root=0x1afccd0) at restore.c:302
#4  search_dir (root=0x1afccd0, key=0x7fff902de46e, 
    dir=0x1b1ffd0 "/media/WD Passport/Milan/home-sda6/home.snapshot/milan/P2P/Fedora-15-Beta-x86_64-Live-Desktop") at restore.c:398

If I had known that stale ISO would be so annoying! ;-) Looks like it doesn't like big files, but I guess it will eventually succeed.

Comment 81 Josef Bacik 2011-10-06 14:32:59 UTC
Did you happen to download that ISO via a torrent?  Some torrent programs don't preallocate the space for what they're downloading, so the files end up super fragmented, which is going to _suck_ for this restore program: for each extent it looks up the extent, allocates a buffer the size of that extent, reads in the data, writes it out, and moves on to the next extent.  So in the worst case you have a 650 MB file broken up into 160k extents (roughly one 4K block each), and if it takes, say, 1/2 a second to deal with an extent, you are looking at roughly 22 hours, close to a full day :(.
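The per-extent cost model described above (one buffer allocation, one read and one write per extent) can be sketched as follows. This is a simplified illustration assuming an uncompressed extent whose on-disk byte offset is already known; `copy_one_extent_sketch` and `worst_case_hours` are hypothetical names, not the functions in restore.c. The second helper just reproduces the estimate: 160,000 extents at 0.5 s each is 80,000 s, about 22 hours.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy one uncompressed extent: read len bytes at disk_off from the
 * device and write them at file_pos in the output file. One malloc,
 * one read loop and one write loop per extent, so a badly fragmented
 * file pays this overhead over a hundred thousand times. */
static int copy_one_extent_sketch(int dev_fd, int out_fd, off_t disk_off,
                                  off_t file_pos, size_t len)
{
    char *buf = malloc(len);
    if (!buf)
        return -1;

    size_t done = 0;
    while (done < len) {    /* pread may return short counts */
        ssize_t n = pread(dev_fd, buf + done, len - done,
                          disk_off + (off_t)done);
        if (n <= 0) {
            free(buf);
            return -1;
        }
        done += (size_t)n;
    }
    done = 0;
    while (done < len) {
        ssize_t n = pwrite(out_fd, buf + done, len - done,
                           file_pos + (off_t)done);
        if (n <= 0) {
            free(buf);
            return -1;
        }
        done += (size_t)n;
    }
    free(buf);
    return 0;
}

/* Back-of-the-envelope restore time for a fragmented file. */
static double worst_case_hours(double extents, double secs_per_extent)
{
    assert(secs_per_extent >= 0);
    return extents * secs_per_extent / 3600.0;
}
```

Reading several adjacent extents into one larger buffer would cut the per-call overhead, but that is an optimization this sketch does not attempt.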

Comment 82 Milan Bouchet-Valat 2011-10-06 15:19:36 UTC
That's it. I downloaded it via Transmission, and there are probably other such files in that folder too. Now it's at pos=69M, which took a few hours to reach. :-/

Maybe I can trick it from gdb to skip that folder?

Comment 83 Milan Bouchet-Valat 2011-10-07 13:01:48 UTC
OK, I've added a little hack to skip these files, let's see how it goes...
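The actual hack isn't shown in the thread. One hypothetical way to do it is to check each directory path against a list of prefixes before descending into it; `should_skip` and the hard-coded prefix below are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical skip list: directories not worth restoring. */
static const char *skip_prefixes[] = {
    "/media/WD Passport/Milan/home-sda6/home.snapshot/milan/P2P",
};

/* Return 1 if path equals a skip prefix or lies underneath it.
 * The '/' check keeps ".../P2Px" from matching ".../P2P". */
static int should_skip(const char *path)
{
    assert(path != NULL);
    for (size_t i = 0; i < sizeof(skip_prefixes) / sizeof(skip_prefixes[0]); i++) {
        size_t n = strlen(skip_prefixes[i]);
        if (strncmp(path, skip_prefixes[i], n) == 0 &&
            (path[n] == '\0' || path[n] == '/'))
            return 1;
    }
    return 0;
}
```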

Comment 84 Josef Bacik 2011-10-07 13:15:42 UTC
Ah good, sorry I didn't see this till this morning.  Let me know if it doesn't work.  I'm thinking about adding a timer that will pause after, say, 5 minutes on the same file and ask the user whether to skip it.
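The proposed timer could look roughly like this: record when work on a file started, and once a threshold (5 minutes here) has passed, ask once whether to skip it. A minimal sketch; the struct, the function names and the y/N prompt are hypothetical, not an existing restore option:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

#define STALL_LIMIT_SECS (5 * 60)

/* Hypothetical per-file timer. */
struct file_timer {
    time_t started;
    int asked;              /* only ask once per file */
};

static void file_timer_start(struct file_timer *t, time_t now)
{
    t->started = now;
    t->asked = 0;
}

/* Returns 1 if the user chose to skip the current file. */
static int maybe_ask_skip(struct file_timer *t, time_t now, const char *path,
                          FILE *in, FILE *out)
{
    assert(t != NULL && path != NULL);
    if (t->asked || difftime(now, t->started) < STALL_LIMIT_SECS)
        return 0;

    t->asked = 1;
    fprintf(out, "Still working on %s after %d seconds, skip it? (y/N) ",
            path, STALL_LIMIT_SECS);
    fflush(out);

    char answer[16];
    if (!fgets(answer, sizeof(answer), in))
        return 0;           /* EOF: keep going */
    return answer[0] == 'y' || answer[0] == 'Y';
}
```

Passing `now` in explicitly, rather than calling `time()` inside, keeps the check cheap to call once per extent and easy to test.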

Comment 85 Milan Bouchet-Valat 2011-10-10 10:22:09 UTC
Yes, that could be useful in the future.

Anyway, I think I have been able to backup everything that was needed, so you can go on destroying my filesystem now. ;-)

Comment 86 Milan Bouchet-Valat 2011-10-18 09:13:34 UTC
Ping? I'd really like to get this 400GB partition (4/5 of my hard disk) usable again... :-)

Comment 87 Josef Bacik 2011-10-18 13:10:11 UTC
Yup, sorry: this restore program has been hugely popular, so my attention has been on a bunch of users who were trying to get it to work for them.  That's all winding down now, so I'll get back to work on the repair program.

Comment 88 Milan Bouchet-Valat 2011-10-18 15:41:15 UTC
Glad to know the tool is useful to others. btrfs will end up with the most robust repair tool on the market!

Comment 89 Milan Bouchet-Valat 2011-10-21 09:09:09 UTC
Oh, and if you plan to further improve the restore tool, one useful feature would be to preserve timestamps ([mca]time). This is definitely valuable, especially when you do incremental backups. (End of wishlist... ;-)
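Restoring timestamps would mean copying them from the inode after a file's data has been written. A minimal sketch using POSIX `futimens()`; reading the source times out of the btrfs inode item is left out, and `restore_times` is a hypothetical name. Only atime and mtime can be set this way: ctime is always set by the kernel to the current time whenever the inode changes, so it cannot be restored from user space.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Set atime and mtime on an already-written output file. Call it
 * last: any later write would bump mtime again. */
static int restore_times(int fd, struct timespec atime, struct timespec mtime)
{
    assert(fd >= 0);
    struct timespec times[2] = { atime, mtime };   /* [0]=atime, [1]=mtime */
    if (futimens(fd, times) != 0) {
        perror("futimens");
        return -1;
    }
    return 0;
}
```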

Comment 90 Josef Bacik 2011-10-21 14:33:08 UTC
I'm not sure I can do that, but I will definitely look into it.  I'm going to be in Prague next week for the kernel summit, but when I get back I will refocus on the repair tool; I got caught up again helping somebody with the restore tool.

Comment 91 Milan Bouchet-Valat 2011-11-06 11:09:22 UTC
Today I noticed that some restored files are somewhat corrupt. For example, a source file had a few hundred NUL (\00) bytes at its end (the rest of the contents were OK). Any idea what to do about that?

Comment 92 Josef Bacik 2011-11-07 14:09:19 UTC
Yeah, sorry: Chris found a problem where I needed to truncate the file to its actual size.  So delete your copy of my tree and re-pull; it will have the new fixes.  Then re-run the restore, and it should give you non-corrupt files.
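A plausible reading of the trailing NULs: extents occupy whole filesystem blocks, so copying the last extent in full also writes the zero padding at the end of its final block, and the output has to be cut back to the size recorded in the inode. A minimal sketch with POSIX `ftruncate()`; the 4096-byte block in the usage test and the name `finish_file` are assumptions for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* After all extents are written, cut the file back to the inode's
 * logical size so block padding doesn't show up as trailing NULs. */
static int finish_file(int out_fd, off_t inode_size)
{
    assert(inode_size >= 0);
    if (ftruncate(out_fd, inode_size) != 0) {
        perror("ftruncate");
        return -1;
    }
    return 0;
}
```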

Comment 93 Milan Bouchet-Valat 2011-11-12 15:09:35 UTC
Looks good now! git is a pretty good data consistency checking tool. ;-)

Comment 94 Milan Bouchet-Valat 2011-12-05 16:12:07 UTC
Are you still interested in fixing my partition? If you can't find the time, or think it would be a waste of time at this point, I can just as well format it and put the restored files on it. If there were a way to restore the timestamps, it would be perfect!

Comment 95 Milan Bouchet-Valat 2011-12-31 11:37:56 UTC
OK, I just wiped that broken partition, because I couldn't live with the few remaining GB (it prevented me from upgrading to F16). Thanks for the kind help you provided; at least I could get back all of my files. :-)

Hope the original problem is gone...

