Bug 726814
Summary: | btrfs filesystem corruption/crash
---|---
Product: | [Fedora] Fedora
Component: | kernel
Version: | 19
Status: | CLOSED CANTFIX
Severity: | unspecified
Priority: | unspecified
Hardware: | Unspecified
OS: | Unspecified
Reporter: | Kevin Fenzi <kevin>
Assignee: | Zach Brown <zab>
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
CC: | dwmw2, gansalmon, itamar, jbacik, jonathan, kernel-maint, madhu.chinakonda, sweil
Doc Type: | Bug Fix
Last Closed: | 2013-05-15 19:27:30 UTC
Bug Blocks: | 689509
Description
Kevin Fenzi
2011-07-29 20:50:14 UTC
Created attachment 517263 [details]
More debugging
You can apply this on top of the patch I've already given you. This will dump the leaf so I can see where the corruption is. Make sure to run dmesg -n8 with netconsole so that it does actually send all the kernel messages across the wire.
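For anyone following along, the netconsole setup mentioned above might look roughly like the sketch below. The IP addresses, ports, interface name, and MAC address are all hypothetical placeholders, not values from this bug; the parameter layout follows the kernel's netconsole documentation. The helper function name is made up for illustration.

```shell
#!/bin/sh
# Build the netconsole module parameter string.
# Layout: [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-mac]
netconsole_param() {
    src_port=$1; src_ip=$2; dev=$3
    tgt_port=$4; tgt_ip=$5; tgt_mac=$6
    printf '%s@%s/%s,%s@%s/%s\n' \
        "$src_port" "$src_ip" "$dev" "$tgt_port" "$tgt_ip" "$tgt_mac"
}

# Hypothetical values; replace every one of these with your own.
PARAM=$(netconsole_param 6665 192.168.1.10 eth0 6666 192.168.1.20 00:11:22:33:44:55)
echo "netconsole=$PARAM"

# On the machine that oopses, as root:
#   modprobe netconsole "netconsole=$PARAM"
#   dmesg -n 8     # raise the console log level so all messages go out
# On the receiving machine (some netcat variants want "-l -p 6666"):
#   nc -u -l 6666 | tee netconsole.log
```

The script itself only prints the parameter string; the commands that actually load the module are left as comments so nothing runs as root by accident.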
OK. I have applied that patch and this one, and rebooted into the new kernel. I then ran
btrfs filesystem defragment /home/kevin/Mail/lists/fedora-extras-commits/
and got the first oops in the attached file. This is one of the dirs my backups blow up on. Then I ran 'sync' and got the second one. Happy to test further things.
Created attachment 517306 [details]
netconnect dmesg output from 2 oopses.
Created attachment 519705 [details]
repair program
So I don't have a broken fs to test this on, but it passes just fine on a clean one, so it seems like all my checking is right. That being said, I probably screwed up writing somewhere, so this may blow up; if it does, just load it into gdb and do a bt so I can see where it segfaulted. Hopefully this won't make your fs worse than it already is, but I make no promises :). Please run with -d first; this will do a dry run and just spit out all the errors without actually fixing anything. Attach this output to this bz so I can verify it's going to fix things correctly. You'll want to do something like
./repair -d /dev/whatever > out.txt 2>&1
and attach out.txt. You'll need to apply this patch onto btrfs-progs-unstable from upstream; here is the git tree
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git
Good luck :).
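The dry-run-first workflow described above could be wrapped in a small helper, sketched below. The function name dry_run_first is made up for illustration; only the -d flag and the output redirection come from this thread, and /dev/whatever is the same placeholder used above.

```shell
#!/bin/sh
# dry_run_first TOOL DEVICE LOG
# Runs "TOOL -d DEVICE" and captures stdout and stderr in LOG, so the
# output can be reviewed (or attached to the bug) before any real run.
dry_run_first() {
    tool=$1
    dev=$2
    log=$3
    "$tool" -d "$dev" > "$log" 2>&1
    status=$?
    echo "dry run exited with status $status; output saved to $log"
    return "$status"
}

# Example (as root):
#   dry_run_first ./repair /dev/whatever out.txt
# If the tool segfaults, a backtrace can be captured with gdb:
#   gdb --args ./repair -d /dev/whatever
#   (gdb) run
#   (gdb) bt
```

Keeping the real (non -d) run out of the helper entirely is deliberate: the point is that nothing gets written to the device until the dry-run output has been looked at.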
Created attachment 519720 [details]
repair -d output
Here's the output from the -d.
Doesn't tell me much, but perhaps it will tell you something. ;)
Created attachment 519722 [details]
Incremental patch
Just being extra paranoid, but can you apply this on top of what you have and try again and attach the output? I just want to make absolute sure it's going to do the right thing when you do the real run.
Created attachment 519724 [details]
repair2 -d output
next repair -d run
Created attachment 519725 [details]
Another incremental
Oops, I forgot you can have items with 0 size that are valid. Just apply this over the top and do the dry run again.
Created attachment 519729 [details]
Another incremental
This will work eventually, I promise :).
Created attachment 519908 [details]
Patch to check all fs roots
Same old song and dance. This one prints out what it's doing so you can feel like it's doing something :).
This patch doesn't seem to apply. Says already applied or reversed?
Created attachment 519935 [details]
Full new repair patch
Ok, just unapply everything I've sent you and apply this new one; it should contain everything up to this point. Sorry about that.
Created attachment 519982 [details]
output of last repair run
Here's the last repair run output.
Created attachment 520098 [details]
An incremental
Ok so my repair program still isn't finding errors. This will check all the data extents, hopefully this finds something. If not it looks like this may just be a normal bug and not corruption, which would be great and crappy all at the same time.
ok, same output:
Checking extent root
Finding fs roots
Checking fs roots
Checking root 5
Checking root 5 refs
Ok, it's image creation time. I'm pretty sure it's built in Fedora, but if it's not, just run make btrfs-image in your btrfs-progs-unstable tree. Just run
btrfs-image -c 9 -t <number of threads> /dev/whatever
and then put the image somewhere I can suck it down. Maybe now that Chris is back from vacation I can get him to run his fsck against it and see if his picks anything up.
http://www.scrye.com/~kevin/fedora/corrupt-home-20110829
Let me know if I can provide anything further.
Any news? Did the image help any? I'd love to be able to get my old data off the drive... ;)
Sorry, I've been messing with the repair tool with another user whose fs is even more screwed than yours. The good news is my repair tool now finds a problem with your corrupt image; the bad news is the other user ran my tool without the -d option and it made things worse. So I'm going to rig up a tool that will just pull all of your data off the disk, since neither of your actual fs roots is corrupted, and we'll leave the repair tool to Chris. Should have this all rigged up in a day.
Ok, clone this tree
git://github.com/josefbacik/btrfs-progs.git
and run make (make sure you have zlib-devel installed), and then run
./restore /your/device /some/dir
This will dump everything from your device into that directory. It will skip any snapshots, but will work right with subvolumes. Let me know if something goes wrong.
Sadly, it cranked along for about 1.75GB worth, then:
# ./restore /dev/mapper/vg_ohm-lv_home /tmp/ohm/
Short write: 0
#
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle. Changing version to '19'. (As we did not run this process for some time, it could also affect bugs from pre-Fedora 19 development cycles. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)
More information and reason for this action is here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19
This has been open for a really long time, with no further apparent progress. Kevin, is this something you still run into?
I'm no longer running btrfs here, so no idea. :( I guess you can close it out...
Unfortunate, but I can't say I blame you.