Bug 863978 - Odd btrfs failures in F18 and Rawhide
Odd btrfs failures in F18 and Rawhide
Status: CLOSED UPSTREAM
Product: Fedora
Classification: Fedora
Component: btrfs-progs (Show other bugs)
18
Unspecified Unspecified
unspecified Severity unspecified
: ---
: ---
Assigned To: Zach Brown
Fedora Extras Quality Assurance
: Reopened
: 816304 (view as bug list)
Depends On:
Blocks: TRACKER-bugs-affecting-libguestfs 887973 907554 910270 922891
  Show dependency treegraph
 
Reported: 2012-10-08 04:49 EDT by Richard W.M. Jones
Modified: 2015-05-17 21:40 EDT (History)
18 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 887973 907554 (view as bug list)
Environment:
Last Closed: 2013-07-03 10:24:00 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Richard W.M. Jones 2012-10-08 04:49:22 EDT
Description of problem:

I don't have a way to reproduce this, but it does happen a lot
when running the libguestfs tests in Koji:

btrfstune -S 0 /dev/sda1
Check tree block failed, want=139264, have=0
Check tree block failed, want=139264, have=0
Check tree block failed, want=139264, have=0
read block failed check_tree_block
Couldn't read chunk root
[   87.672076] btrfstune[615]: segfault at 108 ip 08068c0e sp bfbc0cd0 error 4 in btrfstune[8048000+29000]
guestfsd: error: /dev/sda1: Check tree block failed, want=139264, have=0
Check tree block failed, want=139264, have=0
Check tree block failed, want=139264, have=0
read block failed check_tree_block
Couldn't read chunk root
libguestfs: recv_from_daemon: 256 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 01 4b | 00 00 00 01 | 00 12 34 9a | ...
libguestfs: trace: btrfs_set_seeding = -1 (error)
btrfs_set_seeding: /dev/sda1: Check tree block failed, want=139264, have=0
Check tree block failed, want=139264, have=0
Check tree block failed, want=139264, have=0
read block failed check_tree_block
Couldn't read chunk root
(full log:
http://kojipkgs.fedoraproject.org//work/tasks/194/4570194/build.log )

mkfs.btrfs /dev/sda1
libguestfs: recv_from_daemon: 40 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 01 3d | 00 00 00 01 | 00 12 34 ac | ...
libguestfs: trace: mkfs_btrfs = 0
libguestfs: trace: mount "/dev/sda1" "/"
libguestfs: send_to_daemon: 68 bytes: 00 00 00 40 | 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 00 | ...
guestfsd: main_loop: proc 317 (mkfs_btrfs) took 0.39 seconds
guestfsd: main_loop: new request, len 0x40
mount -o  /dev/sda1 /sysroot/
[  104.255959] device fsid 60f2b899-f8ab-4e97-8b1b-f9fa6729abaa devid 1 transid 1 /dev/sda1
[  104.256525] device fsid 60f2b899-f8ab-4e97-8b1b-f9fa6729abaa devid 1 transid 1 /dev/sda1
[  104.263306] btrfs: disk space caching is enabled
[  104.302276] btrfs: failed to recover relocation
[  104.318866] btrfs: open_ctree failed
mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
guestfsd: error: /dev/sda1 on / (options: ''): mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
libguestfs: recv_from_daemon: 284 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 01 | 00 12 34 ad | ...
libguestfs: trace: mount = -1 (error)
mount: /dev/sda1 on / (options: ''): mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
(full log:
http://kojipkgs.fedoraproject.org//work/tasks/193/4570193/build.log )

mkfs.btrfs /dev/sda1
libguestfs: recv_from_daemon: 40 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 01 3d | 00 00 00 01 | 00 12 34 90 | ...
libguestfs: trace: mkfs_btrfs = 0
libguestfs: trace: btrfs_fsck "/dev/sda1"
libguestfs: send_to_daemon: 72 bytes: 00 00 00 44 | 20 00 f5 f5 | 00 00 00 04 | 00 00 01 4c | 00 00 00 00 | ...
guestfsd: main_loop: proc 317 (mkfs_btrfs) took 6.02 seconds
guestfsd: main_loop: new request, len 0x44
grep ^[[:space:]]*btrfs$ /proc/filesystems
btrfsck /dev/sda1
Check tree block failed, want=4214784, have=0
Check tree block failed, want=4214784, have=0
Check tree block failed, want=4214784, have=0
read block failed check_tree_block
Couldn't read tree root
Critical roots corrupted, unable to fsck the FS
guestfsd: error: /dev/sda1: Check tree block failed, want=4214784, have=0
Check tree block failed, want=4214784, have=0
Check tree block failed, want=4214784, have=0
read block failed check_tree_block
Couldn't read tree root
Critical roots corrupted, unable to fsck the FS
libguestfs: recv_from_daemon: 304 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 01 4c | 00 00 00 01 | 00 12 34 91 | ...
libguestfs: trace: btrfs_fsck = -1 (error)
btrfs_fsck: /dev/sda1: Check tree block failed, want=4214784, have=0
Check tree block failed, want=4214784, have=0
Check tree block failed, want=4214784, have=0
read block failed check_tree_block
Couldn't read tree root
Critical roots corrupted, unable to fsck the FS
(full log:
http://kojipkgs.fedoraproject.org//work/tasks/8291/4568291/build.log )

Version-Release number of selected component (if applicable):

btrfs-progs-0.19.20120817git043a639-1.fc19.i686
kernel-3.7.0-0.rc0.git2.4.fc19.i686
(Also seen on x86_64, and on latest F18)

How reproducible:

Unknown.  Happened 3 times.  Didn't happen a week ago.

Steps to Reproduce:
1. Run libguestfs tests on F18+
Comment 1 Richard W.M. Jones 2012-10-08 05:45:45 EDT
Here is another one (i686 this time .. does seem to be more
prevalent on 32 bit, but I have seen it one time on 64 bit):

btrfsck /dev/sda1
Check tree block failed, want=139264, have=0
Check tree block failed, want=139264, have=0
Check tree block failed, want=139264, have=0
read block failed check_tree_block
Couldn't read chunk root
guestfsd: error: /dev/sda1: Check tree block failed, want=139264, have=0
Check tree block failed, want=139264, have=0
Check tree block failed, want=139264, have=0
read block failed check_tree_block
Couldn't read chunk root
libguestfs: recv_from_daemon: 256 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 01 4c | 00 00 00 01 | 00 12 34 91 | ...
libguestfs: trace: btrfs_fsck = -1 (error)
btrfs_fsck: /dev/sda1: Check tree block failed, want=139264, have=0
Check tree block failed, want=139264, have=0
Check tree block failed, want=139264, have=0
read block failed check_tree_block
Couldn't read chunk root

http://kojipkgs.fedoraproject.org//work/tasks/261/4570261/build.log
Comment 2 Richard W.M. Jones 2012-10-08 07:07:47 EDT
Here are another couple (on x86-64):

mkfs.btrfs /dev/sda1
libguestfs: recv_from_daemon: 40 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 01 3d | 00 00 00 01 | 00 12 34 a1 | ...
libguestfs: trace: mkfs_btrfs = 0
libguestfs: trace: mount "/dev/sda1" "/"
libguestfs: send_to_daemon: 68 bytes: 00 00 00 40 | 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 00 | ...
guestfsd: main_loop: proc 317 (mkfs_btrfs) took 0.33 seconds
guestfsd: main_loop: new request, len 0x40
mount -o  /dev/sda1 /sysroot/
[   93.385809] device fsid 5137f603-6b78-46f6-9d95-e85bf55202c2 devid 1 transid 3 /dev/sda1
[   93.386822] device fsid 5137f603-6b78-46f6-9d95-e85bf55202c2 devid 1 transid 3 /dev/sda1
[   93.408020] btrfs: disk space caching is enabled
[   93.459089] btrfs: failed to recover relocation
[   93.477535] btrfs: open_ctree failed
mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
guestfsd: error: /dev/sda1 on / (options: ''): mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
libguestfs: recv_from_daemon: 284 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 01 | 00 12 34 a2 | ...
libguestfs: trace: mount = -1 (error)
mount: /dev/sda1 on / (options: ''): mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so

mkfs.btrfs /dev/sda1
libguestfs: recv_from_daemon: 40 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 01 3d | 00 00 00 01 | 00 12 34 a9 | ...
libguestfs: trace: mkfs_btrfs = 0
libguestfs: trace: mount "/dev/sda1" "/"
libguestfs: send_to_daemon: 68 bytes: 00 00 00 40 | 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 00 | ...
guestfsd: main_loop: proc 317 (mkfs_btrfs) took 0.35 seconds
guestfsd: main_loop: new request, len 0x40
mount -o  /dev/sda1 /sysroot/
[   96.384211] device fsid 962db3c0-4153-450b-9ca7-c9216e81afe3 devid 1 transid 3 /dev/sda1
[   96.385314] device fsid 962db3c0-4153-450b-9ca7-c9216e81afe3 devid 1 transid 3 /dev/sda1
[   96.394158] btrfs: disk space caching is enabled
[   96.428656] btrfs: failed to recover relocation
[   96.437190] btrfs: open_ctree failed
mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
guestfsd: error: /dev/sda1 on / (options: ''): mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
libguestfs: recv_from_daemon: 284 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 01 | 00 12 34 aa | ...
libguestfs: trace: mount = -1 (error)
mount: /dev/sda1 on / (options: ''): mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so

http://kojipkgs.fedoraproject.org//work/tasks/354/4570354/build.log

The common factor seems to be that the filesystem is created
(mkfs.btrfs /dev/sda1) and then it is immediately used -- eg.
mounted or some btrfs subtool is run on it.  There is no pause
or sync between the operations.  In libguestfs we have been
testing this for a long time, and this change has only just
happened, so it must be some change in the kernel or btrfs tools.
Comment 3 Richard W.M. Jones 2012-10-08 17:27:14 EDT
Raised upstream:
http://thread.gmane.org/gmane.comp.file-systems.btrfs/20257
Comment 4 Richard W.M. Jones 2012-10-10 15:40:25 EDT
Chris Mason on the upstream mailing list suggested adding
this patch to btrfs-progs:

commit 6eba9002956ac40db87d42fb653a0524dc568810
Author: Goffredo Baroncelli <kreijack@inwind.it>
Date:   Tue Sep 4 19:59:26 2012 +0200

    Correct un-initialized fsid variable

I did that and it did appear to fix the problem.  I'm
still running tests to be sure about this.

(Instead of cherry picking that patch, we could instead
move to a later version of btrfs-progs, since we haven't
updated in about 2 months).
Comment 5 Fedora Update System 2012-10-10 16:24:46 EDT
btrfs-progs-0.19.20120817git043a639-2.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/btrfs-progs-0.19.20120817git043a639-2.fc18
Comment 6 Fedora Update System 2012-10-10 22:57:55 EDT
Package btrfs-progs-0.19.20120817git043a639-2.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing btrfs-progs-0.19.20120817git043a639-2.fc18'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-15895/btrfs-progs-0.19.20120817git043a639-2.fc18
then log in and leave karma (feedback).
Comment 7 Richard W.M. Jones 2012-10-11 03:29:47 EDT
(In reply to comment #4)
> Chris Mason on the upstream mailing list suggested adding
> this patch to btrfs-progs:
> 
> commit 6eba9002956ac40db87d42fb653a0524dc568810
> Author: Goffredo Baroncelli <kreijack@inwind.it>
> Date:   Tue Sep 4 19:59:26 2012 +0200
> 
>     Correct un-initialized fsid variable
> 
> I did that and it did appear to fix the problem.  I'm
> still running tests to be sure about this.

Unfortunately although this patch has reduced the
occurrence of the bug, I saw it happen again last night
even with the patched btrfs-progs.

modprobe btrfs
[   15.716610] Btrfs loaded
grep ^[[:space:]]*btrfs$ /proc/filesystems
mkfs.btrfs /dev/sda1 /dev/sdb1

[   16.656467] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 1 transid 1 /dev/sda1
[   16.657467] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 1 transid 1 /dev/sda1
[   17.227381] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 2 transid 3 /dev/sdb1
fsync /dev/sda
fsync /dev/sdb
fsync /dev/sdc
fsync /dev/sdd
libguestfs: recv_from_daemon: 40 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 01 3d | 00 00 00 01 | 00 12 34 04 | ...
libguestfs: trace: mkfs_btrfs = 0
libguestfs: trace: mount "/dev/sda1" "/"
libguestfs: send_to_daemon: 68 bytes: 00 00 00 40 | 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 00 | ...
guestfsd: main_loop: proc 317 (mkfs_btrfs) took 2.22 seconds
guestfsd: main_loop: new request, len 0x40
mount -o  /dev/sda1 /sysroot/
[   17.512337] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 1 transid 4 /dev/sda1
[   17.758300] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 2 transid 4 /dev/sdb1
[   17.857285] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 1 transid 4 /dev/sda1
[   17.893279] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 1 transid 4 /dev/sda1
[   17.909277] btrfs: disk space caching is enabled
[   17.943272] btrfs bad tree block start 0 135168
[   17.955270] btrfs: open_ctree failed
mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
guestfsd: error: /dev/sda1 on / (options: ''): mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
libguestfs: recv_from_daemon: 284 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 01 | 00 12 34 05 | ...
libguestfs: trace: mount = -1 (error)

http://kojipkgs.fedoraproject.org//work/tasks/9775/4579775/build.log
Comment 8 Chris Murphy 2012-10-15 05:57:49 EDT
Is Bug 864765 related to this one? I'm not actually finding the SIGABRT in any of the logs, but maybe I've overlooked something. The file system is clearly created and valid without errors, but anaconda is crashing before subvols are created.
Comment 9 Richard W.M. Jones 2012-10-15 06:26:25 EDT
That other bug looks like a different bug ...
Comment 10 Richard W.M. Jones 2012-10-29 10:51:38 EDT
Still happens.  No resolution upstream yet.
Comment 11 Alexandre Oliva 2013-01-19 21:12:24 EST
Richard, was the device in which the btrfs filesystem is created used recently by another btrfs mount before since the last boot?  I've seen recent versions of btrfs keep the device active, possibly still pending some I/O, even after umount of all mount-points of a filesystem complete successfully, and I'm guessing this might explain corruption should information from the old filesystem make to disk and overwrite info from the newly-created one.
Comment 12 Richard W.M. Jones 2013-01-20 05:26:21 EST
(In reply to comment #11)
> Richard, was the device in which the btrfs filesystem is created used
> recently by another btrfs mount before since the last boot?  I've seen
> recent versions of btrfs keep the device active, possibly still pending some
> I/O, even after umount of all mount-points of a filesystem complete
> successfully, and I'm guessing this might explain corruption should
> information from the old filesystem make to disk and overwrite info from the
> newly-created one.

No.  I can get this by creating a new VM from scratch with
a sparse raw disk, running mkfs.btrfs immediately followed by mount.

I talked with upstream devs about this, and it is apparently
something to do with the writes from userspace (ie. mkfs)
not being seen/read by the mount (kernel space).  I have to admit
I did not fully understand the explanation.
Comment 13 Richard W.M. Jones 2013-02-04 12:22:26 EST
I thought I would update people on this bug with the current
Rawhide kernel.

There is still a bug, although it's a little different now
from the original report.  Because it seem different, I have
cloned this bug (to preserve the CC list) as bug 907554.

I would say that the current bug appears to be fixed, but
bug 907554 looks rather similar to it.
Comment 14 Richard W.M. Jones 2013-02-04 18:10:34 EST
Actually I'm going to reopen this because it occurs on Fedora 18
still (although it's very rare).

kernel-3.7.2-204.fc18.x86_64

Captured a few minutes ago:

mount -o  /dev/sda1 /sysroot/
[   22.833046] device fsid 3f86f060-d5ef-4b25-b37a-f076f2956038 devid 1 transid 
1 /dev/sda1
[   22.834326] device fsid 3f86f060-d5ef-4b25-b37a-f076f2956038 devid 1 transid 
1 /dev/sda1
[   22.835959] btrfs: disk space caching is enabled
[   22.841717] btrfs: failed to recover relocation
[   22.842300] btrfs: open_ctree failed
mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
guestfsd: error: /dev/sda1 on / (options: ''): mount: wrong fs type, bad option,
 bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
Comment 15 Richard W.M. Jones 2013-02-05 15:00:58 EST
Another example:

mount -o  /dev/sda1 /sysroot/
[   75.882682] device fsid c79ef5dc-239a-49cb-b1fd-fd459ca912bd devid 1 transid 4 /dev/sda1
[   76.014647] device fsid c79ef5dc-239a-49cb-b1fd-fd459ca912bd devid 1 transid 4 /dev/sda1
[   76.255755] btrfs: disk space caching is enabled
[   76.308880] device fsid c79ef5dc-239a-49cb-b1fd-fd459ca912bd devid 2 transid 4 /dev/sdb1
[   76.395366] btrfs bad tree block start 0 135168
[   76.477635] btrfs: open_ctree failed
mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
guestfsd: error: /dev/sda1 on / (options: ''): mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
libguestfs: recv_from_daemon: 284 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 01 | 00 12 34 05 | ...
libguestfs: trace: mount = -1 (error)
*stdin*:14: libguestfs: error: mount: /dev/sda1 on / (options: ''): mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so

Full log:
http://koji.fedoraproject.org/koji/getfile?taskID=4930070&name=build.log

Although this error is much rarer than earlier, it's
still happening.
Comment 16 Richard W.M. Jones 2013-02-13 08:13:52 EST
After this discussion upstream:
http://thread.gmane.org/gmane.comp.file-systems.btrfs/23249

I added this patch to the Rawhide btrfs-progs package:
https://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git;a=commitdiff;h=8fe354744cd7b5c4f7a3314dcdbb5095192a032f

This fixes the problem on Rawhide (but NOT on Fedora 18).
Comment 17 Richard W.M. Jones 2013-02-13 08:22:16 EST
*** Bug 816304 has been marked as a duplicate of this bug. ***
Comment 18 Nathaniel McCallum 2013-07-01 13:14:09 EDT
According to the thread Richard posted above, the actual fix for this is here: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6f60cbd3ae442cb35861bb522f388db123d42ec1

Has this been resolved?
Comment 19 Josh Boyer 2013-07-01 13:41:48 EDT
(In reply to Nathaniel McCallum from comment #18)
> According to the thread Richard posted above, the actual fix for this is
> here:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/
> ?id=6f60cbd3ae442cb35861bb522f388db123d42ec1
> 
> Has this been resolved?

Should be.  That went into 3.9 and F18 is well past that.  However, there were different btrfs-progs commits too.
Comment 20 Richard W.M. Jones 2013-07-02 05:13:55 EDT
(In reply to Nathaniel McCallum from comment #18)
> According to the thread Richard posted above, the actual fix for this is
> here:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/
> ?id=6f60cbd3ae442cb35861bb522f388db123d42ec1
> 
> Has this been resolved?

This looks a lot more plausible as a fix.

However I'll need to run another libguestfs build with
all the btrfs tests reenabled to really verify it.
Comment 21 Richard W.M. Jones 2013-07-03 10:24:00 EDT
Yup, confirmed this is fixed.

Note You need to log in before you can comment on or make changes to this bug.