Bug 1013311

Summary: btrfs-progs: btrfs check segfaults when checking raid5 btrfs with one device missing
Product: Red Hat Enterprise Linux 7 Reporter: Eryu Guan <eguan>
Component: btrfs-progs Assignee: fs-maint
Status: CLOSED ERRATA QA Contact: XuWang <xuw>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.0 CC: eguan, esandeen
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: btrfs-progs-3.14.2-1.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-03-05 13:13:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Eryu Guan 2013-09-29 08:09:49 UTC
Description of problem:
btrfsck segfaults when checking a btrfs filesystem with one device missing.

The original filesystem was a three-device btrfs with raid5 metadata and raid1 data. I detached one of the devices with losetup -d, unmounted, mounted degraded, ran some I/O tests, and unmounted again. That left me with a btrfs made up of 2 of the 3 filesystem images.

[root@ibm-x3650m4-02-vm-06 btrfs-progs]# losetup -a
/dev/loop0: [64769]:203380888 (/root/0.img)
/dev/loop1: [64769]:203380889 (/root/2.img)
[root@ibm-x3650m4-02-vm-06 btrfs-progs]# btrfsck /dev/loop0
warning, device 2 is missing
warning devid 2 not found already
Check tree block failed, want=147914752, have=65536
Check tree block failed, want=147914752, have=65536
read block failed check_tree_block
Checking filesystem on /dev/loop0
UUID: fe7bdf36-9ff4-4169-ad34-5bab459b6950
checking extents
Segmentation fault

And dmesg shows
btrfsck[8555]: segfault at 1d3 ip 0000000000412a1a sp 00007fff6f5082b0 error 4 in btrfsck[400000+4c000]

Version-Release number of selected component (if applicable):
btrfs-progs-0.20.rc1.20130308git704a08c-1.el7
current upstream btrfs-progs has this issue too

How reproducible:
always

Steps to Reproduce:
1. download btrfs fs images from comment 1
2. setup loop devices
3. run btrfsck on one of the devices

Actual results:
btrfsck segfaults

Expected results:
No segfault. I would also expect no filesystem corruption to be reported.

Additional info:
This is how I got the

Comment 2 Eryu Guan 2014-01-10 14:31:34 UTC
A simple way to reproduce

mkfs -t btrfs -f -K /dev/loop{0..3}
losetup -d /dev/loop2
btrfs check /dev/loop0

This patch should fix the segfault; I'll send it upstream for review. Even with the patch applied, btrfs check still reports a failure.

diff --git a/cmds-check.c b/cmds-check.c
index a65670e..375c563 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -5514,6 +5514,10 @@ again:
                                              btrfs_root_bytenr(&ri),
                                              btrfs_level_size(root,
                                               btrfs_root_level(&ri)), 0);
+                       if (!buf) {
+                               ret = -EIO;
+                               goto out;
+                       }
                        add_root_to_pending(buf, &extent_cache, &pending,
                                            &seen, &nodes, &found_key);
                        free_extent_buffer(buf);
diff --git a/disk-io.c b/disk-io.c
index 0af3898..b37f0bd 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -644,7 +644,10 @@ out:
        blocksize = btrfs_level_size(root, btrfs_root_level(&root->root_item));
        root->node = read_tree_block(root, btrfs_root_bytenr(&root->root_item),
                                     blocksize, generation);
-       BUG_ON(!root->node);
+       if (!root->node) {
+               free(root);
+               return ERR_PTR(-EIO);   
+       }
 insert:
        root->ref_cows = 1;
        return root;

Comment 3 Eryu Guan 2014-01-10 14:33:32 UTC
The mkfs command in comment 2 should also include "-m raid5 -d raid5"; only the raid5 profile triggers the segfault.

Comment 4 Eryu Guan 2014-03-03 09:01:25 UTC
Still segfaults with btrfs-progs-3.12-4.el7

Comment 6 Eric Sandeen 2014-07-14 23:30:45 UTC
This should be fixed with:

commit b2e99e1819d967828edf149db5a203e59a40e379
Author: Eryu Guan <guaneryu>
Date:   Fri Jan 10 22:50:02 2014 +0800

    Btrfs-progs: check return value of read_tree_block() in check_chunks_and_extents()

in v3.14.2; I'll set as MODIFIED for now

Comment 8 XuWang 2014-12-08 07:35:34 UTC
Reproduced on RHEL-7.0-20140507.0 in the following job:
https://beaker.engineering.redhat.com/jobs/821682
[  196.810884] btrfsck[11595]: segfault at e0 ip 0000000000415c64 sp 00007fff9fcbfe80 error 4 in btrfsck[400000+62000]

Verified on RHEL-7.1-20141204.2 in the following job:
https://beaker.engineering.redhat.com/jobs/821684. There is no segmentation fault, but btrfsck returns 1. (That seems correct, because one device of the raid5 is missing.)

I also ran some regression tests for btrfs-progs-3.16.2-1:
J:803063 	xfstests-btrfs: RHEL-7.1-20141113.0,s390x 
J:803061 	xfstests-btrfs: RHEL-7.1-20141113.0-ppc64
J:801031 	xfstests-btrfs: RHEL-7.1-20141111.0 
J:798051 	ltp-aiodio-btrfs: RHEL-LE-7.1-20141105.n.2
J:796795 	ltp-btrfs: RHEL-7.1-20141107.n.0, kernel-3.10.0-199.el7
I also ran some cases manually:
/kernel/filesystems/btrfs/degraded-mount-replace--panic (caused by a kernel issue)
/kernel/filesystems/btrfs/profile-conversion--good
/kernel/filesystems/btrfs/online-resize--good
/kernel/filesystems/btrfs/regression--good
/kernel/filesystems/btrfs/mkfs--good
/kernel/filesystems/btrfs/mount--good
/kernel/filesystems/btrfs/clone--good
/kernel/filesystems/btrfs/compress--good
/kernel/filesystems/btrfs/defragment--good
/kernel/filesystems/btrfs/online-device-add-delete-balance--good

So I think I can change this bug's status to VERIFIED.

Comment 10 errata-xmlrpc 2015-03-05 13:13:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0534.html