Bug 1294531 - btrfs device delete does not complete, hangs
Status: NEW
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: btrfs-progs
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Assigned To: Ric Wheeler
QA Contact: Filesystem QE
Type: Bug
Reported: 2015-12-28 15:18 EST by Konstantin Olchanski
Modified: 2017-12-01 21:14 EST (History)
CC: 2 users
Attachments: None

Description Konstantin Olchanski 2015-12-28 15:18:29 EST
As part of btrfs evaluation test, I am trying to remove a defective disk from a btrfs filesystem using the command "btrfs dev delete /dev/sde1 /".

I am pretty sure this worked under el7.0 or el7.1, but with el7.2 this command has been grinding away for 20 hours now without any sign of progress.

From the btrfs documentation it is not clear how to tell whether it is making progress, is completely stuck, or is somewhere in between.

I do see some I/O happening between disks, but without any clear pattern (such as 100% busy moving data, or 100% busy trying to read the disk-to-be-removed). In fact I am pretty sure I do not see any write activity at all.

I also see a severe slowdown of the machine: ssh root@daq11 takes several minutes instead of a few seconds.

This is not the expected behaviour. With this btrfs filesystem being fully RAID1, I expected the bad disk to be released immediately (it does not contain any unique data), followed by a rebalance to re-raid1 (re-duplicate) the data. (Or rather: rebalance first, release the bad disk second.)

(If instead btrfs is trying to rebalance the data by reading it from the bad disk - there is a reason I am removing it: it is defective, with growing bad sectors and severe SMART warnings - then the btrfs developers should be fired; clearly they did not consider the most obvious use cases.)

(As a reminder, I am evaluating btrfs as a replacement for RAID1+ext4 for use in high-availability systems; the main requirement is uninterrupted operation if any one disk completely fails.)

Some configuration details (disk /dev/sde is the one being removed):

BTW, I expected the "GiB Used" counters to change as the device removal and implied rebalance make progress, but I do not see any numbers change at all.

[root@daq11 ~]# btrfs fi df /
Data, RAID1: total=656.00GiB, used=643.80GiB
System, RAID1: total=32.00MiB, used=160.00KiB
Metadata, RAID1: total=60.00GiB, used=51.16GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
[root@daq11 ~]# 

[root@daq11 ~]# btrfs fi show
Label: 'centos_daq11'  uuid: 8ef30d1e-8671-4f99-9032-3fb1ca9ccf99
	Total devices 6 FS bytes used 694.96GiB
	devid    1 size 1.75TiB used 263.00GiB path /dev/sda3
	devid    2 size 1.75TiB used 264.00GiB path /dev/sdb3
	devid    5 size 0.00B used 318.03GiB path /dev/sde1
	devid    6 size 1.82TiB used 318.03GiB path /dev/sdf1
	devid    8 size 1.75TiB used 263.00GiB path /dev/sdd3
	devid    9 size 1.75TiB used 6.00GiB path /dev/sdc3

btrfs-progs v3.19.1

[root@daq11 ~]# btrfs dev usage /
/dev/sda3, ID: 1
   Device size:             1.75TiB
   Data,RAID1:            236.00GiB
   Metadata,RAID1:         27.00GiB
   Unallocated:             1.49TiB

/dev/sdb3, ID: 2
   Device size:             1.75TiB
   Data,RAID1:            241.00GiB
   Metadata,RAID1:         23.00GiB
   Unallocated:             1.49TiB

/dev/sdc3, ID: 9
   Device size:             1.75TiB
   Data,RAID1:              6.00GiB
   Unallocated:             1.74TiB

/dev/sdd3, ID: 8
   Device size:             1.75TiB
   Data,RAID1:            247.00GiB
   Metadata,RAID1:         16.00GiB
   Unallocated:             1.49TiB

/dev/sde1, ID: 5
   Device size:             1.82TiB
   Data,RAID1:            290.00GiB
   Metadata,RAID1:         28.00GiB
   System,RAID1:           32.00MiB
   Unallocated:            16.00EiB

/dev/sdf1, ID: 6
   Device size:             1.82TiB
   Data,RAID1:            292.00GiB
   Metadata,RAID1:         26.00GiB
   System,RAID1:           32.00MiB
   Unallocated:             1.51TiB

[root@daq11 ~]# 


[root@daq11 ~]# rpm -q btrfs-progs
btrfs-progs-3.19.1-1.el7.x86_64
[root@daq11 ~]# uname -a
Linux daq11.triumf.ca 3.10.0-327.3.1.el7.x86_64 #1 SMP Wed Dec 9 14:09:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@daq11 ~]# 

K.O.
Comment 2 Chris Murphy 2015-12-31 15:42:22 EST
Btrfs doesn't yet have a device 'faulty' state like md/mdadm, even upstream. It will try to read/write to defective devices indefinitely, and maybe the resulting flood of retries is what is slowing things down.
https://btrfs.wiki.kernel.org/index.php/Project_ideas#Take_device_with_heavy_IO_errors_offline_or_mark_as_.22unreliable.22

'dev delete <dev>' does not consider the specified device actually deleted (or ignorable) until all of its data is replicated on other devices, i.e. a 3rd copy must be created before sde1 is considered no longer necessary and the device is released.

Instead, physically remove the device, or issue 'echo 1 > /sys/block/device-name/device/delete' and then use 'btrfs dev delete missing' to initiate the replication of missing data that was on the bad device to remaining devices.

Alternatively, when replacing the bad device, it's better to use 'btrfs replace' either with -r option (mostly ignore the bad device unless needed), or physically remove or sysfs delete it first.
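A minimal sketch of the suggested workaround (sde is this report's failing disk; the commands require root and are destructive, and the devid 5 / /dev/sdg1 values in the 'replace' alternative are illustrative):

```shell
# Take the failing device offline at the block layer so the kernel stops
# issuing I/O to it; the intent is that btrfs then reports the device as
# missing in 'btrfs fi show':
echo 1 > /sys/block/sde/device/delete

# Re-replicate the data that was on the now-missing device onto the
# remaining RAID1 members:
btrfs dev delete missing /

# Alternative when a replacement disk is at hand: 'btrfs replace' with -r
# reads from the bad device only when no other copy exists
# (devid 5 and /dev/sdg1 are hypothetical example values):
btrfs replace start -r 5 /dev/sdg1 /
```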
Comment 3 Konstantin Olchanski 2016-01-01 17:42:06 EST
I do not think your instructions will work.

a) If I physically remove the disk, it will not become "missing" in btrfs, instead the syslog will fill with disk errors.
b) If I "echo 1 > /sys/block/.../delete", I think the same thing will happen.

I suspect the only way to mark a disk as missing (permitting "btrfs dev delete missing") is through a reboot, but, as we already know, RHEL 7.2 will not boot from a degraded btrfs filesystem.

A catch-22 if there ever was one.

K.O.

P.S. I see all this as a very bad sign. Obviously, the btrfs authors failed to think through the simplest failure scenario (a dead disk). Makes one wonder what other failure modes they ignored or dismissed as "an exercise for the user" (as in "restore from backup and start from scratch" - I do read the btrfs mailing lists).

P.P.S. As for my machine with the stuck "btrfs dev delete": after 4 days of "maybe it just takes a very, very long time, let's wait", the machine died (no ping).

K.O.
Comment 4 Chris Murphy 2016-01-02 16:21:45 EST
(In reply to Konstantin Olchanski from comment #3)
> a) If I physically remove the disk, it will not become "missing" in btrfs,
> instead the syslog will fill with disk errors.

> b) If I "echo 1 > /sys/block/.../delete", I think the same thing will happen.

Every time I've tried either of these, 'btrfs fi show' has always immediately displayed the missing device as missing.

> I suspect the only way to mark a disk as missing (permitting "btrfs dev
> delete missing") is through a reboot, but as we already know, RHEL7.2 will
> not boot from a degraded btrfs filesystem.

Try it first? I've done this a bunch of times and in the normal case it does work. When it doesn't work, it's because something else is wrong - an edge case - and that requires supplying a lot of state information, because "it doesn't work" by itself is not revealing.
Comment 5 Konstantin Olchanski 2016-01-02 19:32:07 EST
I only tried to simulate disk failure by disconnecting the disk under el7.0; I did not try with el7.1 or el7.2. I am pretty sure I did not see the disconnected disk go "missing" then.

I will try again with el7.2 early next week when I can physically access the machine.

BTW, what you say is inconsistent with the BTRFS documentation:
https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices
says:

"btrfs device delete missing tells btrfs to remove the first device that is described by the filesystem metadata but not present when the FS was mounted."

Which I read as: to use "btrfs dev delete missing", the btrfs filesystem has to be unmounted, then remounted in degraded mode. For the "/" filesystem this means the machine has to be rebooted.

A search for "missing" in the btrfs wiki (https://btrfs.wiki.kernel.org/index.php?title=Special%3ASearch&search=missing&go=Go) does not turn up any additional information on what "missing" means, what it does, or how one gets there.

K.O.
Comment 6 Konstantin Olchanski 2016-01-04 20:27:22 EST
We are both right. Physically disconnecting the disk does make "btrfs fi show /" report "Some devices missing" (as Chris M. says), yet all the other commands still see the disconnected device and btrfs still tries to write to it (as I remembered).

Now from this state, I confirm that "delete missing" does not work:

[root@daq11 ~]# btrfs dev delete missing /
ERROR: error removing the device 'missing' - no missing devices found to remove
[root@daq11 ~]# 
[root@daq11 ~]# btrfs dev delete /dev/sde1 /
ERROR: error removing the device '/dev/sde1' - No such file or directory
[root@daq11 ~]# 

Here is additional information:

[root@daq11 ~]# btrfs fi show /
Label: 'centos_daq11'  uuid: 8ef30d1e-8671-4f99-9032-3fb1ca9ccf99
        Total devices 6 FS bytes used 699.64GiB
        devid    1 size 1.75TiB used 263.00GiB path /dev/sda3
        devid    2 size 1.75TiB used 264.00GiB path /dev/sdb3
        devid    6 size 1.82TiB used 310.03GiB path /dev/sdf1
        devid    8 size 1.75TiB used 263.00GiB path /dev/sdd3
        devid    9 size 1.75TiB used 26.00GiB path /dev/sdc3
        *** Some devices missing

btrfs-progs v3.19.1
[root@daq11 ~]# 

[root@daq11 ~]# btrfs dev usage /
/dev/sda3, ID: 1
   Device size:             1.75TiB
   Data,RAID1:            236.00GiB
   Metadata,RAID1:         27.00GiB
   Unallocated:             1.49TiB

/dev/sdb3, ID: 2
   Device size:             1.75TiB
   Data,RAID1:            241.00GiB
   Metadata,RAID1:         23.00GiB
   Unallocated:             1.49TiB

/dev/sdc3, ID: 9
   Device size:             1.75TiB
   Data,RAID1:             26.00GiB
   Unallocated:             1.72TiB

/dev/sdd3, ID: 8
   Device size:             1.75TiB
   Data,RAID1:            247.00GiB
   Metadata,RAID1:         16.00GiB
   Unallocated:             1.49TiB

/dev/sde1, ID: 5
   Device size:               0.00B
   Data,RAID1:            276.00GiB
   Metadata,RAID1:         28.00GiB
   System,RAID1:           32.00MiB
   Unallocated:             1.52TiB

/dev/sdf1, ID: 6
   Device size:             1.82TiB
   Data,RAID1:            284.00GiB
   Metadata,RAID1:         26.00GiB
   System,RAID1:           32.00MiB
   Unallocated:             1.52TiB

[root@daq11 ~]#
Comment 7 Konstantin Olchanski 2016-01-05 18:37:27 EST
With help from Chris M., my catch-22 is resolved:

a) disconnect the disk that is to be removed from btrfs
b) reboot with "rd.shell" and "rd.break=pre-init" (I typed them into the grub editor from the grub menu)
c) get the "emergency shell" (it appears right before the infinite wait for the btrfs uuid)
d) mount -o degraded /dev/sdb3 /sysroot
e) btrfs dev delete missing /sysroot
f) watch the progress of the btrfs data balancer; it will take some time.
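The steps above can be sketched as one transcript (device names as in this report; note that comment 8 corrects the dracut breakpoint to rd.break=pre-mount; commands require root and are shown for illustration only):

```shell
# Recovery sketch for removing a dead disk from a btrfs "/" filesystem.

# Step a: power down and physically disconnect the failing disk (sde).

# Step b: at the grub menu, edit the kernel command line and append:
#   rd.shell rd.break=pre-mount

# Steps c-d: from the dracut emergency shell, mount the degraded
# filesystem on /sysroot using any surviving member device:
mount -o degraded /dev/sdb3 /sysroot

# Step e: re-replicate the data that lived on the missing disk:
btrfs dev delete missing /sysroot

# Step f: watch progress; per-device usage changes as chunks are moved:
btrfs fi show /sysroot
```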

Would be nice if normal "btrfs dev delete" were fixed some day.

K.O.
Comment 8 Konstantin Olchanski 2016-01-05 18:38:32 EST
Made a typo in previous message: "rd.break=pre-mount", not "pre-init". K.O.
Comment 9 Konstantin Olchanski 2016-01-29 20:37:57 EST
Additional information. Back on January 5th, I booted the machine in single-user mode, and it has been running "btrfs delete missing /" ever since.

Today "btrfs delete" finally completed - around 300 GB of data rearranged in 20 days - this must be a speed record of sorts: 15 GB per day, 0.2 Mbytes/sec.

With btrfs no longer degraded, I rebooted the machine in multi-user mode (a degraded btrfs will not boot, remember?), into the latest kernel.

[root@daq11 ~]# uname -a
Linux daq11.triumf.ca 3.10.0-327.4.5.el7.x86_64 #1 SMP Mon Jan 25 22:07:14 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Now running "btrfs dev delete" to liberate one more disk, expect an update in 20 days.

Impressive!
K.O.
Comment 10 Konstantin Olchanski 2016-05-18 15:03:11 EDT
For the record: "btrfs dev delete" never completed. After 1 month (I am patient), I ended up reinstalling the OS (to move "/" from btrfs on 6xHDD to xfs on an SSD) and erasing the btrfs disks (complete data loss, had this been actual data).

In summary, btrfs in el7.2 is useless junk (and I do not care if it works oh so well on the SSD in your laptop).

K.O.
Comment 11 Konstantin Olchanski 2016-05-18 15:05:22 EDT
OK to close this bug; I do not see how I can close it myself. K.O.
Comment 12 Konstantin Olchanski 2016-06-30 17:27:39 EDT
My btrfs evaluation is complete: btrfs in el7.2 is unusable, and I will be using zfs instead. K.O.
Comment 13 Konstantin Olchanski 2017-03-06 12:25:39 EST
Close this bug already. Is nobody but bots left at Red Hat? K.O.
