Bug 1855174 - btrfs installation crashes with multiple disks, when one of them is too small for the OS
Summary: btrfs installation crashes with multiple disks, when one of them is too small...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: btrfs-progs
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Neal Gompa
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedBlocker
Depends On:
Blocks: BetaBlocker, F33BetaBlocker 1851166
Reported: 2020-07-09 07:09 UTC by Kamil Páral
Modified: 2020-07-25 01:52 UTC

Fixed In Version: btrfs-progs-5.7-3.fc33 btrfs-progs-5.7-4.fc31 btrfs-progs-5.7-4.fc32
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-25 01:07:27 UTC
Type: Bug


Attachments
error screenshot (67.67 KB, image/png)
2020-07-09 07:13 UTC, Kamil Páral
lsblk.out (631 bytes, text/plain)
2020-07-09 07:13 UTC, Kamil Páral
df-h.out (710 bytes, text/plain)
2020-07-09 07:13 UTC, Kamil Páral
findmnt.out (5.27 KB, text/plain)
2020-07-09 07:13 UTC, Kamil Páral
anaconda.log (18.50 KB, text/plain)
2020-07-09 07:13 UTC, Kamil Páral
packaging.log (54 bytes, text/plain)
2020-07-09 07:13 UTC, Kamil Páral
program.log (9.93 KB, text/plain)
2020-07-09 07:13 UTC, Kamil Páral
storage.log (196.18 KB, text/plain)
2020-07-09 07:13 UTC, Kamil Páral


Links
System | ID | Priority | Status | Summary | Last Updated
Github | kdave btrfs-progs issues 269 | None | open | 'btrfs fi us' misreports freespace for raid0 with dissimilar sized devices | 2020-07-20 16:26:23 UTC
Github | kdave btrfs-progs issues 270 | None | open | mkfs.btrfs shouldn't always use data profile 'raid0' by default with multiple devices | 2020-07-20 16:26:23 UTC

Description Kamil Páral 2020-07-09 07:09:05 UTC
Description of problem:
As part of the btrfs test day https://fedoraproject.org/wiki/Test_Day:2020-07-08_Btrfs_default , I followed
https://fedoraproject.org/wiki/QA:Testcase_partitioning_guided_multi_empty_all

I had two disks in my VM, a 15GB disk and a 2GB disk, both empty (MBR). I selected both in guided partitioning, and it allowed me to continue just fine. But the installation crashed in the middle with "rsync exited with code 11". After inspection, it seems that anaconda chose the 2GB disk for the system root, home and boot, and the 15GB disk just for swap. (The default partition layout was btrfs because of the test day.) There is something wrong with the disk selection logic in anaconda.

Version-Release number of selected component (if applicable):
anaconda-33.20-1.1.btrfs.fc33.x86_64

How reproducible:
tried once, but probably always

Steps to Reproduce:
1. use the image from https://fedoraproject.org/wiki/Test_Day:2020-07-08_Btrfs_default
2. have an empty 15GB and 2GB disk in your VM. It seems the system must see the 2GB disk as vda (the first disk).
3. use the guided partitioning to select both disks, confirm and start install
4. crash during installation

Comment 1 Kamil Páral 2020-07-09 07:13:16 UTC
Created attachment 1700395 [details]
error screenshot

Comment 2 Kamil Páral 2020-07-09 07:13:26 UTC
Created attachment 1700396 [details]
lsblk.out

Comment 3 Kamil Páral 2020-07-09 07:13:30 UTC
Created attachment 1700397 [details]
df-h.out

Comment 4 Kamil Páral 2020-07-09 07:13:34 UTC
Created attachment 1700398 [details]
findmnt.out

Comment 5 Kamil Páral 2020-07-09 07:13:38 UTC
Created attachment 1700399 [details]
anaconda.log

Comment 6 Kamil Páral 2020-07-09 07:13:42 UTC
Created attachment 1700400 [details]
packaging.log

Comment 7 Kamil Páral 2020-07-09 07:13:47 UTC
Created attachment 1700401 [details]
program.log

Comment 8 Kamil Páral 2020-07-09 07:13:52 UTC
Created attachment 1700402 [details]
storage.log

Comment 9 Kamil Páral 2020-07-09 07:18:55 UTC
Note: The testcase is probably not ready for the new btrfs installation scheme. I'm not sure what should happen when you select multiple drives with btrfs. Should we set up btrfs RAID? Should we just use the largest/fastest drive and leave the other ones untouched? Currently, the extra drive can be used for swap, but swap is going away from the default installation anyway. And it doesn't make sense to place /boot on one drive and everything else on the other. So I think this needs more discussion. But either way, anaconda should not select a drive that is far too small for / (instead it should inform the user, make a smart selection, or both), and it should not crash during installation.

Comment 10 Kamil Páral 2020-07-09 07:48:44 UTC
Actually, I just repeated the steps with two 15GB disks, and it seems anaconda sets up btrfs RAID automatically. The installed system has:

$ sudo btrfs filesystem show
Label: 'fedora_localhost-live'  uuid: 3b71dcda-6f80-41a0-b81a-adaf0c97435a
	Total devices 2 FS bytes used 5.71GiB
	devid    1 size 11.05GiB used 3.26GiB path /dev/vda2
	devid    2 size 15.00GiB used 3.26GiB path /dev/vdb1

$ sudo btrfs filesystem df /
Data, RAID0: total=6.00GiB, used=5.65GiB
System, RAID1: total=8.00MiB, used=16.00KiB
Metadata, RAID1: total=256.00MiB, used=169.45MiB
GlobalReserve, single: total=14.05MiB, used=0.00B

Note again that this is not the system from comment 0, but a different one with 2x15GB disks.

So, the btrfs raid support is included in anaconda. But something fails hard when one of the disks (it might need to be the first one) is way too small.
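
As a rough illustration of how the profiles above could be checked automatically, here is a minimal Python sketch that parses `btrfs filesystem df` output of the shape shown in this comment. The function name `parse_fi_df` and the regular expression are assumptions made for this example; they are not part of btrfs-progs or anaconda, and the parser only handles the line format quoted here.

```python
import re

# Matches lines such as "Data, RAID0: total=6.00GiB, used=5.65GiB".
# The pattern is an assumption based only on the output quoted in this report.
LINE_RE = re.compile(
    r"^(?P<kind>\w+), (?P<profile>\w+): "
    r"total=(?P<total>[^,\s]+), used=(?P<used>\S+)$"
)


def parse_fi_df(output):
    """Map each block-group type to its profile and raw size strings."""
    result = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            result[match["kind"]] = {
                "profile": match["profile"],
                "total": match["total"],
                "used": match["used"],
            }
    return result


# Output copied from this comment (2x15GB disks, guided partitioning).
SAMPLE = """\
Data, RAID0: total=6.00GiB, used=5.65GiB
System, RAID1: total=8.00MiB, used=16.00KiB
Metadata, RAID1: total=256.00MiB, used=169.45MiB
GlobalReserve, single: total=14.05MiB, used=0.00B
"""

if __name__ == "__main__":
    for kind, info in parse_fi_df(SAMPLE).items():
        print(f"{kind}: {info['profile']}")
```

A check like `parse_fi_df(text)["Data"]["profile"]` would make it easy to spot a mixed layout such as RAID0 data with RAID1 metadata.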

Comment 11 Kamil Páral 2020-07-09 07:59:58 UTC
If btrfs is accepted as the default, this will violate https://fedoraproject.org/wiki/Fedora_33_Beta_Release_Criteria#guided-partitioning . If btrfs does not become the default, we might need to find a different criterion (and I would see whether I can reproduce the same problem using custom partitioning).

Comment 12 Davide Cavalca 2020-07-09 15:43:28 UTC
> $ sudo btrfs filesystem df /
> Data, RAID0: total=6.00GiB, used=5.65GiB
> System, RAID1: total=8.00MiB, used=16.00KiB
> Metadata, RAID1: total=256.00MiB, used=169.45MiB
> GlobalReserve, single: total=14.05MiB, used=0.00B

This doesn't look right. Did you specifically ask Anaconda for this configuration? We should set up either RAID1 or RAID0, but doing RAID1 for metadata and RAID0 for data on two disks isn't something most people will want.

Comment 13 Kamil Páral 2020-07-09 16:59:08 UTC
No, I didn't ask specifically for this configuration. I used the guided partitioning only, where I selected both disks and clicked Done, that's it.

Comment 14 Chris Murphy 2020-07-09 19:41:40 UTC
(I thought I clicked send, hopefully I didn't add this to some other bug!)

Discussed on devel@:
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/MGYQSJVHLSILKTLXOKYD3W2RDQLRL3DO/

Talked to Josef about it, and mkfs.btrfs might need to get smarter. It's OK to use raid0 for unequally sized disks on btrfs, but it's not intuitive. So the idea is: if the disks are similarly sized (I don't know what threshold, maybe 10-15%), then mkfs can default to data raid0; otherwise, data single. And it's reasonable to use raid1 for metadata by default.

But why the installer crashed, I'm not sure.
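
The size-threshold idea above can be sketched in Python as follows. This is only an illustration of the proposal as worded in this comment, not what mkfs.btrfs does; the function name `choose_profiles`, the 15% default, and the way the size gap is measured (relative to the largest device) are all assumptions.

```python
def choose_profiles(device_sizes, threshold=0.15):
    """Return (data_profile, metadata_profile) for a multi-device volume.

    device_sizes: sizes of the member devices in any consistent unit.
    threshold: hypothetical maximum relative size gap for "similarly sized".
    """
    if len(device_sizes) < 2:
        raise ValueError("this sketch only covers multi-device volumes")
    smallest, largest = min(device_sizes), max(device_sizes)
    if smallest <= 0:
        raise ValueError("device sizes must be positive")

    # Relative gap between the smallest and largest device (assumed metric).
    relative_gap = (largest - smallest) / largest

    # Similar sizes: striping wastes little space, so raid0 is acceptable.
    # Dissimilar sizes: fall back to 'single' so all space stays usable.
    data_profile = "raid0" if relative_gap <= threshold else "single"

    # The comment suggests raid1 metadata is a reasonable default either way.
    return data_profile, "raid1"


if __name__ == "__main__":
    print(choose_profiles([15, 15]))  # two equal disks
    print(choose_profiles([2, 15]))   # the layout from comment 0
```

With the disks from comment 0 (2GB and 15GB) this heuristic would select 'single' for data, which avoids the out-of-space failure caused by striping.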

Comment 15 Kamil Páral 2020-07-10 06:51:41 UTC
(In reply to Chris Murphy from comment #14)
> But why the installer crashed, I'm not sure.

Looking at attachment 1700397 (df-h.out) and attachment 1700401 (program.log), I see that /dev/vda2 is 100% full and the log contains:
08:52:53,121 INF program: rsync: write failed on "/mnt/sysroot/usr/lib64/libgdata.so.22.5.1": No space left on device (28)
Also, rsync exit code 11 is "Error in file I/O". So, out of space.

I'm quite confused by the "df -h" output, though. /dev/vda is a 2GB disk, but it shows up as 15+1GB (vda2+vda1). And for vda2, it says "15G  1.6G  488K 100%", which doesn't make sense at all. It seems that df is very confused when reporting sizes for btrfs.
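
The out-of-space diagnosis above can be reproduced mechanically. The following Python sketch takes the log line quoted in this comment together with the rsync exit code and reports whether the failure looks like a full filesystem. The helper name `diagnose` and the regular expression are assumptions for illustration; the meaning of exit code 11 is taken from this comment, and errno 28 is ENOSPC on Linux.

```python
import re

# rsync exit code meaning, as quoted in this comment.
RSYNC_EXIT_CODES = {11: "Error in file I/O"}

# errno 28 is ENOSPC ("No space left on device") on Linux.
ENOSPC_LINUX = 28

# Pattern assumed from the single log line quoted in this report.
WRITE_FAILED_RE = re.compile(
    r'rsync: write failed on "(?P<path>[^"]+)": '
    r"(?P<message>.+) \((?P<errno>\d+)\)$"
)


def diagnose(log_line, exit_code):
    """Summarize an rsync failure from one log line and the exit code."""
    match = WRITE_FAILED_RE.search(log_line)
    out_of_space = bool(match) and int(match["errno"]) == ENOSPC_LINUX
    return {
        "exit_meaning": RSYNC_EXIT_CODES.get(exit_code, "unknown"),
        "failed_path": match["path"] if match else None,
        "out_of_space": out_of_space,
    }


LOG_LINE = (
    '08:52:53,121 INF program: rsync: write failed on '
    '"/mnt/sysroot/usr/lib64/libgdata.so.22.5.1": '
    "No space left on device (28)"
)

if __name__ == "__main__":
    print(diagnose(LOG_LINE, 11))
```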

Comment 16 Chris Murphy 2020-07-10 07:28:46 UTC
The df output looks like what I expect. The btrfs raid0 volume comprises two /dev nodes, but df can only show one, while reporting the values for the whole two-device volume. Hence the confusion.

The central problem here is that raid0 needs to stripe across two devices, so as soon as the small device is full, there's not enough space. This isn't reported by the kernel though, just by rsync. The solution is either (a) automatic partitioning should use 'mkfs.btrfs -d single -m raid1', or (b) mkfs needs to get smarter.
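
To make the striping problem concrete, here is a simplified Python model of usable data space under the raid0 and single profiles. It is a sketch under stated assumptions: raid0 is assumed to need at least two devices with free space (the `min_devices` parameter), and metadata, chunk granularity and reserved space are ignored. The function names are hypothetical and the sizes are illustrative, not taken from the attached logs.

```python
def raid0_usable(device_sizes, min_devices=2):
    """Estimate data capacity when every stripe must span >= min_devices.

    Space is consumed evenly from all devices that still have room; once
    fewer than min_devices have free space, allocation stops.
    """
    remaining = sorted(size for size in device_sizes if size > 0)
    usable = 0
    while len(remaining) >= min_devices:
        smallest = remaining[0]
        # Stripe across all remaining devices until the smallest one fills.
        usable += smallest * len(remaining)
        remaining = [size - smallest for size in remaining if size > smallest]
    return usable


def single_usable(device_sizes):
    """With the 'single' profile, all device space can hold data."""
    return sum(device_sizes)


if __name__ == "__main__":
    sizes = [2, 15]  # illustrative sizes in GB, mirroring comment 0
    print("raid0 :", raid0_usable(sizes))
    print("single:", single_usable(sizes))
```

In this model a 2GB plus 15GB pair yields only about 4GB of raid0 data space, because the leftover 13GB on the larger device cannot be striped, whereas 'single' can use all 17GB.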

Comment 17 Chris Murphy 2020-07-11 02:01:00 UTC
'btrfs fi us' misreports freespace for raid0 with dissimilar sized devices #269 
https://github.com/kdave/btrfs-progs/issues/269

mkfs.btrfs shouldn't always use data profile 'raid0' by default with multiple devices #270 
https://github.com/kdave/btrfs-progs/issues/270

Comment 18 Chris Murphy 2020-07-14 19:29:51 UTC
Strictly speaking, I think rsync errors out and the installer keeps running? If that's incorrect, switch the component back to anaconda, because it shouldn't crash. The upstream btrfs-progs fix is going to be to use the 'single' profile for data instead of 'raid0'; this will make Btrfs behave similarly to lvm+ext4 in the same situations.

Comment 19 Geoffrey Marr 2020-07-20 18:56:15 UTC
Discussed during the 2020-07-20 blocker review meeting: [0]

The decision to classify this bug as an "AcceptedBlocker" was made as it violates the following criterion:

"When using the guided partitioning flow, the installer must be able to: ... Complete an installation using any combination of disk configuration options it allows the user to select"

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2020-07-20/f33-blocker-review.2020-07-20-16.18.txt

Comment 20 Neal Gompa 2020-07-21 22:07:23 UTC
I've backported fixes from upstream and pushed it out as btrfs-progs-5.7-3.fc33: https://koji.fedoraproject.org/koji/buildinfo?buildID=1544664

Comment 21 Chris Murphy 2020-07-21 23:02:24 UTC
Tested install with Fedora-Workstation-Live-x86_64-Rawhide-20200719.n.0.iso and updated btrfs-progs-5.7-3.fc33 applied. The installation succeeds and reboots. The btrfs volume data profile is now 'single'.

I further tried a 1500M device instead of 2G; this falls below Anaconda's threshold, and it refuses to install with:
Unable to allocate requested partition scheme.

Perhaps the minimum device size in Anaconda should be bumped above 2G. But at the moment this fix seems pretty good.

Comment 22 Fedora Update System 2020-07-24 13:32:25 UTC
FEDORA-2020-b719b4ebe3 has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-b719b4ebe3

Comment 23 Fedora Update System 2020-07-24 13:32:28 UTC
FEDORA-2020-216f6116c0 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2020-216f6116c0

Comment 24 Fedora Update System 2020-07-25 01:07:27 UTC
FEDORA-2020-216f6116c0 has been pushed to the Fedora 31 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 25 Fedora Update System 2020-07-25 01:52:23 UTC
FEDORA-2020-b719b4ebe3 has been pushed to the Fedora 32 stable repository.
If problem still persists, please make note of it in this bug report.

