Description of problem:
As part of the btrfs test day https://fedoraproject.org/wiki/Test_Day:2020-07-08_Btrfs_default , I followed https://fedoraproject.org/wiki/QA:Testcase_partitioning_guided_multi_empty_all . I had two disks in my VM, a 15GB disk and a 2GB disk, both empty (MBR). I selected both in guided partitioning, it allowed me to continue just fine. But the installation crashed in the middle with "rsync exited with code 11". After inspection, it seems that anaconda chose the 2GB disk as the system root, home and boot, and the 15GB disk just for swap. (The default partition layout was btrfs, because of the testday). There is something wrong with disk selection logic in anaconda.

Version-Release number of selected component (if applicable):
anaconda-33.20-1.1.btrfs.fc33.x86_64

How reproducible:
tried once, but probably always

Steps to Reproduce:
1. use the image from https://fedoraproject.org/wiki/Test_Day:2020-07-08_Btrfs_default
2. have an empty 15GB and 2GB disk in your VM. It seems the system must see the 2GB disk as vda (the first disk).
3. use the guided partitioning to select both disks, confirm and start install
4. crash during installation
Created attachment 1700395 [details] error screenshot
Created attachment 1700396 [details] lsblk.out
Created attachment 1700397 [details] df-h.out
Created attachment 1700398 [details] findmnt.out
Created attachment 1700399 [details] anaconda.log
Created attachment 1700400 [details] packaging.log
Created attachment 1700401 [details] program.log
Created attachment 1700402 [details] storage.log
Note: The testcase is probably not ready for the new btrfs installation scheme. I'm not sure what should happen when you select multiple drives with btrfs? Should we set up btrfs raid? Should we just use the largest/fastest drive and leave the other ones untouched? Currently, the extra drive can be used for swap, but swap is going away from the default installation anyway. And it doesn't make sense to place /boot on one drive and everything else on the other. So I think this needs more discussion. But either way, anaconda should not select a way too small drive for / (instead it should inform the user or make a smart selection or both), and it should not crash during installation.
Actually, I just repeated the steps with two 15GB disks, and it seems anaconda sets up btrfs RAID automatically. The installed system has:

$ sudo btrfs filesystem show
Label: 'fedora_localhost-live'  uuid: 3b71dcda-6f80-41a0-b81a-adaf0c97435a
    Total devices 2 FS bytes used 5.71GiB
    devid 1 size 11.05GiB used 3.26GiB path /dev/vda2
    devid 2 size 15.00GiB used 3.26GiB path /dev/vdb1

$ sudo btrfs filesystem df /
Data, RAID0: total=6.00GiB, used=5.65GiB
System, RAID1: total=8.00MiB, used=16.00KiB
Metadata, RAID1: total=256.00MiB, used=169.45MiB
GlobalReserve, single: total=14.05MiB, used=0.00B

Note again that this is not the system from comment 0, but a different one with 2x15GB disks. So, the btrfs raid support is included in anaconda. But something fails hard when one of the disks (it might need to be the first one) is way too small.
If btrfs is accepted by default, this will violate https://fedoraproject.org/wiki/Fedora_33_Beta_Release_Criteria#guided-partitioning . If btrfs is not by default, we might find a different criterion (and see if I can reproduce the same problem using custom partitioning).
> $ sudo btrfs filesystem df /
> Data, RAID0: total=6.00GiB, used=5.65GiB
> System, RAID1: total=8.00MiB, used=16.00KiB
> Metadata, RAID1: total=256.00MiB, used=169.45MiB
> GlobalReserve, single: total=14.05MiB, used=0.00B

This doesn't look right. Did you specifically ask Anaconda for this configuration? We should set up either RAID1 or RAID0, but doing RAID1 for metadata and RAID0 for data on two disks isn't something most people will want.
No, I didn't ask specifically for this configuration. I used the guided partitioning only, where I selected both disks and clicked Done, that's it.
(I thought I clicked send, hopefully I didn't add this to some other bug!) Discussed on devel@: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/MGYQSJVHLSILKTLXOKYD3W2RDQLRL3DO/ Talked to Josef about it and mkfs.btrfs might need to get smarter. It's OK to use raid0 for unlike sized disks on btrfs, but it's not intuitive. So the idea is, if they're like sized (I don't know what threshold maybe 10-15%) then mkfs can default to data raid0. Otherwise data single. And it's reasonable to use raid1 for metadata by default. But why the installer crashed, I'm not sure.
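For illustration only, the size-based idea above could be sketched like this. This is not mkfs.btrfs code; the function name `choose_data_profile` and the 15% default tolerance are assumptions picked from the "maybe 10-15%" guess in this comment.

```python
GiB = 1024 ** 3


def choose_data_profile(device_sizes, tolerance=0.15):
    """Pick a data profile from device sizes (hypothetical heuristic).

    Returns 'raid0' when every device is within `tolerance` of the
    largest one, otherwise 'single'. The tolerance value is an
    assumption, not something mkfs.btrfs defines.
    """
    if not device_sizes:
        raise ValueError("need at least one device")
    if len(device_sizes) == 1:
        return "single"
    largest = max(device_sizes)
    smallest = min(device_sizes)
    relative_gap = (largest - smallest) / largest
    return "raid0" if relative_gap <= tolerance else "single"


# Two like-sized disks versus the 2GB + 15GB pair from comment 0.
print(choose_data_profile([15 * GiB, 15 * GiB]))
print(choose_data_profile([2 * GiB, 15 * GiB]))
```

Metadata would stay on raid1 in both cases, so only the data profile is decided here.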
(In reply to Chris Murphy from comment #14)
> But why the installer crashed, I'm not sure.

Looking at attachment 1700397 [details] and attachment 1700401 [details] , I see /dev/vda2 to be 100% full and the log contains:

08:52:53,121 INF program: rsync: write failed on "/mnt/sysroot/usr/lib64/libgdata.so.22.5.1": No space left on device (28)

Also, rsync exit code 11 is "Error in file I/O". So, out of space.

I'm quite confused by the "df -h" output, though. /dev/vda is a 2GB disk, but it shows up as 15+1GB (vda2+vda1). And for vda2, it says "15G 1.6G 488K 100%", which doesn't make sense at all. It seems that df is very confused when reporting sizes for btrfs.
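If someone wants to repeat this check on another program.log, a minimal sketch follows. The pattern is modeled only on the single log line quoted in this comment, so other rsync messages may not match; `find_rsync_write_failures` is a made-up helper name.

```python
import errno
import re

# Modeled on the one program.log line quoted above (assumption: other
# write failures are logged in the same shape).
WRITE_FAILED = re.compile(
    r'rsync: write failed on "(?P<path>[^"]+)": '
    r'(?P<message>.+) \((?P<errno>\d+)\)\s*$'
)


def find_rsync_write_failures(lines):
    """Yield (path, errno number) for each rsync write failure in lines."""
    for line in lines:
        match = WRITE_FAILED.search(line)
        if match:
            yield match.group("path"), int(match.group("errno"))


sample = [
    '08:52:53,121 INF program: rsync: write failed on '
    '"/mnt/sysroot/usr/lib64/libgdata.so.22.5.1": No space left on device (28)',
]

for path, code in find_rsync_write_failures(sample):
    # errno.ENOSPC is the "No space left on device" error.
    reason = "out of space" if code == errno.ENOSPC else "other I/O error"
    print(path, code, reason)
```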
The df output looks like I expect. The btrfs raid0 volume is made up of two /dev nodes, but df can only show one of them, while reporting the values for the whole two-device volume. Hence the confusion.

The central problem here is that raid0 needs to stripe across two devices, so as soon as the small device is full, there's not enough space. This isn't reported by the kernel though, just by rsync.

The solution is either (a) automatic partitioning should use 'mkfs.btrfs -d single -m raid1', or (b) mkfs needs to get smarter.
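A rough model of why the small device limits raid0, as a sketch only. It ignores metadata, system chunks and chunk granularity, and it assumes raid0 needs at least two devices with free space for every new stripe. The sizes are illustrative, not taken from the attached logs.

```python
def raid0_usable(sizes, min_devices=2):
    """Approximate data capacity when striping across all devices with space.

    Simplified model: allocation continues only while at least
    `min_devices` devices still have free space (assumption).
    """
    remaining = sorted(s for s in sizes if s > 0)
    usable = 0
    while len(remaining) >= min_devices:
        smallest = remaining[0]
        # Every device contributes an equal share until the smallest is full.
        usable += smallest * len(remaining)
        remaining = [r - smallest for r in remaining if r > smallest]
    return usable


def single_usable(sizes):
    """With the 'single' profile, each device's space is usable on its own."""
    return sum(sizes)


# Illustrative sizes in GiB: a small partition next to a large one.
sizes = [1, 15]
print("raid0 :", raid0_usable(sizes))
print("single:", single_usable(sizes))
```

In this model the pair offers only twice the small device under raid0, while 'single' exposes the full sum, which is what makes '-d single' behave like the lvm+ext4 case.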
'btrfs fi us' misreports freespace for raid0 with dissimilar sized devices #269
https://github.com/kdave/btrfs-progs/issues/269

mkfs.btrfs shouldn't always use data profile 'raid0' by default with multiple devices #270
https://github.com/kdave/btrfs-progs/issues/270
Strictly speaking I think rsync errors out and the installer keeps running? If that's incorrect, switch the component back to anaconda because it shouldn't crash. The upstream btrfs-progs fix is going to be to use 'single' profile for data instead of 'raid0' - this will make Btrfs behave similar to lvm+ext4 in the same situations.
Discussed during the 2020-07-20 blocker review meeting: [0]

The decision to classify this bug as an "AcceptedBlocker" was made as it violates the following criterion:

"When using the guided partitioning flow, the installer must be able to: ... Complete an installation using any combination of disk configuration options it allows the user to select"

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2020-07-20/f33-blocker-review.2020-07-20-16.18.txt
I've backported fixes from upstream and pushed it out as btrfs-progs-5.7-3.fc33: https://koji.fedoraproject.org/koji/buildinfo?buildID=1544664
Tested install with Fedora-Workstation-Live-x86_64-Rawhide-20200719.n.0.iso and updated btrfs-progs-5.7-3.fc33 applied. The installation succeeds and reboots. The btrfs volume data profile is now 'single'. I did further try a 1500M device instead of 2G, and this falls below Anaconda's threshold, and it refuses to install with: Unable to allocate requested partition scheme. Perhaps the minimum device size in Anaconda should be bumped above 2G. But at the moment this fix seems pretty good.
FEDORA-2020-b719b4ebe3 has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-b719b4ebe3
FEDORA-2020-216f6116c0 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2020-216f6116c0
FEDORA-2020-216f6116c0 has been pushed to the Fedora 31 stable repository. If problem still persists, please make note of it in this bug report.
FEDORA-2020-b719b4ebe3 has been pushed to the Fedora 32 stable repository. If problem still persists, please make note of it in this bug report.