*** Description of problem:

The "tests/md/test-md-and-lvm-devices.sh" script of the test suite kept failing for me; I looked into it. Unfortunately, it's a horrible complication for libguestfs (which is why this BZ is being filed as an RFE).

The log file contains the following lines:

> test-md-and-lvm-devices.sh: info: you can skip this test by setting SKIP_TEST_MD_AND_LVM_DEVICES_SH=1
> *stdin*:22: libguestfs: error: md_create: mdadm: md-sda1-lv0: mdadm: Defaulting to version 1.2 metadata
> mdadm: RUN_ARRAY failed: Unknown error 524
> FAIL md/test-md-and-lvm-devices.sh (exit status: 1)

A web search promptly leads to the following references:

- https://www.linuxquestions.org/questions/linux-general-1/raid-arrays-not-assembling-4175662774/
- https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/
- https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9

My understanding is the following:

- In Linux 3.14, a regression was introduced whereby the layout of a RAID0 array built from disks of *different sizes* changed incompatibly. Assembling an old-layout array with a new kernel, or a new-layout array with an old kernel, leads to data corruption.
- There is no way to query the layout of an existing array. A read-only mis-assembly would not cause data corruption, but it cannot be used to reliably probe the existing layout either.
- To mitigate the mess, Linux 5.4 introduced a module parameter called "raid0.default_layout", through which the user is *required* to specify the layout of RAID0 arrays. If Linux detects a RAID0 array of disks of different sizes, it does not guess; it rejects the assembly, unless the user has explicitly set the layout via the module parameter.
- Because this is a module parameter, it governs all RAID0 arrays. There is no way to specify different layouts for different arrays.
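To illustrate how such a layout setting would reach the appliance kernel, here is a minimal sketch of a helper that appends "raid0.default_layout" to an appliance kernel command line. The function name and calling convention are purely hypothetical; no such libguestfs interface exists today.

```shell
# Hypothetical sketch only: build_appliance_cmdline and its "layout"
# argument are illustrative names, not an existing libguestfs interface.
build_appliance_cmdline() {
    local base="$1" layout="$2"
    if [ -n "$layout" ]; then
        # 2 selects the post-3.14 RAID0 layout, matching what a >= 5.4
        # appliance kernel would itself create.
        printf '%s raid0.default_layout=%s\n' "$base" "$layout"
    else
        printf '%s\n' "$base"
    fi
}

build_appliance_cmdline 'console=ttyS0 quiet' 2
# -> console=ttyS0 quiet raid0.default_layout=2
```

Because the parameter is module-global, a single such setting necessarily applies to every RAID0 array the appliance touches during that run.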
Consequences for libguestfs:

- If the guest array was created with a kernel >= 3.14, but the appliance uses a kernel < 3.14, the appliance will corrupt the array.
- If the guest's RAID0 array of different-sized disks was created with a kernel < 3.14, and the appliance uses a kernel in [3.14, 5.4), the appliance will corrupt the array.
- Because of the above two points, libguestfs should refuse to start with an appliance kernel < 5.4.
- If the appliance uses a kernel >= 5.4, then the libguestfs application (or the end-user, in the case of guestfish) must be *forced*, via some new parameter, to set the RAID0 layout. This layout parameter then has to be passed to the appliance kernel when the appliance is launched.
- The "test-md-and-lvm-devices.sh" test case needs to be updated. In the short term, it should be modified so that the block devices *directly constituting* any given array have precisely the same size; this should activate the (conf->nr_strip_zones == 1) branch in the appliance kernel and evade the problem.
- In the long term, based on the above requirement that the appliance kernel be >= 5.4 (even for array creation), a new test case using different disk sizes for RAID0 should be added, and it should explicitly pass "raid0.default_layout=2" to the appliance kernel (using the new libguestfs / guestfish parameter).

*** Version-Release number of selected component (if applicable):

Upstream f47e0bb67254 ("appliance: reorder mounting of special filesystems in init", 2021-09-15).

*** How reproducible:

100%

*** Steps to Reproduce:

1. make check
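The short-term test fix boils down to a single invariant: every block device directly constituting a given RAID0 array must have exactly the same size, so the kernel takes the single-strip-zone (conf->nr_strip_zones == 1) path, where the layout ambiguity cannot arise. A minimal sketch of that invariant as a check (sizes in bytes; the helper name is made up for illustration and is not part of the test suite):

```shell
# Illustrative helper (not from the test suite): succeeds only if all
# member sizes passed as arguments are identical, i.e. the resulting
# RAID0 array would consist of a single strip zone.
all_members_same_size() {
    local first="$1"
    shift
    local s
    for s in "$@"; do
        [ "$s" -eq "$first" ] || return 1
    done
    return 0
}
```

With equal-sized members, old and new kernels lay the array out identically, so even a pre-5.4 appliance kernel assembles the test arrays without risk of corruption.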
I have a patch for the short-term workaround; I'll post it shortly.
[Libguestfs] [PATCH 2/4] test-md-and-lvm-devices: work around RAID0 regression in Linux v3.14/v5.4
https://listman.redhat.com/archives/libguestfs/2021-September/msg00099.html
Message-Id: <20210920052335.3358-3-lersek>