Bug 1931821
Summary: | work around mkfs.vfat writes corrupted filesystem/partition table when used on whole block device | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | YongkuiGuo <yoguo> | ||||||||
Component: | libguestfs | Assignee: | Laszlo Ersek <lersek> | ||||||||
Status: | CLOSED ERRATA | QA Contact: | YongkuiGuo <yoguo> | ||||||||
Severity: | medium | Docs Contact: | |||||||||
Priority: | unspecified | ||||||||||
Version: | 9.0 | CC: | lersek, rjones | ||||||||
Target Milestone: | pre-dev-freeze | Keywords: | Regression, Triaged | ||||||||
Target Release: | --- | ||||||||||
Hardware: | x86_64 | ||||||||||
OS: | Linux | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | libguestfs-1.46.1-1.el9 | Doc Type: | If docs needed, set a value | ||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | |||||||||||
: | 1931866 (view as bug list) | Environment: | |||||||||
Last Closed: | 2022-05-17 12:28:37 UTC | Type: | Bug | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Bug Depends On: | 2011711 | ||||||||||
Bug Blocks: | 910269, 1931866 | ||||||||||
Attachments: |
|
Description
YongkuiGuo
2021-02-23 10:45:19 UTC
It looks like the log file is truncated? It ends with: guestfsd: receive_file: reading length word guestfsd: receive_file: got chunk: cancel = 0x0, len = 8192, buf = 0x563e8b65a770 gu Created attachment 1758813 [details]
virt-make-fs new output
Sorry, it's my mistake. Attached the new entire output.
This bug is really mysterious. I cannot reproduce it in Fedora Rawhide, but it looks like it could be a kernel bug. Can you try the following commands: $ guestfish -N disk mkfs vfat /dev/sda label:TEST $ file test1.img $ virt-filesystems -a test1.img --all --long -h And also try the following command and attach the large log file: $ guestfish -vx -N disk mkfs vfat /dev/sda label:TEST >& log Created attachment 1758816 [details]
log
$ guestfish -N disk mkfs vfat /dev/sda label:TEST
$ file test1.img
test1.img: DOS/MBR boot sector, code offset 0x58+2, OEM-ID "mkfs.fat", sectors/cluster 8, Media descriptor 0xf8, sectors/track 61, heads 34, sectors 2097119 (volumes > 32 MB), FAT (32 bit), sectors/FAT 2048, serial number 0xde99c846, label: "TEST "
$ virt-filesystems -a test1.img --all --long -h
libguestfs: error: list_filesystems: /dev/sda: not a partitioned device -- the same error
$ guestfish -vx -N disk mkfs vfat /dev/sda label:TEST >& log
...
SELinux enabled state cached to: disabled
No filesystem is currently mounted on /sys/fs/cgroup.
Failed to determine unit we run in, ignoring: No data available
fsync /dev/sda
...
Attached the log
It's actually a bug in dosfstools which happened between dosfstools-4.1-12.fc33 and dosfstools-4.2-1.fc34. The following command: mkfs -t vfat -I -n TEST /dev/sda previously would create a filesystem directly on /dev/sda. Now it creates some kind of corrupted MBR partition table. If you look at the difference between good (-) and bad (+) you can see the extra partition table at offset 0x1b0-0x1cf in the image: --- test1.img.hex 2021-02-23 12:29:49.815043947 +0000 +++ test1-bad.img.hex 2021-02-23 12:30:00.197159306 +0000 @@ -1,6 +1,6 @@ 00000000 eb 3c 90 6d 6b 66 73 2e 66 61 74 00 02 04 04 00 |.<.mkfs.fat.....| 00000010 02 00 02 00 50 f8 14 00 14 00 01 00 00 00 00 00 |....P...........| -00000020 00 00 00 00 80 00 29 c1 0d 23 51 54 45 53 54 20 |......)..#QTEST | +00000020 00 00 00 00 80 00 29 65 e1 e3 4f 54 45 53 54 20 |......)e..OTEST | 00000030 20 20 20 20 20 20 46 41 54 31 36 20 20 20 0e 1f | FAT16 ..| 00000040 be 5b 7c ac 22 c0 74 0b 56 b4 0e bb 07 00 cd 10 |.[|.".t.V.......| 00000050 5e eb f0 32 e4 cd 16 cd 19 eb fe 54 68 69 73 20 |^..2.......This | @@ -12,6 +12,10 @@ 000000b0 72 79 20 61 67 61 69 6e 20 2e 2e 2e 20 0d 0a 00 |ry again ... ...| 000000c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * +000001b0 00 00 00 00 00 00 00 00 e4 e4 e3 4f 00 00 80 00 |...........O....| +000001c0 01 00 04 fe ff ff 00 00 00 00 00 50 00 00 00 00 |...........P....| +000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +* 000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa |..............U.| 00000200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * @@ -21,8 +25,8 @@ 00003000 f8 ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00003010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * -00005800 54 45 53 54 20 20 20 20 20 20 20 08 00 00 a3 63 |TEST ....c| -00005810 57 52 57 52 00 00 a3 63 57 52 00 00 00 00 00 00 |WRWR...cWR......| +00005800 54 45 53 54 20 20 20 20 20 20 20 08 00 00 97 63 |TEST ....c| +00005810 57 52 57 52 00 00 97 63 57 52 00 00 00 00 00 00 |WRWR...cWR......| 00005820 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00a00000 This only happens when mkfs is used to create a filesystem on a block device like /dev/sda, not when you do it on a local file. --mbr[=y|yes|n|no|a|auto] Fill (fake) MBR table with disk signature one partition which starts at sector 0 (includes MBR itself) and spans whole disk device. It is needed only for non-removable disks used on Microsoft Windows systems and only when formatting whole unpartitioned disk. Location of the disk signature and partition table overlaps with the end of the first FAT sector (boot code location), therefore there is no additional space usage. Default is auto mode in which mkfs.fat put MBR ta‐ ble only for non-removable disks when formatting whole unpartitioned disk. Oh I didn't notice that option. It does indeed suppress the bogus PT. We can probably add this as a workaround in libguestfs, since upstream is disputing that broken PTs are a bug at all. The behaviour is intended by upstream. Can we close this? I.e there is nothing to fix downstream. Assigning it back to libguestfs, we'll have to find some way to workaround this. Probably using --mbr=no and also coping with the invalid PTs. (In reply to Richard W.M. Jones from comment #9) > also coping with the invalid PTs. Should this be added to "parted" instead? After all, the block device in question is partitioned (in a sense), except the first partition (the FAT filesystem) contains the partition table (including the entry that describes the partition itself). Otherwise I expect libguestfs will have to add some special probing which differs from its normal order of operations. Currently, I assume, (1) the partition table is found at first (so we start unnesting partitions / filesystems), but (2) the partition table (or PT entry) is not accepted as valid. Due to (1), we don't try filesystem recognition on the whole device, and due to (2) we don't try fs recognition on the partition entry (which too spans the entire device). The hack would be to still try whole-device fs recognition, despite (1). I think it would be cleaner if "parted" simply reported the one partition entry. I think there's really two parts to this bug. (a) Change guestfs_mkfs so that if the filesystem is vfat we use --mbr=no (only if asked to create the fs on the whole device? or always? not sure). (b) Work out exactly why we get the "not a partitioned device" error and fix that. (a) is an easy fix. Currently for (b): $ rpm -q dosfstools dosfstools-4.2-2.fc35.x86_64 $ guestfish -N disk mkfs vfat /dev/sda $ guestfish -a test1.img run : list-filesystems libguestfs: error: list_filesystems: /dev/sda: not a partitioned device but: $ guestfish -a test1.img run : list-devices /dev/sda $ guestfish -a test1.img run : list-partitions /dev/sda1 So apart from the error it all sort of works. Now interestingly we don't use parted for list-filesystems etc, we use kernel /sys probing (in other words, we use the kernel's own partition detection code). Which is kind of good and bad. The partitioning code in libguestfs has evolved over a decade and probably needs a fresh eye on the whole thing. (Having said that, I doubt using parted would be a good idea since parted is only semi-maintained and it has a really weird API.) Hmm, I need to digest this some more. Regarding (In reply to Richard W.M. Jones from comment #11) > (b) Work out exactly why we get the "not a partitioned device" > error and fix that. we find a pointer to bug 634246 in the source code. Namely, if "parted" is invoked on a non-partitioned device (or, well, in this case, invalidly / obscurely partitioned device), it prints "loop" as the partition table type. So "parted" doesn't fail outright, it gracefully (?) reports "no partition table found" by printing "loop" -- it's libguestfs that turns that into an error. Originally from this (historical) commit: commit b3d27793f04ae44b2c11f6422a19b4422ac941cd Author: Richard W.M. Jones <rjones> Date: Mon Oct 18 12:56:54 2010 +0100 parted: Don't return "loop" for non-partitioned devices (RHBZ#634246). If you ran part-get-parttype command on a device which didn't contain a partition, it used to return the string "loop". This is an internal representation that parted uses. We should instead return an error because part-get-parttype makes no sense for devices which are not partitioned. (In reply to Richard W.M. Jones from comment #11) > I think there's really two parts to this bug. (a) Change guestfs_mkfs > so that if the filesystem is vfat we use --mbr=no (only if asked > to create the fs on the whole device? or always? not sure). > (a) is an easy fix. I think it should be safe to make "--mbr=no" independent of the target block device type (whole device vs. partition). However, can we make that option independent of the mkfs.vfat (that is, dosfstools) version? Because, for example, on RHEL7, the dosfstools-3.0.20-10.el7.x86_64 package does not recognize the "--mbr" option. So if we implemented such a modification *upstream* without any kind of feature detection, it would break the mkfs.vfat invocation altogether, on older systems. Now my understanding is that, while "mkfs.vfat" runs in the appliance, the appliance is rebuilt from *host side* utilities by supermin5. Is that correct? Or only partially correct? Where does "mkfs.vfat" originate from, ultimately? Thanks! (In reply to Laszlo Ersek from comment #12) > Hmm, I need to digest this some more. > > Regarding > > (In reply to Richard W.M. Jones from comment #11) > > (b) Work out exactly why we get the "not a partitioned device" > > error and fix that. > > we find a pointer to bug 634246 in the source code. Namely, if "parted" is > invoked on a non-partitioned device (or, well, in this case, invalidly / > obscurely partitioned device), it prints "loop" as the partition table type. > > So "parted" doesn't fail outright, it gracefully (?) reports "no partition > table found" by printing "loop" -- it's libguestfs that turns that into an > error. Originally from this (historical) commit: ... I still agree with that commit. At the API level if someone calls guestfs_part_get_parttype on a block device which isn't partitioned them surely that is an error. More likely the caller isn't prepared for this case and should catch the error and ignore it (or whatever is appropriate). I would say that it is attach a specific errno to error messages and that makes it easier for the caller to distinguish this specific error from some other kind of failure. In OCaml daemon functions this is done using raise (Unix.Unix_error (Unix.EXXX, "message")) (where EXXX is an errno value like EINVAL). If the error escapes back to the library then the caller can find the errno using guestfs_last_errno. If the caller is inside the daemon you can catch it using ordinary ocaml try ... with Unix.Unix_error (EXXX, _) -> ... > However, can we make that option independent of the mkfs.vfat (that is, dosfstools) version? Because, for example, on RHEL7, the dosfstools-3.0.20-10.el7.x86_64 package does not recognize the "--mbr" option. So if we implemented such a modification *upstream* without any kind of feature detection, it would break the mkfs.vfat invocation altogether, on older systems. There are other places where we detect features of tools at runtime, eg: https://github.com/libguestfs/libguestfs/blob/efb3d019925c10c9567c91ccf1bb3f283919ba55/daemon/parted.ml#L31 However at some point we should decide that we're no longer going to support {old distro | old tool}. I feel that for RHEL 7 we're getting to that stage, but maybe not quite yet. For libnbd and nbdkit I'm trying to keep the projects going on RHEL 7, although it's some effort. Might be worth asking Martin Kletzander if he has any opinions on this because he's leading the CI effort and thus has to make everything work on the distros. BTW it's often good to look at the documentation for an API to see what we promised callers: https://libguestfs.org/guestfs.3.html#guestfs_part_get_parttype This command examines the partition table on device and returns the partition table type (format) being used. Common return values include: msdos (a DOS/Windows style MBR partition table), gpt (a GPT/EFI-style partition table). Other values are possible, although unusual. See guestfs_part_init for a full list. which links to: https://libguestfs.org/guestfs.3.html#guestfs_part_init I would say that "msdos" and "gpt" are the only reasonable values that this API should return, and anything else is an error (including 'there's no partition table, why are you asking me?'). This documentation is generated from: https://github.com/libguestfs/libguestfs/blob/efb3d019925c10c9567c91ccf1bb3f283919ba55/generator/actions_core.ml#L5109 The problem seems to start (from reading the source) with > let partitions = Devsparts.list_partitions () in in "daemon/listfs.ml". I need to figure out what Devsparts.list_partitions does ("daemon/devsparts.ml"). Arguably, the fake (bogus) partition table that is embedded in the whole-disk FAT filesystem should be caught and removed, before Devsparts.list_partitions returns it. Because once Devsparts.list_partitions returns it, we have this: > (* Partitions. *) > let partitions = Devsparts.list_partitions () in > let partitions = List.filter is_partition_can_hold_filesystem partitions in and then > and is_partition_can_hold_filesystem partition = > let device = Devsparts.part_to_dev partition in > let partnum = Devsparts.part_to_partnum partition in > let parttype = Parted.part_get_parttype device in i.e., "Devsparts.part_to_dev" maps the bogus partition back to the containing device, which is what "Parted.part_get_parttype" then blows up on. We need to prevent the bogus partition table / bogus single partition to escape Devsparts.list_partitions. See also this: (In reply to Richard W.M. Jones from comment #11) > $ guestfish -a test1.img run : list-devices > /dev/sda > $ guestfish -a test1.img run : list-partitions > /dev/sda1 > > So apart from the error it all sort of works. My point is that list-devices works fine here, but list-partitions, although it does not report an error, is wrong. /dev/sda1 is a *bogus* partition; /dev/sda is in fact not a (validly) partitioned device, and so /dev/sda1 should be removed from the output. This is also where the "Parted.part_get_parttype" error originates from. ("If a partition exists, then it is described by a partition table, and the partition table must have a type", is the *correct* argument that "Parted.part_get_parttype" makes. So where we break that (otherwise correct) chain, leading to the error, is falsifying the initial condition: "if a partition exists". That's not a partition.) I'll look into the details of "Devsparts.list_partitions" later. The "add_partitions" [daemon/devsparts.ml] helper function is the one that needs modification. For a given $dev parameter, it basically lists /sys/block/$dev/$dev* We should scan the block device with the supposed partition table, namely $dev, ourselves. If $dev seems to contain a malformed MBR partition table, such as the one created by recent dosfstools, then "add_partitions" should produce an empty list, not the list of quasi-partitions that the Linux partition table driver exposes as /sys/block/$dev/$dev* "add_partitions" is passed in to both "map_block_devices" and "map_md_devices", and a bogus MBR may exist on either kind of block device. So we should either modify "add_partitions" (see above), or apply the same kind of device-level filtering to *both* "map_block_devices" and "map_md_devices". $ diff -u -U 12 --label good --label bad \ <(hexdump -n 512 -v -C loopf.good) \ <(hexdump -n 512 -v -C loopf.bad) --- good +++ bad @@ -1,33 +1,33 @@ 00000000 eb 3c 90 6d 6b 66 73 2e 66 61 74 00 02 04 01 00 |.<.mkfs.fat.....| 00000010 02 00 02 00 08 f8 02 00 10 00 02 00 00 00 00 00 |................| -00000020 00 00 00 00 80 00 29 02 41 21 58 54 45 53 54 20 |......).A!XTEST | +00000020 00 00 00 00 80 00 29 08 2f 2b 4e 54 45 53 54 20 |......)./+NTEST | 00000030 20 20 20 20 20 20 46 41 54 31 32 20 20 20 0e 1f | FAT12 ..| 00000040 be 5b 7c ac 22 c0 74 0b 56 b4 0e bb 07 00 cd 10 |.[|.".t.V.......| 00000050 5e eb f0 32 e4 cd 16 cd 19 eb fe 54 68 69 73 20 |^..2.......This | 00000060 69 73 20 6e 6f 74 20 61 20 62 6f 6f 74 61 62 6c |is not a bootabl| 00000070 65 20 64 69 73 6b 2e 20 20 50 6c 65 61 73 65 20 |e disk. Please | 00000080 69 6e 73 65 72 74 20 61 20 62 6f 6f 74 61 62 6c |insert a bootabl| 00000090 65 20 66 6c 6f 70 70 79 20 61 6e 64 0d 0a 70 72 |e floppy and..pr| 000000a0 65 73 73 20 61 6e 79 20 6b 65 79 20 74 6f 20 74 |ess any key to t| 000000b0 72 79 20 61 67 61 69 6e 20 2e 2e 2e 20 0d 0a 00 |ry again ... ...| 000000c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 000000d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 000000e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 000000f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000100 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000120 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000150 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 000001a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| -000001b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| -000001c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| +000001b0 00 00 00 00 00 00 00 00 98 33 2b 4e 00 00 80 00 |.........3+N....| +000001c0 01 00 01 00 01 40 00 00 00 00 00 08 00 00 00 00 |.....@..........| 000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 000001e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa |..............U.| 00000200 The loop device I used as an example in comment 19 is 1MiB in size. https://en.wikipedia.org/wiki/Master_boot_record#PTE https://en.wikipedia.org/wiki/Partition_type $ hexdump -v -C -n 16 -s 0x1be loopf.bad 000001be 80 00 01 00 01 00 01 40 00 00 00 00 00 08 00 00 |.......@........| ^^ ^^ ^^^^^ ^^ ^^ ^^^^^ ^^^^^^^^^^^ ^^^^^^^^^^^ | | | | | | | | | | | | | | | number of 512B sectors in | | | | | | | partition: 2048 (0x0000_0800) | | | | | | | | | | | | | starting LBA: 0 | | | | | | | | | | | ending sector: 1, ending cylinder: 64 (0x40) | | | | | | | | | ending head: 0 | | | | | | | partition type: 1 (MBR) | | | | | starting sector: 1, starting cylinder: 0 | | | starting head: 0 | bootable What matters to us here are the following fields: - bootable, - C/H/S address of first absolute sector in partition: 0/0/1 - partition type: 1 (MBR) - starting LBA: 0 The "sector count" and "C/H/S address of last absolute sector in partition" fields are irrelevant to us. The former (= ending C/H/S) is simply unusable for any disk of nontrivial size, and the latter (LBA) is something we cannot verify in OCaml, unless we can call the BLKGETSIZE or BLKGETSIZE64 ioctl()s in some way. (I see that "daemon/debug-bmap.c", "daemon/utils.c", and "lib/create.c" call BLKGETSIZE64; however, those calls are not reusable for this purpose, and anyway, checking whether the end of the 1st partition matches the end of the block device is not too important. The invalid-ness of the partition table is that the 1st partition *starts* at the very beginning of the block device, overlaying the partition table itself.) Additionally, we should check the boot signature { 0x55, 0xAA } at offset 0x01FE. (In reply to Richard W.M. Jones from comment #11) > I think there's really two parts to this bug. (a) Change guestfs_mkfs > so that if the filesystem is vfat we use --mbr=no (only if asked to > create the fs on the whole device? or always? not sure). > > (b) Work out exactly why we get the "not a partitioned device" error > and fix that. I've got one patch for each of (a) and (b); I'll test them tomorrow and post them if everything works. Unfortunately, this is an incredible shit-show. The "parted" utility fails if the bogus partition table is *present*. On the other hand, the "blkid" utility (called from check_with_vfs_type -> Blkid.vfs_type) fails if the bogus partition is *not* used. It's basically impossible to satisfy both utilities at the same time. "Parted" insists that the partition is broken (it is), whereas "blkid" insists that the (broken) partition be used for accessing the vfat filesystem; whole disk access does not work. The mind boggles. Here's another idea: given that the dumpster fire that the bogus partition is cannot be constrained (per comment 22), I'll try to kludge only "Parted.part_get_parttype" into submission. This is a U-turn from the previous approach: rather than *hiding* the bogus partition from everything, *expose* the bogus partition to everything, and only override the exception thrown by Parted.part_get_parttype. After all, my helper function Utils.has_bogus_mbr *already* checks (per comment 20) that the partition type is MBR. So if we can see that, there's no need to invoke parted at all. I guess I can give this a go as well. /smh So in the Parted module, there are three functions that relate to MBR: - part_get_parttype -> the one that currently breaks, and we need to hack, - part_get_mbr_id -> this one uses the "sfdisk" command with the "--part-type" (or "--print-id") option, as in sfdisk --part-type /dev/loop0 1 Thankfully, sfdisk works with the insane partition table / pte. - part_get_mbr_part_type -> this relies on "part_get_parttype" and "part_get_mbr_id" from above, so in case I can "fix part_get_parttype", this third function should work as well. Yup, this is a dumpster fire; the function that calls the "parted" utility directly is "print_partition_table_machine_readable", and that one is called by *both* "part_get_parttype" *and* "part_list". And "part_list" is called further from "daemon/inspect_fs_windows.ml", plus it's a public libguestfs API. We cannot call "parted" on the device with the broken partition table *at all*! It does not print any partition lines. Compare: * works: # parted -m -s -- /dev/nvme0n1 unit b print BYT; /dev/nvme0n1:512110190592B:nvme:512:512:gpt:SAMSUNG MZVKW512HMJP-000H1:; 1:1048576B:630194175B:629145600B:fat32:EFI System Partition:boot, esp; 2:630194176B:1703935999B:1073741824B:ext4::; 3:1703936000B:512109838335B:510405902336B:::; * does not work: # parted -m -s -- /dev/loop0 unit b print Error: Invalid partition table - recursive partition on /dev/loop0. BYT; /dev/loop0:1048576B:loopback:512:512:unknown:Loopback device:; Yet more damage. All the libguestfs APIs implemented in "daemon/parted.c" break on such a disk image, because all those operations rely on parted -- and parted just won't work on such a partition table. I considered manually reimplementing "part_get_parttype" and "part_list" in OCaml, parsing the partition table in binary, if we detect the damage caused by dosfstools. But that's just impossible to do for *all* the libguestfs APIs implemented in "daemon/parted.c". When the dosfstools developer "pali" said in <https://github.com/dosfstools/dosfstools/issues/161#issuecomment-784200231> that "Kernel correctly handles it and also other standard tools", they were *wrong*. "parted" is a standard tool, and it does *not* handle the bogus partition table. I think our remaining option here is to contribute a patch to parted. We use parted so extensively in libguestfs that replacing all of it just for the sake of this screwed up dosfstools misfeature is infeasible. (1) Rich: I have a patch for "--mbr=no", so that at the least we don't create broken partition tables. If you agree, I can post that to the list. (2) Regarding the other direction, coping with malformed partition tables like this, we need to reassign this bug to -- or make it dependent on a new bug for -- parted. parted is our internal abstraction layer and we use it *very* extensively (I've only realized gradually how extensively in fact); breaking through or dancing around that abstraction in every spot wherever we use it is completely wrong. Not just because it's a lot of coding, but because it's *bad design* (especially that the dosfstools misfeature is mostly useless from our perspective). Should we reassign this bug to parted, or create a new one, or maybe consider contributing a patch to upstream parted? ... For reference, the parted code does this -- in file "libparted/labels/dos.c", function read_table(), at current master (b20227adf575): ^232dbda915df (Anant Narayanan 2006-09-14 15:18:45 +0000 1018) /* parse the partitions from this table */ ca4f538076fdf (Petr Uzel 2009-08-26 12:29:26 +0200 1019) for (i = 0; i < DOS_N_PRI_PARTITIONS; i++) { b4c53f6d14746 (Jim Meyering 2007-05-30 15:55:30 +0200 1020) raw_part = &table->partitions [i]; ^232dbda915df (Anant Narayanan 2006-09-14 15:18:45 +0000 1021) if (raw_part->type == PARTITION_EMPTY || !raw_part->length) ^232dbda915df (Anant Narayanan 2006-09-14 15:18:45 +0000 1022) continue; ^232dbda915df (Anant Narayanan 2006-09-14 15:18:45 +0000 1023) ^232dbda915df (Anant Narayanan 2006-09-14 15:18:45 +0000 1024) /* process nested extended partitions after normal logical ^232dbda915df (Anant Narayanan 2006-09-14 15:18:45 +0000 1025) * partitions, to make sure we get the order right. ^232dbda915df (Anant Narayanan 2006-09-14 15:18:45 +0000 1026) */ ^232dbda915df (Anant Narayanan 2006-09-14 15:18:45 +0000 1027) if (is_extended_table && raw_part_is_extended (raw_part)) 099eaa4acd0d8 (Jim Meyering 2009-03-05 19:22:39 +0100 1028) continue; ^232dbda915df (Anant Narayanan 2006-09-14 15:18:45 +0000 1029) ^232dbda915df (Anant Narayanan 2006-09-14 15:18:45 +0000 1030) lba_offset = is_extended_table ? sector : 0; ^232dbda915df (Anant Narayanan 2006-09-14 15:18:45 +0000 1031) ^232dbda915df (Anant Narayanan 2006-09-14 15:18:45 +0000 1032) if (linear_start (disk, raw_part, lba_offset) == sector) { ^232dbda915df (Anant Narayanan 2006-09-14 15:18:45 +0000 1033) if (ped_exception_throw ( ^232dbda915df (Anant Narayanan 2006-09-14 15:18:45 +0000 1034) PED_EXCEPTION_ERROR, ^232dbda915df (Anant Narayanan 2006-09-14 15:18:45 +0000 1035) PED_EXCEPTION_IGNORE_CANCEL, ^232dbda915df (Anant Narayanan 2006-09-14 15:18:45 +0000 1036) _("Invalid partition table - recursive " ^232dbda915df (Anant Narayanan 2006-09-14 15:18:45 +0000 1037) "partition on %s."), ^232dbda915df (Anant Narayanan 2006-09-14 15:18:45 +0000 1038) disk->dev->path) ^232dbda915df (Anant Narayanan 2006-09-14 15:18:45 +0000 1039) != PED_EXCEPTION_IGNORE) ^232dbda915df (Anant Narayanan 2006-09-14 15:18:45 +0000 1040) goto error; ^232dbda915df (Anant Narayanan 2006-09-14 15:18:45 +0000 1041) continue; /* avoid infinite recursion */ ^232dbda915df (Anant Narayanan 2006-09-14 15:18:45 +0000 1042) } I think we should reassign this BZ to parted. Basically we have two appliance-side *sets* of utilities, (i) "parted" and (ii) "everything else (including the kernel itself)". Both sets disagree on the validity of the bogus partition table. Trying to hide the disagreement in the libguestfs daemon is just not feasible; the underlying utilities themselves need to be reconciled. YongkuiGuo: regarding your exact reproducer in comment#0, namely where you create a FAT filesystem with "virt-make-fs", and then check it with "guestfish": that particular use case can be made work with just the --mbr=no patch I have. [PATCH] daemon/mkfs: disable creation of fake MBR partition table with "mkfs.fat" Message-Id: <20211123140050.14767-1-lersek> https://listman.redhat.com/archives/libguestfs/2021-November/msg00211.html (In reply to Laszlo Ersek from comment #27) > (1) Rich: I have a patch for "--mbr=no", so that at the least we don't > create broken partition tables. If you agree, I can post that to the list. I reviewed that and it looks good, thanks. > (2) Regarding the other direction, coping with malformed partition tables > like this, we need to reassign this bug to -- or make it dependent on a new > bug for -- parted. parted is our internal abstraction layer and we use it > *very* extensively (I've only realized gradually how extensively in fact); > breaking through or dancing around that abstraction in every spot wherever > we use it is completely wrong. Not just because it's a lot of coding, but > because it's *bad design* (especially that the dosfstools misfeature is > mostly useless from our perspective). Should we reassign this bug to parted, > or create a new one, or maybe consider contributing a patch to upstream > parted? Always err on the side of creating a new bug. In this particular case we need to keep this bug around because we need to patch libguestfs in RHEL 9 (at the very least for the --mbr=n fix). And then if parted needs a fix then it'll also need a bug. In more general terms about our use of parted - well actually we use at least three different code bases in libguestfs to read partition tables (the perils of having code around for over a decade): parted, sfdisk, and the kernel. They all have very peculiar APIs, and they all parse slightly different partition table formats. I really don't like parted, because it tries to be all things to all partition formats, and fails at all of them. At some point we had a plan to implement "libmbr" and "libgpt" (because those are the only two formats anyone cares about), but that was a bunch of work and we never did it. Unifying the whole mess around a couple of straightforward libraries dedicated to their formats would probably be a good idea, but is obviously a huge amount of actual work. > I think we should reassign this BZ to parted. New bug, but yes. However I don't know if anyone at Red Hat still works on parted, after Jim Meyering left. The reproducibility mystery has been solved: the parted breakage depends on the size of the device. for S in 1M 200M; do echo "==== $S ====" fallocate -l $S img-$S mkfs -t vfat -I -n TEST --mbr=yes img-$S parted -m -s -- img-$S unit b print done Output: > ==== 1M ==== > mkfs.fat 4.2 (2021-01-31) > Error: Invalid partition table - recursive partition on /root/img-1M. > BYT; > /root/img-1M:1048576B:file:512:512:unknown::; > ==== 200M ==== > mkfs.fat 4.2 (2021-01-31) > BYT; > /root/img-200M:209715200B:file:512:512:loop::; > 1:0B:209715199B:209715200B:fat16::; From comment 19 onward, I've done *everything* with a 1MB FAT image. (Refer to "0x0000_0800" in comment 20.) Note the difference: - with the 1M image (using FAT12), parted completely breaks (this is what all my struggles have come from, above) - with the 200M image (using FAT16), while parted indeed mis-reports the partition table type (it should report "mbr", not "loop"), it does print the (bogus) partition table entry (the first partition's descriptor) transparently. This implies that a limited / contained workaround in libguestfs could be possible for FAT16 & FAT32 images where the bogus MBR is present, but FAT12 images will break all the parted-dependent APIs still. At least now I know how to file the parted bug. (I can reproduce the issue with larger images too, forcing FAT12 with mkfs.fat -- FAT12 works up to and including 255MiB, and triggers the parted bug.) (In reply to Laszlo Ersek from comment #29) > YongkuiGuo: regarding your exact reproducer in comment#0, namely where you > create a FAT filesystem with "virt-make-fs", and then check it with > "guestfish": that particular use case can be made work with just the > --mbr=no patch I have. Got it. Thanks a lot. Reported the following parted bugs: - https://bugzilla.redhat.com/show_bug.cgi?id=2026220 - https://bugzilla.redhat.com/show_bug.cgi?id=2026224 Okay, new status summary. (i) When the fake MBR sits in sector#0 of a FAT12 fs, parted is wholly broken (bug 2026220). (ii) With FAT16 or FAT32, parted prints the one partition entry alright, only the device line is broken (bug 2026224). We can't do anything about (i). On a FAT12 image with the fake MBR, no "parted"-based API of libguestfs will work, until parted is fixed. On the other hand, FAT12 is mostly irrelevant: - mkfs.fat, when the FAT variant is not forced with the -F option, switches from FAT12 to FAT16 when the image size grows from 8MB to 9MB. Only a tiny fraction of disk images handled with libguestfs should be smaller than 9MB. - Even when forced with the "-F 12" option, "mkfs.fat" only permits FAT12 up to and including 255MB image size. A 256MB image requires FAT16 (or FAT32). This means that even when someone insists on creating a FAT12 image, such a disk image is unlikely to be used as an actual virtual machine disk (because it is limited to 255MB in size). Regarding (ii): Again, "print_partition_table_machine_readable" is the function that calls "parted" (in OCaml). It returns a tuple (a pair, to be more precise). The first component is the device line, the second component is the list of partition entry lines. The first use of "print_partition_table_machine_readable" is in "part_list", and there the first component of the return pair is ignored -- therefore, "part_list" is unaffected by the wrong device line. The second use is in "part_get_parttype", where *only* the device line matters; the second component of the return pair is ignored. The device line is what this bug is about. Therefore, it should be possible to work around bug 2026224 for FAT16 and FAT32 images -- i.e., for the majority of FAT images used as virtual machine disks, most likely -- in "part_get_parttype". Regarding "daemon/parted.c" (that is, the parted-based APIs implemented in C), the partition table is fetched by the internal function print_partition_table(). It is consumed by two other (public API) functions, print_partition_table() and do_part_get_name(). Luckily, neither cares about the device line, so the C source need not be modified. [PATCH v2 0/5] work around part table type misreporting by "parted" Message-Id: <20211124103747.9751-1-lersek> https://listman.redhat.com/archives/libguestfs/2021-November/msg00219.html [PATCH v3 0/5] work around part table type misreporting by "parted" Message-Id: <20211125094954.9713-1-lersek> https://listman.redhat.com/archives/libguestfs/2021-November/msg00236.html (In reply to Laszlo Ersek from comment #38) > [PATCH v3 0/5] work around part table type misreporting by "parted" > Message-Id: <20211125094954.9713-1-lersek> > https://listman.redhat.com/archives/libguestfs/2021-November/msg00236.html Upstream commit range e7f72ab146b9..d829f9ff9ae0 (with a small tweak to patch#4, according to Rich's feedback.) Test with package: libguestfs-1.46.1-1.el9.x86_64 Steps: 1. On rhel9 host $ guestfish -N disk mkfs vfat /dev/sda label:TEST 2. $ file test1.img test1.img: DOS/MBR boot sector, code offset 0x58+2, OEM-ID "mkfs.fat", sectors/cluster 8, Media descriptor 0xf8, sectors/track 61, heads 34, sectors 2097119 (volumes > 32 MB), FAT (32 bit), sectors/FAT 2048, serial number 0x8004f182, label: "TEST " 3. $ guestfish -a test1.img run : list-filesystems /dev/sda: vfat It works well. Verified with package: libguestfs-1.46.1-2.el9.x86_64 Steps: 1. On rhel9 host $ guestfish -N disk mkfs vfat /dev/sda label:TEST 2. $ file test1.img test1.img: DOS/MBR boot sector, code offset 0x58+2, OEM-ID "mkfs.fat", sectors/cluster 8, Media descriptor 0xf8, sectors/track 61, heads 34, sectors 2097119 (volumes > 32 MB), FAT (32 bit), sectors/FAT 2048, serial number 0x8004f182, label: "TEST " 3. $ guestfish -a test1.img run : list-filesystems /dev/sda: vfat It works. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (new packages: libguestfs), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:2317 |