1931821 – work around mkfs.vfat writes corrupted filesystem/partition table when used on whole block device

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1931821 - work around mkfs.vfat writes corrupted filesystem/partition table when used on whole block device

Summary: work around mkfs.vfat writes corrupted filesystem/partition table when used o...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 9
Classification:	Red Hat
Component:	libguestfs
Sub Component:
Version:	9.0
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	pre-dev-freeze
Target Release:	---
Assignee:	Laszlo Ersek
QA Contact:	YongkuiGuo
Docs Contact:
URL:
Whiteboard:
Depends On:	2011711
Blocks:	TRACKER-bugs-affecting-libguestfs 1931866
TreeView+	depends on / blocked

Reported:	2021-02-23 10:45 UTC by YongkuiGuo
Modified:	2022-05-17 12:38 UTC (History)
CC List:	2 users (show)
Fixed In Version:	libguestfs-1.46.1-1.el9
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1931866 (view as bug list)
Environment:
Last Closed:	2022-05-17 12:28:37 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
virt-make-fs whole output (138.59 KB, text/plain) 2021-02-23 10:45 UTC, YongkuiGuo	no flags	Details
virt-make-fs new output (1.83 MB, text/plain) 2021-02-23 11:29 UTC, YongkuiGuo	no flags	Details
log (57.73 KB, text/plain) 2021-02-23 12:07 UTC, YongkuiGuo	no flags	Details
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	dosfstools dosfstools issues 161	0	None	open	mkfs.vfat 4.2 on a whole block device adds bogus partition table	2021-02-23 13:11:30 UTC
Red Hat Bugzilla	2026220	1	unspecified	CLOSED	parted fails to parse the fake MBR partition table created by dosfstools-4.2 for FAT12 images	2021-12-03 00:55:02 UTC
Red Hat Bugzilla	2026224	1	unspecified	CLOSED	parted reports the type of the fake MBR partition table, created by dosfstools-4.2, as "loop"	2021-12-03 00:27:03 UTC
Red Hat Product Errata	RHBA-2022:2317	0	None	None	None	2022-05-17 12:29:12 UTC

Description YongkuiGuo 2021-02-23 10:45:19 UTC

Created attachment 1758806 [details]
virt-make-fs whole output

Description of problem:
The virt-make-fs command fails to create vfat filesystem on RHEL9.


Version-Release number of selected component (if applicable):
libguestfs-1.44.0-5.el9.x86_64
kernel-5.11.0-1.el9.x86_64


How reproducible:
100%


Steps:

1. On rhel9.0 host with RHEL-9.0.0-20210221.6 compose
# virt-make-fs -vx --format=raw --size=200M --type=vfat --label=TEST /boot test_vfat.img
...
creating vfat filesystem on /dev/sda ...
libguestfs: trace: mkfs "vfat" "/dev/sda" "label:TEST"
guestfsd: <= mkfs (0x116) request length 84 bytes
commandrvf: stdout=n stderr=y flags=0x0
commandrvf: udevadm --debug settle -E /dev/sda
SELinux enabled state cached to: disabled
No filesystem is currently mounted on /sys/fs/cgroup.
Failed to determine unit we run in, ignoring: No data available
commandrvf: stdout=y stderr=y flags=0x0
commandrvf: wipefs --help
commandrvf: stdout=n stderr=n flags=0x0
commandrvf: wipefs -a --force /dev/sda
commandrvf: stdout=n stderr=y flags=0x0
commandrvf: mkfs -t vfat -I -n TEST /dev/sda
libguestfs: trace: mkfs = 0
libguestfs: trace: mount_options "utf8" "/dev/sda" "/"
guestfsd: => mkfs (0x116) took 0.08 secs
guestfsd: <= mount_options (0x4a) request length 68 bytes
commandrvf: stdout=n stderr=y flags=0x0
commandrvf: udevadm --debug settle -E /dev/sda
[    1.800802]  sda: sda1
SELinux enabled state cached to: disabled
No filesystem is currently mounted on /sys/fs/cgroup.
Failed to determine unit we run in, ignoring: No data available
command: mount '-o' 'utf8' '/dev/sda' '/sysroot//'
...

2.
# guestfish -a test_vfat.img
><fs> run
><fs> list-filesystems
libguestfs: error: list_filesystems: /dev/sda: not a partitioned device



Actual results:
As above

Expected results:
The virt-make-fs can create vfat filesystem successfully.

Additional info:

Comment 1 Richard W.M. Jones 2021-02-23 11:08:13 UTC

It looks like the log file is truncated?  It ends with:

guestfsd: receive_file: reading length word
guestfsd: receive_file: got chunk: cancel = 0x0, len = 8192, buf = 0x563e8b65a770
gu

Comment 2 YongkuiGuo 2021-02-23 11:29:20 UTC

Created attachment 1758813 [details]
virt-make-fs new output

Sorry, it's my mistake. Attached the new entire output.

Comment 3 Richard W.M. Jones 2021-02-23 11:51:16 UTC

This bug is really mysterious.  I cannot reproduce it in Fedora Rawhide, but
it looks like it could be a kernel bug.

Can you try the following commands:

$ guestfish -N disk mkfs vfat /dev/sda label:TEST 
$ file test1.img
$ virt-filesystems -a test1.img --all --long -h

And also try the following command and attach the large log file:

$ guestfish -vx -N disk mkfs vfat /dev/sda label:TEST >& log

Comment 4 YongkuiGuo 2021-02-23 12:07:57 UTC

Created attachment 1758816 [details]
log

$ guestfish -N disk mkfs vfat /dev/sda label:TEST 
$ file test1.img
test1.img: DOS/MBR boot sector, code offset 0x58+2, OEM-ID "mkfs.fat", sectors/cluster 8, Media descriptor 0xf8, sectors/track 61, heads 34, sectors 2097119 (volumes > 32 MB), FAT (32 bit), sectors/FAT 2048, serial number 0xde99c846, label: "TEST       "
$ virt-filesystems -a test1.img --all --long -h
libguestfs: error: list_filesystems: /dev/sda: not a partitioned device    -- the same error


$ guestfish -vx -N disk mkfs vfat /dev/sda label:TEST >& log 
...
SELinux enabled state cached to: disabled
No filesystem is currently mounted on /sys/fs/cgroup.
Failed to determine unit we run in, ignoring: No data available
fsync /dev/sda
...

Attached the log

Comment 5 Richard W.M. Jones 2021-02-23 12:35:15 UTC

It's actually a bug in dosfstools which happened between dosfstools-4.1-12.fc33
and dosfstools-4.2-1.fc34.

The following command:

mkfs -t vfat -I -n TEST /dev/sda

previously would create a filesystem directly on /dev/sda.  Now it
creates some kind of corrupted MBR partition table.

If you look at the difference between good (-) and bad (+) you can
see the extra partition table at offset 0x1b0-0x1cf in the image:

--- test1.img.hex       2021-02-23 12:29:49.815043947 +0000
+++ test1-bad.img.hex   2021-02-23 12:30:00.197159306 +0000
@@ -1,6 +1,6 @@
 00000000  eb 3c 90 6d 6b 66 73 2e  66 61 74 00 02 04 04 00  |.<.mkfs.fat.....|
 00000010  02 00 02 00 50 f8 14 00  14 00 01 00 00 00 00 00  |....P...........|
-00000020  00 00 00 00 80 00 29 c1  0d 23 51 54 45 53 54 20  |......)..#QTEST |
+00000020  00 00 00 00 80 00 29 65  e1 e3 4f 54 45 53 54 20  |......)e..OTEST |
 00000030  20 20 20 20 20 20 46 41  54 31 36 20 20 20 0e 1f  |      FAT16   ..|
 00000040  be 5b 7c ac 22 c0 74 0b  56 b4 0e bb 07 00 cd 10  |.[|.".t.V.......|
 00000050  5e eb f0 32 e4 cd 16 cd  19 eb fe 54 68 69 73 20  |^..2.......This |
@@ -12,6 +12,10 @@
 000000b0  72 79 20 61 67 61 69 6e  20 2e 2e 2e 20 0d 0a 00  |ry again ... ...|
 000000c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 *
+000001b0  00 00 00 00 00 00 00 00  e4 e4 e3 4f 00 00 80 00  |...........O....|
+000001c0  01 00 04 fe ff ff 00 00  00 00 00 50 00 00 00 00  |...........P....|
+000001d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
+*
 000001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 55 aa  |..............U.|
 00000200  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 *
@@ -21,8 +25,8 @@
 00003000  f8 ff ff ff 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 00003010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 *
-00005800  54 45 53 54 20 20 20 20  20 20 20 08 00 00 a3 63  |TEST       ....c|
-00005810  57 52 57 52 00 00 a3 63  57 52 00 00 00 00 00 00  |WRWR...cWR......|
+00005800  54 45 53 54 20 20 20 20  20 20 20 08 00 00 97 63  |TEST       ....c|
+00005810  57 52 57 52 00 00 97 63  57 52 00 00 00 00 00 00  |WRWR...cWR......|
 00005820  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 *
 00a00000

This only happens when mkfs is used to create a filesystem on a block
device like /dev/sda, not when you do it on a local file.

Comment 6 Jaroslav Škarvada 2021-02-23 13:03:32 UTC

--mbr[=y|yes|n|no|a|auto]
  Fill  (fake)  MBR table with disk signature one partition which starts at sector 0 (includes MBR itself) and spans whole
  disk device.  It is needed only for non-removable disks used on Microsoft Windows systems and only when formatting whole
  unpartitioned  disk.   Location  of the disk signature and partition table overlaps with the end of the first FAT sector
  (boot code location), therefore there is no additional space usage.  Default is auto mode in which mkfs.fat put MBR  ta‐
  ble only for non-removable disks when formatting whole unpartitioned disk.

Comment 7 Richard W.M. Jones 2021-02-23 13:20:22 UTC

Oh I didn't notice that option.  It does indeed suppress the bogus PT.
We can probably add this as a workaround in libguestfs, since upstream
is disputing that broken PTs are a bug at all.

Comment 8 Jaroslav Škarvada 2021-02-24 16:04:49 UTC

The behaviour is intended by upstream. Can we close this? I.e there is nothing to fix downstream.

Comment 9 Richard W.M. Jones 2021-02-25 11:22:36 UTC

Assigning it back to libguestfs, we'll have to find some way to
workaround this.  Probably using --mbr=no and also coping with
the invalid PTs.

Comment 10 Laszlo Ersek 2021-08-19 15:30:49 UTC

(In reply to Richard W.M. Jones from comment #9)
> also coping with the invalid PTs.

Should this be added to "parted" instead?

After all, the block device in question is partitioned (in a sense), except the first partition (the FAT filesystem) contains the partition table (including the entry that describes the partition itself).

Otherwise I expect libguestfs will have to add some special probing which differs from its normal order of operations. Currently, I assume, (1) the partition table is found at first (so we start unnesting partitions / filesystems), but (2) the partition table (or PT entry) is not accepted as valid. Due to (1), we don't try filesystem recognition on the whole device, and due to (2) we don't try fs recognition on the partition entry (which too spans the entire device). The hack would be to still try whole-device fs recognition, despite (1). I think it would be cleaner if "parted" simply reported the one partition entry.

Comment 11 Richard W.M. Jones 2021-08-19 16:59:47 UTC

I think there's really two parts to this bug.  (a) Change guestfs_mkfs
so that if the filesystem is vfat we use --mbr=no (only if asked
to create the fs on the whole device? or always? not sure).

(b) Work out exactly why we get the "not a partitioned device"
error and fix that.

(a) is an easy fix.

Currently for (b):

$ rpm -q dosfstools
dosfstools-4.2-2.fc35.x86_64
$ guestfish -N disk mkfs vfat /dev/sda
$ guestfish -a test1.img run : list-filesystems
libguestfs: error: list_filesystems: /dev/sda: not a partitioned device

but:

$ guestfish -a test1.img run : list-devices
/dev/sda
$ guestfish -a test1.img run : list-partitions
/dev/sda1

So apart from the error it all sort of works.

Now interestingly we don't use parted for list-filesystems etc, we
use kernel /sys probing (in other words, we use the kernel's own
partition detection code).  Which is kind of good and bad.  The
partitioning code in libguestfs has evolved over a decade and probably
needs a fresh eye on the whole thing.  (Having said that, I doubt
using parted would be a good idea since parted is only semi-maintained
and it has a really weird API.)

Comment 12 Laszlo Ersek 2021-08-23 14:36:31 UTC

Hmm, I need to digest this some more.

Regarding

(In reply to Richard W.M. Jones from comment #11)
> (b) Work out exactly why we get the "not a partitioned device"
> error and fix that.

we find a pointer to bug 634246 in the source code. Namely, if "parted" is invoked on a non-partitioned device (or, well, in this case, invalidly / obscurely partitioned device), it prints "loop" as the partition table type.

So "parted" doesn't fail outright, it gracefully (?) reports "no partition table found" by printing "loop" -- it's libguestfs that turns that into an error. Originally from this (historical) commit:

commit b3d27793f04ae44b2c11f6422a19b4422ac941cd
Author: Richard W.M. Jones <rjones>
Date:   Mon Oct 18 12:56:54 2010 +0100

    parted: Don't return "loop" for non-partitioned devices (RHBZ#634246).
    
    If you ran part-get-parttype command on a device which didn't
    contain a partition, it used to return the string "loop".  This
    is an internal representation that parted uses.  We should instead
    return an error because part-get-parttype makes no sense for
    devices which are not partitioned.

Comment 13 Laszlo Ersek 2021-08-23 14:42:54 UTC

(In reply to Richard W.M. Jones from comment #11)
> I think there's really two parts to this bug.  (a) Change guestfs_mkfs
> so that if the filesystem is vfat we use --mbr=no (only if asked
> to create the fs on the whole device? or always? not sure).

> (a) is an easy fix.

I think it should be safe to make "--mbr=no" independent of the target block device type (whole device vs. partition).

However, can we make that option independent of the mkfs.vfat (that is, dosfstools) version? Because, for example, on RHEL7, the dosfstools-3.0.20-10.el7.x86_64 package does not recognize the "--mbr" option. So if we implemented such a modification *upstream* without any kind of feature detection, it would break the mkfs.vfat invocation altogether, on older systems.

Now my understanding is that, while "mkfs.vfat" runs in the appliance, the appliance is rebuilt from *host side* utilities by supermin5. Is that correct? Or only partially correct? Where does "mkfs.vfat" originate from, ultimately?

Thanks!

Comment 14 Richard W.M. Jones 2021-08-23 15:39:26 UTC

(In reply to Laszlo Ersek from comment #12)
> Hmm, I need to digest this some more.
> 
> Regarding
> 
> (In reply to Richard W.M. Jones from comment #11)
> > (b) Work out exactly why we get the "not a partitioned device"
> > error and fix that.
> 
> we find a pointer to bug 634246 in the source code. Namely, if "parted" is
> invoked on a non-partitioned device (or, well, in this case, invalidly /
> obscurely partitioned device), it prints "loop" as the partition table type.
> 
> So "parted" doesn't fail outright, it gracefully (?) reports "no partition
> table found" by printing "loop" -- it's libguestfs that turns that into an
> error. Originally from this (historical) commit:
...

I still agree with that commit.  At the API level if someone
calls guestfs_part_get_parttype on a block device which isn't partitioned
them surely that is an error.  More likely the caller isn't prepared
for this case and should catch the error and ignore it (or whatever
is appropriate).

I would say that it is attach a specific errno to error messages
and that makes it easier for the caller to distinguish this specific
error from some other kind of failure.  In OCaml daemon functions this
is done using

  raise (Unix.Unix_error (Unix.EXXX, "message"))

(where EXXX is an errno value like EINVAL).  If the error escapes back
to the library then the caller can find the errno using
guestfs_last_errno.  If the caller is inside the daemon you can catch
it using ordinary ocaml try ... with Unix.Unix_error (EXXX, _) -> ...

> However, can we make that option independent of the mkfs.vfat (that is, dosfstools) version? Because, for example, on RHEL7, the dosfstools-3.0.20-10.el7.x86_64 package does not recognize the "--mbr" option. So if we implemented such a modification *upstream* without any kind of feature detection, it would break the mkfs.vfat invocation altogether, on older systems.

There are other places where we detect features of tools
at runtime, eg:

https://github.com/libguestfs/libguestfs/blob/efb3d019925c10c9567c91ccf1bb3f283919ba55/daemon/parted.ml#L31

However at some point we should decide that we're no longer going
to support {old distro | old tool}.  I feel that for RHEL 7 we're
getting to that stage, but maybe not quite yet.  For libnbd and
nbdkit I'm trying to keep the projects going on RHEL 7, although
it's some effort.  Might be worth asking Martin Kletzander if he
has any opinions on this because he's leading the CI effort and thus
has to make everything work on the distros.

Comment 15 Richard W.M. Jones 2021-08-23 19:14:52 UTC

BTW it's often good to look at the documentation for an API
to see what we promised callers:

https://libguestfs.org/guestfs.3.html#guestfs_part_get_parttype

    This command examines the partition table on device and returns the partition table type (format) being used.

    Common return values include: msdos (a DOS/Windows style MBR partition table), gpt (a GPT/EFI-style partition table). Other values are possible, although unusual. See guestfs_part_init for a full list.

which links to:
https://libguestfs.org/guestfs.3.html#guestfs_part_init

I would say that "msdos" and "gpt" are the only reasonable values
that this API should return, and anything else is an error (including
'there's no partition table, why are you asking me?').

This documentation is generated from:
https://github.com/libguestfs/libguestfs/blob/efb3d019925c10c9567c91ccf1bb3f283919ba55/generator/actions_core.ml#L5109

Comment 17 Laszlo Ersek 2021-11-17 13:56:08 UTC

The problem seems to start (from reading the source) with

>   let partitions = Devsparts.list_partitions () in

in "daemon/listfs.ml". I need to figure out what Devsparts.list_partitions does ("daemon/devsparts.ml"). Arguably, the fake (bogus) partition table that is embedded in the whole-disk FAT filesystem should be caught and removed, before Devsparts.list_partitions returns it. Because once Devsparts.list_partitions returns it, we have this:

>   (* Partitions. *)
>   let partitions = Devsparts.list_partitions () in
>   let partitions = List.filter is_partition_can_hold_filesystem partitions in

and then

> and is_partition_can_hold_filesystem partition =
>   let device = Devsparts.part_to_dev partition in
>   let partnum = Devsparts.part_to_partnum partition in
>   let parttype = Parted.part_get_parttype device in

i.e., "Devsparts.part_to_dev" maps the bogus partition back to the containing device, which is what "Parted.part_get_parttype" then blows up on. We need to prevent the bogus partition table / bogus single partition to escape Devsparts.list_partitions.

See also this:

(In reply to Richard W.M. Jones from comment #11)

> $ guestfish -a test1.img run : list-devices
> /dev/sda
> $ guestfish -a test1.img run : list-partitions
> /dev/sda1
> 
> So apart from the error it all sort of works.

My point is that list-devices works fine here, but list-partitions, although it does not report an error, is wrong. /dev/sda1 is a *bogus* partition; /dev/sda is in fact not a (validly) partitioned device, and so /dev/sda1 should be removed from the output. This is also where the "Parted.part_get_parttype" error originates from. ("If a partition exists, then it is described by a partition table, and the partition table must have a type", is the *correct* argument that "Parted.part_get_parttype" makes. So where we break that (otherwise correct) chain, leading to the error, is falsifying the initial condition: "if a partition exists". That's not a partition.)

I'll look into the details of "Devsparts.list_partitions" later.

Comment 18 Laszlo Ersek 2021-11-19 14:58:52 UTC

The "add_partitions" [daemon/devsparts.ml] helper function is the one that needs modification. For a given $dev parameter, it basically lists

  /sys/block/$dev/$dev*

We should scan the block device with the supposed partition table, namely $dev, ourselves. If $dev seems to contain a malformed MBR partition table, such as the one created by recent dosfstools, then "add_partitions" should produce an empty list, not the list of quasi-partitions that the Linux partition table driver exposes as

  /sys/block/$dev/$dev*

"add_partitions" is passed in to both "map_block_devices" and "map_md_devices", and a bogus MBR may exist on either kind of block device. So we should either modify "add_partitions" (see above), or apply the same kind of device-level filtering to *both* "map_block_devices" and "map_md_devices".

Comment 19 Laszlo Ersek 2021-11-22 10:53:09 UTC

$ diff -u -U 12  --label good --label bad \
    <(hexdump -n 512 -v -C loopf.good) \
    <(hexdump -n 512 -v -C loopf.bad)

--- good
+++ bad
@@ -1,33 +1,33 @@
 00000000  eb 3c 90 6d 6b 66 73 2e  66 61 74 00 02 04 01 00  |.<.mkfs.fat.....|
 00000010  02 00 02 00 08 f8 02 00  10 00 02 00 00 00 00 00  |................|
-00000020  00 00 00 00 80 00 29 02  41 21 58 54 45 53 54 20  |......).A!XTEST |
+00000020  00 00 00 00 80 00 29 08  2f 2b 4e 54 45 53 54 20  |......)./+NTEST |
 00000030  20 20 20 20 20 20 46 41  54 31 32 20 20 20 0e 1f  |      FAT12   ..|
 00000040  be 5b 7c ac 22 c0 74 0b  56 b4 0e bb 07 00 cd 10  |.[|.".t.V.......|
 00000050  5e eb f0 32 e4 cd 16 cd  19 eb fe 54 68 69 73 20  |^..2.......This |
 00000060  69 73 20 6e 6f 74 20 61  20 62 6f 6f 74 61 62 6c  |is not a bootabl|
 00000070  65 20 64 69 73 6b 2e 20  20 50 6c 65 61 73 65 20  |e disk.  Please |
 00000080  69 6e 73 65 72 74 20 61  20 62 6f 6f 74 61 62 6c  |insert a bootabl|
 00000090  65 20 66 6c 6f 70 70 79  20 61 6e 64 0d 0a 70 72  |e floppy and..pr|
 000000a0  65 73 73 20 61 6e 79 20  6b 65 79 20 74 6f 20 74  |ess any key to t|
 000000b0  72 79 20 61 67 61 69 6e  20 2e 2e 2e 20 0d 0a 00  |ry again ... ...|
 000000c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 000000d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 000000e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 000000f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 00000100  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 00000110  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 00000120  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 00000130  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 00000140  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 00000150  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 00000160  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 00000170  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 00000180  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 00000190  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 000001a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
-000001b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
-000001c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
+000001b0  00 00 00 00 00 00 00 00  98 33 2b 4e 00 00 80 00  |.........3+N....|
+000001c0  01 00 01 00 01 40 00 00  00 00 00 08 00 00 00 00  |.....@..........|
 000001d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 000001e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 000001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 55 aa  |..............U.|
 00000200

Comment 20 Laszlo Ersek 2021-11-22 11:23:40 UTC

The loop device I used as an example in comment 19 is 1MiB in size.

https://en.wikipedia.org/wiki/Master_boot_record#PTE
https://en.wikipedia.org/wiki/Partition_type

$ hexdump -v -C -n 16 -s 0x1be loopf.bad
000001be  80 00 01 00 01 00 01 40  00 00 00 00 00 08 00 00  |.......@........|
          ^^ ^^ ^^^^^ ^^ ^^ ^^^^^  ^^^^^^^^^^^ ^^^^^^^^^^^
          |  |  |     |  |  |      |           |
          |  |  |     |  |  |      |           number of 512B sectors in
          |  |  |     |  |  |      |           partition: 2048 (0x0000_0800)
          |  |  |     |  |  |      |
          |  |  |     |  |  |      starting LBA: 0
          |  |  |     |  |  |
          |  |  |     |  |  ending sector: 1, ending cylinder: 64 (0x40)
          |  |  |     |  |
          |  |  |     |  ending head: 0
          |  |  |     |
          |  |  |     partition type: 1 (MBR)
          |  |  |
          |  |  starting sector: 1, starting cylinder: 0
          |  |
          |  starting head: 0
          |
          bootable

What matters to us here are the following fields:

- bootable,
- C/H/S address of first absolute sector in partition: 0/0/1
- partition type: 1 (MBR)
- starting LBA: 0

The "sector count" and "C/H/S address of last absolute sector in
partition" fields are irrelevant to us. The former (= ending C/H/S) is
simply unusable for any disk of nontrivial size, and the latter (LBA) is
something we cannot verify in OCaml, unless we can call the BLKGETSIZE
or BLKGETSIZE64 ioctl()s in some way.

(I see that "daemon/debug-bmap.c", "daemon/utils.c", and "lib/create.c"
call BLKGETSIZE64; however, those calls are not reusable for this
purpose, and anyway, checking whether the end of the 1st partition
matches the end of the block device is not too important. The
invalid-ness of the partition table is that the 1st partition *starts*
at the very beginning of the block device, overlaying the partition
table itself.)

Additionally, we should check the boot signature { 0x55, 0xAA } at
offset 0x01FE.

Comment 21 Laszlo Ersek 2021-11-22 15:15:36 UTC

(In reply to Richard W.M. Jones from comment #11)
> I think there's really two parts to this bug.  (a) Change guestfs_mkfs
> so that if the filesystem is vfat we use --mbr=no (only if asked to
> create the fs on the whole device? or always? not sure).
>
> (b) Work out exactly why we get the "not a partitioned device" error
> and fix that.

I've got one patch for each of (a) and (b); I'll test them tomorrow and
post them if everything works.

Comment 22 Laszlo Ersek 2021-11-23 08:53:12 UTC

Unfortunately, this is an incredible shit-show.

The "parted" utility fails if the bogus partition table is *present*.

On the other hand, the "blkid" utility (called from check_with_vfs_type -> Blkid.vfs_type) fails if the bogus partition is *not* used.

It's basically impossible to satisfy both utilities at the same time. "Parted" insists that the partition is broken (it is), whereas "blkid" insists that the (broken) partition be used for accessing the vfat filesystem; whole disk access does not work. The mind boggles.

Comment 23 Laszlo Ersek 2021-11-23 08:56:36 UTC

Here's another idea: given that the dumpster fire that the bogus partition is cannot be constrained (per comment 22), I'll try to kludge only "Parted.part_get_parttype" into submission. This is a U-turn from the previous approach: rather than *hiding* the bogus partition from everything, *expose* the bogus partition to everything, and only override the exception thrown by Parted.part_get_parttype.

After all, my helper function Utils.has_bogus_mbr *already* checks (per comment 20) that the partition type is MBR. So if we can see that, there's no need to invoke parted at all. I guess I can give this a go as well. /smh

Comment 24 Laszlo Ersek 2021-11-23 09:07:43 UTC

So in the Parted module, there are three functions that relate to MBR:

- part_get_parttype -> the one that currently breaks, and we need to hack,

- part_get_mbr_id -> this one uses the "sfdisk" command with the "--part-type" (or "--print-id") option, as in

  sfdisk --part-type /dev/loop0 1

  Thankfully, sfdisk works with the insane partition table / pte.

- part_get_mbr_part_type -> this relies on "part_get_parttype" and "part_get_mbr_id" from above, so in case I can "fix part_get_parttype", this third function should work as well.

Comment 25 Laszlo Ersek 2021-11-23 09:26:43 UTC

Yup, this is a dumpster fire; the function that calls the "parted" utility directly is "print_partition_table_machine_readable", and that one is called by *both* "part_get_parttype" *and* "part_list". And "part_list" is called further from "daemon/inspect_fs_windows.ml", plus it's a public libguestfs API.

We cannot call "parted" on the device with the broken partition table *at all*! It does not print any partition lines. Compare:

* works:
# parted -m -s -- /dev/nvme0n1 unit b print
BYT;
/dev/nvme0n1:512110190592B:nvme:512:512:gpt:SAMSUNG MZVKW512HMJP-000H1:;
1:1048576B:630194175B:629145600B:fat32:EFI System Partition:boot, esp;
2:630194176B:1703935999B:1073741824B:ext4::;
3:1703936000B:512109838335B:510405902336B:::;

* does not work:
# parted -m -s -- /dev/loop0 unit b print
Error: Invalid partition table - recursive partition on /dev/loop0.
BYT;
/dev/loop0:1048576B:loopback:512:512:unknown:Loopback device:;

Comment 26 Laszlo Ersek 2021-11-23 09:52:40 UTC

Yet more damage. All the libguestfs APIs implemented in "daemon/parted.c" break on such a disk image, because all those operations rely on parted -- and parted just won't work on such a partition table.

I considered manually reimplementing "part_get_parttype" and "part_list" in OCaml, parsing the partition table in binary, if we detect the damage caused by dosfstools. But that's just impossible to do for *all* the libguestfs APIs implemented in "daemon/parted.c".

When the dosfstools developer "pali" said in <https://github.com/dosfstools/dosfstools/issues/161#issuecomment-784200231> that "Kernel correctly handles it and also other standard tools", they were *wrong*. "parted" is a standard tool, and it does *not* handle the bogus partition table.

I think our remaining option here is to contribute a patch to parted. We use parted so extensively in libguestfs that replacing all of it just for the sake of this screwed up dosfstools misfeature is infeasible.

Comment 27 Laszlo Ersek 2021-11-23 10:05:25 UTC

(1) Rich: I have a patch for "--mbr=no", so that at the least we don't create broken partition tables. If you agree, I can post that to the list.

(2) Regarding the other direction, coping with malformed partition tables like this, we need to reassign this bug to -- or make it dependent on a new bug for -- parted. parted is our internal abstraction layer and we use it *very* extensively (I've only realized gradually how extensively in fact); breaking through or dancing around that abstraction in every spot wherever we use it is completely wrong. Not just because it's a lot of coding, but because it's *bad design* (especially that the dosfstools misfeature is mostly useless from our perspective). Should we reassign this bug to parted, or create a new one, or maybe consider contributing a patch to upstream parted?

... For reference, the parted code does this -- in file "libparted/labels/dos.c", function read_table(), at current master (b20227adf575):

^232dbda915df (Anant Narayanan      2006-09-14 15:18:45 +0000 1018)     /* parse the partitions from this table */
ca4f538076fdf (Petr Uzel            2009-08-26 12:29:26 +0200 1019)     for (i = 0; i < DOS_N_PRI_PARTITIONS; i++) {
b4c53f6d14746 (Jim Meyering         2007-05-30 15:55:30 +0200 1020)             raw_part = &table->partitions [i];
^232dbda915df (Anant Narayanan      2006-09-14 15:18:45 +0000 1021)             if (raw_part->type == PARTITION_EMPTY || !raw_part->length)
^232dbda915df (Anant Narayanan      2006-09-14 15:18:45 +0000 1022)                     continue;
^232dbda915df (Anant Narayanan      2006-09-14 15:18:45 +0000 1023) 
^232dbda915df (Anant Narayanan      2006-09-14 15:18:45 +0000 1024)             /* process nested extended partitions after normal logical
^232dbda915df (Anant Narayanan      2006-09-14 15:18:45 +0000 1025)              * partitions, to make sure we get the order right.
^232dbda915df (Anant Narayanan      2006-09-14 15:18:45 +0000 1026)              */
^232dbda915df (Anant Narayanan      2006-09-14 15:18:45 +0000 1027)             if (is_extended_table && raw_part_is_extended (raw_part))
099eaa4acd0d8 (Jim Meyering         2009-03-05 19:22:39 +0100 1028)                     continue;
^232dbda915df (Anant Narayanan      2006-09-14 15:18:45 +0000 1029) 
^232dbda915df (Anant Narayanan      2006-09-14 15:18:45 +0000 1030)             lba_offset = is_extended_table ? sector : 0;
^232dbda915df (Anant Narayanan      2006-09-14 15:18:45 +0000 1031) 
^232dbda915df (Anant Narayanan      2006-09-14 15:18:45 +0000 1032)             if (linear_start (disk, raw_part, lba_offset) == sector) {
^232dbda915df (Anant Narayanan      2006-09-14 15:18:45 +0000 1033)                     if (ped_exception_throw (
^232dbda915df (Anant Narayanan      2006-09-14 15:18:45 +0000 1034)                             PED_EXCEPTION_ERROR,
^232dbda915df (Anant Narayanan      2006-09-14 15:18:45 +0000 1035)                             PED_EXCEPTION_IGNORE_CANCEL,
^232dbda915df (Anant Narayanan      2006-09-14 15:18:45 +0000 1036)                             _("Invalid partition table - recursive "
^232dbda915df (Anant Narayanan      2006-09-14 15:18:45 +0000 1037)                             "partition on %s."),
^232dbda915df (Anant Narayanan      2006-09-14 15:18:45 +0000 1038)                             disk->dev->path)
^232dbda915df (Anant Narayanan      2006-09-14 15:18:45 +0000 1039)                                     != PED_EXCEPTION_IGNORE)
^232dbda915df (Anant Narayanan      2006-09-14 15:18:45 +0000 1040)                             goto error;
^232dbda915df (Anant Narayanan      2006-09-14 15:18:45 +0000 1041)                     continue;       /* avoid infinite recursion */
^232dbda915df (Anant Narayanan      2006-09-14 15:18:45 +0000 1042)             }

I think we should reassign this BZ to parted.

Comment 28 Laszlo Ersek 2021-11-23 10:19:58 UTC

Basically we have two appliance-side *sets* of utilities, (i) "parted" and (ii) "everything else (including the kernel itself)". Both sets disagree on the validity of the bogus partition table. Trying to hide the disagreement in the libguestfs daemon is just not feasible; the underlying utilities themselves need to be reconciled.

Comment 29 Laszlo Ersek 2021-11-23 10:22:58 UTC

YongkuiGuo: regarding your exact reproducer in comment#0, namely where you create a FAT filesystem with "virt-make-fs", and then check it with "guestfish": that particular use case can be made work with just the --mbr=no patch I have.

Comment 30 Laszlo Ersek 2021-11-23 14:01:27 UTC

[PATCH] daemon/mkfs: disable creation of fake MBR partition table with "mkfs.fat"
Message-Id: <20211123140050.14767-1-lersek>
https://listman.redhat.com/archives/libguestfs/2021-November/msg00211.html

Comment 31 Richard W.M. Jones 2021-11-23 14:23:12 UTC

(In reply to Laszlo Ersek from comment #27)
> (1) Rich: I have a patch for "--mbr=no", so that at the least we don't
> create broken partition tables. If you agree, I can post that to the list.

I reviewed that and it looks good, thanks.

> (2) Regarding the other direction, coping with malformed partition tables
> like this, we need to reassign this bug to -- or make it dependent on a new
> bug for -- parted. parted is our internal abstraction layer and we use it
> *very* extensively (I've only realized gradually how extensively in fact);
> breaking through or dancing around that abstraction in every spot wherever
> we use it is completely wrong. Not just because it's a lot of coding, but
> because it's *bad design* (especially that the dosfstools misfeature is
> mostly useless from our perspective). Should we reassign this bug to parted,
> or create a new one, or maybe consider contributing a patch to upstream
> parted?

Always err on the side of creating a new bug.  In this particular
case we need to keep this bug around because we need to patch
libguestfs in RHEL 9 (at the very least for the --mbr=n fix).
And then if parted needs a fix then it'll also need a bug.

In more general terms about our use of parted - well actually we
use at least three different code bases in libguestfs to read
partition tables (the perils of having code around for over
a decade): parted, sfdisk, and the kernel.  They all have
very peculiar APIs, and they all parse slightly different
partition table formats.  I really don't like parted, because
it tries to be all things to all partition formats, and fails
at all of them.  At some point we had a plan to implement
"libmbr" and "libgpt" (because those are the only two formats
anyone cares about), but that was a bunch of work and we never
did it.  Unifying the whole mess around a couple of straightforward
libraries dedicated to their formats would probably be a good idea,
but is obviously a huge amount of actual work.
 
> I think we should reassign this BZ to parted.

New bug, but yes.  However I don't know if anyone at Red Hat still
works on parted, after Jim Meyering left.

Comment 33 Laszlo Ersek 2021-11-23 16:53:44 UTC

The reproducibility mystery has been solved: the parted breakage depends on the size of the device.

for S in 1M 200M; do
  echo "==== $S ===="
  fallocate -l $S img-$S
  mkfs -t vfat -I -n TEST --mbr=yes img-$S
  parted -m -s -- img-$S unit b print
done

Output:

> ==== 1M ====
> mkfs.fat 4.2 (2021-01-31)
> Error: Invalid partition table - recursive partition on /root/img-1M.
> BYT;
> /root/img-1M:1048576B:file:512:512:unknown::;

> ==== 200M ====
> mkfs.fat 4.2 (2021-01-31)
> BYT;
> /root/img-200M:209715200B:file:512:512:loop::;
> 1:0B:209715199B:209715200B:fat16::;

From comment 19 onward, I've done *everything* with a 1MB FAT image. (Refer to "0x0000_0800" in comment 20.)

Note the difference:

- with the 1M image (using FAT12), parted completely breaks (this is what all my struggles have come from, above)

- with the 200M image (using FAT16), while parted indeed mis-reports the partition table type (it should report "mbr", not "loop"), it does print the (bogus) partition table entry (the first partition's descriptor) transparently.

This implies that a limited / contained workaround in libguestfs could be possible for FAT16 & FAT32 images where the bogus MBR is present, but FAT12 images will break all the parted-dependent APIs still.

At least now I know how to file the parted bug. (I can reproduce the issue with larger images too, forcing FAT12 with mkfs.fat -- FAT12 works up to and including 255MiB, and triggers the parted bug.)

Comment 34 YongkuiGuo 2021-11-24 06:17:19 UTC

(In reply to Laszlo Ersek from comment #29)
> YongkuiGuo: regarding your exact reproducer in comment#0, namely where you
> create a FAT filesystem with "virt-make-fs", and then check it with
> "guestfish": that particular use case can be made work with just the
> --mbr=no patch I have.

Got it. Thanks a lot.

Comment 35 Laszlo Ersek 2021-11-24 06:58:46 UTC

Reported the following parted bugs:
- https://bugzilla.redhat.com/show_bug.cgi?id=2026220
- https://bugzilla.redhat.com/show_bug.cgi?id=2026224

Comment 36 Laszlo Ersek 2021-11-24 07:50:52 UTC

Okay, new status summary.

(i) When the fake MBR sits in sector#0 of a FAT12 fs, parted is wholly broken (bug 2026220).

(ii) With FAT16 or FAT32, parted prints the one partition entry alright, only the device line is broken (bug 2026224).


We can't do anything about (i). On a FAT12 image with the fake MBR, no "parted"-based API of libguestfs will work, until parted is fixed.

On the other hand, FAT12 is mostly irrelevant:

- mkfs.fat, when the FAT variant is not forced with the -F option, switches from FAT12 to FAT16 when the image size grows from 8MB to 9MB. Only a tiny fraction of disk images handled with libguestfs should be smaller than 9MB.

- Even when forced with the "-F 12" option, "mkfs.fat" only permits FAT12 up to and including 255MB image size. A 256MB image requires FAT16 (or FAT32). This means that even when someone insists on creating a FAT12 image, such a disk image is unlikely to be used as an actual virtual machine disk (because it is limited to 255MB in size).


Regarding (ii):

Again, "print_partition_table_machine_readable" is the function that calls "parted" (in OCaml). It returns a tuple (a pair, to be more precise). The first component is the device line, the second component is the list of partition entry lines.

The first use of "print_partition_table_machine_readable" is in "part_list", and there the first component of the return pair is ignored -- therefore, "part_list" is unaffected by the wrong device line.

The second use is in "part_get_parttype", where *only* the device line matters; the second component of the return pair is ignored. The device line is what this bug is about. Therefore, it should be possible to work around bug 2026224 for FAT16 and FAT32 images -- i.e., for the majority of FAT images used as virtual machine disks, most likely -- in "part_get_parttype".

Regarding "daemon/parted.c" (that is, the parted-based APIs implemented in C), the partition table is fetched by the internal function print_partition_table(). It is consumed by two other (public API) functions, print_partition_table() and do_part_get_name(). Luckily, neither cares about the device line, so the C source need not be modified.

Comment 37 Laszlo Ersek 2021-11-24 10:38:28 UTC

[PATCH v2 0/5] work around part table type misreporting by "parted"
Message-Id: <20211124103747.9751-1-lersek>
https://listman.redhat.com/archives/libguestfs/2021-November/msg00219.html

Comment 38 Laszlo Ersek 2021-11-25 09:51:49 UTC

[PATCH v3 0/5] work around part table type misreporting by "parted"
Message-Id: <20211125094954.9713-1-lersek>
https://listman.redhat.com/archives/libguestfs/2021-November/msg00236.html

Comment 39 Laszlo Ersek 2021-11-26 09:39:06 UTC

(In reply to Laszlo Ersek from comment #38)
> [PATCH v3 0/5] work around part table type misreporting by "parted"
> Message-Id: <20211125094954.9713-1-lersek>
> https://listman.redhat.com/archives/libguestfs/2021-November/msg00236.html

Upstream commit range e7f72ab146b9..d829f9ff9ae0 (with a small tweak to patch#4, according to Rich's feedback.)

Comment 42 YongkuiGuo 2021-12-10 09:08:54 UTC

Test with package:
libguestfs-1.46.1-1.el9.x86_64


Steps:

1. On rhel9 host
$ guestfish -N disk mkfs vfat /dev/sda label:TEST

2. 
$ file test1.img
test1.img: DOS/MBR boot sector, code offset 0x58+2, OEM-ID "mkfs.fat", sectors/cluster 8, Media descriptor 0xf8, sectors/track 61, heads 34, sectors 2097119 (volumes > 32 MB), FAT (32 bit), sectors/FAT 2048, serial number 0x8004f182, label: "TEST       "

3.
$ guestfish -a test1.img run : list-filesystems
/dev/sda: vfat

It works well.

Comment 47 YongkuiGuo 2022-02-15 09:03:04 UTC

Verified with package:
libguestfs-1.46.1-2.el9.x86_64


Steps:

1. On rhel9 host
$ guestfish -N disk mkfs vfat /dev/sda label:TEST

2. 
$ file test1.img
test1.img: DOS/MBR boot sector, code offset 0x58+2, OEM-ID "mkfs.fat", sectors/cluster 8, Media descriptor 0xf8, sectors/track 61, heads 34, sectors 2097119 (volumes > 32 MB), FAT (32 bit), sectors/FAT 2048, serial number 0x8004f182, label: "TEST       "

3.
$ guestfish -a test1.img run : list-filesystems
/dev/sda: vfat

It works.

Comment 49 errata-xmlrpc 2022-05-17 12:28:37 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: libguestfs), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2317

Note You need to log in before you can comment on or make changes to this bug.