1303157 – fdisk does not wipe existing signatures before creating partition table

Bug 1303157 - fdisk does not wipe existing signatures before creating partition table

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	util-linux
Sub Component:
Version:	7.2
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Karel Zak
QA Contact:	qe-baseos-daemons
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1298243 1393867 1400961

Reported:	2016-01-29 17:46 UTC by John Pittman
Modified:	2019-09-12 09:52 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-01-17 14:26:38 UTC
Target Upstream Version:
Embargoed:

Description John Pittman 2016-01-29 17:46:32 UTC

Description of problem:

If there is a leftover msdos label from an erroneous partition creation, the label prevents VG activation at boot.  The VG can be manually activated with 'vgchange -ay'.

Version-Release number of selected component (if applicable):

kernel-3.10.0-327.4.5.el7.x86_64
lvm2-2.02.130-5.el7.x86_64
lvm2-libs-2.02.130-5.el7.x86_64

How reproducible (commands taken from 'history'; data gathering commands omitted):

- Create initial lvm structure, format, add to fstab, reboot.

   20  pvcreate /dev/sdb
   21  vgcreate testvg /dev/sdb
   22  lvcreate -n testlv -l +100%FREE testvg
   23  mkfs.ext4 /dev/mapper/testvg-testlv
   27  vi /etc/fstab 
   28  mkdir /test
   29  reboot

- Create partition table, then remove.

   31  fdisk /dev/sdb
>n
>p
>1
>Enter
>Enter
>w

- Remove partition table.

   33  fdisk /dev/sdb
>d
>1
>w

- Reboot.  At this point the job will fail for testvg/testlv; drops to maintenance.

systemd[1]: Job dev-mapper-testvg\x2dtestlv.device/start timed out
systemd[1]: Timed out waiting for device dev-mapper-testvg\x2dtestlv.device
systemd[1]: Dependency failed for File System Check on /dev/mapper/testvg-testlv
systemd[1]: Job systemd-fsck@dev-mapper-testvg\x2dtestlv.service/start failed with result 'dependency'
systemd[1]: Job dev-mapper-testvg\x2dtestlv.device/start failed with result 'timeout'

Actual results:

System goes into maintenance mode at reboot if lvmetad is enabled.

Expected results:

In my opinion, we should not fail.  The partition table is gone, so we should be able to pick up the PV and allow activation.  At a minimum we should be consistent.  Either the VG should activate or it shouldn't.  If it doesn't, perhaps failed activation should be coupled with an alert that a label exists on the PV?

Workarounds:

- add rd.lvm.lv=testvg/testlv to grub line
- comment filesystem from fstab, activate from runlevel 3
- set use_lvmetad to 0 in lvm.conf
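
The third workaround can be sketched as a small filter. This is only an illustration: the function name disable_lvmetad is hypothetical, and it assumes the setting appears in lvm.conf as a line of the form "use_lvmetad = 1". It writes the modified configuration to stdout rather than editing the file in place, so the result can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: read lvm.conf content on stdin and print it with
# "use_lvmetad = 1" changed to "use_lvmetad = 0". Nothing is modified in place.
disable_lvmetad() {
    sed 's/^\([[:space:]]*use_lvmetad[[:space:]]*=[[:space:]]*\)1/\10/'
}

# Example (review the output before replacing the real file):
#   disable_lvmetad < /etc/lvm/lvm.conf > /tmp/lvm.conf.new
```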

I have a reproducer, so if any help is needed at all, please let me know. I'd be glad to help where I can.

John

Comment 1 Peter Rajnoha 2016-02-01 09:49:19 UTC

The reason here is that blkid gives preference to partition table over LVM2 PV signature if both are found on disk.

When using lvmetad, LVM2 is in event-based autoactivation mode, which means it runs "pvscan --cache -aay" (via lvm2-pvscan@major:minor.service) for each device that blkid identifies as LVM2_member. The "pvscan --cache -aay" informs lvmetad about new PVs that are present in the system, and if it finds that this is the last PV that makes the VG complete, it activates the whole VG. If the device is not identified as LVM2_member by blkid, then no "pvscan" is triggered, so lvmetad doesn't know about such a device at all and can't do any autoactivation.
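
The decision described above can be sketched as a small shell check. This is an illustration, not the actual udev rule: the function name would_trigger_pvscan is hypothetical, and it assumes the rule keys on an ID_FS_TYPE=LVM2_member line in the "blkid -o udev" output.

```shell
#!/bin/sh
# Hypothetical helper: read "blkid -o udev" style KEY=VALUE lines on stdin
# and report whether the device would be handed to pvscan.
would_trigger_pvscan() {
    if grep -qx 'ID_FS_TYPE=LVM2_member'; then
        echo "pvscan"
    else
        echo "no-pvscan"
    fi
}

# Example against a real device (needs read access to it):
#   blkid -o udev /dev/sdb | would_trigger_pvscan
```

With the dos partition table output shown later in this comment, the check reports no-pvscan, which matches the failed autoactivation in the report.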

The exact problem here is mixing two different signatures, which should always be avoided. For example, lvm2 wipes all existing signatures before creating its own PV signature on that device. I suppose fdisk should be doing the same: it should either reject creation of partition tables if it finds that there's already an existing signature, or it should provide a way to wipe it. LVM2 uses libblkid to detect existing signatures on a device before it continues to create its own signature.

Simply, it's not correct to mix several signatures. You would need to remove partition table completely for this to work again (the sequence you reported "d 1 w" just removes the partition, not the partition table).

You can also check the output of:
blkid -o udev <path_to_device>
wipefs <path_to_device>

For example:

# blkid -o udev /dev/sda
ID_PART_TABLE_UUID=c08a9ff6
ID_PART_TABLE_TYPE=dos

# wipefs /dev/sda
offset type
----------------------------------------------------------------
0x1fe dos [partition table]

0x218 LVM2_member [raid]
UUID: GBi2bu-fdld-R4UY-HU66-od4x-G4Nw-8aiwSc

You will see that blkid gives you only one signature - blkid needs to return only one type to identify the device deterministically. The wipefs will list all signatures found. If more signatures are found, most of the time it's a pure bug - you can't even be sure that one signature is not overlapping part of another signature and hence damaging it.
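
A quick check for mixed signatures could look like the sketch below. It is an assumption-laden illustration: the function names are hypothetical, and it assumes the parsable output of "wipefs -p", which is expected to print a "#"-prefixed header followed by one comma-separated line per signature.

```shell
#!/bin/sh
# Hypothetical helpers: read "wipefs -p" style output on stdin.

# Count signature lines, skipping the "#" header and empty lines.
count_signatures() {
    grep -v '^#' | grep -c .
}

# Report a conflict when more than one signature is present.
check_signatures() {
    n=$(count_signatures)
    if [ "$n" -gt 1 ]; then
        echo "conflict: $n signatures"
    else
        echo "ok: $n signature(s)"
    fi
}

# Example against a real device (read-only, nothing is erased):
#   wipefs -p /dev/sda | check_signatures
```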

As said above, if we want to go the safe way, we either need:

A: fdisk to do signature detection and possible wiping before it creates its own signatures (or it should just call "wipefs -a <path_to_device>" before it creates its own signature)

B: blkid to return with an error if more signatures are found when using "blkid -o udev" which is used in udev to export information about device. Or if it finds partition table (without any partitions defined) and LVM2 signature, it should give preference to LVM2.
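
Until fdisk does this itself, option A can be approximated by hand. The wrapper below is a hypothetical sketch (the name wipe_then_fdisk and the DRY_RUN variable are not part of any tool). Note that "wipefs -a" erases every signature it finds, including a valid PV signature, so the sketch only prints the commands unless DRY_RUN=0 is set explicitly.

```shell
#!/bin/sh
# Hypothetical wrapper illustrating option A: erase all signatures that
# wipefs can find, then hand the device to fdisk.
# With DRY_RUN=1 (the default here) the commands are only printed.
wipe_then_fdisk() {
    dev=$1
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "wipefs -a $dev"
        echo "fdisk $dev"
    else
        # Destructive: removes every detected signature from the device.
        wipefs -a "$dev" && fdisk "$dev"
    fi
}

# Example (prints the two commands, changes nothing):
#   wipe_then_fdisk /dev/sdb
```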

Comment 2 Peter Rajnoha 2016-02-01 09:58:09 UTC

Parted is better here - when it creates a partition table, it also wipes the LVM2 PV signature, so we end up with one signature and a deterministic state of the device, not a mix of the two signatures.

Comment 4 Peter Rajnoha 2016-02-01 13:47:08 UTC

Note: the reason this scenario works without lvmetad is that in this case we do not rely on blkid to tell us whether this is a PV or not - LVM commands do the scan on their own, and if they see a dos partition table *without any partitions* defined, they just ignore it and give preference to the LVM PV signature they find next. If there was at least one partition defined, LVM would filter such a device out and consider it unable to hold a PV signature at the same time. So, for completeness, that is why this doesn't work with lvmetad (where we rely on the blkid result within udev rule execution) and why it works without it (where LVM2 scans for the partition table itself).

Comment 5 John Pittman 2016-02-01 20:09:10 UTC

If this were changed, would cfdisk be changed as well?  I checked sfdisk and couldn't find any option to delete partitions.

Are there other partitioning tools that should be included?

John

Comment 6 Peter Rajnoha 2016-02-02 07:56:51 UTC

(In reply to John Pittman from comment #5)
> If this were changed, would cfdisk be changed as well?  I checked sfdisk and
> couldn't find any option to delete partitions.
> 

Yes, if cfdisk doesn't do that already, it should as well...

Comment 7 Karel Zak 2016-02-02 09:31:55 UTC

The current upstream fdisk prints a warning and recommends wipefs if the device already contains a filesystem/LVM signature. It does not delete foreign signatures automatically -- maybe we can enable it.

I don't plan to implement into libblkid any policies "if PT without partitions then prefer LVM" ... that's too complex and too crazy. It's better to force people to keep their disk without mess and wipe devices in partitioning tools and mkfs-like utils.

Note for comment #0: deleting all partitions does not mean that the whole partition table is gone. It's a perfectly valid use case to have an empty partition table without partitions.

Comment 8 Karel Zak 2016-02-02 09:45:23 UTC

Note that the current fdisk and cfdisk upstream wipe "bootbits" (area before the first sector).

Comment 9 Karel Zak 2016-04-06 09:22:19 UTC

Since v2.28 the fdisk tools wipe existing signatures from the device when executed in interactive mode, and a new command line option --wipe=auto|never|always controls this behaviour.
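
Choosing between the new option and a manual wipe can be sketched as a version check. This is an illustration only: the function names are hypothetical, the version string is expected to be taken from the installed tools, and "sort -V" from GNU coreutils is assumed to be available.

```shell
#!/bin/sh
# Hypothetical helper: succeed when version $1 is at least version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical helper: print which approach applies to a util-linux version.
wipe_strategy() {
    if version_ge "$1" 2.28; then
        echo "fdisk --wipe"
    else
        echo "wipefs-first"
    fi
}

# Example: take the last word of "fdisk --version" as the version string.
#   wipe_strategy "$(fdisk --version | awk '{print $NF}')"
# On v2.28 or later, wiping can then be requested explicitly with:
#   fdisk --wipe=always /dev/sdb
```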

Unfortunately, this cannot be backported to RHEL7.

Comment 12 Karel Zak 2017-01-17 14:26:38 UTC

Closing. fdisk has been improved in the upstream tree, but the change is too invasive to backport to RHEL7.