Description of problem:
When a disk using the GPT partition scheme has a partially corrupt backup GPT table, anaconda treats the disk as completely blank, as if it had no partitions at all. Other tools report the corruption and honor the valid primary header and table.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create a GPT partitioned disk with any number of partitions.
2. Alter one byte in LBA -33 (the first sector of the secondary partition table).
3. Launch anaconda and attempt an installation.
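For anyone reproducing this against a disk image rather than a real device, step 2 can be sketched in Python roughly as follows. This is only an illustration: the helper name `flip_byte` is made up for this example, and it assumes a regular image file (not a block device) with 512-byte logical sectors.

```python
import os

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def flip_byte(image_path, lba, offset_in_sector=0, sector_size=SECTOR_SIZE):
    """Invert one byte in the given sector of a disk image.

    Negative LBAs count back from the end of the image, so -33 is the
    first sector of the backup partition table on a standard GPT disk.
    Returns the absolute byte offset that was modified.
    """
    total_sectors = os.path.getsize(image_path) // sector_size
    if lba < 0:
        lba += total_sectors
    if not 0 <= lba < total_sectors:
        raise ValueError("LBA is outside the image")
    position = lba * sector_size + offset_in_sector
    with open(image_path, "r+b") as image:
        image.seek(position)
        original = image.read(1)
        image.seek(position)
        # XOR with 0xFF guarantees the byte actually changes
        image.write(bytes([original[0] ^ 0xFF]))
    return position
```

Calling `flip_byte("/path/to/disk.img", -33)` (the path is a placeholder) on a copy of the image should be enough to make the backup table CRC mismatch.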
Actual results:
After checking the disk in Installation Destination, Installation Options claims the disk has 81.91GB of free space, which is the size of the entire disk. When clicking on review/modify to customize, Manual Partitioning lists nothing.
Expected results:
Not this behavior. Alternatives could be a ! on the disk icon in Installation Destination to prevent the user from selecting it at all; or anaconda could allow use of the disk, showing its contents based on the valid primary GPT.
Additional info:
Untested whether the same behavior occurs if the primary is corrupt and the secondary is valid.
[root@localhost ~]# parted -s /dev/vda u s p
Error: The backup GPT table is corrupt, but the primary appears OK, so that will be used.
Model: Virtio Block Device (virtblk)
Disk /dev/vda: 167772160s
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number  Start  End         Size        File system  Name              Flags
 1      2048s  167772126s  167770079s  ext4         Linux filesystem
[root@localhost ~]# gdisk -l /dev/vda
GPT fdisk (gdisk) version 0.8.7
Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
on the recovery & transformation menu to examine the two tables.
Warning! One or more CRCs don't match. You should repair the disk!
Partition table scan:
BSD: not present
APM: not present
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
Disk /dev/vda: 167772160 sectors, 80.0 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 6013791C-C1D0-40B7-81D4-B3EB6793A86C
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 167772126
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)
Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048       167772126   80.0 GiB    8300  Linux filesystem
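The sector numbers in the gdisk output line up with the standard GPT layout, which a rough calculation can confirm. The function name `gpt_layout` is invented for this sketch; it assumes 512-byte sectors, the 128-entry table gdisk reports, and 128-byte entries (the usual entry size, which the output above does not show).

```python
def gpt_layout(total_sectors, num_entries=128, entry_size=128, sector_size=512):
    """Compute where the GPT structures sit on a disk of the given size."""
    # Sectors needed for one copy of the partition entry array (rounded up)
    table_sectors = -(-(num_entries * entry_size) // sector_size)
    backup_header = total_sectors - 1            # backup header is the last LBA
    backup_table_first = backup_header - table_sectors
    return {
        "first_usable": 2 + table_sectors,       # LBA 0 = PMBR, LBA 1 = primary header
        "last_usable": backup_table_first - 1,
        "backup_table_first": backup_table_first,
        "backup_table_first_from_end": backup_table_first - total_sectors,
        "backup_header": backup_header,
    }


layout = gpt_layout(167772160)  # sector count from the output above
for name, value in layout.items():
    print(name, value)
```

Under those assumptions the first and last usable sectors come out as 34 and 167772126, matching gdisk, and the backup table starts at LBA -33 counted from the end of the disk.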
Created attachment 813833 [details]
Created attachment 813834 [details]
Created attachment 813835 [details]
tflink thinks this may be a duplicate of bug 1019541. I'm reproducing this in qemu/kvm however so it's not limited to UEFI.
UEFI 2.3.1 Errata C page 162: "If the backup GPT is valid it must be used to restore the primary GPT. If the primary GPT is valid and the backup GPT is invalid software must restore the backup GPT."
Further, since the protective MBR is in place and valid, the disk is a valid GPT disk with a corrupt backup GPT.
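For readers who want to check which copy is damaged, the validity test amounts to a signature check plus two CRC32 comparisons. The sketch below is not what anaconda, blivet, or parted actually run; `check_gpt` is a hypothetical helper that assumes 512-byte sectors and an image opened in binary mode, with field offsets following the GPT header layout in the UEFI specification.

```python
import struct
import zlib

# GPT header fields, little-endian, 92 bytes in total
GPT_HEADER = struct.Struct("<8sIIIIQQQQ16sQIII")


def check_gpt(image, header_lba, sector_size=512):
    """Return (header_ok, table_ok) for the GPT header stored at header_lba."""
    image.seek(header_lba * sector_size)
    raw = image.read(sector_size)
    if len(raw) < GPT_HEADER.size:
        return False, False
    (signature, _revision, header_size, header_crc, _reserved, my_lba,
     _alternate_lba, _first_usable, _last_usable, _disk_guid, entries_lba,
     num_entries, entry_size, entries_crc) = GPT_HEADER.unpack_from(raw)
    if signature != b"EFI PART":
        return False, False
    if not GPT_HEADER.size <= header_size <= sector_size:
        return False, False
    # The header CRC is computed with its own 4-byte field set to zero
    zeroed = raw[:16] + b"\x00" * 4 + raw[20:header_size]
    if zlib.crc32(zeroed) != header_crc or my_lba != header_lba:
        return False, False
    # The header is trustworthy, so its pointer to the entry array can be used
    image.seek(entries_lba * sector_size)
    wanted = num_entries * entry_size
    table = image.read(wanted)
    table_ok = len(table) == wanted and zlib.crc32(table) == entries_crc
    return True, table_ok
```

The primary header lives at LBA 1 and the backup header at the last LBA of the disk, so calling the helper once for each location shows which side fails.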
Proposing as beta blocker: "When using the guided partitioning flow, the installer must be able to cleanly install to a disk with a valid ms-dos or gpt disk label and partition table which contains existing data...Remove existing storage volumes to free up space, at the user's direction...Reject or disallow invalid disk and volume configurations without crashing." And the custom partitioning criteria additionally require the ability for the user to "Remove existing storage volumes; Assign mount points to existing storage volumes."
Sorry, that's page 106 of the June 27 2012 v 2.3.1 Errata C UEFI spec.
(In reply to Chris Murphy from comment #4)
> tflink thinks this may be a duplicate of bug 1019541. I'm reproducing this
> in qemu/kvm however so it's not limited to UEFI.
Not a complete duplicate, no. 1019541 has 2 parts - the part where the corrupt secondary gpt table was showing the disk as empty and the part where the empty disk was triggering a reset of the selected disks.
This is the same as the first part of that where the disks showed up blank. It's probably better to keep this separate from the selected disks being reset.
Here's an updates.img with the fix for 1019541 against TC5, please give it a try:
Discussed this in the 2013-10-21 Blocker Review Meeting. It was agreed that this is a very tough call, and if it turns out that the 1019541 fix fixes it, we can happily dodge making that tough call. So for now the evaluation is postponed until we check whether that fix also fixes this bug.
/tmp has a 13625-byte updates.img file in it; this bug is still present.
Discussed this again in the 2013-10-21 Blocker Review Meeting. Accepted as a freeze exception issue as a severe and potentially dangerous misbehaviour of the installer in a fairly unusual condition. Agreement could not be reached on blocker status; it will be discussed again at the next meeting.
Created attachment 814720 [details]
Created attachment 814721 [details]
Created attachment 814722 [details]
Created attachment 814723 [details]
Created attachment 814725 [details]
Reproducible in anaconda-20.25.2-1, and in shipping versions since at least Fedora 17. What's strange is that their storage.log files show the GPT on disk being recognized, along with the partitions, their volume names, formats, UUIDs, etc., but it is then discarded in favor of creating a new msdos disklabel.
Additional testing: when corrupting only the primary GPT table, anaconda also considers the disk blank.
On the one hand, this is a really nasty bug. The entire point of having two GPTs on disk is completely obviated by a bug like this.
On the other hand, it looks like a textbook example of a "conditional blocker" as described in the Blocker Bug FAQ. The bug does not cause this rare condition to occur; the bug is old, and has only just been found after feature freeze. A proposal to document the bug for the beta release and simultaneously accept it as a final blocker seems reasonable. I think the point of a conditional blocker is to recognize a nasty bug and, without excusing it, allow devs to sort out the when-to-fix between now and final release rather than enforcing the written criteria as abruptly as this bug was discovered.
So I'm fine with freeze exception, and I won't vote on a proposal to block beta release if it comes up at the next blocker review. I would vote on escalating it now to final blocker.
Discussed at 2013-10-23 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-10-23/f20-blocker-review.2013-10-23-16.00.log.txt . After some reflection, there was a consensus that this does not need to block Beta release, but should be proposed for Final and will likely be accepted.
In fact, we went ahead and accepted it as a final blocker already, in the above meeting.
Unsurprisingly, this bug also affects anaconda rescue mode; if the primary GPT table is zeroed, anaconda rescue says no Linux partitions can be found.
In conjunction with grub2 bug 1022743, we get a probably rare but eye-opening situation where we can't boot or rescue the system even though it has a valid backup GPT available. (Just stating facts, not trying to beat a dead horse here.)
Status reviewed at 2013-11-20 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-11-20/f20-blocker-review.2013-11-20-17.00.log.txt . This bug seems to be clearly defined and reproducible and is now just waiting in the queue to be fixed by anaconda devs.
Here is a possible fix for this:
Disks with corrupt GPT labels will show up as 'Invalid Disk Label'. You can delete it and re-partition it if you want. It will *not* be fixed by Anaconda. For that you will need to use other tools.
Note, the fix is somewhat complex and has the theoretical possibility that it might break something in some other case. If you have a bit of time, cmurf, it'd be great if you could throw your exotic-storage-configuration kung fu at it and see if you can break it.
Is blivet deferring to parted to determine whether a gpt is corrupt? That narrows the test parameters a ton.
Works for me, no regressions.
This is not exhaustive, but I corrupted a GPT four different ways, affecting the primary header, primary table, secondary header, and secondary table. Each is recognized by the installer as an invalid disk label. Before corrupting, and after repairing with gdisk (as parted lacks a specific repair or rewrite feature), the installer recognizes the GPT as valid and modifiable.
Uncertain what the LVM-related changes in the patch are about or how to test whether there are regressions in that area.
(In reply to Chris Murphy from comment #24)
> Is blivet deferring to parted to determine whether a gpt is corrupt? That
> narrows the test parameters a ton.
parted (via pyparted) is what raises the error when the label is blank or corrupt or unknown.
Here's a slightly modified patch. You could crash the previous one by using custom to delete the invalid disk and then using the custom autopart:
anaconda-20.25.14-1.fc20, python-blivet-0.23.8-1.fc20 has been submitted as an update for Fedora 20.
Package anaconda-20.25.14-1.fc20, python-blivet-0.23.8-1.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing anaconda-20.25.14-1.fc20 python-blivet-0.23.8-1.fc20'
as soon as you are able to.
Please go to the following url:
then log in and leave karma (feedback).
cmurf: when you have a sec could you confirm again with TC5, which includes the update? thanks!
anaconda-20.25.14-1.fc20, python-blivet-0.23.8-1.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
With anaconda-20.25.15-1 I am still having this problem: Disks show as "Invalid disk label" with a valid primary GPT but invalid backup GPT regions. Fixing them by rewriting with gdisk fixes the problems.
(In reply to Jason Knight from comment #32)
That's the expected behavior. If either GPT header or table is corrupt, then the installer considers the whole disk invalid, because it's deciding not to fix the problem.
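To make the difference concrete, here is a small sketch contrasting the recovery rule quoted earlier from the UEFI spec with the installer behavior described in this comment. Both function names are hypothetical and neither is taken from anaconda or blivet source; each copy counts as OK only if both its header and its table pass their checks.

```python
def spec_action(primary_ok, backup_ok):
    """Recovery step suggested by the UEFI text quoted in this report."""
    if primary_ok and backup_ok:
        return "use primary"
    if backup_ok:
        return "restore primary from backup"
    if primary_ok:
        return "restore backup from primary"
    return "no valid GPT"


def installer_state(primary_ok, backup_ok):
    """Assumption drawn from this thread: any corruption marks the disk invalid."""
    return "valid" if primary_ok and backup_ok else "Invalid Disk Label"


for primary_ok in (True, False):
    for backup_ok in (True, False):
        print(primary_ok, backup_ok,
              spec_action(primary_ok, backup_ok),
              installer_state(primary_ok, backup_ok))
```

The two functions only disagree in the mixed cases, where one copy is intact: the spec text calls for a repair, while the installer reports the label as invalid and leaves the repair to tools such as gdisk.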