Bug 1020974 - incorrectly treats a disk with partially corrupt GPT as having no partition at all
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: anaconda
Version: 20
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Brian Lane
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedBlocker
Keywords:
Depends On:
Blocks: F20FinalBlocker
Reported: 2013-10-18 16:00 UTC by Chris Murphy
Modified: 2014-07-24 17:51 UTC (History)
Last Closed: 2013-12-10 06:53:46 UTC


Attachments:
anaconda.log (10.47 KB, text/plain), 2013-10-18 16:02 UTC, Chris Murphy
program.log (17.51 KB, text/plain), 2013-10-18 16:02 UTC, Chris Murphy
storage.state (20.00 KB, text/plain), 2013-10-18 16:03 UTC, Chris Murphy
anaconda.log c10 (5.39 KB, text/plain), 2013-10-21 18:00 UTC, Chris Murphy
program.log c10 (17.13 KB, text/plain), 2013-10-21 18:01 UTC, Chris Murphy
storage.log c10 (84.28 KB, text/plain), 2013-10-21 18:01 UTC, Chris Murphy
storage.state (20.00 KB, application/octet-stream), 2013-10-21 18:01 UTC, Chris Murphy
storage.state c10 (20.00 KB, application/octet-stream), 2013-10-21 18:03 UTC, Chris Murphy

Description Chris Murphy 2013-10-18 16:00:07 UTC
Description of problem:
When a disk using GPT partition scheme has a partially corrupt backup GPT table, anaconda treats the disk as completely blank, without a partition at all. Other tools see the corruption and honor the valid primary header+table.


Version-Release number of selected component (if applicable):
anaconda-20.25.1-1.fc20.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a GPT partitioned disk with any number of partitions.
2. Alter one byte in LBA -33, counting back from the end of the disk (the first sector of the secondary partition table).
3. Launch anaconda and attempt an installation.
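
Step 2 can be sketched in Python for anyone reproducing this against a raw disk image file (for example a qemu/kvm image). This is only an illustration, not how the reporter did it: the function name flip_byte_at_lba and its parameters are hypothetical, and it assumes 512-byte sectors and a regular image file rather than a block device.

```python
import os


def flip_byte_at_lba(image_path, lba=-33, sector_size=512, offset=0):
    """Invert all bits of one byte inside the chosen sector of a raw disk image.

    A negative lba counts back from the end of the image, so -1 is the last
    sector. Returns the absolute byte position that was modified.
    Assumption: image_path is a regular file (os.path.getsize reports 0 for
    block devices on Linux, so this will refuse to touch /dev/vda directly).
    """
    total_sectors = os.path.getsize(image_path) // sector_size
    if lba < 0:
        lba += total_sectors
    if not 0 <= lba < total_sectors:
        raise ValueError("LBA is outside the image")
    if not 0 <= offset < sector_size:
        raise ValueError("offset is outside the sector")

    position = lba * sector_size + offset
    with open(image_path, "r+b") as image:
        image.seek(position)
        original = image.read(1)
        image.seek(position)
        # XOR with 0xFF guarantees the byte changes, whatever it held before.
        image.write(bytes([original[0] ^ 0xFF]))
    return position
```

Run it only on a scratch copy of the image. Calling it a second time with the same arguments restores the original byte.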

Actual results:
After checking the disk in Installation Destination, Installation Options claims the disk has 81.91GB of free space, which is the size of the entire disk. When clicking on review/modify to customize, Manual Partitioning lists nothing.

Expected results:

Not this behavior. Alternatives could be a ! on the disk icon in Installation Destination to prevent the user from selecting it at all; or anaconda allows the use of the disk, showing its contents based on the valid primary GPT.


Additional info:

Untested whether the same behavior occurs if the primary is corrupt and the secondary is valid.


[root@localhost ~]# parted -s /dev/vda u s p
Error: The backup GPT table is corrupt, but the primary appears OK, so that will be used.
Model: Virtio Block Device (virtblk)
Disk /dev/vda: 167772160s
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start  End         Size        File system  Name              Flags
 1      2048s  167772126s  167770079s  ext4         Linux filesystem



[root@localhost ~]# gdisk -l /dev/vda
GPT fdisk (gdisk) version 0.8.7

Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
on the recovery & transformation menu to examine the two tables.

Warning! One or more CRCs don't match. You should repair the disk!

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: damaged

****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
Disk /dev/vda: 167772160 sectors, 80.0 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 6013791C-C1D0-40B7-81D4-B3EB6793A86C
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 167772126
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048       167772126   80.0 GiB    8300  Linux filesystem

Comment 1 Chris Murphy 2013-10-18 16:02:36 UTC
Created attachment 813833 [details]
anaconda.log

Comment 2 Chris Murphy 2013-10-18 16:02:50 UTC
Created attachment 813834 [details]
program.log

Comment 3 Chris Murphy 2013-10-18 16:03:11 UTC
Created attachment 813835 [details]
storage.state

Comment 4 Chris Murphy 2013-10-18 16:40:42 UTC
tflink thinks this may be a duplicate of bug 1019541. I'm reproducing this in qemu/kvm however so it's not limited to UEFI.

Comment 5 Chris Murphy 2013-10-18 21:09:54 UTC
UEFI 2.3.1 Errata C page 162: "If the backup GPT is valid it must be used to restore the primary GPT. If the primary GPT is valid and the backup GPT is invalid software must restore the backup GPT." 

Further, since the protective MBR is in place and valid, the disk is a valid GPT disk with a corrupt backup GPT.
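
To make the "primary valid, backup invalid" distinction concrete, here is a minimal sketch of how each GPT copy can be checked independently. It is not anaconda, blivet or parted code. It assumes 512-byte sectors, a raw image file, and a backup header in the last sector of the image; the names gpt_copy_is_valid and usable_gpt_copies are made up for this example. The header field layout and the CRC32 rules follow the UEFI GPT header definition.

```python
import struct
import zlib

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors, as on the disk in this report
HEADER_FORMAT = "<8sIIIIQQQQ16sQIII"  # the 92-byte GPT header, little-endian


def gpt_copy_is_valid(image, header_lba):
    """Return True if the header at header_lba and its entry array pass their CRC checks."""
    image.seek(header_lba * SECTOR_SIZE)
    sector = image.read(SECTOR_SIZE)
    if len(sector) < SECTOR_SIZE:
        return False
    (signature, _revision, header_size, header_crc, _reserved, my_lba,
     _other_lba, _first_usable, _last_usable, _disk_guid, entries_lba,
     entry_count, entry_size, entries_crc) = struct.unpack_from(HEADER_FORMAT, sector)
    if signature != b"EFI PART" or my_lba != header_lba:
        return False
    if not 92 <= header_size <= SECTOR_SIZE:
        return False
    # The header CRC is calculated with the CRC field itself set to zero.
    zeroed = sector[:16] + b"\x00" * 4 + sector[20:header_size]
    if zlib.crc32(zeroed) & 0xFFFFFFFF != header_crc:
        return False
    # The header also carries a CRC over the whole partition entry array.
    image.seek(entries_lba * SECTOR_SIZE)
    entries = image.read(entry_count * entry_size)
    if len(entries) < entry_count * entry_size:
        return False
    return zlib.crc32(entries) & 0xFFFFFFFF == entries_crc


def usable_gpt_copies(image_path):
    """Return (primary_ok, backup_ok) for a raw disk image."""
    with open(image_path, "rb") as image:
        image.seek(0, 2)  # seek to the end to learn the image size
        last_lba = image.tell() // SECTOR_SIZE - 1
        return gpt_copy_is_valid(image, 1), gpt_copy_is_valid(image, last_lba)
```

With one byte altered in the secondary table, a check like this would be expected to return (True, False), which is the case the spec text above says software must repair from the primary.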

Proposing as beta blocker: "When using the guided partitioning flow, the installer must be able to cleanly install to a disk with a valid ms-dos or gpt disk label and partition table which contains existing data...Remove existing storage volumes to free up space, at the user's direction...Reject or disallow invalid disk and volume configurations without crashing." The custom partitioning criteria additionally require the ability for the user to "Remove existing storage volumes; Assign mount points to existing storage volumes."

Comment 6 Chris Murphy 2013-10-18 21:45:38 UTC
Sorry that's page 106 of the June 27 2012 v 2.3.1 Errata C UEFI spec.

Comment 7 Tim Flink 2013-10-21 16:13:05 UTC
(In reply to Chris Murphy from comment #4)
> tflink thinks this may be a duplicate of bug 1019541. I'm reproducing this
> in qemu/kvm however so it's not limited to UEFI.

Not a complete duplicate, no. 1019541 has 2 parts - the part where the corrupt secondary gpt table was showing the disk as empty and the part where the empty disk was triggering a reset of the selected disks.

This is the same as the first part of that where the disks showed up blank. It's probably better to keep this separate from the selected disks being reset.

Comment 8 Brian Lane 2013-10-21 16:25:08 UTC
Here's an updates.img with the fix for 1019541 against TC5, please give it a try:

http://bcl.fedorapeople.org/updates/1020974.img

Comment 9 Mike Ruckman 2013-10-21 16:31:22 UTC
Discussed this in 2013-10-21 Blocker Review Meeting [1]. It was agreed that this is a very tough call, and if it turns out that the 1019541 fix fixes it, we can happily dodge making that tough call. So for now the evaluation is postponed until we check whether that fix also fixes this bug.

[1] http://meetbot.fedoraproject.org/fedora-blocker-review/2013-10-21/

Comment 10 Chris Murphy 2013-10-21 16:53:21 UTC
/tmp has a 13625-byte updates.img file in it; this bug is still present.

Comment 11 Mike Ruckman 2013-10-21 17:57:25 UTC
Discussed this again in 2013-10-21 Blocker Review Meeting [1]. Accepted as a freeze exception issue, as it is a severe and potentially dangerous misbehaviour of the installer in a fairly unusual condition. Agreement could not be reached on blocker status; it will be discussed again at the next meeting.

[1] http://meetbot.fedoraproject.org/fedora-blocker-review/2013-10-21/

Comment 12 Chris Murphy 2013-10-21 18:00:57 UTC
Created attachment 814720 [details]
anaconda.log c10

Applied:
http://bcl.fedorapeople.org/updates/1020974.img

Comment 13 Chris Murphy 2013-10-21 18:01:17 UTC
Created attachment 814721 [details]
program.log c10

Applied:
http://bcl.fedorapeople.org/updates/1020974.img

Comment 14 Chris Murphy 2013-10-21 18:01:32 UTC
Created attachment 814722 [details]
storage.log c10

Applied:
http://bcl.fedorapeople.org/updates/1020974.img

Comment 15 Chris Murphy 2013-10-21 18:01:53 UTC
Created attachment 814723 [details]
storage.state

Applied:
http://bcl.fedorapeople.org/updates/1020974.img

Comment 16 Chris Murphy 2013-10-21 18:03:38 UTC
Created attachment 814725 [details]
storage.state c10

Applied:
http://bcl.fedorapeople.org/updates/1020974.img

Comment 17 Chris Murphy 2013-10-22 17:15:01 UTC
Reproducible in anaconda-20.25.2-1, and in shipping versions since at least Fedora 17. What's strange is that the storage.log files show the GPT on disk being recognized, along with the partitions, their volume names, formats, UUIDs, etc., but all of that is then discarded in favor of creating a new msdos disklabel.

Additional testing: when corrupting only the primary GPT table, anaconda also considers the disk blank.

On the one hand, this is a really nasty bug. The entire point of having two GPTs on disk is completely obviated by a bug like this.

On the other hand, it looks like a textbook example of a "conditional blocker" as described in the Blocker Bug FAQ. The bug does not cause this rare condition to occur; the bug is old, and has only just been found, after feature freeze. A proposal to document the bug for the beta release and simultaneously accept it as a final blocker seems reasonable. I think the point of a conditional blocker is to recognize a nasty bug and, without excusing it, allow devs to sort out when to fix it between now and final release, rather than enforcing the written criteria as abruptly as this bug was discovered.

So I'm fine with freeze exception, and I won't vote on a proposal to block beta release if it comes up at the next blocker review. I would vote on escalating it now to final blocker.

Comment 18 Adam Williamson 2013-10-23 17:18:04 UTC
Discussed at 2013-10-23 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-10-23/f20-blocker-review.2013-10-23-16.00.log.txt . After some reflection, there was a consensus that this does not need to block Beta release, but should be proposed for Final and will likely be accepted.

Comment 19 Adam Williamson 2013-10-23 17:24:12 UTC
In fact, we went ahead and accepted it as a final blocker already, in the above meeting.

Comment 20 Chris Murphy 2013-10-23 22:38:02 UTC
Unsurprisingly, this bug also affects anaconda rescue mode: if the primary GPT table is zeroed, anaconda rescue says no Linux partitions can be found.

In conjunction with grub2 bug 1022743, we get a probably rare but eye-opening situation where we can't boot or rescue the system even though it has a valid backup GPT available. (Just stating facts, not trying to beat a dead horse here.)

Comment 21 Adam Williamson 2013-11-20 19:30:19 UTC
Status reviewed at 2013-11-20 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-11-20/f20-blocker-review.2013-11-20-17.00.log.txt . This bug seems to be clearly defined and reproducible and is now just waiting in the queue to be fixed by anaconda devs.

Comment 22 Brian Lane 2013-12-04 01:49:18 UTC
Here is a possible fix for this:

http://bcl.fedorapeople.org/updates/1020974.img

Disks with corrupt GPT labels will show up as 'Invalid Disk Label'. You can delete it and re-partition it if you want. It will *not* be fixed by Anaconda. For that you will need to use other tools.

Comment 23 Adam Williamson 2013-12-04 02:08:00 UTC
Note, the fix is somewhat complex and could in theory break something in some other case. If you have a bit of time, cmurf, it'd be great if you could throw your exotic-storage-configuration kung fu at it and see if you can break it.

Comment 24 Chris Murphy 2013-12-04 02:48:00 UTC
Is blivet deferring to parted to determine whether a gpt is corrupt? That narrows the test parameters a ton.

Comment 25 Chris Murphy 2013-12-04 06:36:37 UTC
Works for me, no regressions.

This is not exhaustive, but I corrupted a gpt 4 different ways affecting the primary header, primary table, secondary header, and secondary table. Each is recognized by the installer as an invalid disk label. Before corrupting, and after repairing with gdisk (as parted lacks a specific repair or rewrite feature) the installer recognizes the gpt as valid and modifiable.
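
For reference, the four regions named above can be located with simple arithmetic. The sketch below is illustrative only: the function name gpt_regions is hypothetical, and it assumes the conventional layout (primary table starting at LBA 2, backup table ending just before the backup header in the last sector, 128 entries of 128 bytes, 512-byte sectors). The spec permits other layouts, so real tools read these locations from the headers instead.

```python
def gpt_regions(total_sectors, entry_count=128, entry_size=128, sector_size=512):
    """Return the inclusive (first_lba, last_lba) range of each GPT structure.

    Assumes the conventional on-disk layout described in the lead-in.
    """
    # Round the entry array size up to whole sectors.
    table_sectors = (entry_count * entry_size + sector_size - 1) // sector_size
    last_lba = total_sectors - 1
    return {
        "primary header": (1, 1),
        "primary table": (2, 1 + table_sectors),
        "secondary table": (last_lba - table_sectors, last_lba - 1),
        "secondary header": (last_lba, last_lba),
    }


if __name__ == "__main__":
    # 167772160 is the sector count parted and gdisk report for /dev/vda above.
    for name, (first, last) in gpt_regions(167772160).items():
        print(f"{name}: LBA {first} to {last}")
```

For the 167772160-sector disk in this report, the primary table ends at LBA 33 and the secondary table starts at LBA 167772127. That agrees with gdisk's "First usable sector is 34, last usable sector is 167772126", and 167772127 is the sector referred to as LBA -33 in the original steps.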

Uncertain what the lvm related changes in the patch are about or how to test if there are regressions in that area.

Comment 26 Brian Lane 2013-12-04 15:54:38 UTC
(In reply to Chris Murphy from comment #24)
> Is blivet deferring to parted to determine whether a gpt is corrupt? That
> narrows the test parameters a ton.

parted (via pyparted) is what raises the error when the label is blank or corrupt or unknown.

Comment 27 Brian Lane 2013-12-04 19:20:52 UTC
Here's a slightly modified patch. You could crash the previous one by using custom to delete the invalid disk and then using the custom autopart:

http://bcl.fedorapeople.org/updates/1020974.2.img

Comment 28 Fedora Update System 2013-12-05 01:15:20 UTC
anaconda-20.25.14-1.fc20, python-blivet-0.23.8-1.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/python-blivet-0.23.8-1.fc20,anaconda-20.25.14-1.fc20

Comment 29 Fedora Update System 2013-12-05 21:25:07 UTC
Package anaconda-20.25.14-1.fc20, python-blivet-0.23.8-1.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing anaconda-20.25.14-1.fc20 python-blivet-0.23.8-1.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-22800/python-blivet-0.23.8-1.fc20,anaconda-20.25.14-1.fc20
then log in and leave karma (feedback).

Comment 30 Adam Williamson 2013-12-05 22:48:09 UTC
cmurf: when you have a sec could you confirm again with TC5, which includes the update? thanks!

Comment 31 Fedora Update System 2013-12-10 06:53:46 UTC
anaconda-20.25.14-1.fc20, python-blivet-0.23.8-1.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 32 Jason Knight 2014-07-24 17:41:25 UTC
With anaconda-20.25.15-1 I am still having this problem: Disks show as "Invalid disk label" with a valid primary GPT but invalid backup GPT regions. Fixing them by rewriting with gdisk fixes the problems.

Comment 33 Chris Murphy 2014-07-24 17:51:28 UTC
(In reply to Jason Knight from comment #32)
That's the expected behavior. If either GPT header or table is corrupt, then the installer considers the whole disk invalid, because it's deciding not to fix the problem.

