Red Hat Bugzilla – Bug 154841
Disk druid overwrites whole device LVM PV's with empty partition table
Last modified: 2007-11-30 17:07:17 EST
Description of problem:
I have a system with two attached RAIDs at /dev/sda and /dev/sdb
and a plain IDE disk at /dev/hda
I had RH7.3 installed on /dev/hda1 and had made
sda and sdb into two separate volume groups with:
vgcreate vg1 /dev/sda
vgcreate vg2 /dev/sdb
with several logical volumes on each
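For reference, a whole-disk PV setup like the one described would look roughly like this (device names match the report; the logical volume names and sizes are illustrative, and all of these commands are destructive):

```shell
# Label each whole disk as an LVM physical volume -- note there is
# no partition table on the device at all after this.
pvcreate /dev/sda
pvcreate /dev/sdb

# One volume group per disk, as described above.
vgcreate vg1 /dev/sda
vgcreate vg2 /dev/sdb

# Some logical volumes on each (names and sizes are made up here).
lvcreate -L 100G -n data1 vg1
lvcreate -L 100G -n data2 vg2
```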
Today I upgraded to RHEL4 and made sure when Disk Druid ran to reformat
only the / partition on /dev/hda1. After the upgrade, it could not find
any volume groups:
# vgscan --verbose
Wiping cache of LVM-capable devices
Wiping internal cache
Reading all physical volumes. This may take a while...
Finding all volume groups
No volume groups found
Doing a 'fdisk /dev/sda' and 'fdisk /dev/sdb' showed that they now had empty
partition tables written on them. The only thing that I can see could have
done this is Disk Druid, which must be programmed to write partition tables to
any disk it sees that doesn't have one without asking the user.
Luckily I had backed up /etc from the old RH7.3 partition and was able to
use vgcfgrestore to recover. But this could have made me lose terabytes of data.
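For anyone hitting the same failure, recovery from a saved LVM metadata backup goes roughly like this (a sketch for LVM2; the backup path is the default location, the UUID is a placeholder to be taken from the backup file, and the commands rewrite on-disk metadata):

```shell
# Re-create the PV with its original UUID, taken from the metadata
# backup (the UUID below is a placeholder, not a real value).
pvcreate --uuid "PLACEHOLDER-UUID-FROM-BACKUP" \
         --restorefile /etc/lvm/backup/vg1 /dev/sda

# Restore the volume group metadata from the backup, then activate.
vgcfgrestore -f /etc/lvm/backup/vg1 vg1
vgchange -ay vg1
```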
Steps to Reproduce:
1. Have LVM PVs on whole disks (e.g. /dev/sda)
2. Install RHEL4
3. Those whole disks now are corrupted due to empty partition tables being
written to them (by Disk Druid most likely)
Actual Results: I lost my LVM volume groups
Expected Results: Disk Druid should have left those disks alone
Similar issue with installation on a system with multiple paths to the same
disks, hence showing up as sda, sdb, sdc and sdd. All partition tables were
blank. I only changed sda (partitioning it and setting up the LVM the way I
wanted it). The installation appeared fine until I rebooted it and sda had an
empty partition table.
It looks like this is what happens:
* choose the partition scheme
* partitions for sda get set up as requested.
* partitions for sdb, c and d get reinitialised as shown, but not requested.
Since they are all paths to the same disk, goodbye partition table.
I was fortunately paying attention to allocation for partitions and was able to
fdisk the partition table back to what it should have been and recover the
installation, but if you don't change a partition then the installer shouldn't
force any partition scheme on it.
We do not support upgrades from RHL to RHEL.
See the last sentence in the second paragraph of the following FAQ:
I was doing a fresh install, not an upgrade. I have never done anything
but fresh installs.
You're missing a very CRITICAL point here. The RHEL install program
is trashing real data on disks it should not be touching! It will happen
no matter what might already be installed on the machine. It has nothing
to do with fresh install vs. upgrade.
This IS A BUG! An LVM PV on a whole disk (i.e. sda instead of sda1)
gets trashed by anaconda
To be more specific, what I suspect is happening is that Disk Druid is writing
empty partition tables on to any disk it finds on the computer that does not
already have a partition table. This is trashing the LVM PV. Disk Druid should
not write to a disk in any way unless the user explicitly requests an operation
in the Disk Druid GUI for that disk.
This is really an unsupported setup -- the PV for lvm shouldn't be the whole
disk, but rather a partition of type 8e that spans it.
We may add a test for this case, but at best it'll just exclude the device from
all use -- probably not completely optimal, but this is really a case where the
LVM was set up incorrectly.
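A sketch of the layout the maintainer is describing, with the PV on a type-8e partition instead of the bare device (the sfdisk one-liner is the classic scripted idiom; commands are illustrative and destructive):

```shell
# Create a single partition spanning the whole disk and mark it as
# Linux LVM (partition type 8e); empty start/size fields mean
# "use the defaults", i.e. the entire disk.
echo ',,8e' | sfdisk /dev/sda

# Put the PV on the partition rather than the whole disk, so the
# device carries a valid partition table the installer recognises.
pvcreate /dev/sda1
vgcreate vg1 /dev/sda1
```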
I disagree with your opinion that LVM was set up incorrectly. If making whole
disk PVs was incorrect, then pvcreate should not allow it. Hell, the LVM
HOWTO even has examples doing whole disk PVs.
And what do you mean "exclude the device from all use"?
There is a fundamental issue here that really has nothing to do with LVM
specifically. Disk Druid should not be modifying any disk it finds in any way
without express consent from the user. Other users in the world may be using
whole disks (i.e. sans any partition table) for other applications (e.g. raw
data dump from some extremely fast data acquisition device).
Perhaps the summary should be "Disk druid overwrites whole device partition
tables which have not had changes requested".
In my case, the setup is supported - I partitioned the first logical view of
the physical disk (128M for /boot (ext3 on native Linux partition), the rest
for LVM), set up LVM on the relevant partition, and left the other three (SAN
multipathed) copies untouched. My guess is it partitioned the first copy as
requested, then partitioned the remaining 3 as it saw them - blank, thus wiping
the original setup 3 times.
As it had cached the partition table for sda after performing the operation,
the install was only too happy to install onto the disk as it thought it to be,
but rebooted to a blank partition table - no /boot, and no LVM. My workaround
was to boot in rescue mode, partition /dev/sda exactly as it was,
rewrite the partition table and reboot. I got lucky - it worked. But what
other unrequested messing around is done by Disk Druid in an enterprise
environment?
It seems the simplest solution is to record which partition tables the user
actually modifies as the GUI goes - keep them in a list or whatever - and have
Disk Druid write only those at the stage the operations are committed.
The fact that the low-level tools make it possible to configure your system in a
particular way is in no way related to whether that configuration is a
supported configuration for Red Hat Enterprise Linux.
That being said, I can't reproduce the behavior you're claiming happens. If I
make an lvm PV on /dev/sdb, and then boot into the RHEL 4 installer, it gives me a window
claiming that /dev/sdb doesn't have a valid partition table. That window offers
the choice of initializing the disk or ignoring it.
If you tell it to initialize the disk, it writes a partition table. Is that
what you're doing?
More like, you tell it to initialise sda and sdb, and don't tell it to touch
sdc and sdd. Then it goes ahead and writes all partition tables as it had
displayed, _even those you do not request to be changed_. So sda and sdb are
changed as requested, and if there was data in the partition table on sdc and
sdd which it didn't recognise, well bad luck, it's just been reinitialised.
I don't believe it has much to do with LVM at all, just that LVM metadata is a
victim of the reinitialisation which has not been requested.
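One crude but non-destructive way to confirm that sda through sdd really are paths to the same disk is to compare checksums of the first bit of each device (device names as in the report; requires read access to the raw devices):

```shell
# Identical checksums across sda..sdd strongly suggest they are
# multiple paths to one physical disk, not four separate disks.
for d in sda sdb sdc sdd; do
    printf '%s  ' "$d"
    dd if=/dev/$d bs=1M count=1 2>/dev/null | md5sum
done
```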
We don't recall seeing any question about initializing the disk or not.
It is within the realm of possibility we somehow "spaced" past that.
One thing is we were doing a kickstart install, but with no directives
about disks except for 'zerombr yes'. The kickstart config had no 'clearpart'
and no 'part'. This has always made the kickstart go interactive for
the Disk Druid part and then continue on automatically with the rest.
From the documentation:
If zerombr is specified, and yes is its sole argument, any invalid
partition tables found on disks are initialized. This will destroy
all of the contents of disks with invalid partition tables. This
command should be in the following format:
zerombr yes
No other format is effective.
So you've put in the kickstart config that you want it to clear partition tables
that don't make sense, and at the same time you're expecting it'll leave data
untouched on a disk which isn't partitioned. These two goals are contradictory.
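A kickstart that must not touch the other disks would drop zerombr and pin the destructive directives to specific drives instead; a hedged sketch using directives from the kickstart documentation (drive names are illustrative):

```
# Do NOT use 'zerombr yes' when other disks carry partitionless data.
# Restrict clearing and partitioning to the install disk only:
clearpart --drives=hda --all
part /boot --fstype ext3 --size=128 --ondisk=hda
part / --fstype ext3 --size=8192 --ondisk=hda
```

The kickstart `ignoredisk` directive, where available, can additionally tell the installer to leave the listed drives entirely alone.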
That seems to be it, yes. It didn't dawn on me that in dealing with an MBR,
anything other than the boot disk would be involved.
I take it that zerombr is really unnecessary anyway if one will be doing a:
or does that also affect every disk and not just the boot disk?