Bug 154841 - Disk druid overwrites whole device LVM PVs with empty partition table
Summary: Disk druid overwrites whole device LVM PVs with empty partition table
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: anaconda
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Anaconda Maintenance Team
QA Contact: Mike McLean
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2005-04-14 14:59 UTC by Paul Raines
Modified: 2007-11-30 22:07 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-05-13 20:15:51 UTC
Target Upstream Version:
Embargoed:



Description Paul Raines 2005-04-14 14:59:03 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050324 Firefox/1.0.2 Red Hat/1.0.2-1.4.1.centos4

Description of problem:
I have a system with two attached RAIDs at /dev/sda and /dev/sdb
and a plain IDE disk at /dev/hda.

I had RH7.3 installed on /dev/hda1 and had made
sda and sdb into two separate volume groups with:

   pvcreate /dev/sda
   vgcreate vg1 /dev/sda
   pvcreate /dev/sdb
   vgcreate vg2 /dev/sdb

with several logical volumes on each
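
For reference, the logical volumes on those whole-disk PVs would have been
created with something like the following (LV names and sizes here are
illustrative, not the actual layout):

   lvcreate -L 200G -n data1 vg1
   lvcreate -L 300G -n data2 vg1
   lvcreate -L 500G -n scratch vg2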

Today I upgraded to RHEL4 and made sure when Disk Druid ran to reformat 
only the / partition on /dev/hda1.  After the upgrade, it could not find 
any volume groups:

# vgscan --verbose
    Wiping cache of LVM-capable devices
    Wiping internal cache
  Reading all physical volumes.  This may take a while...
    Finding all volume groups
  No volume groups found

Doing an 'fdisk /dev/sda' and 'fdisk /dev/sdb' showed that they now had empty
partition tables written on them.  The only thing I can see that could have
done this is Disk Druid, which must be programmed to write partition tables to
any disk it sees that doesn't have one, without asking the user.

Luckily I had backed up /etc from the old RH7.3 partition and was able to
use vgcfgrestore to recover.  But this could have made me lose terabytes of data.
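
For reference, recovering whole-disk PVs with vgcfgrestore looks roughly like
this under LVM2 (the backup path, VG name, and UUID below are placeholders;
the exact procedure with an LVM1-era backup from RH7.3 may have differed):

   # recreate the PV label on the wiped disk, reusing its original UUID
   pvcreate --uuid "<original-pv-uuid>" --restorefile /etc/lvm/backup/vg1 /dev/sda
   # restore the VG metadata from the backup, then reactivate and rescan
   vgcfgrestore -f /etc/lvm/backup/vg1 vg1
   vgchange -ay vg1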





Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Have LVM PVs on whole disks (e.g. /dev/sda)
2. Install RHEL4
3. Those whole disks are now corrupted because empty partition tables were
written to them (most likely by Disk Druid)
  

Actual Results:  I lost my LVM volume groups

Expected Results:  Disk Druid should have left those disks alone

Additional info:

Comment 6 Mitchell Brandsma 2005-05-03 02:32:50 UTC
Similar issue with installation on a system with multiple paths to the same 
disks, hence showing up as sda, sdb, sdc and sdd.  All partition tables were 
blank.  I only changed sda (partitioning it and setting up the LVM the way I 
wanted it).  The installation appeared fine until I rebooted it and sda had an 
empty partition table.

It looks like this is what happens:
* choose the partition scheme
* partitions for sda get set up as requested.
* partitions for sdb, sdc and sdd get reinitialised as shown, even though this
was not requested.  Since they are all paths to the same disk, goodbye
partition table.

I was fortunately paying attention to the partition allocations and was able to
fdisk the partition table back to what it should have been and recover the
installation.  But if you don't change a partition, the installer shouldn't
force any partition scheme on it.

Comment 7 Suzanne Hillman 2005-05-03 15:50:15 UTC
We do not support upgrades from RHL to RHEL.

See the last sentence in the second paragraph of the following FAQ:
http://www.redhat.com/software/rhel/faq/#17

Comment 8 Paul Raines 2005-05-03 16:07:13 UTC
I was doing a fresh install, not an upgrade.  I have never done anything
but fresh installs.

You're missing a very CRITICAL point here.  The RHEL install program
is trashing real data on disks it should not be touching!  It will happen
no matter what might already be installed on the machine.  It has nothing
to do with fresh install vs. upgrade.

This IS A BUG!  An LVM PV on a whole disk (i.e. sda instead of sda1)
gets trashed by anaconda.

Comment 9 Paul Raines 2005-05-03 16:24:29 UTC
To be more specific, what I suspect is happening is that Disk Druid is writing
empty partition tables on to any disk it finds on the computer that does not
already have a partition table.  This is trashing the LVM PV.  Disk Druid should
not write to a disk in any way unless the user explicitly requests an operation
in the Disk Druid GUI for that disk.

Comment 10 Peter Jones 2005-05-03 17:50:53 UTC
This is really an unsupported setup -- the PV for lvm shouldn't be the whole
disk, but rather a partition of type 8e that spans it.

We may add a test for this case, but at best it'll just exclude the device from
all use -- probably not completely optimal, but this is really a case where the
LVM was set up incorrectly.
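
For reference, the supported layout described above -- a single partition of
type 8e (Linux LVM) spanning the disk, with the PV on that partition -- would
look roughly like this (device and VG names are illustrative):

   # one primary partition spanning the disk, partition type 8e (Linux LVM)
   echo ',,8e' | sfdisk /dev/sda
   # put the PV on the partition rather than on the whole disk
   pvcreate /dev/sda1
   vgcreate vg1 /dev/sda1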

Comment 11 Paul Raines 2005-05-03 18:54:33 UTC
I disagree with your opinion that LVM was set up incorrectly.  If making whole-disk
PVs were incorrect, then pvcreate should not allow it.  Hell, the LVM
HOWTO even has examples doing whole-disk PVs.

And what do you mean "exclude the device from all use"?

There is a fundamental issue here that really has nothing to do with LVM
specifically.  Disk Druid should not be modifying any disk it finds in any way
without express consent from the user.  Other users in the world may be using
whole disks (i.e. sans any partition table) for other applications (e.g. a raw
data dump from some extremely fast data acquisition device).



Comment 12 Mitchell Brandsma 2005-05-04 00:41:28 UTC
Perhaps the summary should be "Disk druid overwrites whole device partition 
tables which have not had changes requested".

In my case, the setup is supported - I partitioned the first logical view of 
the physical disk (128M for /boot (ext3 on native Linux partition), the rest 
for LVM), set up LVM on the relevant partition, and left the other three (SAN 
multipathed) copies untouched.  My guess is it partitioned the first copy as 
requested, then partitioned the remaining 3 as it saw them - blank, thus wiping 
the original setup 3 times.

As it had cached the partition table for sda after performing the operation,
the install was only too happy to install onto the disk as it thought it to be,
but it rebooted to a blank partition table - no /boot, and no LVM.  My
workaround was to boot in rescue mode, partition /dev/sda exactly as it was,
rewrite the partition table and reboot.  I got lucky - it worked.  But what
other unrequested messing around is done by Disk Druid in an enterprise
architecture installation?

It seems the simplest solution is to track which partition tables are modified
as the user works through the GUI - record them in a list or whatever - and
have Disk Druid update only those at the stage those operations are done.

Comment 13 Peter Jones 2005-05-05 22:11:07 UTC
The fact that the low-level tools make it possible to configure your system in a
particular way has no bearing on whether that configuration is supported for
Red Hat Enterprise Linux.

That being said, I can't reproduce the behavior you're claiming happens.  If I
make an LVM PV on /dev/sdb and then boot into the RHEL 4 installer, it gives me
a window claiming that /dev/sdb doesn't have a valid partition table.  That
window offers the choice of initializing the disk or ignoring it.

If you tell it to initialize the disk, it writes a partition table.  Is that
what you're doing?

Comment 14 Mitchell Brandsma 2005-05-06 00:45:58 UTC
More like, you tell it to initialise sda and sdb, and don't tell it to touch
sdc and sdd.  Then it goes ahead and writes all partition tables as it had
displayed, _even those you did not request to be changed_.  So sda and sdb are
changed as requested, and if there was data in the partition table on sdc and
sdd which it didn't recognise, well, bad luck - it's just been reinitialised.

I don't believe it has much to do with LVM at all, just that LVM metadata is a 
victim of the reinitialisation which has not been requested.

Comment 15 Paul Raines 2005-05-06 16:09:06 UTC
We don't recall seeing any question about initializing the disk or not.
It is within the realm of possibility that we somehow "spaced" past it.

One thing is that we were doing a kickstart install, but with no directives
about disks except for 'zerombr yes'.  The kickstart config had no 'clearpart'
and no 'part'.  This has always made the kickstart go interactive for
the Disk Druid part and then continue on automatically with the rest.
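
A sketch of the kind of kickstart fragment being described (illustrative, not
the actual file): zerombr is present, but with no clearpart or part directives
the partitioning step drops to interactive Disk Druid:

   install
   zerombr yes
   bootloader --location=mbr
   # no 'clearpart' and no 'part' lines -- Disk Druid runs interactively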

Comment 16 Peter Jones 2005-05-13 20:15:51 UTC
From the documentation:

   zerombr (optional)

       If zerombr is specified, and yes is its sole argument, any invalid
       partition tables found on disks are initialized. This will destroy
       all of the contents of disks with invalid partition tables. This
       command should be in the following format:

       zerombr yes

       No other format is effective.

So you've put in the kickstart config that you want it to clear partition tables
that don't make sense, and at the same time you're expecting it to leave data
untouched on a disk which isn't partitioned.  These two goals are totally
incompatible.
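
Put differently, when unpartitioned data disks are attached, the zerombr
directive would have to be dropped -- a sketch of the adjusted fragment
(illustrative, not from this bug):

   install
   bootloader --location=mbr
   # zerombr omitted: disks without a valid partition table are not silently
   # initialized; the installer asks about initializing them instead (comment 13)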

Comment 17 Paul Raines 2005-05-14 20:28:35 UTC
That seems to be it, yes.  It didn't dawn on me that in dealing with an MBR,
anything other than the boot disk would be involved.

I take it that zerombr is really unnecessary anyway if one will be doing a:

bootloader --location=mbr

or does that also affect every disk and not just the boot disk?

