42879 – Installer causes HD corruption when partitioning the system

Summary: Installer causes HD corruption when partitioning the system

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	installer
Sub Component:
Version:	7.1
Hardware:	i386
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Brent Fox
QA Contact:	Brock Organ
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2001-05-30 16:32 UTC by Greg Knight
Modified:	2005-10-31 22:00 UTC (History)
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2001-06-26 21:17:22 UTC
Embargoed:

Attachments
Disk layout of machine in RH 7 prior to upgrade (2.40 KB, text/plain) 2001-06-06 02:37 UTC, Greg Knight

Description Greg Knight 2001-05-30 16:32:16 UTC

Description of problem:
When attempting to upgrade an existing 7.0 system to 7.1, the installer 
gets as far as the questions before auto or manual partitioning and then 
starts complaining that 2 out of 3 drives have partitions that do not begin 
or end on cylinder boundaries. It suggests passing the HD parameters on the 
LILO command line, but that does not seem to work.
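
For readers unfamiliar with the cylinder-boundary complaint, here is a minimal sketch of the arithmetic behind it. It is not the installer's code. The function name and the geometry values in the usage note are hypothetical, chosen only to show that the same partition can look aligned under one assumed geometry and misaligned under another. Real partitioning tools usually also tolerate a first partition that starts one track into the disk, which this sketch does not model.

```python
def cylinder_alignment(start_lba, size_sectors, heads, sectors_per_track):
    """Report whether a partition begins and ends on a cylinder boundary.

    start_lba and size_sectors come from the partition table entry;
    heads and sectors_per_track are whatever geometry the tool assumes.
    """
    sectors_per_cylinder = heads * sectors_per_track
    end_lba = start_lba + size_sectors  # first sector after the partition
    return {
        "starts_on_boundary": start_lba % sectors_per_cylinder == 0,
        "ends_on_boundary": end_lba % sectors_per_cylinder == 0,
    }
```

With a hypothetical partition starting at sector 16065 and spanning 160650 sectors, `cylinder_alignment(16065, 160650, 255, 63)` reports both ends aligned, while `cylinder_alignment(16065, 160650, 16, 63)` reports neither. Nothing on disk differs between the two calls; only the assumed geometry does.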

How reproducible:
Always

Steps to Reproduce:
1. Boot Linux 7.1 CD1
2. Run Install as text

Actual Results:  Causes data corruption on 2 out of 3 HDs

Expected Results:  Completed install / upgrade of existing 7.0 machine

Additional info:

During install, it suggests passing the HD parameters on the LILO command 
line, but that does not seem to work.

If I delete all partitions using fdisk through the setup program as well 
as in VC2 and repartition manually, it seems to be OK, but if I go back into 
fdisk on the drive I just finished with, the partition table is all messed 
up.

If upgrading from a fresh 7.0 install, the system can read all 3 
drives but then fails after formatting the volumes, when it tries to mount 
any partition that it formatted.

The 2 drives affected by this problem are hda and hde.   hdb seems to not 
be affected.

System is a Cyrix 133 w/ 48 MB of RAM
FIC PA-2002 motherboard
Linksys Ethernet Controller LNE100TX v.4.0 using Tulip drivers
Hard drives are:
/dev/hda	Maxtor 85250D6
/dev/hdb	JTS Corp. CHAMP Model C1300-2AF
/dev/hde	WDC WD600AB-32BVA0

hda and hdb are installed on internal IDE controller
hde is installed on a CMD6xx ATA 100 controller.

Comment 1 Brent Fox 2001-05-31 20:02:05 UTC

Try booting with 'linux noprobe'.  Does this help?

Comment 2 Greg Knight 2001-06-01 18:51:36 UTC

No.   I tried both hda=noprobe hde=noprobe and linux noprobe, and neither 
seems to work right.
  If I use hda and hde noprobe, it starts complaining that I do not have 
enough space or inodes on my / partition for all the packages to be upgraded.
  If I use 'linux noprobe' on a clean install of RH7.0, then it now complains 
about the partitions on hda and hdb.

Is this something caused by the installer?  I have had the 2.4.4 kernel on here 
without a problem.   I was just hoping to get to a regular distro so I didn't 
have to compile the kernel to try and get my NIC working or something like that.

Greg

Comment 3 Brent Fox 2001-06-04 15:45:08 UTC

Can you look on VC3 and VC4 and see if there are any kernel error messages about
reading the drives?

Comment 4 Greg Knight 2001-06-06 02:36:05 UTC

Took a look and the only thing that I saw was a mention on VC4 that there 
were too many inodes.   I saved the exact message to a drive to retrieve on 
reboot but alas the system is corrupted again.    This time when it reboots on 
RH 7.0 / 2.4.4 kernel and attempts to mount the filesystems, it states 
that '/home: Corruption found in superblock (inodes_per_group = 1850786)'. 
This is one of the many filesystems that fail e2fsck, so it drops me to 
maintenance mode.   If I try to run e2fsck on the drive manually, it reports 
'group descriptors look bad ... trying backup blocks. Bad magic number 
in superblock while trying to open /dev/hda5'.
  Could this be related to the large drives?  Extended partitions?   Also, I 
noticed in one of the VC screens that it ran insmod on what appeared to be an 
ext3 driver, as if for a new filesystem.  Could this be causing the problem?
  Any help would be appreciated because it gets old having to reload RH7.0 from 
scratch.

I am attaching a file with some 7.0 drive information from before the upgrade 
occurred.
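
The two e2fsck complaints quoted above (an implausible inodes_per_group and a bad magic number) both concern fields of the ext2 superblock. As a rough sketch only, and not what e2fsck itself does, the snippet below pulls those two fields out of raw partition bytes. It relies on the standard ext2 layout: the superblock begins 1024 bytes into the partition, s_inodes_per_group is a 32-bit little-endian value at offset 40 within it, and s_magic is a 16-bit value at offset 56 that should equal 0xEF53. The function and constant names are hypothetical.

```python
import struct

EXT2_SUPERBLOCK_OFFSET = 1024  # superblock starts 1024 bytes into the partition
EXT2_MAGIC = 0xEF53            # expected value of s_magic


def read_superblock_fields(raw):
    """Extract s_inodes_per_group and s_magic from raw partition bytes."""
    sb = raw[EXT2_SUPERBLOCK_OFFSET:EXT2_SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 58:
        raise ValueError("buffer too short to contain an ext2 superblock")
    (inodes_per_group,) = struct.unpack_from("<I", sb, 40)
    (magic,) = struct.unpack_from("<H", sb, 56)
    return {
        "inodes_per_group": inodes_per_group,
        "magic": magic,
        "magic_ok": magic == EXT2_MAGIC,
    }
```

Reading the first 2048 bytes of a partition device (as root, read-only) and passing them to `read_superblock_fields` would show whether the primary superblock is still recognizable. If `magic_ok` is false, the other values returned are unlikely to mean anything.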

Comment 5 Greg Knight 2001-06-06 02:37:19 UTC

Created attachment 20393 [details]
Disk layout of machine in RH 7 prior to upgrade

Comment 6 Brent Fox 2001-06-11 16:28:28 UTC

Well, we saw a problem very similar to this in our beta cycles.  The way the
kernel handles disk geometry changed from one beta to another, and we were
seeing this problem with people who had previously used one of the 7.1 betas. 
We have since seen this problem with a few people who never used any of the 7.1
betas, and I'm not sure why it's happening.  
I think if you used the 7.1 installer to create the partitions, then things
would work.  I know that may not be an option if you have data on the drive that
you haven't backed up.

Comment 7 Greg Knight 2001-06-14 02:41:32 UTC

Tried using the 7.1 installer version of fdisk.   I deleted all the partitions 
on hda and recreated them as per the previous attachment.   I had gone on to 
work on hdb and realized I had forgotten to set a partition to a swap type, so I 
re-entered hda using fdisk and got the following messages:
Warning: ignoring extra data in partition table 5
Warning: ignoring extra data in partition table 5
Warning: ignoring extra data in partition table 5
Warning: invalid flag 0x2020 of partition table 5 will be corrected by w(rite)
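
As background on the last warning: to the best of my understanding, the "flag" fdisk prints is the two-byte signature at the end of the 512-byte sector that holds a partition table, which is expected to be the bytes 0x55 0xAA (0xAA55 when read little-endian). The sketch below, with hypothetical function names, shows that check. A value of 0x2020 would mean both bytes were 0x20, the ASCII space character. That may suggest the sector held something other than a partition table, but nothing in this report confirms it.

```python
def partition_table_flag(sector):
    """Return the 16-bit little-endian value stored in bytes 510-511."""
    if len(sector) < 512:
        raise ValueError("expected a full 512-byte sector")
    return sector[510] | (sector[511] << 8)


def has_valid_signature(sector):
    """True when the sector ends with the 0x55 0xAA boot signature."""
    return partition_table_flag(sector) == 0xAA55
```

For example, a sector filled entirely with spaces (`b" " * 512`) yields a flag of 0x2020 and fails the check, while a zeroed sector ending in `b"\x55\xaa"` passes.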

Well, after entering a w to write the partition table, I re-entered the hda 
drive and found the same errors but without the last line.  The partition table 
is all mangled, with the first partition having an id of bf and an indication 
that partition 1 has different physical/logical beginnings (non-Linux?):  
phys=(0,1,63) logical=(0,1,1).   Partitions 2 and 5 are messed up similarly.   
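
The physical/logical mismatch comes from comparing the cylinder/head/sector (CHS) triple stored in the table entry with the triple computed from the entry's starting sector number under the geometry currently assumed for the disk. A minimal sketch of that conversion follows. The function names are hypothetical, and the 255-head, 63-sector geometry in the usage note is an assumption for illustration, not a value taken from this machine.

```python
def chs_to_lba(cylinder, head, sector, heads, sectors_per_track):
    """Convert a CHS triple (sector is 1-based) to a linear sector number."""
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba, heads, sectors_per_track):
    """Convert a linear sector number to a CHS triple under a given geometry."""
    cylinder, remainder = divmod(lba, heads * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return (cylinder, head, sector_index + 1)
```

Under the assumed geometry, a partition starting at linear sector 63 maps to `lba_to_chs(63, 255, 63) == (0, 1, 1)`, matching the logical value reported above, whereas the stored physical triple (0,1,63) would correspond to `chs_to_lba(0, 1, 63, 255, 63) == 125`. The two fields of the entry therefore disagree about where the partition begins, which is consistent with a damaged or mis-written entry but does not by itself identify the cause.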

Is there any other way that you would suggest to 'use the 7.1 installer' to 
partition the drives?   The data that was on these drives was long gone after 
the first install.   Thank goodness I can hopefully recover most of it.  I would 
like to settle this issue if possible so no one else suffers the same fate, but 
would also like to proceed in getting the system back in a running state.

Thanks in advance, Greg

Comment 8 Brent Fox 2001-06-15 15:08:53 UTC

I don't know what the problem is.  It sounds like there could be hardware
problems with the drive.  From the errors that you are seeing in fdisk, it looks
like there could be bad sectors on the drive.  If the disk can't be partitioned
properly, the installer doesn't have much hope of working.

Comment 9 Greg Knight 2001-06-15 20:14:51 UTC

But if the partitions can be formatted and installed from RH 7.0, then why is 
the 7.1 installer trashing the drives?   The partitions were valid when I first 
entered them and it was only after I modified them with the fdisk that is used 
by the 7.1 installer that the data was erased.    Why will the drives work with 
7.0 just fine but if I install 7.1, they are destroyed?   HDA and HDB are older 
drives but HDE, the 60 gig, is brand new.   And it works fine in 7.0 as long as 
I recompile the kernel to include CMD640 controller support.  
  Everything even works fine under the 2.4.4 kernel...

Any more suggestions?

Greg

Comment 10 Brent Fox 2001-06-19 15:18:28 UTC

I'm running out of ideas.  I can't explain why this would happen, and I haven't
seen it happen on other machines.  My guess is that there's something weird
about the cmd640 driver (or the controller itself).  

If you look at the comments in the header of
/usr/src/linux-2.4/drivers/ide/cmd640.c, the feeling seems to be that the
controller is not quite up to par.

Comment 11 Greg Knight 2001-06-19 21:13:10 UTC

I'm not sure either, because the problem is manifesting itself on hda, which is 
on the internal interface.   I could see the CMD640 as the problem if it were 
the only place with the problem, but the internal IDE interface is where the 
problem is.  hde, on the cmd interface, is still having the problem, but I am 
not sure whether just having the interface in the machine is causing the problem.   
  The funny thing is that the drive works fine with a 2.4.4 kernel and the 
CMD640 drivers compiled in under RH7.0/2.4.4.   It works fine that way.
  Hopefully this will be fixed in 7.2.

Comment 12 Brent Fox 2001-06-26 21:17:17 UTC

The entire partitioning section of the installer is being rewritten, so this bug
should not be a problem in future releases.
