Bug 231330 - RHEL4 U5 beta1 installation failure: anaconda unhandled exception
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: parted
Version: 4.5
Hardware: x86_64 Linux
Priority: medium
Severity: urgent
Assigned To: David Cantrell
QA Contact: Brock Organ
Keywords: Regression

Reported: 2007-03-07 14:25 EST by Nick Dokos
Modified: 2007-11-30 17:07 EST (History)
Doc Type: Bug Fix
Last Closed: 2007-03-15 10:34:56 EDT


Attachments
Anaconda traceback (929 bytes, text/plain)
2007-03-07 14:25 EST, Nick Dokos
Patch to probe_partition_for_geom() for DL585g systems (4.30 KB, patch)
2007-03-13 14:33 EDT, David Cantrell
New parted source RPM (1.45 MB, application/x-rpm)
2007-03-13 14:34 EDT, David Cantrell
New parted rpm for x86_64 (489.05 KB, application/x-rpm)
2007-03-13 14:34 EDT, David Cantrell
New parted-devel rpm for x86_64 (191.46 KB, application/x-rpm)
2007-03-13 14:35 EDT, David Cantrell
New parted-debuginfo rpm for x86_64 (503.53 KB, application/x-rpm)
2007-03-13 14:36 EDT, David Cantrell

Description Nick Dokos 2007-03-07 14:25:40 EST
Description of problem: Trying to install RHEL4U5 beta1 on an HP DL585g2 system
(8 Opteron cores, 32 GB of memory, installing on a 72 GB CCISS drive, no RAID,
booting from a boot.iso and installing over NFS, chose Everything under
packages). During the interactive question-and-answer phase, when partitioning
the drive, I got a lot of assertions:


Assertion (cyl_size <= 255*63) at disk_dos.c:556 in
function probe_partition_for_geom() failed.

Assertion (heads < 256) at disk_dos.c:576 in ...

Assertion ((C*heads + H)*sectors +S == A) at disk_dos.c:582 in ...

I clicked [Ignore] on all of them. On my first pass through the exercise I
tried [Cancel] a few times, but it did not seem to have any effect; on the
second pass I ignored every assertion.

After the interactive part of the installation was over, I got more
assertions, and finally anaconda died with an unhandled exception.
I'll create an attachment with the traceback.
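
For reference, the three assertions above relate a partition's
cylinder/head/sector (CHS) address to its linear sector address A under an
assumed disk geometry. The following is a minimal Python sketch of those three
conditions, not parted's actual code: the function name geometry_violations
and its parameters are hypothetical, and S is assumed here to be a zero-based
sector offset.

```python
MAX_HEADS = 255     # largest head count in the cyl_size bound (255*63)
MAX_SECTORS = 63    # largest sectors-per-track in the same bound


def geometry_violations(c, h, s, lba, heads, sectors):
    """Return the names of the checks that fail for one partition boundary.

    c, h, s  -- CHS address recorded in the partition table (s zero-based,
                an assumption of this sketch)
    lba      -- linear sector address recorded for the same boundary
    heads, sectors -- candidate geometry being tested
    """
    failed = []
    cyl_size = heads * sectors  # sectors per cylinder
    if cyl_size > MAX_HEADS * MAX_SECTORS:
        failed.append("cyl_size <= 255*63")
    if not heads < 256:
        failed.append("heads < 256")
    if (c * heads + h) * sectors + s != lba:
        failed.append("(C*heads + H)*sectors + S == A")
    return failed


if __name__ == "__main__":
    # Illustrative values only: a boundary at the start of cylinder 1.
    print(geometry_violations(1, 0, 0, 16065, 255, 63))
    print(geometry_violations(1, 0, 0, 16065, 255, 32))
```

With a 255-head, 63-sector geometry the illustrative boundary is consistent
and no check fails; testing the same table entry against 255 heads and 32
sectors fails only the third check. If the geometry or the table contents
handed to the checks are wrong, all three can fail together.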

Version-Release number of selected component (if applicable):
anaconda-10.1.1.62

How reproducible:


Steps to Reproduce:
1. Described above
Actual results: Installation failed.


Expected results: Success.


Additional info:
Comment 1 Nick Dokos 2007-03-07 14:25:40 EST
Created attachment 149479
Anaconda traceback
Comment 2 Nick Dokos 2007-03-09 15:47:26 EST
I tried the install with RHEL4.5snap1 and got the exact same failure.

.discinfo says:
1172773896.803713
Red Hat Enterprise Linux 4
x86_64
1,2,3,4,5
RedHat/base
RedHat/RPMS
RedHat/pixmaps
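
The .discinfo contents above can be read mechanically to confirm which tree
was used. This is a small sketch under the assumption that the lines are, in
order, a build timestamp, the release name, the architecture, the disc numbers
and then directory paths; parse_discinfo is a hypothetical helper, not part of
anaconda.

```python
from datetime import datetime, timezone


def parse_discinfo(text):
    """Split .discinfo text into labelled fields (field order is assumed)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return {
        # First line: seconds since the Unix epoch, with a fractional part.
        "built": datetime.fromtimestamp(float(lines[0]), tz=timezone.utc),
        "release": lines[1],
        "arch": lines[2],
        "discs": [int(n) for n in lines[3].split(",")],
        "paths": lines[4:],
    }


if __name__ == "__main__":
    sample = """1172773896.803713
Red Hat Enterprise Linux 4
x86_64
1,2,3,4,5
RedHat/base
RedHat/RPMS
RedHat/pixmaps
"""
    info = parse_discinfo(sample)
    print(info["built"].isoformat(), info["release"], info["arch"])
```

Read this way, the timestamp on the first line corresponds to 1 March 2007
(UTC).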
Comment 3 Nick Dokos 2007-03-12 12:05:17 EDT
Some more information:

o I tried installing on an (older) Opteron blade with a CCISS drive: the
installation succeeded, the system is running fine.

o I copied parted and some of the libraries that it depends on from the above
system to the DL585g2 with a non-RHEL4.5snap1 installation and tried looking at
the disks with it. There was no problem (although I didn't push it very far).

o I copied the kernel from the RHEL4.5snap1 installation on the blade to an
older RHEL4U4 installation on the DL585g2, and rebooted it. There was no problem.

o The RHEL4U4 installation was using a kernel parameter "pci=nommconf", so
I tried adding it to the kernel command line when installing RHEL4.5: no joy -
the assertions still failed and anaconda died.

I was hoping that one of the experiments above would pinpoint the culprit
unambiguously, but it's still a mystery. It seems specific to the DL585g2 at
this point. I'd appreciate any suggestions of how to get past this.
Comment 4 David Lawrence 2007-03-13 13:33:02 EDT
Did this occur with earlier updates of RHEL4 such as U4?
Comment 5 Nick Dokos 2007-03-13 13:36:48 EDT
No - RHEL4U4 installed and works fine.
Comment 6 David Lawrence 2007-03-13 13:38:38 EDT
Adding Regression keyword.
Comment 8 David Cantrell 2007-03-13 14:31:28 EDT
I've patched disk_dos.c with what I think will work, but without having that
particular system to reproduce the problem on and test, I'm going to post my fix
here and ask you to test it.

You will find a patch, a new SRPM for parted on RHEL4U5, and a binary RPM for
this patched parted on x86_64.  Can you test out this build of parted(8) on
RHEL4U5 and see if it solves your problem?  I realize it's probably difficult to
get U5 installed on the target system, but whatever you can do to test this
build on that platform on U5 would help.  I would suggest installing RHEL4U4 and
then doing an upgrade to U5.  Anaconda does not partition in those cases, so you
would be fine.

Let me know if this parted solves the problem or breaks in new and interesting ways.
Comment 9 David Cantrell 2007-03-13 14:33:04 EDT
Created attachment 149968
Patch to probe_partition_for_geom() for DL585g systems
Comment 10 David Cantrell 2007-03-13 14:34:08 EDT
Created attachment 149969
New parted source RPM
Comment 11 David Cantrell 2007-03-13 14:34:53 EDT
Created attachment 149970
New parted rpm for x86_64
Comment 12 David Cantrell 2007-03-13 14:35:48 EDT
Created attachment 149971
New parted-devel rpm for x86_64
Comment 13 David Cantrell 2007-03-13 14:36:53 EDT
Created attachment 149972
New parted-debuginfo rpm for x86_64
Comment 14 Nick Dokos 2007-03-13 18:30:51 EDT
I have not been able to try the patch but I have new information that may make
it unnecessary. I said in comment #5 that RHEL4U4 installed and ran with no
problem: that's true but it's not the whole story. I tried installing it again
on the new disk with the intention of doing an update install to RHEL4.5snap1
and smashed into the same brick wall.

It turns out that we had added more volumes to the RAID array and there seems to
be a threshold: the original installation was done with one volume configured
(no problem there) and I've been trying to install with eight volumes configured
(problems galore here). We deleted six volumes, recreated one and tried to
install RHEL4U4 on that (the third configured volume) - that was successful. I
have not tried RHEL4.5snap1 yet and I have not tried to find exactly where the
threshold is, but I'll do that tomorrow and let you know.

BTW, RHEL5 does not hit this problem at all (most of the deleted volumes had
versions of RHEL5 on them).
Comment 15 Nick Dokos 2007-03-14 11:10:47 EDT
I added a fourth volume to the RAID array and installed RHEL4.5snap1 with no
problem. I have not determined the threshold yet, but it is clear that the
problem is not parted: it just gets bum information. We'll try to interpose
a newer device driver at installation time, once we determine where the failure
threshold is.
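
One way to inspect the information parted is being handed is to decode the
DOS partition table directly and compare each entry's CHS fields with its
linear start sector. The following is a minimal sketch assuming a classic
512-byte DOS/MBR boot sector; decode_chs and decode_mbr are hypothetical
helper names, and nothing here is taken from parted itself.

```python
import struct


def decode_chs(raw):
    """Decode a 3-byte packed CHS field into (cylinder, head, sector)."""
    head = raw[0]
    sector = raw[1] & 0x3F                       # low 6 bits, 1-based
    cylinder = ((raw[1] & 0xC0) << 2) | raw[2]   # 2 high bits + 8 low bits
    return cylinder, head, sector


def decode_mbr(sector0):
    """Return the non-empty primary partition entries of a DOS/MBR sector."""
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        raise ValueError("not a DOS/MBR boot sector")
    entries = []
    for i in range(4):
        raw = sector0[446 + 16 * i: 446 + 16 * (i + 1)]
        ptype = raw[4]
        if ptype == 0:          # type 0 marks an unused slot
            continue
        lba_start, num_sectors = struct.unpack_from("<II", raw, 8)
        entries.append({
            "type": ptype,
            "chs_start": decode_chs(raw[1:4]),
            "chs_end": decode_chs(raw[5:8]),
            "lba_start": lba_start,
            "num_sectors": num_sectors,
        })
    return entries


if __name__ == "__main__":
    # Synthetic boot sector with one Linux (0x83) partition starting at LBA 63.
    mbr = bytearray(512)
    mbr[446:462] = bytes([0x80, 1, 1, 0, 0x83, 254, 0xFF, 0xFF]) + struct.pack("<II", 63, 1000)
    mbr[510:512] = b"\x55\xaa"
    for entry in decode_mbr(bytes(mbr)):
        print(entry)
```

On a live system the first 512 bytes of the disk would be read in binary mode
and passed to decode_mbr; the device path to use is system-specific and is
not given in this report.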
Comment 16 David Cantrell 2007-03-14 11:42:58 EDT
That is good to hear [that it's not parted], but I'm interested to know what is
happening.  Thanks for the feedback.
Comment 17 Nick Dokos 2007-03-14 12:31:05 EDT
We've gone all the way back to eight volumes, step by step (i.e. adding one
volume at a time and installing RHEL4.5snap1 on each newly added volume) and I
*still* don't see the former failure - so there is no "threshold".

The only explanation that we can think of is the following:

o before blowing away (almost) all the volumes, we updated the firmware on
the box and then on the P400 controller. Neither of these solved the problem at
the time.

o we also blew away the one volume that I was trying to install on and recreated
it (with the new firmware in place). That also was unsuccessful.

o but now that we've recreated six of the volumes from scratch with the new
firmware in place, the problem has disappeared.

The situation is deeply unsatisfying but there it is.

For the record, the new firmware on the box says:

A07 (12/02/2006)

and the firmware on the P400 controller says:

v2.08

The previous version of the controller firmware was v1.18.
Comment 18 David Lawrence 2007-03-14 23:07:43 EDT
In your opinion, can this be closed then?
Comment 19 Nick Dokos 2007-03-15 10:22:39 EDT
I think so - it's almost certainly *not* a parted problem,
probably a firmware issue with the P400.
Comment 20 David Lawrence 2007-03-15 10:34:56 EDT
OK, closing then. Please reopen if you acquire additional information about this.
Thanks for the report.
