Bug 231330
Summary: | RHEL4 U5 beta1 installation failure: anaconda unhandled exception | |
---|---|---|---
Product: | Red Hat Enterprise Linux 4 | Reporter: | Nick Dokos <nicholas.dokos>
Component: | parted | Assignee: | David Cantrell <dcantrell>
Status: | CLOSED NOTABUG | QA Contact: | Brock Organ <borgan>
Severity: | urgent | Docs Contact: |
Priority: | medium | |
Version: | 4.5 | CC: | bmarson, dkl, jturner
Target Milestone: | --- | Keywords: | Regression
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2007-03-15 14:34:56 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |

Description
Nick Dokos
2007-03-07 19:25:40 UTC
Created attachment 149479 [details]
Anaconda traceback
I tried the install with RHEL4.5snap1 - I get the exact same failure. .discinfo says:

  1172773896.803713
  Red Hat Enterprise Linux 4
  x86_64
  1,2,3,4,5
  RedHat/base
  RedHat/RPMS
  RedHat/pixmaps

Some more information:

o I tried installing on an (older) Opteron blade with a CCISS drive: the installation succeeded, the system is running fine.
o I copied parted and some of the libraries that it depends on from the above system to the DL585g2 with a non-RHEL4.5snap1 installation and tried looking at the disks with it. There was no problem (although I didn't push it very far).
o I copied the kernel from the RHEL4.5snap1 installation on the blade to an older RHEL4U4 installation on the DL585g2, and rebooted it. There was no problem.
o The RHEL4U4 installation was using a kernel parameter "pci=nommconf", so I tried adding it to the kernel command line when installing RHEL4.5: no joy - the assertions still failed and anaconda died.

I was hoping that one of the experiments above would pinpoint the culprit unambiguously, but it's still a mystery. It seems specific to the DL585g2 at this point. I'd appreciate any suggestions of how to get past this.

Did this occur with earlier updates of RHEL4 such as U4?

No - RHEL4U4 installed and works fine.

Adding Regression keyword.

I've patched disk_dos.c with what I think will work, but without having that particular system to reproduce the problem on and test, I'm going to post my fix here and ask you to test it. You will find a patch, a new SRPM for parted on RHEL4U5, and a binary RPM for this patched parted on x86_64.

Can you test out this build of parted(8) on RHEL4U5 and see if it solves your problem? I realize it's probably difficult to get U5 installed on the target system, but whatever you can do to test this build on that platform on U5 would help. I would suggest installing RHEL4U4 and then doing an upgrade to U5. Anaconda does not partition in those cases, so you would be fine.

Let me know if this parted solves the problem or breaks in new and interesting ways.

Created attachment 149968 [details]
Patch to probe_partition_for_geom() for DL585g systems
Created attachment 149969 [details]
New parted source RPM
Created attachment 149970 [details]
New parted rpm for x86_64
Created attachment 149971 [details]
New parted-devel rpm for x86_64
Created attachment 149972 [details]
New parted-debuginfo rpm for x86_64
I have not been able to try the patch but I have new information that may make it unnecessary. I said in comment #5 that RHEL4U4 installed and ran with no problem: that's true but it's not the whole story. I tried installing it again on the new disk with the intention of doing an update install to RHEL4.5snap1 and smashed into the same brick wall.

It turns out that we had added more volumes to the RAID array and there seems to be a threshold: the original installation was done with one volume configured (no problem there) and I've been trying to install with eight volumes configured (problems galore here). We deleted six volumes, recreated one and tried to install RHEL4U4 on that (the third configured volume) - that was successful. I have not tried RHEL4.5snap1 yet and I have not tried to find exactly where the threshold is, but I'll do that tomorrow and let you know.

BTW, RHEL5 does not hit this problem at all (most of the deleted volumes had versions of RHEL5 on them).

I added a fourth volume to the RAID array and installed RHEL4.5snap1 with no problem. I have not determined the threshold yet, but it is clear that the problem is not parted: it just gets bum information. We'll try to interpose a newer device driver at installation time, once we determine where the failure threshold is.

That is good to hear [that it's not parted], but I'm interested to know what is happening. Thanks for the feedback.

We've gone all the way back to eight volumes, step by step (i.e. adding one volume at a time and installing RHEL4.5snap1 on each newly added volume) and I *still* don't see the former failure - so there is no "threshold". The only explanation that we can think of is the following:

o before blowing away (almost) all the volumes, we updated the firmware on the box and then on the P400 controller. Neither of these solved the problem at the time.
o we also blew away the one volume that I was trying to install on and recreated it (with the new firmware in place). That also was unsuccessful.
o but now that we've recreated six of the volumes from scratch with the new firmware in place, the problem has disappeared.

The situation is deeply unsatisfying but there it is. For the record, the new firmware on the box says:

  A07 (12/02/2006)

and the firmware on the P400 controller says:

  v2.08

The previous version of the controller firmware was v1.18.

In your opinion do you think this can be closed then?

I think so - it's almost certainly *not* a parted problem, probably a firmware issue with the P400.

Ok closing then. Please reopen if you acquire additional information about this. Thanks for the report.