Bug 596517 - RHEL6 Install on ibm-x3950m2-0[12].ovirt.rhts.eng.bos.redhat.com fail for no apparent reason around 8 seconds after anaconda gets network up
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel   
Version: 6.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: James Takahashi (IBM)
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Keywords: Reopened
Depends On:
Blocks:
 
Reported: 2010-05-26 21:04 UTC by Barry Marson
Modified: 2013-01-09 22:38 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-07-28 20:43:20 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---



Description Barry Marson 2010-05-26 21:04:07 UTC
Description of problem: Several attempts have been made to install on 

 ibm-x3950m2-0[12].ovirt.rhts.eng.bos.redhat.com

systems, first through rhts/beaker and then through pure beaker provisioning.  In all cases, right after anaconda brings up the network via NetworkManager, a message comes out saying:

   Retrieving ... and then 

   Looking for installation images on CD device /dev/sr0
   Running anaconda 13.21.45, the Red Hat Enterprise Linux system installer - please wait.
   Finding storage devices

then the machine does a reset.

There is no more data other than what is attached.

I attempted to blacklist a pair of PCI device drivers ... first lpfc, and then ixgbe as well.  In both cases the reset occurred sooner after the network came up.

Barry

Version-Release number of selected component (if applicable):

RHEL6.0-20100512.0
RHEL6.0-20100523.0

How reproducible:
every time

Steps to Reproduce:
1. try it
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 2 Barry Marson 2010-05-26 21:22:16 UTC
Beta 1 fails as well.

Barry

Comment 3 David Cantrell 2010-05-27 18:33:00 UTC
My guess is something is occurring during storage detection that's causing the system to bail.  Based on the messages you included in the initial comment, it sounds like you are able to get in to stage 2 of anaconda.  Before clicking next (or advancing past the welcome screen), can you do the following:

1) ssh in to the system and tail -f /tmp/storage.log
2) ssh in to the system and tail -f /tmp/program.log
3) Advance the installer to the next screen

Hopefully we'll see some error output in the storage.log and/or program.log that points us to the root cause.
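
Because the machine resets within seconds, it may help to stream both logs to files on the observing machine rather than watching them interactively, so that whatever was written before the reset is preserved. The following is only a sketch of the steps above: it assumes ssh access to the installer as root is available, and the helper names remote_tail_cmd and capture_log and the local file names are hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names and local file names are hypothetical.

# Print the command to run on the installer side for one log file.
remote_tail_cmd() {
    printf 'tail -f %s' "$1"
}

# Stream one installer log over ssh into a local file, in the background.
# $1 = installer host, $2 = remote log path, $3 = local output file
capture_log() {
    ssh "root@$1" "$(remote_tail_cmd "$2")" > "$3" 2>&1 &
}

# Example (not run here; HOSTNAME is a placeholder):
#   capture_log HOSTNAME /tmp/storage.log storage.log
#   capture_log HOSTNAME /tmp/program.log program.log
#   wait
```

When the system resets, the ssh sessions drop and the local files keep the last lines each log produced.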

Comment 4 Barry Marson 2010-05-27 19:10:50 UTC
I don't think I get to stage 2.  If I do, there isn't enough time to do anything.  I have about 4-8 seconds to get into the system after a ping starts responding.  After that the system resets.  In other words, not possible.

Btw, I blacklisted the FC HBA driver lpfc to no avail.  In fact whenever I blacklisted anything, the time to reset was shorter (closer to 4 sec).  The only other storage would be the local LSI storage and that's what I need for the OS.

Barry

Comment 5 Barry Marson 2010-05-27 19:41:20 UTC
Latest attempts through beaker with Kickstart metadata = "manual" and vnc added to the Kernel Options install line have shown that when I get the language option at the console, waiting a minute shows it resets all by itself.  So something is going on asynchronously ... module probing?

Barry

Comment 6 David Cantrell 2010-05-27 20:21:34 UTC
You are definitely getting to stage 2.  When you see this message:

Running anaconda 13.21.45, the Red Hat Enterprise Linux system installer - please wait.

You have entered stage 2.

Module loading occurs during stage 1.  Are you running the text mode or graphical interface for the installer?

Comment 7 Barry Marson 2010-05-27 20:58:26 UTC
Well, stage 2 causes a machine reset or whatever in seconds ... See comment #5 for the args.

This is text mode with a request for vnc once it can get started.  But I never even get to select language in manual mode.  It's already in a hardware reset phase.  Again comment #5 says what has been tried.

There's nothing more I can provide that you can't do yourself on these boxes.

   console -M console.lab.bos.redhat.com HOSTNAME

Barry

Comment 8 Chris Lumens 2010-05-28 20:24:37 UTC
I wonder if netconsole (http://lxr.linux.no/#linux+v2.6.34/Documentation/networking/netconsole.txt) might be useful for debugging this?

Comment 9 Chris Lumens 2010-05-28 20:33:03 UTC
Doesn't look like the module gets automatically loaded if you pass the parameter.  We'll have to do that early on in anaconda if we want to make use of it.  Stand by.
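
For reference, a rough sketch of how the netconsole parameter is put together, following the format in the kernel's netconsole.txt: netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]. The helper name netconsole_param, the interface eth0 and the address 192.0.2.10 are illustrative assumptions, not values from this bug. Leaving the source port and source IP empty lets the kernel pick its defaults, and an empty target MAC means broadcast.

```shell
#!/bin/sh
# Sketch only: netconsole_param is a hypothetical helper.

# Build a netconsole= parameter string.
# $1 = local network device, $2 = target (receiver) IP,
# $3 = target UDP port (optional, defaults to 6666)
netconsole_param() {
    printf 'netconsole=@/%s,%s@%s/' "$1" "${3:-6666}" "$2"
}

# Example (not run here; needs root on the machine under test):
#   modprobe netconsole "$(netconsole_param eth0 192.0.2.10)"
# On the receiving machine, listen for the UDP messages with, e.g.:
#   nc -u -l 6666
# (some netcat variants need "-p 6666" instead)
```

Since the module is not loaded automatically inside the installer, this only helps once something loads it explicitly, as noted above.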

Comment 10 David Cantrell 2010-06-18 18:39:11 UTC
Without any additional debugging information, it's hard to determine what is happening.  Given that the failure happens very early, our guess is a kernel failure of some variety (module loading problem, etc).

Comment 13 Dor Laor 2010-07-28 13:01:20 UTC

*** This bug has been marked as a duplicate of bug 607650 ***

Comment 14 Avi Kivity 2010-07-28 16:44:27 UTC
The bug log does not mention kvm anywhere.  Is this in fact a guest install?

Comment 15 Avi Kivity 2010-07-28 16:48:57 UTC
In fact, comment #7 means it isn't kvm for sure.  kvm consoles are through the host, not console.something.

Comment 16 Barry Marson 2010-07-28 17:13:13 UTC
This never ever had anything to do with virt or kvm.  The problem is the attached FC storage.  Something about it makes stage 1 install fail.  It was disconnected and the machine works now, albeit without the storage needed for certain virt testing.

We are trying to get that storage reconnected to find out if there is still an issue.

Barry

Comment 17 Avi Kivity 2010-07-28 17:52:10 UTC
Right, so this isn't a dup of the infamous #607650 as comment #13 suggests.

Comment 19 Barry Marson 2010-07-28 20:36:28 UTC
I have verified that with the FC storage attached, RHEL6.0-20100722.0 installs.  So my issue is resolved.

Barry

Comment 20 Peter Bogdanovic 2010-07-28 20:43:20 UTC
Based on Barry's comment I am going to close this bug.

