Bug 596517

Summary:	RHEL6 Install on ibm-x3950m2-0[12].ovirt.rhts.eng.bos.redhat.com fail for no apparent reason around 8 seconds after anaconda gets network up
Product:	Red Hat Enterprise Linux 6	Reporter:	Barry Marson <bmarson>
Component:	kernel	Assignee:	James Takahashi (IBM) <nobody+PNT0273897>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Red Hat Kernel QE team <kernel-qe>
Severity:	high	Docs Contact:
Priority:	high
Version:	6.0	CC:	borgan, jburke, jolsa, knoel, mjenner, nhorman, peterm, tburke, tgraf
Target Milestone:	rc	Keywords:	Reopened
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2010-07-28 20:43:20 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Barry Marson 2010-05-26 21:04:07 UTC

Description of problem: Several attempts have been made to install on 

 ibm-x3950m2-0[12].ovirt.rhts.eng.bos.redhat.com

systems, first through rhts/beaker and then through pure beaker provisioning.  In all cases, right after anaconda brings up the network via NetworkManager, the console prints:

   Retrieving ... and then 

   Looking for installation images on CD device /dev/sr0
   Running anaconda 13.21.45, the Red Hat Enterprise Linux system installer - please wait.
   Finding storage devices

then the machine does a reset.

There is no more data other than what is attached.

I attempted to blacklist the drivers for a pair of PCI devices: first lpfc, and then ixgbe as well.  In both cases the reset occurred sooner after the network came up.

Barry

Version-Release number of selected component (if applicable):

RHEL6.0-20100512.0
RHEL6.0-20100523.0

How reproducible:
every time

Steps to Reproduce:
1. try it
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 2 Barry Marson 2010-05-26 21:22:16 UTC

Beta 1 fails as well.

Barry

Comment 3 Dave Cantrell 2010-05-27 18:33:00 UTC

My guess is something is occurring during storage detection that's causing the system to bail.  Based on the messages you included in the initial comment, it sounds like you are able to get in to stage 2 of anaconda.  Before clicking next (or advancing past the welcome screen), can you do the following:

1) ssh in to the system and tail -f /tmp/storage.log
2) ssh in to the system and tail -f /tmp/program.log
3) Advance the installer to the next screen

Hopefully we'll see some error output in the storage.log and/or program.log that points us to the root cause.

Comment 4 Barry Marson 2010-05-27 19:10:50 UTC

I don't think I get to stage 2.  If I do, there isn't enough time to do anything.  I have about 4-8 seconds to get into the system after a ping starts responding.  After that the system resets.  In other words, not possible.

Btw, I blacklisted the FC HBA driver lpfc to no avail.  In fact, whenever I blacklisted anything, the time to reset was shorter (closer to 4 sec).  The only other storage would be the local LSI storage, and that's what I need for the OS.

Barry

Comment 5 Barry Marson 2010-05-27 19:41:20 UTC

Latest attempts through beaker with a Kickstart metadata = "manual" and adding vnc to the Kernel Options install line have shown that when I get the language option at the console and wait a minute, the machine resets all by itself.  So something asynchronous is going on ... module probing?

Barry

Comment 6 Dave Cantrell 2010-05-27 20:21:34 UTC

You are definitely getting to stage 2.  When you see this message:

Running anaconda 13.21.45, the Red Hat Enterprise Linux system installer - please wait.

You have entered stage 2.

Module loading occurs during stage 1.  Are you running the text mode or graphical interface for the installer?

Comment 7 Barry Marson 2010-05-27 20:58:26 UTC

Well, stage 2 causes a machine reset or whatever within seconds ... See comment #5 for the args.

This is text mode with a request for vnc once it can get started.  But I never even get to select language in manual mode.  It's already in a hardware reset phase.  Again comment #5 says what has been tried.

There's nothing more I can provide that you can't do yourself on these boxes.

   console -M console.lab.bos.redhat.com HOSTNAME

Barry

Comment 8 Chris Lumens 2010-05-28 20:24:37 UTC

I wonder if netconsole (http://lxr.linux.no/#linux+v2.6.34/Documentation/networking/netconsole.txt) might be useful for debugging this?

Comment 9 Chris Lumens 2010-05-28 20:33:03 UTC

Doesn't look like the module gets automatically loaded if you pass the parameter.  We'll have to do that early on in anaconda if we want to make use of it.  Stand by.
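
A hedged sketch of the netconsole idea from comments 8 and 9: the helper below only assembles the boot parameter in the format described in Documentation/networking/netconsole.txt. The function name, interface name, and target address are illustrative placeholders, not values from this bug. As comment 9 notes, the netconsole module would still have to be loaded early for the parameter to have any effect in the installer.

```shell
# Sketch only: build a netconsole= kernel parameter of the documented form
#   netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
# Empty fields fall back to the kernel defaults (for example, a broadcast
# target MAC address). netconsole_param is a hypothetical helper name.
netconsole_param() {
    # $1 = local interface, $2 = target IP, $3 = target UDP port (default 6666)
    dev=$1
    tgt_ip=$2
    tgt_port=${3:-6666}
    printf 'netconsole=@/%s,%s@%s/\n' "$dev" "$tgt_port" "$tgt_ip"
}

# Placeholder values: eth0 and 192.0.2.10 are assumptions for illustration.
netconsole_param eth0 192.0.2.10
# prints: netconsole=@/eth0,6666@192.0.2.10/
```

On the receiving host, a UDP listener on the chosen port (for example a netcat variant) would collect the kernel messages; the exact listener flags depend on the netcat flavor installed.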

Comment 10 Dave Cantrell 2010-06-18 18:39:11 UTC

Without any additional debugging information, it's hard to determine what is happening.  Given that the failure happens very early, our guess is a kernel failure of some variety (module loading problem, etc).

Comment 13 Dor Laor 2010-07-28 13:01:20 UTC


*** This bug has been marked as a duplicate of bug 607650 ***

Comment 14 Avi Kivity 2010-07-28 16:44:27 UTC

The bug log does not mention kvm anywhere.  Is this in fact a guest install?

Comment 15 Avi Kivity 2010-07-28 16:48:57 UTC

In fact, comment #7 means it isn't kvm for sure.  kvm consoles are through the host, not console.something.

Comment 16 Barry Marson 2010-07-28 17:13:13 UTC

This never ever had anything to do with virt or kvm.  The problem is the attached FC storage.  Something about it makes stage 1 install fail.  It was disconnected and the machine works now albeit without that needed storage for certain virt testing.

We are trying to get that storage reconnected to find out if there is still an issue.

Barry

Comment 17 Avi Kivity 2010-07-28 17:52:10 UTC

Right, so this isn't a dup of the infamous #607650 as comment #13 suggests.

Comment 19 Barry Marson 2010-07-28 20:36:28 UTC

I have verified that with the FC storage attached, RHEL6.0-20100722.0 installs.  So my issue is resolved.

Barry

Comment 20 Peter Bogdanovic 2010-07-28 20:43:20 UTC

Based on Barry's comment I am going to close this bug.