Bug 728627 - [RHEL6.2] Kernel fails to boot. 2.6.32-179.el6 or higher
Summary: [RHEL6.2] Kernel fails to boot. 2.6.32-179.el6 or higher
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Don Zickus
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 6.2KnownIssues
Reported: 2011-08-05 20:54 UTC by Jeff Burke
Modified: 2018-11-27 20:33 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-08-10 19:02:12 UTC
Target Upstream Version:
Embargoed:



Description Jeff Burke 2011-08-05 20:54:27 UTC
Description of problem:
While testing the RHEL6.2 kernels, we ran across a system that no longer boots.

Version-Release number of selected component (if applicable):
2.6.32-179.el6

How reproducible:
Always
  
Actual results:
This is the last thing you see on the console.

============<snip>============
ksign: Installing public key data 
Loading keyring 
- Added public key FD6F1B13E15862F9 
- User ID: Red Hat, Inc. (Kernel Module GPG key) 
- Added public key D4A26C9CCD09BEDA 
- User ID: Red Hat Enterprise Linux Driver Update Program <secalert> 
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252) 
io scheduler noop registered 
io scheduler anticipatory registered 
io scheduler deadline registered 
io scheduler cfq registered (default) 

============<\snip>============

Expected results:
System should boot

Additional info:

Comment 3 Don Zickus 2011-08-10 19:02:12 UTC
After spending a couple of days on this, I am going to close it out as broken hardware. I have debugged the hangs to various pci_config_read and pci_config_write calls.

According to our PCI/PCIe guys, the system should never hang on a read. It could possibly hang on a write, but that is unlikely for the particular write I noticed. Although the system always seems to hang on the same PCIe port, just adding a couple of printks lets it boot fine. This leads me to believe it is a system timing issue.

This machine is an Intel whitebox that is no longer supported.

Yes, it is a regression from 6.1, but to properly enable APEI support, the PCIe initialization had to be reworked. This change pokes the registers a little differently.

I hit hangs on a pci_config_write in pcibios_set_master
I hit hangs on a pci_config_read of PCI_COMMAND
I hit hangs on a couple of pci_config_reads while configuring interrupts that I didn't feel like tracking down.

This is just wasting my time on a broken machine that is no longer supported.  Closing it out before I waste more time better spent elsewhere.

Cheers,
Don

Comment 4 John Villalovos 2011-08-10 20:49:10 UTC
Do we know the last kernel that worked on this system?

Comment 5 Don Zickus 2011-08-10 21:04:32 UTC
I believe it is -178.el6 and that the APEI changes to the PCIe root bridge probably caused the problem.

I just unchecked the private flag on comment 3; not sure why I set it. There is a workaround I forgot to mention: booting with 'pci=noaer' and 'pcie_ports=compat' successfully works around the problem.

Cheers,
Don
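The workaround in comment 5 is applied as kernel boot parameters. As a minimal sketch (the helper names below are hypothetical and not part of any Red Hat tool), the following Python checks whether a kernel command line, such as the one Linux exposes in /proc/cmdline, contains both 'pci=noaer' and 'pcie_ports=compat'. It assumes simple whitespace-separated key=value tokens and does not handle quoted values.

```python
from pathlib import Path

# Workaround parameters from comment 5: key -> option that must be present.
WORKAROUND = {"pci": "noaer", "pcie_ports": "compat"}


def parse_cmdline(cmdline: str) -> dict[str, list[str]]:
    """Map each key=value kernel parameter to its comma-separated options.

    Simplified parser (assumption): bare flags without '=' are ignored and
    quoted values are not handled.
    """
    params: dict[str, list[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep:
            params.setdefault(key, []).extend(value.split(","))
    return params


def workaround_active(cmdline: str) -> bool:
    """Return True if every workaround option appears on the command line."""
    params = parse_cmdline(cmdline)
    return all(opt in params.get(key, []) for key, opt in WORKAROUND.items())


if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    if proc.exists():  # only present on Linux
        print("workaround active:", workaround_active(proc.read_text()))
```

Splitting 'pci=' values on commas matters because the pci= parameter can carry several options at once, for example 'pci=nomsi,noaer'.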

Comment 6 John Villalovos 2011-08-10 21:10:27 UTC
Thanks Don.  Makes sense to me.

Comment 7 masaya.hasegawa.hp 2012-01-25 06:21:09 UTC
>This machine is an Intel whitebox that is no long supported. (Comment #3)

On a ProLiant DL165 G7, this problem happens too. Although the workaround
('pci=noaer' and 'pcie_ports=compat') works well, our customer expects
this issue to be fixed in a future release.

Comment 8 Don Zickus 2012-01-25 14:47:59 UTC
(In reply to comment #7)
> >This machine is an Intel whitebox that is no longer supported. (Comment #3)
> 
> On a ProLiant DL165 G7, this problem happens too. Although the workaround
> ('pci=noaer' and 'pcie_ports=compat') works well, our customer expects
> this issue to be fixed in a future release.

Hi Masaya,

You will have to open a new bugzilla for that issue. The issue here seemed to be broken hardware, triggered by software enabling features in the hardware. There isn't much we can do in that case except apply those workarounds.

The ProLiant should be a working box though, so I wouldn't be surprised if that issue is something different.

Cheers,
Don

