Bug 728627
Summary: | [RHEL6.2] Kernel fails to boot. 2.6.32-179.el6 or higher | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Jeff Burke <jburke> |
Component: | kernel | Assignee: | Don Zickus <dzickus> |
Status: | CLOSED CANTFIX | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.2 | CC: | arozansk, jstancek, jvillalo, kmcmartin, masaya.hasegawa, pbunyan |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2011-08-10 19:02:12 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 728633 |
Description
Jeff Burke
2011-08-05 20:54:27 UTC
After spending a couple of days on this, I am going to close it out as broken hardware.

I have debugged the hangs to one of the various pci_config_read or pci_config_write calls. According to our PCI/PCIe guys, the system should never hang on a read; possibly on a write, but that is unlikely on the particular write I noticed. Granted, the system always seems to hang on the same PCIe port, but just adding in a couple of printks allows the system to boot fine. This leads me to believe it is a system timing issue.

This machine is an Intel whitebox that is no longer supported. Yeah, it is a regression from 6.1, but to properly enable APEI support, the PCIe initialization had to be redone to accommodate it. This change pokes registers a little differently.

I hit hangs on a pci_config_write in pcibios_set_master
I hit hangs on a pci_config_read in PCI_COMMAND
I hit hangs on a couple of pci_config_reads while configuring interrupts that I didn't feel like tracking down.

This is just wasting my time on a broken machine that is no longer supported. Closing it out before I waste more time better spent elsewhere.

Cheers,
Don

Do we know what the last kernel was that did work on the system?

I believe it is -178.el6 and that the APEI changes to the PCIe root bridge probably caused the problem.

I just unchecked the private field of comment 3. Not sure why I set it in the first place.

There is a workaround that I forgot to mention: 'pci=noaer' and 'pcie_ports=compat' successfully worked around the problem.

Cheers,
Don

Thanks Don. Makes sense to me.

>This machine is an Intel whitebox that is no longer supported. (Comment #3)
This problem happens on the ProLiant DL165 G7 too. Though the workaround
('pci=noaer' and 'pcie_ports=compat') works well, our customer expects
this issue to be fixed in a future release.
(In reply to comment #7)
> >This machine is an Intel whitebox that is no longer supported. (Comment #3)
>
> On ProLiant DL165 G7, this problem happens too. Though the workaround
> ('pci=noaer' and 'pcie_ports=compat') works well, our customer expects
> that this issue be fixed in a future release.

Hi Masaya,

You will have to open a new bugzilla for that issue. The issue here seemed to be broken hardware, with the hang triggered by software enabling features in the hardware. There isn't much we can do in that case except for those workarounds.

The ProLiant should be a working box though, so I wouldn't be surprised if that issue is something different.

Cheers,
Don
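
For anyone applying the workaround above, a minimal sketch of how one might check that a kernel command line actually carries both parameters. Only 'pci=noaer' and 'pcie_ports=compat' come from this bug; the helper name `missing_workaround_params` and the sample command lines are hypothetical. It assumes options are whitespace-separated and that a single `pci=` entry may hold several comma-separated values.

```python
# Hypothetical helper, not taken from this bug report: it only checks for the
# two workaround parameters mentioned in the comments above.
WORKAROUND = (("pci", "noaer"), ("pcie_ports", "compat"))


def has_param(cmdline, key, value):
    """True if cmdline has key=... with value among its comma-separated values."""
    for token in cmdline.split():
        k, sep, v = token.partition("=")
        if sep and k == key and value in v.split(","):
            return True
    return False


def missing_workaround_params(cmdline):
    """Return the workaround parameters absent from cmdline, as 'key=value'."""
    return [
        f"{key}={value}"
        for key, value in WORKAROUND
        if not has_param(cmdline, key, value)
    ]
```

On a running Linux system the string to pass in would be the contents of /proc/cmdline; an empty result means both parameters are present.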