Bug 1236373 - Hosts don't boot after deployment
Summary: Hosts don't boot after deployment
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Quickstart Cloud Installer
Classification: Red Hat
Component: Installation - RHEV
Version: 1.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: beta
Target Release: 1.0
Assignee: Jesus M. Rodriguez
QA Contact:
Dan Macpherson
URL:
Whiteboard:
Depends On: 1254615
Blocks:
 
Reported: 2015-06-28 11:02 UTC by Tzach Shefi
Modified: 2016-09-19 12:22 UTC

Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: The version of syslinux in RHEL causes a problem booting the Dell PowerEdge C6220.
Consequence: Satellite's discovery provisioning is not able to successfully provision this hardware. It discovers the hosts and installs an OS, but hangs on the final boot of the installed OS.
Workaround (if any):
1) Edit the 'PXELinux default local boot' template and change .localboot 0 to:
     COM32 chain.c32
     APPEND hd0
2) Put the hosts into build mode, then remove them from build. This will update their PXE templates.
Result: Nodes will boot correctly.
Clone Of:
Environment:
Last Closed: 2016-09-19 12:22:26 UTC
Target Upstream Version:
Embargoed:


Attachments
Installer logs. (130.25 KB, application/x-gzip)
2015-06-28 11:02 UTC, Tzach Shefi

Description Tzach Shefi 2015-06-28 11:02:29 UTC
Created attachment 1044008
Installer logs.

Description of problem: A RHEV+RHCF deployment on physical servers took a very long time (stuck at 46%), so I left it running over the weekend. It now shows 100% completed, but both hosts are stuck with this PXE error:
PXE-M0F: exiting intel boot agent.  

The boot order is local disk first; it also says "Booting from local disk..." before the PXE error.

Restarting both nodes doesn't help.  

Version-Release number of selected component (if applicable):
ISO with the Media ID 1434638612.868458 

How reproducible:
Unsure; this was the first attempt to install.

Steps to Reproduce:
1. Create new deployment RHEV+RHCF
2. NFS storage on RHCI server
3. Deployment got stuck for a long time on 46%.
4. Today I checked it's at 100% completed (green bar).
5. Neither host boots up.

Actual results:
Both hosts are stuck on the same PXE error during boot. Restarting the hosts doesn't help; they return to the same error.

Expected results:
Hosts should boot up. 

Additional info:
I just started helping out with RHCI, so I'm not sure which logs to add other than the ones under:
/var/log/katello-installer

I'll try this again to see if I get stuck on the same problem.

Comment 1 Tzach Shefi 2015-06-28 14:51:08 UTC
I deleted my hosts, rediscovered them, and ran into exactly the same error.

This time, during the second boot (after the host OS was installed), I manually changed the boot order to start from disk rather than PXE. The deployment resumed and is now almost done.

The above bug can be attributed to the fact that on the second PXE boot neither server moved on to the second boot device (disk drives).

Host servers: both Dell PowerEdge C6220
RHCI server: Dell PowerEdge C6105

Now waiting for this step: 
Synchronize repository 'Red Hat CloudForms Management Engine 5.3 Files x86_64'; product 'Red Hat CloudForms'; organization 'Default Organization'


One more thing: in the initial report I mentioned that the deployment indicator was green at 100%. That is a separate bug, reporting a wrong status. I know for a fact that both host servers were stuck on PXE boot after installing the base OS, so nothing beyond the base OS could have been installed. This means the progress indicator is giving false reports.

Comment 2 Tzach Shefi 2015-08-17 14:00:36 UTC
I've run into this same problem on the same servers, with the media below:
Media ID 1439475247.560921, from Aug 13th if I recall.

If anyone needs access to the servers in real time, let me know.

Comment 3 Thom Carlin 2015-08-17 14:30:37 UTC
Seems to be in a poll loop in Actions::Fusor::Deployment::Rhev::WaitForDataCenter

Comment 4 John Matthews 2015-08-18 14:02:02 UTC
Jason Montleon learned that this is a hardware-dependent issue between syslinux and the BIOS on the Dell PowerEdge C6220.

http://www.syslinux.org/wiki/index.php/Hardware_Compatibility#LOCALBOOT


A syslinux bz has been filed here to track the request for a fix:
Bug 1254615 - Dell PowerEdge c6220 hangs on localboot after being provisioned by Satellite 6.1



> To work around it you can edit the 'PXELinux default local boot' template
> change .localboot 0 to:
>      COM32 chain.c32
>      APPEND hd0



If testing this on a Satellite with discovery, make the edits, then put the hosts into a build state and take them out again to overwrite their copies of the template.
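
The template change quoted above can be sketched as a small text transformation. This is only an illustration, not part of Satellite: the function name apply_chain_workaround and the sample template body are hypothetical, and the pattern tolerates a leading dot only because the directive is written as ".localboot 0" in this report.

```python
import re

# Matches a local-boot directive such as "LOCALBOOT 0". The optional leading
# dot is tolerated because this report writes the directive as ".localboot 0".
_LOCALBOOT = re.compile(r"^(?P<indent>\s*)\.?localboot\s+0\s*$", re.IGNORECASE)


def apply_chain_workaround(template_text: str) -> str:
    """Replace each local-boot line with the two chain.c32 workaround lines."""
    out = []
    for line in template_text.splitlines():
        match = _LOCALBOOT.match(line)
        if match:
            indent = match.group("indent")  # keep the original indentation
            out.append(f"{indent}COM32 chain.c32")
            out.append(f"{indent}APPEND hd0")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical template body, for illustration only.
    sample = "DEFAULT local\nLABEL local\n  LOCALBOOT 0\n"
    print(apply_chain_workaround(sample), end="")
```

In practice the edit is made in the template editor; the sketch only shows which line is replaced and by what.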

Comment 5 Jason Montleon 2015-08-18 14:11:22 UTC
One note: if you edit 'PXELinux default local boot' after the hosts are provisioned and you have already hit the problem, you will need to either briefly put them in build and take them out again to overwrite the individual host PXE configs, or manually copy the configs in /var/lib/tftpboot/pxelinux.cfg, from default to each of the affected hosts.
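
The manual copy option could look roughly like the sketch below. It rests on assumptions: the per-host files are assumed to follow the standard PXELINUX naming scheme (01- followed by the MAC address in lowercase hex with dashes), "default" is taken to be the file named default in the pxelinux.cfg directory, and the helper names and the MAC address are made up. The demo runs against a temporary directory rather than /var/lib/tftpboot/pxelinux.cfg.

```python
import shutil
import string
import tempfile
from pathlib import Path
from typing import Iterable, List


def pxe_config_name(mac: str) -> str:
    """Return the PXELINUX per-host file name for an Ethernet MAC address."""
    octets = mac.lower().replace("-", ":").split(":")
    valid = len(octets) == 6 and all(
        len(o) == 2 and all(c in string.hexdigits for c in o) for o in octets
    )
    if not valid:
        raise ValueError(f"not a MAC address: {mac!r}")
    # "01" is the ARP hardware type for Ethernet.
    return "01-" + "-".join(octets)


def copy_default_to_hosts(cfg_dir: Path, macs: Iterable[str]) -> List[Path]:
    """Overwrite each host's PXE config with a copy of cfg_dir/default."""
    default = cfg_dir / "default"
    written = []
    for mac in macs:
        target = cfg_dir / pxe_config_name(mac)
        shutil.copyfile(default, target)
        written.append(target)
    return written


if __name__ == "__main__":
    # Demo in a scratch directory; the MAC below is a made-up example.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp)
        (cfg / "default").write_text("COM32 chain.c32\nAPPEND hd0\n")
        for path in copy_default_to_hosts(cfg, ["AA:BB:CC:DD:EE:FF"]):
            print(path.name)
```

Putting the hosts into build and taking them out again achieves the same result without touching the TFTP directory by hand.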

Comment 6 Tzach Shefi 2015-08-24 06:01:13 UTC
FYI: flashing the latest BIOS (1.1.19 -> 2.5.3) didn't resolve this.
It still happens (media 1440170112.364470, 21 Aug 2015).

Comment 7 Thom Carlin 2015-08-25 16:21:04 UTC
Verified as restriction to be documented.

Comment 8 Jason Montleon 2016-09-16 18:30:35 UTC
Tzach Shefi, per https://bugzilla.redhat.com/show_bug.cgi?id=1254615 could you retry with the syslinux package from 7.3 beta?

Comment 9 Tzach Shefi 2016-09-18 06:41:35 UTC
Jason, don't wait for me on this. 
I'd love to help out, but my hardware has since been replaced from Dell servers to HPs.

My current focus as RHOS storage QE means I handle mostly RHOS storage and VMware integration testing.

RHCI was a side job I was fortunate enough to help with a while back; it is not actively ongoing.

Ask Roni Rasouli; I think he does some RHOS+RHCI/FC testing and might have access to such Dell servers.

Comment 10 Jason Montleon 2016-09-19 12:22:26 UTC
Closing with insufficient data. If we can test we can reopen and properly verify. Truth is it is probably fixed, we just can't know.

