Bug 1353464 - Nodes time out pxe boot in virtual environment
Summary: Nodes time out pxe boot in virtual environment
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
: ---
Assignee: Angus Thomas
QA Contact: Omri Hochman
URL:
Whiteboard:
Depends On:
Blocks: 1411571
Reported: 2016-07-07 08:04 UTC by Marius Cornea
Modified: 2018-02-26 21:59 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-02-26 21:59:26 UTC
Target Upstream Version:
Embargoed:


Attachments
console screenshot (8.37 KB, image/png)
2016-07-07 08:11 UTC, Marius Cornea

Description Marius Cornea 2016-07-07 08:04:09 UTC
Description of problem:
I've encountered situations where the VMs time out while PXE booting after the image deployment stage. I suspect this is caused by high disk I/O load on the undercloud, because I can reproduce it when deploying a larger number of nodes: 3 controllers, 2 computes, 1 ceph node.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Deploy overcloud: 3 ctrl + 2 computes + 1 ceph node
2. Check the VMs console

Actual results:
In the initial phase all the VMs PXE boot and complete the image deployment. After the first reboot, some of the nodes are unable to PXE boot (see attached screenshot). After a manual reboot, the affected VM PXE boots correctly.

Additional info:

I noticed that during the image deployment the load on the undercloud climbs very high, and I suspect this is causing the timeouts. I could see:

[root@undercloud ~]# uptime
 03:34:41 up 2 days, 18:33,  2 users,  load average: 13.40, 6.09, 2.95
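A raw load average is easier to judge relative to the number of CPUs on the undercloud. The following is a minimal sketch, not part of the original report: `load_per_cpu` and `is_overloaded` are hypothetical helper names, and the 1.0 per-CPU threshold is an assumed rule of thumb. On Linux the load average also counts tasks blocked in uninterruptible I/O wait, so a high value is consistent with, but does not prove, the disk I/O suspicion.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare a load average against the CPU count.

# load_per_cpu LOAD CPUS: print LOAD divided by CPUS with two decimals.
load_per_cpu() {
    awk -v l="$1" -v c="$2" 'BEGIN { printf "%.2f\n", l / c }'
}

# is_overloaded LOAD CPUS [THRESHOLD]: succeed (exit 0) when the
# per-CPU load exceeds THRESHOLD (assumed default: 1.0).
is_overloaded() {
    awk -v l="$1" -v c="$2" -v t="${3:-1.0}" 'BEGIN { exit !(l / c > t) }'
}

# Report the current 1-minute load when the Linux interfaces are available.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    load1=$(cut -d ' ' -f1 /proc/loadavg)
    cpus=$(nproc)
    echo "1-min load: $load1, CPUs: $cpus, per-CPU: $(load_per_cpu "$load1" "$cpus")"
    if is_overloaded "$load1" "$cpus"; then
        echo "load exceeds the assumed per-CPU threshold"
    fi
fi
```

Running this on the undercloud while the image deployment is in progress shows whether the 1-minute figure (13.40 in the output above) is large for that host's CPU count.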

As a workaround I set max_concurrent_builds to 2 in nova.conf, which limits the number of instances built simultaneously to 2:

crudini --set /etc/nova/nova.conf DEFAULT max_concurrent_builds 2; openstack-service restart nova
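To confirm that the workaround landed in the file, the value can be read back. If crudini is installed, `crudini --get /etc/nova/nova.conf DEFAULT max_concurrent_builds` should do this directly. The sketch below is an awk-based alternative for hosts without crudini. `ini_get` is a hypothetical helper, not something shipped with nova or crudini, and it assumes simple `key = value` lines with no continuation lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ini_get FILE SECTION KEY
# Prints the value of KEY inside [SECTION] of an INI-style FILE.
# Prints nothing if the key is not set in that section.
ini_get() {
    awk -F '=' -v section="$2" -v key="$3" '
        # A section header switches matching on or off.
        /^\[/ { in_section = ($0 == ("[" section "]")); next }
        in_section {
            k = $1
            gsub(/^[ \t]+|[ \t]+$/, "", k)          # trim the key
            if (k == key) {
                v = substr($0, index($0, "=") + 1)  # everything after the first "="
                gsub(/^[ \t]+|[ \t]+$/, "", v)      # trim the value
                print v
                exit
            }
        }
    ' "$1"
}

# Usage on the undercloud (path taken from the workaround above):
#   ini_get /etc/nova/nova.conf DEFAULT max_concurrent_builds
```

Commented-out lines such as `# max_concurrent_builds = 10` are not matched, because the trimmed key would still begin with `#`.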

This is more of an environmental issue in my opinion, but I opened the bug in case others hit it as well.

Comment 2 Marius Cornea 2016-07-07 08:11:01 UTC
Created attachment 1177212
console screenshot

Comment 3 Bob Fournier 2018-02-26 21:59:26 UTC
Closing this out; as it's an issue with the environment, no fix is planned.

