1353464 – Nodes time out pxe boot in virtual environment

Summary: Nodes time out pxe boot in virtual environment

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	rhosp-director
Sub Component:
Version:	9.0 (Mitaka)
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	low
Target Milestone:	---
Target Release:	---
Assignee:	Angus Thomas
QA Contact:	Omri Hochman
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1411571

Reported:	2016-07-07 08:04 UTC by Marius Cornea
Modified:	2018-02-26 21:59 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-02-26 21:59:26 UTC
Target Upstream Version:
Embargoed:

Attachments
console screenshot (8.37 KB, image/png) 2016-07-07 08:11 UTC, Marius Cornea

Description Marius Cornea 2016-07-07 08:04:09 UTC

Description of problem:
I've encountered situations where the VMs time out while PXE booting after the image deployment stage. I suspect this is caused by high disk I/O load on the undercloud, because I can reproduce it when deploying a larger number of nodes: 3 controllers, 2 computes, and 1 Ceph node.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Deploy the overcloud with 3 controllers, 2 computes, and 1 Ceph node
2. Check the VM consoles

Actual results:
In the initial phase, all the VMs PXE boot and complete the image deployment. After the first reboot, some of the nodes fail to PXE boot (see the attached screenshot). After a manual reboot, an affected VM PXE boots correctly.

Additional info:

I noticed that during the image deployment the undercloud load goes very high, and I suspect this is causing the timeouts. For example:

[root@undercloud ~]# uptime
 03:34:41 up 2 days, 18:33,  2 users,  load average: 13.40, 6.09, 2.95

As a workaround, I set max_concurrent_builds to 2 in nova.conf, which limits the number of instances built simultaneously to 2:

crudini --set /etc/nova/nova.conf DEFAULT max_concurrent_builds 2; openstack-service restart nova
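
To judge whether the undercloud is saturated before and after applying the workaround, a rough sketch like the following may help. It assumes a Linux host that provides /proc/loadavg and nproc. The helper name is_overloaded and the "load above CPU count" threshold are illustrative assumptions and are not part of this report or of any OpenStack tooling.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenStack): succeeds when the given
# load average is greater than the given CPU count.
# Assumption: "load > CPU count" is only a rough saturation heuristic.
is_overloaded() {
    # $1 = load average, $2 = number of CPUs
    awk -v load="$1" -v cpus="$2" 'BEGIN { exit !(load > cpus) }'
}

# Read the 1-minute load average and the CPU count of this host.
load=$(cut -d ' ' -f 1 /proc/loadavg)
cpus=$(nproc)

if is_overloaded "$load" "$cpus"; then
    echo "load $load exceeds $cpus CPUs: host looks saturated"
else
    echo "load $load is within $cpus CPUs"
fi
```

On Linux the load average also counts tasks blocked in uninterruptible sleep, which is commonly disk I/O, so a high value is consistent with the suspected I/O bottleneck without proving it. The value written by the workaround can be read back with `crudini --get /etc/nova/nova.conf DEFAULT max_concurrent_builds`.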

This is more of an environmental issue in my opinion, but I opened this bug in case others hit it as well.

Comment 2 Marius Cornea 2016-07-07 08:11:01 UTC

Created attachment 1177212
console screenshot

Comment 3 Bob Fournier 2018-02-26 21:59:26 UTC

Closing this out: as it's an issue with the environment, no fix is planned.
