1234343 – 75 % success for introspection (VM)

Summary: 75 % success for introspection (VM)

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	python-rdomanager-oscplugin
Sub Component:
Version:	Director
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	unspecified
Target Milestone:	ga
Target Release:	Director
Assignee:	Dmitry Tantsur
QA Contact:	Toure Dunnon
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1234956
Depends On:
Blocks:

Reported:	2015-06-22 12:08 UTC by Jaromir Coufal
Modified:	2023-02-22 23:02 UTC
CC List:	14 users
Fixed In Version:	python-rdomanager-oscplugin-0.0.8-14.el7ost
Doc Type:	Bug Fix
Doc Text:	Issues in the KVM PXE code caused failures when too many nodes tried to PXE-boot simultaneously, so some nodes failed to obtain an address over DHCP. With this update, the sleep value is increased, which allows introspection to succeed on the nodes. As a result, DHCP is no longer an issue, although introspection takes slightly longer.
Clone Of:
Environment:
Last Closed:	2015-08-05 13:55:07 UTC
Target Upstream Version:
Embargoed:

Attachments
no bootable device for one of the nodes (843.76 KB, application/octet-stream) 2015-06-22 12:08 UTC, Jaromir Coufal
dnsmasq log output (20.08 KB, text/plain) 2015-06-24 18:15 UTC, Ben Nemec

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Gerrithub.io	237591	0	None	None	None	Never
Red Hat Product Errata	RHEA-2015:1549	0	normal	SHIPPED_LIVE	Red Hat Enterprise Linux OpenStack Platform director Release	2015-08-05 17:49:10 UTC

Description Jaromir Coufal 2015-06-22 12:08:09 UTC

Created attachment 1041777
no bootable device for one of the nodes

Description of problem:
I am constantly getting failures for virtual machine introspection. The success rate is about 75 %. The remaining nodes cannot be discovered and report "No bootable device." (screenshot attached). A few minutes ago I was able to discover 15 of 20 nodes.

Version-Release number of selected component (if applicable):
2015-06-17.2 http://openstack.etherpad.corp.redhat.com/rhel-osp-director-puddle-2015-06-17-2

How reproducible:
Consistently; roughly 75 % of nodes succeed per run

Steps to Reproduce:
1. Trigger introspection on multiple nodes.

Actual results:
About 75 % of nodes are introspected successfully.

Expected results:
All nodes (100 %) are introspected successfully.

Comment 4 Dmitry Tantsur 2015-06-24 08:34:17 UTC

A couple of questions: Is it always the same node? Does the same thing happen with deploy?

Also, please attach the output of: $ sudo journalctl -u openstack-ironic-discoverd-dnsmasq

CC'ing Lucas as he may know more about iPXE.

Comment 5 Jaromir Coufal 2015-06-24 10:52:01 UTC

Hey, so...

is it always the same node?
-- no, various nodes, not always the same ones

does the same thing happen with deploy?
-- no, deploy never got stuck with similar issue

I don't have the machine available anymore, so I cannot provide any other logs.

Comment 6 Dmitry Tantsur 2015-06-24 10:55:38 UTC

OK, I will try to reproduce it myself. In the meantime, if someone hits the same issue, I badly need logs, so please provide some!

Comment 7 Ben Nemec 2015-06-24 18:15:48 UTC

Created attachment 1042800
dnsmasq log output

This is the openstack-ironic-discoverd-dnsmasq log output from a failing run. The MAC address of the failed node is fa:16:3e:4e:ee:38, and it looks like the same "address in use" problem you had mentioned to me before.
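When triaging a log like this, it helps to isolate the lines for the failed node. A minimal Python sketch that filters saved log lines by MAC address, optionally narrowed by a keyword; the helper name lines_for_mac is hypothetical, and it assumes only that dnsmasq lines for a node contain its MAC address somewhere in the text (the exact dnsmasq message format is not assumed):

```python
from typing import Iterable, List, Optional


def lines_for_mac(lines: Iterable[str], mac: str, keyword: Optional[str] = None) -> List[str]:
    """Return log lines that mention the given MAC address.

    Matching is case-insensitive. If keyword is given, only lines that
    also contain it (case-insensitive) are kept. Trailing newlines are
    stripped from the returned lines.
    """
    mac_lower = mac.lower()
    kw_lower = keyword.lower() if keyword else None
    matches = []
    for line in lines:
        lowered = line.lower()
        if mac_lower in lowered and (kw_lower is None or kw_lower in lowered):
            matches.append(line.rstrip("\n"))
    return matches
```

For example, the journalctl output requested in comment 4 could be saved to a file, read with open(...), and passed in with mac="fa:16:3e:4e:ee:38" to see every DHCP exchange for the failed node.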

Comment 8 Dmitry Tantsur 2015-06-25 07:58:08 UTC

Exactly. Do you think it's a good time to redirect this bug to KVM or whatever manages the PXE firmware? I think everybody here has reproduced this bug at least once...

Comment 9 Dmitry Tantsur 2015-06-25 08:18:39 UTC

Oh btw, we had a sleep in our scripts:
https://github.com/rdo-management/instack-undercloud/blob/master/scripts/instack-ironic-deployment#L134
which is no longer in the new CLI:
https://github.com/rdo-management/python-rdomanager-oscplugin/blob/master/rdomanager_oscplugin/v1/baremetal.py#L136-L145

We have to bring it back, I'll submit a patch.
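The workaround amounts to staggering the start of introspection so that nodes do not all PXE-boot and request DHCP leases at the same moment. A minimal Python sketch of that idea, not the code from the patch: start_one is a hypothetical caller-supplied callable standing in for whatever starts introspection on a single node, and the 5-second default delay is an assumed value, since the actual sleep used in the scripts or the patch is not stated in this report:

```python
import time
from typing import Callable, Iterable, List

# Assumed delay between nodes; the real value used by the patch may differ.
DEFAULT_DELAY_SECONDS = 5.0


def start_introspection_staggered(
    node_uuids: Iterable[str],
    start_one: Callable[[str], None],
    delay: float = DEFAULT_DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Start introspection on each node, pausing between consecutive nodes.

    The pause spreads out the PXE boots so that fewer nodes send DHCP
    requests at once. Returns the node UUIDs in the order they were started.
    """
    started = []
    for index, uuid in enumerate(node_uuids):
        if index > 0:
            sleep(delay)  # wait before kicking off the next node
        start_one(uuid)
        started.append(uuid)
    return started
```

The sleep function is injectable so the pacing can be checked without actually waiting; with N nodes the loop sleeps N - 1 times, which is why introspection as a whole takes slightly longer.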

Comment 10 Dmitry Tantsur 2015-06-25 08:34:44 UTC

And here's the patch: https://review.gerrithub.io/#/c/237591/

Comment 11 Marios Andreou 2015-06-26 10:00:53 UTC

Heh Dmitry, it's like déjà vu. I was having issues with VM introspection for the last 2 days, with lots of poking yesterday especially. I remember when this happened the first time round and the sleep was added ;)

Glad I came across this; I will try it out since I am refreshing envs today for poodle. (I can only do VM envs.)

Comment 13 Dmitry Tantsur 2015-06-29 08:18:05 UTC

Patch merged, I believe it will work around the problem.

Comment 14 James Slagle 2015-07-01 14:01:23 UTC

*** Bug 1234956 has been marked as a duplicate of this bug. ***

Comment 19 errata-xmlrpc 2015-08-05 13:55:07 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549
