Bug 1234343 - 75 % success for introspection (VM)
Summary: 75 % success for introspection (VM)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-rdomanager-oscplugin
Version: Director
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: ga
Target Release: Director
Assignee: Dmitry Tantsur
QA Contact: Toure Dunnon
URL:
Whiteboard:
Duplicates: 1234956
Depends On:
Blocks:
Reported: 2015-06-22 12:08 UTC by Jaromir Coufal
Modified: 2015-08-27 01:30 UTC (History)

Fixed In Version: python-rdomanager-oscplugin-0.0.8-14.el7ost
Doc Type: Bug Fix
Doc Text:
Issues in the KVM PXE code caused failures when too many nodes tried to PXE-boot simultaneously, leaving some nodes unable to connect to DHCP. With this update, the sleep value is increased, allowing introspection to succeed on the nodes. As a result, the DHCP failures no longer occur, although introspection takes slightly longer.
Clone Of:
Environment:
Last Closed: 2015-08-05 13:55:07 UTC
Target Upstream Version:


Attachments
no bootable device for one of the nodes (843.76 KB, application/octet-stream)
2015-06-22 12:08 UTC, Jaromir Coufal
dnsmasq log output (20.08 KB, text/plain)
2015-06-24 18:15 UTC, Ben Nemec


Links
System | ID | Priority | Status | Summary | Last Updated
Gerrithub.io | 237591 | None | None | None | Never
Red Hat Product Errata | RHEA-2015:1549 | normal | SHIPPED_LIVE | Red Hat Enterprise Linux OpenStack Platform director Release | 2015-08-05 17:49:10 UTC

Description Jaromir Coufal 2015-06-22 12:08:09 UTC
Created attachment 1041777
no bootable device for one of the nodes

Description of problem:
I am constantly getting failures when introspecting virtual machines. The success rate is about 75 %. The remaining nodes cannot be discovered and report "No bootable device." (screenshot attached). A few minutes ago I was able to discover 15 nodes out of 20.

Version-Release number of selected component (if applicable):
2015-06-17.2 http://openstack.etherpad.corp.redhat.com/rhel-osp-director-puddle-2015-06-17-2

How reproducible:
75 % of the time

Steps to Reproduce:
1. Trigger introspection on multiple nodes.

Actual results:
75 % success rate

Expected results:
100 % success rate

Comment 4 Dmitry Tantsur 2015-06-24 08:34:17 UTC
A couple of questions: Is it always the same node? Does the same thing happen with deploy?

Also, please attach the output of: $ sudo journalctl -u openstack-ironic-discoverd-dnsmasq

CC'ing Lucas as he may know more about iPXE.

Comment 5 Jaromir Coufal 2015-06-24 10:52:01 UTC
Hey, so...

is it always the same node?
-- no, various nodes, not always the same ones

does the same thing happen with deploy?
-- no, deploy never got stuck with similar issue

I don't have the machine available anymore, so I cannot provide any other logs.

Comment 6 Dmitry Tantsur 2015-06-24 10:55:38 UTC
OK, I will try to reproduce it myself. In the meantime, if someone has the same issue, I badly need logs, so please provide some!

Comment 7 Ben Nemec 2015-06-24 18:15:48 UTC
Created attachment 1042800
dnsmasq log output

This is the openstack-ironic-discoverd-dnsmasq log output from a failing run. The MAC address of the failed node is fa:16:3e:4e:ee:38, and it looks like the same address-in-use problem you had mentioned to me before.

Comment 8 Dmitry Tantsur 2015-06-25 07:58:08 UTC
Exactly. Do you think it's a good time to redirect this bug to kvm or whatever manages the PXE firmware? I think everybody here reproduced this bug at least once...

Comment 9 Dmitry Tantsur 2015-06-25 08:18:39 UTC
Oh btw, we had a sleep in our scripts:
https://github.com/rdo-management/instack-undercloud/blob/master/scripts/instack-ironic-deployment#L134
which is no longer present in the new CLI:
https://github.com/rdo-management/python-rdomanager-oscplugin/blob/master/rdomanager_oscplugin/v1/baremetal.py#L136-L145

We have to bring it back, I'll submit a patch.
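
For illustration only, here is a minimal sketch of the kind of pause being discussed. It assumes the sleep is a delay between starting introspection on individual nodes. It is not the code from the linked script or from the patch: the function name, the `start_node` callback, and the 5-second default are all hypothetical.

```python
import time
from typing import Callable, Iterable, List


def start_introspection_staggered(
    node_uuids: Iterable[str],
    start_node: Callable[[str], None],
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Start introspection node by node, pausing between nodes.

    start_node is a hypothetical callback standing in for whatever call
    actually triggers introspection on one node. delay_seconds is an
    assumed value, not the one used by the real fix.
    """
    started = []
    for index, uuid in enumerate(node_uuids):
        if index > 0:
            # Pause so the VMs do not all PXE-boot at the same moment.
            sleep(delay_seconds)
        start_node(uuid)
        started.append(uuid)
    return started
```

The trade-off matches the one noted in the Doc Text: with N nodes the run takes roughly (N - 1) * delay_seconds longer, in exchange for fewer simultaneous PXE boots.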

Comment 10 Dmitry Tantsur 2015-06-25 08:34:44 UTC
And here's the patch: https://review.gerrithub.io/#/c/237591/

Comment 11 Marios Andreou 2015-06-26 10:00:53 UTC
Heh Dmitry, it's like deja vu. I was having issues with VM introspection over the last 2 days, especially yesterday, with lots of poking. I remember when this happened the first time round and the sleep was added ;)

Glad I came across this, will try it out since I am refreshing envs today for poodle. (I can only do VM envs.)

Comment 13 Dmitry Tantsur 2015-06-29 08:18:05 UTC
Patch merged, I believe it will work around the problem.

Comment 14 James Slagle 2015-07-01 14:01:23 UTC
*** Bug 1234956 has been marked as a duplicate of this bug. ***

Comment 19 errata-xmlrpc 2015-08-05 13:55:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549

