Bug 1234343 - 75 % success for introspection (VM)
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-rdomanager-oscplugin
Director
Unspecified Unspecified
high Severity unspecified
: ga
: Director
Assigned To: Dmitry Tantsur
Toure Dunnon
: Triaged
: 1234956
Depends On:
Blocks:
Reported: 2015-06-22 08:08 EDT by Jaromir Coufal
Modified: 2015-08-26 21:30 EDT
CC List: 15 users

See Also:
Fixed In Version: python-rdomanager-oscplugin-0.0.8-14.el7ost
Doc Type: Bug Fix
Doc Text:
Issues in the KVM PXE code caused failures when too many nodes tried to PXE-boot simultaneously, so some nodes failed to obtain an address over DHCP. With this update, a sleep between starting introspection on consecutive nodes is restored, so the nodes no longer compete for DHCP at the same moment. As a result, introspection no longer fails because of DHCP, although it takes a little longer.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-08-05 09:55:07 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
no bootable device for one of the nodes (843.76 KB, application/octet-stream)
2015-06-22 08:08 EDT, Jaromir Coufal
dnsmasq log output (20.08 KB, text/plain)
2015-06-24 14:15 EDT, Ben Nemec


External Trackers
Tracker ID Priority Status Summary Last Updated
Gerrithub.io 237591 None None None Never
Red Hat Product Errata RHEA-2015:1549 normal SHIPPED_LIVE Red Hat Enterprise Linux OpenStack Platform director Release 2015-08-05 13:49:10 EDT

Description Jaromir Coufal 2015-06-22 08:08:09 EDT
Created attachment 1041777
no bootable device for one of the nodes

Description of problem:
I am constantly getting failures for virtual machine introspection. The success rate is about 75 %. The remaining nodes cannot be discovered and report "No bootable device." (screenshot attached). A few minutes ago I was able to discover 15 of 20 nodes.

Version-Release number of selected component (if applicable):
2015-06-17.2 http://openstack.etherpad.corp.redhat.com/rhel-osp-director-puddle-2015-06-17-2

How reproducible:
75 % of the time

Steps to Reproduce:
1. Trigger introspection on multiple nodes.

Actual results:
About 75 % of nodes are introspected successfully.

Expected results:
100 % of nodes are introspected successfully.
Comment 4 Dmitry Tantsur 2015-06-24 04:34:17 EDT
A couple of questions: is it always the same node? Does the same thing happen with deploy?

Also, please attach the output of: $ sudo journalctl -u openstack-ironic-discoverd-dnsmasq

CC'ing Lucas as he may know more about iPXE.
Comment 5 Jaromir Coufal 2015-06-24 06:52:01 EDT
Hey, so...

is it always the same node?
-- no, various nodes, not always the same ones

does the same thing happen with deploy?
-- no, deploy never got stuck with similar issue

I don't have the machine available anymore, so I cannot provide any other logs.
Comment 6 Dmitry Tantsur 2015-06-24 06:55:38 EDT
Ok, I will try to reproduce it myself. In the meantime, if anyone hits the same issue, I badly need logs, so please provide some!
Comment 7 Ben Nemec 2015-06-24 14:15:48 EDT
Created attachment 1042800
dnsmasq log output

This is the openstack-ironic-discoverd-dnsmasq log output from a failing run. The MAC address of the failed node is fa:16:3e:4e:ee:38, and it looks like the same "address in use" problem you mentioned to me before.
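To pull the same log and narrow it to one node, a minimal Python sketch can wrap the journalctl command from Comment 4 and keep only lines that mention the failed node's MAC address. The helper names `read_dnsmasq_journal` and `lines_for_mac` are hypothetical, and the assumption that dnsmasq prints the client MAC and the "address in use" reason on the same log line has not been checked against the attached log.

```python
import subprocess
from typing import Iterable, List

# Unit name from Comment 4; -u and --no-pager are standard journalctl options.
UNIT = "openstack-ironic-discoverd-dnsmasq"


def read_dnsmasq_journal(unit: str = UNIT) -> List[str]:
    """Return the journal of the given systemd unit as a list of lines."""
    result = subprocess.run(
        ["sudo", "journalctl", "-u", unit, "--no-pager"],
        check=True,           # raise if journalctl exits with an error
        capture_output=True,  # Python 3.7+
        text=True,
    )
    return result.stdout.splitlines()


def lines_for_mac(
    lines: Iterable[str], mac: str, marker: str = "address in use"
) -> List[str]:
    """Keep lines that mention `mac` and also contain `marker`.

    Assumption: the MAC and the reason appear on the same line; the default
    marker wording is an assumption, not taken from dnsmasq documentation.
    Matching is case-insensitive.
    """
    mac = mac.lower()
    marker = marker.lower()
    return [
        line for line in lines
        if mac in line.lower() and marker in line.lower()
    ]


# Example (needs sudo and systemd on the undercloud):
#   for line in lines_for_mac(read_dnsmasq_journal(), "fa:16:3e:4e:ee:38"):
#       print(line)
```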
Comment 8 Dmitry Tantsur 2015-06-25 03:58:08 EDT
Exactly. Do you think it's time to redirect this bug to KVM, or to whatever manages the PXE firmware? I think everybody here has reproduced this bug at least once...
Comment 9 Dmitry Tantsur 2015-06-25 04:18:39 EDT
Oh btw, we had a sleep in our scripts:
https://github.com/rdo-management/instack-undercloud/blob/master/scripts/instack-ironic-deployment#L134
which is no longer in the new CLI:
https://github.com/rdo-management/python-rdomanager-oscplugin/blob/master/rdomanager_oscplugin/v1/baremetal.py#L136-L145

We have to bring it back, I'll submit a patch.
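The idea behind the fix, pausing between nodes instead of starting introspection on all of them at once, can be sketched in Python as follows. This is not the code from the patch: `start_introspection_staggered`, the `start_one` callback, and the 5-second default are hypothetical, and the actual delay used by the old script and the merged patch is not stated in this bug.

```python
import time
from typing import Callable, Iterable, List

# Hypothetical value; the real sleep length is not given in this bug.
DEFAULT_DELAY_SECONDS = 5.0


def start_introspection_staggered(
    node_uuids: Iterable[str],
    start_one: Callable[[str], None],
    delay: float = DEFAULT_DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Start introspection on each node, pausing `delay` seconds between nodes.

    Spacing out the starts keeps many VMs from PXE-booting and requesting
    DHCP leases at the same moment. `start_one` stands in for whatever client
    call actually triggers introspection for a single node.
    """
    started: List[str] = []
    for index, uuid in enumerate(node_uuids):
        if index > 0:
            sleep(delay)  # no pause before the first node
        start_one(uuid)
        started.append(uuid)
    return started
```

Injecting `sleep` keeps the loop testable without real delays. With N nodes and a delay of S seconds, the loop pauses N - 1 times, so the last node starts roughly (N - 1) * S seconds later than it would with no delay; that is the "a little longer" trade-off mentioned in the Doc Text.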
Comment 10 Dmitry Tantsur 2015-06-25 04:34:44 EDT
And here's the patch: https://review.gerrithub.io/#/c/237591/
Comment 11 Marios Andreou 2015-06-26 06:00:53 EDT
Heh, Dmitry, it's like deja vu. I was having issues with VM introspection over the last 2 days, with lots of poking yesterday especially. I remember when this happened the first time round and the sleep was added ;)

Glad I came across this; I will try it out, since I am refreshing envs today for the new puddle. (I can only do VM envs.)
Comment 13 Dmitry Tantsur 2015-06-29 04:18:05 EDT
Patch merged, I believe it will work around the problem.
Comment 14 James Slagle 2015-07-01 10:01:23 EDT
*** Bug 1234956 has been marked as a duplicate of this bug. ***
Comment 19 errata-xmlrpc 2015-08-05 09:55:07 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549
