Bug 1234343

Summary: 75 % success for introspection (VM)
Product: Red Hat OpenStack Reporter: Jaromir Coufal <jcoufal>
Component: python-rdomanager-oscplugin Assignee: Dmitry Tantsur <dtantsur>
Status: CLOSED ERRATA QA Contact: Toure Dunnon <tdunnon>
Severity: unspecified Docs Contact:
Priority: high    
Version: DirectorCC: bnemec, calfonso, dnavale, jcoufal, jliberma, jslagle, jtrowbri, lmartins, mandreou, mburns, psedlak, rhel-osp-director-maint, tdunnon, yeylon
Target Milestone: ga Keywords: Triaged
Target Release: Director   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: python-rdomanager-oscplugin-0.0.8-14.el7ost Doc Type: Bug Fix
Doc Text:
Issues in the KVM PXE code caused failures when too many nodes tried to PXE-boot simultaneously, resulting in some nodes failing to connect to DHCP. With this update, the sleep value between node boots is increased, allowing introspection to complete on all nodes. As a result, DHCP is no longer an issue, although introspection takes a little longer.
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-08-05 13:55:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                              Flags
no bootable device for one of the nodes  none
dnsmasq log output                       none

Description Jaromir Coufal 2015-06-22 12:08:09 UTC
Created attachment 1041777 [details]
no bootable device for one of the nodes

Description of problem:
I am constantly getting failures during introspection of virtual machines. The success rate is about 75 %. The remaining nodes do not get discovered and return "No bootable device." (screenshot attached). A few minutes ago I was able to discover 15 nodes out of 20.

Version-Release number of selected component (if applicable):
2015-06-17.2 http://openstack.etherpad.corp.redhat.com/rhel-osp-director-puddle-2015-06-17-2

How reproducible:
75 % of the time

Steps to Reproduce:
1. Trigger introspection on multiple virtual nodes (for example, 20 nodes, as in the description above)

Actual results:
About 75 % success rate

Expected results:
100 % success rate

Comment 4 Dmitry Tantsur 2015-06-24 08:34:17 UTC
A couple of questions: is it always the same node? Does the same thing happen with deploy?

Also, please attach the output of: $ sudo journalctl -u openstack-ironic-discoverd-dnsmasq

CC'ing Lucas as he may know more about iPXE.

Comment 5 Jaromir Coufal 2015-06-24 10:52:01 UTC
Hey, so...

is it always the same node?
-- no, various nodes, not always the same ones

does the same thing happen with deploy?
-- no, deploy never got stuck with similar issue

I don't have the machine available anymore, so I cannot provide any other logs.

Comment 6 Dmitry Tantsur 2015-06-24 10:55:38 UTC
Ok, I will try to reproduce it myself. In the meantime, if someone has the same issue, I badly need logs, so please provide some!

Comment 7 Ben Nemec 2015-06-24 18:15:48 UTC
Created attachment 1042800 [details]
dnsmasq log output

This is the openstack-ironic-discoverd-dnsmasq log output from a failing run.  The MAC address of the failed node is fa:16:3e:4e:ee:38, and it looks like it's the same "address in use" problem you had mentioned to me before.
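
For reference, here is a quick way to pull the relevant lines out of a saved copy of that journal output. This is only a triage sketch: the log file name and the exact "in use" wording are assumptions, not taken from the attachment, so adjust them to match what dnsmasq actually logged.

# Triage sketch, assuming the journalctl output was saved to a text file.
# The file name and the "in use" message wording are assumptions.
FAILED_MAC = "fa:16:3e:4e:ee:38"

with open("discoverd-dnsmasq.log") as log:
    for line in log:
        # Print any line mentioning the failed node or an address conflict.
        if FAILED_MAC in line or "in use" in line.lower():
            print(line.rstrip())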

Comment 8 Dmitry Tantsur 2015-06-25 07:58:08 UTC
Exactly. Do you think it's a good time to redirect this bug to KVM, or whatever component manages the PXE firmware? I think everybody here has reproduced this bug at least once...

Comment 9 Dmitry Tantsur 2015-06-25 08:18:39 UTC
Oh btw, we had a sleep in our scripts:
https://github.com/rdo-management/instack-undercloud/blob/master/scripts/instack-ironic-deployment#L134
which is no longer present in the new CLI:
https://github.com/rdo-management/python-rdomanager-oscplugin/blob/master/rdomanager_oscplugin/v1/baremetal.py#L136-L145

We have to bring it back, I'll submit a patch.
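
For anyone reading along, here is a minimal sketch of what that sleep buys us. The node list, the introspect() placeholder, and the delay value are illustrative assumptions only, not the actual CLI code; the real change is in the review linked in the next comment.

import time

# Illustrative assumptions only: placeholder UUIDs and delay, plus a stub in
# place of whatever call actually starts introspection in the CLI.
NODE_UUIDS = ["uuid-1", "uuid-2", "uuid-3"]
BOOT_DELAY_SECONDS = 15


def introspect(node_uuid):
    """Stub standing in for the real introspection call."""
    print("starting introspection on %s" % node_uuid)


for uuid in NODE_UUIDS:
    introspect(uuid)
    # Staggering the boots keeps the VMs from all requesting a DHCP lease at
    # the same moment, which is what produced the "No bootable device" errors.
    time.sleep(BOOT_DELAY_SECONDS)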

Comment 10 Dmitry Tantsur 2015-06-25 08:34:44 UTC
And here's the patch: https://review.gerrithub.io/#/c/237591/

Comment 11 Marios Andreou 2015-06-26 10:00:53 UTC
Heh Dmitry, it's like deja vu. I was having issues with VM introspection for the last 2 days, especially yesterday (lots of poking). I remember when this happened the first time round and the sleep was added ;)

Glad I came across this; I will try it out since I am refreshing envs today for the puddle. (I can only do VM envs.)

Comment 13 Dmitry Tantsur 2015-06-29 08:18:05 UTC
Patch merged; I believe it will work around the problem.

Comment 14 James Slagle 2015-07-01 14:01:23 UTC
*** Bug 1234956 has been marked as a duplicate of this bug. ***

Comment 19 errata-xmlrpc 2015-08-05 13:55:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549