Bug 1323444

Summary: Introspection completes but with traceback: Request returned failure status. Error contacting Ironic server
Product: Red Hat OpenStack
Reporter: Tzach Shefi <tshefi>
Component: rhosp-director
Assignee: Angus Thomas <athomas>
Status: CLOSED CURRENTRELEASE
QA Contact: Arik Chernetsky <achernet>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 8.0 (Liberty)
CC: dbecker, dsneddon, jcoufal, mburns, morazi, rhel-osp-director-maint
Target Milestone: ---
Target Release: 10.0 (Newton)
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-10-14 16:41:06 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Tzach Shefi 2016-04-03 05:51:37 UTC
Description of problem: While introspecting two BM nodes, one Intel and one AMD, the AMD node causes errors/warnings to appear, even though introspection is reported as completed successfully for both nodes.

I've installed/created the images on seal13 (Intel CPU); on the previous attempt, which produced the same output, OSPD was installed on an AMD node. Not sure it's relevant, just adding info.

Version-Release number of selected component (if applicable):
RHEL7.2 
[root@seal13 ~]# rpm -qa | grep ironic
openstack-ironic-conductor-4.2.2-4.el7ost.noarch
python-ironicclient-0.8.1-1.el7ost.noarch
openstack-ironic-api-4.2.2-4.el7ost.noarch
python-ironic-inspector-client-1.2.0-6.el7ost.noarch
openstack-ironic-inspector-2.2.5-2.el7ost.noarch
openstack-ironic-common-4.2.2-4.el7ost.noarch
diskimage-builder-1.11.1-1.el7ost.noarch
openstack-tripleo-image-elements-0.9.9-1.el7ost.noarch
genisoimage-1.1.11-23.el7.x86_64


How reproducible:
Happened twice, on two separate builds: the current one and the previous one.


Steps to Reproduce:
1. Install OSPD
2. Add two or more BM nodes, Intel and AMD mix.
3. Introspect the nodes; an error appears for the AMD node.

Actual results:

[stack@seal13 ~]$ openstack baremetal introspection bulk start
Setting nodes for introspection to manageable...
Starting introspection of node: c009328b-f0d8-485e-b27d-91c8c8b48fe5
Starting introspection of node: 3d9c2f3e-15d4-41f9-8016-ee8f758f7c1a
Waiting for introspection to finish...
Introspection for UUID c009328b-f0d8-485e-b27d-91c8c8b48fe5 finished successfully.
Introspection for UUID 3d9c2f3e-15d4-41f9-8016-ee8f758f7c1a finished successfully.
Setting manageable nodes to available...
Node c009328b-f0d8-485e-b27d-91c8c8b48fe5 has been set to available.
Request returned failure status.
Error contacting Ironic server: Node 3d9c2f3e-15d4-41f9-8016-ee8f758f7c1a is locked by host seal13.qa.lab.tlv.redhat.com, please retry after the current operation is completed.
Traceback (most recent call last):

The rest of the traceback is posted here: 
http://pastebin.test.redhat.com/361980


Expected results:
If introspection finishes successfully, I don't expect to see any alarming errors/warnings.

Additional info:
This bug came out of comment:
https://bugzilla.redhat.com/show_bug.cgi?id=1316550#c37

Comment 2 Tzach Shefi 2016-04-03 06:38:25 UTC
To further pinpoint the issue, I deleted both nodes used before (Intel + AMD) and imported a new instack file with only two Intel nodes.

Introspection completed, but with the same errors/warnings:
http://pastebin.test.redhat.com/361982 

Updating the bug title; this is not related to the AMD CPU.

Comment 3 Mike Burns 2016-04-07 21:36:02 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 4 Dan Sneddon 2016-10-14 16:41:06 UTC
(In reply to Tzach Shefi from comment #0)
> Description of problem: While introspecting two BM nodes, one Intel and one
> AMD, the AMD nodes causes errors/warnings to show up, even thou it says
> introspection completed successfully for both nodes. 
> 
> I've installed/created images on seal13 Intel CPU, on the previous attempt
> same output OPSD was installed on an AMD node, not sure it's relevant just
> adding info. 

The messages indicating that a node was locked can be safely ignored. They were caused by a race condition that produced spurious warnings but had no ill effects. That race condition has since been fixed, and these warnings no longer occur.
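For anyone scripting against a version that still shows this behavior, the "is locked by host ..., please retry" message is transient, so the usual workaround is to retry the state-change call with backoff until the lock is released. Below is a minimal, generic sketch of that pattern; `NodeLockedError` and `retry_on_node_locked` are hypothetical names for illustration, not part of the Ironic client API.

```python
import time


class NodeLockedError(Exception):
    """Stand-in for the error Ironic returns when a node is locked
    by another in-progress operation (hypothetical name)."""


def retry_on_node_locked(call, attempts=5, delay=1.0, backoff=2.0):
    """Invoke `call`, retrying with exponential backoff while it
    raises NodeLockedError. Re-raises after the final attempt."""
    for attempt in range(attempts):
        try:
            return call()
        except NodeLockedError:
            if attempt == attempts - 1:
                raise  # lock never cleared within the retry budget
            time.sleep(delay)
            delay *= backoff


# Example: a node whose lock clears after two failed attempts.
state = {"locked_tries_left": 2}

def set_available():
    if state["locked_tries_left"] > 0:
        state["locked_tries_left"] -= 1
        raise NodeLockedError("Node is locked by host seal13, please retry")
    return "available"

result = retry_on_node_locked(set_available, delay=0.01)
```

With a real client, `call` would wrap the provision-state request for the affected node UUID; the sketch only shows the retry discipline.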