Bug 1389967

Summary: Introspection never completes
Product: Red Hat OpenStack
Reporter: VIKRANT <vaggarwa>
Component: openstack-ironic-python-agent
Assignee: Dmitry Tantsur <dtantsur>
Status: CLOSED NOTABUG
QA Contact: Raviv Bar-Tal <rbartal>
Severity: high
Docs Contact:
Priority: high
Version: 9.0 (Mitaka)
CC: dtantsur, jwang, mburns, raywang, slinaber, vaggarwa
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-03 09:47:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
Description                                   Flags
ipxe wont boot                                none
/var/log/message of the introspected node     none

Description VIKRANT 2016-10-30 04:46:42 UTC
Description of problem:

Overcloud node hardware: IBM x3650 M4

Introspection of the overcloud nodes never completes.

[stack@undercloud ~]$ ironic node-list
+--------------------------------------+------------------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name                   | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------------------------+---------------+-------------+--------------------+-------------+
| 1e72cddd-1063-454d-b76c-82611dd840b9 | overcloud-controller-1 | None          | power off   | available          | False       |
| 96f3c503-59bd-4771-90ee-e3d74af40cbe | overcloud-controller-2 | None          | power off   | available          | False       |
| 70f884be-348c-42fa-9d3d-d5ba4b9a8f8c | overcloud-controller-3 | None          | power off   | available          | False       |
| 2912a594-ba73-4971-92c8-a9f7ea3e17a5 | overcloud-compute-1    | None          | power off   | available          | False       |
| e7e51952-f639-493b-abee-1fe6fc43e6ee | overcloud-compute-2    | None          | power off   | available          | False       |
| a48fc89b-9479-44fb-9bcc-a806da39b659 | overcloud-compute-3    | None          | power off   | available          | False       |
| 7c243947-37c9-4adc-a563-cfcb3ce459c5 | overcloud-storage-1    | None          | power off   | available          | False       |
| 9cb9d69a-7aba-4a91-8d05-0400a0992159 | overcloud-storage-2    | None          | power off   | available          | False       |
| 9f92ebac-b0d7-4a44-98f4-ec5b50fc38ef | overcloud-storage-3    | None          | power off   | available          | False       |
+--------------------------------------+------------------------+---------------+-------------+--------------------+-------------+
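As an illustration, one way to confirm that every node is `available` and not in maintenance before running bulk introspection is to parse the table printed by `ironic node-list`. This is a minimal Python sketch, assuming the output has been captured as text in exactly the layout shown above; `parse_node_list` is a hypothetical helper, not part of any ironic client.

```python
def split_row(line):
    """Split one '| a | b | c |' table row into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_node_list(text):
    """Parse the ASCII table printed by `ironic node-list` into dicts.

    Hypothetical helper: assumes '+---+' border lines, '|'-delimited rows,
    and that the first '|' row is the header, as in the output above.
    """
    rows = [line for line in text.splitlines() if line.strip().startswith("|")]
    if not rows:
        return []
    header = split_row(rows[0])
    return [dict(zip(header, split_row(row))) for row in rows[1:]]


sample = """\
+--------------------------------------+------------------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name                   | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------------------------+---------------+-------------+--------------------+-------------+
| 1e72cddd-1063-454d-b76c-82611dd840b9 | overcloud-controller-1 | None          | power off   | available          | False       |
| 2912a594-ba73-4971-92c8-a9f7ea3e17a5 | overcloud-compute-1    | None          | power off   | available          | False       |
+--------------------------------------+------------------------+---------------+-------------+--------------------+-------------+
"""

nodes = parse_node_list(sample)
# Names of nodes that are not ready for introspection.
not_ready = [n["Name"] for n in nodes
             if n["Provisioning State"] != "available" or n["Maintenance"] != "False"]
print(len(nodes), not_ready)
```

In the listing above all nine nodes pass this check, which is consistent with the problem lying later, in the ramdisk boot and callback, rather than in the node state.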
[stack@undercloud ~]$ openstack baremetal introspection bulk start
Setting nodes for introspection to manageable...
Starting introspection of node: 1e72cddd-1063-454d-b76c-82611dd840b9
Starting introspection of node: 96f3c503-59bd-4771-90ee-e3d74af40cbe
Starting introspection of node: 70f884be-348c-42fa-9d3d-d5ba4b9a8f8c
Starting introspection of node: 2912a594-ba73-4971-92c8-a9f7ea3e17a5
Starting introspection of node: e7e51952-f639-493b-abee-1fe6fc43e6ee
Starting introspection of node: a48fc89b-9479-44fb-9bcc-a806da39b659
Starting introspection of node: 7c243947-37c9-4adc-a563-cfcb3ce459c5
Starting introspection of node: 9cb9d69a-7aba-4a91-8d05-0400a0992159
Starting introspection of node: 9f92ebac-b0d7-4a44-98f4-ec5b50fc38ef
Waiting for introspection to finish...

****It hangs here forever.****

Version-Release number of selected component (if applicable):
RHEL OSP 9 

How reproducible:


Steps to Reproduce:
1. Run introspection on overcloud nodes with IBM x3650 M4 hardware.
2. Introspection never completes.

Actual results:
Introspection never completes.

Expected results:
Introspection completes successfully.

Additional info:

More information coming in internal comments.

Comment 1 VIKRANT 2016-10-30 04:48:06 UTC
Modified agent.ramdisk to set a root password, following the official Red Hat documentation.

Confirmed that the hardware is certified:

https://access.redhat.com/ecosystem/hardware/956863

Comment 3 Dmitry Tantsur 2016-10-31 08:12:23 UTC
Hi! The screenshot was taken too late, when introspection was already finished. Could you please find the moment when it actually tries to post introspection data? Also please attach openstack-ironic-inspector logs.

Comment 10 Dmitry Tantsur 2016-11-02 10:05:56 UTC
Please tell me more about your environment. E.g. it seems like you're using PXE instead of iPXE, why is that? Please attach /etc/ironic-inspector/dnsmasq.conf and /tftpboot/pxelinux.cfg/default.

Comment 12 VIKRANT 2016-11-02 10:14:10 UTC
AFAIK, they switched from iPXE to PXE because of the following bug:

Change ironic to use PXE instead of iPXE, following the ironic-ipxe-to-pxe steps

    https://bugzilla.redhat.com/show_bug.cgi?id=1326086#c4

Comment 13 Dmitry Tantsur 2016-11-02 10:16:06 UTC
I don't think it is needed any more, especially for OSP 9. I'd like to know the details of the original issue then.

Comment 14 Ray Wang 2016-11-02 10:56:16 UTC
Created attachment 1216481 [details]
ipxe wont boot

Comment 15 Ray Wang 2016-11-02 10:57:49 UTC
If we use iPXE instead of PXE, the server does not boot at all.

Comment 16 Dmitry Tantsur 2016-11-02 11:06:10 UTC
Hi! I see that the MAC used for iPXE (98:be:94:41:5d:bb) differs from the one used for PXE (98:be:94:41:5e:9b). Did you change the boot order when switching to PXE? Have you tried making iPXE boot from the latter NIC?

Also please attach the files requested in comment 10.

Comment 17 Ray Wang 2016-11-02 11:43:56 UTC
Created attachment 1216511 [details]
/var/log/message of the introspected node

The first NIC of the IBM x3650 M4 has its own built-in DHCP server and shares its port with IPMI, so the first NIC cannot PXE boot and cannot obtain an IP address over DHCP.

So the customer disabled PXE on the first NIC and enabled PXE on the second NIC.

I followed this article: https://access.redhat.com/articles/2142881

Contents of the default file:

/tftp/pxelinux.cfg/default

default discovery

LABEL discovery
kernel agent.kernel
append initrd=agent.ramdisk inspector_callback_url=http://172.16.0.1:5050/v1/continue RUNBENCH=0
ipappend 3

Comment 18 Dmitry Tantsur 2016-11-02 11:50:48 UTC
inspector_callback_url should be inspection_callback_url :) Chances are high that this will fix your PXE problem. Could you please try? The customer portal documentation is simply wrong; I'll file a separate bug against it.

As for iPXE, I wonder why it switches to the first NIC anyway.

Comment 19 Dmitry Tantsur 2016-11-02 12:09:04 UTC
Oh, sorry, it should actually be ipa-inspection-callback-url.
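Applying this correction to a pxelinux config such as the default file quoted in comment 17 is a one-token rename on the `append` line. A minimal Python sketch of that rename follows; `fix_append_line` and `fix_pxelinux_config` are hypothetical helpers, and the sketch assumes kernel parameters are whitespace-separated `key=value` tokens with no quoting. It touches only the callback parameter and leaves everything else (including `RUNBENCH=0`) as it was.

```python
# Parameter names from this bug: the customer-portal article used
# `inspector_callback_url`; comment 19 gives the name the agent expects.
WRONG_KEY = "inspector_callback_url"
RIGHT_KEY = "ipa-inspection-callback-url"


def fix_append_line(line):
    """Rename the callback parameter on a pxelinux `append` line.

    Returns (new_line, changed). Non-append lines are returned unchanged.
    """
    tokens = line.split()
    # pxelinux keywords are case-insensitive, so accept APPEND as well.
    if not tokens or tokens[0].lower() != "append":
        return line, False
    changed = False
    for i, token in enumerate(tokens):
        key, sep, value = token.partition("=")
        if sep and key == WRONG_KEY:
            tokens[i] = RIGHT_KEY + "=" + value
            changed = True
    return " ".join(tokens), changed


def fix_pxelinux_config(text):
    """Apply fix_append_line to every line of a pxelinux config file."""
    return "\n".join(fix_append_line(line)[0] for line in text.splitlines()) + "\n"


config = """default discovery

LABEL discovery
kernel agent.kernel
append initrd=agent.ramdisk inspector_callback_url=http://172.16.0.1:5050/v1/continue RUNBENCH=0
ipappend 3
"""

print(fix_pxelinux_config(config))
```

Editing the file by hand, as the reporter ended up doing, works just as well; the sketch only makes the single changed token explicit.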

Comment 20 Dmitry Tantsur 2016-11-02 12:15:04 UTC
I've filed a documentation bug: https://bugzilla.redhat.com/show_bug.cgi?id=1391019. Please check if the recommendations there help your case.

Comment 21 Ray Wang 2016-11-03 09:18:18 UTC
Hey

Thanks for the help, "ipa-inspection-callback-url" works for me. I followed the updated document, and introspection now finishes successfully.

Thank you so much again.

Comment 22 Dmitry Tantsur 2016-11-03 09:47:43 UTC
Thanks for confirming!