Bug 1297631

Summary: Introspection of a node (VM) fails consistently
Product: Red Hat OpenStack Reporter: Ruchika K <rkharwar>
Component: ipxe Assignee: Dmitry Tantsur <dtantsur>
Status: CLOSED INSUFFICIENT_DATA QA Contact: yeylon <yeylon>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.0 (Kilo) CC: apevec, athomas, dtantsur, jcoufal, jslagle, lhh, mburns, rhel-osp-director-maint, rkharwar, srevivo, yeylon
Target Milestone: ga Keywords: Reopened
Target Release: 7.0 (Kilo)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-03-04 10:25:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Output on the undercloud of "sudo journalctl -l -u openstack-ironic-discoverd -u openstack-ironic-discoverd-dnsmasq -u openstack-ironic-conductor"
none
screenshot of a node failing introspection (timing out on ipxe/dhcp)
none
undercloud.conf for 7.3 deployment
none

Description Ruchika K 2016-01-12 04:33:43 UTC
Created attachment 1113837 [details]
Output on the undercloud of "sudo journalctl -l -u openstack-ironic-discoverd -u openstack-ironic-discoverd-dnsmasq -u openstack-ironic-conductor"

Description of problem:

Director 7.2 (Latest)
- Latest packages installed
- Undercloud installed in a VM successfully
- A few nodes created in VMs on the same host
- Introspection of a node consistently fails/hangs (this used to work fine in a previous version of Director)

- I can see the node powered on after the introspection is started.
- Logs show a successful DHCP handshake; however, calls to _get_hosts .. for the node under introspection, as well as for the others, continue indefinitely.

Version-Release number of selected component (if applicable):


How reproducible:

100%
Steps to Reproduce:
1. 
2. 
3.

Actual results:


Expected results:


Additional info:

Comment 2 Dmitry Tantsur 2016-01-12 13:04:08 UTC
Hi! I see messages like:

 INFO:ironic_discoverd.utils:Node e49acef4-ab33-4a2f-8f90-18f036f22cb8 is in maintenance mode, skipping provision states check

Could you check why your node ended up in maintenance mode? `ironic node-show e49acef4-ab33-4a2f-8f90-18f036f22cb8` could probably help. Please also paste the node-show output here, as well as the `ironic node-port-list` output for the relevant nodes.

Also please connect to the node console (with virsh or virt-manager) and see what is going on there.
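
To see which nodes the journal reports as being in maintenance mode, something like the following rough sketch could help. It assumes the message wording matches the line quoted above, which may differ between discoverd versions; the helper name `nodes_in_maintenance` is made up for illustration.

```python
import re

# Pattern taken from the log line quoted in this comment; the exact wording
# is an assumption and may vary between ironic-discoverd versions.
MAINTENANCE_RE = re.compile(
    r"Node (?P<uuid>[0-9a-f-]{36}) is in maintenance mode"
)


def nodes_in_maintenance(log_lines):
    """Return UUIDs of nodes reported in maintenance mode, in first-seen order."""
    seen = []
    for line in log_lines:
        match = MAINTENANCE_RE.search(line)
        if match and match.group("uuid") not in seen:
            seen.append(match.group("uuid"))
    return seen


sample = [
    "INFO:ironic_discoverd.utils:Node e49acef4-ab33-4a2f-8f90-18f036f22cb8 "
    "is in maintenance mode, skipping provision states check",
    "INFO:werkzeug: * Running on http://0.0.0.0:5050/",
]
print(nodes_in_maintenance(sample))
# ['e49acef4-ab33-4a2f-8f90-18f036f22cb8']
```

Each UUID it returns can then be passed to `ironic node-show`.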

Comment 3 Ruchika K 2016-01-12 14:59:15 UTC
As per Section 6.2.2 of the Director Installation and Usage guide, the node was set to maintenance mode:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html-single/Director_Installation_and_Usage/index.html#sect-Inspecting_the_Hardware_of_Nodes


$ ironic node-set-maintenance [NODE UUID] true


[stack@undercloud ~]$ ironic node-list
+--------------------------------------+-----------------------+---------------+-------------+-----------------+-------------+
| UUID                                 | Name                  | Instance UUID | Power State | Provision State | Maintenance |
+--------------------------------------+-----------------------+---------------+-------------+-----------------+-------------+
| 4463c11a-b88b-4323-b1d9-f5092caf3e63 | overcloud-ceph1       | None          | power on    | available       | True        |
| c5ec2880-c717-42e5-b249-9c5c8e5e08ed | overcloud-ceph2       | None          | power off   | available       | False       |
| 7a779f7e-0753-4d34-b405-f43bae79766b | overcloud-ceph3       | None          | power off   | available       | False       |
| a0b15bd7-9f6a-48f2-8197-d036da007acb | overcloud-compute1    | None          | power off   | available       | False       |
| 158cb379-8ac4-4ab1-ba66-e14034e6bfad | overcloud-compute2    | None          | power off   | available       | False       |
| 404d32ae-632a-494c-871f-dbab39f86df7 | overcloud-controller1 | None          | power off   | available       | False       |
| 666fd8f0-3090-4191-b7d1-90e9f2c6b285 | overcloud-controller2 | None          | power off   | available       | False       |
| bf4a4ef0-5493-4821-9eb2-368628acd4ee | overcloud-controller3 | None          | power off   | available       | False       |
+--------------------------------------+-----------------------+---------------+-------------+-----------------+-------------+
[stack@undercloud ~]$ ironic node-show 4463c11a-b88b-4323-b1d9-f5092caf3e63
+------------------------+--------------------------------------------------------------------------+
| Property               | Value                                                                    |
+------------------------+--------------------------------------------------------------------------+
| target_power_state     | None                                                                     |
| extra                  | {u'on_discovery': u'true'}                                               |
| last_error             | None                                                                     |
| updated_at             | 2016-01-11T22:16:24+00:00                                                |
| maintenance_reason     | None                                                                     |
| provision_state        | available                                                                |
| uuid                   | 4463c11a-b88b-4323-b1d9-f5092caf3e63                                     |
| console_enabled        | False                                                                    |
| target_provision_state | None                                                                     |
| maintenance            | True                                                                     |
| inspection_started_at  | None                                                                     |
| inspection_finished_at | None                                                                     |
| power_state            | power on                                                                 |
| driver                 | pxe_ssh                                                                  |
| reservation            | None                                                                     |
| properties             | {u'memory_mb': u'1024', u'cpu_arch': u'x86_64', u'local_gb': u'10',      |
|                        | u'cpus': u'1', u'capabilities': u'boot_option:local'}                    |
| instance_uuid          | None                                                                     |
| name                   | overcloud-ceph1                                                          |
| driver_info            | {u'ssh_username': u'stack', u'deploy_kernel': u'529dc2ac-ee51-4367-adfc- |
|                        | 3a9e086230c6', u'deploy_ramdisk': u'c9213d32-95b0-4dad-                  |
|                        | 9f62-8d9f6aeb5623', u'ssh_key_contents': u'-----BEGIN RSA PRIVATE KEY--- |
|                        | --                                                                       |
|                        | MIIEogIBAAKCAQEA4cmvRw3R4M2dIu6OpGdPURjHgnUzjHcNnIbqnLxuzEA1zdGp         |
|                        | 7l                                                                       |
|                        | RwPc35dnXmt09xbGPJ7lKUxIDiCnBMl9quWKPR+M2u74BRpypHx1Z32Wmwg4Of           |
|                        | w5q+j6Al                                                                 |
|                        | 3yl25UiNdAtsSiMaTcHkAfcpZgf42bt6eRxTehEV/mKfOTmaqw7FW0uH                 |
|                        | vnGvod65S+FU57                                                           |
|                        | v8cg8HrupWk6eXxiGTMRHIyaubJ0823emJgb7qjkbLpnn8Du1g                       |
|                        | D6xtXkT1/+nuKavwWnLe                                                     |
|                        | dSTVVFKFGa8ugmHgeaBM4QBE8kuSKqHfAdtdNF9wkoPr                             |
|                        | skHks3uzOXQysyh1w8O10eRgXf                                               |
|                        | XmEmqTlGVx7wIDAQABAoIBACUjo60wXMF5kMta                                   |
|                        | KiRoyecxCEAxPxVvz9Fbb+PwKtl2BmOg                                         |
|                        | hS8qvHuuEcamhhjI/IMzttd4xfe8q3HE                                         |
|                        | HxUrZ1o1OCiQzKGgnc29aqkjU/tzIxG+6Nyn64                                   |
|                        | h8cz5N97ynPn1EE7/uHjmEFxkr                                               |
|                        | qqeZ0BkgeXjKbAC8Jr39QuuKyiIwsK8DPK8MFAIP9/HF                             |
|                        | TNmasBeF9y8S1i9a01rE                                                     |
|                        | L27Ztao7qMhqBg2Gfrs8BSTmy38t7BuwakeIkbajckUSaT1DIv                       |
|                        | 9Ou8oQcvYt+n4P                                                           |
|                        | ZX0e18G9NIOEW+nuMB8kFvm2i9bnZsZkpnAUzxEcSNHU/gkM2k+psxqF                 |
|                        | OwZebwU5                                                                 |
|                        | O4mNCcECgYEA9tPrTT93nUxyZjI9wewN4r/XVc8gwKFBYWqlHe5mELSCjMYs9U           |
|                        | Uy                                                                       |
|                        | ug3tPn8F9Ui5Xk2EoHNHkMyql+XoAWsi/uFZPwbwpG0q3x+9ismboQu5pp5gc4WH         |
|                        | f9                                                                       |
|                        | NQ9YCzOUZg1iM4XHNr/qvS1PBkh77aYAYLDUzbKJ5hiqpQYSBhy4cCgYEA6i2c           |
|                        | jgiZrbAL                                                                 |
|                        | ez1rwJrM7pzkCefonii1wA5dWAa0WdpstoLiRNZBTP00jGfhjV3QTn62                 |
|                        | mBiS/1qrurSKj8                                                           |
|                        | GQ0kIPMftzDO8cnQvaMALDqYUmnqTS78R8x6WVWaVP55N8hWYw                       |
|                        | j/zuC+K5QYAc3+kfXVoT                                                     |
|                        | Z1nOqrJXVwuTSsy40FkCgYAjB55sXyaFr3TI5jZ3kB3E                             |
|                        | YX+ZEQVP8VLLFYyLe+sGUef5PK                                               |
|                        | LiyEhTuWhDJ1ncHs8YAB5jexjcBv/rANj1YpQb                                   |
|                        | 4jV9SWnbnBaqheGrkcNBjt1xNSbxHjFF                                         |
|                        | xeLGhNZquX9CxMrZ7BOWmCIa0GckEMUD                                         |
|                        | PbhR0eeEkz26pUM1FZhrfwKBgEds1ARKQT1Fpa                                   |
|                        | rYKAZd8MWSmscesceTmSPT/cp8                                               |
|                        | eQOy6Feegg8G3nHyBNYSVSw+Aev/IAgx7pvt9tUCfgSs                             |
|                        | wFQxC9tt20CFqc+IrurX                                                     |
|                        | 3P/WedoHYcL5xilKqsvl7QIv7NnvOj6goaaEZ4a/4Y611vgtIh                       |
|                        | /yt2M+8/67rBgz                                                           |
|                        | auc5AoGAZ4NkMQBd+J2FQ5brjy0NNn61Dv+UGdyFpRwyzWDEzew31yrA                 |
|                        | ghU/0dHc                                                                 |
|                        | L2WYFT4rANJ7RP7z8K9M7Xvue21NbXWXy3mCV00OQjxzp4WmEWK2LLyGO2UVFJ           |
|                        | hp                                                                       |
|                        | 66MlzDT80sIZ6LQRPQ5IdePou82z3p6NNcRpsezFzc8XHFMgi7o=                     |
|                        | -----END RSA                                                             |
|                        | PRIVATE KEY-----', u'ssh_virt_type': u'virsh', u'ssh_address':           |
|                        | u'192.168.122.1'}                                                        |
| created_at             | 2016-01-11T22:12:33+00:00                                                |
| driver_internal_info   | {}                                                                       |
| chassis_uuid           |                                                                          |
| instance_info          | {}                                                                       |
+------------------------+--------------------------------------------------------------------------+



[stack@undercloud ~]$ ironic node-port-list 4463c11a-b88b-4323-b1d9-f5092caf3e63
+--------------------------------------+-------------------+
| UUID                                 | Address           |
+--------------------------------------+-------------------+
| f0d33ee9-1175-407d-9a78-3edc2002fef1 | 52:54:00:c6:aa:06 |
+--------------------------------------+-------------------+
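
For reference, the maintenance flags in a pasted `ironic node-list` table can be picked out with a small parser. This is only a sketch that assumes the ASCII table layout shown above; `parse_cli_table` is a hypothetical helper, not part of the ironic client, and the sample is trimmed to three columns.

```python
def parse_cli_table(text):
    """Parse an ASCII table as printed by the CLI into a list of row dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders, prompts and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first data line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# Two rows copied from the node-list output above, reduced to three columns.
sample_table = """
+-----------------+-------------+-------------+
| Name            | Power State | Maintenance |
+-----------------+-------------+-------------+
| overcloud-ceph1 | power on    | True        |
| overcloud-ceph2 | power off   | False       |
+-----------------+-------------+-------------+
"""

in_maintenance = [
    row["Name"] for row in parse_cli_table(sample_table)
    if row["Maintenance"] == "True"
]
print(in_maintenance)
# ['overcloud-ceph1']
```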

Comment 4 Ruchika K 2016-01-12 19:33:28 UTC
There is no virsh console output from the node being introspected.

Comment 5 Dmitry Tantsur 2016-01-13 10:12:08 UTC
Please use virt-manager then.

As for maintenance mode, I think the documentation is outdated, but it should not affect the introspection that much.

Comment 6 Angus Thomas 2016-02-04 14:29:21 UTC
Hi. I'm closing this since we weren't able to diagnose or reproduce the problem.

If you're able to reproduce the bug on the latest version, and to provide the diagnostic information requested, please feel free to re-open the bug.

Comment 7 Ruchika K 2016-03-03 19:40:52 UTC
Created attachment 1132902 [details]
screenshot of a node failing introspection (timing out on ipxe/dhcp)

Comment 8 Ruchika K 2016-03-03 19:43:00 UTC
The attachment is for a new VM-based deployment:
- with director 7.3 images
- latest public director install.

Comment 9 Ruchika K 2016-03-04 03:01:10 UTC
The logs indicate that there was no DHCP range available for br-ctlplane, which is odd (since the undercloud.conf specifies the DHCP range).
Attaching it.
Not sure whether the warnings about being unable to connect to Ironic or Keystone at the beginning of the log are really worth treating as errors.

-- Logs begin at Thu 2016-03-03 14:24:52 EST, end at Thu 2016-03-03 21:55:09 EST. --
Mar 03 14:25:05 undercloud systemd[1]: Started Hardware introspection service for OpenStack Ironic.
Mar 03 14:25:05 undercloud systemd[1]: Starting Hardware introspection service for OpenStack Ironic...
Mar 03 14:25:17 undercloud ironic-discoverd[507]: WARNING:ironic_discoverd.main:Unable to connect to Ironic or Keystone, retrying 6 times more: Authorization Failed: Unable t
Mar 03 14:25:27 undercloud ironic-discoverd[507]: WARNING:ironic_discoverd.main:Unable to connect to Ironic or Keystone, retrying 5 times more: Authorization Failed: Unable t
Mar 03 14:25:36 undercloud systemd[1]: Starting PXE boot dnsmasq service for ironic-discoverd...
Mar 03 14:25:36 undercloud dnsmasq[2599]: started, version 2.66 DNS disabled
Mar 03 14:25:36 undercloud dnsmasq[2599]: compile time options: IPv6 GNU-getopt DBus no-i18n IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth
Mar 03 14:25:36 undercloud dnsmasq-dhcp[2599]: DHCP, IP range 192.0.2.100 -- 192.0.2.120, lease time 2m
Mar 03 14:25:36 undercloud dnsmasq-tftp[2599]: TFTP root is /tftpboot
Mar 03 14:25:37 undercloud ironic-discoverd[507]: WARNING:ironic_discoverd.main:Unable to connect to Ironic or Keystone, retrying 4 times more: Authorization Failed: Unable t
Mar 03 14:25:37 undercloud systemd[1]: Started PXE boot dnsmasq service for ironic-discoverd.
Mar 03 14:25:47 undercloud ironic-discoverd[507]: WARNING:ironic_discoverd.main:Unable to connect to Ironic or Keystone, retrying 3 times more: Authorization Failed: Unable t
Mar 03 14:25:57 undercloud ironic-discoverd[507]: WARNING:ironic_discoverd.main:Unable to connect to Ironic or Keystone, retrying 2 times more: Authorization Failed: Unable t
Mar 03 14:26:07 undercloud ironic-discoverd[507]: WARNING:ironic_discoverd.main:Unable to connect to Ironic or Keystone, retrying 1 times more: Authorization Failed: Unable t
Mar 03 14:26:42 undercloud ironic-discoverd[507]: INFO:ironic_discoverd.main:Enabled processing hooks: ['ramdisk_error', 'root_device_hint', 'scheduler', 'validate_interfaces
Mar 03 14:26:42 undercloud ironic-discoverd[507]: INFO:werkzeug: * Running on http://0.0.0.0:5050/
Mar 03 14:27:56 undercloud ironic-discoverd[507]: INFO:werkzeug:192.168.122.240 - - [03/Mar/2016 14:27:56] "POST /v1/introspection/f59938c4-ee1e-4527-8607-5fab74d82fad HTTP/1
Mar 03 14:27:58 undercloud ironic-discoverd[507]: INFO:ironic_discoverd.introspect:Whitelisting MAC's [u'52:54:00:29:67:56'] for node f59938c4-ee1e-4527-8607-5fab74d82fad on 
Mar 03 14:28:05 undercloud ironic-discoverd[507]: INFO:werkzeug:192.168.122.240 - - [03/Mar/2016 14:28:05] "POST /v1/introspection/03bc3679-e36e-4023-942c-c26fd994a409 HTTP/1
Mar 03 14:28:07 undercloud ironic-discoverd[507]: INFO:ironic_discoverd.introspect:Whitelisting MAC's [u'52:54:00:48:c6:2f'] for node 03bc3679-e36e-4023-942c-c26fd994a409 on 
Mar 03 14:28:14 undercloud ironic-discoverd[507]: INFO:werkzeug:192.168.122.240 - - [03/Mar/2016 14:28:14] "POST /v1/introspection/73ce15ee-6bea-4262-95e8-ff5a700f2097 HTTP/1
Mar 03 14:28:15 undercloud ironic-discoverd[507]: INFO:ironic_discoverd.introspect:Whitelisting MAC's [u'52:54:00:26:0b:f0'] for node 73ce15ee-6bea-4262-95e8-ff5a700f2097 on 
Mar 03 14:28:19 undercloud ironic-discoverd[507]: INFO:werkzeug:192.168.122.240 - - [03/Mar/2016 14:28:19] "GET /v1/introspection/f59938c4-ee1e-4527-8607-5fab74d82fad HTTP/1.
Mar 03 14:28:19 undercloud ironic-discoverd[507]: INFO:werkzeug:192.168.122.240 - - [03/Mar/2016 14:28:19] "GET /v1/introspection/03bc3679-e36e-4023-942c-c26fd994a409 HTTP/1.
Mar 03 14:28:19 undercloud ironic-discoverd[507]: INFO:werkzeug:192.168.122.240 - - [03/Mar/2016 14:28:19] "GET /v1/introspection/73ce15ee-6bea-4262-95e8-ff5a700f2097 HTTP/1.
Mar 03 14:28:20 undercloud dnsmasq-dhcp[2599]: no address range available for DHCP request via br-ctlplane
Mar 03 14:28:24 undercloud dnsmasq-dhcp[2599]: no address range available for DHCP request via br-ctlplane
Mar 03 14:28:28 undercloud dnsmasq-dhcp[2599]: no address range available for DHCP request via br-ctlplane
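
A quick way to spot this condition in a longer journal is to count the dnsmasq complaints per interface. The snippet below is a minimal sketch that assumes the message text is exactly as in the log above; the function name `count_unserved_requests` is made up for illustration.

```python
import re
from collections import Counter

# Message wording copied from the dnsmasq-dhcp lines in the log above.
NO_RANGE_RE = re.compile(
    r"no address range available for DHCP request via (?P<iface>\S+)"
)


def count_unserved_requests(log_lines):
    """Count DHCP requests dnsmasq could not serve, grouped by interface."""
    counts = Counter()
    for line in log_lines:
        match = NO_RANGE_RE.search(line)
        if match:
            counts[match.group("iface")] += 1
    return counts


sample = [
    "Mar 03 14:28:20 undercloud dnsmasq-dhcp[2599]: no address range available for DHCP request via br-ctlplane",
    "Mar 03 14:28:24 undercloud dnsmasq-dhcp[2599]: no address range available for DHCP request via br-ctlplane",
    "Mar 03 14:28:28 undercloud dnsmasq-dhcp[2599]: no address range available for DHCP request via br-ctlplane",
    "Mar 03 14:25:36 undercloud dnsmasq-tftp[2599]: TFTP root is /tftpboot",
]
print(dict(count_unserved_requests(sample)))
# {'br-ctlplane': 3}
```

A non-zero count for an interface suggests that the range dnsmasq was started with does not match that interface's subnet.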

Comment 10 Ruchika K 2016-03-04 03:01:52 UTC
Created attachment 1133027 [details]
undercloud.conf for 7.3 deployment

Comment 11 Dmitry Tantsur 2016-03-04 10:25:48 UTC
Ruchika, hi!

First, your bug is a separate thing, so let's keep this one closed.

Next, you don't seem to override the discovery_iprange variable in your undercloud.conf, meaning it still comes from the 192.0.2.0 network, which is not the same as your network on br-ctlplane. Setting e.g. discovery_iprange=172.16.0.121,172.16.0.250 should fix your problem.
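
As a rough illustration of the mismatch, the check below tests whether a "start,end" range lies inside a given network. It is only a sketch: the br-ctlplane network of 172.16.0.0/24 is an assumption inferred from the example range, not something stated in the attached undercloud.conf, and `range_in_network` is a hypothetical helper.

```python
import ipaddress


def range_in_network(ip_range, network):
    """Return True if both ends of a 'start,end' range lie inside the network."""
    start_text, end_text = (part.strip() for part in ip_range.split(","))
    start = ipaddress.ip_address(start_text)
    end = ipaddress.ip_address(end_text)
    net = ipaddress.ip_network(network, strict=False)
    return start <= end and start in net and end in net


# Assumed network on br-ctlplane (hypothetical, for illustration only).
ASSUMED_CTLPLANE_NETWORK = "172.16.0.0/24"

# Range dnsmasq reported at startup in the log from comment 9.
print(range_in_network("192.0.2.100,192.0.2.120", ASSUMED_CTLPLANE_NETWORK))
# False

# Example range suggested above.
print(range_in_network("172.16.0.121,172.16.0.250", ASSUMED_CTLPLANE_NETWORK))
# True
```

When the range falls outside the subnet configured on the interface, dnsmasq has no address to offer there, which would be consistent with the "no address range available" messages.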

Comment 12 Ruchika K 2016-03-04 19:13:51 UTC
Thank you, that was it. I see progress on the discovery.