Bug 1249139 - ironic: deployment fails on HP hardware with pxe_ipmitool
ironic: deployment fails on HP hardware with pxe_ipmitool
Status: CLOSED NOTABUG
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director (Show other bugs)
Director
Unspecified Unspecified
high Severity medium
: ---
: 10.0 (Newton)
Assigned To: chris alfonso
Shai Revivo
https://blueprints.launchpad.net/nova...
upstream_milestone_none upstream_defi...
: Reopened
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2015-07-31 11:33 EDT by Nicolas Hicher
Modified: 2016-06-30 00:01 EDT (History)
15 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-06-15 10:07:13 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
the json file used for the inspection (3.11 KB, text/plain)
2015-07-31 11:33 EDT, Nicolas Hicher
no flags Details
the undercloud-debug log file (738.24 KB, text/plain)
2015-07-31 11:36 EDT, Nicolas Hicher
no flags Details
nova-compute.log (3.31 MB, text/plain)
2015-08-25 08:31 EDT, Gaëtan Trellu
no flags Details
ironic-conductor.log (7.61 MB, text/plain)
2015-08-25 08:32 EDT, Gaëtan Trellu
no flags Details

  None (edit)
Description Nicolas Hicher 2015-07-31 11:33:30 EDT
Created attachment 1058073 [details]
the json file used for the inspection

Description of problem:

When I try to deploy the overcloud on  9 HP servers (6 ProLiant DL360 Gen9 for computes and controllers, 3 ProLiant DL380 Gen9 for ceph), the deployment fails.


Version-Release number of selected component (if applicable):


How reproducible:

everytime on this hardware


Steps to Reproduce:

1. import env.json (openstack baremetal import --json director-hp-pod/hp-pod-env.json)
2. start instrospection (openstack baremetal introspection bulk start)
3.matching

sudo yum install -y ahc-tools
sudo -i
mkdir -p /etc/ahc-tools
sed 's/\[discoverd/\[ironic/' /etc/ironic-discoverd/discoverd.conf > /etc/ahc-tools/ahc-tools.conf
chmod 0600 /etc/ahc-tools/ahc-tools.conf
# set specs and cmdb. after git it
cp ~stack/director-hp-pod/edeploy/* /etc/ahc-tools/edeploy/
instack-ironic-deployment --configure-nodes

# create flavor
openstack flavor create --id auto --ram 4096 --disk 40 --vcpus 1 control
openstack flavor create --id auto --ram 4096 --disk 40 --vcpus 1 compute
openstack flavor create --id auto --ram 4096 --disk 40 --vcpus 1 ceph
openstack flavor create --id auto --ram 4096 --disk 40 --vcpus 1 baremetal

openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" --property "capabilities:profile"="compute" compute
openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" --property "capabilities:profile"="control" control
openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" --property "capabilities:profile"="ceph" ceph
openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" baremetal
 
neutron subnet-update $(neutron subnet-list | grep -o -P "\w{8}(-\w{4}){3}-\w{12}") --dns-nameserver 8.8.8.8
plan=$(openstack management plan list | grep -o -P "\w{8}(-\w{4}){3}-\w{12}")

openstack management plan set -F Compute-1=compute $plan
openstack management plan set -F Controller-1=control $plan
openstack management plan set -F Ceph-Storage-1=ceph $plan

sudo ahc-match

4. check if matching is ok:
[stack@instack-bm ~]$ for uuid in $(openstack baremetal list -c UUID | awk '{print $2}' | grep -ve UUID -ve ^$); do echo $uuid; ironic node-show $uuid | grep properties -A 1 | grep -oE "u'profile:.*,"; echo; done                                                                                                          
09dac700-7730-4b43-84fb-a13a2040348e
u'profile:compute,

fcb90164-1eb0-4635-a98f-472018933a5e
u'profile:compute,

479319cc-aae9-48ee-a49a-bb7dc58da2e2
u'profile:compute,

a472a01b-4437-4745-82eb-969b942f304c
u'profile:ceph,

9d35dd62-af5c-4a24-85ae-616abfa137d0
u'profile:ceph,

e1374019-7495-4c0b-8c6a-4457ce813220
u'profile:ceph,

4d15e390-091e-41d1-a614-818431b921c4
u'profile:control,

4aaf3cbf-1233-42d9-b89b-a182e7c87d6c
u'profile:control,

0aceb3c3-3755-4a51-a642-d27efd14a8a8
u'profile:control,

5. deploy

openstack overcloud deploy --control-scale 3 --compute-scale 3 --ceph-storage-scale 3 --libvirt-type kvm --plan overcloud --neutron-network-type vxlan --neutron-tunnel-types vxlan -e /home/stack/director-hp-pod/environments/network_environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/director-hp-pod/environments/user_reg.yaml

Actual results:

The deployment fails. here the output at the beginning of the deployment:

[stack@instack-bm ~]$ ironic node-list
 
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
| UUID                                 | Name | Instance UUID                        | Power State | Provision State | Maintenance |
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
| 09dac700-7730-4b43-84fb-a13a2040348e | None | 3328f883-d67f-4182-93e6-28a59f8b479b | power off   | deploying       | False       |
| fcb90164-1eb0-4635-a98f-472018933a5e | None | 0feec7e7-c8ab-4d32-b9ba-5fd8a67572c0 | power off   | deploying       | False       |
| 479319cc-aae9-48ee-a49a-bb7dc58da2e2 | None | e5ee48cc-f902-4915-bb7c-d9f25a437ff4 | power off   | deploying       | False       |
| a472a01b-4437-4745-82eb-969b942f304c | None | 5ce973ae-83d6-4ee1-b4dd-7f7440443a5e | power off   | deploying       | False       |
| 9d35dd62-af5c-4a24-85ae-616abfa137d0 | None | e1675d07-0bb0-4a73-8514-d826f3702180 | power off   | deploying       | False       |
| e1374019-7495-4c0b-8c6a-4457ce813220 | None | 641ef2a7-7a12-4034-8ac1-104b2c941525 | power off   | deploying       | False       |
| 4d15e390-091e-41d1-a614-818431b921c4 | None | 9db267ea-8792-41fb-93bc-b003d244bafc | power off   | deploying       | False       |
| 4aaf3cbf-1233-42d9-b89b-a182e7c87d6c | None | 49531767-74ab-4769-8a6d-5a539af8791a | power off   | deploying       | False       |
| 0aceb3c3-3755-4a51-a642-d27efd14a8a8 | None | 7f9754e3-0782-46c8-b0ba-c1bca0bb0753 | power off   | deploying       | False       |
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+

[stack@instack-bm ~]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+------------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks               |
+--------------------------------------+-------------------------+--------+------------+-------------+------------------------+
| 641ef2a7-7a12-4034-8ac1-104b2c941525 | overcloud-cephstorage-0 | BUILD  | spawning   | NOSTATE     | ctlplane=10.199.228.18 |
| 5ce973ae-83d6-4ee1-b4dd-7f7440443a5e | overcloud-cephstorage-1 | BUILD  | spawning   | NOSTATE     | ctlplane=10.199.228.15 |
| e1675d07-0bb0-4a73-8514-d826f3702180 | overcloud-cephstorage-2 | BUILD  | spawning   | NOSTATE     | ctlplane=10.199.228.16 |
| 3328f883-d67f-4182-93e6-28a59f8b479b | overcloud-compute-0     | BUILD  | spawning   | NOSTATE     | ctlplane=10.199.228.6  |
| e5ee48cc-f902-4915-bb7c-d9f25a437ff4 | overcloud-compute-1     | BUILD  | spawning   | NOSTATE     | ctlplane=10.199.228.5  |
| 0feec7e7-c8ab-4d32-b9ba-5fd8a67572c0 | overcloud-compute-2     | BUILD  | spawning   | NOSTATE     | ctlplane=10.199.228.10 |
| 7f9754e3-0782-46c8-b0ba-c1bca0bb0753 | overcloud-controller-0  | BUILD  | spawning   | NOSTATE     | ctlplane=10.199.228.11 |
| 49531767-74ab-4769-8a6d-5a539af8791a | overcloud-controller-1  | BUILD  | spawning   | NOSTATE     | ctlplane=10.199.228.9  |
| 9db267ea-8792-41fb-93bc-b003d244bafc | overcloud-controller-2  | BUILD  | spawning   | NOSTATE     | ctlplane=10.199.228.19 |
+--------------------------------------+-------------------------+--------+------------+-------------+------------------------+

After few minutes, one or more nodes are not associated in ironic view

[stack@instack-bm ~]$ ironic node-list; nova list
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
| UUID                                 | Name | Instance UUID                        | Power State | Provision State | Maintenance |
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
| 09dac700-7730-4b43-84fb-a13a2040348e | None | 3328f883-d67f-4182-93e6-28a59f8b479b | power on    | deploying       | False       |
| fcb90164-1eb0-4635-a98f-472018933a5e | None | 0feec7e7-c8ab-4d32-b9ba-5fd8a67572c0 | power on    | deploying       | False       |
| 479319cc-aae9-48ee-a49a-bb7dc58da2e2 | None | e5ee48cc-f902-4915-bb7c-d9f25a437ff4 | power on    | deploying       | False       |
| a472a01b-4437-4745-82eb-969b942f304c | None | 5ce973ae-83d6-4ee1-b4dd-7f7440443a5e | power on    | deploying       | False       |
| 9d35dd62-af5c-4a24-85ae-616abfa137d0 | None | e1675d07-0bb0-4a73-8514-d826f3702180 | power on    | deploying       | False       |
| e1374019-7495-4c0b-8c6a-4457ce813220 | None | 641ef2a7-7a12-4034-8ac1-104b2c941525 | power on    | deploying       | False       |
| 4d15e390-091e-41d1-a614-818431b921c4 | None | 9db267ea-8792-41fb-93bc-b003d244bafc | power on    | deploying       | False       |
| 4aaf3cbf-1233-42d9-b89b-a182e7c87d6c | None | None                                 | power off   | available       | False       |
| 0aceb3c3-3755-4a51-a642-d27efd14a8a8 | None | 49531767-74ab-4769-8a6d-5a539af8791a | power on    | wait call-back  | False       |
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
+--------------------------------------+-------------------------+--------+------------+-------------+------------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks               |
+--------------------------------------+-------------------------+--------+------------+-------------+------------------------+
| 641ef2a7-7a12-4034-8ac1-104b2c941525 | overcloud-cephstorage-0 | BUILD  | spawning   | NOSTATE     | ctlplane=10.199.228.18 |
| 5ce973ae-83d6-4ee1-b4dd-7f7440443a5e | overcloud-cephstorage-1 | BUILD  | spawning   | NOSTATE     | ctlplane=10.199.228.15 |
| e1675d07-0bb0-4a73-8514-d826f3702180 | overcloud-cephstorage-2 | BUILD  | spawning   | NOSTATE     | ctlplane=10.199.228.16 |
| 3328f883-d67f-4182-93e6-28a59f8b479b | overcloud-compute-0     | BUILD  | spawning   | NOSTATE     | ctlplane=10.199.228.6  |
| e5ee48cc-f902-4915-bb7c-d9f25a437ff4 | overcloud-compute-1     | BUILD  | spawning   | NOSTATE     | ctlplane=10.199.228.5  |
| 0feec7e7-c8ab-4d32-b9ba-5fd8a67572c0 | overcloud-compute-2     | BUILD  | spawning   | NOSTATE     | ctlplane=10.199.228.10 |
| 7f9754e3-0782-46c8-b0ba-c1bca0bb0753 | overcloud-controller-0  | BUILD  | spawning   | NOSTATE     |                        |
| 49531767-74ab-4769-8a6d-5a539af8791a | overcloud-controller-1  | BUILD  | spawning   | NOSTATE     | ctlplane=10.199.228.12 |
| 9db267ea-8792-41fb-93bc-b003d244bafc | overcloud-controller-2  | BUILD  | spawning   | NOSTATE     | ctlplane=10.199.228.19 |
+--------------------------------------+-------------------------+--------+------------+-------------+------------------------+

If I try to install only one computes, the deployment is ok.

Expected results:

All the nodes are used


Additional info:
Comment 3 Nicolas Hicher 2015-07-31 11:36:39 EDT
Created attachment 1058077 [details]
the undercloud-debug log file
Comment 4 Gaëtan Trellu 2015-08-10 15:24:48 EDT
Hi,

I've the exact same issue with Dell hardware.
The Ironic error log: http://pastebin.test.redhat.com/30419
Comment 5 Gaëtan Trellu 2015-08-10 18:10:30 EDT
I tried with pxe_ipmitool and pxe_drac IPMI driver.
It's a blocker for me, i can't deploy.
Comment 6 Cyril Lopez 2015-08-10 18:15:01 EDT
In the both case, we are using hardware matching.
Comment 7 Gaëtan Trellu 2015-08-11 14:06:17 EDT
I tried t removed the profile maching but nothing changed.
We don't have this issue with virtual machines.
Comment 8 Gaëtan Trellu 2015-08-12 09:08:15 EDT
By reinstalling the undercloud, the issue disappeared...
Comment 9 Gaëtan Trellu 2015-08-20 14:16:26 EDT
Still having the issue on a fresh install.
Comment 10 Gaëtan Trellu 2015-08-24 17:12:52 EDT
The paste has expired so this is a new one: http://pastebin.test.redhat.com/307704
Comment 11 Dmitry Tantsur 2015-08-25 03:04:42 EDT
Hi! What is your nova version (yum info python-nova)? We have fixed one similar bug in 2015.1.0-17.el7ost as a 0day fix. If you have 16.el7ost, please update all Nova package, restart Nova and retry.

If you have this problem with 17.el7ost, then please attach both nova and ironic logs from /var/log/nova/nova-compute.log and /var/log/ironic/ironic-conductor.log.
Comment 12 Gaëtan Trellu 2015-08-25 08:31:39 EDT
Created attachment 1066833 [details]
nova-compute.log
Comment 13 Gaëtan Trellu 2015-08-25 08:32:14 EDT
Created attachment 1066834 [details]
ironic-conductor.log
Comment 14 Gaëtan Trellu 2015-08-25 08:33:20 EDT
Hi Dmitry !!

Thanks for your answer.
We are using the last python-nova version:

[root@undercloud ~]# yum info python-nova
Loaded plugins: product-id, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Installed Packages
Name        : python-nova
Arch        : noarch
Version     : 2015.1.0
Release     : 17.el7ost
Size        : 14 M
Repo        : installed
From repo   : rhelosp-7.0-puddle
Summary     : Nova Python libraries
URL         : http://openstack.org/projects/compute/
License     : ASL 2.0
Description : OpenStack Compute (codename Nova) is open source software designed to
            : provision and manage large networks of virtual machines, creating a
            : redundant and scalable cloud computing platform.
            : 
            : This package contains the nova Python library.

The logs have been attached.
Comment 15 Dmitry Tantsur 2015-08-25 09:54:15 EDT
Thank you! I have troubles figuring out relevant part of the log, could you please provide me an approximate timeline for your deployment, i.e. when did it start, when did it fail, did you retry, etc.
Comment 16 Gaëtan Trellu 2015-08-25 09:56:59 EDT
You can have a look around 18h18 on 24th August.

2015-08-24 18:18:05.228 1200 ERROR oslo_messaging.rpc.dispatcher [-] Exception during message handling: Node 702d0321-5e9d-49f2-b914-0831bcbdfebd is associated with instance ccc2dc87-81b1-4769-8019-ffa5e3ecb979.
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher     executor_callback))
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher     executor_callback)
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher     result = func(ctxt, **new_args)
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142, in inner
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher     return func(*args, **kwargs)
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 405, in update_node
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher     node_obj.save()
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/ironic/objects/base.py", line 143, in wrapper
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher     return fn(self, ctxt, *args, **kwargs)
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/ironic/objects/node.py", line 265, in save
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher     self.dbapi.update_node(self.uuid, updates)
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 338, in update_node
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher     return self._do_update_node(node_id, values)
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 364, in _do_update_node
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher     instance=ref.instance_uuid)
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher NodeAssociated: Node 702d0321-5e9d-49f2-b914-0831bcbdfebd is associated with instance ccc2dc87-81b1-4769-8019-ffa5e3ecb979.
2015-08-24 18:18:05.228 1200 TRACE oslo_messaging.rpc.dispatcher
Comment 17 Dmitry Tantsur 2015-08-25 11:28:04 EDT
Thanks! Ironic logs are concerning:

2015-08-22 18:18:02.010 1208 WARNING ironic.drivers.modules.ipmitool [-] IPMI power status failed for node 53b37f5d-b938-4c59-ac54-0614054da95c with error: Unexpected error while running command.
Command: ipmitool -I lanplus -H 10.9.37.36 -L ADMINISTRATOR -U root -R 12 -N 5 -f /tmp/tmp1fXGHH power status
Exit code: 1
Stdout: u''
Stderr: u'Error: Unable to establish IPMI v2 / RMCP+ session\nUnable to get Chassis Power Status\n'.
2015-08-22 18:18:02.010 1208 WARNING ironic.conductor.manager [-] During sync_power_state, could not get power state for node 53b37f5d-b938-4c59-ac54-0614054da95c. Error: IPMI call failed: power status..
2015-08-22 20:28:15.967 1208 ERROR ironic.drivers.modules.ipmitool [-] IPMI Error while attempting "ipmitool -I lanplus -H 10.9.37.52 -L ADMINISTRATOR -U root -R 12 -N 5 -f /tmp/tmpe8dN96 power status" for node 39d90cfb-9c0b-40ff-8781-e7786a43944d. Error: Unexpected error while running command.
Command: ipmitool -I lanplus -H 10.9.37.52 -L ADMINISTRATOR -U root -R 12 -N 5 -f /tmp/tmpe8dN96 power status
Exit code: 1
Stdout: u''
Stderr: u'Error: Unable to establish IPMI v2 / RMCP+ session\nUnable to get Chassis Power Status\n'

We've experienced problems with pxe_ipmitool on HP boxes previously. Any particular reason you didn't use pxe_ilo driver? If no, please retry your attempts with it.
Comment 18 Dmitry Tantsur 2015-08-25 11:31:55 EDT
Upd: pxe_drac driver can be used for DELL hardware.
Comment 19 Gaëtan Trellu 2015-08-25 16:06:39 EDT
Same issue with this driver.
Comment 20 Dmitry Tantsur 2015-08-26 03:31:26 EDT
Could you please elaborate? You can't have issue with ipmitool with a driver not using ipmitool... Logs would be nice to see.
Comment 21 Gaëtan Trellu 2015-08-31 14:27:40 EDT
Hi,

Reduce the concurrency (10 by default) to 4 solved our current issue.

/etc/nova/nova.conf
max_concurrent_builds = 4

/etc/ironic/ironic.conf
rpc_thread_pool_size=8
Comment 22 Nicolas Hicher 2015-10-19 10:42:49 EDT
fixed upstream
Comment 24 Cyril Lopez 2016-02-12 10:16:18 EST
Still here in 7.2 :
openstack-nova-common-2015.1.2-13.el7ost.noarch
openstack-ironic-common-2015.1.2-2.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-94.el7ost.noarch

with pxe_drac when you try to deploy more than 9 nodes

==> /var/log/nova/nova-conductor.log <==
2016-02-12 09:46:23.171 11384 ERROR nova.scheduler.utils [req-422dea75-c9f4-4085-938a-dbcc3ae1c091 855acd7da206441e969d27217d5c68b0 fae33274538a43b682e8d0de89017352 - - -] [instance: 4eb4b1b2-583b-4a21-a4e6-08167cdfbb0e] Error from last host: waed-undercloud (nod
e c11acf8b-2c4c-4964-bf62-9171e93686cd): [u'Traceback (most recent call last):\n', u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2261, in _do_build_and_run_instance\n    filter_properties)\n', u'  File "/usr/lib/python2.7/site-packages
/nova/compute/manager.py", line 2406, in _build_and_run_instance\n    instance_uuid=instance.uuid, reason=six.text_type(e))\n', u'RescheduledException: Build of instance 4eb4b1b2-583b-4a21-a4e6-08167cdfbb0e was re-scheduled: Failed to provision instance 4eb4b1b2-583b-4a21-a4e6-08167cdfbb0e: None\n']

==> /var/log/ironic/ironic-conductor.log <==
2016-02-12 09:46:28.458 11395 ERROR oslo_messaging.rpc.dispatcher [-] Exception during message handling: Node fde81779-1626-4c12-b487-276a9c66c5db is associated with instance aff326c7-8ec9-45bf-86c4-502a787fdd99.
2016-02-12 09:46:28.458 11395 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2016-02-12 09:46:28.458 11395 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2016-02-12 09:46:28.458 11395 TRACE oslo_messaging.rpc.dispatcher     executor_callback))
2016-02-12 09:46:28.458 11395 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
2016-02-12 09:46:28.458 11395 TRACE oslo_messaging.rpc.dispatcher     executor_callback)
2016-02-12 09:46:28.458 11395 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
2016-02-12 09:46:28.458 11395 TRACE oslo_messaging.rpc.dispatcher     result = func(ctxt, **new_args)
2016-02-12 09:46:28.458 11395 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142, in inner
2016-02-12 09:46:28.458 11395 TRACE oslo_messaging.rpc.dispatcher     return func(*args, **kwargs)
2016-02-12 09:46:28.458 11395 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 405, in update_node
2016-02-12 09:46:28.458 11395 TRACE oslo_messaging.rpc.dispatcher     node_obj.save()
2016-02-12 09:46:28.458 11395 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/ironic/objects/base.py", line 143, in wrapper
2016-02-12 09:46:28.458 11395 TRACE oslo_messaging.rpc.dispatcher     return fn(self, ctxt, *args, **kwargs)
2016-02-12 09:46:28.458 11395 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/ironic/objects/node.py", line 265, in save
2016-02-12 09:46:28.458 11395 TRACE oslo_messaging.rpc.dispatcher     self.dbapi.update_node(self.uuid, updates)
2016-02-12 09:46:28.458 11395 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 338, in update_node
2016-02-12 09:46:28.458 11395 TRACE oslo_messaging.rpc.dispatcher     return self._do_update_node(node_id, values)
2016-02-12 09:46:28.458 11395 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 364, in _do_update_node
2016-02-12 09:46:28.458 11395 TRACE oslo_messaging.rpc.dispatcher     instance=ref.instance_uuid)
2016-02-12 09:46:28.458 11395 TRACE oslo_messaging.rpc.dispatcher NodeAssociated: Node fde81779-1626-4c12-b487-276a9c66c5db is associated with instance aff326c7-8ec9-45bf-86c4-502a787fdd99.
2016-02-12 09:46:28.458 11395 TRACE oslo_messaging.rpc.dispatcher
2016-02-12 09:46:28.459 11395 ERROR oslo_messaging._drivers.common [-] Returning exception Node fde81779-1626-4c12-b487-276a9c66c5db is associated with instance aff326c7-8ec9-45bf-86c4-502a787fdd99. to caller
2016-02-12 09:46:28.459 11395 ERROR oslo_messaging._drivers.common [-] ['Traceback (most recent call last):\n', '  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply\n    executor_callback))\n', '  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch\n    executor_callback)\n', '  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch\n    result = func(ctxt, **new_args)\n', '  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142, in inner\n    return func(*args, **kwargs)\n', '  File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 405, in update_node\n    node_obj.save()\n', '  File "/usr/lib/python2.7/site-packages/ironic/objects/base.py", line 143, in wrapper\n    return fn(self, ctxt, *args, **kwargs)\n', '  File "/usr/lib/python2.7/site-packages/ironic/objects/node.py", line 265, in save\n    self.dbapi.update_node(self.uuid, updates)\n', '  File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 338, in update_node\n    return self._do_update_node(node_id, values)\n', '  File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 364, in _do_update_node\n    instance=ref.instance_uuid)\n', 'NodeAssociated: Node fde81779-1626-4c12-b487-276a9c66c5db is associated with instance aff326c7-8ec9-45bf-86c4-502a787fdd99.\n']
2016-02-12 09:46:31.573 11395 ERROR oslo_messaging.rpc.dispatcher [-] Exception during message handling: Node fde81779-1626-4c12-b487-276a9c66c5db is associated with instance aff326c7-8ec9-45bf-86c4-502a787fdd99.
2016-02-12 09:46:31.573 11395 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2016-02-12 09:46:31.573 11395 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2016-02-12 09:46:31.573 11395 TRACE oslo_messaging.rpc.dispatcher     executor_callback))
2016-02-12 09:46:31.573 11395 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
2016-02-12 09:46:31.573 11395 TRACE oslo_messaging.rpc.dispatcher     executor_callback)
2016-02-12 09:46:31.573 11395 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
2016-02-12 09:46:31.573 11395 TRACE oslo_messaging.rpc.dispatcher     result = func(ctxt, **new_args)
2016-02-12 09:46:31.573 11395 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142, in inner
2016-02-12 09:46:31.573 11395 TRACE oslo_messaging.rpc.dispatcher     return func(*args, **kwargs)
2016-02-12 09:46:31.573 11395 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 405, in update_node
2016-02-12 09:46:31.573 11395 TRACE oslo_messaging.rpc.dispatcher     node_obj.save()
2016-02-12 09:46:31.573 11395 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/ironic/objects/base.py", line 143, in wrapper
2016-02-12 09:46:31.573 11395 TRACE oslo_messaging.rpc.dispatcher     return fn(self, ctxt, *args, **kwargs)
2016-02-12 09:46:31.573 11395 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/ironic/objects/node.py", line 265, in save
2016-02-12 09:46:31.573 11395 TRACE oslo_messaging.rpc.dispatcher     self.dbapi.update_node(self.uuid, updates)
2016-02-12 09:46:31.573 11395 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 338, in update_node
2016-02-12 09:46:31.573 11395 TRACE oslo_messaging.rpc.dispatcher     return self._do_update_node(node_id, values)
2016-02-12 09:46:31.573 11395 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 364, in _do_update_node
2016-02-12 09:46:31.573 11395 TRACE oslo_messaging.rpc.dispatcher     instance=ref.instance_uuid)
2016-02-12 09:46:31.573 11395 TRACE oslo_messaging.rpc.dispatcher NodeAssociated: Node fde81779-1626-4c12-b487-276a9c66c5db is associated with instance aff326c7-8ec9-45bf-86c4-502a787fdd99.
2016-02-12 09:46:31.573 11395 TRACE oslo_messaging.rpc.dispatcher

2016-02-12 09:47:09.519 11395 ERROR oslo_messaging._drivers.common [-] Returning exception Node be5857db-ce80-452c-8404-b1a6d3bd4c17 is associated with instance 8f1981f6-fd76-486f-b96f-189637cda601. to caller
2016-02-12 09:47:09.519 11395 ERROR oslo_messaging._drivers.common [-] ['Traceback (most recent call last):\n', '  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply\n    executor_callback))\n', '  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch\n    executor_callback)\n', '  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch\n    result = func(ctxt, **new_args)\n', '  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142, in inner\n    return func(*args, **kwargs)\n', '  File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 405, in update_node\n    node_obj.save()\n', '  File "/usr/lib/python2.7/site-packages/ironic/objects/base.py", line 143, in wrapper\n    return fn(self, ctxt, *args, **kwargs)\n', '  File "/usr/lib/python2.7/site-packages/ironic/objects/node.py", line 265, in save\n    self.dbapi.update_node(self.uuid, updates)\n', '  File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 338, in update_node\n    return self._do_update_node(node_id, values)\n', '  File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 364, in _do_update_node\n    instance=ref.instance_uuid)\n', 'NodeAssociated: Node be5857db-ce80-452c-8404-b1a6d3bd4c17 is associated with instance 8f1981f6-fd76-486f-b96f-189637cda601.\n']
Comment 25 Stephen Gordon 2016-02-19 07:43:28 EST
The root cause is being addressed in Nova right now as part of
https://blueprints.launchpad.net/nova/+spec/host-state-level-locking

The problems as I understand it comes from these 2 facts:

1. Nova and Ironic databases do not come in sync instantly (obviously,
these are different services).
2. Nova does not place any locks on the chosen node, relying on Ironic
to do that just before the deployment.

It results in Nova picking the same node for several instances. Then in
the Ironic driver only one actually succeed, an exception is raised for
the others. So scheduling repeats, and the same problem might appear
again, and so on until reaching the scheduling retry limit.
Comment 26 Mike Burns 2016-04-07 16:47:27 EDT
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Note You need to log in before you can comment on or make changes to this bug.