Bug 1233452 - 'Node is locked by host' error causing overcloud deploy to fail
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ga
Target Release: 7.0 (Kilo)
Assigned To: Dmitry Tantsur
QA Contact: Marius Cornea
Keywords: Automation
Depends On:
Blocks:
Reported: 2015-06-18 21:06 EDT by Ronelle Landy
Modified: 2016-06-03 07:01 EDT

See Also:
Fixed In Version: python-ironicclient-0.5.1-8.el7ost openstack-ironic-2015.1.0-8.el7ost
Doc Type: Bug Fix
Doc Text:
Prior to this update, OpenStack Bare Metal Provisioning (Ironic) operations such as 'Power off' held a lock on a node for longer than expected. Consequently, certain operations failed to run while the node was still considered locked. This update adjusts the retry timeout to two minutes. As a result, no further node lock errors have been observed.
Last Closed: 2015-08-05 09:27:49 EDT
Type: Bug


Attachments
undercloud logs (14.08 MB, application/x-gzip), 2015-06-26 07:45 EDT, Mike Burns
host0 logs (14.08 MB, application/x-gzip), 2015-06-26 07:46 EDT, Mike Burns


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
OpenStack gerrit | 196020 | None | None | None | Never
OpenStack gerrit | 196037 | None | None | None | Never
Red Hat Product Errata | RHEA-2015:1548 | normal | SHIPPED_LIVE | Red Hat Enterprise Linux OpenStack Platform Enhancement Advisory | 2015-08-05 13:07:06 EDT

Description Ronelle Landy 2015-06-18 21:06:36 EDT
Description of problem:
After installing bits from the latest poodle/puddle, deploying the overcloud fails with the 'Node xxx is locked by host' error:

>> source /home/stack/stackrc; if [ -f "/home/stack/deploy-overcloudrc" ]; then source /home/stack/deploy-overcloudrc; fi; openstack overcloud deploy --plan-uuid 1803970e-ddf9-41d9-a101-bd67afe667a9 --control-scale $CONTROLSCALE --compute-scale $COMPUTESCALE --ceph-storage-scale $CEPHSTORAGESCALE 


19:26:18 failed: [undercloud] => {"changed": true, "cmd": "source /home/stack/stackrc; if [ -f \"/home/stack/deploy-overcloudrc\" ]; then\n source /home/stack/deploy-overcloudrc;\n fi; openstack overcloud deploy --plan-uuid 1803970e-ddf9-41d9-a101-bd67afe667a9 --control-scale $CONTROLSCALE --compute-scale $COMPUTESCALE --ceph-storage-scale $CEPHSTORAGESCALE #Both swift and blockstorage are not supported downstream right now #--swift-storage-scale $SWIFTSTORAGESCALE #--block-storage-scale $BLOCKSTORAGESCALE;", "delta": "0:00:16.466670", "end": "2015-06-18 19:26:18.978551", "rc": 1, "start": "2015-06-18 19:26:02.511881", "warnings": []}
19:26:18 stderr: WARNING: ironicclient.common.http Request returned failure status.
19:26:18 WARNING: ironicclient.common.http Error contacting Ironic server: Node 62e2f991-c2fe-4d5b-9b9e-cb8fecf1fbed is locked by host host15.beaker.tripleo.lab.eng.rdu2.redhat.com, please retry after the current operation is completed. (HTTP 409). Attempt 1 of 6
19:26:18 WARNING: ironicclient.common.http Request returned failure status.
19:26:18 WARNING: ironicclient.common.http Error contacting Ironic server: Node 62e2f991-c2fe-4d5b-9b9e-cb8fecf1fbed is locked by host host15.beaker.tripleo.lab.eng.rdu2.redhat.com, please retry after the current operation is completed. (HTTP 409). Attempt 2 of 6
19:26:18 WARNING: ironicclient.common.http Request returned failure status.
19:26:18 WARNING: ironicclient.common.http Error contacting Ironic server: Node 62e2f991-c2fe-4d5b-9b9e-cb8fecf1fbed is locked by host host15.beaker.tripleo.lab.eng.rdu2.redhat.com, please retry after the current operation is completed. (HTTP 409). Attempt 3 of 6
19:26:18 WARNING: ironicclient.common.http Request returned failure status.
19:26:18 WARNING: ironicclient.common.http Error contacting Ironic server: Node 62e2f991-c2fe-4d5b-9b9e-cb8fecf1fbed is locked by host host15.beaker.tripleo.lab.eng.rdu2.redhat.com, please retry after the current operation is completed. (HTTP 409). Attempt 4 of 6
19:26:18 WARNING: ironicclient.common.http Request returned failure status.
19:26:18 WARNING: ironicclient.common.http Error contacting Ironic server: Node 62e2f991-c2fe-4d5b-9b9e-cb8fecf1fbed is locked by host host15.beaker.tripleo.lab.eng.rdu2.redhat.com, please retry after the current operation is completed. (HTTP 409). Attempt 5 of 6
19:26:18 WARNING: ironicclient.common.http Request returned failure status.
19:26:18 ERROR: ironicclient.common.http Error contacting Ironic server: Node 62e2f991-c2fe-4d5b-9b9e-cb8fecf1fbed is locked by host host15.beaker.tripleo.lab.eng.rdu2.redhat.com, please retry after the current operation is completed. (HTTP 409). Attempt 6 of 6
19:26:18 ERROR: openstack Node 62e2f991-c2fe-4d5b-9b9e-cb8fecf1fbed is locked by host host15.beaker.tripleo.lab.eng.rdu2.redhat.com, please retry after the current operation is completed. (HTTP 409)
19:26:18 stdout: The following templates will be written:
19:26:18 /tmp/tmpMG0VyV/puppet/manifests/overcloud_volume.pp
19:26:18 /tmp/tmpMG0VyV/hieradata/object.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/hieradata/common.yaml
19:26:18 /tmp/tmpMG0VyV/provider-Swift-Storage-1.yaml
19:26:18 /tmp/tmpMG0VyV/network/ports/net_ip_map.yaml
19:26:18 /tmp/tmpMG0VyV/provider-Cinder-Storage-1.yaml
19:26:18 /tmp/tmpMG0VyV/provider-Compute-1.yaml
19:26:18 /tmp/tmpMG0VyV/network/noop.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/bootstrap-config.yaml
19:26:18 /tmp/tmpMG0VyV/net-config-bridge.yaml
19:26:18 /tmp/tmpMG0VyV/provider-Ceph-Storage-1.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/controller-post-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/cinder-storage-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/manifests/overcloud_cephstorage.pp
19:26:18 /tmp/tmpMG0VyV/puppet/hieradata/object.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/controller-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/manifests/overcloud_compute.pp
19:26:18 /tmp/tmpMG0VyV/puppet/cinder-storage-post.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/swift-storage-post.yaml
19:26:18 /tmp/tmpMG0VyV/provider-Controller-1.yaml
19:26:18 /tmp/tmpMG0VyV/network/networks.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/manifests/overcloud_object.pp
19:26:18 /tmp/tmpMG0VyV/hieradata/controller.yaml
19:26:18 /tmp/tmpMG0VyV/network/ports/ctlplane_vip.yaml
19:26:18 /tmp/tmpMG0VyV/hieradata/volume.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/compute-post-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/extraconfig/tasks/yum_update.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/swift-storage-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/extraconfig/tasks/yum_update.sh
19:26:18 /tmp/tmpMG0VyV/puppet/swift-devices-and-proxy-config.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/controller-config-pacemaker.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/compute-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/hieradata/volume.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/ceph-storage-post-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/extraconfig/controller/noop.yaml
19:26:18 /tmp/tmpMG0VyV/network/ports/noop.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/ceph-storage-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/hieradata/ceph.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/vip-config.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/hieradata/controller.yaml
19:26:18 /tmp/tmpMG0VyV/plan.yaml
19:26:18 /tmp/tmpMG0VyV/environment.yaml
19:26:18 /tmp/tmpMG0VyV/network/ports/net_ip_list_map.yaml
19:26:18 /tmp/tmpMG0VyV/hieradata/compute.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/hieradata/compute.yaml
19:26:18 /tmp/tmpMG0VyV/hieradata/ceph.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/manifests/overcloud_controller_pacemaker.pp
19:26:18 /tmp/tmpMG0VyV/hieradata/common.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/manifests/ringbuilder.pp
19:26:18 /tmp/tmpMG0VyV/extraconfig/post_deploy/default.yaml
19:26:18 /tmp/tmpMG0VyV/net-config-noop.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/ceph-cluster-config.yaml
19:26:18 /tmp/tmpMG0VyV/firstboot/userdata_default.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/all-nodes-config.yaml
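
When triaging a CI run, it can help to pull the node UUID, the locking host, and the highest attempt number out of output like the log above. The following is a minimal sketch, not part of any OpenStack tool: the function name `summarize_lock_errors` and both regular expressions are illustrative assumptions based only on the message format shown in this report.

```python
import re

# Both patterns are assumptions derived from the message text in this report.
LOCKED_RE = re.compile(
    r"Node (?P<node>[0-9a-f-]{36}) is locked by host (?P<host>[^\s,]+)"
)
ATTEMPT_RE = re.compile(r"Attempt (?P<attempt>\d+) of (?P<total>\d+)")


def summarize_lock_errors(lines):
    """Map (node UUID, locking host) to the highest attempt number seen.

    Lines that carry the lock message without an attempt counter (such as
    the final 'ERROR: openstack' line) are counted as attempt 0.
    """
    summary = {}
    for line in lines:
        locked = LOCKED_RE.search(line)
        if not locked:
            continue  # not a lock error line
        attempt = ATTEMPT_RE.search(line)
        number = int(attempt.group("attempt")) if attempt else 0
        key = (locked.group("node"), locked.group("host"))
        summary[key] = max(summary.get(key, 0), number)
    return summary
```

Feeding it the stderr lines of a failed deploy yields one entry per locked node, which makes it easier to see whether a single node or several were affected.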

This error was logged previously during node discovery:
https://bugzilla.redhat.com/show_bug.cgi?id=1212134

Version-Release number of selected component (if applicable):

[root@host15 ~]# rpm -qa | grep openstack
openstack-tripleo-common-0.0.1.dev6-0.git49b57eb.el7ost.noarch
openstack-ceilometer-alarm-2015.1.0-2.el7ost.noarch
openstack-swift-account-2.3.0-1.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.1-2.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.0-3.el7ost.noarch
openstack-tripleo-0.0.6-0.1.git812abe0.el7ost.noarch
openstack-tuskar-0.4.18-2.el7ost.noarch
openstack-swift-2.3.0-1.el7ost.noarch
openstack-nova-novncproxy-2015.1.0-10.el7ost.noarch
openstack-swift-plugin-swift3-1.7-3.el7ost.noarch
redhat-access-plugin-openstack-7.0.0-0.el7ost.noarch
openstack-heat-api-2015.1.0-3.el7ost.noarch
openstack-ceilometer-central-2015.1.0-2.el7ost.noarch
openstack-nova-scheduler-2015.1.0-10.el7ost.noarch
openstack-nova-cert-2015.1.0-10.el7ost.noarch
openstack-nova-common-2015.1.0-10.el7ost.noarch
openstack-tripleo-image-elements-0.9.6-1.el7ost.noarch
openstack-ceilometer-notification-2015.1.0-2.el7ost.noarch
openstack-ceilometer-collector-2015.1.0-2.el7ost.noarch
openstack-ironic-common-2015.1.0-4.el7ost.noarch
openstack-nova-compute-2015.1.0-10.el7ost.noarch
openstack-nova-conductor-2015.1.0-10.el7ost.noarch
openstack-neutron-openvswitch-2015.1.0-7.el7ost.noarch
openstack-swift-container-2.3.0-1.el7ost.noarch
openstack-nova-api-2015.1.0-10.el7ost.noarch
openstack-dashboard-theme-2015.1.0-10.el7ost.noarch
openstack-tuskar-ui-extras-0.0.4-1.el7ost.noarch
openstack-nova-console-2015.1.0-10.el7ost.noarch
openstack-neutron-common-2015.1.0-7.el7ost.noarch
openstack-neutron-2015.1.0-7.el7ost.noarch
openstack-heat-engine-2015.1.0-3.el7ost.noarch
openstack-ceilometer-common-2015.1.0-2.el7ost.noarch
openstack-heat-api-cfn-2015.1.0-3.el7ost.noarch
openstack-ironic-conductor-2015.1.0-4.el7ost.noarch
openstack-ceilometer-api-2015.1.0-2.el7ost.noarch
openstack-ironic-api-2015.1.0-4.el7ost.noarch
openstack-swift-proxy-2.3.0-1.el7ost.noarch
openstack-puppet-modules-2015.1.5-1.el7ost.noarch
openstack-dashboard-2015.1.0-10.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-selinux-0.6.32-1.el7ost.noarch
openstack-tempest-kilo-20150507.2.el7ost.noarch
openstack-neutron-ml2-2015.1.0-7.el7ost.noarch
openstack-keystone-2015.1.0-1.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-9.el7ost.noarch
openstack-glance-2015.1.0-6.el7ost.noarch
python-openstackclient-1.0.3-2.el7ost.noarch
openstack-ironic-discoverd-1.1.0-3.el7ost.noarch
openstack-swift-object-2.3.0-1.el7ost.noarch
python-django-openstack-auth-1.2.0-2.el7ost.noarch
openstack-tuskar-ui-0.3.0-2.el7ost.noarch
openstack-utils-2014.2-1.el7ost.noarch
openstack-heat-common-2015.1.0-3.el7ost.noarch
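
To check whether an installed package from the list above predates the build named in "Fixed In Version", the version and release can be compared directly. The sketch below is a simplification under stated assumptions: it handles only purely numeric dotted versions such as `2015.1.0`, it looks only at the leading number of the release, it is not RPM's own version comparison algorithm, and it assumes that subpackages such as `openstack-ironic-conductor` carry the same version-release as the `openstack-ironic` build. The helper names `parse_nvr` and `is_older` are hypothetical.

```python
import re


def parse_nvr(package):
    """Split 'name-version-release[.arch]' into (name, version tuple, release number)."""
    name, version, release = package.rsplit("-", 2)
    version_parts = tuple(int(part) for part in version.split("."))
    # Keep only the leading digits of the release, e.g. '4' from '4.el7ost.noarch'.
    release_number = int(re.match(r"\d+", release).group())
    return name, version_parts, release_number


def is_older(installed, fixed):
    """Return True if the installed package's version-release sorts before the fixed build."""
    _, installed_version, installed_release = parse_nvr(installed)
    _, fixed_version, fixed_release = parse_nvr(fixed)
    return (installed_version, installed_release) < (fixed_version, fixed_release)


# The conductor package from the list above versus the fixed build in this report.
needs_update = is_older(
    "openstack-ironic-conductor-2015.1.0-4.el7ost.noarch",
    "openstack-ironic-2015.1.0-8.el7ost",
)
```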

How reproducible:

Often, but not on every deploy. Redeploying the overcloud gets past the error, but the CI run still fails.

Steps to Reproduce:
1. Install bits from latest poodle/puddle 
2. openstack overcloud deploy --plan-uuid $ID --control-scale $CONTROLSCALE --compute-scale $COMPUTESCALE --ceph-storage-scale $CEPHSTORAGESCALE
3. See warnings and errors

Actual results:
Deploying overcloud fails

Expected results:
Overcloud deployed

Additional info:
Comment 4 Dmitry Tantsur 2015-06-23 03:17:56 EDT
Oh... please provide ironic conductor and API logs around failure time (sudo journalctl -u openstack-ironic-api -u openstack-ironic-conductor)
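
Because the failure is sporadic, it may be convenient to narrow the journal to a window around the failure time. The sketch below only builds the command line and does not run it. The helper name `journalctl_command` and the ten-minute default window are assumptions for illustration, as is the premise that the timestamp is in the undercloud's local time zone; `-u`, `--since`, and `--until` are standard journalctl options.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def journalctl_command(failure_time, window_minutes=10,
                       units=("openstack-ironic-api", "openstack-ironic-conductor")):
    """Build a journalctl invocation covering a window around failure_time.

    failure_time is a 'YYYY-MM-DD HH:MM:SS' string, for example the 'end'
    value reported by the failed deploy task.
    """
    moment = datetime.strptime(failure_time, TIME_FORMAT)
    delta = timedelta(minutes=window_minutes)
    command = ["sudo", "journalctl"]
    for unit in units:
        command += ["-u", unit]
    command += [
        "--since", (moment - delta).strftime(TIME_FORMAT),
        "--until", (moment + delta).strftime(TIME_FORMAT),
    ]
    return command
```

The returned list can be passed to `subprocess.run` on the undercloud, or joined with proper quoting for pasting into a shell.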
Comment 5 Dmitry Tantsur 2015-06-23 07:29:33 EDT
I've started an rdo-list thread to discuss the issue: https://www.redhat.com/archives/rdo-list/2015-June/msg00149.html
Comment 6 Ronelle Landy 2015-06-23 12:07:09 EDT
Will copy logs and journalctl output when we hit the error again - it's sporadic.
Comment 7 Dmitry Tantsur 2015-06-24 07:34:03 EDT
https://review.gerrithub.io/#/c/237471/ is an instack-undercloud patch to bump retry interval for Ironic globally. I'm still interested in logs, however.
Comment 8 Mike Burns 2015-06-26 07:44:21 EDT
This occurred in CI again on Dell bare metal. I will pull the logs from the job and post them here.
Comment 9 Mike Burns 2015-06-26 07:45:43 EDT
Created attachment 1043486
undercloud logs
Comment 10 Mike Burns 2015-06-26 07:46:39 EDT
Created attachment 1043487
host0 logs
Comment 11 Dmitry Tantsur 2015-06-26 08:17:42 EDT
Upstream patch to bump retry interval: https://review.openstack.org/#/c/196020/
I intend to backport it asap.
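
To illustrate why a longer retry interval helps here, the following is a generic retry loop, not the ironicclient implementation. `NodeLocked`, `call_with_retries`, and the parameter names are hypothetical stand-ins. The sketch assumes that one initial call plus `max_retries` retries are made, so the worst-case wait before giving up is `max_retries * retry_interval` seconds; if the node lock is held longer than that, the call still fails.

```python
import time


class NodeLocked(Exception):
    """Hypothetical stand-in for an HTTP 409 'node is locked' response."""


def call_with_retries(operation, max_retries, retry_interval, sleep=time.sleep):
    """Call operation(); on NodeLocked, wait retry_interval seconds and retry.

    Makes at most max_retries + 1 attempts and re-raises NodeLocked once
    the retries are exhausted.
    """
    for attempt in range(1, max_retries + 2):
        try:
            return operation()
        except NodeLocked:
            if attempt > max_retries:
                raise  # out of retries: surface the conflict to the caller
            sleep(retry_interval)
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.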
Comment 12 Dmitry Tantsur 2015-06-26 08:51:07 EDT
I also suggest backporting https://review.openstack.org/#/c/194619/ for GA or later, to make debugging such problems simpler.
Comment 15 Marius Cornea 2015-07-22 05:33:18 EDT
I couldn't reproduce this on either a virtual or a bare-metal environment.
Comment 17 errata-xmlrpc 2015-08-05 09:27:49 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1548
