Bug 1233452 - 'Node is locked by host' error causing overcloud deploy to fail
Summary: 'Node is locked by host' error causing overcloud deploy to fail
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ga
Target Release: 7.0 (Kilo)
Assignee: Dmitry Tantsur
QA Contact: Marius Cornea
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-06-19 01:06 UTC by Ronelle Landy
Modified: 2023-02-22 23:02 UTC

Fixed In Version: python-ironicclient-0.5.1-8.el7ost openstack-ironic-2015.1.0-8.el7ost
Doc Type: Bug Fix
Doc Text:
Prior to this update, OpenStack Bare Metal Provisioning (Ironic) operations, such as 'Power off', held a lock on a node for longer than expected. Consequently, certain operations would fail to run while the node was still considered locked. This update adjusts the retry timeout to two minutes. As a result, no further node lock errors have been noted.
Clone Of:
Environment:
Last Closed: 2015-08-05 13:27:49 UTC
Target Upstream Version:
Embargoed:


Attachments
undercloud logs (14.08 MB, application/x-gzip)
2015-06-26 11:45 UTC, Mike Burns
host0 logs (14.08 MB, application/x-gzip)
2015-06-26 11:46 UTC, Mike Burns


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 196020 | 0 | None | ABANDONED | Bump default retry timeout to 2 minutes | 2021-01-04 18:01:10 UTC
OpenStack gerrit | 196037 | 0 | None | ABANDONED | Make task_manager logging more helpful | 2021-01-04 18:00:34 UTC
Red Hat Product Errata | RHEA-2015:1548 | 0 | normal | SHIPPED_LIVE | Red Hat Enterprise Linux OpenStack Platform Enhancement Advisory | 2015-08-05 17:07:06 UTC

Description Ronelle Landy 2015-06-19 01:06:36 UTC
Description of problem:
Installing bits from the latest poodle/puddle and deploying the overcloud fails due to the 'Node xxx is locked by host' error:

>> source /home/stack/stackrc; if [ -f "/home/stack/deploy-overcloudrc" ]; then source /home/stack/deploy-overcloudrc; fi; openstack overcloud deploy --plan-uuid 1803970e-ddf9-41d9-a101-bd67afe667a9 --control-scale $CONTROLSCALE --compute-scale $COMPUTESCALE --ceph-storage-scale $CEPHSTORAGESCALE 


19:26:18 failed: [undercloud] => {"changed": true, "cmd": "source /home/stack/stackrc; if [ -f \"/home/stack/deploy-overcloudrc\" ]; then\n source /home/stack/deploy-overcloudrc;\n fi; openstack overcloud deploy --plan-uuid 1803970e-ddf9-41d9-a101-bd67afe667a9 --control-scale $CONTROLSCALE --compute-scale $COMPUTESCALE --ceph-storage-scale $CEPHSTORAGESCALE #Both swift and blockstorage are not supported downstream right now #--swift-storage-scale $SWIFTSTORAGESCALE #--block-storage-scale $BLOCKSTORAGESCALE;", "delta": "0:00:16.466670", "end": "2015-06-18 19:26:18.978551", "rc": 1, "start": "2015-06-18 19:26:02.511881", "warnings": []}
19:26:18 stderr: WARNING: ironicclient.common.http Request returned failure status.
19:26:18 WARNING: ironicclient.common.http Error contacting Ironic server: Node 62e2f991-c2fe-4d5b-9b9e-cb8fecf1fbed is locked by host host15.beaker.tripleo.lab.eng.rdu2.redhat.com, please retry after the current operation is completed. (HTTP 409). Attempt 1 of 6
19:26:18 WARNING: ironicclient.common.http Request returned failure status.
19:26:18 WARNING: ironicclient.common.http Error contacting Ironic server: Node 62e2f991-c2fe-4d5b-9b9e-cb8fecf1fbed is locked by host host15.beaker.tripleo.lab.eng.rdu2.redhat.com, please retry after the current operation is completed. (HTTP 409). Attempt 2 of 6
19:26:18 WARNING: ironicclient.common.http Request returned failure status.
19:26:18 WARNING: ironicclient.common.http Error contacting Ironic server: Node 62e2f991-c2fe-4d5b-9b9e-cb8fecf1fbed is locked by host host15.beaker.tripleo.lab.eng.rdu2.redhat.com, please retry after the current operation is completed. (HTTP 409). Attempt 3 of 6
19:26:18 WARNING: ironicclient.common.http Request returned failure status.
19:26:18 WARNING: ironicclient.common.http Error contacting Ironic server: Node 62e2f991-c2fe-4d5b-9b9e-cb8fecf1fbed is locked by host host15.beaker.tripleo.lab.eng.rdu2.redhat.com, please retry after the current operation is completed. (HTTP 409). Attempt 4 of 6
19:26:18 WARNING: ironicclient.common.http Request returned failure status.
19:26:18 WARNING: ironicclient.common.http Error contacting Ironic server: Node 62e2f991-c2fe-4d5b-9b9e-cb8fecf1fbed is locked by host host15.beaker.tripleo.lab.eng.rdu2.redhat.com, please retry after the current operation is completed. (HTTP 409). Attempt 5 of 6
19:26:18 WARNING: ironicclient.common.http Request returned failure status.
19:26:18 ERROR: ironicclient.common.http Error contacting Ironic server: Node 62e2f991-c2fe-4d5b-9b9e-cb8fecf1fbed is locked by host host15.beaker.tripleo.lab.eng.rdu2.redhat.com, please retry after the current operation is completed. (HTTP 409). Attempt 6 of 6
19:26:18 ERROR: openstack Node 62e2f991-c2fe-4d5b-9b9e-cb8fecf1fbed is locked by host host15.beaker.tripleo.lab.eng.rdu2.redhat.com, please retry after the current operation is completed. (HTTP 409)
19:26:18 stdout: The following templates will be written:
19:26:18 /tmp/tmpMG0VyV/puppet/manifests/overcloud_volume.pp
19:26:18 /tmp/tmpMG0VyV/hieradata/object.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/hieradata/common.yaml
19:26:18 /tmp/tmpMG0VyV/provider-Swift-Storage-1.yaml
19:26:18 /tmp/tmpMG0VyV/network/ports/net_ip_map.yaml
19:26:18 /tmp/tmpMG0VyV/provider-Cinder-Storage-1.yaml
19:26:18 /tmp/tmpMG0VyV/provider-Compute-1.yaml
19:26:18 /tmp/tmpMG0VyV/network/noop.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/bootstrap-config.yaml
19:26:18 /tmp/tmpMG0VyV/net-config-bridge.yaml
19:26:18 /tmp/tmpMG0VyV/provider-Ceph-Storage-1.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/controller-post-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/cinder-storage-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/manifests/overcloud_cephstorage.pp
19:26:18 /tmp/tmpMG0VyV/puppet/hieradata/object.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/controller-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/manifests/overcloud_compute.pp
19:26:18 /tmp/tmpMG0VyV/puppet/cinder-storage-post.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/swift-storage-post.yaml
19:26:18 /tmp/tmpMG0VyV/provider-Controller-1.yaml
19:26:18 /tmp/tmpMG0VyV/network/networks.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/manifests/overcloud_object.pp
19:26:18 /tmp/tmpMG0VyV/hieradata/controller.yaml
19:26:18 /tmp/tmpMG0VyV/network/ports/ctlplane_vip.yaml
19:26:18 /tmp/tmpMG0VyV/hieradata/volume.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/compute-post-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/extraconfig/tasks/yum_update.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/swift-storage-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/extraconfig/tasks/yum_update.sh
19:26:18 /tmp/tmpMG0VyV/puppet/swift-devices-and-proxy-config.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/controller-config-pacemaker.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/compute-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/hieradata/volume.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/ceph-storage-post-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/extraconfig/controller/noop.yaml
19:26:18 /tmp/tmpMG0VyV/network/ports/noop.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/ceph-storage-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/hieradata/ceph.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/vip-config.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/hieradata/controller.yaml
19:26:18 /tmp/tmpMG0VyV/plan.yaml
19:26:18 /tmp/tmpMG0VyV/environment.yaml
19:26:18 /tmp/tmpMG0VyV/network/ports/net_ip_list_map.yaml
19:26:18 /tmp/tmpMG0VyV/hieradata/compute.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/hieradata/compute.yaml
19:26:18 /tmp/tmpMG0VyV/hieradata/ceph.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/manifests/overcloud_controller_pacemaker.pp
19:26:18 /tmp/tmpMG0VyV/hieradata/common.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/manifests/ringbuilder.pp
19:26:18 /tmp/tmpMG0VyV/extraconfig/post_deploy/default.yaml
19:26:18 /tmp/tmpMG0VyV/net-config-noop.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/ceph-cluster-config.yaml
19:26:18 /tmp/tmpMG0VyV/firstboot/userdata_default.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/all-nodes-config.yaml

This error was logged previously during node discovery:
https://bugzilla.redhat.com/show_bug.cgi?id=1212134
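
For readers unfamiliar with the pattern, the client-side behavior visible in the log above (repeat the request while the server answers HTTP 409, then give up after a fixed number of attempts) can be sketched roughly as follows. This is a minimal illustration, not ironicclient's actual code: `NodeLockedError`, `call_with_retries`, and `make_locked_node` are hypothetical names, the attempt count of 6 is taken from the "Attempt 6 of 6" message, and the 2-second interval is an assumed value that does not come from this report.

```python
import time


class NodeLockedError(Exception):
    """Hypothetical stand-in for an HTTP 409 'Node is locked by host' reply."""


def call_with_retries(func, max_attempts=6, retry_interval=2.0, sleep=time.sleep):
    """Call func(), retrying while it raises NodeLockedError.

    max_attempts=6 mirrors "Attempt 6 of 6" in the log above;
    retry_interval is an assumed value, not taken from this report.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except NodeLockedError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the conflict to the caller
            sleep(retry_interval)


def make_locked_node(locked_for_calls):
    """Return a fake API call that fails locked_for_calls times, then succeeds."""
    state = {"calls": 0}

    def set_power_state():
        state["calls"] += 1
        if state["calls"] <= locked_for_calls:
            raise NodeLockedError("node is locked")
        return "ok"

    return set_power_state
```

If the lock outlives all attempts, the last error propagates, which matches the final ERROR line that aborts the deploy. The `sleep` parameter is injectable only so the sketch can be exercised without real delays.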

Version-Release number of selected component (if applicable):

[root@host15 ~]# rpm -qa | grep openstack
openstack-tripleo-common-0.0.1.dev6-0.git49b57eb.el7ost.noarch
openstack-ceilometer-alarm-2015.1.0-2.el7ost.noarch
openstack-swift-account-2.3.0-1.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.1-2.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.0-3.el7ost.noarch
openstack-tripleo-0.0.6-0.1.git812abe0.el7ost.noarch
openstack-tuskar-0.4.18-2.el7ost.noarch
openstack-swift-2.3.0-1.el7ost.noarch
openstack-nova-novncproxy-2015.1.0-10.el7ost.noarch
openstack-swift-plugin-swift3-1.7-3.el7ost.noarch
redhat-access-plugin-openstack-7.0.0-0.el7ost.noarch
openstack-heat-api-2015.1.0-3.el7ost.noarch
openstack-ceilometer-central-2015.1.0-2.el7ost.noarch
openstack-nova-scheduler-2015.1.0-10.el7ost.noarch
openstack-nova-cert-2015.1.0-10.el7ost.noarch
openstack-nova-common-2015.1.0-10.el7ost.noarch
openstack-tripleo-image-elements-0.9.6-1.el7ost.noarch
openstack-ceilometer-notification-2015.1.0-2.el7ost.noarch
openstack-ceilometer-collector-2015.1.0-2.el7ost.noarch
openstack-ironic-common-2015.1.0-4.el7ost.noarch
openstack-nova-compute-2015.1.0-10.el7ost.noarch
openstack-nova-conductor-2015.1.0-10.el7ost.noarch
openstack-neutron-openvswitch-2015.1.0-7.el7ost.noarch
openstack-swift-container-2.3.0-1.el7ost.noarch
openstack-nova-api-2015.1.0-10.el7ost.noarch
openstack-dashboard-theme-2015.1.0-10.el7ost.noarch
openstack-tuskar-ui-extras-0.0.4-1.el7ost.noarch
openstack-nova-console-2015.1.0-10.el7ost.noarch
openstack-neutron-common-2015.1.0-7.el7ost.noarch
openstack-neutron-2015.1.0-7.el7ost.noarch
openstack-heat-engine-2015.1.0-3.el7ost.noarch
openstack-ceilometer-common-2015.1.0-2.el7ost.noarch
openstack-heat-api-cfn-2015.1.0-3.el7ost.noarch
openstack-ironic-conductor-2015.1.0-4.el7ost.noarch
openstack-ceilometer-api-2015.1.0-2.el7ost.noarch
openstack-ironic-api-2015.1.0-4.el7ost.noarch
openstack-swift-proxy-2.3.0-1.el7ost.noarch
openstack-puppet-modules-2015.1.5-1.el7ost.noarch
openstack-dashboard-2015.1.0-10.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-selinux-0.6.32-1.el7ost.noarch
openstack-tempest-kilo-20150507.2.el7ost.noarch
openstack-neutron-ml2-2015.1.0-7.el7ost.noarch
openstack-keystone-2015.1.0-1.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-9.el7ost.noarch
openstack-glance-2015.1.0-6.el7ost.noarch
python-openstackclient-1.0.3-2.el7ost.noarch
openstack-ironic-discoverd-1.1.0-3.el7ost.noarch
openstack-swift-object-2.3.0-1.el7ost.noarch
python-django-openstack-auth-1.2.0-2.el7ost.noarch
openstack-tuskar-ui-0.3.0-2.el7ost.noarch
openstack-utils-2014.2-1.el7ost.noarch
openstack-heat-common-2015.1.0-3.el7ost.noarch

How reproducible:

Often, but not on every deploy. Redeploying the overcloud gets past the error, but the CI job still fails.

Steps to Reproduce:
1. Install bits from latest poodle/puddle 
2. openstack overcloud deploy --plan-uuid $ID --control-scale $CONTROLSCALE --compute-scale $COMPUTESCALE --ceph-storage-scale $CEPHSTORAGESCALE
3. See warnings and errors

Actual results:
Deploying overcloud fails

Expected results:
Overcloud deployed

Additional info:

Comment 4 Dmitry Tantsur 2015-06-23 07:17:56 UTC
Oh... please provide ironic conductor and API logs around failure time (sudo journalctl -u openstack-ironic-api -u openstack-ironic-conductor)
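
Once output has been collected, a small script along these lines could help summarize which nodes were reported as locked and how far the retries got. It is only a sketch: the regular expression targets the client-side message format quoted in the description, whether the conductor and API journals use the same wording is an assumption, and `summarize_lock_errors` is a hypothetical helper name.

```python
import re

# Matches the ironicclient message format quoted in the description, e.g.
# "... Node <uuid> is locked by host <host>, please retry ... Attempt 1 of 6"
LOCK_RE = re.compile(
    r"Node (?P<node>[0-9a-f-]{36}) is locked by host (?P<host>[^\s,]+),"
    r".*?Attempt (?P<attempt>\d+) of (?P<total>\d+)"
)


def summarize_lock_errors(lines):
    """Map (node UUID, locking host) to the highest attempt number seen."""
    summary = {}
    for line in lines:
        match = LOCK_RE.search(line)
        if match is None:
            continue  # not a lock-retry message
        key = (match.group("node"), match.group("host"))
        summary[key] = max(summary.get(key, 0), int(match.group("attempt")))
    return summary
```

It can be fed any iterable of log lines, for example an open file object or `sys.stdin`.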

Comment 5 Dmitry Tantsur 2015-06-23 11:29:33 UTC
I've started an rdo-list thread to discuss the issue: https://www.redhat.com/archives/rdo-list/2015-June/msg00149.html

Comment 6 Ronelle Landy 2015-06-23 16:07:09 UTC
Will copy logs and journalctl output when we hit the error again - it's sporadic.

Comment 7 Dmitry Tantsur 2015-06-24 11:34:03 UTC
https://review.gerrithub.io/#/c/237471/ is an instack-undercloud patch to bump retry interval for Ironic globally. I'm still interested in logs, however.

Comment 8 Mike Burns 2015-06-26 11:44:21 UTC
This occurred in CI again on Dell BM. I will pull the logs from the job and post them here.

Comment 9 Mike Burns 2015-06-26 11:45:43 UTC
Created attachment 1043486
undercloud logs

Comment 10 Mike Burns 2015-06-26 11:46:39 UTC
Created attachment 1043487
host0 logs

Comment 11 Dmitry Tantsur 2015-06-26 12:17:42 UTC
Upstream patch to bump retry interval: https://review.openstack.org/#/c/196020/
I intend to backport it asap.
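
To make the effect of a longer retry window concrete, here is a rough back-of-the-envelope calculation. It assumes a fixed sleep between attempts and no sleep after the last one; the 6 attempts come from the log in the description and the 120-second target from the patch title, while the 2-second interval is an assumed illustrative value and both function names are hypothetical.

```python
import math


def total_wait_seconds(max_attempts, retry_interval):
    """Worst-case time spent sleeping between attempts (none after the last)."""
    return (max_attempts - 1) * retry_interval


def attempts_needed(window_seconds, retry_interval):
    """Smallest attempt count whose sleeps add up to at least window_seconds."""
    return math.ceil(window_seconds / retry_interval) + 1


# 6 attempts at an assumed 2-second interval cover only about 10 seconds.
short_window = total_wait_seconds(6, 2)

# Covering a 2-minute window at the same assumed interval takes far more attempts.
long_attempts = attempts_needed(120, 2)
```

Under these assumptions, an operation that holds the node lock for more than a few seconds exhausts the short window, which is consistent with the sporadic failures reported here.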

Comment 12 Dmitry Tantsur 2015-06-26 12:51:07 UTC
I also suggest backporting https://review.openstack.org/#/c/194619/, for GA or later, to make debugging such problems simpler.

Comment 15 Marius Cornea 2015-07-22 09:33:18 UTC
I couldn't reproduce this on either a virtual or a baremetal environment.

Comment 17 errata-xmlrpc 2015-08-05 13:27:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1548

