Bug 1233452

Summary: 'Node is locked by host' error causing overcloud deploy to fail
Product: Red Hat OpenStack
Reporter: Ronelle Landy <rlandy>
Component: openstack-ironic
Assignee: Dmitry Tantsur <dtantsur>
Status: CLOSED ERRATA
QA Contact: Marius Cornea <mcornea>
Severity: urgent
Docs Contact:
Priority: high
Version: 7.0 (Kilo)
CC: calfonso, dtantsur, jslagle, mburns, mlopes, morazi, rhel-osp-director-maint, rlandy, rrosa, whayutin, yeylon
Target Milestone: ga
Keywords: Automation
Target Release: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: python-ironicclient-0.5.1-8.el7ost openstack-ironic-2015.1.0-8.el7ost
Doc Type: Bug Fix
Doc Text:
Prior to this update, OpenStack Bare Metal Provisioning (Ironic) operations, such as 'Power off', held a lock on a node for longer than expected. Consequently, other operations on that node would fail while it was still considered locked. This update adjusts the retry timeout to two minutes. As a result, no further node lock errors have been observed.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-08-05 13:27:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description        Flags
undercloud logs    none
host0 logs         none

Description Ronelle Landy 2015-06-19 01:06:36 UTC
Description of problem:
Installing bits from the latest poodle/puddle and deploying the overcloud fails due to the 'Node xxx is locked by host' error:

>> source /home/stack/stackrc; if [ -f "/home/stack/deploy-overcloudrc" ]; then source /home/stack/deploy-overcloudrc; fi; openstack overcloud deploy --plan-uuid 1803970e-ddf9-41d9-a101-bd67afe667a9 --control-scale $CONTROLSCALE --compute-scale $COMPUTESCALE --ceph-storage-scale $CEPHSTORAGESCALE 


19:26:18 failed: [undercloud] => {"changed": true, "cmd": "source /home/stack/stackrc; if [ -f \"/home/stack/deploy-overcloudrc\" ]; then\n source /home/stack/deploy-overcloudrc;\n fi; openstack overcloud deploy --plan-uuid 1803970e-ddf9-41d9-a101-bd67afe667a9 --control-scale $CONTROLSCALE --compute-scale $COMPUTESCALE --ceph-storage-scale $CEPHSTORAGESCALE #Both swift and blockstorage are not supported downstream right now #--swift-storage-scale $SWIFTSTORAGESCALE #--block-storage-scale $BLOCKSTORAGESCALE;", "delta": "0:00:16.466670", "end": "2015-06-18 19:26:18.978551", "rc": 1, "start": "2015-06-18 19:26:02.511881", "warnings": []}
19:26:18 stderr: WARNING: ironicclient.common.http Request returned failure status.
19:26:18 WARNING: ironicclient.common.http Error contacting Ironic server: Node 62e2f991-c2fe-4d5b-9b9e-cb8fecf1fbed is locked by host host15.beaker.tripleo.lab.eng.rdu2.redhat.com, please retry after the current operation is completed. (HTTP 409). Attempt 1 of 6
19:26:18 WARNING: ironicclient.common.http Request returned failure status.
19:26:18 WARNING: ironicclient.common.http Error contacting Ironic server: Node 62e2f991-c2fe-4d5b-9b9e-cb8fecf1fbed is locked by host host15.beaker.tripleo.lab.eng.rdu2.redhat.com, please retry after the current operation is completed. (HTTP 409). Attempt 2 of 6
19:26:18 WARNING: ironicclient.common.http Request returned failure status.
19:26:18 WARNING: ironicclient.common.http Error contacting Ironic server: Node 62e2f991-c2fe-4d5b-9b9e-cb8fecf1fbed is locked by host host15.beaker.tripleo.lab.eng.rdu2.redhat.com, please retry after the current operation is completed. (HTTP 409). Attempt 3 of 6
19:26:18 WARNING: ironicclient.common.http Request returned failure status.
19:26:18 WARNING: ironicclient.common.http Error contacting Ironic server: Node 62e2f991-c2fe-4d5b-9b9e-cb8fecf1fbed is locked by host host15.beaker.tripleo.lab.eng.rdu2.redhat.com, please retry after the current operation is completed. (HTTP 409). Attempt 4 of 6
19:26:18 WARNING: ironicclient.common.http Request returned failure status.
19:26:18 WARNING: ironicclient.common.http Error contacting Ironic server: Node 62e2f991-c2fe-4d5b-9b9e-cb8fecf1fbed is locked by host host15.beaker.tripleo.lab.eng.rdu2.redhat.com, please retry after the current operation is completed. (HTTP 409). Attempt 5 of 6
19:26:18 WARNING: ironicclient.common.http Request returned failure status.
19:26:18 ERROR: ironicclient.common.http Error contacting Ironic server: Node 62e2f991-c2fe-4d5b-9b9e-cb8fecf1fbed is locked by host host15.beaker.tripleo.lab.eng.rdu2.redhat.com, please retry after the current operation is completed. (HTTP 409). Attempt 6 of 6
19:26:18 ERROR: openstack Node 62e2f991-c2fe-4d5b-9b9e-cb8fecf1fbed is locked by host host15.beaker.tripleo.lab.eng.rdu2.redhat.com, please retry after the current operation is completed. (HTTP 409)
19:26:18 stdout: The following templates will be written:
19:26:18 /tmp/tmpMG0VyV/puppet/manifests/overcloud_volume.pp
19:26:18 /tmp/tmpMG0VyV/hieradata/object.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/hieradata/common.yaml
19:26:18 /tmp/tmpMG0VyV/provider-Swift-Storage-1.yaml
19:26:18 /tmp/tmpMG0VyV/network/ports/net_ip_map.yaml
19:26:18 /tmp/tmpMG0VyV/provider-Cinder-Storage-1.yaml
19:26:18 /tmp/tmpMG0VyV/provider-Compute-1.yaml
19:26:18 /tmp/tmpMG0VyV/network/noop.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/bootstrap-config.yaml
19:26:18 /tmp/tmpMG0VyV/net-config-bridge.yaml
19:26:18 /tmp/tmpMG0VyV/provider-Ceph-Storage-1.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/controller-post-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/cinder-storage-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/manifests/overcloud_cephstorage.pp
19:26:18 /tmp/tmpMG0VyV/puppet/hieradata/object.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/controller-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/manifests/overcloud_compute.pp
19:26:18 /tmp/tmpMG0VyV/puppet/cinder-storage-post.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/swift-storage-post.yaml
19:26:18 /tmp/tmpMG0VyV/provider-Controller-1.yaml
19:26:18 /tmp/tmpMG0VyV/network/networks.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/manifests/overcloud_object.pp
19:26:18 /tmp/tmpMG0VyV/hieradata/controller.yaml
19:26:18 /tmp/tmpMG0VyV/network/ports/ctlplane_vip.yaml
19:26:18 /tmp/tmpMG0VyV/hieradata/volume.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/compute-post-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/extraconfig/tasks/yum_update.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/swift-storage-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/extraconfig/tasks/yum_update.sh
19:26:18 /tmp/tmpMG0VyV/puppet/swift-devices-and-proxy-config.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/controller-config-pacemaker.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/compute-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/hieradata/volume.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/ceph-storage-post-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/extraconfig/controller/noop.yaml
19:26:18 /tmp/tmpMG0VyV/network/ports/noop.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/ceph-storage-puppet.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/hieradata/ceph.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/vip-config.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/hieradata/controller.yaml
19:26:18 /tmp/tmpMG0VyV/plan.yaml
19:26:18 /tmp/tmpMG0VyV/environment.yaml
19:26:18 /tmp/tmpMG0VyV/network/ports/net_ip_list_map.yaml
19:26:18 /tmp/tmpMG0VyV/hieradata/compute.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/hieradata/compute.yaml
19:26:18 /tmp/tmpMG0VyV/hieradata/ceph.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/manifests/overcloud_controller_pacemaker.pp
19:26:18 /tmp/tmpMG0VyV/hieradata/common.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/manifests/ringbuilder.pp
19:26:18 /tmp/tmpMG0VyV/extraconfig/post_deploy/default.yaml
19:26:18 /tmp/tmpMG0VyV/net-config-noop.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/ceph-cluster-config.yaml
19:26:18 /tmp/tmpMG0VyV/firstboot/userdata_default.yaml
19:26:18 /tmp/tmpMG0VyV/puppet/all-nodes-config.yaml
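
For triage, the lock-conflict retry lines in output like the above can be pulled out of a CI log with a small script. This is only a sketch: the regular expression is an assumption based on the message format shown in this report, and the names `LOCK_RE`, `parse_lock_conflicts`, and `retries_exhausted` are made up for illustration and are not part of ironicclient.

```python
import re

# Pattern derived from the warning text quoted in this report (an assumption;
# other ironicclient versions may word the message differently).
LOCK_RE = re.compile(
    r"Node (?P<node>[0-9a-f-]{36}) is locked by host (?P<host>[^,\s]+),"
    r".*?\(HTTP (?P<status>\d+)\)\. Attempt (?P<attempt>\d+) of (?P<total>\d+)"
)


def parse_lock_conflicts(lines):
    """Return one dict per retry line that reports a node lock conflict."""
    conflicts = []
    for line in lines:
        match = LOCK_RE.search(line)
        if match is None:
            continue  # not a lock-conflict retry line
        conflicts.append({
            "node": match.group("node"),
            "host": match.group("host"),
            "status": int(match.group("status")),
            "attempt": int(match.group("attempt")),
            "total": int(match.group("total")),
        })
    return conflicts


def retries_exhausted(conflicts):
    """True if any line reports the final attempt (attempt == total)."""
    return any(c["attempt"] == c["total"] for c in conflicts)
```

Feeding the job's console log to `parse_lock_conflicts` and then checking `retries_exhausted` gives a quick way to tell a deploy that recovered after a few retries from one that ran out of attempts.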

This error was logged previously during node discovery:
https://bugzilla.redhat.com/show_bug.cgi?id=1212134

Version-Release number of selected component (if applicable):

[root@host15 ~]# rpm -qa | grep openstack
openstack-tripleo-common-0.0.1.dev6-0.git49b57eb.el7ost.noarch
openstack-ceilometer-alarm-2015.1.0-2.el7ost.noarch
openstack-swift-account-2.3.0-1.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.1-2.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.0-3.el7ost.noarch
openstack-tripleo-0.0.6-0.1.git812abe0.el7ost.noarch
openstack-tuskar-0.4.18-2.el7ost.noarch
openstack-swift-2.3.0-1.el7ost.noarch
openstack-nova-novncproxy-2015.1.0-10.el7ost.noarch
openstack-swift-plugin-swift3-1.7-3.el7ost.noarch
redhat-access-plugin-openstack-7.0.0-0.el7ost.noarch
openstack-heat-api-2015.1.0-3.el7ost.noarch
openstack-ceilometer-central-2015.1.0-2.el7ost.noarch
openstack-nova-scheduler-2015.1.0-10.el7ost.noarch
openstack-nova-cert-2015.1.0-10.el7ost.noarch
openstack-nova-common-2015.1.0-10.el7ost.noarch
openstack-tripleo-image-elements-0.9.6-1.el7ost.noarch
openstack-ceilometer-notification-2015.1.0-2.el7ost.noarch
openstack-ceilometer-collector-2015.1.0-2.el7ost.noarch
openstack-ironic-common-2015.1.0-4.el7ost.noarch
openstack-nova-compute-2015.1.0-10.el7ost.noarch
openstack-nova-conductor-2015.1.0-10.el7ost.noarch
openstack-neutron-openvswitch-2015.1.0-7.el7ost.noarch
openstack-swift-container-2.3.0-1.el7ost.noarch
openstack-nova-api-2015.1.0-10.el7ost.noarch
openstack-dashboard-theme-2015.1.0-10.el7ost.noarch
openstack-tuskar-ui-extras-0.0.4-1.el7ost.noarch
openstack-nova-console-2015.1.0-10.el7ost.noarch
openstack-neutron-common-2015.1.0-7.el7ost.noarch
openstack-neutron-2015.1.0-7.el7ost.noarch
openstack-heat-engine-2015.1.0-3.el7ost.noarch
openstack-ceilometer-common-2015.1.0-2.el7ost.noarch
openstack-heat-api-cfn-2015.1.0-3.el7ost.noarch
openstack-ironic-conductor-2015.1.0-4.el7ost.noarch
openstack-ceilometer-api-2015.1.0-2.el7ost.noarch
openstack-ironic-api-2015.1.0-4.el7ost.noarch
openstack-swift-proxy-2.3.0-1.el7ost.noarch
openstack-puppet-modules-2015.1.5-1.el7ost.noarch
openstack-dashboard-2015.1.0-10.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-selinux-0.6.32-1.el7ost.noarch
openstack-tempest-kilo-20150507.2.el7ost.noarch
openstack-neutron-ml2-2015.1.0-7.el7ost.noarch
openstack-keystone-2015.1.0-1.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-9.el7ost.noarch
openstack-glance-2015.1.0-6.el7ost.noarch
python-openstackclient-1.0.3-2.el7ost.noarch
openstack-ironic-discoverd-1.1.0-3.el7ost.noarch
openstack-swift-object-2.3.0-1.el7ost.noarch
python-django-openstack-auth-1.2.0-2.el7ost.noarch
openstack-tuskar-ui-0.3.0-2.el7ost.noarch
openstack-utils-2014.2-1.el7ost.noarch
openstack-heat-common-2015.1.0-3.el7ost.noarch

How reproducible:

Often, but not on every deploy. Redeploying the overcloud gets past the error, but the CI job still fails.

Steps to Reproduce:
1. Install bits from latest poodle/puddle 
2. openstack overcloud deploy --plan-uuid $ID --control-scale $CONTROLSCALE --compute-scale $COMPUTESCALE --ceph-storage-scale $CEPHSTORAGESCALE
3. See warnings and errors

Actual results:
Deploying overcloud fails

Expected results:
Overcloud deployed

Additional info:

Comment 4 Dmitry Tantsur 2015-06-23 07:17:56 UTC
Oh... please provide ironic conductor and API logs around failure time (sudo journalctl -u openstack-ironic-api -u openstack-ironic-conductor)

Comment 5 Dmitry Tantsur 2015-06-23 11:29:33 UTC
I've started an rdo-list thread to discuss the issue: https://www.redhat.com/archives/rdo-list/2015-June/msg00149.html

Comment 6 Ronelle Landy 2015-06-23 16:07:09 UTC
Will copy logs and journalctl output when we hit the error again - it's sporadic.

Comment 7 Dmitry Tantsur 2015-06-24 11:34:03 UTC
https://review.gerrithub.io/#/c/237471/ is an instack-undercloud patch to bump the retry interval for Ironic globally. I'm still interested in logs, however.

Comment 8 Mike Burns 2015-06-26 11:44:21 UTC
This occurred in CI again on Dell BM. Will pull the logs from the job and post them here.

Comment 9 Mike Burns 2015-06-26 11:45:43 UTC
Created attachment 1043486
undercloud logs

Comment 10 Mike Burns 2015-06-26 11:46:39 UTC
Created attachment 1043487
host0 logs

Comment 11 Dmitry Tantsur 2015-06-26 12:17:42 UTC
Upstream patch to bump retry interval: https://review.openstack.org/#/c/196020/
I intend to backport it asap.
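
To illustrate why the retry interval matters here, the "Attempt N of 6" behaviour seen in the log can be sketched roughly as below. This is a minimal illustration and not the ironicclient implementation: `NodeLockedError`, `call_with_retries`, and the default values are hypothetical, chosen only to mirror the six attempts visible in the report.

```python
import time


class NodeLockedError(Exception):
    """Hypothetical stand-in for an HTTP 409 'node is locked' response."""


def call_with_retries(operation, max_attempts=6, retry_interval=2.0,
                      sleep=time.sleep):
    """Call operation(), retrying on NodeLockedError.

    Waits retry_interval seconds between attempts and re-raises the error
    once max_attempts attempts have failed. The defaults are illustrative.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except NodeLockedError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the conflict to the caller
            sleep(retry_interval)  # give the lock holder time to finish
```

In this sketch the total time spent waiting is at most (max_attempts - 1) * retry_interval seconds, so with the illustrative defaults a lock held for longer than about 10 seconds makes the call fail. Raising either value widens that window.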

Comment 12 Dmitry Tantsur 2015-06-26 12:51:07 UTC
I also suggest backporting https://review.openstack.org/#/c/194619/, for GA or later, to make debugging such problems simpler.

Comment 15 Marius Cornea 2015-07-22 09:33:18 UTC
I couldn't reproduce this on either a virtual or a baremetal environment.

Comment 17 errata-xmlrpc 2015-08-05 13:27:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1548