rhos-director: deployment with custom ironic role: cleaning fails KeystoneFailure: 'NoneType' object has no attribute 'find'

Environment:
openstack-ironic-conductor-6.2.1-0.20170126214641.2427c7b.el7ost.noarch
openstack-ironic-inspector-4.2.1-0.20170126202204.d557080.el7ost.noarch
instack-undercloud-6.0.0-0.20170127055514.317db76.el7ost.noarch
puppet-ironic-10.1.0-0.20170126121442.d991f28.el7ost.noarch
openstack-tripleo-heat-templates-6.0.0-0.20170127041112.ce54697.el7ost.1.noarch
openstack-ironic-common-6.2.1-0.20170126214641.2427c7b.el7ost.noarch
python-ironic-lib-2.5.2-0.20170123115307.ace87b6.el7ost.noarch
openstack-ironic-api-6.2.1-0.20170126214641.2427c7b.el7ost.noarch
python-ironicclient-1.10.0-0.20170120194459.808a4cb.el7ost.noarch
python-ironic-inspector-client-1.10.0-0.20161219133602.0eae82e.el7ost.noarch

Steps to reproduce:

1) Successfully deploy the overcloud with a custom role for ironic:

- name: Ironic
  CountDefault: 1
  HostnameFormatDefault: '%stackname%-ironic-%index%'
  disable_upgrade_deployment: True
  ServicesDefault:
    - OS::TripleO::Services::IronicConductor
    - OS::TripleO::Services::CACerts
    - OS::TripleO::Services::Timezone
    - OS::TripleO::Services::Ntp
    - OS::TripleO::Services::Snmp
    - OS::TripleO::Services::Kernel
    - OS::TripleO::Services::TripleoPackages
    - OS::TripleO::Services::TripleoFirewall
    - OS::TripleO::Services::SensuClient
    - OS::TripleO::Services::FluentdClient

This was the deployment command:

openstack overcloud deploy --templates \
  -r /home/stack/roles_data.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
  -e virt/ceph.yaml \
  -e /home/stack/network-isolation.yaml \
  -e virt/network/network-environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic.yaml \
  -e ironic.yaml \
  -e flat_networks.yaml

[stack@undercloud-0 ~]$ cat ironic.yaml
parameter_defaults:
  IronicEnabledDrivers:
    - pxe_ssh
  NovaSchedulerDefaultFilters:
    - RetryFilter
    - AggregateInstanceExtraSpecsFilter
    - AvailabilityZoneFilter
    - RamFilter
    - DiskFilter
    - ComputeFilter
    - ComputeCapabilitiesFilter
    - ImagePropertiesFilter
  IronicCleaningDiskErase: metadata
  IronicIPXEEnabled: true
  ControllerExtraConfig:
    ironic::drivers::ssh::libvirt_uri: 'qemu:///system'

[stack@undercloud-0 ~]$ cat flat_networks.yaml
parameter_defaults:
  NeutronBridgeMappings: datacentre:br-ex,baremetal:br-baremetal
  NeutronFlatNetworks: datacentre,baremetal

[stack@undercloud-0 ~]$ heat stack-list && nova list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+-----------------+----------------------+--------------+
| id                                   | stack_name | stack_status    | creation_time        | updated_time |
+--------------------------------------+------------+-----------------+----------------------+--------------+
| 904bca90-52f8-40ad-a259-061d6d5f34d8 | overcloud  | CREATE_COMPLETE | 2017-02-11T22:07:45Z | None         |
+--------------------------------------+------------+-----------------+----------------------+--------------+
+--------------------------------------+-------------------------+--------+------------+-------------+------------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks               |
+--------------------------------------+-------------------------+--------+------------+-------------+------------------------+
| 5c3cbc5d-b888-4510-8f39-29edd91f9f8d | overcloud-cephstorage-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.7  |
| 00a2760b-453d-48e8-8a6e-ba101585c338 | overcloud-cephstorage-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
| 599ad661-bb1a-43a8-b987-07e0ca7a1601 | overcloud-cephstorage-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
| 4b05c551-049e-489d-967f-bc15b96f11a7 | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=192.168.24.17 |
| 96b4afed-977a-4cbb-b64c-cc2f062ca321 | overcloud-controller-1  | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| 9c38e69b-2074-451e-b711-81c7df048b47 | overcloud-controller-2  | ACTIVE | -          | Running     | ctlplane=192.168.24.14 |
| 3a74c89f-51ce-40e1-a2f3-f09da4309d34 | overcloud-ironic-0      | ACTIVE | -          | Running     | ctlplane=192.168.24.16 |
| b96cd88e-8e96-4978-850f-5f805f3436c5 | overcloud-ironic-1      | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| ba6a1cac-7c20-4629-9d65-7e8e4c8c1433 | overcloud-novacompute-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.15 |
+--------------------------------------+-------------------------+--------+------------+-------------+------------------------+

2) Attempt to create BM instances in the overcloud following this guide:
http://docs.openstack.org/developer/tripleo-docs/advanced_deployment/baremetal_overcloud.html

"Note: The baremetal node provide command makes a node go through cleaning procedure, so it might take some time depending on the configuration."

Result:

The cleaning fails right away - the node doesn't even boot.

[stack@undercloud-0 ~]$ ironic node-list
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| f0f253ef-1644-4116-b5e4-bc483256b81b | bm-0 | None          | power off   | clean failed       | True        |
| 7c835f19-deab-4b5d-aef0-d38463672f46 | bm-1 | None          | power off   | clean failed       | True        |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+

Note: This works on the same release without moving "OS::TripleO::Services::IronicConductor" to a separate role. Do we need to move more services to the ironic role for this to work?
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.
I think this may be caused by the hiera interpolation here:
https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/ironic-base.yaml#L64

To confirm, can you please provide the output of the following command, run on one of the controllers and on the custom ironic role:

sudo hiera tripleo::profile::base::database::mysql::client_bind_address

I suspect this is nil on the ironic role node(s), so we need to define that data differently in the heat template (e.g. via the EndpointMap, not hiera interpolation).
You could also collect "sudo hiera ironic::database_connection" in the failing and working cases, and/or inspect the related ironic.conf entry to see if it's malformed.
You are right Steven, the connection is malformed because tripleo::profile::base::database::mysql::client_bind_address is nil in hiera.

[heat-admin@overcloud-ironic-0 ~]$ sudo grep -w ^connection /etc/ironic/ironic.conf
connection = mysql+pymysql://ironic:u6EECMV8Uv2NT4qWnTkx698TE.1.20/ironic?bind_address=
[heat-admin@overcloud-ironic-0 ~]$ sudo hiera ironic::database_connection
mysql+pymysql://ironic:u6EECMV8Uv2NT4qWnTkx698TE.1.20/ironic?bind_address=
[heat-admin@overcloud-ironic-0 ~]$ sudo hiera tripleo::profile::base::database::mysql::client_bind_address
nil

On a controller it is added correctly:

[root@overcloud-controller-1 ~]# sudo hiera tripleo::profile::base::database::mysql::client_bind_address
172.17.1.19
[root@overcloud-controller-1 ~]# grep ^connection /etc/ironic/ironic.conf
connection = mysql+pymysql://ironic:u6EECMV8Uv2NT4qWnTkx698TE.1.20/ironic?bind_address=172.17.1.19
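The failure mode can be illustrated with a short Python sketch. This is a hypothetical stand-in for Puppet's hiera string interpolation, not TripleO code, and the password/host in the template below are placeholders: when the interpolated key resolves to nothing, the substitution collapses to an empty string, producing exactly the dangling "?bind_address=" captured above.

```python
import re

def interpolate(template, hiera):
    # Mimic "%{hiera('key')}" string interpolation: an unresolvable
    # key is replaced by an empty string rather than raising an error.
    return re.sub(r"%\{hiera\('([^']+)'\)\}",
                  lambda m: hiera.get(m.group(1), ""),
                  template)

# Placeholder DSN template; the real credentials and host differ.
template = ("mysql+pymysql://ironic:PASSWORD@172.17.1.20/ironic"
            "?bind_address=%{hiera('tripleo::profile::base::database"
            "::mysql::client_bind_address')}")

key = "tripleo::profile::base::database::mysql::client_bind_address"

# Controller: the key resolves, so the DSN is well formed.
print(interpolate(template, {key: "172.17.1.19"}))

# Custom Ironic role: the key is nil, leaving a dangling "?bind_address=".
print(interpolate(template, {}))
```

This matches the hiera output above: the controller renders a usable connection string, while the custom role renders one that later trips up URL parsing in the database layer.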
Ok, so we need to replace the "%{hiera('tripleo::profile::base::database::mysql::client_bind_address')}" line with a reference to the local bind IP derived from the ServiceNetMap. So it should probably look something like:

{get_param: [ServiceNetMap, MysqlNetwork]}

Same as in tht/puppet/services/database/mysql.yaml.

I'm not quite sure why this was done using hiera interpolation, but the same problem exists in various services, which will all break in the same way:

$ grep -R client_bind_address ./* | grep hiera | cut -d: -f1 | sort | uniq
./aodh-base.yaml
./barbican-api.yaml
./ceilometer-base.yaml
./cinder-base.yaml
./ec2-api.yaml
./glance-api.yaml
./glance-registry.yaml.bak
./gnocchi-base.yaml
./heat-engine.yaml
./ironic-base.yaml
./keystone.yaml
./manila-base.yaml
./mistral-base.yaml
./neutron-api.yaml
./neutron-plugin-plumgrid.yaml
./nova-base.yaml
./octavia-api.yaml
./panko-base.yaml
./sahara-base.yaml
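A sketch of what the suggested change in puppet/services/ironic-base.yaml could look like, following the str_replace/ServiceNetMap pattern from puppet/services/database/mysql.yaml. This is an illustrative fragment, not the merged patch:

```yaml
# Illustrative fragment only. Instead of interpolating
# tripleo::profile::base::database::mysql::client_bind_address (nil on
# roles that lack the mysql client profile data), resolve the node-local
# IP for the MySQL service network via ServiceNetMap:
        - '?bind_address='
        - str_replace:
            template: "%{hiera('$NETWORK')}"
            params:
              $NETWORK: {get_param: [ServiceNetMap, MysqlNetwork]}
```

The idea is that ServiceNetMap yields the network name, and hiera on each node maps that network name to the node's own IP on it, so every role gets a valid local bind address.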
Clearing needinfo as comment #7 provides the information needed to proceed with a fix.
Upstream bug raised: https://bugs.launchpad.net/tripleo/+bug/1664524
Upstream patch pushed, needs further testing/review: https://review.openstack.org/433607
This should have been resolved by https://review.openstack.org/#/c/431425/, as we switched this configuration to use a file instead of trying to pass the IP address as params.
https://review.openstack.org/#/c/431425/ landed; I verified by inspection that it is included in openstack-tripleo-heat-templates-6.0.0-0.20170222195630.46117f4.el7ost.noarch.rpm.
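For context, that review moves the client bind address out of the connection URL and into a per-node MySQL options file, which the DSN then references via read_default_file/read_default_group. A plausible sketch of that file, with an illustrative address, could be:

```ini
# /etc/my.cnf.d/tripleo.cnf (illustrative contents; written per node)
[tripleo]
bind-address = 172.17.1.13
```

The connection string then carries only ?read_default_file=/etc/my.cnf.d/tripleo.cnf&read_default_group=tripleo, so a missing hiera value can no longer malform the URL itself.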
Environment:
openstack-tripleo-heat-templates-6.0.0-0.20170222195630.46117f4.el7ost.noarch

What happens now is that the node changes status to "cleaning":

[stack@undercloud-0 ~]$ ironic node-list
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| 0c471098-3720-4e3f-91af-2c7c5f21ad34 | bm-0 | None          | power off   | cleaning           | False       |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+

But it doesn't start, and if I check the node:

[stack@undercloud-0 ~]$ ironic node-show bm-0
+------------------------+--------------------------------------------------------------------------+
| Property               | Value                                                                    |
+------------------------+--------------------------------------------------------------------------+
| chassis_uuid           | None                                                                     |
| clean_step             | {}                                                                       |
| console_enabled        | False                                                                    |
| created_at             | 2017-03-03T21:38:21+00:00                                                |
| driver                 | pxe_ssh                                                                  |
| driver_info            | {u'ssh_username': u'stack', u'deploy_kernel': u'71884d90-73ff-4e5d-bf3f- |
|                        | 7152af9c95ea', u'deploy_ramdisk': u'e9ca53aa-c54b-                       |
|                        | 4b63-a579-11da687ddcfa', u'ssh_key_contents': u'******',                 |
|                        | u'ssh_virt_type': u'virsh', u'ssh_address': u'172.16.0.1'}               |
| driver_internal_info   | {}                                                                       |
| extra                  | {}                                                                       |
| inspection_finished_at | None                                                                     |
| inspection_started_at  | None                                                                     |
| instance_info          | {}                                                                       |
| instance_uuid          | None                                                                     |
| last_error             | Async execution of _do_node_clean failed with error: 'NoneType' object   |
|                        | has no attribute 'find'                                                  |
| maintenance            | False                                                                    |
| maintenance_reason     | None                                                                     |
| name                   | bm-0                                                                     |
| network_interface      |                                                                          |
| power_state            | power off                                                                |
| properties             | {u'memory_mb': u'7822', u'cpu_arch': u'x86_64', u'local_gb': u'41',      |
|                        | u'cpus': u'1'}                                                           |
| provision_state        | cleaning                                                                 |
| provision_updated_at   | 2017-03-03T21:38:35+00:00                                                |
| raid_config            |                                                                          |
| reservation            | None                                                                     |
| resource_class         |                                                                          |
| target_power_state     | None                                                                     |
| target_provision_state | available                                                                |
| target_raid_config     |                                                                          |
| updated_at             | 2017-03-03T21:38:52+00:00                                                |
| uuid                   | 0c471098-3720-4e3f-91af-2c7c5f21ad34                                     |
+------------------------+--------------------------------------------------------------------------+

Note the failure:

Async execution of _do_node_clean failed with error: 'NoneType' object has no attribute 'find'
Hi Sasha, could you repeat the checks from comment #7 to see whether we are still missing the bind address in the database connection information?
Seems like it:

[stack@undercloud-0 ~]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+------------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks               |
+--------------------------------------+-------------------------+--------+------------+-------------+------------------------+
| cdd0655b-4f6e-4e88-be5b-5ee8390430d3 | overcloud-cephstorage-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.9  |
| bbbfe9e2-97de-4c10-b6b1-0bc0293aa5e6 | overcloud-cephstorage-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.15 |
| 3a7af452-f386-432d-aeb5-96f0e0e14be2 | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=192.168.24.19 |
| 569b1b9d-dedf-434b-a33b-bca90c92215f | overcloud-controller-1  | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
| 5fae9050-afa0-4e98-b5a5-9767047b293d | overcloud-controller-2  | ACTIVE | -          | Running     | ctlplane=192.168.24.18 |
| a33390dd-5cdd-4b69-a0ed-532fc37ded2f | overcloud-ironic-0      | ACTIVE | -          | Running     | ctlplane=192.168.24.22 |
| 943b2aaa-9d77-400f-a250-2202133b4b00 | overcloud-ironic-1      | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
| a094afb3-5c6c-4d23-9182-98554ea2e8c7 | overcloud-novacompute-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.16 |
+--------------------------------------+-------------------------+--------+------------+-------------+------------------------+

[stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.10
[heat-admin@overcloud-controller-1 ~]$ sudo -i
[root@overcloud-controller-1 ~]# sudo hiera tripleo::profile::base::database::mysql::client_bind_address
172.17.1.13
[root@overcloud-controller-1 ~]# grep ^connection /etc/ironic/ironic.conf
connection = mysql+pymysql://ironic:9dkU8tBq4XAFydN4tpRaZFVvr.1.14/ironic?read_default_file=/etc/my.cnf.d/tripleo.cnf&read_default_group=tripleo

[stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.22
[heat-admin@overcloud-ironic-0 ~]$ sudo hiera tripleo::profile::base::database::mysql::client_bind_address
nil
[heat-admin@overcloud-ironic-0 ~]$ sudo hiera ironic::database_connection
mysql+pymysql://ironic:9dkU8tBq4XAFydN4tpRaZFVvr.1.14/ironic?read_default_file=/etc/my.cnf.d/tripleo.cnf&read_default_group=tripleo
[heat-admin@overcloud-ironic-0 ~]$ sudo grep -w ^connection /etc/ironic/ironic.conf
connection = mysql+pymysql://ironic:9dkU8tBq4XAFydN4tpRaZFVvr.1.14/ironic?read_default_file=/etc/my.cnf.d/tripleo.cnf&read_default_group=tripleo
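To make the difference between the two DSN shapes concrete, here is a small stdlib Python sketch that parses the query options of each; the credentials and hosts in it are placeholders, not the real values from this deployment:

```python
from urllib.parse import urlsplit, parse_qsl

# Placeholder DSNs modelled on the captured ones; real credentials differ.
old_dsn = "mysql+pymysql://ironic:***@172.17.1.20/ironic?bind_address="
new_dsn = ("mysql+pymysql://ironic:***@172.17.1.14/ironic"
           "?read_default_file=/etc/my.cnf.d/tripleo.cnf"
           "&read_default_group=tripleo")

def query_options(dsn):
    # keep_blank_values=True surfaces the empty bind_address that the
    # broken template rendered, which downstream code then trips over.
    return dict(parse_qsl(urlsplit(dsn).query, keep_blank_values=True))

print(query_options(old_dsn))  # {'bind_address': ''}  <- empty, malformed
print(query_options(new_dsn))  # bind address now comes from the cnf file
```

The old shape carries a bind_address key with an empty value; the new shape carries no address at all in the URL, only pointers to the options file, so it is well formed on every role.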
Environment:
instack-undercloud-6.0.0-2.el7ost.noarch
openstack-puppet-modules-10.0.0-0.20170307021643.0333c73.el7ost.noarch
openstack-tripleo-heat-templates-6.0.0-0.20170307170102.3134785.0rc2.el7ost.noarch

Was able to deploy and run the clean successfully.

nova list:
+--------------------------------------+-------------------------+--------+------------+-------------+------------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks               |
+--------------------------------------+-------------------------+--------+------------+-------------+------------------------+
| a3c941a1-dac0-455c-a9ff-3140d2fa0530 | overcloud-cephstorage-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.15 |
| 2591fe42-8491-4bd6-b824-3ed91fa5ca17 | overcloud-cephstorage-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
| 25ab1cc1-ab8c-424e-aa7e-f84d1888c7a8 | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=192.168.24.12 |
| b30b51ed-2e71-4fd3-af5a-3c0abc9bae05 | overcloud-controller-1  | ACTIVE | -          | Running     | ctlplane=192.168.24.17 |
| c38cf70a-e39e-47c3-a745-3eb03ea82cb8 | overcloud-controller-2  | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| cca7b9d0-f40a-4776-9cf7-731ba142fe3f | overcloud-ironic-0      | ACTIVE | -          | Running     | ctlplane=192.168.24.9  |
| 197867aa-aad7-4e48-8b9c-b5b20c5fd763 | overcloud-ironic-1      | ACTIVE | -          | Running     | ctlplane=192.168.24.7  |
| 4793374e-549f-4a63-b2c0-e025c0f43f0c | overcloud-novacompute-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
+--------------------------------------+-------------------------+--------+------------+-------------+------------------------+

- name: Ironic
  CountDefault: 1
  HostnameFormatDefault: '%stackname%-ironic-%index%'
  disable_upgrade_deployment: True
  ServicesDefault:
    - OS::TripleO::Services::IronicConductor
    - OS::TripleO::Services::IronicApi
    - OS::TripleO::Services::CACerts
    - OS::TripleO::Services::Timezone
    - OS::TripleO::Services::Ntp
    - OS::TripleO::Services::Snmp
    - OS::TripleO::Services::Kernel
    - OS::TripleO::Services::TripleoPackages
    - OS::TripleO::Services::TripleoFirewall
    - OS::TripleO::Services::SensuClient
    - OS::TripleO::Services::FluentdClient
The read_default_group=tripleo in comment #16 shows that https://review.openstack.org/#/c/431425 landed. I don't think the remaining ironic issues are related to the bugfix under test here, so moving back to ON_QA.
Verified based on Comment #17.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1245