Bug 1640472
| Summary: | overcloud instance creation fails with "No valid host was found" | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | pkomarov |
| Component: | openstack-tripleo-heat-templates | Assignee: | Michele Baldessari <michele> |
| Status: | CLOSED ERRATA | QA Contact: | pkomarov |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 14.0 (Rocky) | CC: | berrange, chjones, dasmith, eglynn, jhakimra, kchamart, mburns, michele, pkomarov, sbauza, sgordon, therve, vromanso |
| Target Milestone: | beta | Keywords: | Regression, Triaged |
| Target Release: | 14.0 (Rocky) | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | openstack-tripleo-heat-templates-9.0.1-0.20181013060874.el7ost | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-01-11 11:54:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
pkomarov
2018-10-18 07:44:19 UTC
Description of problem:
On a HA+Instance-ha deployment of osp14 puddle 2018-10-08.4, overcloud instance creation fails with: No valid host was found

How reproducible:
There is automation for this test at:
https://rhos-ci-staging-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/pidone/view/instance-ha/job/DFG-pidone-instance-ha-14_director-rhel-virthost-3cont_2comp-ipv4-vxlan-instance-ha-test-suite/

overcloud and undercloud SOS-reports are at:
http://rhos-release.virt.bos.redhat.com/log/pkomarov_sosreports/BZ1640472/

Additional info:
Some docker containers, in particular nova-api, are unhealthy, and the openstack-nova-compute containers are stuck restarting:

(undercloud) [stack@undercloud-0 ~]$ ansible overcloud -mshell -b -a'docker ps |grep "unhealthy\|Restarting"'
[WARNING]: Found both group and host with same name: undercloud
overcloud-novacomputeiha-0 | SUCCESS | rc=0 >>
1c200146064d 192.168.24.1:8787/rhosp14/openstack-nova-compute:2018-10-08.4 "kolla_start" 47 hours ago Restarting (1) 17 hours ago nova_compute
overcloud-novacomputeiha-1 | SUCCESS | rc=0 >>
570f57b4c17f 192.168.24.1:8787/rhosp14/openstack-nova-compute:2018-10-08.4 "kolla_start" 47 hours ago Restarting (1) 17 hours ago nova_compute
controller-0 | SUCCESS | rc=0 >>
9c51edf48fbc 192.168.24.1:8787/rhosp14/openstack-nova-api:2018-10-08.4 "kolla_start" 47 hours ago Up 47 hours (unhealthy) nova_metadata
ae15fc9581ec 192.168.24.1:8787/rhosp14/openstack-heat-api-cfn:2018-10-08.4 "kolla_start" 47 hours ago Up 47 hours (unhealthy) heat_api_cfn
cc3b59e3e95b 192.168.24.1:8787/rhosp14/openstack-heat-api:2018-10-08.4 "kolla_start" 47 hours ago Up 47 hours (unhealthy) heat_api
controller-2 | SUCCESS | rc=0 >>
414f46d5bb00 192.168.24.1:8787/rhosp14/openstack-nova-api:2018-10-08.4 "kolla_start" 47 hours ago Up 47 hours (unhealthy) nova_metadata
17f772c3968c 192.168.24.1:8787/rhosp14/openstack-heat-api-cfn:2018-10-08.4 "kolla_start" 47 hours ago Up 47 hours (unhealthy) heat_api_cfn
cf347ef00ecd 192.168.24.1:8787/rhosp14/openstack-neutron-server:2018-10-08.4 "kolla_start" 47 hours ago Up 47 hours (unhealthy) neutron_api
0db2c0b1afe8 192.168.24.1:8787/rhosp14/openstack-heat-api:2018-10-08.4 "kolla_start" 47 hours ago Up 47 hours (unhealthy) heat_api
controller-1 | SUCCESS | rc=0 >>
552e905d6cbb 192.168.24.1:8787/rhosp14/openstack-nova-api:2018-10-08.4 "kolla_start" 47 hours ago Up 47 hours (unhealthy) nova_metadata
e06c85a37b8c 192.168.24.1:8787/rhosp14/openstack-heat-api-cfn:2018-10-08.4 "kolla_start" 47 hours ago Up 47 hours (unhealthy) heat_api_cfn
d8487659d15d 192.168.24.1:8787/rhosp14/openstack-neutron-server:2018-10-08.4 "kolla_start" 47 hours ago Up 25 hours (unhealthy) neutron_api
bbe41eac5d07 192.168.24.1:8787/rhosp14/openstack-heat-api:2018-10-08.4 "kolla_start" 47 hours ago Up 47 hours (unhealthy) heat_api

Issue seems due to this:
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: ++ cat /run_command
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: + CMD='/var/lib/nova/instanceha/check-run-nova-compute '
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: + ARGS=
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: + [[ ! -n '' ]]
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: + . kolla_extend_start
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: ++ [[ ! -d /var/log/kolla/nova ]]
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: +++ stat -c %a /var/log/kolla/nova
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: ++ [[ 2755 != \7\5\5 ]]
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: ++ chmod 755 /var/log/kolla/nova
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: ++ . /usr/local/bin/kolla_nova_extend_start
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: +++ [[ ! -d /var/lib/nova/instances ]]
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: Running command: '/var/lib/nova/instanceha/check-run-nova-compute '
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: + echo 'Running command: '\''/var/lib/nova/instanceha/check-run-nova-compute '\'''
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: + exec /var/lib/nova/instanceha/check-run-nova-compute
Oct 16 09:08:12 overcloud-novacomputeiha-0 dockerd-current[14673]: Traceback (most recent call last):
Oct 16 09:08:12 overcloud-novacomputeiha-0 dockerd-current[14673]:   File "/var/lib/nova/instanceha/check-run-nova-compute", line 191, in <module>
Oct 16 09:08:12 overcloud-novacomputeiha-0 dockerd-current[14673]:     connection = create_nova_connection(config.sections["placement"])
Oct 16 09:08:12 overcloud-novacomputeiha-0 dockerd-current[14673]:   File "/var/lib/nova/instanceha/check-run-nova-compute", line 147, in create_nova_connection
Oct 16 09:08:12 overcloud-novacomputeiha-0 dockerd-current[14673]:     region_name=options["os_region_name"][0],
Oct 16 09:08:12 overcloud-novacomputeiha-0 dockerd-current[14673]: KeyError: 'os_region_name'

OSP13:
[root@compute-0 nova]# crudini --get /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf placement os_region_name
regionOne

OSP14:
[root@overcloud-novacompute-0 nova]# crudini --get /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf placement os_region_name
Parameter not found: os_region_name

Indeed in osp14 it is commented out:
[root@overcloud-novacompute-0 nova]# grep -ir os_region
nova.conf:#os_region_name=<None>

This broke because os_region_name is now deprecated; the following commit switched the templates to its replacement:
commit f2e72352b1376ce719614e9cad4e4c71a3f9c3d8
Author: Juan Antonio Osorio Robles <jaosorior>
Date: Thu Oct 4 15:52:40 2018 +0300
Fix placement region setting
We were using a deprecated interfce to set this value. This uses the
correct one.
Closes-Bug: #1793665
Change-Id: Ib7717911aba3267f855ac6682b0144bfe92034fb
diff --git a/puppet/services/nova-base.yaml b/puppet/services/nova-base.yaml
index f12b0d816dea..3e43b8cf7477 100644
--- a/puppet/services/nova-base.yaml
+++ b/puppet/services/nova-base.yaml
@@ -260,7 +260,7 @@ outputs:
nova::placement::project_name: 'service'
nova::placement::password: {get_param: NovaPassword}
nova::placement::auth_url: {get_param: [EndpointMap, KeystoneInternal, uri_no_suffix]}
- nova::placement::os_region_name: {get_param: KeystoneRegion}
+ nova::placement::region_name: {get_param: KeystoneRegion}
nova::placement::os_interface: {get_param: NovaPlacementAPIInterface}
nova::database_connection:
make_url:
We need to use region_name:
- OSP14
[root@overcloud-novacompute-0 nova]# crudini --get /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf placement region_name
regionOne
- OSP13
[root@compute-0 nova]# crudini --get /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf placement region_name
Parameter not found: region_name
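
The two crudini checks above suggest that anything reading the [placement] section has to cope with either option name, depending on the release. A minimal sketch of that lookup follows, assuming the section can be read with Python's standard configparser. The sample config strings and the `placement_region` helper are hypothetical illustrations modelled on the outputs above, not code taken from check-run-nova-compute.

```python
import configparser

# Sample [placement] sections modelled on the crudini outputs above
# (assumed content, not copied from a real nova.conf).
OSP13_CONF = """
[placement]
os_region_name = regionOne
"""

OSP14_CONF = """
[placement]
region_name = regionOne
#os_region_name=<None>
"""


def placement_region(conf_text):
    """Return the placement region, preferring region_name over os_region_name.

    Hypothetical helper: returns None when neither option is set.
    """
    # interpolation=None so that '%' characters in values are not expanded.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("placement"):
        return None
    section = parser["placement"]
    # New-style option first, then the deprecated one, else None.
    return section.get("region_name") or section.get("os_region_name")


if __name__ == "__main__":
    for label, text in (("OSP13", OSP13_CONF), ("OSP14", OSP14_CONF)):
        print(label, placement_region(text))
```

Falling back to None rather than raising keeps the caller in control of what to do when no region is configured, which avoids the KeyError shown in the traceback.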
Linked patch verified (https://review.openstack.org/611551).

#Redeployed using the attached patch:
[root@undercloud-0 ~]# diff /usr/share/openstack-tripleo-heat-templates/extraconfig/tasks/instanceha/check-run-nova-compute ./check-run-nova-compute_ORG
114,120d113
< if 'region_name' in options:
<     region = options['region_name'][0]
< elif 'os_region_name' in options:
<     region = options['os_region_name'][0]
< else: # We actually try to make a client call even with an empty region
<     region = None
<
146c139
< region_name=region,
---
> region_name=options["os_region_name"][0],
154c147
< region_name=region,
---
> region_name=options["os_region_name"][0],

#Now instance creation is successful:
(overcloud) [stack@undercloud-0 ~]$ openstack server create --flavor m1.nano --image cirros-0.3.4-x86_64-disk --wait osvm
+-------------------------------------+-----------------------------------------------------------------+
| Field                               | Value                                                           |
+-------------------------------------+-----------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                          |
| OS-EXT-AZ:availability_zone         | nova                                                            |
| OS-EXT-SRV-ATTR:host                | overcloud-novacomputeiha-0.localdomain                          |
| OS-EXT-SRV-ATTR:hypervisor_hostname | overcloud-novacomputeiha-0.localdomain                          |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000002                                               |
| OS-EXT-STS:power_state              | Running                                                         |
| OS-EXT-STS:task_state               | None                                                            |
| OS-EXT-STS:vm_state                 | active                                                          |
...//...
#And nova_compute dockers on the computes are in healthy state:
(overcloud) [stack@undercloud-0 ~]$ ansible compute -mshell -b -a'docker ps |grep nova_compute'
[WARNING]: Found both group and host with same name: undercloud
overcloud-novacomputeiha-1 | SUCCESS | rc=0 >>
37652a3d01f2 192.168.24.1:8787/rhosp14/openstack-nova-compute:2018-10-10.3 "kolla_start" 4 hours ago Up 4 hours (healthy) nova_compute
overcloud-novacomputeiha-0 | SUCCESS | rc=0 >>
abebac17239f 192.168.24.1:8787/rhosp14/openstack-nova-compute:2018-10-10.3 "kolla_start" 4 hours ago Up 4 hours (healthy) nova_compute

Verified - comment 7

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2019:0045