Bug 1640472 - overcloud instance creation fails with: "No valid host was found"
Summary: overcloud instance creation fails with: "No valid host was found"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: 14.0 (Rocky)
Assignee: Michele Baldessari
QA Contact: pkomarov
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-10-18 07:44 UTC by pkomarov
Modified: 2019-09-09 14:26 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-9.0.1-0.20181013060874.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-11 11:54:07 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Launchpad               1798560         0        None      None    None     2018-10-18 09:43:07 UTC
OpenStack gerrit        613258          0        None      None    None     2018-10-25 10:13:21 UTC
Red Hat Product Errata  RHEA-2019:0045  0        None      None    None     2019-01-11 11:54:18 UTC

Description pkomarov 2018-10-18 07:44:19 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 pkomarov 2018-10-18 08:09:33 UTC
Description of problem:

On a HA+Instance-ha deployment of osp14 puddle : 2018-10-08.4
Overcloud instance creation fails with : No valid host was found

How reproducible:
There is automation for this test at :
 https://rhos-ci-staging-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/pidone/view/instance-ha/job/DFG-pidone-instance-ha-14_director-rhel-virthost-3cont_2comp-ipv4-vxlan-instance-ha-test-suite/

overcloud and undercloud SOS-reports are at : 
http://rhos-release.virt.bos.redhat.com/log/pkomarov_sosreports/BZ1640472/


Additional info : 

Some docker containers, in particular the nova-api ones, are unhealthy,
and the openstack-nova-compute containers are stuck restarting:

(undercloud) [stack@undercloud-0 ~]$ ansible overcloud -mshell -b -a'docker ps |grep "unhealthy\|Restarting"'
 [WARNING]: Found both group and host with same name: undercloud

overcloud-novacomputeiha-0 | SUCCESS | rc=0 >>
1c200146064d        192.168.24.1:8787/rhosp14/openstack-nova-compute:2018-10-08.4                "kolla_start"       47 hours ago        Restarting (1) 17 hours ago                       nova_compute

overcloud-novacomputeiha-1 | SUCCESS | rc=0 >>
570f57b4c17f        192.168.24.1:8787/rhosp14/openstack-nova-compute:2018-10-08.4                "kolla_start"       47 hours ago        Restarting (1) 17 hours ago                       nova_compute

controller-0 | SUCCESS | rc=0 >>
9c51edf48fbc        192.168.24.1:8787/rhosp14/openstack-nova-api:2018-10-08.4                    "kolla_start"            47 hours ago        Up 47 hours (unhealthy)                       nova_metadata
ae15fc9581ec        192.168.24.1:8787/rhosp14/openstack-heat-api-cfn:2018-10-08.4                "kolla_start"            47 hours ago        Up 47 hours (unhealthy)                       heat_api_cfn
cc3b59e3e95b        192.168.24.1:8787/rhosp14/openstack-heat-api:2018-10-08.4                    "kolla_start"            47 hours ago        Up 47 hours (unhealthy)                       heat_api

controller-2 | SUCCESS | rc=0 >>
414f46d5bb00        192.168.24.1:8787/rhosp14/openstack-nova-api:2018-10-08.4                    "kolla_start"            47 hours ago        Up 47 hours (unhealthy)                       nova_metadata
17f772c3968c        192.168.24.1:8787/rhosp14/openstack-heat-api-cfn:2018-10-08.4                "kolla_start"            47 hours ago        Up 47 hours (unhealthy)                       heat_api_cfn
cf347ef00ecd        192.168.24.1:8787/rhosp14/openstack-neutron-server:2018-10-08.4              "kolla_start"            47 hours ago        Up 47 hours (unhealthy)                       neutron_api
0db2c0b1afe8        192.168.24.1:8787/rhosp14/openstack-heat-api:2018-10-08.4                    "kolla_start"            47 hours ago        Up 47 hours (unhealthy)                       heat_api

controller-1 | SUCCESS | rc=0 >>
552e905d6cbb        192.168.24.1:8787/rhosp14/openstack-nova-api:2018-10-08.4                    "kolla_start"            47 hours ago        Up 47 hours (unhealthy)                       nova_metadata
e06c85a37b8c        192.168.24.1:8787/rhosp14/openstack-heat-api-cfn:2018-10-08.4                "kolla_start"            47 hours ago        Up 47 hours (unhealthy)                       heat_api_cfn
d8487659d15d        192.168.24.1:8787/rhosp14/openstack-neutron-server:2018-10-08.4              "kolla_start"            47 hours ago        Up 25 hours (unhealthy)                       neutron_api
bbe41eac5d07        192.168.24.1:8787/rhosp14/openstack-heat-api:2018-10-08.4                    "kolla_start"            47 hours ago        Up 47 hours (unhealthy)                       heat_api

Comment 2 Michele Baldessari 2018-10-18 08:44:04 UTC
Issue seems due to this:
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: ++ cat /run_command
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: + CMD='/var/lib/nova/instanceha/check-run-nova-compute '
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: + ARGS=
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: + [[ ! -n '' ]]
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: + . kolla_extend_start
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: ++ [[ ! -d /var/log/kolla/nova ]]
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: +++ stat -c %a /var/log/kolla/nova
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: ++ [[ 2755 != \7\5\5 ]]
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: ++ chmod 755 /var/log/kolla/nova
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: ++ . /usr/local/bin/kolla_nova_extend_start
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: +++ [[ ! -d /var/lib/nova/instances ]]
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: Running command: '/var/lib/nova/instanceha/check-run-nova-compute '
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: + echo 'Running command: '\''/var/lib/nova/instanceha/check-run-nova-compute '\'''
Oct 16 09:08:11 overcloud-novacomputeiha-0 dockerd-current[14673]: + exec /var/lib/nova/instanceha/check-run-nova-compute
Oct 16 09:08:12 overcloud-novacomputeiha-0 dockerd-current[14673]: Traceback (most recent call last): 
Oct 16 09:08:12 overcloud-novacomputeiha-0 dockerd-current[14673]:   File "/var/lib/nova/instanceha/check-run-nova-compute", line 191, in <module>
Oct 16 09:08:12 overcloud-novacomputeiha-0 dockerd-current[14673]:     connection = create_nova_connection(config.sections["placement"])
Oct 16 09:08:12 overcloud-novacomputeiha-0 dockerd-current[14673]:   File "/var/lib/nova/instanceha/check-run-nova-compute", line 147, in create_nova_connection
Oct 16 09:08:12 overcloud-novacomputeiha-0 dockerd-current[14673]:     region_name=options["os_region_name"][0],
Oct 16 09:08:12 overcloud-novacomputeiha-0 dockerd-current[14673]: KeyError: 'os_region_name'

Comment 3 Michele Baldessari 2018-10-18 08:54:21 UTC
OSP13:
[root@compute-0 nova]# crudini --get /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf placement os_region_name
regionOne

OSP14:
[root@overcloud-novacompute-0 nova]# crudini --get /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf placement os_region_name
Parameter not found: os_region_name

Indeed in osp14 it is commented out:
[root@overcloud-novacompute-0 nova]# grep -ir os_region
nova.conf:#os_region_name=<None>

Comment 4 Michele Baldessari 2018-10-18 09:06:27 UTC
This broke because os_region_name is now deprecated and the templates were switched to region_name by this commit:
commit f2e72352b1376ce719614e9cad4e4c71a3f9c3d8
Author: Juan Antonio Osorio Robles <jaosorior>
Date:   Thu Oct 4 15:52:40 2018 +0300

    Fix placement region setting
    
    We were using a deprecated interfce to set this value. This uses the
    correct one.
    
    Closes-Bug: #1793665
    Change-Id: Ib7717911aba3267f855ac6682b0144bfe92034fb

diff --git a/puppet/services/nova-base.yaml b/puppet/services/nova-base.yaml
index f12b0d816dea..3e43b8cf7477 100644
--- a/puppet/services/nova-base.yaml
+++ b/puppet/services/nova-base.yaml
@@ -260,7 +260,7 @@ outputs:
           nova::placement::project_name: 'service'
           nova::placement::password: {get_param: NovaPassword}
           nova::placement::auth_url: {get_param: [EndpointMap, KeystoneInternal, uri_no_suffix]}
-          nova::placement::os_region_name: {get_param: KeystoneRegion}
+          nova::placement::region_name: {get_param: KeystoneRegion}
           nova::placement::os_interface: {get_param: NovaPlacementAPIInterface}
           nova::database_connection:
             make_url:

We need to use region_name:
- OSP14
[root@overcloud-novacompute-0 nova]# crudini --get /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf placement region_name
regionOne

- OSP13
[root@compute-0 nova]# crudini --get /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf placement region_name
Parameter not found: region_name
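
The same check can be scripted. The following is a minimal sketch, not part of the actual fix: it parses nova.conf-style text with Python's configparser and reports which of the two region keys is set in [placement]. The sample texts and the helper name placement_region_keys are made up for illustration; they only mirror the crudini output shown above.

```python
import configparser

# Illustrative config fragments (assumptions modelled on the crudini output above).
OSP13_SAMPLE = """
[placement]
os_region_name=regionOne
"""

OSP14_SAMPLE = """
[placement]
#os_region_name=<None>
region_name=regionOne
"""


def placement_region_keys(conf_text):
    """Return the region-related keys actually set in [placement].

    Hypothetical helper: lines starting with '#' are treated as comments by
    configparser, so a commented-out os_region_name is reported as absent.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if not parser.has_section("placement"):
        return {}
    return {
        key: parser.get("placement", key)
        for key in ("region_name", "os_region_name")
        if parser.has_option("placement", key)
    }


if __name__ == "__main__":
    print("OSP13:", placement_region_keys(OSP13_SAMPLE))
    print("OSP14:", placement_region_keys(OSP14_SAMPLE))
```

A script that only looks up os_region_name therefore finds nothing on OSP14, which matches the KeyError in comment 2.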

Comment 7 pkomarov 2018-10-19 20:02:06 UTC
Linked patch verified
(https://review.openstack.org/611551).

#Redeployed using the attached patch: 

[root@undercloud-0 ~]# diff /usr/share/openstack-tripleo-heat-templates/extraconfig/tasks/instanceha/check-run-nova-compute ./check-run-nova-compute_ORG 
114,120d113
<     if 'region_name' in options:
<         region = options['region_name'][0]
<     elif 'os_region_name' in options:
<         region = options['os_region_name'][0]
<     else: # We actually try to make a client call even with an empty region
<         region = None
< 
146c139
<                                  region_name=region,
---
>                                  region_name=options["os_region_name"][0],
154c147
<                                  region_name=region,
---
>                                  region_name=options["os_region_name"][0],
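
# For readers who want to try the fallback in isolation, here is a standalone sketch of the logic in the diff above. It assumes, based on the options["os_region_name"][0] indexing in the original script, that options maps each option name to a list of values; the function name pick_region is hypothetical and does not appear in the patch.

```python
def pick_region(options):
    """Prefer the new region_name key, fall back to the deprecated one.

    `options` is assumed to map option names to lists of values.
    Returns None when neither key is present, so the caller can still
    attempt a client call with an empty region, as the patch does.
    """
    for key in ("region_name", "os_region_name"):
        if key in options:
            return options[key][0]
    return None


if __name__ == "__main__":
    print(pick_region({"region_name": ["regionOne"]}))     # OSP14-style config
    print(pick_region({"os_region_name": ["regionOne"]}))  # OSP13-style config
    print(pick_region({}))                                 # neither key set
```

# With this ordering the same script works against both the OSP13 and OSP14 nova.conf layouts shown in comment 4.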


#Now instance creation is successful:

(overcloud) [stack@undercloud-0 ~]$ openstack server create --flavor  m1.nano --image cirros-0.3.4-x86_64-disk --wait osvm

+-------------------------------------+-----------------------------------------------------------------+
| Field                               | Value                                                           |
+-------------------------------------+-----------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                          |
| OS-EXT-AZ:availability_zone         | nova                                                            |
| OS-EXT-SRV-ATTR:host                | overcloud-novacomputeiha-0.localdomain                          |
| OS-EXT-SRV-ATTR:hypervisor_hostname | overcloud-novacomputeiha-0.localdomain                          |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000002                                               |
| OS-EXT-STS:power_state              | Running                                                         |
| OS-EXT-STS:task_state               | None                                                            |
| OS-EXT-STS:vm_state                 | active                                                          |
...//...

#And the nova_compute containers on the compute nodes are in a healthy state:

(overcloud) [stack@undercloud-0 ~]$  ansible compute -mshell -b -a'docker ps |grep nova_compute'
 [WARNING]: Found both group and host with same name: undercloud

overcloud-novacomputeiha-1 | SUCCESS | rc=0 >>
37652a3d01f2        192.168.24.1:8787/rhosp14/openstack-nova-compute:2018-10-10.3                "kolla_start"       4 hours ago         Up 4 hours (healthy)                       nova_compute

overcloud-novacomputeiha-0 | SUCCESS | rc=0 >>
abebac17239f        192.168.24.1:8787/rhosp14/openstack-nova-compute:2018-10-10.3                "kolla_start"       4 hours ago         Up 4 hours (healthy)                       nova_compute

Comment 12 pkomarov 2018-11-13 13:19:34 UTC
Verified - comment 7

Comment 15 errata-xmlrpc 2019-01-11 11:54:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045

