Bug 1480326

Summary: openstack-nova: failed to launch instance on OC: "Host 'overcloud-compute-1.localdomain' is not mapped to any cell", "code": 400, "created": "2017-08-10T13:16:11Z"}
Product: Red Hat OpenStack Reporter: Alexander Chuzhoy <sasha>
Component: openstack-tripleo-heat-templates Assignee: Ollie Walsh <owalsh>
Status: CLOSED ERRATA QA Contact: awaugama
Severity: high Docs Contact:
Priority: high    
Version: 12.0 (Pike) CC: ahrechan, awaugama, berrange, dasmith, dprince, eglynn, itbrown, jschluet, kchamart, lyarwood, mburns, owalsh, rhel-osp-director-maint, sasha, sbauza, sferdjao, sgordon, srevivo, vromanso
Target Milestone: rc Keywords: AutomationBlocker, Reopened, Triaged
Target Release: 12.0 (Pike)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-7.0.3-6.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
N/A
Last Closed: 2017-12-13 21:52:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                   Flags
mistral logs from UC.         none
nova logs from controller2    none
nova logs from controller1    none
nova logs from controller0    none

Description Alexander Chuzhoy 2017-08-10 17:27:07 UTC
openstack-nova: failed to launch instance on OC: "Host 'overcloud-compute-1.localdomain' is not mapped to any cell", "code": 400, "created": "2017-08-10T13:16:11Z"}

Environment:
openstack-tripleo-heat-templates-7.0.0-0.20170805163045.el7ost.noarch
instack-undercloud-7.2.1-0.20170729010705.el7ost.noarch
openstack-puppet-modules-10.0.0-0.20170315222135.0333c73.el7.1.noarch

libvirt-daemon-driver-storage-rbd-3.2.0-14.el7.x86_64
openstack-nova-scheduler-16.0.0-0.20170805120344.5971dde.el7ost.noarch
libvirt-daemon-driver-nwfilter-3.2.0-14.el7.x86_64
libvirt-daemon-driver-lxc-3.2.0-14.el7.x86_64
libvirt-daemon-driver-qemu-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-iscsi-3.2.0-14.el7.x86_64
python-nova-16.0.0-0.20170805120344.5971dde.el7ost.noarch
openstack-nova-console-16.0.0-0.20170805120344.5971dde.el7ost.noarch
libvirt-libs-3.2.0-14.el7.x86_64
libvirt-daemon-driver-interface-3.2.0-14.el7.x86_64
libvirt-client-3.2.0-14.el7.x86_64
libvirt-daemon-driver-nodedev-3.2.0-14.el7.x86_64
puppet-nova-11.3.0-0.20170805105252.30a205c.el7ost.noarch
libvirt-daemon-driver-storage-core-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-logical-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-disk-3.2.0-14.el7.x86_64
openstack-nova-conductor-16.0.0-0.20170805120344.5971dde.el7ost.noarch
openstack-nova-novncproxy-16.0.0-0.20170805120344.5971dde.el7ost.noarch
libvirt-daemon-driver-network-3.2.0-14.el7.x86_64
libvirt-python-3.2.0-3.el7.x86_64
libvirt-daemon-config-nwfilter-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-3.2.0-14.el7.x86_64
openstack-nova-compute-16.0.0-0.20170805120344.5971dde.el7ost.noarch
openstack-nova-migration-16.0.0-0.20170805120344.5971dde.el7ost.noarch
openstack-nova-api-16.0.0-0.20170805120344.5971dde.el7ost.noarch
libvirt-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-scsi-3.2.0-14.el7.x86_64
libvirt-daemon-kvm-3.2.0-14.el7.x86_64
openstack-nova-common-16.0.0-0.20170805120344.5971dde.el7ost.noarch
libvirt-daemon-3.2.0-14.el7.x86_64
libvirt-daemon-driver-secret-3.2.0-14.el7.x86_64
python-novaclient-9.1.0-0.20170804194758.0a53d19.el7ost.noarch
libvirt-daemon-driver-storage-mpath-3.2.0-14.el7.x86_64
openstack-nova-placement-api-16.0.0-0.20170805120344.5971dde.el7ost.noarch
libvirt-daemon-config-network-3.2.0-14.el7.x86_64


Steps to reproduce:
1. Deploy OC (used external ceph)
2. Try to launch an instance

Result:
(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+--------+--------+------------+-------------+----------+
| ID                                   | Name   | Status | Task State | Power State | Networks |
+--------------------------------------+--------+--------+------------+-------------+----------+
| 59799ce9-c4b7-48ca-9419-7a81a9f6b549 | nisim1 | ERROR  | -          | NOSTATE     |          |
+--------------------------------------+--------+--------+------------+-------------+----------+



Checking the fault:

| fault                                | {"message": "Host 'overcloud-compute-1.localdomain' is not mapped to any cell", "code": 400, "created": "2017-08-10T13:16:11Z"} |

Comment 1 Alexander Chuzhoy 2017-08-10 17:27:40 UTC
Created attachment 1311851
mistral logs from UC.

Comment 3 Ollie Walsh 2017-08-17 16:40:18 UTC
(In reply to Alexander Chuzhoy from comment #1)
> Created attachment 1311851
> mistral logs from UC.

Not relevant for the overcloud nodes BTW

Comment 5 Ollie Walsh 2017-08-18 13:33:46 UTC
Could you try to reproduce to get logs?

Comment 6 Alexander Chuzhoy 2017-08-21 18:36:57 UTC
Reproduced on a deployment with external ceph (only).
Collecting sosreports.

Comment 8 Lee Yarwood 2017-08-22 09:15:35 UTC
(In reply to Alexander Chuzhoy from comment #0)
> Steps to reproduce:
> 1. Deploy OC (used external ceph)
> 2. Try to launch an instance

Fun, the sosreports don't contain the overcloud deploy log on the undercloud or any nova logs from the overcloud. Can you attach both the deploy log and the /var/log/containers/nova*/nova-manage logs from each of the overcloud controllers?

For a pure containers deployment, we should see the discover_hosts command issued as part of the following nova-api setup:

https://github.com/openstack/tripleo-heat-templates/blob/84ef6a5342b113a9807f2c5c9587178d4cbf02ef/docker/services/nova-api.yaml#L208

Comment 9 Alexander Chuzhoy 2017-08-22 20:17:02 UTC
The only line in the deployment log is:
(undercloud) [stack@undercloud-0 ~]$ cat overcloud_deployment_0.log
2017-08-21 11:34:46.817 8855 WARNING tripleoclient.plugin [  admin] Waiting for messages on queue '2719e72f-8b59-459b-9d80-7628eb756d92' with no timeout.

Adding the nova logs from controllers.

Comment 10 Alexander Chuzhoy 2017-08-22 20:17:49 UTC
Created attachment 1316790
nova logs from controller2

Comment 11 Alexander Chuzhoy 2017-08-22 20:18:42 UTC
Created attachment 1316791
nova logs from controller1

Comment 12 Alexander Chuzhoy 2017-08-22 20:19:40 UTC
Created attachment 1316792
nova logs from controller0

Comment 13 Ollie Walsh 2017-08-22 20:40:52 UTC
/var/log/messages from the controllers is what I'm after to see the nova-manage command output

Comment 14 Ollie Walsh 2017-08-22 20:48:19 UTC
Actually, I don't see any output in /var/log/messages. Dan, do we log the output from the nova_api_discover_hosts container?

Comment 15 Ollie Walsh 2017-08-25 13:18:20 UTC
Dan, I think this was broken by https://review.openstack.org/465551. The bootstrap_host command fails when I run it manually:

[root@overcloud-controller-0 heat-admin]# docker run -e "TRIPLEO_CONFIG_HASH=5c47cd23408c0828a337c6ebb3302060"  -e "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"  -e "KOLLA_BASE_DISTRO=cens"  -e "KOLLA_INSTALL_TYPE=binary"  -e "KOLLA_INSTALL_METATYPE=rdo"  -e "PS1=$(tput bold)($(printenv KOLLA_SERVICE_NAME))$(tput sgr0)[$(id -un)@$(hostname -s) $(pwd)]$ "   -v /etc/puppet:/etc/puppet:ro -v /dev/log:/dev/log -v /var/lib/config-data/nova/etc/nova:/etc/nova:ro -v /var/log/containers/nova:/var/log/nova -v /etc/hosts:/etc/hosts:ro -v /etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro -v /etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro -v /etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro -v /etc/localtime:/etc/localtime:ro -v /etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro -v /var/lib/config-data/nova/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro -v /etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro -ti 192.168.24.1:8787/tripleoupstream/centos-binary-nova-api:latest /usr/bin/bootstrap_host_exec nova_api su nova -s /bin/bash -c '/usr/bin/nova-manage cell_v2 discover_hosts'
Failed to start Hiera: Errno::EACCES: Permission denied - /etc/puppet/hiera.yaml

And when I remove the bootstrap_host command it then fails to su:

[root@overcloud-controller-0 heat-admin]# docker run -e "TRIPLEO_CONFIG_HASH=5c47cd23408c0828a337c6ebb3302060"  -e "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"  -e "KOLLA_BASE_DISTRO=centos"  -e "KOLLA_INSTALL_TYPE=binary"  -e "KOLLA_INSTALL_METATYPE=rdo"  -e "PS1=$(tput bold)($(printenv KOLLA_SERVICE_NAME))$(tput sgr0)[$(id -un)@$(hostname -s) $(pwd)]$ "   -v /etc/puppet:/etc/puppet:ro -v /dev/log:/dev/log -v /var/lib/config-data/nova/etc/nova:/etc/nova:ro -v /var/log/containers/nova:/var/log/nova -v /etc/hosts:/etc/hosts:ro -v /etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro -v /etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro -v /etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro -v /etc/localtime:/etc/localtime:ro -v /etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro -v /var/lib/config-data/nova/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro -v /etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro -ti 192.168.24.1:8787/tripleoupstream/centos-binary-nova-api:latest su nova -s /bin/bash -c '/usr/bin/nova-manage cell_v2 discover_hosts'
Password: 
su: Authentication failure

Comment 16 Ollie Walsh 2017-08-25 13:52:34 UTC
It seems the instructions in https://docs.openstack.org/tripleo-docs/latest/install/containers_deployment/tips_tricks.html#debugging-container-failures are not correct in this case. When I query paunch and run the command, I hit a quoting issue instead:

[root@overcloud-controller-0 heat-admin]# paunch debug --file /var/lib/tripleo-config/hashed-docker-container-startup-config-step_5.json --container nova_api_discover_hosts --action print-cmd
docker run --name nova_api_discover_hosts-567af21w --env=TRIPLEO_CONFIG_HASH=5c47cd23408c0828a337c6ebb3302060 --net=host --user=root --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --volume=/etc/puppet:/etc/puppet:ro --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro --volume=/var/lib/config-data/nova/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro --volume=/var/lib/config-data/nova/etc/nova/:/etc/nova/:ro --volume=/var/log/containers/nova:/var/log/nova 192.168.24.1:8787/tripleoupstream/centos-binary-nova-api:latest /usr/bin/bootstrap_host_exec nova_api su nova -s /bin/bash -c '/usr/bin/nova-manage cell_v2 discover_hosts'
[root@overcloud-controller-0 heat-admin]# docker run --name nova_api_discover_hosts-567af21w --env=TRIPLEO_CONFIG_HASH=5c47cd23408c0828a337c6ebb3302060 --net=host --user=root --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --volume=/etc/puppet:/etc/puppet:ro --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro --volume=/var/lib/config-data/nova/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro --volume=/var/lib/config-data/nova/etc/nova/:/etc/nova/:ro --volume=/var/log/containers/nova:/var/log/nova 192.168.24.1:8787/tripleoupstream/centos-binary-nova-api:latest /usr/bin/bootstrap_host_exec nova_api su nova -s /bin/bash -c '/usr/bin/nova-manage cell_v2 discover_hosts'
usage: nova-manage [-h] [--config-dir DIR] [--config-file PATH] [--debug]
                   [--log-config-append PATH] [--log-date-format DATE_FORMAT]
                   [--log-dir LOG_DIR] [--log-file PATH] [--nodebug]
                   [--nopost-mortem] [--nouse-journal] [--nouse-syslog]
                   [--nowatch-log-file] [--post-mortem]
                   [--syslog-log-facility SYSLOG_LOG_FACILITY] [--use-journal]
                   [--use-syslog] [--version] [--watch-log-file]
                   [--remote_debug-host REMOTE_DEBUG_HOST]
                   [--remote_debug-port REMOTE_DEBUG_PORT]
                   
                   {version,bash-completion,shell,logs,cell_v2,db,quota,agent,host,floating,api_db,project,account,network,cell}
                   ...
nova-manage: error: too few arguments

Works when I escape the quotes:
[root@overcloud-controller-0 heat-admin]# docker run --rm --name nova_api_discover_hosts-567af21w --env=TRIPLEO_CONFIG_HASH=5c47cd23408c0828a337c6ebb3302060 --net=host --user=root --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --volume=/etc/puppet:/etc/puppet:ro --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro --volume=/var/lib/config-data/nova/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro --volume=/var/lib/config-data/nova/etc/nova/:/etc/nova/:ro --volume=/var/log/containers/nova:/var/log/nova 192.168.24.1:8787/tripleoupstream/centos-binary-nova-api:latest /usr/bin/bootstrap_host_exec nova_api su nova -s /bin/bash -c \'/usr/bin/nova-manage cell_v2 discover_hosts\'
[root@overcloud-controller-0 heat-admin]#
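The "too few arguments" error and the fact that escaped quotes fix it are both consistent with something inside the container joining the arguments back into one string and re-parsing it. The plain-bash sketch below assumes such a re-parsing step; `reparse_wrapper` and `show_args` are hypothetical helpers, and the real bootstrap_host_exec script is not shown in this report. With plain quotes, `-c` receives only `/usr/bin/nova-manage`, so nova-manage would run without its `cell_v2 discover_hosts` subcommand; with escaped quotes, the grouping is restored.

```shell
#!/usr/bin/env bash
# Illustration only: models a wrapper that re-parses its joined arguments,
# as `eval "$*"` would. This is NOT the real bootstrap_host_exec script.

# Print the argument count, then each argument in brackets so the word
# boundaries are visible.
show_args() {
    echo "argc=$#"
    printf '[%s]\n' "$@"
}

# Hypothetical wrapper: joins its arguments with spaces and re-parses them.
reparse_wrapper() {
    eval "show_args $*"
}

# Plain quotes: the calling shell strips them before the wrapper runs, so
# the re-parse splits the command into separate words and `-c` gets only
# /usr/bin/nova-manage (argc=8).
reparse_wrapper su nova -s /bin/bash -c '/usr/bin/nova-manage cell_v2 discover_hosts'

# Escaped quotes: the literal quotes survive the calling shell, and the
# re-parse groups the command back into a single `-c` argument (argc=6).
reparse_wrapper su nova -s /bin/bash -c \'/usr/bin/nova-manage cell_v2 discover_hosts\'
```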

Comment 17 Dan Prince 2017-08-25 14:36:53 UTC
(In reply to Ollie Walsh from comment #14)
> actually I don't see any output /var/log/messages. Dan, do we log the output
> from the nova_api_discover_hosts container?

I don't think we clean up this container. You should still see it via 'docker ps -a' on the overcloud controller where it executes. If it logs to stdout, you should be able to obtain that via 'docker logs <container_name>'.

If the logs go to a file then those would show up in /var/log/containers/nova I think.

Comment 18 Ollie Walsh 2017-08-25 15:17:51 UTC
We have nothing. I've proposed https://review.openstack.org/497955 to run with --verbose so we get something on stdout.

If it's possible to reproduce can you try with this change?

Comment 19 Artem Hrechanychenko 2017-09-06 13:47:12 UTC
(In reply to Ollie Walsh from comment #18)
> We have nothing. I've proposed https://review.openstack.org/497955 to run
> with --verbose so we get something on stdout.
> 
> If it's possible to reproduce can you try with this change?

I'll check this

Comment 20 Alexander Chuzhoy 2017-09-07 00:32:07 UTC
It seems the issue no longer reproduces.
I was able to launch an instance using external Ceph.

Comment 21 Ollie Walsh 2017-09-08 09:45:42 UTC
Closing this as it can't be reproduced, but adding some logging to help debug any future issues.

Comment 22 Ollie Walsh 2017-10-02 15:28:30 UTC
This has been reproduced with verbose output:

When cell_v2 discover_hosts ran, it didn't pick up the compute node:

controller-0:/var/log/messages
    Oct  2 07:39:26 localhost journal: Found 2 cell mappings.
    Oct  2 07:39:26 localhost journal: Skipping cell0 since it does not contain hosts.
    Oct  2 07:39:26 localhost journal: Getting compute nodes from cell 'default': 9860ec3b-3012-4b37-b7b6-2c8a74ad38c6
    Oct  2 07:39:26 localhost journal: Found 0 unmapped computes in cell: 9860ec3b-3012-4b37-b7b6-2c8a74ad38c6

It seems nova-compute came up before nova-conductor:

compute-0:/var/log/containers/nova/nova-compute.log
    2017-10-02 11:39:01.066 1 WARNING nova.conductor.api [req-0a8c71b9-2c03-4e0e-977b-175bdc737306 - - - - -] Timed out waiting for nova-conductor.  Is it running? Or did this service start before nova-conductor?  Reattempting establishment of nova-conductor connection...: MessagingTimeout: Timed out waiting for a reply to message ID e25a231083754fef8696f80d5e489866
    2017-10-02 11:40:31.618 1 INFO nova.compute.resource_tracker [req-a46137f5-46ae-4058-8888-764290a5a4e3 - - - - -] Compute node record created for compute-0.localdomain:compute-0.localdomain with uuid: 69c218cf-5dcb-40cf-8ff2-6d6bb4eeaa19

controller-0:/var/log/containers/nova/nova-conductor.log
    2017-10-02 11:38:37.896 35 DEBUG nova.servicegroup.drivers.db [req-8b4a6575-3ca2-428b-a4db-1755d067f595 - - - - -] DB_Driver: join new ServiceGroup member controller-0.localdomain to the conductor group, service = <Service: host=controller-0.localdomain, binary=nova-conductor, manager_class_name=nova.conductor.manager.ConductorManager> join /usr/lib/python2.7/site-packages/nova/servicegroup/drivers/db.py:47
    2017-10-02 12:07:39.456 34 DEBUG oslo_db.sqlalchemy.engines [req-ec8097c1-8f21-4bbb-8b8c-58135ad51e30 4b5f1470c25449f1b55e4e9aa7be0601 fa4e8f5a1440408aba0feee4be738fb6 - default default] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py:285
    2017-10-02 12:09:09.489 34 WARNING nova.scheduler.utils [req-ec8097c1-8f21-4bbb-8b8c-58135ad51e30 4b5f1470c25449f1b55e4e9aa7be0601 fa4e8f5a1440408aba0feee4be738fb6 - default default] Retrying select_destinations after a MessagingTimeout, attempt 1 of 2.: MessagingTimeout: Timed out waiting for a reply to message ID b67f96a057bc4940aee6ccd77ed41623
    2017-10-02 12:09:11.006 34 INFO oslo_messaging._drivers.amqpdriver [-] No calling threads waiting for msg_id : b67f96a057bc4940aee6ccd77ed41623
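The journal excerpt from controller-0:/var/log/messages shows 07:39, while the container logs show times around 11:39. Assuming (this is not confirmed in the report) that the journal uses local time four hours behind the UTC container logs, a quick calculation suggests discover_hosts ran about a minute before nova-compute created its compute node record, which would explain "Found 0 unmapped computes". A sketch using the timestamps quoted above; it requires GNU date for `-d`.

```shell
#!/usr/bin/env bash
# Worked arithmetic under an ASSUMED clock offset between the journal
# (/var/log/messages) and the container logs.

offset_hours=4  # assumption: journal local time = UTC - 4h

# discover_hosts ran ("Found 0 unmapped computes"), journal wall-clock time.
# Parse it as if it were UTC, then shift by the assumed offset.
discover_local=$(date -u -d '2017-10-02 07:39:26' +%s)
discover_utc=$(( discover_local + offset_hours * 3600 ))

# nova-compute created its compute node record (compute-0 nova-compute.log).
record_utc=$(date -u -d '2017-10-02 11:40:31' +%s)

# Seconds by which discover_hosts ran before the compute node record existed.
echo "discover_hosts ran $(( record_utc - discover_utc ))s before the compute node record was created"
```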


Manually running cell_v2 discover_hosts after deployment picks up the computes:
    ()[root@controller-0 /]# su nova -s /bin/bash -c '/usr/bin/nova-manage cell_v2 discover_hosts --verbose'
    Found 2 cell mappings.
    Skipping cell0 since it does not contain hosts.
    Getting compute nodes from cell 'default': 9860ec3b-3012-4b37-b7b6-2c8a74ad38c6
    Found 1 unmapped computes in cell: 9860ec3b-3012-4b37-b7b6-2c8a74ad38c6
    Checking host mapping for compute host 'compute-0.localdomain': 69c218cf-5dcb-40cf-8ff2-6d6bb4eeaa19
    Creating host mapping for compute host 'compute-0.localdomain': 69c218cf-5dcb-40cf-8ff2-6d6bb4eeaa19
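The deployment-time run completed without error while mapping nothing, so the only signal is the "Found N unmapped computes" line in the verbose output. A minimal sketch of a check keyed on that message text; `count_unmapped` is a hypothetical helper, not part of nova-manage or tripleo-heat-templates.

```shell
#!/usr/bin/env bash
# Sum the N in every "Found N unmapped computes in cell: ..." line on stdin
# (verbose discover_hosts output, or the matching journal lines).
# Prints 0 when no such line is present.
count_unmapped() {
    awk '/Found [0-9]+ unmapped computes in cell:/ {
        for (i = 1; i < NF; i++)
            if ($i == "Found") n += $(i + 1)
    }
    END { print n + 0 }'
}

# Journal output from the deployment-time run on controller-0: prints 0.
count_unmapped <<'EOF'
Oct  2 07:39:26 localhost journal: Found 2 cell mappings.
Oct  2 07:39:26 localhost journal: Skipping cell0 since it does not contain hosts.
Oct  2 07:39:26 localhost journal: Found 0 unmapped computes in cell: 9860ec3b-3012-4b37-b7b6-2c8a74ad38c6
EOF

# Output from the manual run after deployment: prints 1.
count_unmapped <<'EOF'
Found 2 cell mappings.
Found 1 unmapped computes in cell: 9860ec3b-3012-4b37-b7b6-2c8a74ad38c6
Creating host mapping for compute host 'compute-0.localdomain': 69c218cf-5dcb-40cf-8ff2-6d6bb4eeaa19
EOF
```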

Comment 23 Ollie Walsh 2017-11-11 12:57:03 UTC
https://review.openstack.org/518546 has merged

Comment 29 errata-xmlrpc 2017-12-13 21:52:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462