openstack-nova: failed to launch instance on OC: "Host 'overcloud-compute-1.localdomain' is not mapped to any cell", "code": 400, "created": "2017-08-10T13:16:11Z"}

Environment:
openstack-tripleo-heat-templates-7.0.0-0.20170805163045.el7ost.noarch
instack-undercloud-7.2.1-0.20170729010705.el7ost.noarch
openstack-puppet-modules-10.0.0-0.20170315222135.0333c73.el7.1.noarch
libvirt-daemon-driver-storage-rbd-3.2.0-14.el7.x86_64
openstack-nova-scheduler-16.0.0-0.20170805120344.5971dde.el7ost.noarch
libvirt-daemon-driver-nwfilter-3.2.0-14.el7.x86_64
libvirt-daemon-driver-lxc-3.2.0-14.el7.x86_64
libvirt-daemon-driver-qemu-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-iscsi-3.2.0-14.el7.x86_64
python-nova-16.0.0-0.20170805120344.5971dde.el7ost.noarch
openstack-nova-console-16.0.0-0.20170805120344.5971dde.el7ost.noarch
libvirt-libs-3.2.0-14.el7.x86_64
libvirt-daemon-driver-interface-3.2.0-14.el7.x86_64
libvirt-client-3.2.0-14.el7.x86_64
libvirt-daemon-driver-nodedev-3.2.0-14.el7.x86_64
puppet-nova-11.3.0-0.20170805105252.30a205c.el7ost.noarch
libvirt-daemon-driver-storage-core-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-logical-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-disk-3.2.0-14.el7.x86_64
openstack-nova-conductor-16.0.0-0.20170805120344.5971dde.el7ost.noarch
openstack-nova-novncproxy-16.0.0-0.20170805120344.5971dde.el7ost.noarch
libvirt-daemon-driver-network-3.2.0-14.el7.x86_64
libvirt-python-3.2.0-3.el7.x86_64
libvirt-daemon-config-nwfilter-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-3.2.0-14.el7.x86_64
openstack-nova-compute-16.0.0-0.20170805120344.5971dde.el7ost.noarch
openstack-nova-migration-16.0.0-0.20170805120344.5971dde.el7ost.noarch
openstack-nova-api-16.0.0-0.20170805120344.5971dde.el7ost.noarch
libvirt-3.2.0-14.el7.x86_64
libvirt-daemon-driver-storage-scsi-3.2.0-14.el7.x86_64
libvirt-daemon-kvm-3.2.0-14.el7.x86_64
openstack-nova-common-16.0.0-0.20170805120344.5971dde.el7ost.noarch
libvirt-daemon-3.2.0-14.el7.x86_64
libvirt-daemon-driver-secret-3.2.0-14.el7.x86_64
python-novaclient-9.1.0-0.20170804194758.0a53d19.el7ost.noarch
libvirt-daemon-driver-storage-mpath-3.2.0-14.el7.x86_64
openstack-nova-placement-api-16.0.0-0.20170805120344.5971dde.el7ost.noarch
libvirt-daemon-config-network-3.2.0-14.el7.x86_64

Steps to reproduce:
1. Deploy OC (used external ceph)
2. Try to launch an instance

Result:
(overcloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+--------+--------+------------+-------------+----------+
| ID                                   | Name   | Status | Task State | Power State | Networks |
+--------------------------------------+--------+--------+------------+-------------+----------+
| 59799ce9-c4b7-48ca-9419-7a81a9f6b549 | nisim1 | ERROR  | -          | NOSTATE     |          |
+--------------------------------------+--------+--------+------------+-------------+----------+

Checking the fault:
| fault | {"message": "Host 'overcloud-compute-1.localdomain' is not mapped to any cell", "code": 400, "created": "2017-08-10T13:16:11Z"} |
Created attachment 1311851
mistral logs from UC.
(In reply to Alexander Chuzhoy from comment #1)
> Created attachment 1311851
> mistral logs from UC.

These are not relevant for the overcloud nodes, BTW.
Could you try to reproduce this so we can get logs?
Reproduced on a deployment with external ceph (only). Collecting sosreports.
(In reply to Alexander Chuzhoy from comment #0)
> Steps to reproduce:
> 1. Deploy OC (used external ceph)
> 2. Try to launch an instance

Fun: the sosreports don't contain the overcloud deploy log on the undercloud or any nova logs from the overcloud. Can you attach both the deploy log and the /var/log/containers/nova*/nova-manage logs from each of the overcloud controllers?

For a pure containers deployment, we should see the discover hosts command issued as part of the following n-api setup:
https://github.com/openstack/tripleo-heat-templates/blob/84ef6a5342b113a9807f2c5c9587178d4cbf02ef/docker/services/nova-api.yaml#L208
The only line in the deployment log is:

(undercloud) [stack@undercloud-0 ~]$ cat overcloud_deployment_0.log
2017-08-21 11:34:46.817 8855 WARNING tripleoclient.plugin [ admin] Waiting for messages on queue '2719e72f-8b59-459b-9d80-7628eb756d92' with no timeout.

Adding the nova logs from the controllers.
Created attachment 1316790
nova logs from controller2
Created attachment 1316791
nova logs from controller1
Created attachment 1316792
nova logs from controller0
What I'm after is /var/log/messages from the controllers, to see the nova-manage command output.
Actually, I don't see any output in /var/log/messages. Dan, do we log the output from the nova_api_discover_hosts container?
Dan, I think this was broken by https://review.openstack.org/465551. The bootstrap_host_exec command fails when I run it manually:

[root@overcloud-controller-0 heat-admin]# docker run -e "TRIPLEO_CONFIG_HASH=5c47cd23408c0828a337c6ebb3302060" -e "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" -e "KOLLA_BASE_DISTRO=cens" -e "KOLLA_INSTALL_TYPE=binary" -e "KOLLA_INSTALL_METATYPE=rdo" -e "PS1=$(tput bold)($(printenv KOLLA_SERVICE_NAME))$(tput sgr0)[$(id -un)@$(hostname -s) $(pwd)]$ " -v /etc/puppet:/etc/puppet:ro -v /dev/log:/dev/log -v /var/lib/config-data/nova/etc/nova:/etc/nova:ro -v /var/log/containers/nova:/var/log/nova -v /etc/hosts:/etc/hosts:ro -v /etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro -v /etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro -v /etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro -v /etc/localtime:/etc/localtime:ro -v /etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro -v /var/lib/config-data/nova/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro -v /etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro -ti 192.168.24.1:8787/tripleoupstream/centos-binary-nova-api:latest /usr/bin/bootstrap_host_exec nova_api su nova -s /bin/bash -c '/usr/bin/nova-manage cell_v2 discover_hosts'
Failed to start Hiera: Errno::EACCES: Permission denied - /etc/puppet/hiera.yaml

And when I remove the bootstrap_host_exec command, it then fails to su:

[root@overcloud-controller-0 heat-admin]# docker run -e "TRIPLEO_CONFIG_HASH=5c47cd23408c0828a337c6ebb3302060" -e "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" -e "KOLLA_BASE_DISTRO=centos" -e "KOLLA_INSTALL_TYPE=binary" -e "KOLLA_INSTALL_METATYPE=rdo" -e "PS1=$(tput bold)($(printenv KOLLA_SERVICE_NAME))$(tput sgr0)[$(id -un)@$(hostname -s) $(pwd)]$ " -v /etc/puppet:/etc/puppet:ro -v /dev/log:/dev/log -v /var/lib/config-data/nova/etc/nova:/etc/nova:ro -v /var/log/containers/nova:/var/log/nova -v /etc/hosts:/etc/hosts:ro -v /etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro -v /etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro -v /etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro -v /etc/localtime:/etc/localtime:ro -v /etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro -v /var/lib/config-data/nova/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro -v /etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro -ti 192.168.24.1:8787/tripleoupstream/centos-binary-nova-api:latest su nova -s /bin/bash -c '/usr/bin/nova-manage cell_v2 discover_hosts'
Password:
su: Authentication failure
It seems the instructions in https://docs.openstack.org/tripleo-docs/latest/install/containers_deployment/tips_tricks.html#debugging-container-failures are not correct in this case. When I query paunch for the command and run it, I hit a quoting issue instead:

[root@overcloud-controller-0 heat-admin]# paunch debug --file /var/lib/tripleo-config/hashed-docker-container-startup-config-step_5.json --container nova_api_discover_hosts --action print-cmd
docker run --name nova_api_discover_hosts-567af21w --env=TRIPLEO_CONFIG_HASH=5c47cd23408c0828a337c6ebb3302060 --net=host --user=root --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --volume=/etc/puppet:/etc/puppet:ro --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro --volume=/var/lib/config-data/nova/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro --volume=/var/lib/config-data/nova/etc/nova/:/etc/nova/:ro --volume=/var/log/containers/nova:/var/log/nova 192.168.24.1:8787/tripleoupstream/centos-binary-nova-api:latest /usr/bin/bootstrap_host_exec nova_api su nova -s /bin/bash -c '/usr/bin/nova-manage cell_v2 discover_hosts'

[root@overcloud-controller-0 heat-admin]# docker run --name nova_api_discover_hosts-567af21w --env=TRIPLEO_CONFIG_HASH=5c47cd23408c0828a337c6ebb3302060 --net=host --user=root --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --volume=/etc/puppet:/etc/puppet:ro --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro --volume=/var/lib/config-data/nova/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro --volume=/var/lib/config-data/nova/etc/nova/:/etc/nova/:ro --volume=/var/log/containers/nova:/var/log/nova 192.168.24.1:8787/tripleoupstream/centos-binary-nova-api:latest /usr/bin/bootstrap_host_exec nova_api su nova -s /bin/bash -c '/usr/bin/nova-manage cell_v2 discover_hosts'
usage: nova-manage [-h] [--config-dir DIR] [--config-file PATH] [--debug] [--log-config-append PATH] [--log-date-format DATE_FORMAT] [--log-dir LOG_DIR] [--log-file PATH] [--nodebug] [--nopost-mortem] [--nouse-journal] [--nouse-syslog] [--nowatch-log-file] [--post-mortem] [--syslog-log-facility SYSLOG_LOG_FACILITY] [--use-journal] [--use-syslog] [--version] [--watch-log-file] [--remote_debug-host REMOTE_DEBUG_HOST] [--remote_debug-port REMOTE_DEBUG_PORT] {version,bash-completion,shell,logs,cell_v2,db,quota,agent,host,floating,api_db,project,account,network,cell} ...
nova-manage: error: too few arguments

It works when I escape the quotes:

[root@overcloud-controller-0 heat-admin]# docker run --rm --name nova_api_discover_hosts-567af21w --env=TRIPLEO_CONFIG_HASH=5c47cd23408c0828a337c6ebb3302060 --net=host --user=root --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --volume=/etc/puppet:/etc/puppet:ro --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro --volume=/var/lib/config-data/nova/etc/my.cnf.d/tripleo.cnf:/etc/my.cnf.d/tripleo.cnf:ro --volume=/var/lib/config-data/nova/etc/nova/:/etc/nova/:ro --volume=/var/log/containers/nova:/var/log/nova 192.168.24.1:8787/tripleoupstream/centos-binary-nova-api:latest /usr/bin/bootstrap_host_exec nova_api su nova -s /bin/bash -c \'/usr/bin/nova-manage cell_v2 discover_hosts\'
[root@overcloud-controller-0 heat-admin]#
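The difference between the two runs can be reproduced without Docker or nova. A minimal sketch, assuming (the report does not confirm this) that bootstrap_host_exec re-parses its arguments with something like 'eval $*': run_via_eval is a hypothetical stand-in for that wrapper, 'echo' stands in for nova-manage, and 'sh -c' behaves like the '/bin/bash -c' above for this purpose. Pasting the command into a shell strips the single quotes, so the quoted string reaches the container as one argument; if the wrapper then re-splits it, '-c' receives only the first word and the remaining words become $0 and $1 of the inner shell, which would fit the "too few arguments" error.

```shell
#!/bin/sh
# Hypothetical stand-in for a wrapper that re-parses its arguments with eval.
# Assumption: this approximates bootstrap_host_exec; its source is not shown here.
run_via_eval() {
  eval "$*"
}

# Case 1: the calling shell strips the single quotes, so the wrapper gets three
# arguments: sh, -c and "echo cell_v2 discover_hosts". eval re-splits them into
#   sh -c echo cell_v2 discover_hosts
# so "echo" runs with no arguments ("cell_v2" becomes $0, "discover_hosts" $1).
broken=$(run_via_eval sh -c 'echo cell_v2 discover_hosts')

# Case 2: the escaped quotes survive the calling shell, so eval sees
#   sh -c 'echo cell_v2 discover_hosts'
# and the whole string reaches "sh -c" as a single command.
fixed=$(run_via_eval sh -c \'echo cell_v2 discover_hosts\')

printf 'broken: [%s]\n' "$broken"
printf 'fixed:  [%s]\n' "$fixed"
```

Run as is, it prints "broken: []" and "fixed:  [cell_v2 discover_hosts]". The escaped form only helps because of the second parse: if the wrapper used exec "$@" instead of eval, the unescaped command would be the one that works.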
(In reply to Ollie Walsh from comment #14)
> actually I don't see any output /var/log/messages. Dan, do we log the output
> from the nova_api_discover_hosts container?

I don't think we clean up this container, so you should still see it via 'docker ps -a' on the overcloud controller where it executes. If there is logging to stdout, you should be able to obtain it via 'docker logs <container_name>'. If the logs go to a file, I think they would show up in /var/log/containers/nova.
We have nothing. I've proposed https://review.openstack.org/497955 to run with --verbose so that we get something on stdout. If it's possible to reproduce, can you try with this change?
(In reply to Ollie Walsh from comment #18)
> We have nothing. I've proposed https://review.openstack.org/497955 to run
> with --verbose so we get something on stdout.
>
> If it's possible to reproduce can you try with this change?

I'll check this.
It seems the issue no longer reproduces. I was able to launch an instance using external ceph.
Closing this as it can't be reproduced, but adding some logging to help debug any future issues.
This has been reproduced with verbose output.

When cell_v2 discover_hosts ran, it didn't pick up the compute node:

controller-0:/var/log/messages
Oct 2 07:39:26 localhost journal: Found 2 cell mappings.
Oct 2 07:39:26 localhost journal: Skipping cell0 since it does not contain hosts.
Oct 2 07:39:26 localhost journal: Getting compute nodes from cell 'default': 9860ec3b-3012-4b37-b7b6-2c8a74ad38c6
Oct 2 07:39:26 localhost journal: Found 0 unmapped computes in cell: 9860ec3b-3012-4b37-b7b6-2c8a74ad38c6

It seems nova-compute came up before nova-conductor:

compute-0:/var/log/containers/nova/nova-compute.log
2017-10-02 11:39:01.066 1 WARNING nova.conductor.api [req-0a8c71b9-2c03-4e0e-977b-175bdc737306 - - - - -] Timed out waiting for nova-conductor. Is it running? Or did this service start before nova-conductor? Reattempting establishment of nova-conductor connection...: MessagingTimeout: Timed out waiting for a reply to message ID e25a231083754fef8696f80d5e489866
2017-10-02 11:40:31.618 1 INFO nova.compute.resource_tracker [req-a46137f5-46ae-4058-8888-764290a5a4e3 - - - - -] Compute node record created for compute-0.localdomain:compute-0.localdomain with uuid: 69c218cf-5dcb-40cf-8ff2-6d6bb4eeaa19

controller-0:/var/log/containers/nova/nova-conductor.log
2017-10-02 11:38:37.896 35 DEBUG nova.servicegroup.drivers.db [req-8b4a6575-3ca2-428b-a4db-1755d067f595 - - - - -] DB_Driver: join new ServiceGroup member controller-0.localdomain to the conductor group, service = <Service: host=controller-0.localdomain, binary=nova-conductor, manager_class_name=nova.conductor.manager.ConductorManager> join /usr/lib/python2.7/site-packages/nova/servicegroup/drivers/db.py:47
2017-10-02 12:07:39.456 34 DEBUG oslo_db.sqlalchemy.engines [req-ec8097c1-8f21-4bbb-8b8c-58135ad51e30 4b5f1470c25449f1b55e4e9aa7be0601 fa4e8f5a1440408aba0feee4be738fb6 - default default] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py:285
2017-10-02 12:09:09.489 34 WARNING nova.scheduler.utils [req-ec8097c1-8f21-4bbb-8b8c-58135ad51e30 4b5f1470c25449f1b55e4e9aa7be0601 fa4e8f5a1440408aba0feee4be738fb6 - default default] Retrying select_destinations after a MessagingTimeout, attempt 1 of 2.: MessagingTimeout: Timed out waiting for a reply to message ID b67f96a057bc4940aee6ccd77ed41623
2017-10-02 12:09:11.006 34 INFO oslo_messaging._drivers.amqpdriver [-] No calling threads waiting for msg_id : b67f96a057bc4940aee6ccd77ed41623

Manually running cell_v2 discover_hosts after deployment picks up the computes:

()[root@controller-0 /]# su nova -s /bin/bash -c '/usr/bin/nova-manage cell_v2 discover_hosts --verbose'
Found 2 cell mappings.
Skipping cell0 since it does not contain hosts.
Getting compute nodes from cell 'default': 9860ec3b-3012-4b37-b7b6-2c8a74ad38c6
Found 1 unmapped computes in cell: 9860ec3b-3012-4b37-b7b6-2c8a74ad38c6
Checking host mapping for compute host 'compute-0.localdomain': 69c218cf-5dcb-40cf-8ff2-6d6bb4eeaa19
Creating host mapping for compute host 'compute-0.localdomain': 69c218cf-5dcb-40cf-8ff2-6d6bb4eeaa19
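A deployment-time discover_hosts run that reports "Found 0 unmapped computes" while compute nodes exist is the symptom seen here. A minimal, hypothetical helper for spotting it, assuming the verbose message format shown in this comment (other nova releases may word it differently): count_unmapped is an illustrative name, not part of nova or TripleO, and it sums the counts from every such line on stdin. The sample inputs are trimmed from the two outputs above.

```shell
#!/bin/sh
# Hypothetical helper: sum N over every "Found N unmapped computes in cell" line on stdin.
# Assumes the verbose discover_hosts message format shown in this report.
count_unmapped() {
  sed -n 's/.*Found \([0-9][0-9]*\) unmapped computes in cell.*/\1/p' |
    awk '{ total += $1 } END { print total + 0 }'
}

# Trimmed from the deployment-time run logged in controller-0:/var/log/messages.
deploy_run='Oct 2 07:39:26 localhost journal: Found 2 cell mappings.
Oct 2 07:39:26 localhost journal: Skipping cell0 since it does not contain hosts.
Oct 2 07:39:26 localhost journal: Found 0 unmapped computes in cell: 9860ec3b-3012-4b37-b7b6-2c8a74ad38c6'

# Trimmed from the manual run after deployment.
manual_run='Found 2 cell mappings.
Skipping cell0 since it does not contain hosts.
Found 1 unmapped computes in cell: 9860ec3b-3012-4b37-b7b6-2c8a74ad38c6'

deploy_count=$(printf '%s\n' "$deploy_run" | count_unmapped)
manual_count=$(printf '%s\n' "$manual_run" | count_unmapped)

echo "deployment-time run: $deploy_count unmapped computes found"
echo "manual run:          $manual_count unmapped computes found"
```

On a controller, the same function could be fed from something like grep 'unmapped computes' /var/log/messages. For the samples above it reports 0 for the deployment-time run and 1 for the manual run, matching the logs in this comment.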
https://review.openstack.org/518546 has merged
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462