openstack-nova: nova commands against overcloud get stuck and exit with: Unknown Error (HTTP 504)

Environment:
openstack-nova-cert-15.0.0-0.20170129152957.f9d7b38.el7ost.noarch
openstack-nova-compute-15.0.0-0.20170129152957.f9d7b38.el7ost.noarch
python-nova-15.0.0-0.20170129152957.f9d7b38.el7ost.noarch
instack-undercloud-6.0.0-0.20170130174946.5388cd1.el7ost.noarch
openstack-nova-api-15.0.0-0.20170129152957.f9d7b38.el7ost.noarch
openstack-nova-conductor-15.0.0-0.20170129152957.f9d7b38.el7ost.noarch
openstack-nova-common-15.0.0-0.20170129152957.f9d7b38.el7ost.noarch
puppet-nova-10.2.1-0.20170130234756.84cc5b0.el7ost.noarch
openstack-nova-placement-api-15.0.0-0.20170129152957.f9d7b38.el7ost.noarch
openstack-nova-scheduler-15.0.0-0.20170129152957.f9d7b38.el7ost.noarch
python-novaclient-6.0.0-0.20170125131648.25117fa.el7ost.noarch

Steps to reproduce:

1. Deploy overcloud with ironic services:

openstack overcloud deploy --debug --templates --libvirt-type kvm \
  -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic.yaml \
  -e virt/ceph.yaml \
  -e virt/hostnames.yml \
  -e virt/network/network-environment.yaml \
  -e ironic.yaml \
  -e flat_networks.yaml \
  -e vxlan_args_osp11 \
  --log-file overcloud_deployment_48.log

[stack@undercloud-0 ~]$ cat ironic.yaml
parameter_defaults:
  NtpServer: ["clock.redhat.com","clock2.redhat.com"]
  ComputeCount: 2
  ControllerCount: 3
  CephStorageCount: 2
  OvercloudControlFlavor: controller
  OvercloudComputeFlavor: compute
  OvercloudCephStorageFlavor: ceph
  IronicEnabledDrivers:
    - pxe_ssh
  NovaSchedulerDefaultFilters:
    - RetryFilter
    - AggregateInstanceExtraSpecsFilter
    - AvailabilityZoneFilter
    - RamFilter
    - DiskFilter
    - ComputeFilter
    - ComputeCapabilitiesFilter
    - ImagePropertiesFilter
  IronicCleaningDiskErase: metadata
  IronicIPXEEnabled: true
  ControllerExtraConfig:
    ironic::drivers::ssh::libvirt_uri: 'qemu:///system'

[stack@undercloud-0 ~]$ cat flat_networks.yaml
parameter_defaults:
  NeutronBridgeMappings: datacentre:br-ex,baremetal:br-baremetal
  NeutronFlatNetworks: datacentre,baremetal

2. Attempt to run "nova list" or "openstack server list" against the overcloud.

Result: the command gets stuck:

[stack@undercloud-0 ~]$ nova list --debug
DEBUG (extension:169) found extension EntryPoint.parse('v1password = swiftclient.authv1:PasswordLoader')
DEBUG (extension:169) found extension EntryPoint.parse('gnocchi-basic = gnocchiclient.auth:GnocchiBasicLoader')
DEBUG (extension:169) found extension EntryPoint.parse('gnocchi-noauth = gnocchiclient.auth:GnocchiNoAuthLoader')
DEBUG (extension:169) found extension EntryPoint.parse('token_endpoint = openstackclient.api.auth_plugin:TokenEndpoint')
DEBUG (extension:169) found extension EntryPoint.parse('v2token = keystoneauth1.loading._plugins.identity.v2:Token')
DEBUG (extension:169) found extension EntryPoint.parse('v3oauth1 = keystoneauth1.extras.oauth1._loading:V3OAuth1')
DEBUG (extension:169) found extension EntryPoint.parse('admin_token = keystoneauth1.loading._plugins.admin_token:AdminToken')
DEBUG (extension:169) found extension EntryPoint.parse('v3oidcauthcode = keystoneauth1.loading._plugins.identity.v3:OpenIDConnectAuthorizationCode')
DEBUG (extension:169) found extension EntryPoint.parse('v2password = keystoneauth1.loading._plugins.identity.v2:Password')
DEBUG (extension:169) found extension EntryPoint.parse('v3samlpassword = keystoneauth1.extras._saml2._loading:Saml2Password')
DEBUG (extension:169) found extension EntryPoint.parse('v3password = keystoneauth1.loading._plugins.identity.v3:Password')
DEBUG (extension:169) found extension EntryPoint.parse('v3oidcaccesstoken = keystoneauth1.loading._plugins.identity.v3:OpenIDConnectAccessToken')
DEBUG (extension:169) found extension EntryPoint.parse('v3oidcpassword = keystoneauth1.loading._plugins.identity.v3:OpenIDConnectPassword')
DEBUG (extension:169) found extension EntryPoint.parse('v3kerberos = keystoneauth1.extras.kerberos._loading:Kerberos')
DEBUG (extension:169) found extension EntryPoint.parse('token = keystoneauth1.loading._plugins.identity.generic:Token')
DEBUG (extension:169) found extension EntryPoint.parse('v3oidcclientcredentials = keystoneauth1.loading._plugins.identity.v3:OpenIDConnectClientCredentials')
DEBUG (extension:169) found extension EntryPoint.parse('v3tokenlessauth = keystoneauth1.loading._plugins.identity.v3:TokenlessAuth')
DEBUG (extension:169) found extension EntryPoint.parse('v3token = keystoneauth1.loading._plugins.identity.v3:Token')
DEBUG (extension:169) found extension EntryPoint.parse('v3totp = keystoneauth1.loading._plugins.identity.v3:TOTP')
DEBUG (extension:169) found extension EntryPoint.parse('password = keystoneauth1.loading._plugins.identity.generic:Password')
DEBUG (extension:169) found extension EntryPoint.parse('v3fedkerb = keystoneauth1.extras.kerberos._loading:MappedKerberos')
DEBUG (extension:169) found extension EntryPoint.parse('aodh-noauth = aodhclient.noauth:AodhNoAuthLoader')
DEBUG (session:347) REQ: curl -g -i -X GET http://10.0.0.105:5000/v2.0 -H "Accept: application/json" -H "User-Agent: nova keystoneauth1/2.18.0 python-requests/2.10.0 CPython/2.7.5"
INFO (connectionpool:213) Starting new HTTP connection (1): 10.0.0.105
DEBUG (connectionpool:395) "GET /v2.0 HTTP/1.1" 200 227
DEBUG (session:395) RESP: [200] Date: Mon, 13 Feb 2017 22:19:57 GMT Server: Apache Vary: X-Auth-Token,Accept-Encoding x-openstack-request-id: req-5571d868-8568-4b72-b42e-2518a3e76a1d Content-Encoding: gzip Content-Length: 227 Content-Type: application/json
RESP BODY: {"version": {"status": "deprecated", "updated": "2016-08-04T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v2.0+json"}], "id": "v2.0", "links": [{"href": "http://10.0.0.105:5000/v2.0/", "rel": "self"}, {"href": "http://docs.openstack.org/", "type": "text/html", "rel": "describedby"}]}}
DEBUG (session:640) GET call to None for http://10.0.0.105:5000/v2.0 used request id req-5571d868-8568-4b72-b42e-2518a3e76a1d
DEBUG (v2:63) Making authentication request to http://10.0.0.105:5000/v2.0/tokens
DEBUG (connectionpool:395) "POST /v2.0/tokens HTTP/1.1" 200 1196
REQ: curl -g -i -X GET http://10.0.0.105:8774/v2.1 -H "User-Agent: python-novaclient" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}588c7a54ee9b44fc194b3c79f172c10b7ddbcc78"
DEBUG (session:347) REQ: curl -g -i -X GET http://10.0.0.105:8774/v2.1 -H "User-Agent: python-novaclient" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}588c7a54ee9b44fc194b3c79f172c10b7ddbcc78"
INFO (connectionpool:213) Starting new HTTP connection (1): 10.0.0.105

Checking the nova and httpd logs on all controllers, I see this error:

==> /var/log/nova/nova-compute.log <==
2017-02-13 22:30:40.021 99539 ERROR nova.compute.manager [req-2892d6f2-0fbf-450c-a25a-5e0ff02d9ce3 - - - - -] No compute node record for host controller-2.localdomain

If I restart httpd on all controllers, nova list starts working, but it gets stuck again after a very short time (a few seconds).
Below is the output from when it was temporarily working:

[stack@undercloud-0 ~]$ nova list
+----+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+----+------+--------+------------+-------------+----------+
+----+------+--------+------------+-------------+----------+

[stack@undercloud-0 ~]$ nova service-list
+----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary           | Host                     | Zone     | Status  | State | Updated_at                 | Disabled Reason |
+----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+
| 12 | nova-conductor   | controller-1.localdomain | internal | enabled | up    | 2017-02-13T22:35:42.000000 | -               |
| 18 | nova-scheduler   | controller-1.localdomain | internal | enabled | up    | 2017-02-13T22:35:49.000000 | -               |
| 21 | nova-consoleauth | controller-1.localdomain | internal | enabled | up    | 2017-02-13T22:35:48.000000 | -               |
| 24 | nova-compute     | controller-1.localdomain | nova     | enabled | up    | 2017-02-13T22:35:50.000000 | -               |
| 27 | nova-conductor   | controller-2.localdomain | internal | enabled | up    | 2017-02-13T22:35:48.000000 | -               |
| 36 | nova-scheduler   | controller-2.localdomain | internal | enabled | up    | 2017-02-13T22:35:42.000000 | -               |
| 39 | nova-consoleauth | controller-2.localdomain | internal | enabled | up    | 2017-02-13T22:35:51.000000 | -               |
| 42 | nova-compute     | controller-2.localdomain | nova     | enabled | up    | 2017-02-13T22:35:48.000000 | -               |
| 48 | nova-compute     | compute-0.localdomain    | nova     | enabled | up    | 2017-02-13T22:35:48.000000 | -               |
| 51 | nova-compute     | compute-1.localdomain    | nova     | enabled | up    | 2017-02-13T22:35:50.000000 | -               |
| 60 | nova-conductor   | controller-0.localdomain | internal | enabled | up    | 2017-02-13T22:35:50.000000 | -               |
| 69 | nova-scheduler   | controller-0.localdomain | internal | enabled | up    | 2017-02-13T22:35:45.000000 | -               |
| 72 | nova-consoleauth | controller-0.localdomain | internal | enabled | up    | 2017-02-13T22:35:48.000000 | -               |
| 75 | nova-compute     | controller-0.localdomain | nova     | enabled | up    | 2017-02-13T22:35:50.000000 | -               |
+----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+
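
The service list shows nova-compute registered on the controllers as well as on the compute nodes, and the logged error names controller-2.localdomain. To pull that information out of captured output quickly, here is a minimal Python sketch that parses the ASCII table printed by "nova service-list". The function names parse_service_table and hosts_running are hypothetical helpers made up for illustration, not part of python-novaclient, and the sketch assumes the table layout shown above. It only reads the table; it does not say anything about the root cause.

```python
# Hypothetical helpers for illustration only; not part of python-novaclient.
# SAMPLE is a trimmed copy of three rows from the service list in this report.
SAMPLE = """\
+----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary           | Host                     | Zone     | Status  | State | Updated_at                 | Disabled Reason |
+----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+
| 12 | nova-conductor   | controller-1.localdomain | internal | enabled | up    | 2017-02-13T22:35:42.000000 | -               |
| 24 | nova-compute     | controller-1.localdomain | nova     | enabled | up    | 2017-02-13T22:35:50.000000 | -               |
| 48 | nova-compute     | compute-0.localdomain    | nova     | enabled | up    | 2017-02-13T22:35:48.000000 | -               |
+----+------------------+--------------------------+----------+---------+-------+----------------------------+-----------------+
"""


def parse_service_table(text):
    """Turn the ASCII table into a list of dicts keyed by column header."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Border lines start with '+'; only lines starting with '|' carry cells.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            # The first '|' line is the header row.
            header = cells
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def hosts_running(rows, binary):
    """Return the sorted host names that report the given service binary."""
    return sorted(row["Host"] for row in rows if row["Binary"] == binary)
```

Calling hosts_running(parse_service_table(output), "nova-compute") on the full table would list every host that reports a nova-compute service, which makes it easy to compare against the host named in the nova-compute.log error.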
The issue reproduced even when the deployment was done without ironic in overcloud:

openstack overcloud deploy --debug --templates --libvirt-type kvm \
  -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e virt/ceph.yaml \
  -e virt/hostnames.yml \
  -e virt/network/network-environment.yaml \
  -e flat_networks.yaml \
  -e vxlan_args_osp11

[stack@undercloud-0 ~]$ cat flat_networks.yaml
parameter_defaults:
  NeutronBridgeMappings: datacentre:br-ex,baremetal:br-baremetal
  NeutronFlatNetworks: datacentre,baremetal
  NtpServer: ["clock.redhat.com","clock2.redhat.com"]
  ComputeCount: 2
  ControllerCount: 3
  CephStorageCount: 2
  OvercloudControlFlavor: controller
  OvercloudComputeFlavor: compute
  OvercloudCephStorageFlavor: ceph

[stack@undercloud-0 ~]$ cat vxlan_args_osp11
parameter_defaults:
  NeutronNetworkType: 'vxlan'
  NeutronTunnelTypes: 'vxlan'
This blocks our automation scripts.
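
For automation that should fail fast instead of hanging on a stuck API, a bounded probe may help. The following is a minimal Python sketch, assuming only the standard library; the names classify and probe and the result labels are hypothetical and chosen for illustration. It tells apart a request that times out on the client side, an HTTP 504 returned by the server, and an API that answers at all. Whether an unauthenticated GET to the compute endpoint returns 200 or 401 on a given deployment is not something this report establishes, so any non-5xx status is simply treated as "responding".

```python
import socket
import urllib.error
import urllib.request


def classify(status=None, timed_out=False):
    """Map a probe outcome to a coarse label (labels are illustrative)."""
    if timed_out:
        return "hung"             # no answer within the client-side timeout
    if status == 504:
        return "gateway-timeout"  # the symptom named in this report's title
    if status is not None and 500 <= status <= 599:
        return "server-error"
    return "responding"           # any other status means the API answered


def probe(url, timeout=10.0):
    """Issue one GET with a hard timeout and classify the outcome."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(status=resp.status)
    except urllib.error.HTTPError as exc:
        # HTTPError carries the status code for 4xx/5xx responses.
        return classify(status=exc.code)
    except (socket.timeout, TimeoutError):
        # Timeout raised while reading the response.
        return classify(timed_out=True)
    except urllib.error.URLError as exc:
        # Timeout raised while connecting is wrapped in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return classify(timed_out=True)
        return "unreachable"
    except OSError:
        return "unreachable"


# Example call, using the compute endpoint seen in the debug log above:
# probe("http://10.0.0.105:8774/v2.1", timeout=10.0)
```

A script can call probe before running "nova list" and abort with a clear message when the result is "hung" or "gateway-timeout", rather than blocking on the client.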
The fix was merged upstream.
It worked for me after applying this patch: https://review.openstack.org/#/c/430183 The patch is applied before deploying the overcloud.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1245