Bug 1393346

Summary: IPv6 deployment fails with Error: Could not prefetch keystone_endpoint provider 'openstack': Command: 'openstack ["endpoint", "list", "--quiet", "--format", "csv", []]' has been running for more than 40 seconds (tried 4, for a total of 170 seconds)
Product: Red Hat OpenStack Reporter: Marius Cornea <mcornea>
Component: puppet-tripleo Assignee: Jiri Stransky <jstransk>
Status: CLOSED ERRATA QA Contact: Omri Hochman <ohochman>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 10.0 (Newton) CC: dbecker, jcoufal, jjoyce, jschluet, jslagle, mburns, morazi, ohochman, rhel-osp-director-maint, sasha, slinaber, tvignaud
Target Milestone: rc Keywords: AutomationBlocker, Regression, Triaged
Target Release: 10.0 (Newton)   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2016-12-14 16:31:11 UTC Type: Bug

Description Marius Cornea 2016-11-09 11:31:34 UTC
Description of problem:
IPv6 deployments fail with the following error at the overcloud.AllNodesDeploySteps.ControllerDeployment_Step3.0 step:

stdout: overcloud.AllNodesDeploySteps.ControllerDeployment_Step3.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: f92ea5bc-eaed-42b6-b07f-be1b67abcc78
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
  deploy_stdout: |
    ...
    Notice: /Firewall[998 log all]: Dependency Keystone_tenant[admin] has failures: true
    Notice: /Firewall[998 log all]: Dependency Keystone_role[admin] has failures: true
    Notice: /Firewall[998 log all]: Dependency Keystone_user[admin] has failures: true
    Notice: /Firewall[998 log all]: Dependency Keystone_service[keystone::identity] has failures: true
    Notice: /Firewall[999 drop all]: Dependency Keystone_tenant[service] has failures: true
    Notice: /Firewall[999 drop all]: Dependency Keystone_tenant[admin] has failures: true
    Notice: /Firewall[999 drop all]: Dependency Keystone_role[admin] has failures: true
    Notice: /Firewall[999 drop all]: Dependency Keystone_user[admin] has failures: true
    Notice: /Firewall[999 drop all]: Dependency Keystone_service[keystone::identity] has failures: true
    Notice: Finished catalog run in 1480.04 seconds
    (truncated, view all with --long)
  deploy_stderr: |
    ...
    Error: Command: 'openstack ["user", "create", "--format", "shell", ["admin", "--enable", "--password", "2s2qPyRKmh3banKG3WJzk32Kj", "--email", "admin", "--domain", "Default"]]' has been running for more than 170 seconds
    Error: /Stage[main]/Keystone::Roles::Admin/Keystone_user[admin]/ensure: change from absent to present failed: Command: 'openstack ["user", "create", "--format", "shell", ["admin", "--enable", "--password", "2s2qPyRKmh3banKG3WJzk32Kj", "--email", "admin", "--domain", "Default"]]' has been running for more than 170 seconds
    Warning: /Stage[main]/Keystone::Roles::Admin/Keystone_user_role[admin@admin]: Skipping because of failed dependencies
    Error: Could not prefetch keystone_service provider 'openstack': Command: 'openstack ["service", "list", "--quiet", "--format", "csv", "--long"]' has been running for more than 40 seconds (tried 4, for a total of 170 seconds)
    Error: Not managing Keystone_service[keystone] due to earlier Keystone API failures.
    Error: /Stage[main]/Keystone::Endpoint/Keystone::Resource::Service_identity[keystone]/Keystone_service[keystone::identity]/ensure: change from absent to present failed: Not managing Keystone_service[keystone] due to earlier Keystone API failures.
    Error: Could not prefetch keystone_endpoint provider 'openstack': Command: 'openstack ["endpoint", "list", "--quiet", "--format", "csv", []]' has been running for more than 40 seconds (tried 4, for a total of 170 seconds)
    Warning: /Stage[main]/Keystone::Endpoint/Keystone::Resource::Service_identity[keystone]/Keystone_endpoint[regionOne/keystone::identity]: Skipping because of failed dependencies
    Warning: /Firewall[998 log all]: Skipping because of failed dependencies
    Warning: /Firewall[999 drop all]: Skipping because of failed dependencies
    (truncated, view all with --long)


Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-5.0.0-1.4.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy an IPv6-enabled overcloud.

Actual results:
Deployment fails.

Expected results:
Deployment succeeds.

Additional info:
I'm currently bringing up an environment for investigation. Will get back with the credentials.

Comment 1 Marius Cornea 2016-11-09 13:07:29 UTC
On the first controller we can see Keystone is unable to reach rabbitmq:

[root@overcloud-controller-0 nova]# tail -f /var/log/keystone/keystone.log 
2016-11-09 13:05:11.377 20114 ERROR oslo.messaging._drivers.impl_rabbit [req-3ac7ecec-f0b6-4cc6-b8b7-20fbbf11008d - - - - -] [a650a24c-23a9-438f-b2dc-15b38de1182e] AMQP server on fd00:fd00:fd00:2000::1a:5672:5672 is unreachable: [Errno 113] No route to host. Trying again in 24 seconds. Client port: None
2016-11-09 13:05:25.403 20110 ERROR oslo.messaging._drivers.impl_rabbit [req-500a0c0f-090c-48bd-8b1c-da87dbbdaf70 - - - - -] [3b5f29cf-5382-43f8-9fbc-012499f6068d] AMQP server on fd00:fd00:fd00:2000::13:5672:5672 is unreachable: [Errno 113] No route to host. Trying again in 1 seconds. Client port: None

Comment 2 Marius Cornea 2016-11-09 13:08:26 UTC
[root@overcloud-controller-0 keystone]# grep rabbit keystone.conf | grep -v ^#
rpc_backend = rabbit
[oslo_messaging_rabbit]
rabbit_hosts = fd00:fd00:fd00:2000::19:5672,fd00:fd00:fd00:2000::1a:5672,fd00:fd00:fd00:2000::13:5672
rabbit_use_ssl = False
rabbit_userid = guest
rabbit_password = XxE7RWRHCM2wbDnJe2TzrsnMQ
rabbit_ha_queues = True
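
The unbracketed rabbit_hosts entries above are ambiguous: each one is itself a syntactically valid IPv6 address, so a parser cannot tell where the address ends and the port begins. The following minimal Python sketch, using only the standard library's ipaddress module, illustrates that ambiguity. How oslo.messaging actually parses the value is not shown in this report, so this is only an illustration, not a description of its code path.

```python
import ipaddress

# One rabbit_hosts entry exactly as it appears in keystone.conf above.
entry = "fd00:fd00:fd00:2000::1a:5672"

# The address the operator meant (port 5672 was supposed to be separate).
intended = ipaddress.ip_address("fd00:fd00:fd00:2000::1a")

# The whole entry also parses as an IPv6 address: "5672" is read as the
# last 16-bit group instead of as a port number.
parsed = ipaddress.ip_address(entry)

print(parsed.exploded)      # full form of the address the entry denotes
print(parsed == intended)   # whether it matches the intended address
```

Read this way, the entry names a different host than the intended one, which is consistent with the "No route to host" errors and the doubled ":5672:5672" in the Keystone log from comment 1.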

Comment 3 James Slagle 2016-11-09 13:22:05 UTC
is this possibly fixed by https://review.openstack.org/#/c/395104/ ?

Comment 4 Marius Cornea 2016-11-09 13:28:35 UTC
(In reply to James Slagle from comment #3)
> is this possibly fixed by https://review.openstack.org/#/c/395104/ ?

Yes, the LP bug https://bugs.launchpad.net/tripleo/+bug/1639881 shows the same issue I'm seeing in my environment.

Comment 5 Jiri Stransky 2016-11-09 14:00:12 UTC
Yes, we changed rabbit_hosts in keystone.conf from:

rabbit_hosts = fd00:fd00:fd00:2000::19:5672,fd00:fd00:fd00:2000::1a:5672,fd00:fd00:fd00:2000::13:5672

to

rabbit_hosts = [fd00:fd00:fd00:2000::19]:5672,[fd00:fd00:fd00:2000::1a]:5672,[fd00:fd00:fd00:2000::13]:5672

and restarted httpd, and the errors in keystone.log went away, so it's very likely that the linked patch fixes the issue.
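
The manual change above amounts to wrapping each IPv6 literal in square brackets before appending the port. A minimal Python sketch of that formatting rule follows. The helper name format_host_port is hypothetical, and this is not the linked patch (which lives in the Puppet code); it only reproduces the corrected rabbit_hosts value from the three addresses in this report.

```python
import ipaddress


def format_host_port(host: str, port: int) -> str:
    """Return 'host:port', wrapping IPv6 literals in square brackets.

    Hypothetical helper for illustration; not taken from the linked patch.
    """
    try:
        is_ipv6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        # Not an IP literal (for example a hostname): leave it unbracketed.
        is_ipv6 = False
    return f"[{host}]:{port}" if is_ipv6 else f"{host}:{port}"


# The three RabbitMQ node addresses from keystone.conf in this report.
nodes = [
    "fd00:fd00:fd00:2000::19",
    "fd00:fd00:fd00:2000::1a",
    "fd00:fd00:fd00:2000::13",
]

rabbit_hosts = ",".join(format_host_port(node, 5672) for node in nodes)
print(f"rabbit_hosts = {rabbit_hosts}")
```

IPv4 addresses and hostnames pass through unchanged, so the same rule can be applied to both IPv4 and IPv6 deployments.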

Comment 6 Omri Hochman 2016-11-10 15:41:28 UTC
*** Bug 1393880 has been marked as a duplicate of this bug. ***

Comment 12 errata-xmlrpc 2016-12-14 16:31:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html