Bug 1622338

Summary: Fresh overcloud deployment fails when tls is enabled.
Product: Red Hat OpenStack
Reporter: Alfredo Pizarro <apizarro>
Component: openstack-tripleo-heat-templates
Assignee: Juan Antonio Osorio <josorior>
Status: CLOSED ERRATA
QA Contact: Pavan <pkesavar>
Severity: high
Priority: medium
Version: 13.0 (Queens)
CC: apizarro, ealcaniz, hrybacki, jagee, josorior, jschluet, lmarsh, mburns
Target Milestone: z3
Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)
Flags: lmarsh: needinfo-, lmarsh: needinfo-
Hardware: x86_64
OS: Linux
Fixed In Version: openstack-tripleo-heat-templates-8.0.7-2.el7ost
Last Closed: 2018-11-13 22:28:47 UTC
Type: Bug
Attachments:
Overcloud template files (attachment 1478771), flags: none
overcloud event list output (attachment 1479605), flags: none

Description Alfredo Pizarro 2018-08-26 13:01:53 UTC
Created attachment 1478771: Overcloud template files

Description of problem:

I'm deploying a basic OSP13 overcloud with 1 controller and 1 node. When I deploy the overcloud without TLS enabled it works fine, but with TLS enabled the deployment fails:

overcloud.AllNodesDeploySteps.ControllerDeployment_Step4.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: cda24974-cd2d-4cc6-8792-81679a4ca4c8
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
            "stdout: 28116606bb48e20bb94517d9a276a35933b0d4db243f0253b8b00c210d6fa410", 
            "stdout: 993a89884fb133691d67b7d0de9f243cb367c754d8f2171bec6e7a027d24df06", 
            "stdout: 420e723717f4d34b25d5315248cff41ba6ef15470971486b5954d7c9bab57162"
        ]
    }
        to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/a72e0e73-6968-40ee-beac-bd92ccfba549_playbook.retry
    
    PLAY RECAP *********************************************************************
    localhost                  : ok=6    changed=2    unreachable=0    failed=1   
    
    (truncated, view all with --long)
  deploy_stderr: |


On the controller node, I can confirm that the certificate for the public IP and the CA certificates were installed correctly:

[root@control1 ~]# curl https://overcloud.testlab.local
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://overcloud.testlab.local/dashboard">here</a>.</p>
</body></html>

I also looked for related errors on the controller, and it is always something like this:

Aug 26 03:24:46 control1 os-collect-config[2997]: quests/sessions.py\\\", line 518, in request\", \n        \"    resp = self.send(prep, **send_kwargs)\", \n        \"  File \\\"/usr/lib/python2.7/site-packages/requests/sessions.py\\\", line 639, in send\", \n        \"    r = adapter.send(request, **kwargs)\", \n        \"  File \\\"/usr/lib/python2.7/site-packages/requests/adapters.py\\\", line 488, in send\", \n        \"    raise ConnectionError(err, request=request)\", \n        \"ConnectionError: ('Connection aborted.', BadStatusLine(\\\"''\\\",))\",

This error makes me think that a Python script is trying to connect using http instead of https.
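
For reference, here is a minimal sketch of what a BadStatusLine("''") means at the HTTP client level: the peer closed the connection without sending any HTTP status line. That is consistent with a plaintext request reaching a listener that expects TLS, but it does not prove it, and this sketch does not reproduce the overcloud failure. It uses the Python 3 standard library (the traceback above is from Python 2.7), where the equivalent exception is http.client.RemoteDisconnected, a subclass of BadStatusLine. The local server and all function names are hypothetical and exist only for the illustration.

```python
import http.client
import socket
import threading


def serve_once_and_hang_up(listener):
    """Accept one connection, read the request, then close without replying."""
    conn, _ = listener.accept()
    conn.recv(4096)  # consume the plaintext request
    conn.close()     # hang up: no status line is ever sent back


def probe_plain_http(host, port):
    """Send a plain HTTP GET and return the BadStatusLine raised, if any."""
    client = http.client.HTTPConnection(host, port, timeout=5)
    try:
        client.request("GET", "/")
        client.getresponse()
    except http.client.BadStatusLine as exc:
        return exc
    finally:
        client.close()
    return None


def demo():
    """Run the probe against a throwaway local listener on a free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(
        target=serve_once_and_hang_up, args=(listener,), daemon=True
    )
    worker.start()
    try:
        return probe_plain_http("127.0.0.1", port)
    finally:
        worker.join(timeout=5)
        listener.close()


if __name__ == "__main__":
    print(repr(demo()))
```

The probe returns the exception instead of raising it so that the symptom can be inspected directly; any other failure mode (for example a refused connection) is deliberately left to propagate.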

I have tried this deployment several times, regenerating the certificates and switching between IP-based and DNS-based public certificates, and the error is always the same and occurs at the same stage.

Here is my overcloud script:

#!/bin/bash
#-e /home/stack/templates/environments/inject-trust-anchor.yaml \

openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
-r /home/stack/templates/roles_data.yaml \
-e /home/stack/templates/environments/network-environment.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/services-docker/neutron-ovn-dvr-ha.yaml \
-e /home/stack/templates/environments/node-info.yaml \
-e /home/stack/templates/environments/overcloud_images.yaml \
-e /home/stack/templates/enable-tls.yaml \
-e /home/stack/templates/environments/inject-trust-anchor.yaml \
-e /home/stack/templates/inject-trust-anchor-hiera.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-dns.yaml \
-e /home/stack/templates/cloudname.yaml \
-e /home/stack/templates/environments/hostname-map.yaml \
-e /home/stack/templates/dashboard-ip.yaml

I'm attaching my overcloud template files to this ticket. 

Version-Release number of selected component (if applicable):
OSP 13: 
openstack-tripleo-heat-templates-8.0.2-43.el7ost.noarch


How reproducible:


Steps to Reproduce:
1. Deploy a basic OSP 13 overcloud (1 node / 1 controller) with the enable-tls.yaml environment file included.
2. Deploy the same overcloud settings without the enable-tls.yaml environment file; this works fine.

Actual results:
It fails with the same error shown in the description: overcloud.AllNodesDeploySteps.ControllerDeployment_Step4.0 ends in CREATE_FAILED with "Deployment exited with non-zero status code: 2".

Expected results:
The deployment should complete successfully with TLS enabled.

Additional info:

Comment 1 Juan Antonio Osorio 2018-08-29 16:33:35 UTC
Can you provide the output of the following command?

openstack stack failures list --long overcloud

Comment 2 Alfredo Pizarro 2018-08-29 23:19:50 UTC
Created attachment 1479605: overcloud event list output

Attaching the requested output.

Comment 3 Juan Antonio Osorio 2018-09-06 14:12:45 UTC
Alfredo, the gnocchi update issue is fixed by https://review.openstack.org/#/c/594003/ (merged two weeks ago). Is that commit already included in your setup?

Comment 4 Alfredo Pizarro 2018-09-06 14:51:07 UTC
Juan Antonio, I'm afraid it is not included in this setup. I will update the container images and try again. I will let you know ASAP.

Comment 5 Alfredo Pizarro 2018-09-07 14:00:35 UTC
Hello Juan Antonio, I updated all the container images (including gnocchi) and the deployment still failed with the same error. I then noticed that I was missing an update for openstack-tripleo-heat-templates on the director. After updating the heat templates, it works:
Original version: openstack-tripleo-heat-templates-8.0.2-43.el7ost.noarch
Update: openstack-tripleo-heat-templates-8.0.4-20.el7ost.noarch

I deployed again with the updated version of openstack-tripleo-heat-templates using the same setup and it finished successfully.
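
A minimal sketch for checking whether a director host already has a templates package at least as new as the one that worked above. The version strings come from this report; the helper names are hypothetical, and the comparison is a simplification that only looks at the dotted version and the leading number of the release, not rpm's full version-ordering rules. The rpm query assumes rpm is available on the host and is not run by the example itself.

```python
import re
import subprocess


def parse_version_release(text):
    """Split 'X.Y.Z-R.dist' into ((X, Y, Z), R) using only the numeric parts."""
    version, _, release = text.partition("-")
    numbers = tuple(int(part) for part in version.split("."))
    match = re.match(r"\d+", release)
    release_number = int(match.group()) if match else 0
    return numbers, release_number


def is_at_least(installed, required):
    """True if installed is the same as or newer than required (simplified)."""
    return parse_version_release(installed) >= parse_version_release(required)


def installed_version_release(package):
    """Ask rpm for the installed VERSION-RELEASE of a package."""
    query = ["rpm", "-q", "--queryformat", "%{VERSION}-%{RELEASE}", package]
    return subprocess.check_output(query, universal_newlines=True).strip()


if __name__ == "__main__":
    working = "8.0.4-20.el7ost"  # version reported as working in this comment
    print(is_at_least("8.0.2-43.el7ost", working))  # original, failing setup
    print(is_at_least("8.0.4-20.el7ost", working))  # updated, working setup
```

On a director host, installed_version_release("openstack-tripleo-heat-templates") could supply the first argument to is_at_least instead of a hard-coded string.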

Comment 16 errata-xmlrpc 2018-11-13 22:28:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3587