Bug 1564654

Summary: OSP13: Overcloud deployment fails when using capital letters in customized stack name ( --stack TEST-STACK34 ).
Product: Red Hat OpenStack
Reporter: Omri Hochman <ohochman>
Component: puppet-tripleo
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED ERRATA
QA Contact: nlevinki <nlevinki>
Severity: high
Docs Contact:
Priority: high
Version: 13.0 (Queens)
CC: agurenko, aschultz, bdobreli, chjones, dciabrin, jcoufal, jjoyce, joflynn, jschluet, mburns, mcornea, mflusche, michele, mkrcmari, rhos-maint, rscarazz, sathlang, slinaber, tvignaud
Target Milestone: z2
Keywords: Regression, ReleaseNotes, Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: puppet-tripleo-8.3.4-2.el7ost openstack-tripleo-common-8.6.3-2.el7ost
Doc Type: Bug Fix
Doc Text:
If you used uppercase letters in the stack name, the deployment failed. Fixes have been introduced so that a stack name containing uppercase letters now leads to a successful deployment. Specifically, the bootstrap_host scripts inside the containers now convert hostnames to lowercase before comparing them, and pacemaker node properties are now set using lowercase node names.
Story Points: ---
Clone Of:
Clones: 1585189
Environment:
Last Closed: 2018-08-29 16:35:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1576148, 1585189

Description Omri Hochman 2018-04-06 19:29:15 UTC
OSP13: Overcloud deployment fails when using capital letters in customized stack name ( --stack TEST-STACK34 ). 


Steps:
-------
- Attempt to deploy an overcloud with a customized stack name that contains capital letters.


Example:
--------
[stack@undercloud75 ~]$ cat overcloud_deploy.sh
#!/bin/bash
openstack overcloud deploy \
--stack TEST-STACK34 \
--templates /usr/share/openstack-tripleo-heat-templates \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /home/stack/network-environment.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /home/stack/ceph.yaml \
-e /home/stack/tripleo-overcloud-passwords.yaml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /home/stack/virt/docker-images.yaml \
-e /home/stack/dns/dns.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/enable-tls.yaml \
-e /home/stack/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
-e /home/stack/inject-trust-anchor-hiera.yaml \
--log-file overcloud_deployment_62.log

Results ( Overcloud failed to deploy) : 
----------------------------------------
2018-02-23 20:21:55Z [TEST-STACK34.AllNodesDeploySteps.ControllerDeployment_Step2.0]: SIGNAL_IN_PROGRESS  Signal: deployment de93099f-56e4-4e1d-90ba-33c5149e1c61 failed (2)
2018-02-23 20:21:56Z [TEST-STACK34.AllNodesDeploySteps.ControllerDeployment_Step2.0]: CREATE_FAILED  Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
2018-02-23 20:21:56Z [TEST-STACK34.AllNodesDeploySteps.ControllerDeployment_Step2]: CREATE_FAILED  Resource CREATE failed: Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
2018-02-23 20:21:57Z [TEST-STACK34.AllNodesDeploySteps.ControllerDeployment_Step2]: CREATE_FAILED  Error: resources.ControllerDeployment_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2018-02-23 20:21:57Z [TEST-STACK34.AllNodesDeploySteps]: CREATE_FAILED  Resource CREATE failed: Error: resources.ControllerDeployment_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2018-02-23 20:21:58Z [TEST-STACK34.AllNodesDeploySteps]: CREATE_FAILED  Error: resources.AllNodesDeploySteps.resources.ControllerDeployment_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2018-02-23 20:21:58Z [TEST-STACK34]: CREATE_FAILED  Resource CREATE failed: Error: resources.AllNodesDeploySteps.resources.ControllerDeployment_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2

 Stack TEST-STACK34 CREATE_FAILED

TEST-STACK34.AllNodesDeploySteps.ControllerDeployment_Step2.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: de93099f-56e4-4e1d-90ba-33c5149e1c61
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
            "Warning: /Stage[main]/Tripleo::Profile::Pacemaker::Haproxy_bundle/Tripleo::Pacemaker::Haproxy_with_vip[haproxy_and_storage_vip]/Pacemaker::Constraint::Colocation[storage_vip-with-haproxy]/Pcmk_constraint[colo-ip-10.19.105.18-haproxy-bundle]: Skipping because of failed dependencies",
            "Warning: /Stage[main]/Tripleo::Profile::Pacemaker::Haproxy_bundle/Tripleo::Pacemaker::Haproxy_with_vip[haproxy_and_storage_mgmt_vip]/Pacemaker::Constraint::Order[storage_mgmt_vip-then-haproxy]/Pcmk_constraint[order-ip-192.168.200.11-haproxy-bundle]: Skipping because of failed dependencies",
            "Warning: /Stage[main]/Tripleo::Profile::Pacemaker::Haproxy_bundle/Tripleo::Pacemaker::Haproxy_with_vip[haproxy_and_storage_mgmt_vip]/Pacemaker::Constraint::Colocation[storage_mgmt_vip-with-haproxy]/Pcmk_constraint[colo-ip-192.168.200.11-haproxy-bundle]: Skipping because of failed dependencies"
        ]
    }
        to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/ea964ea2-5e48-4244-9d7e-76aac6acadec_playbook.retry

    PLAY RECAP *********************************************************************
    localhost                  : ok=6    changed=2    unreachable=0    failed=1

    (truncated, view all with --long)
  deploy_stderr: |

Heat Stack create failed.
Heat Stack create failed.




nova list: 
-----------
+--------------------------------------+----------------------------+--------+------------+-------------+-----------------------+
| ID                                   | Name                       | Status | Task State | Power State | Networks              |
+--------------------------------------+----------------------------+--------+------------+-------------+-----------------------+
| 88a9844d-5640-40b6-ba56-bee597b71b81 | TEST-STACK34-cephstorage-0 | ACTIVE | -          | Running     | ctlplane=192.168.0.9  |
| dc776893-f836-4f4d-a52f-18d5835a415d | TEST-STACK34-cephstorage-1 | ACTIVE | -          | Running     | ctlplane=192.168.0.13 |
| 84f23ee9-8554-4102-80a9-f9d683aa479c | TEST-STACK34-cephstorage-2 | ACTIVE | -          | Running     | ctlplane=192.168.0.8  |
| 404c77cd-5230-4021-b2e1-409bdcbbaa28 | TEST-STACK34-controller-0  | ACTIVE | -          | Running     | ctlplane=192.168.0.7  |
| 29096e5c-50e7-4cab-ba52-78a75e92206f | TEST-STACK34-controller-1  | ACTIVE | -          | Running     | ctlplane=192.168.0.17 |
| 68649a72-6fde-4c06-a41b-f44e00105a0b | TEST-STACK34-controller-2  | ACTIVE | -          | Running     | ctlplane=192.168.0.11 |
| adc4e0ae-fffb-4083-9a7f-33984192a067 | TEST-STACK34-novacompute-0 | ACTIVE | -          | Running     | ctlplane=192.168.0.16 |
+--------------------------------------+----------------------------+--------+------------+-------------+-----------------------+


(undercloud) [stack@undercloud75 ~]$ ssh heat-admin@192.168.0.7
[heat-admin@test-stack34-controller-0 ~]$ 

**** Note: after SSH-ing in, the node's hostname is all lowercase (see the prompt above). ****

[heat-admin@test-stack34-controller-0 ~]$ sudo pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: test-stack34-controller-2 (version 1.1.18-11.el7-2b07d5c5a9) - partition with quorum
Last updated: Fri Apr  6 19:25:22 2018
Last change: Thu Apr  5 19:31:04 2018 by root via cibadmin on test-stack34-controller-0

3 nodes configured
6 resources configured

Online: [ test-stack34-controller-0 test-stack34-controller-1 test-stack34-controller-2 ]

Full list of resources:

 ip-192.168.0.15	(ocf::heartbeat:IPaddr2):	Stopped
 ip-10.19.184.160	(ocf::heartbeat:IPaddr2):	Stopped
 ip-10.19.104.19	(ocf::heartbeat:IPaddr2):	Stopped
 ip-10.19.104.17	(ocf::heartbeat:IPaddr2):	Stopped
 ip-10.19.105.18	(ocf::heartbeat:IPaddr2):	Stopped
 ip-192.168.200.11	(ocf::heartbeat:IPaddr2):	Stopped

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[heat-admin@test-stack34-controller-0 ~]$

Comment 1 Omri Hochman 2018-04-06 19:31:44 UTC
(Thanks Marius) 

Debug: try 15/20: /usr/sbin/pcs -f /var/lib/pacemaker/cib/puppet-cib-backup20180405-8-1sqw3dc property set --node TEST-STACK34-controller-1 redis-role=true
Debug: Error: Error: unable to set attribute redis-role
Could not map name=TEST-STACK34-controller-1 to a UUID
while the name in the cluster is test-stack34-controller-1

The stack name has capital letters, whereas the node is registered in the cluster in lowercase.


------------------------------------------------------------------------

A possible solution would be to block stack names that contain capital letters at the CLI level.
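A minimal sketch of such a CLI-level guard, assuming a small wrapper script that runs before `openstack overcloud deploy`. The function `validate_stack_name` is hypothetical and is not part of python-tripleoclient:

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of python-tripleoclient):
# reject stack names that contain uppercase letters before deploying.
validate_stack_name() {
    name="$1"
    # tr -cd deletes every character that is NOT uppercase; any output
    # left over means the name contains at least one capital letter.
    upper=$(printf '%s' "$name" | tr -cd '[:upper:]')
    if [ -n "$upper" ]; then
        lower=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
        echo "ERROR: stack name '$name' contains uppercase letters; try '$lower'." >&2
        return 1
    fi
    return 0
}
```

A wrapper such as overcloud_deploy.sh could call `validate_stack_name TEST-STACK34 || exit 1` before invoking the deploy command, so the failure surfaces immediately instead of at ControllerDeployment_Step2.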

Comment 2 Omri Hochman 2018-04-06 20:17:03 UTC
Your sosreport has been generated and saved in:
  /var/tmp/sosreport-undercloud75-20180224155142.tar.xz

The checksum is: fbf9f2ce310d1020e3bea43bd64203d5

Please send this file to your support representative.

Copying the results to the publicly available URL
The reports should be available here: http://rhos-release.virt.bos.redhat.com/log/bz1564654

Comment 4 Bogdan Dobrelya 2018-04-10 14:17:12 UTC
Node names in TripleO puppet are mostly cast via downcase(). This is probably a puppet configuration issue in the corosync cluster/pacemaker bundles.

Comment 5 Omri Hochman 2018-05-10 14:25:45 UTC
This BZ has the potential to cause a major upgrade to OSP13 to fail if capital letters were used in the stack name of the previous version.

Comment 6 Alex Schultz 2018-05-10 14:29:58 UTC
*** Bug 1575517 has been marked as a duplicate of this bug. ***

Comment 7 Alex Schultz 2018-05-10 14:31:43 UTC
Dropping the DF because it's not related to the framework. It's failing in the pacemaker configuration.

Comment 8 Alex Schultz 2018-05-24 12:02:51 UTC
*** Bug 1575752 has been marked as a duplicate of this bug. ***

Comment 9 Damien Ciabrini 2018-05-24 16:16:38 UTC
Quick update on the failure:

Unlike previous versions of OSP, in OSP13 the pcs command invoked to set cluster node properties is given a hostname with capital letters.

We deployed an OSP10 with Raoul earlier today, and it succeeded with capital letters in the stack name.

The generated hiera keys have capitals in their names in both OSP10 and OSP13, but
one notable difference between our OSP10 and OSP13 deployments is that the hostname's FQDN now includes capitals in OSP13:

# hostname
uppercaseovercloud-controller-0
# hostname -f
UPPERCASEOverCloud-controller-0.localdomain

whereas it was all lowercase in OSP10

Alex, is that expected? If so, would that change cause issues with upgrades?

Anyway, we'll downcase the name we pass to the pcs command and see whether that fixes the deployment or whether other issues arise.
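A shell sketch of the downcasing idea, assuming the node name arrives with capitals, as in the `TEST-STACK34-controller-1` error from comment 1. The actual fix is Puppet code in puppet-tripleo. The helper names below are hypothetical, and the sketch only prints the pcs command line instead of running it:

```shell
#!/bin/sh
# Hypothetical helper: map the name Heat/Nova uses (which may contain
# capitals) to the all-lowercase name pacemaker registered for the node.
cluster_node_name() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Hypothetical wrapper: print the pcs command that would set a node
# property using the lowercased node name. It prints the command rather
# than executing it, so it is safe to try outside a cluster.
pcs_node_property_cmd() {
    node=$(cluster_node_name "$1")
    printf '/usr/sbin/pcs property set --node %s %s\n' "$node" "$2"
}
```

For example, `pcs_node_property_cmd TEST-STACK34-controller-1 redis-role=true` prints a command that targets `test-stack34-controller-1`, which is the name shown in `pcs status`.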

Comment 10 Alex Schultz 2018-05-24 16:24:42 UTC
This is likely an issue with our pacemaker implementation with docker. In releases before OSP12, because we used puppet, $::hostname was always lowercase, since facter was substituting it. This is why we had to use downcase for bootstrap node checks in puppet-tripleo. If we're relying on the FQDN for anything, it will need to continue to be lowercased to match the previous implementations.
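A sketch of keeping an FQDN-derived name consistent with the lowercase names from earlier releases. It assumes the FQDN is read with `hostname -f`, as in comment 9. The function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: derive the lowercase short name from an FQDN such
# as "UPPERCASEOverCloud-controller-0.localdomain" (see comment 9).
short_lower_from_fqdn() {
    fqdn="$1"
    short="${fqdn%%.*}"   # drop everything from the first dot onward
    printf '%s\n' "$short" | tr '[:upper:]' '[:lower:]'
}

# On a node this could be called as:
#   short_lower_from_fqdn "$(hostname -f)"
```

With the FQDN from comment 9, this yields `uppercaseovercloud-controller-0`, which matches the `hostname` output on that node.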

Comment 11 Michele Baldessari 2018-05-24 23:52:53 UTC
OK, so there are several aspects to this:
1) The pcs property stuff which we are fixing via https://review.openstack.org/570413. I tested that part specifically and it works.

2) The other aspect is that in tripleo-common we have a bunch of bootstrap_* scripts which also break this assumption:
HOSTNAME=$(/bin/hostname -s)
SERVICE_NODEID=$(/bin/hiera -c /etc/puppet/hiera.yaml "${SERVICE_NAME}_short_bootstrap_node_name")
if [[ "$HOSTNAME" == "$SERVICE_NODEID" ]]; then
  eval $*
else
  echo "Skipping execution since this is not the bootstrap node for this service."
fi

So all bootstrap tasks will fail to run, because HOSTNAME never matches SERVICE_NODEID.

We will need another patch fixing tripleo-common as well.
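A sketch of a case-insensitive version of the check above. It follows the spirit of the tripleo-common change referenced in comment 15 (review 570484) but is not that patch. To keep it runnable without hiera, the hostname and bootstrap node name are passed as arguments. `is_bootstrap_node` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: succeed if both names refer to the same node,
# ignoring case, so "uppercaseovercloud-controller-0" (from hostname -s)
# matches "UPPERCASEOverCloud-controller-0" (from hiera).
is_bootstrap_node() {
    host_lc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    node_lc=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    [ "$host_lc" = "$node_lc" ]
}

# In a bootstrap script the arguments would come from:
#   HOSTNAME=$(/bin/hostname -s)
#   SERVICE_NODEID=$(/bin/hiera -c /etc/puppet/hiera.yaml "${SERVICE_NAME}_short_bootstrap_node_name")
#   if is_bootstrap_node "$HOSTNAME" "$SERVICE_NODEID"; then ...; fi
```

Lowercasing both sides means the check no longer depends on whether hostname or hiera preserves the capitals from the stack name.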

Comment 12 Michele Baldessari 2018-05-25 00:09:23 UTC
3) I'll also add that this has not worked since OSP12 and is not really OSP13-specific.

Comment 14 Alex Schultz 2018-05-25 14:04:16 UTC
So it's likely that the tripleo-common scripts are fine, because hostname -s will return the capital letters.

$ hostname -s
UNDERCLOUD

This primarily affects anything puppet-related, where $::hostname from facter is lowercase.

Comment 15 Damien Ciabrini 2018-05-25 15:50:09 UTC
Actually, they are not: as shown in comment #9, the host's short name is all lowercase on the overcloud controller nodes.

After applying https://review.openstack.org/#/c/570413/ , the deployment continues, but all the {service}_sync_db containers do nothing, and the subsequent service containers fail to run.

I've manually updated all the containers locally to force a lowercase comparison, as proposed in https://review.openstack.org/#/c/570484/, and the deployment finishes successfully.

So fixing this bz requires:
 . a fix in puppet-tripleo https://review.openstack.org/#/c/570413
 . a fix in tripleo-common https://review.openstack.org/#/c/570484
 . rebuilding all container images with the updated tripleo-common

Comment 23 Marius Cornea 2018-07-06 14:08:17 UTC
Can we please get the fixes for this bug into a downstream puddle?

Comment 37 Alex Schultz 2018-08-06 14:44:03 UTC
*** Bug 1610498 has been marked as a duplicate of this bug. ***

Comment 38 Joanne O'Flynn 2018-08-15 08:06:42 UTC
This bug is marked for inclusion in the errata but does not currently contain draft documentation text. To ensure the timely release of this advisory please provide draft documentation text for this bug as soon as possible.

If you do not think this bug requires errata documentation, set the requires_doc_text flag to "-".


To add draft documentation text:

* Select the documentation type from the "Doc Type" drop down field.

* A template will be provided in the "Doc Text" field based on the "Doc Type" value selected. Enter draft text in the "Doc Text" field.

Comment 40 errata-xmlrpc 2018-08-29 16:35:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2574