Bug 1600449

Summary: OSP13 deployment failed at step5
Product: Red Hat OpenStack Reporter: Chen <cchen>
Component: puppet-tripleo    Assignee: Michele Baldessari <michele>
Status: CLOSED ERRATA QA Contact: Tzach Shefi <tshefi>
Severity: high Docs Contact:
Priority: high    
Version: 13.0 (Queens)    CC: abeekhof, abishop, amcleod, aschultz, batkisso, cchen, cschwede, dbecker, gfidente, jjoyce, jschluet, k-akatsuka, lmarsh, lmiccini, mburns, michele, morazi, pgrist, pkomarov, rhel-osp-bz, slinaber, tshefi, tvignaud, wlehman
Target Milestone: z5    Keywords: TestOnly, Triaged, ZStream
Target Release: 13.0 (Queens)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: puppet-tripleo-8.3.4-8.el7ost Doc Type: Bug Fix
Doc Text:
Previously, Overcloud deployments that included a BlockStorage role could fail because a pacemaker property was set on nodes belonging to the BlockStorage role, even though those nodes are not managed by pacemaker. With this update, the pacemaker-managed cinder-volume resource starts only on nodes that pacemaker manages. As a result, Overcloud deployments with a BlockStorage role succeed.
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-03-14 13:54:50 UTC Type: Bug
Bug Blocks: 1581780    

Description Chen 2018-07-12 09:41:42 UTC
Description of problem:

OSP13 deployment failed at step5
[Thu Jul 12 17:02:41.161 2018] 2018-07-12 08:02:39Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED  Error: resources.AllNodesDeploySteps.resources.ControllerDeployment_Step5.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
[Thu Jul 12 17:02:41.161 2018] 2018-07-12 08:02:39Z [overcloud]: CREATE_FAILED  Resource CREATE failed: Error: resources.AllNodesDeploySteps.resources.ControllerDeployment_Step5.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
[Thu Jul 12 17:02:41.161 2018] 
[Thu Jul 12 17:02:41.161 2018]  Stack overcloud CREATE_FAILED 
[Thu Jul 12 17:02:41.164 2018] 
[Thu Jul 12 17:02:45.342 2018] overcloud.AllNodesDeploySteps.ControllerDeployment_Step5.0:
[Thu Jul 12 17:02:45.342 2018]   resource_type: OS::Heat::StructuredDeployment
[Thu Jul 12 17:02:45.342 2018]   physical_resource_id: 95e83d30-b57b-45c7-aed6-54a20104420f
[Thu Jul 12 17:02:45.342 2018]   status: CREATE_FAILED
[Thu Jul 12 17:02:45.342 2018]   status_reason: |
[Thu Jul 12 17:02:45.342 2018]     Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
[Thu Jul 12 17:02:45.342 2018]   deploy_stdout: |
[Thu Jul 12 17:02:45.342 2018]     ...
[Thu Jul 12 17:02:45.342 2018]             "stderr: /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py:332: NotSupportedWarning: Configuration option(s) ['use_tpool'] not supported", 
[Thu Jul 12 17:02:45.342 2018]             "  exception.NotSupportedWarning", 
[Thu Jul 12 17:02:45.342 2018]             "stdout: "
[Thu Jul 12 17:02:45.342 2018]         ]
[Thu Jul 12 17:02:45.342 2018]     }
[Thu Jul 12 17:02:45.342 2018]     	to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/de0114ad-3d64-4461-88f9-2ff392315bb4_playbook.retry
[Thu Jul 12 17:02:45.342 2018]     
[Thu Jul 12 17:02:45.342 2018]     PLAY RECAP *********************************************************************
[Thu Jul 12 17:02:45.342 2018]     localhost                  : ok=6    changed=2    unreachable=0    failed=1   

Version-Release number of selected component (if applicable):

OSP13 GA

How reproducible:

100% at the customer's site


Comment 2 Cédric Jeanneret 2018-07-12 12:39:40 UTC
Hello Chen,

We apparently don't have the deploy log from the director, located in the /home/stack/log directory. Could you ask for it and add it to the case? That would help us discover where the issue comes from.

Thank you!

Bests,

Cédric

Comment 26 Alan Bishop 2018-09-17 18:29:09 UTC
*** Bug 1591251 has been marked as a duplicate of this bug. ***

Comment 36 Tzach Shefi 2019-01-17 12:33:43 UTC
Hey Michele, 

Could you help me with the configuration procedure/YAMLs?
I've never done this before and am trying to figure it out now from the OSP director guide.

Comment 37 Alan Bishop 2019-01-17 12:45:23 UTC
Tzach, basically what you need to do is create an overcloud with an extra "block storage" node. This results in a cinder-volume service running under pacemaker on the controller(s), plus another cinder-volume service running on the block-storage node that is *not* under pacemaker. If it deploys successfully, you've verified the BZ. Previously (the bug), the deployment would fail because pacemaker thought it should add the c-vol on the block-storage node to the pacemaker cluster, which would fail (because it's not supposed to do that).

You can ping me on irc for more deployment details.
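
A minimal placement check after such a deploy, as a sketch (assuming root access to a controller and to the block-storage node; prompts are illustrative): pacemaker should list the cinder-volume bundle only on controller nodes, while the block-storage node should only run a plain cinder_volume container.

[root@controller-0 ~]# pcs status | grep -i cinder
[root@overcloud-blockstorage-0 ~]# docker ps --format '{{.Names}}' | grep cinder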

Comment 38 Tzach Shefi 2019-02-05 12:39:58 UTC
Verified on:
puppet-tripleo-8.3.6-7.el7ost.noarch

The steps used, documented for future reference:

1. Using Infrared (an internal deployment system), deploy with 5 controllers and 2 computes, with a break point before the overcloud deploy.
I had set 5 + 2 and later "reused" one of the controllers as the block storage node; the other controller was left unused.
The overcloud I ended up with contained: 3 controllers + 2 computes + 1 block storage + 1 unused free VM/node.


2. Create a blockstorage flavor. It doesn't need these exact CPU/RAM/disk values; they are just what I chose because I had such a "free" VM (virt deployment).

$ openstack flavor create --id auto --ram 26384 --disk 27 --vcpus 7 blockstorage
$ openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" --property "capabilities:profile"="blockstorage" blockstorage
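
As an optional sanity check (just a sketch, not required for the verification), the flavor properties can be confirmed with:
$ openstack flavor show blockstorage -c properties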

3. Tag the free node with the blockstorage profile:
$ openstack baremetal node set --property capabilities='profile:blockstorage,boot_option:local' d0d9800f-3e2a-49e7-9b3b-cd3fde9340c0
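
Optionally confirm the capability was applied (a sketch, reusing the node UUID from above):
$ openstack baremetal node show d0d9800f-3e2a-49e7-9b3b-cd3fde9340c0 -f value -c properties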

4. Edit virt/nodes_data.yaml:
parameter_defaults:
    ControllerCount: 3
    OvercloudControlFlavor: controller
    ComputeCount: 1
    OvercloudComputeFlavor: compute
    BlockStorageCount: 1
    OvercloudBlockStorageFlavor: blockstorage

I only added the last two lines (BlockStorageCount and OvercloudBlockStorageFlavor).
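
For reference, a sketch of how this extra environment file is passed to the deploy command; the template set and the other -e files are deployment-specific and only assumed here:
$ openstack overcloud deploy --templates \
    -e virt/nodes_data.yaml \
    <other environment files from the original deployment>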


5. Deploy the overcloud; the resulting nodes:

(undercloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+--------------------------+--------+------------+-------------+------------------------+
| ID                                   | Name                     | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------------------+--------+------------+-------------+------------------------+
| b72561c2-e95f-45c8-a358-e7e813adf26a | compute-0                | ACTIVE | -          | Running     | ctlplane=192.168.24.20 |
| d291a285-d9a7-4df5-8bf8-6c1d5331f4d3 | controller-0             | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
| f87b5d46-4640-4307-93e3-4505cf1a8084 | controller-1             | ACTIVE | -          | Running     | ctlplane=192.168.24.19 |
| 99d7784a-6c10-4fe1-892b-fcb84e46d6c6 | controller-2             | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| aff2c9aa-7b98-42be-b35e-2601fda65574 | overcloud-blockstorage-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.9  |
+--------------------------------------+--------------------------+--------+------------+-------------+------------------------+

Now let's check controller-0; we have one c-vol (openstack-cinder-volume:pcmklatest) managed by pacemaker:
[root@controller-0 ~]# docker ps | grep cinder
e9827c029f06        192.168.24.1:8787/rhosp13/openstack-cinder-volume:pcmklatest                            "/bin/bash /usr/lo..."   14 minutes ago      Up 14 minutes                                 openstack-cinder-volume-docker-0
00d93e10ee86        192.168.24.1:8787/rhosp13/openstack-cinder-api:2019-02-01.1-cve-grades                  "kolla_start"            16 minutes ago      Up 16 minutes                                 cinder_api_cron
8c7ce868a5fc        192.168.24.1:8787/rhosp13/openstack-cinder-scheduler:2019-02-01.1-cve-grades            "kolla_start"            17 minutes ago      Up 17 minutes (healthy)                       cinder_scheduler
0f92fac565da        192.168.24.1:8787/rhosp13/openstack-cinder-api:2019-02-01.1-cve-grades                  "kolla_start"            17 minutes ago      Up 17 minutes (healthy)                       cinder_api


Now let's check overcloud-blockstorage-0, where we spot a second c-vol (cinder_volume) running outside pacemaker:
[root@overcloud-blockstorage-0 ~]# docker ps
CONTAINER ID        IMAGE                                                                       COMMAND             CREATED             STATUS                    PORTS               NAMES
8947e6ff8e2a        192.168.24.1:8787/rhosp13/openstack-cron:2019-02-01.1-cve-grades            "kolla_start"       19 minutes ago      Up 19 minutes                                 logrotate_crond
a382c9621666        192.168.24.1:8787/rhosp13/openstack-cinder-volume:2019-02-01.1-cve-grades   "kolla_start"       19 minutes ago      Up 19 minutes (healthy)                       cinder_volume
4f72d409766a        192.168.24.1:8787/rhosp13/openstack-iscsid:2019-02-01.1-cve-grades          "kolla_start"       24 minutes ago      Up 24 minutes (healthy)                       iscsid


Looking good: we have two c-vol services running, one under pacemaker on the controllers and the other on our added BlockStorage node. Verified.
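
An additional cross-check, as a sketch (assuming the usual ~/overcloudrc on the undercloud): the overcloud itself should report two cinder-volume services, one hosted on the controllers and one on the block-storage node.
$ source ~/overcloudrc
$ openstack volume service list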
 
I used this guide:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/advanced_overcloud_customization/roles#sect-Creating_a_Generic_Node_with_No_Services
Although it suggests creating a roles YAML and passing it during the overcloud deploy, which I didn't do, things still worked.
I'm guessing that's because I used a built-in role (BlockStorage) instead of a custom role.
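
For completeness, if an explicit roles file were wanted, a typical (assumed) way to generate one that includes the built-in BlockStorage role would be:
$ openstack overcloud roles generate -o ~/roles_data.yaml Controller Compute BlockStorage
and then pass it to the deploy command with -r ~/roles_data.yaml.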

Comment 40 errata-xmlrpc 2019-03-14 13:54:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0448