Bug 1600449 - OSP13 deployment failed at step5
Summary: OSP13 deployment failed at step5
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-tripleo
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z5
Target Release: 13.0 (Queens)
Assignee: Michele Baldessari
QA Contact: Tzach Shefi
URL:
Whiteboard:
Duplicates: 1591251 (view as bug list)
Depends On:
Blocks: 1581780
 
Reported: 2018-07-12 09:41 UTC by Chen
Modified: 2022-03-13 15:13 UTC
CC List: 24 users

Fixed In Version: puppet-tripleo-8.3.4-8.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, deployments could fail when deploying the Overcloud with a BlockStorage role and setting a pacemaker property on nodes that belong to the BlockStorage role. With this update, the pacemaker-managed cinder-volume resource starts only on nodes that pacemaker manages. As a result, Overcloud deployments with a BlockStorage role succeed.
Clone Of:
Environment:
Last Closed: 2019-03-14 13:54:50 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1786412 0 None None None 2018-08-10 05:49:47 UTC
OpenStack gerrit 593373 0 None MERGED Force cinder properties to be set only on nodes with pcmk on it 2020-11-25 15:34:39 UTC
Red Hat Issue Tracker OSP-9301 0 None None None 2021-12-10 16:48:29 UTC
Red Hat Product Errata RHBA-2019:0448 0 None None None 2019-03-14 13:54:58 UTC

Description Chen 2018-07-12 09:41:42 UTC
Description of problem:

OSP13 deployment failed at step5
[Thu Jul 12 17:02:41.161 2018] 2018-07-12 08:02:39Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED  Error: resources.AllNodesDeploySteps.resources.ControllerDeployment_Step5.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
[Thu Jul 12 17:02:41.161 2018] 2018-07-12 08:02:39Z [overcloud]: CREATE_FAILED  Resource CREATE failed: Error: resources.AllNodesDeploySteps.resources.ControllerDeployment_Step5.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
[Thu Jul 12 17:02:41.161 2018] 
[Thu Jul 12 17:02:41.161 2018]  Stack overcloud CREATE_FAILED 
[Thu Jul 12 17:02:41.164 2018] 
[Thu Jul 12 17:02:45.342 2018] overcloud.AllNodesDeploySteps.ControllerDeployment_Step5.0:
[Thu Jul 12 17:02:45.342 2018]   resource_type: OS::Heat::StructuredDeployment
[Thu Jul 12 17:02:45.342 2018]   physical_resource_id: 95e83d30-b57b-45c7-aed6-54a20104420f
[Thu Jul 12 17:02:45.342 2018]   status: CREATE_FAILED
[Thu Jul 12 17:02:45.342 2018]   status_reason: |
[Thu Jul 12 17:02:45.342 2018]     Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
[Thu Jul 12 17:02:45.342 2018]   deploy_stdout: |
[Thu Jul 12 17:02:45.342 2018]     ...
[Thu Jul 12 17:02:45.342 2018]             "stderr: /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py:332: NotSupportedWarning: Configuration option(s) ['use_tpool'] not supported", 
[Thu Jul 12 17:02:45.342 2018]             "  exception.NotSupportedWarning", 
[Thu Jul 12 17:02:45.342 2018]             "stdout: "
[Thu Jul 12 17:02:45.342 2018]         ]
[Thu Jul 12 17:02:45.342 2018]     }
[Thu Jul 12 17:02:45.342 2018]     	to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/de0114ad-3d64-4461-88f9-2ff392315bb4_playbook.retry
[Thu Jul 12 17:02:45.342 2018]     
[Thu Jul 12 17:02:45.342 2018]     PLAY RECAP *********************************************************************
[Thu Jul 12 17:02:45.342 2018]     localhost                  : ok=6    changed=2    unreachable=0    failed=1   
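The deploy_stdout above is truncated; as a hedged aside, the full failure text for a failed overcloud stack can usually be pulled on the undercloud with something like the following (assuming the usual stackrc location):

$ source ~/stackrc
$ openstack stack failures list overcloud --long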

Version-Release number of selected component (if applicable):

OSP13 GA

How reproducible:

100% at the customer's site

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Cédric Jeanneret 2018-07-12 12:39:40 UTC
Hello Chen,

We apparently don't have the deploy log, located on the Director in the /home/stack/log directory - could you ask for it and add it to the case? That would help us discover where the issue comes from.
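For reference, a hedged sketch of collecting those logs on the Director (path as mentioned above; the archive name is arbitrary):

$ ls -l /home/stack/log
$ tar czf ~/director-deploy-logs.tar.gz /home/stack/log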

Thank you!

Bests,

Cédric

Comment 26 Alan Bishop 2018-09-17 18:29:09 UTC
*** Bug 1591251 has been marked as a duplicate of this bug. ***

Comment 36 Tzach Shefi 2019-01-17 12:33:43 UTC
Hey Michele, 

Could you help me with the configuration procedure/yamls?
I've never done this before; I'm trying to figure it out now from the OSPd guide.

Comment 37 Alan Bishop 2019-01-17 12:45:23 UTC
Tzach, basically what you need to do is create an overcloud with an extra "block storage" node. This will result in a cinder-volume service running under pacemaker on the controller(s), plus another cinder-volume service running on the block-storage node that is *not* under pacemaker. If you get it to successfully deploy, then you've verified the BZ. Previously (the bug), the deployment would fail because pacemaker thought it should add the c-vol on the block-storage node to the pacemaker cluster, which would fail (because it's not supposed to do that).

You can ping me on irc for more deployment details.
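A hedged sketch of checking that split after deployment, run from any controller (resource/container names as they appear later in this bug): pacemaker should list only the controllers as cluster nodes, and the managed cinder-volume resource should run on one of them, while the block-storage node's c-vol runs outside the cluster.

[root@controller-0 ~]# pcs status nodes              # only controller-* nodes should be listed
[root@controller-0 ~]# pcs status | grep -i cinder   # the pacemaker-managed cinder-volume sits on a controller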

Comment 38 Tzach Shefi 2019-02-05 12:39:58 UTC
Verified on:
puppet-tripleo-8.3.6-7.el7ost.noarch

The steps used, documented for future reference:

1. On Infrared (internal deployment system), deploy with 5 controllers and 2 computes, with a break point before the overcloud deploy.
I had set 5 + 2 and later "reused" one of the controllers as the block storage node; another controller was left unused.
The overcloud I ended up with contained: 3 controllers + 2 computes + 1 block storage + 1 unused free VM/node.


2. Create a blockstorage flavor. It doesn't need these exact CPU/RAM/disk values; they are just what I chose because I had such a "free" VM (virt deployment).

$ openstack flavor create --id auto --ram 26384 --disk 27 --vcpus 7 blockstorage
$ openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" --property "capabilities:profile"="blockstorage" blockstorage

3. Tag the free node with the blockstorage profile:
openstack baremetal node set --property capabilities='profile:blockstorage,boot_option:local' d0d9800f-3e2a-49e7-9b3b-cd3fde9340c0
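A hedged way to confirm the tag took effect (the UUID is the one used above):

$ openstack overcloud profiles list
$ openstack baremetal node show d0d9800f-3e2a-49e7-9b3b-cd3fde9340c0 -f value -c properties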

4. 
vi virt/nodes_data.yaml 
parameter_defaults:
    ControllerCount: 3
    OvercloudControlFlavor: controller
    ComputeCount: 1
    OvercloudComputeFlavor: compute
    BlockStorageCount: 1
    OvercloudBlockStorageFlavor: blockstorage

I just added the last two lines (BlockStorageCount and OvercloudBlockStorageFlavor).


5. Deploy the overcloud.
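A hedged sketch of the deploy invocation, assuming the default templates and the virt/nodes_data.yaml from step 4 (the remaining -e environment files are whatever the original deployment already used):

$ openstack overcloud deploy --templates \
      -e virt/nodes_data.yaml
  # ...plus the other -e environment files from the original deployment

The resulting node list: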

(undercloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+--------------------------+--------+------------+-------------+------------------------+
| ID                                   | Name                     | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------------------+--------+------------+-------------+------------------------+
| b72561c2-e95f-45c8-a358-e7e813adf26a | compute-0                | ACTIVE | -          | Running     | ctlplane=192.168.24.20 |
| d291a285-d9a7-4df5-8bf8-6c1d5331f4d3 | controller-0             | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
| f87b5d46-4640-4307-93e3-4505cf1a8084 | controller-1             | ACTIVE | -          | Running     | ctlplane=192.168.24.19 |
| 99d7784a-6c10-4fe1-892b-fcb84e46d6c6 | controller-2             | ACTIVE | -          | Running     | ctlplane=192.168.24.6  |
| aff2c9aa-7b98-42be-b35e-2601fda65574 | overcloud-blockstorage-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.9  |
+--------------------------------------+--------------------------+--------+------------+-------------+------------------------+

Now let's check controller-0; we have one c-vol (openstack-cinder-volume:pcmklatest):
[root@controller-0 ~]# docker ps | grep cinder
e9827c029f06        192.168.24.1:8787/rhosp13/openstack-cinder-volume:pcmklatest                            "/bin/bash /usr/lo..."   14 minutes ago      Up 14 minutes                                 openstack-cinder-volume-docker-0
00d93e10ee86        192.168.24.1:8787/rhosp13/openstack-cinder-api:2019-02-01.1-cve-grades                  "kolla_start"            16 minutes ago      Up 16 minutes                                 cinder_api_cron
8c7ce868a5fc        192.168.24.1:8787/rhosp13/openstack-cinder-scheduler:2019-02-01.1-cve-grades            "kolla_start"            17 minutes ago      Up 17 minutes (healthy)                       cinder_scheduler
0f92fac565da        192.168.24.1:8787/rhosp13/openstack-cinder-api:2019-02-01.1-cve-grades                  "kolla_start"            17 minutes ago      Up 17 minutes (healthy)                       cinder_api


Now let's check overcloud-blockstorage-0, where we see a second c-vol (openstack-cinder-volume):
[root@overcloud-blockstorage-0 ~]# docker ps
CONTAINER ID        IMAGE                                                                       COMMAND             CREATED             STATUS                    PORTS               NAMES
8947e6ff8e2a        192.168.24.1:8787/rhosp13/openstack-cron:2019-02-01.1-cve-grades            "kolla_start"       19 minutes ago      Up 19 minutes                                 logrotate_crond
a382c9621666        192.168.24.1:8787/rhosp13/openstack-cinder-volume:2019-02-01.1-cve-grades   "kolla_start"       19 minutes ago      Up 19 minutes (healthy)                       cinder_volume
4f72d409766a        192.168.24.1:8787/rhosp13/openstack-iscsid:2019-02-01.1-cve-grades          "kolla_start"       24 minutes ago      Up 24 minutes (healthy)                       iscsid
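As a hedged extra check (not part of the original verification) that this second c-vol is outside pacemaker:

[root@overcloud-blockstorage-0 ~]# systemctl is-active pacemaker    # expected: inactive/unknown, no cluster on this node
[root@overcloud-blockstorage-0 ~]# docker ps --format '{{.Names}}' | grep cinder_volume    # plain kolla container, no :pcmklatest tag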


Looking good: we have two c-vols running, one under pacemaker on the controllers and the other on our added BlockStorage node. Verified.
 
I used this guide:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/advanced_overcloud_customization/roles#sect-Creating_a_Generic_Node_with_No_Services
Although it suggests creating a roles yaml and passing it during the overcloud deploy, which I didn't do, things still worked.
I'm guessing it worked because I used a built-in role (BlockStorage) instead of a custom role.
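For reference, a hedged sketch of what the guide's roles-file approach would have looked like (the output path and -r usage are illustrative):

$ openstack overcloud roles generate -o ~/roles_data.yaml Controller Compute BlockStorage
  # then pass it to the deploy command with: -r ~/roles_data.yaml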

Comment 40 errata-xmlrpc 2019-03-14 13:54:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0448

