Bug 1578158

Summary: FFU: openstack-cinder-backup pacemaker volume is not deleted during controllers upgrade
Product: Red Hat OpenStack
Reporter: Marius Cornea <mcornea>
Component: openstack-tripleo-heat-templates
Assignee: Emilien Macchi <emacchi>
Status: CLOSED ERRATA
QA Contact: Marius Cornea <mcornea>
Severity: urgent
Docs Contact:
Priority: high
Version: 13.0 (Queens)
CC: dbecker, jschluet, mandreou, mbracho, mbultel, mburns, morazi
Target Milestone: rc
Keywords: Triaged
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-8.0.2-23.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-06-27 13:56:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Marius Cornea 2018-05-14 23:17:41 UTC
Description of problem:

FFU: the openstack-cinder-backup pacemaker resource is not deleted during the controllers upgrade and, as a result, the cinder-backup service fails to start in containerized mode. Output of pcs status after the controller upgrade:

[root@controller-0 heat-admin]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-1 (version 1.1.18-11.el7_5.2-2b07d5c5a9) - partition with quorum
Last updated: Mon May 14 23:12:51 2018
Last change: Mon May 14 22:55:09 2018 by root via cibadmin on controller-0

12 nodes configured
38 resources configured

Online: [ controller-0 controller-1 controller-2 ]
GuestOnline: [ galera-bundle-0@controller-1 galera-bundle-1@controller-2 galera-bundle-2@controller-0 rabbitmq-bundle-0@controller-1 rabbitmq-bundle-1@controller-2 rabbitmq-bundle-2@controller-0 redis-bundle-0@controller-1 redis-bundle-1@controller-2 redis-bundle-2@controller-0 ]

Full list of resources:

 ip-172.17.4.18	(ocf::heartbeat:IPaddr2):	Started controller-0
 ip-172.17.3.17	(ocf::heartbeat:IPaddr2):	Started controller-1
 ip-192.168.24.6	(ocf::heartbeat:IPaddr2):	Started controller-2
 ip-172.17.1.13	(ocf::heartbeat:IPaddr2):	Started controller-0
 ip-10.0.0.101	(ocf::heartbeat:IPaddr2):	Started controller-1
 ip-172.17.1.12	(ocf::heartbeat:IPaddr2):	Started controller-2
 openstack-cinder-backup	(systemd:openstack-cinder-backup):	Stopped
 Docker container set: rabbitmq-bundle [rhos-qe-mirror-rdu2.usersys.redhat.com:5000/rhosp13/openstack-rabbitmq:pcmklatest]
   rabbitmq-bundle-0	(ocf::heartbeat:rabbitmq-cluster):	Started controller-1
   rabbitmq-bundle-1	(ocf::heartbeat:rabbitmq-cluster):	Started controller-2
   rabbitmq-bundle-2	(ocf::heartbeat:rabbitmq-cluster):	Started controller-0
 Docker container set: galera-bundle [rhos-qe-mirror-rdu2.usersys.redhat.com:5000/rhosp13/openstack-mariadb:pcmklatest]
   galera-bundle-0	(ocf::heartbeat:galera):	Master controller-1
   galera-bundle-1	(ocf::heartbeat:galera):	Master controller-2
   galera-bundle-2	(ocf::heartbeat:galera):	Master controller-0
 Docker container set: redis-bundle [rhos-qe-mirror-rdu2.usersys.redhat.com:5000/rhosp13/openstack-redis:pcmklatest]
   redis-bundle-0	(ocf::heartbeat:redis):	Master controller-1
   redis-bundle-1	(ocf::heartbeat:redis):	Slave controller-2
   redis-bundle-2	(ocf::heartbeat:redis):	Slave controller-0
 Docker container set: haproxy-bundle [rhos-qe-mirror-rdu2.usersys.redhat.com:5000/rhosp13/openstack-haproxy:pcmklatest]
   haproxy-bundle-docker-0	(ocf::heartbeat:docker):	Started controller-1
   haproxy-bundle-docker-1	(ocf::heartbeat:docker):	Started controller-2
   haproxy-bundle-docker-2	(ocf::heartbeat:docker):	Started controller-0
 Docker container: openstack-cinder-volume [rhos-qe-mirror-rdu2.usersys.redhat.com:5000/rhosp13/openstack-cinder-volume:pcmklatest]
   openstack-cinder-volume-docker-0	(ocf::heartbeat:docker):	Started controller-0

Failed Actions:
* openstack-cinder-backup_start_0 on controller-0 'not running' (7): call=115, status=complete, exitreason='',
    last-rc-change='Mon May 14 22:45:42 2018', queued=0ms, exec=2077ms
* openstack-cinder-backup_start_0 on controller-2 'not running' (7): call=111, status=complete, exitreason='',
    last-rc-change='Mon May 14 22:45:51 2018', queued=0ms, exec=2098ms
* openstack-cinder-backup_start_0 on controller-1 'not running' (7): call=111, status=complete, exitreason='',
    last-rc-change='Mon May 14 22:45:46 2018', queued=0ms, exec=2110ms

  

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-8.0.2-17.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP10 with the cinder-backup service enabled (-e /usr/share/openstack-tripleo-heat-templates/environments/cinder-backup.yaml)
2. Run the fast forward upgrade procedure
3. After running openstack overcloud upgrade run --roles Controller, check the output of pcs status and docker ps

Actual results:
The openstack-cinder-backup pacemaker resource is still present and no containers are running the cinder-backup service.

Expected results:
openstack-cinder-backup pacemaker resource is deleted and the cinder backup service runs inside containers.
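
For checking the actual result against the expected one, a minimal sketch that scans saved pcs status output for resources still managed through systemd. The function name and the regular expression are illustrative assumptions, not part of any TripleO tooling; the sample lines are copied from the output above.

```python
import re

# Matches resource lines such as:
#  openstack-cinder-backup	(systemd:openstack-cinder-backup):	Stopped
# Captures the resource name, the systemd unit, and the reported state.
SYSTEMD_RESOURCE = re.compile(r"^\s*(\S+)\s+\(systemd:(\S+?)\):\s+(\S+)")


def leftover_systemd_resources(pcs_status_output):
    """Return (resource, unit, state) tuples for systemd-managed resources."""
    found = []
    for line in pcs_status_output.splitlines():
        match = SYSTEMD_RESOURCE.match(line)
        if match:
            found.append(match.groups())
    return found


# Sample lines taken from the pcs status output in this report.
sample = (
    " ip-172.17.1.12\t(ocf::heartbeat:IPaddr2):\tStarted controller-2\n"
    " openstack-cinder-backup\t(systemd:openstack-cinder-backup):\tStopped\n"
    "   redis-bundle-0\t(ocf::heartbeat:redis):\tMaster controller-1\n"
)

print(leftover_systemd_resources(sample))
```

An empty list would correspond to the expected result; any entry points at a resource that was not removed when the service moved into a container.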

Additional info:

Looks like we're missing the fast forward upgrade tasks in https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/pacemaker/cinder-backup.yaml

Comment 1 Marios Andreou 2018-05-15 06:00:39 UTC
Taking this and marking it triaged. I will take a closer look and post something later. Thanks, mcornea.

Comment 2 Marios Andreou 2018-05-15 08:34:48 UTC
Indeed, @mcornea, as you pointed out, the FFU tasks seem to have been forgotten. Added https://review.openstack.org/568520 to the trackers. It is a copy/paste from the other docker/services/pacemaker/ templates, i.e. check and disable the cluster resource on the bootstrap node.
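
The check-and-disable logic described above can be sketched roughly as follows. This is only an illustration of the idea, not the content of the linked review: it calls the pcs CLI directly, and the function names, the is_bootstrap_node flag, and the returned strings are all hypothetical. It also assumes that pcs resource show exits non-zero when the resource does not exist.

```python
import subprocess

RESOURCE = "openstack-cinder-backup"


def run_pcs(args):
    """Run a pcs subcommand and return its exit code (output is discarded)."""
    completed = subprocess.run(
        ["pcs"] + args,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode


def disable_stale_resource(is_bootstrap_node, run=run_pcs):
    """Disable RESOURCE if it exists; only the bootstrap node acts."""
    if not is_bootstrap_node:
        return "skipped"
    # Assumption: `pcs resource show <id>` fails when the resource is absent.
    if run(["resource", "show", RESOURCE]) != 0:
        return "absent"
    rc = run(["resource", "disable", RESOURCE])
    return "disabled" if rc == 0 else "failed"


# Dry run with a stand-in runner, so no cluster is needed to follow the flow.
recorded = []


def fake_runner(args):
    recorded.append(args)
    return 0


print(disable_stale_resource(True, run=fake_runner))
print(recorded)
```

Restricting the action to the bootstrap node means the cluster-wide change is issued once rather than from every controller.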

Comment 12 errata-xmlrpc 2018-06-27 13:56:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086