Bug 1564449

Summary: [UPGRADES] Update pcs' bundles with new images during major upgrade
Product: Red Hat OpenStack
Reporter: Yurii Prokulevych <yprokule>
Component: openstack-tripleo-heat-templates
Assignee: Damien Ciabrini <dciabrin>
Status: CLOSED ERRATA
QA Contact: Yurii Prokulevych <yprokule>
Severity: high
Docs Contact:
Priority: high
Version: 13.0 (Queens)
CC: aherr, astupnik, augol, chjones, jschluet, mandreou, mbultel, mburns, mcornea, michele, mkrcmari, pkomarov, rhel-osp-director-maint
Target Milestone: beta
Keywords: Triaged
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-8.0.2-3.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-06-27 13:50:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Yurii Prokulevych 2018-04-06 10:26:00 UTC
Description of problem:
-----------------------
During a major RHOS upgrade, the newer images have their own namespace.
This has to be handled for pcs* managed services to avoid an issue where the docker image from the current RHOS version has already been removed, but is still referenced in bundles.

E.g:
pcs status
    Cluster name: tripleo_cluster
    Stack: corosync
    Current DC: database-0 (version 1.1.18-11.el7-2b07d5c5a9) - partition with quorum
    Last updated: Fri Apr  6 09:28:56 2018
    Last change: Fri Apr  6 09:20:13 2018 by hacluster via crmd on controller-2
     
    20 nodes configured
    38 resources configured
     
    Online: [ controller-0 controller-1 controller-2 database-0 database-1 database-2 messaging-0 messaging-1 messaging-2 ]
    RemoteOnline: [ networker-0 networker-1 ]
    GuestOnline: [ galera-bundle-0@database-0 galera-bundle-1@database-1 galera-bundle-2@database-2 rabbitmq-bundle-0@messaging-0 rabbitmq-bundle-1@messaging-1 rabbitmq-bundle-2@messaging-2 ]
     
    Full list of resources:
     
     networker-0    (ocf::pacemaker:remote):        Started database-0
     networker-1    (ocf::pacemaker:remote):        Started database-1
     Docker container set: galera-bundle [192.168.24.1:8787/rhosp12/openstack-mariadb:pcmklatest]
       galera-bundle-0      (ocf::heartbeat:galera):        Master database-0
       galera-bundle-1      (ocf::heartbeat:galera):        Master database-1
       galera-bundle-2      (ocf::heartbeat:galera):        Master database-2
     Docker container set: redis-bundle [192.168.24.1:8787/rhosp12/openstack-redis:pcmklatest]
       redis-bundle-0       (ocf::heartbeat:redis): Stopped
       redis-bundle-1       (ocf::heartbeat:redis): Stopped
       redis-bundle-2       (ocf::heartbeat:redis): Stopped
     Docker container set: rabbitmq-bundle [192.168.24.1:8787/rhosp12/openstack-rabbitmq:pcmklatest]
       rabbitmq-bundle-0    (ocf::heartbeat:rabbitmq-cluster):      Started messaging-0
       rabbitmq-bundle-1    (ocf::heartbeat:rabbitmq-cluster):      Started messaging-1
       rabbitmq-bundle-2    (ocf::heartbeat:rabbitmq-cluster):      Started messaging-2
     ip-192.168.24.7        (ocf::heartbeat:IPaddr2):       Stopped
     ip-10.0.0.101  (ocf::heartbeat:IPaddr2):       Stopped
     ip-172.17.1.13 (ocf::heartbeat:IPaddr2):       Stopped
     ip-172.17.1.17 (ocf::heartbeat:IPaddr2):       Stopped
     ip-172.17.3.11 (ocf::heartbeat:IPaddr2):       Stopped
     ip-172.17.4.14 (ocf::heartbeat:IPaddr2):       Stopped
     Docker container set: haproxy-bundle [192.168.24.1:8787/rhosp12/openstack-haproxy:pcmklatest]
       haproxy-bundle-docker-0      (ocf::heartbeat:docker):        Stopped
       haproxy-bundle-docker-1      (ocf::heartbeat:docker):        Stopped
       haproxy-bundle-docker-2      (ocf::heartbeat:docker):        Stopped
     
    Failed Actions:
    * redis-bundle-docker-0_start_0 on controller-2 'unknown error' (1): call=227, status=complete, exitreason='failed to pull image 192.168.24.1:8787/rhosp12/openstack-redis:pcmklatest',
        last-rc-change='Fri Apr  6 09:20:20 2018', queued=0ms, exec=253ms
    * redis-bundle-docker-1_start_0 on controller-2 'unknown error' (1): call=231, status=complete, exitreason='failed to pull image 192.168.24.1:8787/rhosp12/openstack-redis:pcmklatest',
        last-rc-change='Fri Apr  6 09:20:21 2018', queued=0ms, exec=280ms
    * redis-bundle-docker-2_start_0 on controller-2 'unknown error' (1): call=220, status=complete, exitreason='failed to pull image 192.168.24.1:8787/rhosp12/openstack-redis:pcmklatest',
        last-rc-change='Fri Apr  6 09:20:15 2018', queued=0ms, exec=226ms
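
To spot the affected bundles quickly, the `pcs status` output above can be scanned for bundle header lines. The following is only a minimal sketch: it assumes the header format shown in this report ("Docker container set: <bundle> [<image>]"), which may differ in other pacemaker versions, and the function names are hypothetical helpers, not part of pcs or TripleO.

```python
import re

# Matches bundle header lines as printed by `pcs status` in this report, e.g.
#   Docker container set: redis-bundle [192.168.24.1:8787/rhosp12/openstack-redis:pcmklatest]
# Assumption: other pacemaker versions may word this header differently.
BUNDLE_RE = re.compile(r"Docker container set:\s+(\S+)\s+\[([^\]]+)\]")


def bundle_images(pcs_status_text):
    """Return {bundle_name: image_reference} found in `pcs status` output."""
    return dict(BUNDLE_RE.findall(pcs_status_text))


def stale_bundles(pcs_status_text, old_namespace):
    """Return the bundles whose image path contains /<old_namespace>/."""
    marker = "/" + old_namespace + "/"
    return {
        name: image
        for name, image in bundle_images(pcs_status_text).items()
        if marker in image
    }


# Two header lines copied from the output above, with one resource line in between.
SAMPLE = """
 Docker container set: galera-bundle [192.168.24.1:8787/rhosp12/openstack-mariadb:pcmklatest]
   galera-bundle-0      (ocf::heartbeat:galera):        Master database-0
 Docker container set: redis-bundle [192.168.24.1:8787/rhosp12/openstack-redis:pcmklatest]
"""

for name, image in sorted(stale_bundles(SAMPLE, "rhosp12").items()):
    print(name, "still references", image)
```

On a live cluster the text would come from running `pcs status` on a cluster node; here it is fed from a string so the sketch runs anywhere.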



Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-tripleo-heat-templates-8.0.2-0.20180327213843.f25e2d8.el7ost.noarch

How reproducible:
-----------------
100%

Steps to Reproduce:
-------------------
1. Upgrade the undercloud (UC) to RHOS-13
2. Set up the latest repos on the overcloud (OC)
3. Prepare docker images from RHOS-13
4. Run `openstack overcloud upgrade prepare ...` to generate upgrade playbooks
5. Start upgrade of nodes hosting pcs* managed services

Actual results:
---------------
The upgrade process hangs until the bundles are updated with the correct image, e.g.:
  pcs resource bundle update redis-bundle container image=<path/to/new/image>
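
The manual workaround above has to be repeated for every bundle. A rough sketch of generating those commands is shown below. It only prints the commands (a dry run) and does not invoke pcs; the bundle-to-image mapping, including the `rhosp13` namespace, is an assumption used for illustration and must be replaced with the real image paths of the deployment.

```python
import shlex


def bundle_update_commands(new_images):
    """Build one `pcs resource bundle update` command per bundle.

    new_images maps a bundle name to the image reference it should use.
    The command form is the one quoted in this report.
    """
    commands = []
    for bundle, image in sorted(new_images.items()):
        commands.append(
            ["pcs", "resource", "bundle", "update", bundle,
             "container", "image=" + image]
        )
    return commands


# Hypothetical mapping: the rhosp13 namespace is assumed, not taken from the report.
NEW_IMAGES = {
    "redis-bundle": "192.168.24.1:8787/rhosp13/openstack-redis:pcmklatest",
    "haproxy-bundle": "192.168.24.1:8787/rhosp13/openstack-haproxy:pcmklatest",
}

# Dry run: print the commands so they can be reviewed before being run by hand.
for cmd in bundle_update_commands(NEW_IMAGES):
    print(" ".join(shlex.quote(part) for part in cmd))
```

Printing rather than executing keeps the sketch safe to run outside a cluster; the reviewed commands can then be run on a cluster node.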

Comment 1 Damien Ciabrini 2018-04-11 15:24:26 UTC
Review 560426 should fix the bug in upstream Queens. It should be applied on top of 560322, which is not yet merged in Queens but is already merged in master.

Comment 8 Jon Schlueter 2018-04-25 12:47:44 UTC
in build, sorry for the noise.

Comment 14 errata-xmlrpc 2018-06-27 13:50:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086