Description of problem:
The update from build 2018-05-07.2 failed during the controller update, in the task [Retag pcmklatest to latest Cinder-Backup image].

Error message:
"Error response from daemon: no such id: 192.168.24.1:8787/rhosp13/openstack-cinder-backup:2018-05-07.2"

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Install osp13 build 2018-05-07.2
2. Update the undercloud
3. Update the overcloud

Actual results:

Expected results:

Additional info:
See attached logs.
Automatic job on stage server: http://staging-jenkins2-qe-playground.usersys.redhat.com/view/DFG/view/upgrades/view/update/job/DFG-upgrades-updates-13-from-2018-05-07.2-HA-ipv4/1/console
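For context, "no such id" is what older Docker daemons typically return when `docker tag` is pointed at a source image that is not in local storage. The sketch below is only an illustration of guarding a retag step against that case by pulling the image first. It is not the change tracked for this bug. The function name `retag_pcmklatest` and the injectable `run` parameter are hypothetical, and the code assumes the image reference always carries an explicit tag.

```python
import subprocess


def retag_pcmklatest(image, run=subprocess.run):
    """Tag `image` as <repo>:pcmklatest, pulling it first if it is missing locally.

    Assumes `image` ends in an explicit ":<tag>". `run` is injectable so the
    logic can be exercised without a Docker daemon.
    """
    # Split on the last colon so a registry port (e.g. :8787) stays in the repo part.
    repo, _, _old_tag = image.rpartition(":")
    target = repo + ":pcmklatest"

    # A non-zero exit status means the image is not in local storage.
    present = run(
        ["docker", "inspect", "--type=image", image],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    ).returncode == 0
    if not present:
        run(["docker", "pull", image], check=True)

    run(["docker", "tag", image, target], check=True)
    return target
```

Called with the image from the error message, this would tag it as `192.168.24.1:8787/rhosp13/openstack-cinder-backup:pcmklatest`, pulling it first if it is not yet present on the node.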
Created attachment 1434337 [details] controller sosreport part a
As a result of the error, controller-2 is offline:

[heat-admin@controller-0 ~]$ sudo pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-1 (version 1.1.18-11.el7_5.2-2b07d5c5a9) - partition with quorum
Last updated: Thu May 10 11:59:37 2018
Last change: Wed May 9 16:26:09 2018 by root via cibadmin on controller-0

12 nodes configured
38 resources configured

Online: [ controller-0 controller-1 ]
OFFLINE: [ controller-2 ]
GuestOnline: [ galera-bundle-0@controller-0 galera-bundle-1@controller-1 rabbitmq-bundle-0@controller-0 rabbitmq-bundle-1@controller-1 redis-bundle-0@controller-0 redis-bundle-1@controller-1 ]

Full list of resources:

 Docker container set: rabbitmq-bundle [192.168.24.1:8787/rhosp13/openstack-rabbitmq:pcmklatest]
   rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started controller-0
   rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started controller-1
   rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Stopped
 Docker container set: galera-bundle [192.168.24.1:8787/rhosp13/openstack-mariadb:pcmklatest]
   galera-bundle-0 (ocf::heartbeat:galera): Master controller-0
   galera-bundle-1 (ocf::heartbeat:galera): Master controller-1
   galera-bundle-2 (ocf::heartbeat:galera): Stopped
 Docker container set: redis-bundle [192.168.24.1:8787/rhosp13/openstack-redis:pcmklatest]
   redis-bundle-0 (ocf::heartbeat:redis): Master controller-0
   redis-bundle-1 (ocf::heartbeat:redis): Slave controller-1
   redis-bundle-2 (ocf::heartbeat:redis): Stopped
 ip-192.168.24.8 (ocf::heartbeat:IPaddr2): Started controller-0
 ip-10.0.0.101 (ocf::heartbeat:IPaddr2): Started controller-1
 ip-172.17.1.12 (ocf::heartbeat:IPaddr2): Started controller-1
 ip-172.17.1.13 (ocf::heartbeat:IPaddr2): Started controller-0
 ip-172.17.3.10 (ocf::heartbeat:IPaddr2): Started controller-1
 ip-172.17.4.19 (ocf::heartbeat:IPaddr2): Started controller-0
 Docker container set: haproxy-bundle [192.168.24.1:8787/rhosp13/openstack-haproxy:pcmklatest]
   haproxy-bundle-docker-0 (ocf::heartbeat:docker): Started controller-0
   haproxy-bundle-docker-1 (ocf::heartbeat:docker): Started controller-1
   haproxy-bundle-docker-2 (ocf::heartbeat:docker): Stopped
 Docker container: openstack-cinder-volume [192.168.24.1:8787/rhosp13/openstack-cinder-volume:pcmklatest]
   openstack-cinder-volume-docker-0 (ocf::heartbeat:docker): Started controller-0
 Docker container: openstack-cinder-backup [192.168.24.1:8787/rhosp13/openstack-cinder-backup:pcmklatest]
   openstack-cinder-backup-docker-0 (ocf::heartbeat:docker): Started controller-1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[heat-admin@controller-0 ~]$
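When checking many update runs, the node lists in this output can be extracted automatically. The following is a minimal sketch, assuming the `Online: [ ... ]` and `OFFLINE: [ ... ]` line format shown in this report (other pcs or Pacemaker versions may print node states differently). The helper name `node_states` is hypothetical.

```python
import re


def node_states(pcs_status_text):
    """Return {"Online": [...], "OFFLINE": [...]} parsed from `pcs status` text."""
    states = {}
    for label in ("Online", "OFFLINE"):
        # \b keeps "Online" from matching inside "GuestOnline".
        match = re.search(r"\b%s: \[([^\]]*)\]" % label, pcs_status_text)
        states[label] = match.group(1).split() if match else []
    return states
```

For the output in this comment, `node_states(text)["OFFLINE"]` would contain only `controller-2`.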
Created attachment 1434338 [details] controller sosreport part b
Created attachment 1434339 [details] controller sosreport part c
Created attachment 1434340 [details] controller sosreport part d
Created attachment 1434341 [details] controller sosreport part e
Created attachment 1434342 [details] /home/stack files
Looking at the logs and code, this probably affects the cinder-backup service specifically. I have a fix proposal but haven't been able to test it yet, as I hit unrelated issues with the upstream environment.

Raviv, to move forward with testing, I think you can either:

* apply the intended fix https://review.openstack.org/567806 to your environment (this would be nice, as we'd also pre-validate the fix downstream), or

* temporarily remove environments/cinder-backup.yaml from the command lines used when testing.
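For an automated job, the second option could be scripted by filtering the environment file out of the command's argument list. This is a rough sketch only: the helper name `drop_environment` is hypothetical, and it assumes the file is passed as a separate `-e <file>` or `--environment-file <file>` pair (it does not handle the `--environment-file=<file>` form).

```python
def drop_environment(argv, env_suffix="environments/cinder-backup.yaml"):
    """Return argv without any environment-file pair whose path ends with env_suffix."""
    kept = []
    i = 0
    while i < len(argv):
        is_env_flag = argv[i] in ("-e", "--environment-file")
        if is_env_flag and i + 1 < len(argv) and argv[i + 1].endswith(env_suffix):
            i += 2  # skip both the flag and its file argument
            continue
        kept.append(argv[i])
        i += 1
    return kept
```

All other arguments, including other `-e` files, are passed through unchanged and in their original order.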
I have manually applied the patch and the update passed this stage. We should have this patch merged and landing downstream ASAP.
The patch is hitting instability in the upstream CI, but once it lands at least on master, I think we can propose a downstream backport without waiting for the upstream one.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2086