Bug 1576782

Summary:

[UPDATE] update failed at Task [Retag pcmklatest to latest Cinder-Backup image]

Product:

Red Hat OpenStack

Reporter:

Raviv Bar-Tal <rbartal>

Component:

openstack-tripleo-heat-templates

Assignee:

Emilien Macchi <emacchi>

Status:

CLOSED ERRATA

QA Contact:

Raviv Bar-Tal <rbartal>

Severity:

medium

Docs Contact:

Priority:

high

Version:

13.0 (Queens)

CC:

dbecker, jschluet, jstransk, mbracho, mbultel, mburns, morazi

Target Milestone:

Keywords:

Triaged

Target Release:

13.0 (Queens)

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

openstack-tripleo-heat-templates-8.0.2-22.el7ost

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2018-06-27 13:55:31 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
controller sosreport part a	none
controller sosreport part b	none
controller sosreport part c	none
controller sosreport part d	none
controller sosreport part e	none
/home/stack files	none

Description Raviv Bar-Tal 2018-05-10 11:54:50 UTC

Description of problem:
Updating from the 2018-05-07.2 build failed during the controller update, in the task [Retag pcmklatest to latest Cinder-Backup image].
Error message:
"Error response from daemon: no such id: 192.168.24.1:8787/rhosp13/openstack-cinder-backup:2018-05-07.2"
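Judging by the task name and the `:pcmklatest` images in the `pcs status` output later in this report, the task repoints the Pacemaker-facing `pcmklatest` tag at a versioned image. Docker rejected the versioned source reference because it found no local image with that ID. The Python sketch below only illustrates how the two references relate. It is not the tripleo-heat-templates task (which is Ansible) and not the fix in review 567806. The helper names `split_image_ref`, `pcmklatest_ref` and `retag_command` are hypothetical. Note that the tag separator must be the last `:` after the last `/`, so the registry port `:8787` is not mistaken for a tag.

```python
from typing import List, Tuple


def split_image_ref(ref: str) -> Tuple[str, str]:
    """Split 'registry[:port]/path/name[:tag]' into (repository, tag).

    Hypothetical helper. Digest references (name@sha256:...) are not handled.
    The tag separator is the last ':' that follows the last '/', so a
    registry port such as ':8787' stays part of the repository.
    """
    slash = ref.rfind("/")
    colon = ref.rfind(":")
    if colon > slash:
        return ref[:colon], ref[colon + 1:]
    # Docker assumes the 'latest' tag when none is given.
    return ref, "latest"


def pcmklatest_ref(versioned_ref: str, pcmk_tag: str = "pcmklatest") -> str:
    """Build the Pacemaker-facing reference for the same repository."""
    repo, _ = split_image_ref(versioned_ref)
    return f"{repo}:{pcmk_tag}"


def retag_command(versioned_ref: str) -> List[str]:
    """Return (without running) a 'docker tag SOURCE TARGET' command.

    'docker tag' fails if SOURCE is not present in the local image store,
    which matches the kind of error reported above.
    """
    return ["docker", "tag", versioned_ref, pcmklatest_ref(versioned_ref)]


if __name__ == "__main__":
    ref = "192.168.24.1:8787/rhosp13/openstack-cinder-backup:2018-05-07.2"
    print(retag_command(ref))
```

Before running such a command, `docker image inspect <ref>` can be used to confirm that the source image actually exists on the node.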

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Install OSP 13 build 2018-05-07.2
2. Update the undercloud
3. Update the overcloud


Actual results:


Expected results:


Additional info:
See attached logs.
Automatic job on stage server:
http://staging-jenkins2-qe-playground.usersys.redhat.com/view/DFG/view/upgrades/view/update/job/DFG-upgrades-updates-13-from-2018-05-07.2-HA-ipv4/1/console

Comment 1 Raviv Bar-Tal 2018-05-10 12:01:50 UTC

Created attachment 1434337
controller sosreport part a

Comment 2 Raviv Bar-Tal 2018-05-10 12:03:17 UTC

As a result of the error, controller-2 is offline:
[heat-admin@controller-0 ~]$ sudo pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-1 (version 1.1.18-11.el7_5.2-2b07d5c5a9) - partition with quorum
Last updated: Thu May 10 11:59:37 2018
Last change: Wed May  9 16:26:09 2018 by root via cibadmin on controller-0

12 nodes configured
38 resources configured

Online: [ controller-0 controller-1 ]
OFFLINE: [ controller-2 ]
GuestOnline: [ galera-bundle-0@controller-0 galera-bundle-1@controller-1 rabbitmq-bundle-0@controller-0 rabbitmq-bundle-1@controller-1 redis-bundle-0@controller-0 redis-bundle-1@controller-1 ]

Full list of resources:

 Docker container set: rabbitmq-bundle [192.168.24.1:8787/rhosp13/openstack-rabbitmq:pcmklatest]
   rabbitmq-bundle-0	(ocf::heartbeat:rabbitmq-cluster):	Started controller-0
   rabbitmq-bundle-1	(ocf::heartbeat:rabbitmq-cluster):	Started controller-1
   rabbitmq-bundle-2	(ocf::heartbeat:rabbitmq-cluster):	Stopped
 Docker container set: galera-bundle [192.168.24.1:8787/rhosp13/openstack-mariadb:pcmklatest]
   galera-bundle-0	(ocf::heartbeat:galera):	Master controller-0
   galera-bundle-1	(ocf::heartbeat:galera):	Master controller-1
   galera-bundle-2	(ocf::heartbeat:galera):	Stopped
 Docker container set: redis-bundle [192.168.24.1:8787/rhosp13/openstack-redis:pcmklatest]
   redis-bundle-0	(ocf::heartbeat:redis):	Master controller-0
   redis-bundle-1	(ocf::heartbeat:redis):	Slave controller-1
   redis-bundle-2	(ocf::heartbeat:redis):	Stopped
 ip-192.168.24.8	(ocf::heartbeat:IPaddr2):	Started controller-0
 ip-10.0.0.101	(ocf::heartbeat:IPaddr2):	Started controller-1
 ip-172.17.1.12	(ocf::heartbeat:IPaddr2):	Started controller-1
 ip-172.17.1.13	(ocf::heartbeat:IPaddr2):	Started controller-0
 ip-172.17.3.10	(ocf::heartbeat:IPaddr2):	Started controller-1
 ip-172.17.4.19	(ocf::heartbeat:IPaddr2):	Started controller-0
 Docker container set: haproxy-bundle [192.168.24.1:8787/rhosp13/openstack-haproxy:pcmklatest]
   haproxy-bundle-docker-0	(ocf::heartbeat:docker):	Started controller-0
   haproxy-bundle-docker-1	(ocf::heartbeat:docker):	Started controller-1
   haproxy-bundle-docker-2	(ocf::heartbeat:docker):	Stopped
 Docker container: openstack-cinder-volume [192.168.24.1:8787/rhosp13/openstack-cinder-volume:pcmklatest]
   openstack-cinder-volume-docker-0	(ocf::heartbeat:docker):	Started controller-0
 Docker container: openstack-cinder-backup [192.168.24.1:8787/rhosp13/openstack-cinder-backup:pcmklatest]
   openstack-cinder-backup-docker-0	(ocf::heartbeat:docker):	Started controller-1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[heat-admin@controller-0 ~]$
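When checking many update runs, the interesting parts of the output above are the `OFFLINE:` node list and the bundle replicas reported as `Stopped`. The following is a minimal Python sketch that extracts both from `pcs status` text. The function name `summarize_pcs_status` is hypothetical, and parsing human-readable output is fragile: it assumes the line layout shown above (resource name first, state last). Pacemaker's `crm_mon` can also emit XML, which would be a sturdier input for real automation.

```python
import re
from typing import Dict, List


def summarize_pcs_status(text: str) -> Dict[str, List[str]]:
    """Return OFFLINE cluster nodes and Stopped resources from `pcs status` text.

    Assumes the layout of the output above; clone summary lines such as
    'Stopped: [ node ]' are not matched.
    """
    offline: List[str] = []
    stopped: List[str] = []
    for line in text.splitlines():
        # Node list line, e.g. "OFFLINE: [ controller-2 ]"
        match = re.match(r"\s*OFFLINE:\s*\[(.*)\]", line)
        if match:
            offline.extend(match.group(1).split())
            continue
        # Resource line, e.g. "rabbitmq-bundle-2  (ocf::...):  Stopped"
        fields = line.split()
        if fields and fields[-1] == "Stopped":
            stopped.append(fields[0])
    return {"offline": offline, "stopped": stopped}


if __name__ == "__main__":
    sample = (
        "Online: [ controller-0 controller-1 ]\n"
        "OFFLINE: [ controller-2 ]\n"
        "   rabbitmq-bundle-0\t(ocf::heartbeat:rabbitmq-cluster):\tStarted controller-0\n"
        "   rabbitmq-bundle-2\t(ocf::heartbeat:rabbitmq-cluster):\tStopped\n"
    )
    print(summarize_pcs_status(sample))
```

Applied to the full output above, this would report `controller-2` as offline and the `-2` replicas of the rabbitmq, galera, redis and haproxy bundles as stopped.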

Comment 3 Raviv Bar-Tal 2018-05-10 12:05:14 UTC

Created attachment 1434338
controller sosreport part b

Comment 4 Raviv Bar-Tal 2018-05-10 12:06:26 UTC

Created attachment 1434339
controller sosreport part c

Comment 5 Raviv Bar-Tal 2018-05-10 12:08:17 UTC

Created attachment 1434340
controller sosreport part d

Comment 6 Raviv Bar-Tal 2018-05-10 12:10:35 UTC

Created attachment 1434341
controller sosreport part e

Comment 7 Raviv Bar-Tal 2018-05-10 12:12:05 UTC

Created attachment 1434342
/home/stack files

Comment 9 Jiri Stransky 2018-05-11 14:59:31 UTC

Looking at the logs and the code, this probably affects the cinder-backup service specifically. I have a fix proposal but haven't been able to test it yet, as I hit unrelated issues with the upstream environment.

Raviv, to move forward with testing, I think you can either:

* apply the intended fix https://review.openstack.org/567806 to your environment (this would be nice, as we'd also pre-validate the fix downstream),

or

* temporarily remove environments/cinder-backup.yaml from the command lines used when testing.

Comment 10 Raviv Bar-Tal 2018-05-14 11:25:43 UTC

I have manually applied the patch, and the update passed this stage.
We should get this patch merged and landed downstream ASAP.

Comment 12 Jiri Stransky 2018-05-15 13:16:57 UTC

The patch is hitting instability in upstream CI, but I think that once it lands at least in master, we can propose a downstream backport without waiting for the upstream backport.

Comment 21 errata-xmlrpc 2018-06-27 13:55:31 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086