Bug 1952482

Summary: [RHOSP 13 to 16.1 Upgrades] nova_hybrid_state task should check for the running container and image instead of checking the file docker-container-hybrid_nova_compute.json
Product: Red Hat OpenStack
Reporter: Shravan Kumar Tiwari <shtiwari>
Component: openstack-tripleo-heat-templates
Assignee: Lukas Bezdicka <lbezdick>
Status: CLOSED ERRATA
QA Contact: Jose Luis Franco <jfrancoa>
Severity: high
Priority: high
Version: 16.1 (Train)
CC: drosenfe, irathore, jfrancoa, jpretori, kthakre, mburns, msufiyan, sgolovat
Target Milestone: z6
Keywords: Triaged
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-11.3.2-1.20210408163452.el8ost
Last Closed: 2021-05-26 13:52:43 UTC
Type: Bug

Description Shravan Kumar Tiwari 2021-04-22 11:09:41 UTC
Description of problem:

It was observed that if the first run of the nova_hybrid_state upgrade step fails for some reason after /var/lib/tripleo-config/docker-container-hybrid_nova_compute.json has already been created, paunch apply is not executed on all the compute nodes of that role.

Re-running the hybrid state step then causes the paunch task to be skipped, because the docker-container-hybrid_nova_compute.json file is already present from the earlier failed execution.


2021-04-20 10:04:14,879 p=33118 u=mistral n=ansible | TASK [Apply paunch config for nova_compute] ************************************
2021-04-20 10:04:14,880 p=33118 u=mistral n=ansible | Tuesday 20 April 2021  10:04:14 +0200 (0:00:00.685)       0:02:13.952 *********
2021-04-20 10:04:14,931 p=33118 u=mistral n=ansible | skipping: [com018] => {"changed": false, "skip_reason": "Conditional result was False"}
2021-04-20 10:04:14,932 p=33118 u=mistral n=ansible | skipping: [com019] => {"changed": false, "skip_reason": "Conditional result was False"}
2021-04-20 10:04:14,953 p=33118 u=mistral n=ansible | skipping: [com026] => {"changed": false, "skip_reason": "Conditional result was False"}
2021-04-20 10:04:14,973 p=33118 u=mistral n=ansible | skipping: [com027] => {"changed": false, "skip_reason": "Conditional result was False"}


This caused problems later: the hybrid state steps completed without any error while skipping the paunch task, and tenant VMs then began to have problems with volume detach/attach. Creating and booting VMs was also affected.

Actual results:



Expected results:

There should be a better way to handle this situation, for example changing the stat check so that it checks for the running container and image instead of checking for the file.
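
A minimal sketch of the difference between the two checks, written in Python rather than the Ansible that the templates actually use. The function names and the sample docker ps line are hypothetical; only the expected image name is taken from the CI log in comment 8 of this report. It illustrates the idea and is not the fix that shipped.

```python
from typing import Iterable

# Image name as it appears in the CI log in this report.
EXPECTED_IMAGE = (
    "undercloud-0.ctlplane.redhat.local:8787/rh-osbs/"
    "rhosp16-openstack-nova-compute:16.1_20210430.1"
)


def skip_by_marker_file(marker_file_exists: bool) -> bool:
    """Behaviour described in the report: skip paunch if the JSON file exists."""
    return marker_file_exists


def image_is_running(docker_ps_lines: Iterable[str], image: str) -> bool:
    """Rough equivalent of `docker ps | grep <image>`: any line mentions the image."""
    return any(image in line for line in docker_ps_lines)


def skip_by_running_container(docker_ps_lines: Iterable[str], image: str) -> bool:
    """Requested behaviour: skip paunch only if the expected image is running."""
    return image_is_running(docker_ps_lines, image)


# Hypothetical state after a failed first run: the JSON file was written,
# but only an old container (made-up image name) is running.
ps_after_failed_run = ["abc123  old-registry/nova-compute:13.0  Up 3 days  nova_compute"]

print(skip_by_marker_file(True))  # True: paunch is skipped although nothing was applied
print(skip_by_running_container(ps_after_failed_run, EXPECTED_IMAGE))  # False: paunch runs
```

With the file-based check a leftover JSON file is enough to skip the task on a re-run, while the container-based check only skips when the expected image is actually present in the docker ps output.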

Additional info:

Comment 7 Lukas Bezdicka 2021-05-05 10:41:02 UTC
*** Bug 1953234 has been marked as a duplicate of this bug. ***

Comment 8 Jose Luis Franco 2021-05-05 13:52:21 UTC
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_2comp-ipv6-geneve-HA-no-ceph-ovn-dvr/94/undercloud-0/home/stack/overcloud_upgrade_run-controller-0.log.gz

2021-05-04 06:40:49 | TASK [Check if iscsid is running with proper image] ****************************
2021-05-04 06:40:49 | Tuesday 04 May 2021  06:40:44 +0000 (0:00:00.496)       0:00:55.409 *********** 
2021-05-04 06:40:49 | changed: [compute-0] => {"changed": true, "cmd": "docker ps | grep \"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-iscsid:16.1_20210430.1\"\n", "delta": "0:00:00.043251", "end": "2021-05-04 06:40:45.146092", "failed_when_result": false, "msg": "non-zero return code", "rc": 1, "start": "2021-05-04 06:40:45.102841", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
2021-05-04 06:40:49 | changed: [compute-1] => {"changed": true, "cmd": "docker ps | grep \"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-iscsid:16.1_20210430.1\"\n", "delta": "0:00:00.050798", "end": "2021-05-04 06:40:45.178955", "failed_when_result": false, "msg": "non-zero return code", "rc": 1, "start": "2021-05-04 06:40:45.128157", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

............................................................

2021-05-04 06:41:15 | TASK [Apply paunch config for iscsid] ******************************************
2021-05-04 06:41:15 | Tuesday 04 May 2021  06:40:50 +0000 (0:00:00.695)       0:01:00.962 *********** 
2021-05-04 06:41:15 | changed: [compute-0] => {"changed": true, "cmd": "paunch apply --file /var/lib/tripleo-config/docker-container-hybrid_iscsid.json --config-id hybrid_iscsid", "delta": "0:00:11.657153", "end": "2021-05-04 06:41:02.306083", "rc": 0, "start": "2021-05-04 06:40:50.648930", "stderr": "", "stderr_lines": [], "stdout": "Did not find container with \"['docker', 'ps', '-a', '--filter', 'label=container_name=iscsid', '--filter', 'label=config_id=hybrid_iscsid', '--format', '{{.Names}}']\" - retrying without config_id\nDid not find container with \"['docker', 'ps', '-a', '--filter', 'label=container_name=iscsid', '--format', '{{.Names}}']\"", "stdout_lines": ["Did not find container with \"['docker', 'ps', '-a', '--filter', 'label=container_name=iscsid', '--filter', 'label=config_id=hybrid_iscsid', '--format', '{{.Names}}']\" - retrying without config_id", "Did not find container with \"['docker', 'ps', '-a', '--filter', 'label=container_name=iscsid', '--format', '{{.Names}}']\""]}
2021-05-04 06:41:15 | 
2021-05-04 06:41:15 | changed: [compute-1] => {"changed": true, "cmd": "paunch apply --file /var/lib/tripleo-config/docker-container-hybrid_iscsid.json --config-id hybrid_iscsid", "delta": "0:00:11.910811", "end": "2021-05-04 06:41:02.564756", "rc": 0, "start": "2021-05-04 06:40:50.653945", "stderr": "", "stderr_lines": [], "stdout": "Did not find container with \"['docker', 'ps', '-a', '--filter', 'label=container_name=iscsid', '--filter', 'label=config_id=hybrid_iscsid', '--format', '{{.Names}}']\" - retrying without config_id\nDid not find container with \"['docker', 'ps', '-a', '--filter', 'label=container_name=iscsid', '--format', '{{.Names}}']\"", "stdout_lines": ["Did not find container with \"['docker', 'ps', '-a', '--filter', 'label=container_name=iscsid', '--filter', 'label=config_id=hybrid_iscsid', '--format', '{{.Names}}']\" - retrying without config_id", "Did not find container with \"['docker', 'ps', '-a', '--filter', 'label=container_name=iscsid', '--format', '{{.Names}}']\""]}
2021-05-04 06:41:15 | 
2021-05-04 06:41:15 | TASK [Check if nova_compute is running with proper image] **********************
2021-05-04 06:41:15 | Tuesday 04 May 2021  06:41:02 +0000 (0:00:12.183)       0:01:13.146 *********** 
2021-05-04 06:41:15 | changed: [compute-0] => {"changed": true, "cmd": "docker ps | grep \"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-compute:16.1_20210430.1\"\n", "delta": "0:00:00.033059", "end": "2021-05-04 06:41:02.855364", "failed_when_result": false, "msg": "non-zero return code", "rc": 1, "start": "2021-05-04 06:41:02.822305", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
2021-05-04 06:41:15 | changed: [compute-1] => {"changed": true, "cmd": "docker ps | grep \"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-nova-compute:16.1_20210430.1\"\n", "delta": "0:00:00.037808", "end": "2021-05-04 06:41:02.883398", "failed_when_result": false, "msg": "non-zero return code", "rc": 1, "start": "2021-05-04 06:41:02.845590", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
2021-05-04 06:41:15 | 
...........................................................

2021-05-04 06:41:57 | TASK [Check if ovn_controller is runing with proper image] *********************
2021-05-04 06:41:57 | Tuesday 04 May 2021  06:41:53 +0000 (0:00:00.092)       0:02:03.696 *********** 
2021-05-04 06:41:57 | changed: [compute-0] => {"changed": true, "cmd": "docker ps | grep \"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ovn-controller:16.1_20210430.1\"", "delta": "0:00:00.042160", "end": "2021-05-04 06:41:53.463997", "failed_when_result": false, "msg": "non-zero return code", "rc": 1, "start": "2021-05-04 06:41:53.421837", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
2021-05-04 06:41:57 | changed: [compute-1] => {"changed": true, "cmd": "docker ps | grep \"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ovn-controller:16.1_20210430.1\"", "delta": "0:00:00.039445", "end": "2021-05-04 06:41:53.491884", "failed_when_result": false, "msg": "non-zero return code", "rc": 1, "start": "2021-05-04 06:41:53.452439", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
2021-05-04 06:41:57 | 

http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_2comp-ipv6-geneve-HA-no-ceph-ovn-dvr/94/undercloud-0/var/log/dnf.rpm.log.gz

2021-05-04T05:08:30Z SUBDEBUG Installed: openstack-tripleo-heat-templates-11.3.2-1.20210408163452.el8ost.noarch

Comment 14 errata-xmlrpc 2021-05-26 13:52:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.6 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2097