Bug 1860640

Summary: 16.0 to 16.1 undercloud update error with [tripleo-podman : Clean podman images]
Product: Red Hat OpenStack
Reporter: Dan Macpherson <dmacpher>
Component: tripleo-ansible
Assignee: Emilien Macchi <emacchi>
Status: CLOSED ERRATA
QA Contact: Sasha Smolyak <ssmolyak>
Severity: high
Priority: high
Version: 16.1 (Train)
CC: cjeanner, emacchi, jhajyahy, mrelewicz, rurena, Sam.Wan
Target Milestone: z2
Keywords: Reopened, Triaged, ZStream
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: tripleo-ansible-0.5.1-1.20200810213406.47bafcc.el8
Last Closed: 2020-10-28 15:38:39 UTC
Type: Bug

Description Dan Macpherson 2020-07-26 05:23:11 UTC
Description of problem:
While performing an undercloud update from 16.0 to 16.1, I ran the "openstack undercloud upgrade" command. The upgrade tasks themselves run successfully, but the post-upgrade tasks fail at the following:

TASK [Purge Podman] *********************************************************************************************************************************************************
Sunday 26 July 2020  04:47:15 +0000 (0:00:00.187)       0:00:13.334 ***********

TASK [tripleo-podman : Clean podman images] *********************************************************************************************************************************
Sunday 26 July 2020  04:47:15 +0000 (0:00:00.193)       0:00:13.527 ***********
fatal: [ccsosp-undercloud]: FAILED! => {"changed": true, "cmd": ["podman", "image", "prune", "-a"], "delta": "0:00:00.064692", "end": "2020-07-26 04:47:15.471864", "msg": "non-zero return code", "rc": 125, "start": "2020-07-26 04:47:15.407172", "stderr": "Error: error reading input: EOF", "stderr_lines": ["Error: error reading input: EOF"], "stdout": "\nWARNING! This will remove all dangling images.\nAre you sure you want to continue? [y/N] ", "stdout_lines": ["", "WARNING! This will remove all dangling images.", "Are you sure you want to continue? [y/N] "]}


Version-Release number of selected component (if applicable):
tripleo-ansible-0.5.1-0.20200611113655.34b8fcc.el8ost.noarch


How reproducible:
I ran "openstack undercloud upgrade" for the 16.0 to 16.1 undercloud update three times, and it failed on the same task each time.


Steps to Reproduce:
1. Install 16.0 undercloud
2. Switch repos and container image details
3. Run "openstack undercloud upgrade"
4. "Running ansible upgrade tasks" section completes successfully
5. "Running ansible post-upgrade tasks" fails at the [tripleo-podman : Clean podman images] step

Actual results:
"post-upgrade" tasks fail

Expected results:
Successful "post-upgrade" tasks

Additional info:
Before my fourth attempt, I modified tripleo-podman/tasks/tripleo_podman_purge.yml and added the -f option:

- name: Podman prune
  become: true
  block:
    - name: Clean podman images
      command: podman image prune -a -f

    - name: Clean podman volumes
      command: podman volume prune -f

On my fourth attempt, the "post-upgrade" ran to completion.
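
The failure mode in the task output can be reproduced without podman. The following is a minimal Python sketch, not the tripleo-ansible code: PROMPTING_CHILD is a hypothetical stand-in that only mimics a confirmation prompt and the exit code 125 seen in the log, and run_without_stdin is an illustrative helper name. It assumes the Ansible command task gave podman nothing to read on stdin, which is what the "error reading input: EOF" message suggests.

```python
import subprocess
import sys

# Stand-in for a command that asks for confirmation on stdin.
# This is NOT podman; it only imitates the prompt and the exit code
# (125) reported in the failed task.
PROMPTING_CHILD = (
    "import sys\n"
    "sys.stdout.write('Are you sure you want to continue? [y/N] ')\n"
    "sys.stdout.flush()\n"
    "answer = sys.stdin.readline()\n"
    "if answer == '':\n"
    "    sys.stderr.write('Error: error reading input: EOF\\n')\n"
    "    sys.exit(125)\n"
    "sys.exit(0)\n"
)

# Stand-in for the same command run with a "force" option: no prompt.
FORCED_CHILD = "import sys; sys.exit(0)"


def run_without_stdin(force: bool) -> subprocess.CompletedProcess:
    """Run the stand-in with stdin at EOF, as a non-interactive task would."""
    code = FORCED_CHILD if force else PROMPTING_CHILD
    return subprocess.run(
        [sys.executable, "-c", code],
        stdin=subprocess.DEVNULL,  # nothing to read, so readline() hits EOF
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    for force in (False, True):
        result = run_without_stdin(force)
        print(f"force={force} rc={result.returncode} stderr={result.stderr.strip()!r}")
```

The prompting variant fails as soon as it tries to read an answer, while the forced variant never reads stdin, which matches the effect of adding -f in the modified task file.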

Comment 1 Dan Macpherson 2020-07-26 06:38:53 UTC
False alarm. I hadn't enabled the EUS repos and was using podman 1.9.x.

Comment 2 Sam Wan 2020-07-29 06:26:04 UTC
Confirmed this bug on RHOSP 16.1 RC.

=========================================

 [root@elabdir85 ~]# more /etc/rhosp-release
Red Hat OpenStack Platform release 16.1.0 RC (Train)
 [root@elabdir85 ~]# more /usr/share/ansible/roles/tripleo-podman/tasks/tripleo_podman_purge.yml
---
# Copyright 2019 Red Hat, Inc.
# All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.


- name: Podman prune
  become: true
  block:
    - name: Clean podman images
      command: podman image prune -a

    - name: Clean podman volumes
      command: podman volume prune -f
 [root@elabdir85 ~]#
=========================================

So it should be fixed in 16.1 GA.

Comment 3 Cédric Jeanneret 2020-07-29 13:15:03 UTC
Hello there,

Small info: the new "-f" flag shouldn't be in any OSP-supported podman version. It was introduced after 1.6.4, and not backported.

So it's more a dependency (hard version requirement) issue than actual code issue.

Cheers,

C.

Comment 4 Emilien Macchi 2020-07-29 18:18:50 UTC
*** Bug 1861820 has been marked as a duplicate of this bug. ***

Comment 5 Dan Macpherson 2020-07-30 05:25:45 UTC
(In reply to Cédric Jeanneret from comment #3)
> Hello there,
> 
> Small info: the new "-f" flag shouldn't be in any OSP supported podman
> version. It was introduced after 1.6.4, and not backported.
> 
> So it's more a dependency (hard version requirement) issue than actual code
> issue.
> 
> Cheers,
> 
> C.

Cédric, are you referring to the "podman volume prune -f" command? If so, that might be a separate issue.

If you were referring to the image prune command instead, then yes: in my test I accidentally forgot to switch to EUS, which has podman 1.6, and ended up using podman 1.9. That is why I closed this BZ as a false alarm.

Sam reopened in comment #2. Not sure if Sam did the same thing I did. @Sam -- which RHEL8 repos are you using? Normal or EUS?

Comment 6 Dan Macpherson 2020-07-30 05:35:39 UTC
Also, Sam, what version of podman are you using? It should be 1.6.
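
A quick way to answer this kind of question on many nodes is to parse the version string instead of reading it by eye. The following Python sketch is illustrative only: parse_podman_version and is_expected_series are hypothetical helper names, the (1, 6) default simply encodes the "should be 1.6" expectation stated here, and it assumes "podman -v" prints a line of the form "podman version X.Y.Z".

```python
import re
from typing import Tuple


def parse_podman_version(output: str) -> Tuple[int, ...]:
    """Extract the numeric version from `podman -v` style output."""
    match = re.search(r"podman version (\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError(f"unrecognised podman version output: {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_expected_series(output: str, expected: Tuple[int, int] = (1, 6)) -> bool:
    """Return True when major.minor matches the expected series.

    The default (1, 6) is an assumption taken from this thread, not a
    value read from any package metadata.
    """
    return parse_podman_version(output)[:2] == expected


if __name__ == "__main__":
    for sample in ("podman version 1.6.4", "podman version 1.9.3"):
        print(sample, "->", is_expected_series(sample))
```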

Comment 7 Dan Macpherson 2020-07-30 05:35:52 UTC
See also:
https://bugzilla.redhat.com/show_bug.cgi?id=1858373

Comment 8 Sam Wan 2020-07-30 06:05:34 UTC
Dear all,

When I reopened this bug, I was indeed using 'NORMAL' repos on both director and overcloud nodes.

I'm upgrading from 16.1 RC to the GA version today and ran into the same issue, even after changing the repos to 'EUS'.
I have successfully run 'openstack undercloud upgrade', and podman on the undercloud is now 1.6.4:
==================================
 [root@elabdir85 ~]# yum repolist --enabled
Updating Subscription Management repositories.
/usr/lib/python3.6/site-packages/dateutil/parser/_parser.py:70: UnicodeWarning: decode() called on unicode string, see https://bugzilla.redhat.com/show_bug.cgi?id=1693751
  instream = instream.decode()

repo id                                                                         repo name
ansible-2.9-for-rhel-8-x86_64-rpms                                              Red Hat Ansible Engine 2.9 for RHEL 8 x86_64 (RPMs)
fast-datapath-for-rhel-8-x86_64-rpms                                            Fast Datapath for RHEL 8 x86_64 (RPMs)
openstack-16.1-for-rhel-8-x86_64-rpms                                           Red Hat OpenStack Platform 16.1 for RHEL 8 x86_64 (RPMs)
openstack-beta-for-rhel-8-x86_64-rpms                                           Red Hat OpenStack Platform Beta for RHEL 8 x86_64 (RPMs)
rhel-8-for-x86_64-appstream-eus-rpms                                            Red Hat Enterprise Linux 8 for x86_64 - AppStream - Extended Update Support (RPMs)
rhel-8-for-x86_64-baseos-eus-rpms                                               Red Hat Enterprise Linux 8 for x86_64 - BaseOS - Extended Update Support (RPMs)
rhel-8-for-x86_64-highavailability-eus-rpms                                     Red Hat Enterprise Linux 8 for x86_64 - High Availability - Extended Update Support (RPMs)
 [root@elabdir85 ~]# podman -v
podman version 1.6.4
 [root@elabdir85 ~]#
==================================

However, when I tried to upgrade an overcloud controller node, it failed:
==================================
TASK [tripleo-podman : Clean podman images] ************************************
Thursday 30 July 2020  13:42:38 +0800 (0:00:00.261)       0:33:54.350 *********
fatal: [elabrhosp85ctl0]: FAILED! => {"changed": true, "cmd": ["podman", "image", "prune", "-a"], "delta": "0:00:00.060719", "end": "2020-07-30 05:42:38.672008", "msg": "non-zero return code", "rc": 125, "start": "2020-07-30 05:42:38.611289", "stderr": "Error: error reading input: EOF", "stderr_lines": ["Error: error reading input: EOF"], "stdout": "\nWARNING! This will remove all dangling images.\nAre you sure you want to continue? [y/N] ", "stdout_lines": ["", "WARNING! This will remove all dangling images.", "Are you sure you want to continue? [y/N] "]}

NO MORE HOSTS LEFT *************************************************************
==================================

I logged into the controller node and confirmed that it's using the EUS repos, but its podman is still '1.9.3':
==================================
[root@elabrhosp85ctl0 ~]# yum repolist --enabled
Updating Subscription Management repositories.
/usr/lib/python3.6/site-packages/dateutil/parser/_parser.py:70: UnicodeWarning: decode() called on unicode string, see https://bugzilla.redhat.com/show_bug.cgi?id=1693751
  instream = instream.decode()

repo id                                                                         repo name
ansible-2.9-for-rhel-8-x86_64-rpms                                              Red Hat Ansible Engine 2.9 for RHEL 8 x86_64 (RPMs)
cert-1-for-rhel-8-x86_64-rpms                                                   Red Hat Certification for RHEL 8 x86_64 (RPMs)
fast-datapath-for-rhel-8-x86_64-rpms                                            Fast Datapath for RHEL 8 x86_64 (RPMs)
openstack-16.1-for-rhel-8-x86_64-rpms                                           Red Hat OpenStack Platform 16.1 for RHEL 8 x86_64 (RPMs)
rhel-8-for-x86_64-appstream-eus-rpms                                            Red Hat Enterprise Linux 8 for x86_64 - AppStream - Extended Update Support (RPMs)
rhel-8-for-x86_64-baseos-eus-rpms                                               Red Hat Enterprise Linux 8 for x86_64 - BaseOS - Extended Update Support (RPMs)
rhel-8-for-x86_64-highavailability-eus-rpms                                     Red Hat Enterprise Linux 8 for x86_64 - High Availability - Extended Update Support (RPMs)
[root@elabrhosp85ctl0 ~]# podman -v
podman version 1.9.3
[root@elabrhosp85ctl0 ~]#
==================================

I'm not sure which part is wrong.
I'm following this guide to upgrade from 16.1 RC to GA.
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/keeping_red_hat_openstack_platform_updated/
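
The repo check done by hand here can be sketched in Python. This is only a sketch under assumptions: enabled_repo_ids and non_eus_rhel_repos are hypothetical helper names, the parser assumes the "repo id / repo name" header layout shown in the transcripts, and the "rhel-8-for-" prefix and "-eus-" substring are taken from the repo ids listed in this comment. It only reports which RHEL repos are not EUS; it does not explain why a node with EUS repos still has podman 1.9.3 installed.

```python
from typing import Iterable, List


def enabled_repo_ids(repolist_output: str) -> List[str]:
    """Collect repo ids from `yum repolist --enabled` style output.

    Everything before the "repo id" header line (warnings, status
    messages) is ignored.
    """
    ids: List[str] = []
    seen_header = False
    for line in repolist_output.splitlines():
        if line.startswith("repo id"):
            seen_header = True
            continue
        if seen_header and line.strip():
            ids.append(line.split()[0])
    return ids


def non_eus_rhel_repos(repo_ids: Iterable[str]) -> List[str]:
    """Return RHEL 8 repo ids that are not Extended Update Support repos."""
    return [
        repo_id
        for repo_id in repo_ids
        if repo_id.startswith("rhel-8-for-") and "-eus-" not in repo_id
    ]
```

An empty result from non_eus_rhel_repos would correspond to the situation on this controller, where the repos look correct and the installed podman version still has to be checked separately.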

Comment 9 Sam Wan 2020-07-30 08:54:03 UTC
Forget about it; I've decided to rebuild the overcloud.

You may close this as not-a-bug.
Thanks and regards,
Sam

Comment 10 Cédric Jeanneret 2020-07-30 08:57:38 UTC
Hello,

Sooo yeah. There's an issue where we don't lock the podman version. It would be good to check whether podman-1.6.4 is at least available on the overcloud nodes.

I've pinged Delivery in order to set an upper limit to podman version in paunch dependencies[0], this should hopefully solve the issue. The patch proposed by Emilien[1] (and merged in upstream master) should *never* hit osp-16.0 nor osp-16.1, since it would allow an untested podman (1.9.3) to be deployed during an upgrade... We really, really don't want that situation.

According to this other issue[2] though, it seems to be solved now. Care to re-check? If it's OK, the upper limit might only be a safety net, but a needed one imho.

Thanks for your feedback!

Cheers,

C.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1861777
[1] https://review.opendev.org/743760
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1858373

Comment 11 Sam Wan 2020-07-30 09:24:26 UTC
Hi,

I don't have permission to see issue [2] (https://bugzilla.redhat.com/show_bug.cgi?id=1858373), so I have no idea what it is.

The reason I decided to rebuild is that this is a test system and there's nothing to lose.
Actually, I will rebuild not only the overcloud but also the undercloud.
When I installed the undercloud, the image namespace was 'registry.redhat.io/rhosp-beta' (GA was not released yet).
After GA was released, I changed the namespace to 'registry.redhat.io/rhosp-rhel8' and ran 'openstack undercloud upgrade' successfully, but there are still packages from the beta repo:

=======================================================================================================================================
(undercloud) [stack@elabdir85 ~]$ yum list|grep beta
2020-07-30 17:21:05,702 [ERROR] yum:64084:MainThread @logutil.py:194 - [Errno 13] Permission denied: '/var/log/rhsm/rhsm.log' - Further logging output will be written to stderr
/usr/lib/python3.6/site-packages/dateutil/parser/_parser.py:70: UnicodeWarning: decode() called on unicode string, see https://bugzilla.redhat.com/show_bug.cgi?id=1693751
  instream = instream.decode()

ansible-config_template.noarch                       1.1.1-0.20200604063431.91878d7.el8ost             @openstack-beta-for-rhel-8-x86_64-rpms
ansible-pacemaker.noarch                             1.0.4-0.20200324105818.5847167.el8ost             @openstack-beta-for-rhel-8-x86_64-rpms
ansible-role-atos-hsm.noarch                         0.1.1-0.20200318203421.1269408.el8ost             @openstack-beta-for-rhel-8-x86_64-rpms
ansible-role-chrony.noarch                           1.0.2-0.20200311064420.03e7fbe.el8ost             @openstack-beta-for-rhel-8-x86_64-rpms
ansible-role-container-registry.noarch               1.1.1-0.20200311065947.7eca2dd.el8ost             @openstack-beta-for-rhel-8-x86_64-rpms
ansible-role-openstack-operations.noarch             0.0.1-0.20200311080930.274739e.el8ost             @openstack-beta-for-rhel-8-x86_64-rpms
ansible-role-redhat-subscription.noarch              1.1.1-0.20200522204523.5f65ba4.el8ost             @openstack-beta-for-rhel-8-x86_64-rpms
ansible-role-thales-hsm.noarch                       0.2.1-0.20200311074756.2803c6c.el8ost             @openstack-beta-for-rhel-8-x86_64-rpms
ansible-role-tripleo-modify-image.noarch             1.2.1-0.20200527233426.bc21900.el8ost             @openstack-beta-for-rhel-8-x86_64-rpms
ansible-tripleo-ipa.noarch                           0.2.1-0.20200611104546.c22fc8d.el8ost             @openstack-beta-for-rhel-8-x86_64-rpms
ansible-tripleo-ipsec.noarch                         9.2.1-0.20200311073016.0c8693c.el8ost             @openstack-beta-for-rhel-8-x86_64-rpms
cpp-hocon.x86_64                                     0.1.8-2.el8ost                                    @openstack-beta-for-rhel-8-x86_64-rpms
....
=======================================================================================================================================

I'm not sure whether this would cause other issues, so I decided to reinstall from scratch.
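
The "yum list|grep beta" check can also be expressed as a small filter that matches on the repo column instead of on any occurrence of "beta". This Python sketch is illustrative: packages_from_repo is a hypothetical helper name, and it assumes each package occupies a single line with three whitespace-separated columns (name, version, "@repo"), as in the listing in this comment. Wrapped or truncated lines are skipped.

```python
from typing import List


def packages_from_repo(yum_list_output: str, repo_id: str) -> List[str]:
    """Return names of packages whose repo column is `@<repo_id>`.

    Lines that do not have exactly three columns (warnings, log
    messages, wrapped entries) are ignored.
    """
    marker = "@" + repo_id
    names: List[str] = []
    for line in yum_list_output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[2] == marker:
            names.append(fields[0])
    return names
```

Called with the repo id "openstack-beta-for-rhel-8-x86_64-rpms", it would return the package names from the listing, which gives a concrete list to review or reinstall from the GA repo.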

Comment 15 Rafael Urena 2020-09-14 14:51:15 UTC
Do we know which z-stream the fix is expected to land in?

Rafael Ureña
Technical Account Manager

Comment 16 Emilien Macchi 2020-09-14 14:59:49 UTC
(In reply to Rafael Urena from comment #15)
> Do we know which z-stream the fix is expected to land in?

16.1.2

Comment 19 Jad Haj Yahya 2020-09-22 07:23:15 UTC
Upgraded the undercloud from 16 (latest CDN) to RHOS-16.1-RHEL-8-20200917.n.3 successfully.

Comment 27 errata-xmlrpc 2020-10-28 15:38:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4284