Description of problem:

Customer performed a minor update from RHOSP13z11 to 13z12. The update completed successfully, but it was observed that some of the containers' images were not pulled correctly.

Version-Release number of selected component (if applicable):

RHOSP13z12
Satellite 6.x is used as the image registry

How reproducible:

- Container images are prepared with the tag 'latest'.
- A minor update from 13z11 to 13z12 is performed.

Actual results:

The latest image is not pulled during the update for some of the containers. We identified 2 containers in the existing customer setup:

- neutron_ovs_agent
- nova_libvirt

Expected results:

All the containers should be updated to the latest images present in the Satellite content view.

Additional info:

- The same Satellite content view is used when a new compute node is scaled out, and the newly scaled-out computes do have containers with the latest packages in them.
- Manually running docker pull with the tag 'latest' pulls the latest image for those containers, but this does not seem to happen during the minor update.

[root@com003 ~]# docker pull satellite.deployment.be:5000/cloud-oscar05-pxs_osp13-osp13_containers-neutron-openvswitch-agent:latest
Trying to pull repository satellite.deployment.be:5000/cloud-oscar05-pxs_osp13-osp13_containers-neutron-openvswitch-agent ...
latest: Pulling from satellite.deployment.be:5000/cloud-oscar05-pxs_osp13-osp13_containers-neutron-openvswitch-agent
Digest: sha256:71da2d60264996c80eec418371322ef7c2856e030aae054fff967b05afb88fb1
Status: Image is up to date for satellite.deployment.be:5000/cloud-oscar05-pxs_osp13-osp13_containers-neutron-openvswitch-agent:latest
Hi, I tested that behavior using 'latest' on a test environment:

1. Create a new image tag named latest for rh-osbs/rhosp13-openstack-nova-compute.
2. Adjust the image name in heat (DockerNovaComputeImage, DockerNovaLibvirtConfigImage).
3. Run prepare.
4. Run "update run --limit compute-0".

The compute node previously had the tag 20200903.1, so the update went fine and the container was updated.

Then I repeated steps 1-4, so I had a new image id still referenced by "latest". The compute node pulled in the "latest" image but didn't update the nova_compute and nova_migration_target containers. This reproduces the issue seen there.

To be complete, some containers can still be updated if anything in their "configuration" (as generated by puppet) changes; a sketch of that mechanism follows this comment. That would explain why some containers are updated and others are not.

Note that paunch (the tool that recreates the containers) specifies that "latest" should be avoided [1].

I'll let DFG:DF decide on the next steps. The "workaround" would be to use a fixed tag and not "latest".

[1] https://github.com/openstack/paunch/blob/stable/queens/README.rst
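To illustrate the "updated only when the configuration changes" observation above, here is a minimal sketch of a paunch-like config-hash check. It assumes the container stores a hash of its desired configuration in a label; the label name (config_data_hash) and the helper functions are illustrative assumptions, not the actual stable/queens paunch code.

    import hashlib
    import json

    def config_hash(desired_config):
        # Stable hash of the desired container configuration. The image
        # *name* is part of this dict, but a moved tag does not change it.
        blob = json.dumps(desired_config, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def needs_recreate(existing_labels, desired_config):
        # Recreate only when the stored hash differs from the desired one.
        # A retagged 'latest' image leaves the config dict unchanged, so
        # the container keeps running on the old image, as reproduced above.
        return existing_labels.get("config_data_hash") != config_hash(desired_config)

Under this scheme, changing any puppet-generated config value forces a recreate (and thus a pull of the new image), while a silent retag of 'latest' does not.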
I'm looking at how to patch paunch to handle this case. Our recommendation is to not use 'latest' for now. There isn't a simple workaround at this time. A more complex workaround would be to manually remove the container and then rerun paunch to rebuild it; a sketch of that procedure follows.
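A minimal sketch of that manual workaround, assuming the docker runtime used on OSP13. The container and image names are taken from this report, and the final re-run step depends on the environment, so treat this as an outline rather than an official procedure.

    import subprocess

    container = "neutron_ovs_agent"
    image = ("satellite.deployment.be:5000/"
             "cloud-oscar05-pxs_osp13-osp13_containers-neutron-openvswitch-agent:latest")

    # 1. Remove the stale container so it will be recreated.
    subprocess.run(["docker", "rm", "-f", container], check=True)

    # 2. Pre-fetch the image so the tag now resolves to the new image id
    #    on disk (paunch skips the pull when the image already exists).
    subprocess.run(["docker", "pull", image], check=True)

    # 3. Rerun paunch (e.g. by re-running the deploy/update step) so the
    #    container is rebuilt from the freshly pulled image.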
*** Bug 1879753 has been marked as a duplicate of this bug. ***
By the way, from the original bug report:

[root@com003 ~]# docker pull satellite.deployment.be:5000/cloud-oscar05-pxs_osp13-osp13_containers-neutron-openvswitch-agent:latest
Trying to pull repository satellite.deployment.be:5000/cloud-oscar05-pxs_osp13-osp13_containers-neutron-openvswitch-agent ...
latest: Pulling from satellite.deployment.be:5000/cloud-oscar05-pxs_osp13-osp13_containers-neutron-openvswitch-agent
Digest: sha256:71da2d60264996c80eec418371322ef7c2856e030aae054fff967b05afb88fb1
Status: Image is up to date for satellite.deployment.be:5000/cloud-oscar05-pxs_osp13-osp13_containers-neutron-openvswitch-agent:latest

This indicates that the image was pulled already and matches whatever the source was. The container instance may not have been recreated to consume the latest image, but the image was updated.
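A quick way to confirm that split state is to compare the image id the running container was created from with the image id the tag currently resolves to. The following sketch assumes the container name neutron_ovs_agent from this report and uses plain docker inspect.

    import subprocess

    def inspect(fmt, name):
        # docker inspect -f <go-template> <name>
        out = subprocess.check_output(["docker", "inspect", "-f", fmt, name])
        return out.decode().strip()

    container = "neutron_ovs_agent"
    image = ("satellite.deployment.be:5000/"
             "cloud-oscar05-pxs_osp13-osp13_containers-neutron-openvswitch-agent:latest")

    running_id = inspect("{{.Image}}", container)  # image the container runs on
    tag_id = inspect("{{.Id}}", image)             # image the tag points at now

    if running_id != tag_id:
        print("container is running on a stale image")
    else:
        print("container is current")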
For the record, the part that prevents the updated image from being fetched is this bit of code:

https://opendev.org/openstack/paunch/src/branch/stable/queens/paunch/builder/compose1.py#L293-L295

    # only pull if the image does not exist locally
    if self.runner.inspect(image, format='exists', type='image'):
        continue

We only fetch the image if it doesn't exist on disk. I'll have to see if there's another bit of code that should be fetching the container. Alternatively, if you remove the image on disk or pre-fetch it, the other patch will restart the container.
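For illustration, a minimal sketch of the direction a fix could take (not the actual paunch patch): always pull, then compare the image id before and after so a moved tag triggers a recreate. The get_image_id and pull_image helpers stand in for paunch internals and are assumptions here.

    def ensure_fresh_image(image, get_image_id, pull_image):
        # Image id before the pull; None if the image is not on disk yet.
        old_id = get_image_id(image)

        # Always pull instead of skipping when the image already exists
        # locally, so a retagged 'latest' is picked up.
        pull_image(image)

        new_id = get_image_id(image)

        # If the id changed, the tag now points at a different image and
        # the containers built from it should be recreated.
        return old_id != new_id

Callers would then treat a True return the same way as a configuration change, i.e. remove and recreate the affected containers.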
Executed test plan from comment 14 and saw the expected output:

[heat-admin@compute-0 ~]$ sudo docker exec -ti -u root logrotate_crond ls
bin   bz1879531  etc   lib    media  opt   root  run_command  srv  tmp  var
boot  dev        home  lib64  mnt    proc  run   sbin         sys  usr
This fix is causing the following error in FFU 13 to 16.1: https://bugzilla.redhat.com/show_bug.cgi?id=1897169. We did try to solve the problem with https://review.opendev.org/#/c/762649/ but the way OSP13 paunch works is causing bigger problems, as described in https://bugzilla.redhat.com/show_bug.cgi?id=1898503

In the end, the solution for this problem (which in my opinion isn't a good practice, having a 'latest' tag for your containers) is completely blocking the FFU procedure. In my humble opinion, we need to take a step back and see how to solve this issue without impacting other areas. Moving the bugzilla back to ON_DEV until we have a clearer idea of how to proceed.
The issue is not with this patch. The issue is with the requirement of a hybrid state. We can talk about that over there, but the expectation needs to be that we always fetch the provided version to ensure that the container state is as expected.
Thanks Alex for the clarification, we will solve the problem in the corresponding OSP16.1 BZ. It is now clear that this BZ is legit and the fix was needed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 13.0 director bug fix advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:5575
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days