Description of problem:

Customer performed a minor update from RHOSP13z11 to 13z12. The update completed successfully, but it was observed that some of the containers' images were not pulled correctly.

Version-Release number of selected component (if applicable):

RHOSP13z12
Satellite 6.x is used as the image registry

How reproducible:

- Container images are prepared with the tag 'latest'.
- A minor update from 13z11 to 13z12 is performed.

Actual results:

The latest image is not pulled during the update for some of the containers. We identified 2 containers in the existing customer setup:

- neutron_ovs_agent
- nova_libvirt

Expected results:

All the containers should be updated to the latest images present in the Satellite content view.

Additional info:

- The same Satellite content view is used when a new compute node is scaled out, and the newly scaled-out computes do have containers with the latest packages in them.
- Manually running docker pull with the tag 'latest' pulls the latest image for those containers, but this does not seem to happen during the minor update.

[root@com003 ~]# docker pull satellite.deployment.be:5000/cloud-oscar05-pxs_osp13-osp13_containers-neutron-openvswitch-agent:latest
Trying to pull repository satellite.deployment.be:5000/cloud-oscar05-pxs_osp13-osp13_containers-neutron-openvswitch-agent ...
latest: Pulling from satellite.deployment.be:5000/cloud-oscar05-pxs_osp13-osp13_containers-neutron-openvswitch-agent
Digest: sha256:71da2d60264996c80eec418371322ef7c2856e030aae054fff967b05afb88fb1
Status: Image is up to date for satellite.deployment.be:5000/cloud-oscar05-pxs_osp13-osp13_containers-neutron-openvswitch-agent:latest
Hi, I tested that behavior using 'latest' on a test environment:

1. Create a new image tag named latest for rh-osbs/rhosp13-openstack-nova-compute.
2. Adjust the image name in heat (DockerNovaComputeImage, DockerNovaLibvirtConfigImage).
3. Run prepare.
4. Run "update run --limit compute-0".

The compute node previously had the tag 20200903.1, so the update went fine and the container was updated.

Then I repeated steps 1-4, so I had a new image id still referenced by "latest". The compute node pulled in the "latest" image but didn't update the nova_compute and nova_migration_target containers. This reproduces the issue seen there.

To be complete, some containers can still be updated if anything in their "configuration" (as generated by puppet) changes; a sketch of that mechanism follows this comment. That would explain why some containers are updated and others are not.

Note that paunch (the tool that recreates the containers) specifies that "latest" should be avoided [1].

I'll let DFG:DF decide on the next steps. The "workaround" would be to use a fixed tag and not "latest".

[1] https://github.com/openstack/paunch/blob/stable/queens/README.rst
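To illustrate the "updated only when the configuration changes" observation above, here is a minimal sketch of a paunch-like config-hash check. It assumes the container stores a hash of its desired configuration in a label; the label name (config_data_hash) and the helper functions are illustrative assumptions, not the actual stable/queens paunch code.

    import hashlib
    import json

    def config_hash(desired_config):
        # Stable hash of the desired container configuration. The image
        # *name* is part of this dict, but a moved tag does not change it.
        blob = json.dumps(desired_config, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def needs_recreate(existing_labels, desired_config):
        # Recreate only when the stored hash differs from the desired one.
        # A retagged 'latest' image leaves the config dict unchanged, so
        # the container keeps running on the old image, as reproduced above.
        return existing_labels.get("config_data_hash") != config_hash(desired_config)

Under this scheme, changing any puppet-generated config value forces a recreate (and thus a pull of the new image), while a silent retag of 'latest' does not.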
I'm looking at how to patch paunch to handle this case. Our recommendation is to not use 'latest' for now. There isn't a simple workaround at this time. A more complex workaround would be to manually remove the container and then rerun paunch to rebuild it; a sketch of that procedure follows.
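A minimal sketch of that manual workaround, assuming the docker runtime used on OSP13. The container and image names are taken from this report, and the final re-run step depends on the environment, so treat this as an outline rather than an official procedure.

    import subprocess

    container = "neutron_ovs_agent"
    image = ("satellite.deployment.be:5000/"
             "cloud-oscar05-pxs_osp13-osp13_containers-neutron-openvswitch-agent:latest")

    # 1. Remove the stale container so it will be recreated.
    subprocess.run(["docker", "rm", "-f", container], check=True)

    # 2. Pre-fetch the image so the tag now resolves to the new image id
    #    on disk (paunch skips the pull when the image already exists).
    subprocess.run(["docker", "pull", image], check=True)

    # 3. Rerun paunch (e.g. by re-running the deploy/update step) so the
    #    container is rebuilt from the freshly pulled image.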
*** Bug 1879753 has been marked as a duplicate of this bug. ***
By the way, from the original bug report:

[root@com003 ~]# docker pull satellite.deployment.be:5000/cloud-oscar05-pxs_osp13-osp13_containers-neutron-openvswitch-agent:latest
Trying to pull repository satellite.deployment.be:5000/cloud-oscar05-pxs_osp13-osp13_containers-neutron-openvswitch-agent ...
latest: Pulling from satellite.deployment.be:5000/cloud-oscar05-pxs_osp13-osp13_containers-neutron-openvswitch-agent
Digest: sha256:71da2d60264996c80eec418371322ef7c2856e030aae054fff967b05afb88fb1
Status: Image is up to date for satellite.deployment.be:5000/cloud-oscar05-pxs_osp13-osp13_containers-neutron-openvswitch-agent:latest

This indicates that the image was pulled already and matches whatever the source was. The container instance may not have been recreated to consume the latest image, but the image was updated.
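A quick way to confirm that split state is to compare the image id the running container was created from with the image id the tag currently resolves to. The following sketch assumes the container name neutron_ovs_agent from this report and uses plain docker inspect.

    import subprocess

    def inspect(fmt, name):
        # docker inspect -f <go-template> <name>
        out = subprocess.check_output(["docker", "inspect", "-f", fmt, name])
        return out.decode().strip()

    container = "neutron_ovs_agent"
    image = ("satellite.deployment.be:5000/"
             "cloud-oscar05-pxs_osp13-osp13_containers-neutron-openvswitch-agent:latest")

    running_id = inspect("{{.Image}}", container)  # image the container runs on
    tag_id = inspect("{{.Id}}", image)             # image the tag points at now

    if running_id != tag_id:
        print("container is running on a stale image")
    else:
        print("container is current")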
For the record, the part that prevents the updated image from being fetched is this bit of code:

https://opendev.org/openstack/paunch/src/branch/stable/queens/paunch/builder/compose1.py#L293-L295

    # only pull if the image does not exist locally
    if self.runner.inspect(image, format='exists', type='image'):
        continue

We only fetch the image if it doesn't exist on disk. I'll have to see if there's another bit of code that should be fetching the container. Alternatively, if you remove the image on disk or pre-fetch it, the other patch will restart the container.
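For illustration, a minimal sketch of the direction a fix could take (not the actual paunch patch): always pull, then compare the image id before and after so a moved tag triggers a recreate. The get_image_id and pull_image helpers stand in for paunch internals and are assumptions here.

    def ensure_fresh_image(image, get_image_id, pull_image):
        # Image id before the pull; None if the image is not on disk yet.
        old_id = get_image_id(image)

        # Always pull instead of skipping when the image already exists
        # locally, so a retagged 'latest' is picked up.
        pull_image(image)

        new_id = get_image_id(image)

        # If the id changed, the tag now points at a different image and
        # the containers built from it should be recreated.
        return old_id != new_id

Callers would then treat a True return the same way as a configuration change, i.e. remove and recreate the affected containers.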
Executed test plan from comment 14 and saw the expected output:

[heat-admin@compute-0 ~]$ sudo docker exec -ti -u root logrotate_crond ls
bin   bz1879531  etc   lib    media  opt   root  run_command  srv  tmp  var
boot  dev        home  lib64  mnt    proc  run   sbin         sys  usr
This fix is causing the following error in FFU 13 to 16.1: https://bugzilla.redhat.com/show_bug.cgi?id=1897169. We did try to solve the problem with https://review.opendev.org/#/c/762649/ but the way OSP13 paunch works is causing bigger problems, as described in https://bugzilla.redhat.com/show_bug.cgi?id=1898503

In the end, the solution for this problem (which in my opinion isn't a good practice, having a 'latest' tag for your containers) is completely blocking the FFU procedure. In my humble opinion, we need to take a step back and see how to solve this issue without impacting other areas. Moving the bugzilla back to ON_DEV until we have a clearer idea of how to proceed.
The issue is not with this patch. The issue is with the requirement of a hybrid state. We can talk about that over there, but the expectation needs to be that we always fetch the provided version to ensure that the container state is as expected.
Thanks Alex for the clarification, we will solve the problem in the corresponding OSP16.1 BZ. It is now clear that this BZ is legit and the fix was needed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 13.0 director bug fix advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:5575
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days