Description of problem:
Upgrading OCP on Atomic Host 7.4.5 fails. The same upgrade succeeds on Atomic Host 7.5.3.
I did not see any statement that upgrading the Atomic Host OS is necessary in the doc https://docs.openshift.com/container-platform/3.10/upgrading/index.html
If it is necessary, it would be better to let the playbook do it, or to print a message about it.
OCP v3.9 on AH 7.4.5 upgraded successfully to v3.10.

Version-Release number of the following components:
openshift-ansible-3.11.12-1.git.0.0c64f7a.el7.noarch
Kernel Version: 3.10.0-693.21.1.el7.x86_64
Operating System: Red Hat Enterprise Linux Atomic Host 7.4.5

How reproducible:
Always

Steps to Reproduce:
1. Install OCP v3.10 on Atomic Host 7.4.5
2. Upgrade to v3.11

Actual results:
Upgrade failed

Failure summary:

  1. Hosts:    wmengugah745ol-node-1.0921-hb2.qe.rhcloud.com, wmengugah745ol-node-registry-router-1.0921-hb2.qe.rhcloud.com
     Play:     Update registry authentication credentials
     Task:     Install or Update node system container
     Message:  time="2018-09-21T08:10:07Z" level=fatal msg="Error: blob sha256:367d845540573038025f445c654675aa63905ec8682938fb45bc00f40849c37b is already present, but with size 200670683 instead of 74930327"

  2. Hosts:    wmengugah745ol-master-etcd-1.0921-hb2.qe.rhcloud.com
     Play:     Update registry authentication credentials
     Task:     Install or Update node system container
     Message:  time="2018-09-21T08:10:09Z" level=fatal msg="Error: blob sha256:367d845540573038025f445c654675aa63905ec8682938fb45bc00f40849c37b is already present, but with size 200670683 instead of 74930327"

Expected results:
Upgrade succeeded

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
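For anyone triaging similar upgrade logs, here is a minimal Python sketch that picks this specific failure out of an ansible-playbook message. The function name `parse_blob_mismatch` and the result keys are hypothetical. The pattern is derived only from the two messages quoted in this report, so other skopeo or containers/image versions may word the error differently.

```python
import re

# Pattern based solely on the fatal messages quoted in this report.
BLOB_MISMATCH = re.compile(
    r"blob (?P<digest>sha256:[0-9a-f]+) is already present, "
    r"but with size (?P<present>\d+) instead of (?P<expected>\d+)"
)


def parse_blob_mismatch(message):
    """Return digest and sizes if the message is a blob size mismatch, else None."""
    match = BLOB_MISMATCH.search(message)
    if match is None:
        return None
    return {
        "digest": match.group("digest"),
        # Size recorded for the blob that is already in local storage.
        "present_size": int(match.group("present")),
        # Size the copy operation expected for the same blob.
        "expected_size": int(match.group("expected")),
    }
```

A result other than `None` suggests the host is hitting the mismatch described in this bug rather than some unrelated upgrade failure.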
This is failing in a module call that updates the system container using the atomic command. Moving over to containers team.
Here's the module call: https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_node/tasks/node_system_container_install.yml#L2-L28
Here's the source for that module: https://github.com/openshift/openshift-ansible/blob/master/roles/lib_openshift/library/oc_atomic_container.py
We should test this using Atomic Host 7.5 as minimum version since that was required by 3.10. https://access.redhat.com/articles/2176281#comment-1326561
This is failing in the containers/image Copy method. I'm not sure where skopeo or containers/image is being used. Does anyone know? Miloslav, do you know what's happening?
Failure happens during this call to "atomic install" https://github.com/openshift/openshift-ansible/blob/master/roles/lib_openshift/library/oc_atomic_container.py#L81 which in turn calls into "skopeo copy" (iirc, Giuseppe?). Figuring out why we're hitting this corner case and how to solve it.
I think the issue is caused by the old version of skopeo present on AH 7.4.5, which didn't correctly report the layer size from the ostree storage. As a workaround, the metadata of the system container branches can be deleted, forcing a full re-fetch of the images: "ostree refs --delete ociimage"
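If the workaround has to be applied on several hosts, it could be wrapped in a small script. The following is only a sketch around the command quoted above. The function names are hypothetical, it defaults to a dry run, and it assumes it is run with sufficient privileges on the affected Atomic Host against the default system repository.

```python
import subprocess

# Ref prefix taken from the workaround in this comment.
OCIIMAGE_PREFIX = "ociimage"


def build_delete_refs_command(prefix=OCIIMAGE_PREFIX):
    """Return the argv for the workaround command quoted in this thread."""
    return ["ostree", "refs", "--delete", prefix]


def delete_image_refs(prefix=OCIIMAGE_PREFIX, dry_run=True):
    """Delete the image refs so the images are fully re-fetched on next install.

    With dry_run=True (the default) nothing is executed and the command
    is only returned for inspection.
    """
    cmd = build_delete_refs_command(prefix)
    if dry_run:
        return cmd
    # Raises CalledProcessError if ostree exits with a non-zero status.
    subprocess.run(cmd, check=True)
    return cmd
```

Calling `delete_image_refs(dry_run=False)` would run the deletion. The upgrade playbook then needs to be re-run so the system container images are pulled again.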
Per discussion with Mrunal; now that a workaround has been identified, we will defer this to 3.11.z.
Alright, so for 3.11.z this is just going to be a matter of using a newer skopeo, correct? Lokesh, could you look into building a newer skopeo?
It works if the skopeo used for both the install and the upgrade of OCP is updated. An updated skopeo will still fail to upgrade if OCP was installed using the old version.
This has been fixed.
Checked upgrading v3.10.127 to v3.11.98 on Atomic Host 7.4.5 and did not hit this issue, so moving to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:1605