Created attachment 1672682 [details]
logs

Description of problem:
When CAM tries to migrate imagestreams created with the import-image command, and those imagestreams are tagged, the migration fails.

Version-Release number of selected component (if applicable):
CAM 1.1.2

How reproducible:
Always

Steps to Reproduce:
1. Create these resources:

oc new-project ocp-28021-istags
oc -n ocp-28021-istags import-image internal-alpine:int --from docker.io/alpine:latest --confirm
oc -n ocp-28021-istags import-image internal-alpine:3.9 --from docker.io/alpine:3.9 --confirm
oc -n ocp-28021-istags tag internal-alpine:int internal-alpine:tag1 --alias=False
oc -n ocp-28021-istags tag internal-alpine:int internal-alpine:tag2 --alias=False
oc -n ocp-28021-istags tag internal-alpine:3.9 internal-alpine:tag3 --alias=False
oc -n ocp-28021-istags tag internal-alpine:3.9 internal-alpine:latest --alias=False
oc -n ocp-28021-istags tag internal-alpine:3.9 internal-alpine:tag4 --alias=False

2. Migrate the namespace.

Actual results:
The migration fails with this error:

time="2020-03-20T13:53:39Z" level=info msg="error restoring internal-alpine:tag2: ImageStream.image.openshift.io \"internal-alpine\" is invalid: []: Internal error: imagestreams \"internal-alpine\" is invalid: spec.tags[tag2].from.name: Invalid value: \"internal-alpine@sha256:ddba4d27a7ffc3f86dd6c2f92041af252a1f23a8e742c90e6e1297bfa1bc0c45\": error generating tag event: imagestreamimage.image.openshift.io \"sha256:ddba4d27a7ffc3f86dd6c2f92041af252a1f23a8e742c90e6e1297bfa1bc0c45\" not found" logSource="pkg/restore/restore.go:1079" restore=openshift-migration/ocp-28021-istags-mig-1584712008-llptj

Expected results:
The migration should succeed, and the imagestream and its tags should be migrated to the target cluster.

Additional info:
*** Bug 1816151 has been marked as a duplicate of this bug. ***
This seems to be another example of the race condition that occurs because of interdependencies among the different imagestreamtags being restored: the istag that an oc-tagged istag points to must be migrated first. There are actually several related issues here:

1) When migrating internal images, the actual imagestreamtag isn't created until the docker image copy has completed. The fix for this is to move the internal image copy to the stage backup/restore.

2) When migrating imagestreamtags, we need to detect required references in the restore plugin and return these via the AdditionalItems hash:
  a) For alias tags, we can just grab the associated resource referenced in istag.Tag.From.
  b) For non-alias tags, we don't have a direct reference; we only have an ImageStreamImage reference, which is not directly migrated. For this case, we have to add an annotation on backup: find all istags for which istag.Image.Metadata.Name matches this istag.Tag.From.Name, filter out istags which are themselves ImageStreamTag or ImageStreamImage references, and add an annotation referencing the istag that remains, if any.

3) Even with the AdditionalItems entry, there's a race condition. An upstream issue has been created; it needs to be fixed, and then our plugins modified to take advantage of the new "wait" method on the plugin, to make sure the required istag is created and ready before restoring the dependent istag. Upstream issue: https://github.com/vmware-tanzu/velero/issues/1350

4) If the referenced ImageStreamTag (or, for non-reference tags, ImageStreamImage) doesn't exist, restore will fail. When 2a) or 2b) above ends up not finding the referenced resource, we need to strip the istag.tag element on backup.
Aligning to next release to consider.
It looks like this new report is consistent with the previous situations in which we've seen this. See comment 2 for what we're planning to implement in CAM 1.3 -- there are actually a few related bugs which work together to cause this.
A partial fix for this bug is here: https://github.com/konveyor/openshift-velero-plugin/pull/57 This ensures that Velero restores the istags in the appropriate order, taking care of dependencies. The rest of the fix requires upstream work (design done; implementation is next) to force Velero to wait until the required items are restored and available before attempting to restore the dependent items.
Fix posted at https://github.com/konveyor/velero/pull/83 and https://github.com/konveyor/openshift-velero-plugin/pull/71
Addressing this issue requires upgrading to a newer version of Velero (Velero 1.10, implying OADP ~1.2). MTC requires OADP 1.0 and is not currently compatible with newer releases of OADP/Velero. Because MTC must support a wide range of OCP releases, down to 3.11, we are more sensitive to upgrading to newer versions of Velero. I am marking this CLOSED WONTFIX. If anyone feels strongly against this, please add a comment and let us know so we may reevaluate.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days