Description of problem:
===========================
Migration of apps from OCP 3.11 GA to OCP 4.3.5 failed: it failed to create the InitialBackup.

Error copying image: Error writing manifest: Error uploading manifest latest to 172.30.73.7:5000/project34/httpd-example-0: errors:
manifest blob unknown: blob unknown to registry
manifest blob unknown: blob unknown to registry
" backup=openshift-migration/migmigration-05397-d98gg cmd=/plugins/velero-plugins logSource="/go/src/github.com/fusor/openshift-migration-plugin/velero-plugins/migimagestream/backup.go:84" pluginName=velero-plugins

time="2020-03-21T14:56:21Z" level=info msg="1 errors encountered backup up item" backup=openshift-migration/migmigration-05397-d98gg group=image.openshift.io/v1 logSource="pkg/backup/resource_backupper.go:284" name=httpd-example-0 namespace=project34 resource=imagestreams

Version-Release number of selected component (if applicable):
================================================================
OCP 3.11 GA:
=============
[root@dell-per630-05 ~]# rpm -qa | grep openshift
atomic-openshift-hyperkube-3.11.188-1.git.0.db0eaa8.el7.x86_64
atomic-openshift-clients-3.11.188-1.git.0.db0eaa8.el7.x86_64
atomic-openshift-docker-excluder-3.11.188-1.git.0.db0eaa8.el7.noarch
atomic-openshift-excluder-3.11.188-1.git.0.db0eaa8.el7.noarch
atomic-openshift-node-3.11.188-1.git.0.db0eaa8.el7.x86_64

OCP 4.3.5:
=============
ocp_installers_index_url: https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.3.3/
ocp_rhcos_index_url: https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.3/latest/

How reproducible:

Steps to Reproduce:
============================
1. Install OCP 3.11 GA. Create projects with apps.
2. Install OCP 4.3.5.
3. Start migration from OCP 3.11 to OCP 4.3.5.

Actual results:
===============

Expected results:
==================

Additional info:
=============================
Slack channel discussion link: https://coreos.slack.com/archives/CHWBWE8LD/p1584803348279400

Registry logs from OCP 3.11 cluster:
==========================================
time="2020-03-21T14:56:21.68391271Z" level=debug msg="s3aws.GetContent("/docker/registry/v2/repositories/project34/httpd-example-0/_layers/sha256/f1e56db67514d64aacc14367d514a44098bcafe117d4039444b90d1ea76c8fb4/link")" go.version=go1.13.4 http.request.contenttype="application/vnd.docker.distribution.manifest.v2+json" http.request.host="172.30.73.7:5000" http.request.id=75bcee29-cfe8-4cc1-b2d5-6c857b9bbac0 http.request.method=PUT http.request.remoteaddr="10.130.0.1:53848" http.request.uri="/v2/project34/httpd-example-0/manifests/latest" http.request.useragent="Go-http-client/1.1" trace.duration=17.915026ms trace.file="/go/src/github.com/docker/distribution/registry/storage/driver/base/base.go" trace.func="github.com/docker/distribution/registry/storage/driver/base.(*Base).GetContent" trace.id=0635a6c3-8ca5-4c7e-8b68-95a053b6c694 trace.line=95 vars.name="project34/httpd-example-0" vars.reference=latest

time="2020-03-21T14:56:21.708862566Z" level=debug msg="s3aws.Stat("/docker/registry/v2/blobs/sha256/f1/f1e56db67514d64aacc14367d514a44098bcafe117d4039444b90d1ea76c8fb4/data")" go.version=go1.13.4 http.request.contenttype="application/vnd.docker.distribution.manifest.v2+json" http.request.host="172.30.73.7:5000" http.request.id=75bcee29-cfe8-4cc1-b2d5-6c857b9bbac0 http.request.method=PUT http.request.remoteaddr="10.130.0.1:53848" http.request.uri="/v2/project34/httpd-example-0/manifests/latest" http.request.useragent="Go-http-client/1.1" trace.duration=24.903232ms trace.file="/go/src/github.com/docker/distribution/registry/storage/driver/base/base.go" trace.func="github.com/docker/distribution/registry/storage/driver/base.(*Base).Stat" trace.id=513e6669-ff42-436d-925a-8014f9b2e206 trace.line=155 vars.name="project34/httpd-example-0" vars.reference=latest

time="2020-03-21T14:56:21.724017808Z" level=debug msg="s3aws.GetContent("/docker/registry/v2/repositories/project34/httpd-example-0/_layers/sha256/0e6748108ed650611fc6918b6319f0665398cb219bea0d8a9d23ba7a01b26a48/link")" go.version=go1.13.4 http.request.contenttype="application/vnd.docker.distribution.manifest.v2+json" http.request.host="172.30.73.7:5000" http.request.id=75bcee29-cfe8-4cc1-b2d5-6c857b9bbac0 http.request.method=PUT http.request.remoteaddr="10.130.0.1:53848" http.request.uri="/v2/project34/httpd-example-0/manifests/latest" http.request.useragent="Go-http-client/1.1" trace.duration=15.10567ms trace.file="/go/src/github.com/docker/distribution/registry/storage/driver/base/base.go" trace.func="github.com/docker/distribution/registry/storage/driver/base.(*Base).GetContent" trace.id=5bda04dd-fcc5-4da3-81ab-f980d0738167 trace.line=95 vars.name="project34/httpd-example-0" vars.reference=latest

time="2020-03-21T14:56:21.842753473Z" level=debug msg="s3aws.Stat("/docker/registry/v2/blobs/sha256/0e/0e6748108ed650611fc6918b6319f0665398cb219bea0d8a9d23ba7a01b26a48/data")" go.version=go1.13.4 http.request.contenttype="application/vnd.docker.distribution.manifest.v2+json" http.request.host="172.30.73.7:5000" http.request.id=75bcee29-cfe8-4cc1-b2d5-6c857b9bbac0 http.request.method=PUT http.request.remoteaddr="10.130.0.1:53848" http.request.uri="/v2/project34/httpd-example-0/manifests/latest" http.request.useragent="Go-http-client/1.1" trace.duration=118.683178ms trace.file="/go/src/github.com/docker/distribution/registry/storage/driver/base/base.go" trace.func="github.com/docker/distribution/registry/storage/driver/base.(*Base).Stat" trace.id=86b6ac74-4004-42c2-9ef4-f44dcd8740a8 trace.line=155 vars.name="project34/httpd-example-0" vars.reference=latest

time="2020-03-21T14:56:21.84283751Z" level=error msg="response completed with error" err.code="manifest blob unknown" err.detail=sha256:b97d26121a76202c69136d5426c485adebe3b190bb6ee30a316673cf18b73745 err.message="blob unknown to registry" go.version=go1.13.4 http.request.contenttype="application/vnd.docker.distribution.manifest.v2+json" http.request.host="172.30.73.7:5000" http.request.id=75bcee29-cfe8-4cc1-b2d5-6c857b9bbac0 http.request.method=PUT http.request.remoteaddr="10.130.0.1:53848" http.request.uri="/v2/project34/httpd-example-0/manifests/latest" http.request.useragent="Go-http-client/1.1" http.response.contenttype="application/json; charset=utf-8" http.response.duration=436.671418ms http.response.status=400 http.response.written=319 vars.name="project34/httpd-example-0" vars.reference=latest
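For anyone triaging the same failure: the registry's 400 response (319 bytes written, per the last log line) is a Docker Registry v2 error payload naming the digest of each blob the manifest references but the registry cannot find. A minimal sketch of extracting those digests, in Python; the JSON shape below is an assumption based on the err.code/err.detail fields in the log, not the literal body from this run:

```python
import json

# Hypothetical Docker Registry v2 error body for a manifest PUT that
# references a missing layer. The digest is taken from the err.detail
# field in the registry log above; the JSON structure is an assumed
# example, not the actual response body captured in this bug.
sample_body = json.dumps({
    "errors": [
        {
            "code": "MANIFEST_BLOB_UNKNOWN",
            "message": "blob unknown to registry",
            "detail": "sha256:b97d26121a76202c69136d5426c485adebe3b190bb6ee30a316673cf18b73745",
        }
    ]
})

def missing_blobs(body):
    """Return the digests of blobs the registry reported as unknown."""
    errors = json.loads(body).get("errors", [])
    return [e["detail"] for e in errors if e.get("code") == "MANIFEST_BLOB_UNKNOWN"]

print(missing_blobs(sample_body))
```

Each digest returned this way can then be looked up under /docker/registry/v2/blobs/ in the backing store to confirm whether the blob data is actually absent.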
Velero pod logs: http://css-storinator-02.css.lab.eng.rdu2.redhat.com/storage/Bugzilla_info/1816180/velero_logs_from_ocp_3_11_cluster
(In reply to spandura from comment #1)
> Velero pod logs:
> http://css-storinator-02.css.lab.eng.rdu2.redhat.com/storage/Bugzilla_info/1816180/velero_logs_from_ocp_3_11_cluster

Including all logs here: http://css-storinator-02.css.lab.eng.rdu2.redhat.com/storage/Bugzilla_info/1816180/
Would you please execute 'oc get pods -n openshift-migration' on the 3.11 cluster? You should see the pod 'registry-migplan-k4bpb-1-vvk8j'; then execute 'oc describe pod registry-migplan-k4bpb-1-vvk8j'.
Today we hit a similar problem; most likely the registry-migplan-k4bpb-1-vvk8j pod is in a failed status on the 3.11 cluster side.
(In reply to Xin jiang from comment #3)
> would you please execute command 'oc get pods -n openshift-migration' on
> 3.11 cluster? you should see pod 'registry-migplan-k4bpb-1-vvk8j', then
> execute 'oc describe pod registry-migplan-k4bpb-1-vvk8j'?

[root@dell-per630-05 ~]# oc get pods -n openshift-migration
NAME                                  READY   STATUS    RESTARTS   AGE
migration-operator-5997688469-984sg   2/2     Running   0          4h
registry-migplan-k4bpb-1-vvk8j        1/1     Running   0          15m
restic-5hnvm                          1/1     Running   0          21m
restic-c2ncd                          1/1     Running   0          21m
restic-fln9p                          1/1     Running   0          21m
restic-hvpsd                          1/1     Running   0          21m
restic-mkkqv                          1/1     Running   0          21m
restic-tshvr                          1/1     Running   0          21m
velero-6bc8b85bf-mvrt7                1/1     Running   0          21m
[root@dell-per630-05 ~]#

We have torn down the setup, so we don't have the output of the "describe" command. All the logs related to this are at http://css-storinator-02.css.lab.eng.rdu2.redhat.com/storage/Bugzilla_info/1816180/
*** Bug 1831614 has been marked as a duplicate of this bug. ***
@alay here's the new bug to focus on issues in the registry: bz1873586
I have a customer facing similar issues. I would like to clarify a bit what the errors "manifest unknown: manifest unknown" and "manifest blob unknown: blob unknown to registry" mean.

What I think is happening is that the migration tool goes through the images referenced in the imagestream tags and tries to pull them one after the other. For some of them it fails. It does not fail because it is sending a wrong command; it fails because the image manifest referenced in the imagestream (stored in etcd) either does not exist in the image registry ("manifest unknown: manifest unknown") or exists but references an image layer that does not exist or is corrupted ("manifest blob unknown: blob unknown to registry"). If you tried to pull the same image using podman or docker, you would get the same result.

I wish the migration tool would use skopeo rather than pulling and pushing images:
- it would be much quicker
- it is able to preserve sha digests
- it would not require starting an intermediary registry

In any case, it would really be better if the migration tool did not hang completely when it cannot pull an image. It should flag that image as failed and carry on.

To mitigate the issue I recommended that my customer aggressively prune objects before migrating a project. Besides speeding up the process, this may also remove most of the imagestream tags that are not in use. The ones that are in use will most probably be migrated successfully, since the migration tool uses the same pull command as the one used for deploying the image and running it as a container; if that pull command did not work, the image would not be running as a container in the first place.
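The diagnosis above (imagestream tags pointing at manifests or layers that are gone from the registry) can be pre-flight-checked before migrating a project by issuing a HEAD request for each referenced manifest against the registry's v2 API. A rough sketch, assuming plain HTTP to the internal registry service address; the function names and the optional bearer-token handling here are my own illustration, not part of any MTC tooling:

```python
from typing import Optional
import urllib.error
import urllib.request

def manifest_url(registry: str, repo: str, ref: str) -> str:
    """Build the Docker Registry v2 manifest URL for a tag or digest."""
    return f"http://{registry}/v2/{repo}/manifests/{ref}"

def tag_pullable(registry: str, repo: str, ref: str,
                 token: Optional[str] = None) -> bool:
    """HEAD the manifest: True if the registry knows it, False on 404.

    Any other HTTP error (401, 500, ...) is re-raised so auth or
    infrastructure problems are not silently treated as missing tags.
    """
    req = urllib.request.Request(manifest_url(registry, repo, ref),
                                 method="HEAD")
    req.add_header("Accept",
                   "application/vnd.docker.distribution.manifest.v2+json")
    if token:
        req.add_header("Authorization", "Bearer " + token)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise

# Example URL for the repository and tag from this bug report.
print(manifest_url("172.30.73.7:5000", "project34/httpd-example-0", "latest"))
```

Tags that fail this check are exactly the ones worth pruning (or expecting to fail) before starting the migration.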
ew
Closing this BZ against MTC: it's clear the underlying issue is related to a registry with a backing store on NFS, with a root cause that is outside the scope of MTC. In an effort to improve this, retry logic has been added to MTC to add some resilience to the transfer process, so that if this error does show up, a retry may be able to transparently resolve it. Additionally, since this bug was filed, MTC has added direct image migration (DIM), which may have lessened the impact of this issue. Please reopen with a comment if this continues to surface and there's more MTC-specific work to be done here.
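For readers wondering what retry logic buys here: a transient "blob unknown" during a push can succeed on a later attempt once the backing store catches up. The pattern is roughly the following generic exponential-backoff sketch; this is not MTC's actual implementation, just an illustration of the idea:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0, retriable=(IOError,)):
    """Call fn, retrying with exponential backoff on retriable errors.

    Not MTC's real code -- a sketch of the pattern described above,
    where a transient registry error may clear up on a later attempt.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retriable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))

# Usage sketch: a fake transfer that fails twice, then succeeds.
calls = {"n": 0}
def flaky_push():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("blob unknown to registry")
    return "pushed"

print(with_retries(flaky_push, attempts=5, base_delay=0))
```

The key design point is that only errors named in `retriable` trigger a retry; a persistent failure (such as a layer that is truly gone) still surfaces after the final attempt instead of hanging the migration.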
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days