Bug 1816180
| Summary: | Image migration fails with: manifest blob unknown: blob unknown to registry | | |
|---|---|---|---|
| Product: | Migration Toolkit for Containers | Reporter: | spandura |
| Component: | General | Assignee: | Scott Seago <sseago> |
| Status: | CLOSED NOTABUG | QA Contact: | Xin jiang <xjiang> |
| Severity: | low | Docs Contact: | Avital Pinnick <apinnick> |
| Priority: | low | | |
| Version: | 1.3.0 | CC: | alpatel, apjagtap, chezhang, ernelson, fgiloux, jmatthew, mberube, mduasope, rbost, rjohnson, sregidor, sseago, whu |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1831614 | Environment: | |
| Last Closed: | 2021-06-30 15:20:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1831614 | | |
| Bug Blocks: | 1759663 | | |
Description
spandura
2020-03-23 13:55:17 UTC
Velero pod logs: http://css-storinator-02.css.lab.eng.rdu2.redhat.com/storage/Bugzilla_info/1816180/velero_logs_from_ocp_3_11_cluster

(In reply to spandura from comment #1)
> Velero pod logs:
> http://css-storinator-02.css.lab.eng.rdu2.redhat.com/storage/Bugzilla_info/1816180/velero_logs_from_ocp_3_11_cluster

Including all logs here: http://css-storinator-02.css.lab.eng.rdu2.redhat.com/storage/Bugzilla_info/1816180/

Would you please execute the command 'oc get pods -n openshift-migration' on the 3.11 cluster? You should see the pod 'registry-migplan-k4bpb-1-vvk8j'; then execute 'oc describe pod registry-migplan-k4bpb-1-vvk8j'. Today we hit a similar problem; the registry-migplan-k4bpb-1-vvk8j pod is probably in a failed status on the 3.11 cluster side.

(In reply to Xin jiang from comment #3)
> would you please execute command 'oc get pods -n openshift-migration' on
> 3.11 cluster? you should see pod 'registry-migplan-k4bpb-1-vvk8j', then
> execute 'oc describe pod registry-migplan-k4bpb-1-vvk8j'?

[root@dell-per630-05 ~]# oc get pods -n openshift-migration
NAME                                  READY   STATUS    RESTARTS   AGE
migration-operator-5997688469-984sg   2/2     Running   0          4h
registry-migplan-k4bpb-1-vvk8j        1/1     Running   0          15m
restic-5hnvm                          1/1     Running   0          21m
restic-c2ncd                          1/1     Running   0          21m
restic-fln9p                          1/1     Running   0          21m
restic-hvpsd                          1/1     Running   0          21m
restic-mkkqv                          1/1     Running   0          21m
restic-tshvr                          1/1     Running   0          21m
velero-6bc8b85bf-mvrt7                1/1     Running   0          21m
[root@dell-per630-05 ~]#

We have torn down the setup and do not have the output of the "describe" command. All the logs related to this are at http://css-storinator-02.css.lab.eng.rdu2.redhat.com/storage/Bugzilla_info/1816180/

*** Bug 1831614 has been marked as a duplicate of this bug. ***

@alay here's the new bug to focus on issues in the registry: bz1873586

I have a customer facing similar issues. I would like to clarify what the errors "manifest unknown: manifest unknown" and "manifest blob unknown: blob unknown to registry" mean.
What I think is happening is that the migration tool goes through the images referenced in the imagestream tags and tries to pull them one after the other. For some of them it fails. It does not fail because it is sending a wrong command; it fails because the image manifest referenced in the imagestream (stored in etcd) either does not exist in the image registry ("manifest unknown: manifest unknown"), or exists but references an image layer that does not exist or is corrupted ("manifest blob unknown: blob unknown to registry"). If you tried to pull the same image using podman or docker you would get the same result.

I wish the migration tool used skopeo rather than pulling and pushing images:
- it would be way quicker
- it is able to preserve sha digests
- it would not require starting an intermediary registry

In any case, it would really be better if the migration tool did not hang completely when it cannot pull an image. It should flag the image as failed and carry on.

To mitigate the issue I recommended that my customer aggressively prune objects before migrating a project. Besides speeding up the process, this may also remove most of the imagestream tags that are not in use. The ones that are in use will most probably be migrated successfully, as the migration tool uses the same pull command that is used for deploying the image and running it as a container. If the pull command did not work, the image would not be running as a container.

Closing this BZ against MTC; it's clear the underlying issue is related to a registry with a backing store on NFS, with a root cause that is outside the scope of MTC. In an effort to improve this, retry logic has been added to MTC to add some resilience to the transfer process, so that if this error does show up, a retry may be able to resolve it transparently. Additionally, since this was last filed, MTC has added direct image migration (DIM), which may have lessened the impact of this.
Please reopen with a comment if this continues to surface and there's more work to be done here, specific to MTC.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.
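For readers who want a concrete picture of the two ideas raised in this thread (retrying a failed transfer, and flagging an image as failed instead of hanging), here is a minimal Python sketch. It is not MTC's actual implementation. The `pull` callable, the `copy_images` and `classify_registry_error` names, the use of `RuntimeError`, and the backoff values are all hypothetical; only the two error strings come from this bug.

```python
import time
from typing import Callable, Dict, Iterable

# Error substrings quoted in this bug, mapped to the explanation given in the
# comments above. The wording of the explanations is a paraphrase.
KNOWN_REGISTRY_ERRORS: Dict[str, str] = {
    "manifest blob unknown": (
        "the manifest exists but references a layer that is missing or corrupted"
    ),
    "manifest unknown": (
        "the imagestream tag points at a manifest the registry does not have"
    ),
}


def classify_registry_error(message: str) -> str:
    """Return a human-readable explanation for a registry error message."""
    lowered = message.lower()
    for needle, meaning in KNOWN_REGISTRY_ERRORS.items():
        if needle in lowered:
            return meaning
    return "unrecognized registry error"


def copy_images(
    refs: Iterable[str],
    pull: Callable[[str], None],  # hypothetical: raises RuntimeError on failure
    attempts: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Try every image; retry failures, then record them and carry on.

    Returns a mapping of image reference -> explanation for each image that
    still failed after the last attempt.
    """
    failed: Dict[str, str] = {}
    for ref in refs:
        for attempt in range(1, attempts + 1):
            try:
                pull(ref)
                failed.pop(ref, None)  # a retry succeeded, clear the flag
                break
            except RuntimeError as exc:
                failed[ref] = classify_registry_error(str(exc))
                if attempt < attempts:
                    # Exponential backoff between attempts (assumed policy).
                    sleep(delay * 2 ** (attempt - 1))
    return failed
```

A caller would pass its own pull function and inspect the returned mapping after the loop finishes. A transient failure disappears on retry, while an image whose manifest is genuinely missing ends up in the mapping without blocking the remaining images.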