Bug 1669096 - ImageStatus request got "Manifest does not match provided manifest" when digest is not equal to the sha256 id in name under /var/lib/containers/storage/overlay-images/images.json
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.1.0
Assignee: Miloslav Trmač
QA Contact: weiwei jiang
URL:
Whiteboard:
Duplicates: 1671178 (view as bug list)
Depends On:
Blocks: 1955657
 
Reported: 2019-01-24 10:42 UTC by weiwei jiang
Modified: 2021-04-30 15:07 UTC (History)
15 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Local storage of container images deduplicated them based on the "config" JSON, recording only the latest encountered manifest.
Consequence: If different representations of the same underlying image (same "config") were pulled to the same node, references to these images using a manifest digest (…@sha256:…) could fail with "Manifest does not match provided manifest digest…".
Fix: Local image storage now records each manifest individually, making it possible to refer to specific manifests using matching manifest digests.
Result: References to images using manifest digests now work as expected.
Clone Of:
: 1955657 (view as bug list)
Environment:
Last Closed: 2019-06-04 10:42:08 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
/var/lib/containers/storage/overlay-images/images.json (69.14 KB, text/plain)
2019-01-24 10:43 UTC, weiwei jiang


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 0 None None None 2019-06-04 10:42:23 UTC

Description weiwei jiang 2019-01-24 10:42:34 UTC
Description of problem:
$ sudo crictl -D  inspecti image-registry.openshift-image-registry.svc:5000/4c9r7/origin-ruby22-sample@sha256:ec8e0ade25fe2ddcdb4ec8ef3a36bda087f0a13c1a9fb5458879ad7dadc133f2                                                                         
DEBU[0000] ImageStatusRequest: &ImageStatusRequest{Image:&ImageSpec{Image:image-registry.openshift-image-registry.svc:5000/4c9r7/origin-ruby22-sample@sha256:ec8e0ade25fe2ddcdb4ec8ef3a36bda087f0a13c1a9fb5458879ad7dadc133f2,},Verbose:true,}                                 
DEBU[0000] ImageStatusResponse: nil
FATA[0000] image status for "image-registry.openshift-image-registry.svc:5000/4c9r7/origin-ruby22-sample@sha256:ec8e0ade25fe2ddcdb4ec8ef3a36bda087f0a13c1a9fb5458879ad7dadc133f2" request failed: rpc error: code = Unknown desc = Manifest does not match provided manifest digest sha256:ec8e0ade25fe2ddcdb4ec8ef3a36bda087f0a13c1a9fb5458879ad7dadc133f2
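The failure above is consistent with a check equivalent to hashing the manifest bytes that local storage actually kept and comparing the result to the digest in the image reference. A minimal Python sketch (illustrative only, not the actual CRI-O/containers-storage Go code; the sample manifest bytes are made up) shows why deduplicating by "config" and keeping only one manifest representation breaks such references:

```python
import hashlib
import json

def manifest_digest(manifest_bytes: bytes) -> str:
    """An image's manifest digest is the sha256 of its exact serialized bytes."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()

# Two byte-for-byte different representations of the "same" manifest
# (e.g. different whitespace or schema) hash to different digests.
m1 = b'{"schemaVersion": 2, "config": {"digest": "sha256:abc"}}'
m2 = json.dumps(json.loads(m1), separators=(",", ":")).encode()  # re-serialized

d1, d2 = manifest_digest(m1), manifest_digest(m2)
print(d1 == d2)  # False

# If storage kept only m2 but the pod references the image by d1,
# the digest check fails exactly as in the log above.
requested, stored = d1, m2
if manifest_digest(stored) != requested:
    print(f"Manifest does not match provided manifest digest {requested}")
```

This is why the bug only reproduces "sometimes": it needs two different representations of the same underlying image to land on the same node.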

Version-Release number of selected component (if applicable):
$ rpm -qa|grep -i cri-
cri-o-1.12.4-4.rhaos4.0.git8ecb249.el7.x86_64
cri-tools-1.12.0-1.rhaos4.0.git5a01d85.el7.x86_64

$ oc version 
oc v4.0.0-0.144.0
kubernetes v1.11.0+1e2c515a4e
features: Basic-Auth GSSAPI Kerberos SPNEGO

$ oc get nodes -o wide 
NAME                                         STATUS    ROLES     AGE       VERSION              INTERNAL-IP    EXTERNAL-IP      OS-IMAGE             KERNEL-VERSION              CONTAINER-RUNTIME
ip-10-0-139-220.us-east-2.compute.internal   Ready     worker    1d        v1.11.0+1e2c515a4e   10.0.139.220   <none>           Red Hat CoreOS 4.0   3.10.0-957.1.3.el7.x86_64   cri-o://1.12.4-4.rhaos4.0.git8ecb249.el7
ip-10-0-157-104.us-east-2.compute.internal   Ready     worker    1d        v1.11.0+1e2c515a4e   10.0.157.104   <none>           Red Hat CoreOS 4.0   3.10.0-957.1.3.el7.x86_64   cri-o://1.12.4-4.rhaos4.0.git8ecb249.el7
ip-10-0-167-30.us-east-2.compute.internal    Ready     worker    1d        v1.11.0+1e2c515a4e   10.0.167.30    <none>           Red Hat CoreOS 4.0   3.10.0-957.1.3.el7.x86_64   cri-o://1.12.4-4.rhaos4.0.git8ecb249.el7
ip-10-0-23-48.us-east-2.compute.internal     Ready     master    1d        v1.11.0+1e2c515a4e   10.0.23.48     3.16.255.16      Red Hat CoreOS 4.0   3.10.0-957.1.3.el7.x86_64   cri-o://1.12.4-4.rhaos4.0.git8ecb249.el7
ip-10-0-37-217.us-east-2.compute.internal    Ready     master    1d        v1.11.0+1e2c515a4e   10.0.37.217    18.224.190.214   Red Hat CoreOS 4.0   3.10.0-957.1.3.el7.x86_64   cri-o://1.12.4-4.rhaos4.0.git8ecb249.el7
ip-10-0-8-244.us-east-2.compute.internal     Ready     master    1d        v1.11.0+1e2c515a4e   10.0.8.244     18.188.5.52      Red Hat CoreOS 4.0   3.10.0-957.1.3.el7.x86_64   cri-o://1.12.4-4.rhaos4.0.git8ecb249.el7


How reproducible:
Sometimes

Steps to Reproduce (not stable):
1. oc new-app -f https://raw.githubusercontent.com/openshift/origin/master/examples/sample-app/application-template-stibuild.json

Actual results:
Pod got CreateContainerError during SyncPod


Expected results:
Pod should be running

Additional info:
/var/lib/containers/storage/overlay-images/images.json is attached

Comment 1 weiwei jiang 2019-01-24 10:43:45 UTC
Created attachment 1523054 [details]
/var/lib/containers/storage/overlay-images/images.json

Comment 2 weiwei jiang 2019-01-24 10:46:24 UTC
Seems like the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1546324

Comment 4 Miloslav Trmač 2019-01-28 15:47:07 UTC
Is there a consistent reproducer? If it can be reproduced only sometimes, as the report says, would it be possible to give me access to the registry when it does happen?

Comment 7 Seth Jennings 2019-01-31 01:37:35 UTC
*** Bug 1671178 has been marked as a duplicate of this bug. ***

Comment 9 Colin Walters 2019-02-08 16:09:46 UTC
This is a constant pain for my development workflow today.  I hit this while developing containers that run on the master.

What I've been doing to work around this is:

# To ensure we can kill pods referencing the corrupted images
# and we won't race with the kubelet to schedule them back
$ kubectl taint nodes osiris-master-0 walters=foo:NoSchedule
$ oc delete pods/xxx

(ssh to node)
# podman images -q | xargs podman rmi

# And remove the taint, allowing the pod to get rescheduled
$ kubectl taint nodes osiris-master-0 walters:NoSchedule-

Comment 15 weiwei jiang 2019-02-14 08:31:57 UTC
Checked with 
# cat /etc/os-release 
NAME="Red Hat CoreOS"
VERSION="4.0"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION_ID="4.0"
PRETTY_NAME="Red Hat CoreOS 4.0"
ANSI_COLOR="0;31"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat 7"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.0"
REDHAT_SUPPORT_PRODUCT="Red Hat"
REDHAT_SUPPORT_PRODUCT_VERSION="4.0"
OSTREE_VERSION=47.315

# rpm -qa|grep -i cri-o
cri-o-1.12.5-6.rhaos4.0.git80d1487.el7.x86_64

And the issue has been fixed now.


# cat /var/lib/containers/storage/overlay-images/images.json |python -m json.tool
<------------snip--------->
    {
        "big-data-digests": {
            "manifest": "sha256:d68f85b5ca3adccdc2f4a4c5263f1792798ed44a9b1d63a96004b6e283dc338d",
            "manifest-sha256:d68f85b5ca3adccdc2f4a4c5263f1792798ed44a9b1d63a96004b6e283dc338d": "sha256:d68f85b5ca3adccdc2f4a4c5263f1792798ed44a9b1d63a96004b6e283dc338d"
        },
        "big-data-names": [
            "manifest-sha256:d68f85b5ca3adccdc2f4a4c5263f1792798ed44a9b1d63a96004b6e283dc338d",
            "manifest"
        ],
        "big-data-sizes": {
            "manifest": 3862,
            "manifest-sha256:d68f85b5ca3adccdc2f4a4c5263f1792798ed44a9b1d63a96004b6e283dc338d": 3862
        },
        "created": "2018-11-02T18:24:31.956261005Z",
        "digest": "sha256:d68f85b5ca3adccdc2f4a4c5263f1792798ed44a9b1d63a96004b6e283dc338d",
        "id": "b02de22ff740f0bfa7e5dde5aa1a8169051375a5f0c69c28fafefc9408f72b06",
        "layer": "b6f3704e8e7510f3cd5d5d2b439ecc72815f85b02c608bf370d82636cccab6ca",
        "metadata": "{}",
        "names": [
            "quay.io/coreos/kube-client-agent:36c62ccd7b16b522450c61e96fc556b217ee24f5"
        ]
    },
<------------snip--------->
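The per-digest "manifest-sha256:…" big-data name in the record above is what the fix adds: each manifest is stored under a key derived from its own digest, alongside the legacy "manifest" entry. A small sketch (plain Python for illustration, not the actual containers/storage Go code) of how a reference by manifest digest can now be resolved exactly:

```python
# Digest and big-data names copied from the images.json record above.
digest = "sha256:d68f85b5ca3adccdc2f4a4c5263f1792798ed44a9b1d63a96004b6e283dc338d"
record = {
    "id": "b02de22ff740f0bfa7e5dde5aa1a8169051375a5f0c69c28fafefc9408f72b06",
    "digest": digest,
    "big-data-names": ["manifest-" + digest, "manifest"],
}

def find_manifest_key(record: dict, wanted: str):
    """Return the big-data key holding the manifest with exactly this digest,
    or None if no matching manifest was recorded for the image."""
    key = "manifest-" + wanted
    return key if key in record["big-data-names"] else None

print(find_manifest_key(record, digest))            # the per-digest entry
print(find_manifest_key(record, "sha256:" + "0" * 64))  # None
```

With the old layout only the single "manifest" entry existed, so a lookup by digest could only succeed if the last-pulled representation happened to match.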

# crictl -D inspecti quay.io/coreos/kube-client-agent:36c62ccd7b16b522450c61e96fc556b217ee24f5
DEBU[0000] ImageStatusRequest: &ImageStatusRequest{Image:&ImageSpec{Image:quay.io/coreos/kube-client-agent:36c62ccd7b16b522450c61e96fc556b217ee24f5,},Verbose:true,} 
DEBU[0000] ImageStatusResponse: &ImageStatusResponse{Image:&Image{Id:b02de22ff740f0bfa7e5dde5aa1a8169051375a5f0c69c28fafefc9408f72b06,RepoTags:[quay.io/coreos/kube-client-agent:36c62ccd7b16b522450c61e96fc556b217ee24f5],RepoDigests:[quay.io/coreos/kube-client-agent@sha256:d68f85b5ca3adccdc2f4a4c5263f1792798ed44a9b1d63a96004b6e283dc338d],Size_:33806892,Uid:nil,Username:,},Info:map[string]string{},} 
{
  "status": {
    "id": "b02de22ff740f0bfa7e5dde5aa1a8169051375a5f0c69c28fafefc9408f72b06",
    "repoTags": [
      "quay.io/coreos/kube-client-agent:36c62ccd7b16b522450c61e96fc556b217ee24f5"
    ],
    "repoDigests": [
      "quay.io/coreos/kube-client-agent@sha256:d68f85b5ca3adccdc2f4a4c5263f1792798ed44a9b1d63a96004b6e283dc338d"
    ],
    "size": "33806892",
    "uid": null,
    "username": ""
  }
}

Comment 18 errata-xmlrpc 2019-06-04 10:42:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

