Bug 1669096

Summary: ImageStatus request got "Manifest does not match provided manifest" when digest is not equal to the sha256 id in name under /var/lib/containers/storage/overlay-images/images.json
Product: OpenShift Container Platform Reporter: weiwei jiang <wjiang>
Component: Containers Assignee: Miloslav Trmač <mitr>
Status: CLOSED ERRATA QA Contact: weiwei jiang <wjiang>
Severity: high Docs Contact:
Priority: high    
Version: 4.1.0 CC: amurdaca, aos-bugs, dwalsh, eparis, jokerman, mitr, mmccomas, mpatel, nalin, sponnaga, wabouham, walters, xtian, xxia, yinzhou
Target Milestone: ---   
Target Release: 4.1.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Local storage of container images deduplicated images based on the "config" JSON and recorded only the most recently encountered manifest.
Consequence: If different representations of the same underlying image (same "config") were pulled to the same node, references to these images by manifest digest (…@sha256:…) could fail with "Manifest does not match provided manifest digest…"
Fix: Local storage of container images now records each manifest individually, making it possible to refer to specific manifests using matching manifest digests.
Result: References to images using manifest digests now work as expected.
Story Points: ---
Clone Of:
: 1955657 Environment:
Last Closed: 2019-06-04 10:42:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1955657    
Attachments:
Description Flags
/var/lib/containers/storage/overlay-images/images.json none

Description weiwei jiang 2019-01-24 10:42:34 UTC
Description of problem:
$ sudo crictl -D  inspecti image-registry.openshift-image-registry.svc:5000/4c9r7/origin-ruby22-sample@sha256:ec8e0ade25fe2ddcdb4ec8ef3a36bda087f0a13c1a9fb5458879ad7dadc133f2                                                                         
DEBU[0000] ImageStatusRequest: &ImageStatusRequest{Image:&ImageSpec{Image:image-registry.openshift-image-registry.svc:5000/4c9r7/origin-ruby22-sample@sha256:ec8e0ade25fe2ddcdb4ec8ef3a36bda087f0a13c1a9fb5458879ad7dadc133f2,},Verbose:true,}                                 
DEBU[0000] ImageStatusResponse: nil
FATA[0000] image status for "image-registry.openshift-image-registry.svc:5000/4c9r7/origin-ruby22-sample@sha256:ec8e0ade25fe2ddcdb4ec8ef3a36bda087f0a13c1a9fb5458879ad7dadc133f2" request failed: rpc error: code = Unknown desc = Manifest does not match provided manifest digest sha256:ec8e0ade25fe2ddcdb4ec8ef3a36bda087f0a13c1a9fb5458879ad7dadc133f2
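
The Doc Text field at the top of this report attributes this error to local storage keeping only the latest manifest for each image "config". The following is a toy Python model of that explanation, not CRI-O or containers/storage code: the class names, the "cfg-1" identifier, and the two manifest byte strings are all hypothetical and exist only for illustration.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the sha256:<hex> digest of raw manifest bytes."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


class LatestOnlyStore:
    """Toy model: one record per config, keeping only the latest manifest."""

    def __init__(self):
        self.manifest_by_config = {}

    def pull(self, config_id: str, manifest: bytes) -> None:
        # A later pull of the same config overwrites the earlier manifest.
        self.manifest_by_config[config_id] = manifest

    def lookup(self, config_id: str, wanted: str) -> bytes:
        manifest = self.manifest_by_config[config_id]
        if digest(manifest) != wanted:
            raise ValueError(
                "Manifest does not match provided manifest digest " + wanted)
        return manifest


class PerManifestStore:
    """Toy model: every manifest is kept, keyed by its own digest."""

    def __init__(self):
        self.manifests_by_config = {}

    def pull(self, config_id: str, manifest: bytes) -> None:
        by_digest = self.manifests_by_config.setdefault(config_id, {})
        by_digest[digest(manifest)] = manifest

    def lookup(self, config_id: str, wanted: str) -> bytes:
        return self.manifests_by_config[config_id][wanted]


# Two hypothetical manifests that describe the same config.
manifest_a = b'{"schemaVersion": 2, "config": "cfg-1", "note": "representation A"}'
manifest_b = b'{"schemaVersion": 2, "config": "cfg-1", "note": "representation B"}'

old, new = LatestOnlyStore(), PerManifestStore()
for store in (old, new):
    store.pull("cfg-1", manifest_a)
    store.pull("cfg-1", manifest_b)

# The per-manifest model can still resolve the first digest.
print(new.lookup("cfg-1", digest(manifest_a)) == manifest_a)

# The latest-only model has lost manifest_a, so the digest check fails.
try:
    old.lookup("cfg-1", digest(manifest_a))
except ValueError as err:
    print("latest-only model fails:", err)
```

In this model the failure depends on pull order, which would be consistent with the issue reproducing only sometimes.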

Version-Release number of selected component (if applicable):
$ rpm -qa|grep -i cri-
cri-o-1.12.4-4.rhaos4.0.git8ecb249.el7.x86_64
cri-tools-1.12.0-1.rhaos4.0.git5a01d85.el7.x86_64

$ oc version 
oc v4.0.0-0.144.0
kubernetes v1.11.0+1e2c515a4e
features: Basic-Auth GSSAPI Kerberos SPNEGO

$ oc get nodes -o wide 
NAME                                         STATUS    ROLES     AGE       VERSION              INTERNAL-IP    EXTERNAL-IP      OS-IMAGE             KERNEL-VERSION              CONTAINER-RUNTIME
ip-10-0-139-220.us-east-2.compute.internal   Ready     worker    1d        v1.11.0+1e2c515a4e   10.0.139.220   <none>           Red Hat CoreOS 4.0   3.10.0-957.1.3.el7.x86_64   cri-o://1.12.4-4.rhaos4.0.git8ecb249.el7
ip-10-0-157-104.us-east-2.compute.internal   Ready     worker    1d        v1.11.0+1e2c515a4e   10.0.157.104   <none>           Red Hat CoreOS 4.0   3.10.0-957.1.3.el7.x86_64   cri-o://1.12.4-4.rhaos4.0.git8ecb249.el7
ip-10-0-167-30.us-east-2.compute.internal    Ready     worker    1d        v1.11.0+1e2c515a4e   10.0.167.30    <none>           Red Hat CoreOS 4.0   3.10.0-957.1.3.el7.x86_64   cri-o://1.12.4-4.rhaos4.0.git8ecb249.el7
ip-10-0-23-48.us-east-2.compute.internal     Ready     master    1d        v1.11.0+1e2c515a4e   10.0.23.48     3.16.255.16      Red Hat CoreOS 4.0   3.10.0-957.1.3.el7.x86_64   cri-o://1.12.4-4.rhaos4.0.git8ecb249.el7
ip-10-0-37-217.us-east-2.compute.internal    Ready     master    1d        v1.11.0+1e2c515a4e   10.0.37.217    18.224.190.214   Red Hat CoreOS 4.0   3.10.0-957.1.3.el7.x86_64   cri-o://1.12.4-4.rhaos4.0.git8ecb249.el7
ip-10-0-8-244.us-east-2.compute.internal     Ready     master    1d        v1.11.0+1e2c515a4e   10.0.8.244     18.188.5.52      Red Hat CoreOS 4.0   3.10.0-957.1.3.el7.x86_64   cri-o://1.12.4-4.rhaos4.0.git8ecb249.el7


How reproducible:
Sometimes

Steps to Reproduce (not stable):
1. oc new-app -f https://raw.githubusercontent.com/openshift/origin/master/examples/sample-app/application-template-stibuild.json

Actual results:
Pod got CreateContainerError during SyncPod


Expected results:
Pod should be running

Additional info:
/var/lib/containers/storage/overlay-images/images.json is attached

Comment 1 weiwei jiang 2019-01-24 10:43:45 UTC
Created attachment 1523054
/var/lib/containers/storage/overlay-images/images.json

Comment 2 weiwei jiang 2019-01-24 10:46:24 UTC
This seems to be the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1546324

Comment 4 Miloslav Trmač 2019-01-28 15:47:07 UTC
Is there a consistent reproducer? If it can be reproduced only sometimes, as the report says, would it be possible to give me access to the registry when it does happen?

Comment 7 Seth Jennings 2019-01-31 01:37:35 UTC
*** Bug 1671178 has been marked as a duplicate of this bug. ***

Comment 9 Colin Walters 2019-02-08 16:09:46 UTC
This is a constant pain for my development workflow today.  I hit this while developing containers that run on the master.

What I've been doing to work around this is:

# To ensure we can kill pods referencing the corrupted images
# and we won't race with the kubelet to schedule them back
$ kubectl taint nodes osiris-master-0 walters=foo:NoSchedule
$ oc delete pods/xxx

(ssh to node)
# podman images -q | xargs podman rmi

# And remove the taint, allowing the pod to get rescheduled
$ kubectl taint nodes osiris-master-0 walters:NoSchedule-
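
For repeated use, the workaround above could be captured as a dry-run helper. This is only a sketch: the function name and its parameters are hypothetical, it prints the commands from this comment rather than executing them, and the podman step still has to be run on the node itself. Note that the podman step removes every local image on that node, not only the affected one.

```python
import shlex


def workaround_commands(node: str, pod: str,
                        taint_key: str = "walters",
                        taint_value: str = "foo") -> list:
    """Return the workaround steps, in order, as argument lists."""
    return [
        # Keep new pods off the node while the images are cleaned up.
        ["kubectl", "taint", "nodes", node,
         f"{taint_key}={taint_value}:NoSchedule"],
        # Delete the pod that references the affected image.
        ["oc", "delete", f"pods/{pod}"],
        # Run this one on the node (after ssh): removes ALL local images.
        ["sh", "-c", "podman images -q | xargs podman rmi"],
        # Remove the taint so the pod can be rescheduled.
        ["kubectl", "taint", "nodes", node, f"{taint_key}:NoSchedule-"],
    ]


for command in workaround_commands("osiris-master-0", "xxx"):
    print(shlex.join(command))
```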

Comment 15 weiwei jiang 2019-02-14 08:31:57 UTC
Checked with 
# cat /etc/os-release 
NAME="Red Hat CoreOS"
VERSION="4.0"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION_ID="4.0"
PRETTY_NAME="Red Hat CoreOS 4.0"
ANSI_COLOR="0;31"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat 7"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.0"
REDHAT_SUPPORT_PRODUCT="Red Hat"
REDHAT_SUPPORT_PRODUCT_VERSION="4.0"
OSTREE_VERSION=47.315

# rpm -qa|grep -i cri-o
cri-o-1.12.5-6.rhaos4.0.git80d1487.el7.x86_64

And the issue has been fixed now.


# cat /var/lib/containers/storage/overlay-images/images.json |python -m json.tool
<------------snip--------->
    {
        "big-data-digests": {
            "manifest": "sha256:d68f85b5ca3adccdc2f4a4c5263f1792798ed44a9b1d63a96004b6e283dc338d",
            "manifest-sha256:d68f85b5ca3adccdc2f4a4c5263f1792798ed44a9b1d63a96004b6e283dc338d": "sha256:d68f85b5ca3adccdc2f4a4c5263f1792798ed44a9b1d63a96004b6e283dc338d"
        },
        "big-data-names": [
            "manifest-sha256:d68f85b5ca3adccdc2f4a4c5263f1792798ed44a9b1d63a96004b6e283dc338d",
            "manifest"
        ],
        "big-data-sizes": {
            "manifest": 3862,
            "manifest-sha256:d68f85b5ca3adccdc2f4a4c5263f1792798ed44a9b1d63a96004b6e283dc338d": 3862
        },
        "created": "2018-11-02T18:24:31.956261005Z",
        "digest": "sha256:d68f85b5ca3adccdc2f4a4c5263f1792798ed44a9b1d63a96004b6e283dc338d",
        "id": "b02de22ff740f0bfa7e5dde5aa1a8169051375a5f0c69c28fafefc9408f72b06",
        "layer": "b6f3704e8e7510f3cd5d5d2b439ecc72815f85b02c608bf370d82636cccab6ca",
        "metadata": "{}",
        "names": [
            "quay.io/coreos/kube-client-agent:36c62ccd7b16b522450c61e96fc556b217ee24f5"
        ]
    },
<------------snip--------->

# crictl -D inspecti quay.io/coreos/kube-client-agent:36c62ccd7b16b522450c61e96fc556b217ee24f5
DEBU[0000] ImageStatusRequest: &ImageStatusRequest{Image:&ImageSpec{Image:quay.io/coreos/kube-client-agent:36c62ccd7b16b522450c61e96fc556b217ee24f5,},Verbose:true,} 
DEBU[0000] ImageStatusResponse: &ImageStatusResponse{Image:&Image{Id:b02de22ff740f0bfa7e5dde5aa1a8169051375a5f0c69c28fafefc9408f72b06,RepoTags:[quay.io/coreos/kube-client-agent:36c62ccd7b16b522450c61e96fc556b217ee24f5],RepoDigests:[quay.io/coreos/kube-client-agent@sha256:d68f85b5ca3adccdc2f4a4c5263f1792798ed44a9b1d63a96004b6e283dc338d],Size_:33806892,Uid:nil,Username:,},Info:map[string]string{},} 
{
  "status": {
    "id": "b02de22ff740f0bfa7e5dde5aa1a8169051375a5f0c69c28fafefc9408f72b06",
    "repoTags": [
      "quay.io/coreos/kube-client-agent:36c62ccd7b16b522450c61e96fc556b217ee24f5"
    ],
    "repoDigests": [
      "quay.io/coreos/kube-client-agent@sha256:d68f85b5ca3adccdc2f4a4c5263f1792798ed44a9b1d63a96004b6e283dc338d"
    ],
    "size": "33806892",
    "uid": null,
    "username": ""
  }
}
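
A small Python sketch for checking which manifest digests an images.json record carries. It assumes the file is a JSON array of records shaped like the one quoted in this comment, and that the "manifest" and "manifest-<digest>" keys under "big-data-digests" hold manifest digests. That reading is inferred from the snippet here, not taken from containers/storage documentation, and the function names are hypothetical.

```python
import json


def manifest_digests(image: dict) -> set:
    """Collect digests stored under 'manifest' and 'manifest-*' keys."""
    found = set()
    for name, value in image.get("big-data-digests", {}).items():
        if name == "manifest" or name.startswith("manifest-"):
            found.add(value)
    return found


def images_matching(images: list, wanted: str) -> list:
    """Return the ids of records that carry the wanted manifest digest."""
    return [img["id"] for img in images if wanted in manifest_digests(img)]


# Trimmed copy of the record quoted in this comment.
SAMPLE = json.loads("""
[
  {
    "big-data-digests": {
      "manifest": "sha256:d68f85b5ca3adccdc2f4a4c5263f1792798ed44a9b1d63a96004b6e283dc338d",
      "manifest-sha256:d68f85b5ca3adccdc2f4a4c5263f1792798ed44a9b1d63a96004b6e283dc338d": "sha256:d68f85b5ca3adccdc2f4a4c5263f1792798ed44a9b1d63a96004b6e283dc338d"
    },
    "digest": "sha256:d68f85b5ca3adccdc2f4a4c5263f1792798ed44a9b1d63a96004b6e283dc338d",
    "id": "b02de22ff740f0bfa7e5dde5aa1a8169051375a5f0c69c28fafefc9408f72b06",
    "names": [
      "quay.io/coreos/kube-client-agent:36c62ccd7b16b522450c61e96fc556b217ee24f5"
    ]
  }
]
""")

WANTED = "sha256:d68f85b5ca3adccdc2f4a4c5263f1792798ed44a9b1d63a96004b6e283dc338d"
print(images_matching(SAMPLE, WANTED))
```

On a node, SAMPLE could be replaced with the result of json.load() on /var/lib/containers/storage/overlay-images/images.json, and WANTED with the digest from the failing image reference.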

Comment 18 errata-xmlrpc 2019-06-04 10:42:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758