Bug 1845766

Summary: Image info should display information about images: pod "append-test" failed with reason: "", message: ""
Product: OpenShift Container Platform
Component: Image Registry
Version: 4.6
Target Release: 4.6.0
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Keywords: Reopened
Reporter: W. Trevor King <wking>
Assignee: Oleg Bulatov <obulatov>
QA Contact: Wenjing Zheng <wzheng>
CC: aos-bugs, pasik
Doc Type: If docs needed, set a value
Environment: operator.Create the release image "latest" containing all images built by this job
Type: Bug
Last Closed: 2020-10-27 16:06:02 UTC

Description W. Trevor King 2020-06-10 02:33:23 UTC
test:
[sig-imageregistry][Feature:ImageInfo] Image info should display information about images [Suite:openshift/conformance/parallel]

is failing frequently in CI; see the search results:
$ w3m -dump -cols 200 'https://search.svc.ci.openshift.org/?maxAge=48h&name=release-openshift-&search=pod%20%22append-test%22%20failed%20with%20reason:%20%22%22,%20message:%20%22%22' | grep 'failures match'
release-openshift-origin-installer-e2e-gcp-4.4 - 47 runs, 38% failed, 6% of failures match
release-openshift-ocp-installer-e2e-aws-4.1 - 3 runs, 67% failed, 50% of failures match
release-openshift-ocp-installer-e2e-azure-4.6 - 5 runs, 60% failed, 33% of failures match

Example job [1], which fails with:

fail [k8s.io/kubernetes/test/e2e/framework/pods.go:200]: wait for pod "append-test" to success
Expected success, but got an error:
    <*errors.errorString | 0xc0023ca2e0>: {
        s: "pod \"append-test\" failed with reason: \"\", message: \"\"",
    }
    pod "append-test" failed with reason: "", message: ""

[1]: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-4.6/74
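
To see which jobs this signature hits hardest, the "failures match" lines from the search output above can be ranked by match rate. This is a minimal sketch, assuming the line format shown in that output stays the same; the function name rank_by_match_rate is hypothetical and not part of any CI tooling.

```shell
#!/usr/bin/env bash
# rank_by_match_rate: read "failures match" lines on stdin and print
# "<match-percent> <job-name>", highest match rate first.
# Assumed input format (taken from the search output in this bug):
#   <job> - <N> runs, <P>% failed, <M>% of failures match
rank_by_match_rate() {
  sed -E -n 's/^(.*) - [0-9]+ runs, [0-9]+% failed, ([0-9]+)% of failures match.*$/\2 \1/p' \
    | sort -rn
}

# Example using the three lines quoted in the description:
printf '%s\n' \
  'release-openshift-origin-installer-e2e-gcp-4.4 - 47 runs, 38% failed, 6% of failures match' \
  'release-openshift-ocp-installer-e2e-aws-4.1 - 3 runs, 67% failed, 50% of failures match' \
  'release-openshift-ocp-installer-e2e-azure-4.6 - 5 runs, 60% failed, 33% of failures match' \
  | rank_by_match_rate
```

Lines that do not match the assumed format are dropped rather than reported, so a change in the search page layout would show up as empty output.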

Comment 1 W. Trevor King 2020-06-10 03:07:35 UTC
Also submitted an upstream PR for the "to success" -> "to succeed" typo [1].

[1]: https://github.com/kubernetes/kubernetes/pull/91975

Comment 2 W. Trevor King 2020-06-10 03:18:13 UTC
Hah, apparently the job I picked (by failure message) actually failed a different test:

[sig-imageregistry][Feature:ImageAppend] Image append should create images by appending them [Suite:openshift/conformance/parallel]

Pulling logs for the failed pod:

$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-4.6/74/artifacts/e2e-azure/container-logs/test.log | grep -B5 'end of log for.*append-test'
+ oc image append --insecure --from docker.io/library/busybox:latest --to image-registry.openshift-image-registry.svc:5000/e2e-test-image-append-cwj54/test:busybox1 --image '{"Cmd":["/bin/sleep"]}'
Uploading 760.5kB ...
Pushed sha256:52a90165f18eed5d7652503ff7f71fd130e3a498b83f791af402b1f36cfa5b58 to image-registry.openshift-image-registry.svc:5000/e2e-test-image-append-cwj54/test:busybox1
+ oc create is test2
Unable to connect to the server: dial tcp 172.30.0.1:443: i/o timeout
<----end of log for "append-test"/"test"

So... an SDN error?

Comment 3 Oleg Bulatov 2020-06-10 09:24:55 UTC
172.30.0.1:443 is the kube-apiserver, so image append flakes should be assigned to the SDN or kube-apiserver team.

---

from https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.4/2590:

+ oc image info quay.io/coreos/etcd:latest
error: unable to connect to image repository quay.io/coreos/etcd:latest: endpoint "https://quay.io" does not support v2 API

---

from https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-compact-4.5/95:

+ mkdir -p /tmp/test
+ oc image extract --insecure image-registry.openshift-image-registry.svc:5000/e2e-test-image-extract-pnq8v/1:busybox --path=/:/tmp/test
error: image does not exist

---

So it sometimes fails because of quay.io problems, but the other failures need further investigation.
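
The three failure modes quoted in this bug so far can be told apart by their error strings. This is a rough sketch of such a triage step, assuming the pod log is available as plain text on stdin; the function name classify_failure and the labels it prints are hypothetical, and only the matched strings come from the logs above.

```shell
#!/usr/bin/env bash
# classify_failure: read a pod log on stdin and print a coarse cause label.
# The patterns are the error strings quoted in this bug; the labels
# (external-registry, apiserver-connectivity, image-missing) are made up
# for illustration.
classify_failure() {
  local log
  log=$(cat)
  case "$log" in
    *'does not support v2 API'*)
      echo 'external-registry' ;;
    *'Unable to connect to the server'*'i/o timeout'*)
      echo 'apiserver-connectivity' ;;
    *'error: image does not exist'*)
      echo 'image-missing' ;;
    *)
      echo 'unknown' ;;
  esac
}

# Example with the image extract failure:
printf '%s\n' '+ mkdir -p /tmp/test' 'error: image does not exist' | classify_failure
```

A log that contains more than one of these strings gets the first matching label, so the order of the patterns is a design choice rather than a statement about root cause.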

Comment 4 Oleg Bulatov 2020-06-19 08:37:26 UTC
All recent failures are caused by quay.io; there is nothing we can fix.

+ oc image info quay.io/coreos/etcd:latest
error: unable to connect to image repository quay.io/coreos/etcd:latest: endpoint "https://quay.io" does not support v2 API

Comment 5 W. Trevor King 2020-06-25 05:34:03 UTC
> error: unable to connect to image repository quay.io/coreos/etcd:latest: endpoint "https://quay.io" does not support v2 API

Is this something we can take back to the Quay folks?  Are they... 500ing us?  Is there a way we can get at least this level of detail into the test-case failure message, instead of its current empty-string reason and message?
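
Until the test itself reports more, one way to get at that detail outside the test framework is to pull the last error line out of the pod's container log. This is a rough sketch, assuming the log is available as text and that error lines start with "error:" or "Unable to connect to the server:" as in the excerpts in this bug; last_error_line is a hypothetical helper, not part of oc or the e2e framework.

```shell
#!/usr/bin/env bash
# last_error_line: read a container log on stdin and print the final line
# that looks like an oc error. The two prefixes are taken from the log
# excerpts in this bug; other failure modes may need more patterns.
last_error_line() {
  grep -E '^(error:|Unable to connect to the server:)' | tail -n 1
}

# Example with the quay.io failure:
printf '%s\n' \
  '+ oc image info quay.io/coreos/etcd:latest' \
  'error: unable to connect to image repository quay.io/coreos/etcd:latest: endpoint "https://quay.io" does not support v2 API' \
  | last_error_line
```

This only post-processes logs after the fact; it does not change the empty reason and message that the test case reports.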

Comment 9 Wenjing Zheng 2020-07-15 08:45:59 UTC
I can see the test passed here: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-4.6/1283271052872388608

This command also succeeds:
$ oc image info quay.io/coreos/etcd:latest
Name:          quay.io/coreos/etcd:latest
Digest:        sha256:5b6691b7225a3f77a5a919a81261bbfb31283804418e187f7116a0a9ef65d21d
Media Type:    application/vnd.docker.distribution.manifest.v1+prettyjws
Created:       2y ago
Image Size:    9 layers (size unavailable)
Layers:        -- sha256:ff3a5c916c92643ff77519ffa742d3ec61b7f591b6b7504599d95a4a41134e28
               -- sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
               -- sha256:96b0e24539ea72226710d11720f39ac030b36f01eddf97cf91d762b0eabeb24b
               -- sha256:d1eca4d018947ae4cda26fb2ba4001592ae1cfaaf0ca59c0383531f551548179
               -- sha256:ad732d7a61c2a827257da9f61e1031bc3ee6dc92b8164d14a9e7273d1a474ad8
               -- sha256:8bc526247b5c79742e354638a1e33ed2f237c0e7e77adbd0da8fee20085df772
               -- sha256:5f56944bb51c627532324ca0f715de6563c08209fdc5dafa43993fd23652a3e6
               -- sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
               -- sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
OS:            linux
Arch:          amd64
Command:       /usr/local/bin/etcd
Exposes Ports: 2379/tcp, 2380/tcp
Environment:   PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

$ oc version
Client Version: 4.6.0-0.nightly-2020-07-14-035247
Server Version: 4.6.0-0.nightly-2020-07-13-224201
Kubernetes Version: v1.18.3+a34fde4

Comment 11 errata-xmlrpc 2020-10-27 16:06:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196