1845766 – Image info should display information about images: pod "append-test" failed with reason: "", message: ""

Summary: Image info should display information about images: pod "append-test" failed ...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Image Registry
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	4.6.0
Assignee:	Oleg Bulatov
QA Contact:	Wenjing Zheng
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-06-10 02:33 UTC by W. Trevor King
Modified:	2020-10-27 16:06 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:	operator.Create the release image "latest" containing all images built by this job
Last Closed:	2020-10-27 16:06:02 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift library-go pull 825	None	closed	Bug 1845766: Add status code to ErrNotV2Registry	2020-07-15 06:50:10 UTC
Github	openshift oc pull 479	None	closed	Bug 1845766: Add status to ErrNotV2Registry	2020-07-15 06:50:09 UTC
Red Hat Product Errata	RHBA-2020:4196	None	None	None	2020-10-27 16:06:19 UTC

Description W. Trevor King 2020-06-10 02:33:23 UTC

test:
[sig-imageregistry][Feature:ImageInfo] Image info should display information about images [Suite:openshift/conformance/parallel]

is failing frequently in CI, see search results:
$ w3m -dump -cols 200 'https://search.svc.ci.openshift.org/?maxAge=48h&name=release-openshift-&search=pod%20%22append-test%22%20failed%20with%20reason:%20%22%22,%20message:%20%22%22' | grep 'failures match'
release-openshift-origin-installer-e2e-gcp-4.4 - 47 runs, 38% failed, 6% of failures match
release-openshift-ocp-installer-e2e-aws-4.1 - 3 runs, 67% failed, 50% of failures match
release-openshift-ocp-installer-e2e-azure-4.6 - 5 runs, 60% failed, 33% of failures match
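
To put those percentages in perspective, here is a rough Go sketch that turns one search-result line into an estimated number of matching runs. The function name estimateMatchingRuns is made up for illustration, and the published percentages are rounded, so the result is only an approximation.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// summaryRE matches the "N runs, X% failed, Y% of failures match" tail
// of a search-result line.
var summaryRE = regexp.MustCompile(`(\d+) runs, (\d+)% failed, (\d+)% of failures match`)

// estimateMatchingRuns (hypothetical helper) returns
// runs * failed% * match%, an estimate of how many runs hit this error.
// The second return value is false if the line does not match the format.
func estimateMatchingRuns(line string) (float64, bool) {
	m := summaryRE.FindStringSubmatch(line)
	if m == nil {
		return 0, false
	}
	// The regexp guarantees digits only, so Atoi errors are not expected here.
	runs, _ := strconv.Atoi(m[1])
	failed, _ := strconv.Atoi(m[2])
	match, _ := strconv.Atoi(m[3])
	return float64(runs) * float64(failed) / 100 * float64(match) / 100, true
}

func main() {
	lines := []string{
		"release-openshift-origin-installer-e2e-gcp-4.4 - 47 runs, 38% failed, 6% of failures match",
		"release-openshift-ocp-installer-e2e-aws-4.1 - 3 runs, 67% failed, 50% of failures match",
		"release-openshift-ocp-installer-e2e-azure-4.6 - 5 runs, 60% failed, 33% of failures match",
	}
	for _, l := range lines {
		if n, ok := estimateMatchingRuns(l); ok {
			fmt.Printf("%.2f estimated matching runs: %s\n", n, l)
		}
	}
}
```

With the three lines above, each job works out to roughly one matching run in the 48h window.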

Example job [1], which fails with:

fail [k8s.io/kubernetes/test/e2e/framework/pods.go:200]: wait for pod "append-test" to success
Expected success, but got an error:
    <*errors.errorString | 0xc0023ca2e0>: {
        s: "pod \"append-test\" failed with reason: \"\", message: \"\"",
    }
    pod "append-test" failed with reason: "", message: ""

[1]: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-4.6/74

Comment 1 W. Trevor King 2020-06-10 03:07:35 UTC

Also submitted an upstream PR for the "to success" -> "to succeed" typo [1].

[1]: https://github.com/kubernetes/kubernetes/pull/91975

Comment 2 W. Trevor King 2020-06-10 03:18:13 UTC

Hah, apparently the job I picked (by failure error) is actually a different test:

[sig-imageregistry][Feature:ImageAppend] Image append should create images by appending them [Suite:openshift/conformance/parallel]

Pulling logs for the failed pod:

$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-4.6/74/artifacts/e2e-azure/container-logs/test.log | grep -B5 'end of log for.*append-test'
+ oc image append --insecure --from docker.io/library/busybox:latest --to image-registry.openshift-image-registry.svc:5000/e2e-test-image-append-cwj54/test:busybox1 --image '{"Cmd":["/bin/sleep"]}'
Uploading 760.5kB ...
Pushed sha256:52a90165f18eed5d7652503ff7f71fd130e3a498b83f791af402b1f36cfa5b58 to image-registry.openshift-image-registry.svc:5000/e2e-test-image-append-cwj54/test:busybox1
+ oc create is test2
Unable to connect to the server: dial tcp 172.30.0.1:443: i/o timeout
<----end of log for "append-test"/"test"

So... an SDN error?

Comment 3 Oleg Bulatov 2020-06-10 09:24:55 UTC

172.30.0.1:443 is kube-apiserver, so image append flakes should be assigned to the SDN or kube-apiserver team.

---

from https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.4/2590:

+ oc image info quay.io/coreos/etcd:latest
error: unable to connect to image repository quay.io/coreos/etcd:latest: endpoint "https://quay.io" does not support v2 API

---

from https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-compact-4.5/95:

+ mkdir -p /tmp/test
+ oc image extract --insecure image-registry.openshift-image-registry.svc:5000/e2e-test-image-extract-pnq8v/1:busybox --path=/:/tmp/test
error: image does not exist

---

So sometimes it fails because of quay.io problems. But some failures need to be investigated further.
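
The three failure modes quoted in this thread could be told apart mechanically. Below is a minimal Go sketch that buckets a pod log line by substring; classifyFailure and the category labels are hypothetical names invented here, not part of any existing test tooling.

```go
package main

import (
	"fmt"
	"strings"
)

// classifyFailure maps a pod log line to a coarse failure category.
// The category names are hypothetical labels chosen for this sketch.
func classifyFailure(line string) string {
	switch {
	case strings.Contains(line, "Unable to connect to the server") &&
		strings.Contains(line, "i/o timeout"):
		// The client could not reach kube-apiserver.
		return "apiserver-unreachable"
	case strings.Contains(line, "does not support v2 API"):
		// The external registry did not answer as a v2 registry.
		return "external-registry"
	case strings.Contains(line, "image does not exist"):
		// Cause unknown; needs a closer look.
		return "needs-investigation"
	default:
		return "unknown"
	}
}

func main() {
	lines := []string{
		`Unable to connect to the server: dial tcp 172.30.0.1:443: i/o timeout`,
		`error: unable to connect to image repository quay.io/coreos/etcd:latest: endpoint "https://quay.io" does not support v2 API`,
		`error: image does not exist`,
	}
	for _, l := range lines {
		fmt.Printf("%-22s %s\n", classifyFailure(l), l)
	}
}
```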

Comment 4 Oleg Bulatov 2020-06-19 08:37:26 UTC

All recent failures are caused by quay.io; there is nothing we can fix on our side.

+ oc image info quay.io/coreos/etcd:latest
error: unable to connect to image repository quay.io/coreos/etcd:latest: endpoint "https://quay.io" does not support v2 API

Comment 5 W. Trevor King 2020-06-25 05:34:03 UTC

> error: unable to connect to image repository quay.io/coreos/etcd:latest: endpoint "https://quay.io" does not support v2 API

Is this something we can take back to the Quay folks?  Are they... 500ing us?  Is there a way we can get at least this level of detail into the test-case failure message, instead of its current empty-string reason and message?
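
One way to surface that detail is to carry the HTTP status code inside the error, which is the direction the linked pull requests' titles suggest ("Add status code to ErrNotV2Registry"). The Go sketch below is not the code from those pull requests: the type notV2RegistryError, its fields, and the 502 status are all hypothetical, chosen only to illustrate the idea.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// notV2RegistryError is a hypothetical stand-in for an error that records
// which HTTP status the registry endpoint returned. It is NOT the real
// library-go type.
type notV2RegistryError struct {
	Endpoint   string
	StatusCode int // 0 means no HTTP response was received
}

// Error includes the status code when one is available, so the message
// distinguishes a server-side failure from a registry that is not v2.
func (e *notV2RegistryError) Error() string {
	if e.StatusCode == 0 {
		return fmt.Sprintf("endpoint %q does not support v2 API", e.Endpoint)
	}
	return fmt.Sprintf("endpoint %q does not support v2 API (got %d %s)",
		e.Endpoint, e.StatusCode, http.StatusText(e.StatusCode))
}

func main() {
	// The 502 here is an assumed example value, not taken from the CI logs.
	err := fmt.Errorf("unable to connect to image repository %s: %w",
		"quay.io/coreos/etcd:latest",
		&notV2RegistryError{Endpoint: "https://quay.io", StatusCode: http.StatusBadGateway})
	fmt.Println("error:", err)

	// Callers can recover the status code from the wrapped error.
	var target *notV2RegistryError
	if errors.As(err, &target) && target.StatusCode >= 500 {
		fmt.Println("registry returned a server error")
	}
}
```

A message of this shape would answer the "are they 500ing us?" question directly from the pod log.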

Comment 9 Wenjing Zheng 2020-07-15 08:45:59 UTC

I can see the test passed here: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-4.6/1283271052872388608

And also this command succeeds:
$ oc image info quay.io/coreos/etcd:latest
Name:          quay.io/coreos/etcd:latest
Digest:        sha256:5b6691b7225a3f77a5a919a81261bbfb31283804418e187f7116a0a9ef65d21d
Media Type:    application/vnd.docker.distribution.manifest.v1+prettyjws
Created:       2y ago
Image Size:    9 layers (size unavailable)
Layers:        -- sha256:ff3a5c916c92643ff77519ffa742d3ec61b7f591b6b7504599d95a4a41134e28
               -- sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
               -- sha256:96b0e24539ea72226710d11720f39ac030b36f01eddf97cf91d762b0eabeb24b
               -- sha256:d1eca4d018947ae4cda26fb2ba4001592ae1cfaaf0ca59c0383531f551548179
               -- sha256:ad732d7a61c2a827257da9f61e1031bc3ee6dc92b8164d14a9e7273d1a474ad8
               -- sha256:8bc526247b5c79742e354638a1e33ed2f237c0e7e77adbd0da8fee20085df772
               -- sha256:5f56944bb51c627532324ca0f715de6563c08209fdc5dafa43993fd23652a3e6
               -- sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
               -- sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
OS:            linux
Arch:          amd64
Command:       /usr/local/bin/etcd
Exposes Ports: 2379/tcp, 2380/tcp
Environment:   PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

$ oc version
Client Version: 4.6.0-0.nightly-2020-07-14-035247
Server Version: 4.6.0-0.nightly-2020-07-13-224201
Kubernetes Version: v1.18.3+a34fde4

Comment 11 errata-xmlrpc 2020-10-27 16:06:02 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
