Description of problem:

This release-informing Azure test is in a permafail state:
https://prow.svc.ci.openshift.org/job-history/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-compact-4.4

The failed runs show the following errors:

fail [k8s.io/kubernetes/test/e2e/framework/volume/fixtures.go:390]: Unexpected error:
    <*errors.StatusError | 0xc0015cde00>: {
        ErrStatus: {
            TypeMeta: {Kind: "", APIVersion: ""},
            ListMeta: {
                SelfLink: "",
                ResourceVersion: "",
                Continue: "",
                RemainingItemCount: nil,
            },
            Status: "Failure",
            Message: "etcdserver: request timed out",
            Reason: "",
            Details: nil,
            Code: 500,
        },
    }
    etcdserver: request timed out
occurred

Apr 02 23:11:06.343 E kube-apiserver Kube API started failing: etcdserver: leader changed

Example:
https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-compact-4.4/70/artifacts/e2e-azure/e2e.log

Also see:

2020-04-02 23:19:44.148718 W | wal: sync duration of 1.204969s, expected less than 1s
2020-04-02 23:19:44.148793 W | etcdserver: failed to send out heartbeat on time (exceeded the 500ms timeout for 205.9901ms, to e2fe3bf11f2491c5)
2020-04-02 23:19:44.148801 W | etcdserver: server is likely overloaded
2020-04-02 23:19:44.148808 W | etcdserver: failed to send out heartbeat on time (exceeded the 500ms timeout for 206.0069ms, to 382c74853240f5cf)
2020-04-02 23:19:44.148812 W | etcdserver: server is likely overloaded

https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-compact-4.4/72/artifacts/e2e-azure/pods/openshift-etcd_etcd-ci-op-02852rlc-5ab2f-4w8zd-master-0_etcd.log

And a lot of:

2020-04-02 23:01:50.926685 W | etcdserver: read-only range request "key:\"/kubernetes.io/networkpolicies/e2e-test-router-scoped-d2ftl/\" range_end:\"/kubernetes.io/networkpolicies/e2e-test-router-scoped-d2ftl0\" " with result "range_response_count:0 size:6" took too long (105.9526ms) to execute
2020-04-02 23:01:51.556599 W | etcdserver: read-only range request "key:\"/kubernetes.io/rolebindings/openshift-machine-api/cluster-autoscaler-operator\" " with result "range_response_count:1 size:405" took too long (106.0961ms) to execute
2020-04-02 23:01:51.556716 W | etcdserver: read-only range request "key:\"/kubernetes.io/monitoring.coreos.com/prometheusrules/openshift-marketplace/marketplace-alert-rules\" " with result "range_response_count:1 size:3965" took too long (105.7011ms) to execute

https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-compact-4.4/72/artifacts/e2e-azure/pods/openshift-etcd_etcd-ci-op-02852rlc-5ab2f-4w8zd-master-1_etcd.log

Another run from February:
https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-compact-4.4/53/artifacts/e2e-azure/e2e.log

I did see https://bugzilla.redhat.com/show_bug.cgi?id=1819907 but I'm unsure whether this is a dupe.
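For anyone who wants to check the latency outside the e2e suite, here is a minimal sketch (not taken from the CI artifacts) that times a single read against etcd with the v3 Go client. The endpoint is a placeholder and a real OpenShift cluster also requires the etcd client TLS certs (clientv3.Config.TLS), which are omitted here:

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/clientv3"
)

func main() {
	// Placeholder endpoint; on a real cluster this is a master node's etcd
	// member URL and the connection must present client certificates.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://10.0.0.5:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	defer cli.Close()

	// Time a small read, mirroring the "read-only range request ... took too
	// long" warnings above; consistently high values point at slow disk or
	// an overloaded member.
	start := time.Now()
	_, err = cli.Get(context.Background(), "/kubernetes.io/namespaces/default")
	if err != nil {
		log.Fatalf("get: %v", err) // e.g. "etcdserver: request timed out"
	}
	fmt.Printf("range request took %v\n", time.Since(start))
}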
This is a 4.4 blocker. Moving back to 4.4 release.
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/24858/pull-ci-openshift-origin-master-e2e-gcp-builds/1845 is failing, and I see:

2020-04-09T21:34:46.03330722Z 2020-04-09 21:34:46.033252 W | etcdserver: read-only range request "key:\"/kubernetes.io/configmaps/openshift-kube-scheduler/serviceaccount-ca\" " with result "range_response_count:1 size:6299" took too long (456.196812ms) to execute

xref https://bugzilla.redhat.com/show_bug.cgi?id=1817588#c2, as I see other etcd/apiserver errors there, so we may have a dupe.
Tests seem to be failing more recently because of a failure in the image registry operator:

> E0413 22:43:19.318721 14 controller.go:252] unable to sync: Config.imageregistry.operator.openshift.io "cluster" is invalid: spec.storage.azure.container: Invalid value: "": spec.storage.azure.container in body should be at least 3 chars long, requeuing

This was fixed in 4.5 but also needs to be backported to 4.4: https://bugzilla.redhat.com/show_bug.cgi?id=1823590
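If it helps triage, a quick (hypothetical, not part of any test suite) way to see whether a given cluster is hitting this is to read the Config resource named in the error with the dynamic client and print the Azure container field; the kubeconfig path is a placeholder and a recent client-go is assumed:

package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from a local kubeconfig (path is a placeholder).
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		log.Fatal(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// The cluster-scoped image registry operator Config from the error above.
	gvr := schema.GroupVersionResource{
		Group:    "imageregistry.operator.openshift.io",
		Version:  "v1",
		Resource: "configs",
	}
	obj, err := dyn.Resource(gvr).Get(context.TODO(), "cluster", metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}

	// An empty container name is what trips the minLength validation that
	// produces "should be at least 3 chars long".
	container, _, _ := unstructured.NestedString(obj.Object, "spec", "storage", "azure", "container")
	fmt.Printf("spec.storage.azure.container=%q (must be at least 3 characters)\n", container)
}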