
Bug 1793675

Summary:	[sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] removes definition from spec when one version gets changed to not be served [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]
Product:	OpenShift Container Platform	Reporter:	Scott Dodson <sdodson>
Component:	Installer	Assignee:	Abhinav Dahiya <adahiya>
Installer sub component:	openshift-installer	QA Contact:	Johnny Liu <jialiu>
Status:	CLOSED DUPLICATE	Docs Contact:
Severity:	high
Priority:	high	CC:	aos-bugs, lszaszki, mfojtik, sttts
Version:	4.3.0
Target Milestone:	---
Target Release:	4.4.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-02-17 13:54:23 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Scott Dodson 2020-01-21 19:16:20 UTC

This is currently the most common failure in e2e-metal CI jobs. It appears that the API server is refusing connections: the request to /openapi/v2 fails with a TLS handshake timeout.

Jan 21 11:28:21.175: INFO: >>> kubeConfig: /tmp/admin.kubeconfig
Jan 21 11:28:21.180: INFO: >>> kubeConfig: /tmp/admin.kubeconfig
Jan 21 11:28:22.789: INFO: >>> kubeConfig: /tmp/admin.kubeconfig
Jan 21 11:28:22.791: INFO: Waiting up to 30m0s for all (but 100) nodes to be schedulable
Jan 21 11:28:23.148: INFO: Waiting up to 10m0s for all pods (need at least 0) in namespace 'kube-system' to be running and ready
Jan 21 11:28:23.385: INFO: 0 / 0 pods in namespace 'kube-system' are running and ready (0 seconds elapsed)
Jan 21 11:28:23.385: INFO: expected 0 pod replicas in namespace 'kube-system', 0 are Running and Ready.
Jan 21 11:28:23.385: INFO: Waiting up to 5m0s for all daemonsets in namespace 'kube-system' to start
Jan 21 11:28:23.465: INFO: e2e test version: v1.16.2
Jan 21 11:28:23.536: INFO: kube-apiserver version: v1.16.2
Jan 21 11:28:23.536: INFO: >>> kubeConfig: /tmp/admin.kubeconfig
Jan 21 11:28:23.611: INFO: Cluster IP family: ipv4
[BeforeEach] [Top Level]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/test.go:60
[BeforeEach] [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:151
STEP: Creating a kubernetes client
Jan 21 11:28:23.627: INFO: >>> kubeConfig: /tmp/admin.kubeconfig
STEP: Building a namespace api object, basename crd-publish-openapi
Jan 21 11:28:23.845: INFO: About to run a Kube e2e test, ensuring namespace is privileged
Jan 21 11:28:24.579: INFO: No PodSecurityPolicies found; assuming PodSecurityPolicy is disabled.
STEP: Waiting for a default service account to be provisioned in namespace
[It] removes definition from spec when one version gets changed to not be served [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:694
STEP: set up a multi version CRD
Jan 21 11:28:24.659: INFO: >>> kubeConfig: /tmp/admin.kubeconfig
Jan 21 11:30:34.235: FAIL: failed to wait for definition "com.example.crd-publish-openapi-test-multi-to-single-ver.v6alpha1.E2e-test-crd-publish-openapi-9535-crd" to be served with the right OpenAPI schema: failed to wait for OpenAPI spec validating condition: Get https://api.ci-op-0y5bbkz0-4c0cd.origin-ci-int-aws.dev.rhcloud.com:6443/openapi/v2: net/http: TLS handshake timeout; lastMsg: 
[AfterEach] [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:152
STEP: Collecting events from namespace "e2e-crd-publish-openapi-9440".
STEP: Found 0 events.
Jan 21 11:30:42.542: INFO: POD  NODE  PHASE  GRACE  CONDITIONS
Jan 21 11:30:42.542: INFO: 
Jan 21 11:30:42.760: INFO: skipping dumping cluster info - cluster too large
Jan 21 11:30:42.760: INFO: Waiting up to 7m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-crd-publish-openapi-9440" for this suite.
Jan 21 11:30:42.911: INFO: Running AfterSuite actions on all nodes
Jan 21 11:30:42.911: INFO: Running AfterSuite actions on node 1
fail [k8s.io/kubernetes/test/e2e/apimachinery/crd_publish_openapi.go:436]: Jan 21 11:30:34.401: failed to wait for definition "com.example.crd-publish-openapi-test-multi-to-single-ver.v6alpha1.E2e-test-crd-publish-openapi-9535-crd" to be served with the right OpenAPI schema: failed to wait for OpenAPI spec validating condition: Get https://api.ci-op-0y5bbkz0-4c0cd.origin-ci-int-aws.dev.rhcloud.com:6443/openapi/v2: net/http: TLS handshake timeout; lastMsg: 

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1134
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1135
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1137
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1141
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1142
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1146
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1151
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1152
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1153

Comment 3 Stephen Benjamin 2020-02-17 13:07:05 UTC

The e2e-metal job is a bare-metal UPI (user-provisioned infrastructure) job.

Comment 4 Lukasz Szaszkiewicz 2020-02-17 13:08:04 UTC

I think I have found it: the following test runs failed because they were forcefully killed. The actual error message was: Container test exited with code 1, reason OOMKilled

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1135
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1137
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1141
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1142
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1146
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1151
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1152



We should check how much memory the test job gets.

Comment 5 Lukasz Szaszkiewicz 2020-02-17 13:50:53 UTC

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1153 was also OOMKilled, although it's not clear whether it was the root cause.

Comment 6 Scott Dodson 2020-02-17 13:54:23 UTC

For https://bugzilla.redhat.com/show_bug.cgi?id=1796127 we found that the test container in e2e-metal had been allocated 3GiB rather than the 4GiB that most other jobs get. Searching through https://prow.svc.ci.openshift.org, there are no recent occurrences of this container being OOMKilled. Marking this as a dupe.
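
As an illustration of the allocation check described here, a small Go sketch that compares a container memory limit against a baseline. It is self-contained and handles only the `Ki`, `Mi`, and `Gi` suffixes; the helper names `parseMemory` and `belowBaseline` are hypothetical, and real tooling would more likely rely on `resource.ParseQuantity` from `k8s.io/apimachinery`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// binarySuffixes maps a subset of Kubernetes binary quantity suffixes
// to their byte multipliers. Other suffixes are not handled in this sketch.
var binarySuffixes = []struct {
	suffix string
	mult   int64
}{
	{"Ki", 1 << 10},
	{"Mi", 1 << 20},
	{"Gi", 1 << 30},
}

// parseMemory converts strings such as "3Gi" or "512Mi" to a byte count.
// A string without a known suffix is treated as a plain number of bytes.
func parseMemory(s string) (int64, error) {
	for _, b := range binarySuffixes {
		if strings.HasSuffix(s, b.suffix) {
			n, err := strconv.ParseInt(strings.TrimSuffix(s, b.suffix), 10, 64)
			if err != nil {
				return 0, fmt.Errorf("parse %q: %w", s, err)
			}
			return n * b.mult, nil
		}
	}
	return strconv.ParseInt(s, 10, 64)
}

// belowBaseline reports whether limit is strictly smaller than baseline.
func belowBaseline(limit, baseline string) (bool, error) {
	l, err := parseMemory(limit)
	if err != nil {
		return false, err
	}
	b, err := parseMemory(baseline)
	if err != nil {
		return false, err
	}
	return l < b, nil
}

func main() {
	// 3Gi for the e2e-metal test container versus 4Gi for most other jobs.
	below, err := belowBaseline("3Gi", "4Gi")
	if err != nil {
		panic(err)
	}
	fmt.Println(below) // prints: true
}
```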

*** This bug has been marked as a duplicate of bug 1796127 ***