Bug 1793675

Summary: [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] removes definition from spec when one version gets changed to not be served [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]
Product: OpenShift Container Platform
Reporter: Scott Dodson <sdodson>
Component: Installer
Assignee: Abhinav Dahiya <adahiya>
Installer sub component: openshift-installer
QA Contact: Johnny Liu <jialiu>
Status: CLOSED DUPLICATE
Severity: high
Priority: high
CC: aos-bugs, lszaszki, mfojtik, sttts
Version: 4.3.0
Target Release: 4.4.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-02-17 13:54:23 UTC
Type: Bug

Description Scott Dodson 2020-01-21 19:16:20 UTC
This is currently the most common failure in e2e-metal CI jobs. The API server appears to be refusing connections: the test's request to /openapi/v2 fails with "net/http: TLS handshake timeout".

Jan 21 11:28:21.175: INFO: >>> kubeConfig: /tmp/admin.kubeconfig
Jan 21 11:28:21.180: INFO: >>> kubeConfig: /tmp/admin.kubeconfig
Jan 21 11:28:22.789: INFO: >>> kubeConfig: /tmp/admin.kubeconfig
Jan 21 11:28:22.791: INFO: Waiting up to 30m0s for all (but 100) nodes to be schedulable
Jan 21 11:28:23.148: INFO: Waiting up to 10m0s for all pods (need at least 0) in namespace 'kube-system' to be running and ready
Jan 21 11:28:23.385: INFO: 0 / 0 pods in namespace 'kube-system' are running and ready (0 seconds elapsed)
Jan 21 11:28:23.385: INFO: expected 0 pod replicas in namespace 'kube-system', 0 are Running and Ready.
Jan 21 11:28:23.385: INFO: Waiting up to 5m0s for all daemonsets in namespace 'kube-system' to start
Jan 21 11:28:23.465: INFO: e2e test version: v1.16.2
Jan 21 11:28:23.536: INFO: kube-apiserver version: v1.16.2
Jan 21 11:28:23.536: INFO: >>> kubeConfig: /tmp/admin.kubeconfig
Jan 21 11:28:23.611: INFO: Cluster IP family: ipv4
[BeforeEach] [Top Level]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/test.go:60
[BeforeEach] [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:151
STEP: Creating a kubernetes client
Jan 21 11:28:23.627: INFO: >>> kubeConfig: /tmp/admin.kubeconfig
STEP: Building a namespace api object, basename crd-publish-openapi
Jan 21 11:28:23.845: INFO: About to run a Kube e2e test, ensuring namespace is privileged
Jan 21 11:28:24.579: INFO: No PodSecurityPolicies found; assuming PodSecurityPolicy is disabled.
STEP: Waiting for a default service account to be provisioned in namespace
[It] removes definition from spec when one version gets changed to not be served [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:694
STEP: set up a multi version CRD
Jan 21 11:28:24.659: INFO: >>> kubeConfig: /tmp/admin.kubeconfig
Jan 21 11:30:34.235: FAIL: failed to wait for definition "com.example.crd-publish-openapi-test-multi-to-single-ver.v6alpha1.E2e-test-crd-publish-openapi-9535-crd" to be served with the right OpenAPI schema: failed to wait for OpenAPI spec validating condition: Get https://api.ci-op-0y5bbkz0-4c0cd.origin-ci-int-aws.dev.rhcloud.com:6443/openapi/v2: net/http: TLS handshake timeout; lastMsg: 
[AfterEach] [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin]
  /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:152
STEP: Collecting events from namespace "e2e-crd-publish-openapi-9440".
STEP: Found 0 events.
Jan 21 11:30:42.542: INFO: POD  NODE  PHASE  GRACE  CONDITIONS
Jan 21 11:30:42.542: INFO: 
Jan 21 11:30:42.760: INFO: skipping dumping cluster info - cluster too large
Jan 21 11:30:42.760: INFO: Waiting up to 7m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-crd-publish-openapi-9440" for this suite.
Jan 21 11:30:42.911: INFO: Running AfterSuite actions on all nodes
Jan 21 11:30:42.911: INFO: Running AfterSuite actions on node 1
fail [k8s.io/kubernetes/test/e2e/apimachinery/crd_publish_openapi.go:436]: Jan 21 11:30:34.401: failed to wait for definition "com.example.crd-publish-openapi-test-multi-to-single-ver.v6alpha1.E2e-test-crd-publish-openapi-9535-crd" to be served with the right OpenAPI schema: failed to wait for OpenAPI spec validating condition: Get https://api.ci-op-0y5bbkz0-4c0cd.origin-ci-int-aws.dev.rhcloud.com:6443/openapi/v2: net/http: TLS handshake timeout; lastMsg: 

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1134
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1135
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1137
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1141
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1142
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1146
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1151
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1152
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1153

Comment 3 Stephen Benjamin 2020-02-17 13:07:05 UTC
The e2e-metal job is a bare-metal UPI (user-provisioned infrastructure) install.

Comment 5 Lukasz Szaszkiewicz 2020-02-17 13:50:53 UTC
Run https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1153 also had a container OOMKilled, although it is not clear whether that was the root cause.

Comment 6 Scott Dodson 2020-02-17 13:54:23 UTC
For https://bugzilla.redhat.com/show_bug.cgi?id=1796127 we found that the test container in e2e-metal had been allocated 3GiB of memory rather than the 4GiB that most other jobs had. A search through https://prow.svc.ci.openshift.org shows no recent occurrences of this container being OOMKilled. Marking this as a dupe.

*** This bug has been marked as a duplicate of bug 1796127 ***