Description of problem:
The test "[sig-api-machinery] CustomResourcePublishOpenAPI [Feature:CustomResourcePublishOpenAPI] works for CRD with validation schema [Suite:openshift/conformance/parallel] [Suite:k8s]" fails sometimes.

```
fail [k8s.io/kubernetes/test/e2e/apimachinery/crd_publish_openapi.go:95]: Oct 8 12:42:31.036: unexpected no error when creating CR without required field: error running &{/usr/bin/kubectl [kubectl --server=https://api.ci-op-vi9gvk0s-103c6.origin-ci-int-aws.dev.rhcloud.com:6443 --kubeconfig=/tmp/admin.kubeconfig --namespace=e2e-crd-publish-openapi-3381 create -f -] [] 0xc0068dc840 The E2e-test-crd-publish-openapi-9502-crd "test-foo" is invalid: []: Invalid value: map[string]interface {}{"apiVersion":"crd-publish-openapi-test-foo.k8s.io/v1", "kind":"E2e-test-crd-publish-openapi-9502-crd", "metadata":map[string]interface {}{"creationTimestamp":"2019-10-08T12:42:31Z", "generation":1, "name":"test-foo", "namespace":"e2e-crd-publish-openapi-3381", "uid":"17e4d934-e9c9-11e9-975a-121bc63d6326"}, "spec":map[string]interface {}{"bars":[]interface {}{map[string]interface {}{"age":"10"}}}}: validation failure list:
spec.bars.name in body is required
 [] <nil> 0xc004c32de0 exit status 1 <nil> <nil> true [0xc00b0e8228 0xc00b0e8250 0xc00b0e8260] [0xc00b0e8228 0xc00b0e8250 0xc00b0e8260] [0xc00b0e8230 0xc00b0e8248 0xc00b0e8258] [0x95acb0 0x95ade0 0x95ade0] 0xc002bfc060 <nil>}:
Command stdout:

stderr:
The E2e-test-crd-publish-openapi-9502-crd "test-foo" is invalid: []: Invalid value: map[string]interface {}{"apiVersion":"crd-publish-openapi-test-foo.k8s.io/v1", "kind":"E2e-test-crd-publish-openapi-9502-crd", "metadata":map[string]interface {}{"creationTimestamp":"2019-10-08T12:42:31Z", "generation":1, "name":"test-foo", "namespace":"e2e-crd-publish-openapi-3381", "uid":"17e4d934-e9c9-11e9-975a-121bc63d6326"}, "spec":map[string]interface {}{"bars":[]interface {}{map[string]interface {}{"age":"10"}}}}: validation failure list:
spec.bars.name in body is required

error:
exit status 1
```

Recent failures:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-mirrors-4.2/41
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-mirrors-4.2/48
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-mirrors-4.2/61

Version-Release number of selected component (if applicable):
4.2

How reproducible:
Sometimes

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
22 failures in the last 24 hours: https://search.svc.ci.openshift.org/?search=failed%3A.*works+for+CRD+with+validation+schema&maxAge=24h&context=2&type=all (aggregating [Feature:CustomResourcePublishOpenAPI] and [Privileged:ClusterAdmin]). This seems like at least medium severity to me.
*** Bug 1760198 has been marked as a duplicate of this bug. ***
*** Bug 1822293 has been marked as a duplicate of this bug. ***
This is the top failing test for sig-api-machinery, and it is failing 18% of the time in 4.4. Raising priority and severity (please backport any fix to 4.4).
I think I found the root cause of the issue. At least the test runs I looked at failed because of it, although there could be more issues with these tests. There is a race condition between the client (the test) and the servers. Oddly enough, the tests take into account that there can be many servers, but they don't guarantee that all of them have seen the same update. In short, the tests create a CRD and wait until "all" servers generate the OpenAPI spec with that resource. To check that, they send multiple requests to a single public IP (the LB) and compare the output. This is the problematic part of the test, as there is no guarantee that the requests will reach all servers. I did a simple experiment to prove that: I disabled OpenAPI spec generation on one server (out of 3) and ran the tests. Most of the time they didn't fail on the "wait until all servers generate the spec" part. The proper fix would be to contact all replicas and check that they generated the same OpenAPI spec before running the actual tests.
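To illustrate the difference between the two checks, here is a minimal Python sketch. It is not the e2e test code and not the actual fix. The replicas are in-memory stand-ins, `CRD_PATH` is a hypothetical path, and the load balancer is assumed to keep routing to the same backend, which is only one way the requests can fail to reach every server.

```python
import json
from typing import Callable, Dict, List

Spec = Dict[str, dict]
Fetcher = Callable[[], Spec]

# Hypothetical OpenAPI path for the newly created CRD (illustration only).
CRD_PATH = "/apis/example.test/v1/foos"


def make_replica(paths: List[str]) -> Fetcher:
    """Return a stand-in for one API server's OpenAPI endpoint."""
    def fetch() -> Spec:
        return {"paths": {p: {} for p in paths}}
    return fetch


def publishes(spec: Spec, path: str) -> bool:
    """True if the spec already contains the given path."""
    return path in spec.get("paths", {})


def check_via_load_balancer(lb: Fetcher, path: str, attempts: int) -> bool:
    """Flawed check: every request goes to one address, so all of them
    may be answered by the same backend."""
    return all(publishes(lb(), path) for _ in range(attempts))


def check_all_replicas(replicas: List[Fetcher], path: str) -> bool:
    """Stricter check: query each replica directly, require the path on
    every one, and require that all returned specs are identical."""
    specs = [fetch() for fetch in replicas]
    if not all(publishes(s, path) for s in specs):
        return False
    canonical = {json.dumps(s, sort_keys=True) for s in specs}
    return len(canonical) == 1


up_to_date = make_replica(["/api/v1", CRD_PATH])
stale = make_replica(["/api/v1"])  # spec generation disabled on this one
replicas = [up_to_date, up_to_date, stale]

# Assumption: the load balancer always routes to the first backend.
sticky_lb = replicas[0]

lb_result = check_via_load_balancer(sticky_lb, CRD_PATH, attempts=5)
direct_result = check_all_replicas(replicas, CRD_PATH)
print(lb_result, direct_result)
```

In this model the load-balancer check passes even though one replica never publishes the CRD, while the per-replica check reports the inconsistency. That matches the experiment described above, where the "wait" step usually succeeded with one server's spec generation disabled.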
*** Bug 1822294 has been marked as a duplicate of this bug. ***
There are several failures from a seemingly very similar issue with the test titled: "[sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] [Top Level] [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] works for multiple CRDs of same group but different versions [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]". An example: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.4/819
@Xingxing I think that my latest pull improved the stability of the CRD tests. Have a look: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-blocking#release-openshift-origin-installer-e2e-gcp-4.5&sort-by-flakiness
In the https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-blocking#release-openshift-origin-installer-e2e-gcp-4.5&sort-by-flakiness view, searching the page for the keywords 'works for CRD with validation schema' shows that all blocks related to the test '[sig-api-machinery] CustomResourcePublishOpenAPI [Feature:CustomResourcePublishOpenAPI]' have been green over the past 7 days. A quick search with https://search.apps.build01.ci.devcluster.openshift.com/?search=failed%3A.*works+for+CRD+with+validation+schema&maxAge=168h&context=2&type=all&name=&maxMatches=5&maxBytes=20971520&groupBy=job&wrap=on also shows no related errors for 4.5 over the past 7 days. So the bug is fixed; moving it to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409