Bug 1761043 - [e2e] [sig-api-machinery] CustomResourcePublishOpenAPI [Feature:CustomResourcePublishOpenAPI] works for CRD with validation schema [Suite:openshift/conformance/parallel] [Suite:k8s]
Summary: [e2e] [sig-api-machinery] CustomResourcePublishOpenAPI [Feature:CustomResourc...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Lukasz Szaszkiewicz
QA Contact: Ke Wang
URL:
Whiteboard:
Duplicates: 1760198 1822293 1822294
Depends On: 1829294
Blocks: 1828790
 
Reported: 2019-10-12 03:40 UTC by Jianwei Hou
Modified: 2020-07-13 17:12 UTC (History)
CC List: 8 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1828790 1829294
Environment:
Last Closed: 2020-07-13 17:11:31 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift origin pull 24894 | 0 | None | closed | refactor/improve CRD publishing e2e tests in an HA setup | 2020-06-23 08:12:29 UTC
Github | openshift origin pull 24914 | 0 | None | closed | take_two: refactor/improve CRD publishing e2e tests in an HA setup | 2020-06-23 08:12:29 UTC
Github | openshift origin pull 24920 | 0 | None | closed | Bug 1761043: provides a temporal fix to improve CRD publishing e2e tests in an HA setup | 2020-06-23 08:12:29 UTC
Github | openshift origin pull 24930 | 0 | None | closed | Bug 1761043: provides a temporal fix to improve CRD publishing e2e tests in an HA setup | 2020-06-23 08:12:28 UTC
Github | openshift origin pull 24931 | 0 | None | closed | Bug 1761043: provides a temporal fix to improve CRD publishing e2e tests in an HA setup | 2020-06-23 08:12:28 UTC
Red Hat Product Errata | RHBA-2020:2409 | 0 | None | None | None | 2020-07-13 17:12:02 UTC

Description Jianwei Hou 2019-10-12 03:40:53 UTC
Description of problem:
The test "[sig-api-machinery] CustomResourcePublishOpenAPI [Feature:CustomResourcePublishOpenAPI] works for CRD with validation schema [Suite:openshift/conformance/parallel] [Suite:k8s]" fails intermittently.

```
fail [k8s.io/kubernetes/test/e2e/apimachinery/crd_publish_openapi.go:95]: Oct  8 12:42:31.036: unexpected no error when creating CR without required field: error running &{/usr/bin/kubectl [kubectl --server=https://api.ci-op-vi9gvk0s-103c6.origin-ci-int-aws.dev.rhcloud.com:6443 --kubeconfig=/tmp/admin.kubeconfig --namespace=e2e-crd-publish-openapi-3381 create -f -] []  0xc0068dc840  The E2e-test-crd-publish-openapi-9502-crd "test-foo" is invalid: []: Invalid value: map[string]interface {}{"apiVersion":"crd-publish-openapi-test-foo.k8s.io/v1", "kind":"E2e-test-crd-publish-openapi-9502-crd", "metadata":map[string]interface {}{"creationTimestamp":"2019-10-08T12:42:31Z", "generation":1, "name":"test-foo", "namespace":"e2e-crd-publish-openapi-3381", "uid":"17e4d934-e9c9-11e9-975a-121bc63d6326"}, "spec":map[string]interface {}{"bars":[]interface {}{map[string]interface {}{"age":"10"}}}}: validation failure list:
spec.bars.name in body is required
 [] <nil> 0xc004c32de0 exit status 1 <nil> <nil> true [0xc00b0e8228 0xc00b0e8250 0xc00b0e8260] [0xc00b0e8228 0xc00b0e8250 0xc00b0e8260] [0xc00b0e8230 0xc00b0e8248 0xc00b0e8258] [0x95acb0 0x95ade0 0x95ade0] 0xc002bfc060 <nil>}:
Command stdout:

stderr:
The E2e-test-crd-publish-openapi-9502-crd "test-foo" is invalid: []: Invalid value: map[string]interface {}{"apiVersion":"crd-publish-openapi-test-foo.k8s.io/v1", "kind":"E2e-test-crd-publish-openapi-9502-crd", "metadata":map[string]interface {}{"creationTimestamp":"2019-10-08T12:42:31Z", "generation":1, "name":"test-foo", "namespace":"e2e-crd-publish-openapi-3381", "uid":"17e4d934-e9c9-11e9-975a-121bc63d6326"}, "spec":map[string]interface {}{"bars":[]interface {}{map[string]interface {}{"age":"10"}}}}: validation failure list:
spec.bars.name in body is required

error:
exit status 1
```

Recent failures:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-mirrors-4.2/41
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-mirrors-4.2/48
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-mirrors-4.2/61


Version-Release number of selected component (if applicable):
4.2

How reproducible:
Sometimes


Comment 2 Ben Parees 2020-03-10 19:05:42 UTC
22 failures in the last 24 hours:
https://search.svc.ci.openshift.org/?search=failed%3A.*works+for+CRD+with+validation+schema&maxAge=24h&context=2&type=all

(aggregating [Feature:CustomResourcePublishOpenAPI] and [Privileged:ClusterAdmin])

This seems like at least medium severity to me.

Comment 3 Ben Parees 2020-03-10 19:19:40 UTC
*** Bug 1760198 has been marked as a duplicate of this bug. ***

Comment 4 Venkata Siva Teja Areti 2020-04-08 18:03:45 UTC
*** Bug 1822293 has been marked as a duplicate of this bug. ***

Comment 5 Ben Parees 2020-04-13 19:56:41 UTC
This is the top failing test for sig-api-machinery, and it is failing 18% of the time in 4.4. Raising priority and severity (please backport any fix to 4.4).

Comment 7 Lukasz Szaszkiewicz 2020-04-15 15:27:35 UTC
I think I found the root cause of the issue. At least the failing runs I looked at were caused by it, although there could be other issues with these tests.

There is a race condition between the client (the test) and the servers. Oddly enough, the tests account for there being multiple servers but do not guarantee that all of them have seen the same update.

In short, the tests create a CRD and wait until "all" servers generate the OpenAPI spec with that resource. To check that, they send multiple requests to a single public IP (the LB) and compare the output.
This is the problematic part of the test, as there is no guarantee that the requests will reach every server.

I ran a simple experiment to prove this: I disabled OpenAPI spec generation on one server (out of 3) and ran the tests. Most of the time they did not fail on the "wait until all servers generate the spec" step.
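
A rough way to put numbers on this experiment is sketched below. It assumes the load balancer picks one of the servers uniformly and independently for every request. That is an assumption made for illustration only, since neither the real LB policy nor the number of requests the test sends is stated in this bug, and `miss_probability` is a hypothetical helper.

```python
from fractions import Fraction


def miss_probability(servers: int, requests: int) -> Fraction:
    """Chance that `requests` requests all avoid one particular server.

    Assumes (hypothetically) that each request is routed to one of
    `servers` backends uniformly at random, independently of the others.
    """
    if servers < 1 or requests < 0:
        raise ValueError("servers must be >= 1 and requests must be >= 0")
    # Each request misses the chosen server with probability (servers-1)/servers.
    return Fraction(servers - 1, servers) ** requests


if __name__ == "__main__":
    # Three servers, one of them never publishing the spec.
    for k in (1, 3, 5):
        print(k, miss_probability(3, k))
```

Under that assumption, a handful of requests through the LB has a noticeable chance of never reaching the misbehaving server, in which case the wait step passes even though one server is not ready.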

The proper fix would be to contact every replica directly and verify that all of them have generated the same OpenAPI spec before running the actual tests.
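
The idea of checking every replica can be sketched as follows. This is only an illustration and not the code from the linked pull requests: the function names are hypothetical, and `fetch_spec` stands in for whatever mechanism returns the raw OpenAPI document from one specific replica (bypassing the LB).

```python
import hashlib
import time
from typing import Callable, Iterable


def spec_digest(spec: bytes) -> str:
    """Return a stable fingerprint of a raw OpenAPI document."""
    return hashlib.sha256(spec).hexdigest()


def replicas_agree(endpoints: Iterable[str],
                   fetch_spec: Callable[[str], bytes]) -> bool:
    """True only if every replica, queried directly, serves the same spec.

    An empty endpoint list yields False, because nothing was verified.
    """
    digests = {spec_digest(fetch_spec(endpoint)) for endpoint in endpoints}
    return len(digests) == 1


def wait_until_replicas_agree(endpoints: Iterable[str],
                              fetch_spec: Callable[[str], bytes],
                              attempts: int = 30,
                              delay: float = 1.0,
                              sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll all replicas until they agree or the attempts run out."""
    endpoints = list(endpoints)
    for attempt in range(attempts):
        if replicas_agree(endpoints, fetch_spec):
            return True
        if attempt + 1 < attempts:
            sleep(delay)  # give lagging replicas time to regenerate the spec
    return False
```

Agreement alone does not show that the new CRD is present in the spec, so a real check would presumably also look for the resource in at least one of the returned documents.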

Comment 9 Lukasz Szaszkiewicz 2020-04-28 11:16:02 UTC
*** Bug 1822294 has been marked as a duplicate of this bug. ***

Comment 10 Brett Tofel 2020-04-30 18:06:19 UTC
There are several failures from a seemingly very similar issue in the test titled:
"[sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] [Top Level] [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] works for multiple CRDs of same group but different versions [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s] "

An example:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.4/819

Comment 11 Lukasz Szaszkiewicz 2020-05-04 10:22:53 UTC
@Xingxing I think my latest pull request improved the stability of the CRD tests; have a look at https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-blocking#release-openshift-origin-installer-e2e-gcp-4.5&sort-by-flakiness

Comment 14 Ke Wang 2020-05-07 08:38:07 UTC
In the https://testgrid.k8s.io/redhat-openshift-ocp-release-4.5-blocking#release-openshift-origin-installer-e2e-gcp-4.5&sort-by-flakiness view, searching the page for the keywords 'works for CRD with validation schema' shows that all blocks related to the test '[sig-api-machinery] CustomResourcePublishOpenAPI [Feature:CustomResourcePublishOpenAPI]' have been green over the past 7 days.

A quick search with https://search.apps.build01.ci.devcluster.openshift.com/?search=failed%3A.*works+for+CRD+with+validation+schema&maxAge=168h&context=2&type=all&name=&maxMatches=5&maxBytes=20971520&groupBy=job&wrap=on also shows no related errors for 4.5 over the past 7 days.

So the bug is fixed; moving the bug to VERIFIED.

Comment 16 errata-xmlrpc 2020-07-13 17:11:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

