+++ This bug was initially created as a clone of Bug #2106403 +++

Description of problem:
The e2e-nutanix-operator webhooks test suite does not support the Nutanix provider.

Version-Release number of selected component (if applicable):
4.12

How reproducible:
Always

Steps to Reproduce:
1. Create an OCP cluster on the Nutanix platform, using the latest 4.12 nightly release image.
2. Clone the repo github.com/openshift/cluster-api-actuator-pkg.
3. Set KUBECONFIG to the OCP cluster created in step 1.
4. Run the command: NAMESPACE=kube-system ./hack/ci-integration.sh -focus "Webhooks" -v

Actual results:
The tests failed and the default machineset of the OCP cluster got deleted.

Expected results:
All the tests in the suite pass.

Additional info:

--- Additional comment from mimccune on 2022-07-12 16:05:48 UTC ---

@yanhli does this fail even with the change from Sid? (https://github.com/openshift/machine-api-operator/pull/1034)

--- Additional comment from sishukla on 2022-07-13 13:40:37 UTC ---

@mimccune The failure is actually caused by the setup clause of the webhook spec in the cluster-api-actuator-pkg test skipping out early without setting the labels. The cleanup phase then uses the unset labels, which match everything, so it marks all machinesets and machines for deletion.

--- Additional comment from sishukla on 2022-07-13 13:42:06 UTC ---

https://github.com/openshift/cluster-api-actuator-pkg/blob/master/pkg/infra/webhooks.go#L37-L66

--- Additional comment from sishukla on 2022-07-13 13:43:23 UTC ---

Two things:
- that conditional should be in the very beginning of the spec description and not in the setup clause.
- we need to add the nutanix platform to that conditional.

--- Additional comment from mimccune on 2022-07-13 13:45:52 UTC ---

(In reply to Sid Shukla from comment #4)
> Two things:
> - that conditional should be in the very beginning of the spec description
> and not in the setup clause.

ah yeah, so it adds the skip instead of failing on non-supported platforms?
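The root cause Sid describes comes down to label selector semantics: a selector with no requirements matches every object. The stdlib-only sketch below models that behavior. `matchesSelector`, `markedForDeletion`, the machine set names and the `e2e-webhook-test` label are all hypothetical; the real suite goes through the Kubernetes client rather than plain maps.

```go
package main

import (
	"fmt"
	"sort"
)

// matchesSelector mimics equality-based label selector semantics: every
// key/value pair in selector must be present in labels. A nil or empty
// selector imposes no requirement at all, so it matches every object.
func matchesSelector(selector, labels map[string]string) bool {
	for key, want := range selector {
		if got, ok := labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// markedForDeletion returns, sorted, the names of the objects that a
// cleanup step selecting with the given selector would delete.
func markedForDeletion(selector map[string]string, objects map[string]map[string]string) []string {
	var names []string
	for name, labels := range objects {
		if matchesSelector(selector, labels) {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	// Hypothetical cluster contents: the default worker machine set plus one
	// machine set created by the test and tagged with a test-only label.
	objects := map[string]map[string]string{
		"default-worker": {"machine.openshift.io/cluster-api-machine-role": "worker"},
		"webhook-test":   {"e2e-webhook-test": "true"},
	}

	// Setup ran and initialized the selector: only the test's object matches.
	fmt.Println(markedForDeletion(map[string]string{"e2e-webhook-test": "true"}, objects)) // [webhook-test]

	// Setup returned early, so the selector is still nil: cleanup matches
	// everything, including the default machine set.
	var uninitialized map[string]string
	fmt.Println(markedForDeletion(uninitialized, objects)) // [default-worker webhook-test]
}
```

The second call reproduces the reported symptom in miniature: the default machine set is selected for deletion only because the selector was never filled in.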
--- Additional comment from sishukla on 2022-07-13 15:17:22 UTC ---

So, here's an example:

```go
package ginkgo_playground_test

import (
	"fmt"
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

func TestGinkgoPlayground(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "GinkgoPlayground Suite")
}

var _ = Describe("Printing execution order of closures", func() {
	fmt.Println("describe block")
	BeforeEach(func() {
		fmt.Println("before each block")
		Skip("skip")
	})

	AfterEach(func() {
		fmt.Println("after each block")
	})

	It("executes the It block", func() {
		fmt.Println("it block")
	})
})
```

When you run ginkgo test on this, here's the output:

```
$ ginkgo run .
describe block
Running Suite: GinkgoPlayground Suite - /Users/sid.shukla/go/src/github.com/thunderboltsid/ginkgo-playground
============================================================================================================
Random Seed: 1657725350

Will run 1 of 1 specs
before each block
after each block
------------------------------
S [SKIPPED] [0.000 seconds]
Printing execution order of closures [BeforeEach]
/Users/sid.shukla/go/src/github.com/thunderboltsid/ginkgo-playground/ginkgo_playground_suite_test.go:18
  executes the It block
  /Users/sid.shukla/go/src/github.com/thunderboltsid/ginkgo-playground/ginkgo_playground_suite_test.go:27

  skip
  In [BeforeEach] at: /Users/sid.shukla/go/src/github.com/thunderboltsid/ginkgo-playground/ginkgo_playground_suite_test.go:20
------------------------------

Ran 0 of 1 Specs in 0.001 seconds
SUCCESS! -- 0 Passed | 0 Failed | 0 Pending | 1 Skipped
PASS

Ginkgo ran 1 suite in 2.339198756s
Test Suite Passed
```

As you can see, both the BeforeEach and the AfterEach closures get executed if the skip happens inside the BeforeEach closure.

--- Additional comment from sishukla on 2022-07-13 15:22:45 UTC ---

If the skip clause is moved over to the Describe closure, the BeforeEach and AfterEach closures are not executed.
```go
package ginkgo_playground_test

import (
	"fmt"
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

func TestGinkgoPlayground(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "GinkgoPlayground Suite")
}

var _ = Describe("Printing execution order of closures", func() {
	fmt.Println("describe block")
	defer GinkgoRecover()
	Skip("skip")

	BeforeEach(func() {
		fmt.Println("before each block")
	})

	AfterEach(func() {
		fmt.Println("after each block")
	})

	It("executes the It block", func() {
		fmt.Println("it block")
	})
})
```

As can be seen from running it:

```
$ ginkgo run .
describe block
Running Suite: GinkgoPlayground Suite - /Users/sid.shukla/go/src/github.com/thunderboltsid/ginkgo-playground
============================================================================================================
Random Seed: 1657725679

Will run 0 of 0 specs

Ran 0 of 0 Specs in 0.000 seconds
SUCCESS! -- 0 Passed | 0 Failed | 0 Pending | 0 Skipped
PASS

Ginkgo ran 1 suite in 2.262312576s
Test Suite Passed
```

--- Additional comment from sishukla on 2022-07-13 15:26:41 UTC ---

What that entails for the Webhook spec is that if a platform is not explicitly listed in this switch (https://github.com/openshift/cluster-api-actuator-pkg/blob/master/pkg/infra/webhooks.go#L39), the testSelector (https://github.com/openshift/cluster-api-actuator-pkg/blob/master/pkg/infra/webhooks.go#L51-L53) never gets initialized. As a result, when the `AfterEach` closure runs, it ends up marking all machines and machinesets for deletion (https://github.com/openshift/cluster-api-actuator-pkg/blob/master/pkg/infra/webhooks.go#L57-L65).

--- Additional comment from mimccune on 2022-07-13 16:14:25 UTC ---

great analysis Sid, it makes sense to me. would you like to propose a patch for this? (otherwise i can make something from your samples here)

--- Additional comment from yanhli on 2022-07-13 18:57:22 UTC ---

I filed the PR https://github.com/openshift/cluster-api-actuator-pkg/pull/236 and tested it manually.
@mimccune Please review the fix at https://github.com/openshift/cluster-api-actuator-pkg/pull/236.

--- Additional comment from mimccune on 2022-07-13 19:23:36 UTC ---

awesome, thank you Yanhua!
Validated as below:

[miyadav@miyadav ~]$ vi ~/.kube/config
[miyadav@miyadav ~]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-08-26-162248   True        False         30m     Cluster version is 4.11.0-0.nightly-2022-08-26-162248
[miyadav@miyadav ~]$ git clone github.com/openshift/cluster-api-actuator-pkg
fatal: repository 'github.com/openshift/cluster-api-actuator-pkg' does not exist
[miyadav@miyadav ~]$ df -h
Filesystem                Size  Used Avail Use% Mounted on
devtmpfs                  7.6G     0  7.6G   0% /dev
tmpfs                     7.7G  230M  7.4G   3% /dev/shm
tmpfs                     7.7G  2.0M  7.7G   1% /run
tmpfs                     7.7G     0  7.7G   0% /sys/fs/cgroup
/dev/mapper/RHELCSB-Root   50G   48G  2.6G  95% /
/dev/nvme0n1p2            3.0G  467M  2.6G  16% /boot
/dev/nvme0n1p1            200M   17M  184M   9% /boot/efi
/dev/mapper/RHELCSB-Home  100G   29G   72G  29% /home
tmpfs                     1.6G   56K  1.6G   1% /run/user/119637
[miyadav@miyadav ~]$ git clone git:openshift/cluster-api-actuator-pkg.git
Cloning into 'cluster-api-actuator-pkg'...
remote: Enumerating objects: 42267, done.
remote: Counting objects: 100% (4627/4627), done.
remote: Compressing objects: 100% (2212/2212), done.
remote: Total 42267 (delta 2198), reused 4446 (delta 2119), pack-reused 37640
Receiving objects: 100% (42267/42267), 63.52 MiB | 1.06 MiB/s, done.
Resolving deltas: 100% (19202/19202), done.
[miyadav@miyadav ~]$ df -h
Filesystem                Size  Used Avail Use% Mounted on
devtmpfs                  7.6G     0  7.6G   0% /dev
tmpfs                     7.7G  227M  7.4G   3% /dev/shm
tmpfs                     7.7G  2.0M  7.7G   1% /run
tmpfs                     7.7G     0  7.7G   0% /sys/fs/cgroup
/dev/mapper/RHELCSB-Root   50G   48G  2.6G  95% /
/dev/nvme0n1p2            3.0G  467M  2.6G  16% /boot
/dev/nvme0n1p1            200M   17M  184M   9% /boot/efi
/dev/mapper/RHELCSB-Home  100G   29G   72G  29% /home
tmpfs                     1.6G   56K  1.6G   1% /run/user/119637
[miyadav@miyadav ~]$ cd cluster-api-actuator-pkg/
[miyadav@miyadav cluster-api-actuator-pkg]$ NAMESPACE=kube-system ./hack/ci-integration.sh -focus "Webhooks" -v
You're using deprecated Ginkgo functionality:
=============================================
Ginkgo 2.0 is under active development and will introduce several new features, improvements, and a small handful of breaking changes.
A release candidate for 2.0 is now available and 2.0 should GA in Fall 2021. Please give the RC a try and send us feedback!
  - To learn more, view the migration guide at https://github.com/onsi/ginkgo/blob/ver2/docs/MIGRATING_TO_V2.md
  - For instructions on using the Release Candidate visit https://github.com/onsi/ginkgo/blob/ver2/docs/MIGRATING_TO_V2.md#using-the-beta
  - To comment, chime in at https://github.com/onsi/ginkgo/issues/711

  --stream is deprecated and will be removed in Ginkgo 2.0
  Learn more at: https://github.com/onsi/ginkgo/blob/ver2/docs/MIGRATING_TO_V2.md#removed--stream

To silence deprecations that can be silenced set the following environment variable:
  ACK_GINKGO_DEPRECATIONS=1.16.5

I0829 10:57:31.672543 39384 request.go:601] Waited for 1.049373323s due to client-side throttling, not priority and fairness, request: GET:https://api.sgao-0.qe.devcluster.openshift.com:6443/apis/flowcontrol.apiserver.k8s.io/v1beta1?timeout=32s
Running Suite: Machine Suite
============================
Random Seed: 1661750808
Will run 4 of 37 specs

SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
------------------------------
[Feature:Machines] Webhooks
  should be able to create a machine from a minimal providerSpec
  /home/miyadav/cluster-api-actuator-pkg/pkg/infra/webhooks.go:68

• [SLOW TEST:144.080 seconds]
[Feature:Machines] Webhooks
/home/miyadav/cluster-api-actuator-pkg/pkg/infra/webhooks.go:21
  should be able to create a machine from a minimal providerSpec
  /home/miyadav/cluster-api-actuator-pkg/pkg/infra/webhooks.go:68
------------------------------
[Feature:Machines] Webhooks
  should be able to create machines from a machineset with a minimal providerSpec
  /home/miyadav/cluster-api-actuator-pkg/pkg/infra/webhooks.go:94
I0829 11:00:02.602291 39384 request.go:601] Waited for 1.050221197s due to client-side throttling, not priority and fairness, request: GET:https://api.sgao-0.qe.devcluster.openshift.com:6443/apis/whereabouts.cni.cncf.io/v1alpha1?timeout=32s

• [SLOW TEST:177.254 seconds]
[Feature:Machines] Webhooks
/home/miyadav/cluster-api-actuator-pkg/pkg/infra/webhooks.go:21
  should be able to create machines from a machineset with a minimal providerSpec
  /home/miyadav/cluster-api-actuator-pkg/pkg/infra/webhooks.go:94
------------------------------
[Feature:Machines] Webhooks
  should return an error when removing required fields from the Machine providerSpec
  /home/miyadav/cluster-api-actuator-pkg/pkg/infra/webhooks.go:101
I0829 11:02:59.749116 39384 request.go:601] Waited for 1.000195504s due to client-side throttling, not priority and fairness, request: GET:https://api.sgao-0.qe.devcluster.openshift.com:6443/apis/storage.k8s.io/v1beta1?timeout=32s

• [SLOW TEST:22.425 seconds]
[Feature:Machines] Webhooks
/home/miyadav/cluster-api-actuator-pkg/pkg/infra/webhooks.go:21
  should return an error when removing required fields from the Machine providerSpec
  /home/miyadav/cluster-api-actuator-pkg/pkg/infra/webhooks.go:101
------------------------------
[Feature:Machines] Webhooks
  should return an error when removing required fields from the MachineSet providerSpec
  /home/miyadav/cluster-api-actuator-pkg/pkg/infra/webhooks.go:135
I0829 11:03:22.535854 39384 request.go:601] Waited for 1.049392136s due to client-side throttling, not priority and fairness, request: GET:https://api.sgao-0.qe.devcluster.openshift.com:6443/apis/performance.openshift.io/v2?timeout=32s

• [SLOW TEST:23.756 seconds]
[Feature:Machines] Webhooks
/home/miyadav/cluster-api-actuator-pkg/pkg/infra/webhooks.go:21
  should return an error when removing required fields from the MachineSet providerSpec
  /home/miyadav/cluster-api-actuator-pkg/pkg/infra/webhooks.go:135
------------------------------

Ran 4 of 37 Specs in 371.092 seconds
SUCCESS! -- 4 Passed | 0 Failed | 0 Pending | 33 Skipped
PASS

You're using deprecated Ginkgo functionality:
=============================================
Ginkgo 2.0 is under active development and will introduce several new features, improvements, and a small handful of breaking changes.
A release candidate for 2.0 is now available and 2.0 should GA in Fall 2021. Please give the RC a try and send us feedback!
  - To learn more, view the migration guide at https://github.com/onsi/ginkgo/blob/ver2/docs/MIGRATING_TO_V2.md
  - For instructions on using the Release Candidate visit https://github.com/onsi/ginkgo/blob/ver2/docs/MIGRATING_TO_V2.md#using-the-beta
  - To comment, chime in at https://github.com/onsi/ginkgo/issues/711

  You are using a custom reporter. Support for custom reporters will likely be removed in V2. Most users were using them to generate junit or teamcity reports and this functionality will be merged into the core reporter. In addition, Ginkgo 2.0 will support emitting a JSON-formatted report that users can then manipulate to generate custom reports.
  If this change will be impactful to you please leave a comment on https://github.com/onsi/ginkgo/issues/711
  Learn more at: https://github.com/onsi/ginkgo/blob/ver2/docs/MIGRATING_TO_V2.md#removed-custom-reporters

To silence deprecations that can be silenced set the following environment variable:
  ACK_GINKGO_DEPRECATIONS=1.16.5

Ginkgo ran 1 suite in 6m56.348282187s
Test Suite Passed
[miyadav@miyadav cluster-api-actuator-pkg]

Additional info: before and after the run, the machineset remained the same:

[miyadav@miyadav cluster-api-actuator-pkg]$ oc get machineset -n openshift-machine-api
NAME                  DESIRED   CURRENT   READY   AVAILABLE   AGE
sgao-0-6v47t-worker   2         2         2       2           55m
[miyadav@miyadav cluster-api-actuator-pkg]$ oc get machineset -n openshift-machine-api
NAME                  DESIRED   CURRENT   READY   AVAILABLE   AGE
sgao-0-6v47t-worker   2         2         2       2           60m
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.11.3 packages and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:6287