Bug 1744908
| Summary: | [Feature:Machines][Serial] Managed cluster should grow and decrease when scaling different machineSets simultaneously [Suite:openshift/conformance/serial] | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | scheng |
| Component: | Cloud Compute | Assignee: | Jan Chaloupka <jchaloup> |
| Status: | CLOSED ERRATA | QA Contact: | Jianwei Hou <jhou> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4.2.0 | CC: | agarcial, wsun |
| Target Milestone: | --- | ||
| Target Release: | 4.2.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | azure | ||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-10-16 06:37:21 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
scheng
2019-08-23 08:22:30 UTC
```
I0823 05:57:02.704074 1 controller.go:205] Reconciling machine "ci-op-4inqf4cw-3a8ca-666zl-worker-centralus1-cz5td" triggers delete
I0823 06:00:14.863085 1 controller.go:239] Machine "ci-op-4inqf4cw-3a8ca-666zl-worker-centralus1-cz5td" deletion successful
```

Deletion on Azure simply takes longer than on AWS, so even a 2-minute timeout is not sufficient. The test appears flaky (passed 6 times, failed 4 times); sometimes it fails due to the timeout.

https://testgrid.k8s.io/redhat-openshift-release-informing#redhat-canary-openshift-ocp-installer-e2e-azure-serial-4.2&sort-by-flakiness=

```
STEP: waiting for cluster to get back to original size. Final size should be 3 worker nodes
STEP: got 6 nodes, expecting 3
STEP: got 6 nodes, expecting 3
STEP: got 6 nodes, expecting 3
...........
STEP: got 4 nodes, expecting 3
STEP: got 4 nodes, expecting 3
Aug 28 05:55:02.613: INFO: Running AfterSuite actions on all nodes
Aug 28 05:55:02.613: INFO: Running AfterSuite actions on node 1
fail [github.com/openshift/origin/test/extended/machines/scale.go:221]: Timed out after 240.000s.
```
From https://console.cloud.google.com/storage/browser/_details/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2/61/artifacts/e2e-azure-serial/container-logs/test.log:

```
Aug 28 05:55:02.613: INFO: Running AfterSuite actions on all nodes
```

From https://console.cloud.google.com/storage/browser/_details/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2/61/artifacts/e2e-azure-serial/pods/openshift-machine-api_machine-api-controllers-564f659496-grhfp_machine-controller.log:

```
I0828 05:51:37.299415 1 controller.go:205] Reconciling machine "ci-op-1zzgtx6m-3a8ca-trcpx-worker-centralus3-fvmdx" triggers delete
I0828 05:52:39.091238 1 controller.go:302] drain successful for machine "ci-op-1zzgtx6m-3a8ca-trcpx-worker-centralus3-fvmdx"
I0828 05:55:41.244092 1 virtualmachines.go:242] successfully deleted vm ci-op-1zzgtx6m-3a8ca-trcpx-worker-centralus3-fvmdx
I0828 05:56:41.832275 1 disks.go:65] successfully deleted disk ci-op-1zzgtx6m-3a8ca-trcpx-worker-centralus3-fvmdx_OSDisk
I0828 05:56:52.253019 1 networkinterfaces.go:197] successfully deleted nic ci-op-1zzgtx6m-3a8ca-trcpx-worker-centralus3-fvmdx-nic
I0828 05:56:52.328967 1 controller.go:239] Machine "ci-op-1zzgtx6m-3a8ca-trcpx-worker-centralus3-fvmdx" deletion successful
```

The last node was deleted almost 2 minutes after the timeout. Deleting the VM resource in Azure alone took just over 3 minutes, and deleting the last machine took 5m15s in total. So even a 6-minute timeout may not be enough when two machines are requested for deletion.

The last five tests have all passed. The test is stable and reliable; marking as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922