Bug 2090049
Summary: | destroying GCP cluster which has a compute node without infra id in name would fail to delete 2 k8s firewall-rules and VPC network | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Jianli Wei <jiwei>
Component: | Installer | Assignee: | Brent Barbachem <bbarbach>
Installer sub component: | openshift-installer | QA Contact: | Jianli Wei <jiwei>
Status: | CLOSED ERRATA | Docs Contact: |
Severity: | medium | |
Priority: | unspecified | CC: | bbarbach, gpei
Version: | 4.11 | |
Target Milestone: | --- | |
Target Release: | 4.11.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2022-08-10 11:14:09 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | |
Description
Jianli Wei
2022-05-25 02:23:24 UTC
Brent Barbachem

Hello, I am opening up a discussion based on the results that I have found investigating this BZ. When a Machine is created without the infra ID as the prefix of its name, the name is compared with the cluster name to determine whether the resource should be deleted. It finds that these do NOT match and thus stops the deletion of this resource. In fact, it appears to stop the deletion of all of the TargetPools for GCP. (Question #1: should we delete the resources that are definitely part of the cluster on destroy, even though some may fail the current checks?)

The reason these resources (firewalls, etc.) are unable to be destroyed is that they are attached to the resource(s) that could not be removed. They will never be able to be removed while they are still in use. (Question #2: should we delete resources that were created on day 2?) If an operator creates resources, the installer shouldn't make assumptions about their use and remove them, should it?

Brent Barbachem

@jianli wei, forgot to CC you on that comment
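As a rough illustration of the naming check described above, here is a minimal sketch (hypothetical names and simplified logic, not the installer's actual code):

    # A resource survives destroy unless its name carries the infra-ID
    # prefix or, failing that, matches the cluster name.
    infra_id="jiwei-openshift-gtsth"
    cluster_name="jiwei-openshift"   # assumed; the infra ID is the cluster name plus a random suffix
    for name in jiwei-openshift-gtsth-worker-a-xqlqm hello-world-wjg6h; do
      case "$name" in
        "$infra_id"*)     echo "$name: carries the infra ID, deleted on destroy" ;;
        "$cluster_name"*) echo "$name: matches the cluster name, deleted on destroy" ;;
        *)                echo "$name: matches neither, skipped on destroy" ;;
      esac
    done

A skipped instance keeps its TargetPool in use, which in turn appears to be what leaves the two k8s firewall rules and the VPC network named in the summary undeletable.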
Jianli Wei

Tested with the build (https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp-modern/1533626873861378048) generated by the Slack app "Cluster Bot" for the PR https://github.com/openshift/installer/pull/5965; the issue no longer occurs.

> 1. launch the cluster https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/109236/ (SUCCESS)
LAUNCHER_VARS installer_payload_image: registry.build01.ci.openshift.org/ci-ln-zz49qtk/release:latest

> 2. scale up using a machineset's yaml to launch one additional compute node whose name doesn't have the cluster infra ID
$ oc get clusterversion
NAME      VERSION                                                    AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.ci.test-2022-06-06-025004-ci-ln-zz49qtk-latest    True        False         9m41s   Cluster version is 4.11.0-0.ci.test-2022-06-06-025004-ci-ln-zz49qtk-latest
$ oc get nodes
NAME                                                           STATUS   ROLES    AGE   VERSION
jiwei-openshift-gtsth-master-0.c.openshift-qe.internal         Ready    master   35m   v1.24.0+bb9c2f1
jiwei-openshift-gtsth-master-1.c.openshift-qe.internal         Ready    master   34m   v1.24.0+bb9c2f1
jiwei-openshift-gtsth-master-2.c.openshift-qe.internal         Ready    master   34m   v1.24.0+bb9c2f1
jiwei-openshift-gtsth-worker-a-xqlqm.c.openshift-qe.internal   Ready    worker   19m   v1.24.0+bb9c2f1
jiwei-openshift-gtsth-worker-b-7m86p.c.openshift-qe.internal   Ready    worker   19m   v1.24.0+bb9c2f1
jiwei-openshift-gtsth-worker-c-7g5x8.c.openshift-qe.internal   Ready    worker   19m   v1.24.0+bb9c2f1
$ oc get machinesets -n openshift-machine-api
NAME                             DESIRED   CURRENT   READY   AVAILABLE   AGE
jiwei-openshift-gtsth-worker-a   1         1         1       1           35m
jiwei-openshift-gtsth-worker-b   1         1         1       1           35m
jiwei-openshift-gtsth-worker-c   1         1         1       1           35m
jiwei-openshift-gtsth-worker-f   0         0                             35m
$ oc get machinesets jiwei-openshift-gtsth-worker-a -n openshift-machine-api -oyaml > /tmp/ms1.yaml
$ sed -i 's/jiwei-openshift-gtsth-worker-a/hello-world/g' /tmp/ms1.yaml
$ vim /tmp/ms1.yaml
$ oc create -f /tmp/ms1.yaml
machineset.machine.openshift.io/hello-world created
$ oc get machinesets -n openshift-machine-api
NAME                             DESIRED   CURRENT   READY   AVAILABLE   AGE
hello-world                      1         1                             59s
jiwei-openshift-gtsth-worker-a   1         1         1       1           38m
jiwei-openshift-gtsth-worker-b   1         1         1       1           38m
jiwei-openshift-gtsth-worker-c   1         1         1       1           38m
jiwei-openshift-gtsth-worker-f   0         0                             38m
$ oc get machines -n openshift-machine-api
NAME                                   PHASE         TYPE            REGION        ZONE            AGE
hello-world-wjg6h                      Provisioned   n1-standard-4   us-central1   us-central1-a   68s
jiwei-openshift-gtsth-master-0         Running       n1-standard-4   us-central1   us-central1-a   38m
jiwei-openshift-gtsth-master-1         Running       n1-standard-4   us-central1   us-central1-b   38m
jiwei-openshift-gtsth-master-2         Running       n1-standard-4   us-central1   us-central1-c   38m
jiwei-openshift-gtsth-worker-a-xqlqm   Running       n1-standard-4   us-central1   us-central1-a   34m
jiwei-openshift-gtsth-worker-b-7m86p   Running       n1-standard-4   us-central1   us-central1-b   34m
jiwei-openshift-gtsth-worker-c-7g5x8   Running       n1-standard-4   us-central1   us-central1-c   34m
$
$ oc get machines -n openshift-machine-api
NAME                                   PHASE     TYPE            REGION        ZONE            AGE
hello-world-wjg6h                      Running   n1-standard-4   us-central1   us-central1-a   3m24s
jiwei-openshift-gtsth-master-0         Running   n1-standard-4   us-central1   us-central1-a   40m
jiwei-openshift-gtsth-master-1         Running   n1-standard-4   us-central1   us-central1-b   40m
jiwei-openshift-gtsth-master-2         Running   n1-standard-4   us-central1   us-central1-c   40m
jiwei-openshift-gtsth-worker-a-xqlqm   Running   n1-standard-4   us-central1   us-central1-a   36m
jiwei-openshift-gtsth-worker-b-7m86p   Running   n1-standard-4   us-central1   us-central1-b   36m
jiwei-openshift-gtsth-worker-c-7g5x8   Running   n1-standard-4   us-central1   us-central1-c   36m
$ oc get nodes
NAME                                                           STATUS   ROLES    AGE   VERSION
hello-world-wjg6h.c.openshift-qe.internal                      Ready    worker   57s   v1.24.0+bb9c2f1
jiwei-openshift-gtsth-master-0.c.openshift-qe.internal         Ready    master   40m   v1.24.0+bb9c2f1
jiwei-openshift-gtsth-master-1.c.openshift-qe.internal         Ready    master   40m   v1.24.0+bb9c2f1
jiwei-openshift-gtsth-master-2.c.openshift-qe.internal         Ready    master   39m   v1.24.0+bb9c2f1
jiwei-openshift-gtsth-worker-a-xqlqm.c.openshift-qe.internal   Ready    worker   25m   v1.24.0+bb9c2f1
jiwei-openshift-gtsth-worker-b-7m86p.c.openshift-qe.internal   Ready    worker   24m   v1.24.0+bb9c2f1
jiwei-openshift-gtsth-worker-c-7g5x8.c.openshift-qe.internal   Ready    worker   25m   v1.24.0+bb9c2f1
$

> 3. destroy the cluster https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-destroy/101055/ (SUCCESS)

> 4. check the cluster's resources on GCP and got nothing left over
$ ./gcp_res_check.sh jiwei-openshift-gtsth
>>gcloud compute instances list | grep jiwei-openshift-gtsth
>>gcloud compute instance-groups list | grep jiwei-openshift-gtsth
>>gcloud compute disks list | grep jiwei-openshift-gtsth
>>gcloud compute networks list | grep jiwei-openshift-gtsth
>>gcloud compute networks subnets list | grep jiwei-openshift-gtsth
>>gcloud compute routers list | grep jiwei-openshift-gtsth
>>gcloud compute firewall-rules list | grep jiwei-openshift-gtsth
To show all fields of the firewall, please show in JSON format: --format=json
To show all fields in table format, please see the examples in --help.
>>gcloud compute health-checks list | grep jiwei-openshift-gtsth
>>gcloud compute http-health-checks list | grep jiwei-openshift-gtsth
>>gcloud compute forwarding-rules list | grep jiwei-openshift-gtsth
>>gcloud compute addresses list | grep jiwei-openshift-gtsth
>>gcloud compute target-pools list | grep jiwei-openshift-gtsth
>>gcloud compute backend-services list | grep jiwei-openshift-gtsth
>>gcloud dns managed-zones list | grep jiwei-openshift-gtsth
>>gcloud dns record-sets list --zone qe | grep jiwei-openshift-gtsth
>>gcloud iam service-accounts list | grep jiwei-openshift-gtsth
>>gcloud compute images list | grep jiwei-openshift-gtsth
>>gsutil ls | grep jiwei-openshift-gtsth
>>gcloud deployment-manager deployments list | grep jiwei-openshift-gtsth
Mon Jun 6 14:14:32 CST 2022
$
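gcp_res_check.sh itself is not attached to this bug; judging from the echoed ">>" lines above, it is presumably a thin wrapper along these lines (a reconstruction under that assumption, not the actual QE script):

    #!/bin/bash
    # Grep every relevant GCP resource listing for the given infra ID,
    # echoing each command with a ">>" prefix before running it.
    infra_id="$1"
    check() {
      echo ">>$* | grep $infra_id"
      "$@" | grep "$infra_id"
    }
    check gcloud compute instances list
    check gcloud compute instance-groups list
    check gcloud compute disks list
    check gcloud compute networks list
    check gcloud compute networks subnets list
    check gcloud compute routers list
    check gcloud compute firewall-rules list
    check gcloud compute health-checks list
    check gcloud compute http-health-checks list
    check gcloud compute forwarding-rules list
    check gcloud compute addresses list
    check gcloud compute target-pools list
    check gcloud compute backend-services list
    check gcloud dns managed-zones list
    check gcloud dns record-sets list --zone qe
    check gcloud iam service-accounts list
    check gcloud compute images list
    check gsutil ls
    check gcloud deployment-manager deployments list
    date

Empty grep output for every listing means no cluster resources were left behind.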
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069