Version:
./openshift-install 4.11.0-0.nightly-2022-05-20-213928
built from commit 69ac7528ba1c957132008ab52ccf5d9e7dad778f
release image registry.ci.openshift.org/ocp/release@sha256:e0719cb528dbac58ab0462637a6016aff7ce51b12d65747121a6165c170f9373
release architecture amd64

Platform: GCP

Please specify: IPI

What happened?

After adding one additional compute node whose name does not contain the cluster infra ID, destroying the cluster fails to delete the two k8s firewall-rules and the VPC network.

What did you expect to happen?

Even with a compute node whose name doesn't contain the infra ID, destroying the cluster should delete all resources, including the k8s firewall-rules and the VPC network.

How to reproduce it (as minimally and precisely as possible)?

Always reproducible.

Anything else we need to know?

> FYI the QE flexy-install and flexy-destroy jobs:
https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/105606/
https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-destroy/97154/

> the steps to launch an additional compute node using a machineset:
$ export KUBECONFIG=kc1
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-05-20-213928   True        False         9m58s   Cluster version is 4.11.0-0.nightly-2022-05-20-213928
$ oc get nodes
NAME                                                         STATUS   ROLES    AGE   VERSION
jiwei-0524-12-lhccb-master-0.c.openshift-qe.internal         Ready    master   28m   v1.23.3+ad897c4
jiwei-0524-12-lhccb-master-1.c.openshift-qe.internal         Ready    master   28m   v1.23.3+ad897c4
jiwei-0524-12-lhccb-master-2.c.openshift-qe.internal         Ready    master   29m   v1.23.3+ad897c4
jiwei-0524-12-lhccb-worker-a-gvjmq.c.openshift-qe.internal   Ready    worker   20m   v1.23.3+ad897c4
jiwei-0524-12-lhccb-worker-b-zmsqj.c.openshift-qe.internal   Ready    worker   20m   v1.23.3+ad897c4
jiwei-0524-12-lhccb-worker-c-89dr8.c.openshift-qe.internal   Ready    worker   20m   v1.23.3+ad897c4
$ oc get machinesets -n openshift-machine-api
NAME                           DESIRED   CURRENT   READY   AVAILABLE   AGE
jiwei-0524-12-lhccb-worker-a   1         1         1       1           29m
jiwei-0524-12-lhccb-worker-b   1         1         1       1           29m
jiwei-0524-12-lhccb-worker-c   1         1         1       1           29m
jiwei-0524-12-lhccb-worker-f   0         0                             29m
$ oc get machinesets jiwei-0524-12-lhccb-worker-a -n openshift-machine-api -oyaml > /tmp/ms1.yaml
$ sed -i 's/jiwei-0524-12-lhccb-worker-a/hello-world/g' /tmp/ms1.yaml
$ vim /tmp/ms1.yaml    <-- to remove the "status" section
$ oc create -f /tmp/ms1.yaml
machineset.machine.openshift.io/hello-world created
$ date
Tue 24 May 2022 10:09:33 AM UTC
$ oc get machinesets -n openshift-machine-api
NAME                           DESIRED   CURRENT   READY   AVAILABLE   AGE
hello-world                    1         1                             11s
jiwei-0524-12-lhccb-worker-a   1         1         1       1           42m
jiwei-0524-12-lhccb-worker-b   1         1         1       1           42m
jiwei-0524-12-lhccb-worker-c   1         1         1       1           42m
jiwei-0524-12-lhccb-worker-f   0         0                             42m
$ oc get machines -n openshift-machine-api
NAME                                 PHASE         TYPE            REGION        ZONE            AGE
hello-world-48kzx                    Provisioned   n1-standard-4   us-central1   us-central1-a   34s
jiwei-0524-12-lhccb-master-0         Running       n1-standard-4   us-central1   us-central1-a   42m
jiwei-0524-12-lhccb-master-1         Running       n1-standard-4   us-central1   us-central1-b   42m
jiwei-0524-12-lhccb-master-2         Running       n1-standard-4   us-central1   us-central1-c   42m
jiwei-0524-12-lhccb-worker-a-gvjmq   Running       n1-standard-4   us-central1   us-central1-a   39m
jiwei-0524-12-lhccb-worker-b-zmsqj   Running       n1-standard-4   us-central1   us-central1-b   39m
jiwei-0524-12-lhccb-worker-c-89dr8   Running       n1-standard-4   us-central1   us-central1-c   38m
$
$ oc get machines -n openshift-machine-api
NAME                                 PHASE     TYPE            REGION        ZONE            AGE
hello-world-48kzx                    Running   n1-standard-4   us-central1   us-central1-a   3m59s
jiwei-0524-12-lhccb-master-0         Running   n1-standard-4   us-central1   us-central1-a   45m
jiwei-0524-12-lhccb-master-1         Running   n1-standard-4   us-central1   us-central1-b   45m
jiwei-0524-12-lhccb-master-2         Running   n1-standard-4   us-central1   us-central1-c   45m
jiwei-0524-12-lhccb-worker-a-gvjmq   Running   n1-standard-4   us-central1   us-central1-a   42m
jiwei-0524-12-lhccb-worker-b-zmsqj   Running   n1-standard-4   us-central1   us-central1-b   42m
jiwei-0524-12-lhccb-worker-c-89dr8   Running   n1-standard-4   us-central1   us-central1-c   42m
$ oc get nodes
NAME                                                         STATUS   ROLES    AGE    VERSION
hello-world-48kzx.c.openshift-qe.internal                    Ready    worker   106s   v1.23.3+ad897c4
jiwei-0524-12-lhccb-master-0.c.openshift-qe.internal         Ready    master   45m    v1.23.3+ad897c4
jiwei-0524-12-lhccb-master-1.c.openshift-qe.internal         Ready    master   45m    v1.23.3+ad897c4
jiwei-0524-12-lhccb-master-2.c.openshift-qe.internal         Ready    master   45m    v1.23.3+ad897c4
jiwei-0524-12-lhccb-worker-a-gvjmq.c.openshift-qe.internal   Ready    worker   37m    v1.23.3+ad897c4
jiwei-0524-12-lhccb-worker-b-zmsqj.c.openshift-qe.internal   Ready    worker   37m    v1.23.3+ad897c4
jiwei-0524-12-lhccb-worker-c-89dr8.c.openshift-qe.internal   Ready    worker   36m    v1.23.3+ad897c4
$

> after the flexy-destroy job, the two k8s firewall-rules and the VPC network are not deleted, although deleting them manually works:
$ ./gcp_res_check.sh jiwei-0524-12
>>gcloud compute instances list | grep jiwei-0524-12
>>gcloud compute instance-groups list | grep jiwei-0524-12
>>gcloud compute disks list | grep jiwei-0524-12
>>gcloud compute networks list | grep jiwei-0524-12
jiwei-0524-12-lhccb-network  CUSTOM  REGIONAL
>>gcloud compute networks subnets list | grep jiwei-0524-12
>>gcloud compute routers list | grep jiwei-0524-12
>>gcloud compute firewall-rules list | grep jiwei-0524-12
k8s-a9d84d44a0bf448549e4e4281f21f0d1-http-hc   jiwei-0524-12-lhccb-network   INGRESS   1000   tcp:31862        False
k8s-fw-a9d84d44a0bf448549e4e4281f21f0d1        jiwei-0524-12-lhccb-network   INGRESS   1000   tcp:80,tcp:443   False
To show all fields of the firewall, please show in JSON format: --format=json
To show all fields in table format, please see the examples in --help.
>>gcloud compute health-checks list | grep jiwei-0524-12
>>gcloud compute http-health-checks list | grep jiwei-0524-12
>>gcloud compute forwarding-rules list | grep jiwei-0524-12
>>gcloud compute addresses list | grep jiwei-0524-12
>>gcloud compute target-pools list | grep jiwei-0524-12
>>gcloud compute backend-services list | grep jiwei-0524-12
>>gcloud dns managed-zones list | grep jiwei-0524-12
>>gcloud dns record-sets list --zone qe | grep jiwei-0524-12
>>gcloud iam service-accounts list | grep jiwei-0524-12
>>gcloud compute images list | grep jiwei-0524-12
>>gsutil ls | grep jiwei-0524-12
>>gcloud deployment-manager deployments list | grep jiwei-0524-12
Tue May 24 19:22:56 CST 2022
$ gcloud compute firewall-rules delete -q k8s-a9d84d44a0bf448549e4e4281f21f0d1-http-hc
Deleted [https://www.googleapis.com/compute/v1/projects/openshift-qe/global/firewalls/k8s-a9d84d44a0bf448549e4e4281f21f0d1-http-hc].
$ gcloud compute firewall-rules delete -q k8s-fw-a9d84d44a0bf448549e4e4281f21f0d1
Deleted [https://www.googleapis.com/compute/v1/projects/openshift-qe/global/firewalls/k8s-fw-a9d84d44a0bf448549e4e4281f21f0d1].
$ gcloud compute networks delete -q jiwei-0524-12-lhccb-network
Deleted [https://www.googleapis.com/compute/v1/projects/openshift-qe/global/networks/jiwei-0524-12-lhccb-network].
$
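For reference, a leftover-resource checker along the lines of gcp_res_check.sh can be reconstructed from the commands echoed above. This is only a sketch: the function name is ours, and the list of resource types and the "grep by cluster-name prefix" approach are taken straight from the transcript.

```shell
#!/bin/sh
# Sketch of a gcp_res_check.sh-style helper, reconstructed from the
# commands echoed in the transcript above. Runs each listing command
# and greps its output for the cluster-name prefix; anything that
# matches after a destroy is a leftover resource.
check_gcp_leftovers() {
    prefix="$1"
    for cmd in \
        "gcloud compute instances list" \
        "gcloud compute instance-groups list" \
        "gcloud compute disks list" \
        "gcloud compute networks list" \
        "gcloud compute networks subnets list" \
        "gcloud compute routers list" \
        "gcloud compute firewall-rules list" \
        "gcloud compute health-checks list" \
        "gcloud compute http-health-checks list" \
        "gcloud compute forwarding-rules list" \
        "gcloud compute addresses list" \
        "gcloud compute target-pools list" \
        "gcloud compute backend-services list" \
        "gcloud dns managed-zones list" \
        "gcloud iam service-accounts list" \
        "gcloud compute images list" \
        "gsutil ls" \
        "gcloud deployment-manager deployments list"
    do
        echo ">>$cmd | grep $prefix"
        # Intentional word splitting of $cmd; errors from missing
        # credentials (or a missing gcloud) are suppressed so the
        # script can still print what it checked.
        $cmd 2>/dev/null | grep "$prefix" || true
    done
    date
}
```

Usage mirrors the transcript, e.g. `check_gcp_leftovers jiwei-0524-12`.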
Hello, I am opening a discussion based on the results I found while investigating this BZ. When a Machine is created without the infra ID as its name prefix, its name is compared with the cluster name to determine whether it should be deleted. The destroy code finds that these do NOT match and therefore skips deleting that resource; in fact, it appears to skip deleting all of the TargetPools for GCP. (Question #1: should we delete the resources that are definitely part of the cluster on destroy, even though some may fail the current checks?) The reason the remaining resources (firewalls, etc.) cannot be destroyed is that they are attached to the resource(s) that could not be removed; they can never be removed while they are still in use. (Question #2: should we delete resources that were created on day 2? If an operator creates resources, the installer shouldn't make assumptions about their use and remove them, should it?)
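A toy shell illustration of the name check described above (the installer's real check is Go code; the names here come from the transcript): resources are kept for deletion only when they carry the infra-ID prefix, so the machine created from the "hello-world" machineset is skipped, and everything still referencing it stays undeletable.

```shell
#!/bin/sh
# Toy model of the destroy-time name check (NOT the installer's actual
# code). Instance names are taken from the transcript in this bug.
infra_id="jiwei-0524-12-lhccb"
skipped=""
for name in jiwei-0524-12-lhccb-worker-a-gvjmq hello-world-48kzx; do
    case "$name" in
        "$infra_id"*) echo "delete: $name" ;;                      # matches infra-ID prefix
        *)            echo "skip:   $name"                         # no prefix match: left behind,
                      skipped="$skipped $name" ;;                  # so dependents stay "in use"
    esac
done
echo "left in use:$skipped"
```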
@jianli wei, I forgot to CC you on that comment.
Tested with the build (https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp-modern/1533626873861378048) generated by the Slack app "Cluster Bot" for PR https://github.com/openshift/installer/pull/5965; the issue no longer occurs.

> 1. launch the cluster
https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/109236/ (SUCCESS)
LAUNCHER_VARS
installer_payload_image: registry.build01.ci.openshift.org/ci-ln-zz49qtk/release:latest

> 2. scale up using the machineset's YAML to launch one additional compute node whose name doesn't have the cluster infra ID
$ oc get clusterversion
NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.ci.test-2022-06-06-025004-ci-ln-zz49qtk-latest   True        False         9m41s   Cluster version is 4.11.0-0.ci.test-2022-06-06-025004-ci-ln-zz49qtk-latest
$ oc get nodes
NAME                                                           STATUS   ROLES    AGE   VERSION
jiwei-openshift-gtsth-master-0.c.openshift-qe.internal         Ready    master   35m   v1.24.0+bb9c2f1
jiwei-openshift-gtsth-master-1.c.openshift-qe.internal         Ready    master   34m   v1.24.0+bb9c2f1
jiwei-openshift-gtsth-master-2.c.openshift-qe.internal         Ready    master   34m   v1.24.0+bb9c2f1
jiwei-openshift-gtsth-worker-a-xqlqm.c.openshift-qe.internal   Ready    worker   19m   v1.24.0+bb9c2f1
jiwei-openshift-gtsth-worker-b-7m86p.c.openshift-qe.internal   Ready    worker   19m   v1.24.0+bb9c2f1
jiwei-openshift-gtsth-worker-c-7g5x8.c.openshift-qe.internal   Ready    worker   19m   v1.24.0+bb9c2f1
$ oc get machinesets -n openshift-machine-api
NAME                             DESIRED   CURRENT   READY   AVAILABLE   AGE
jiwei-openshift-gtsth-worker-a   1         1         1       1           35m
jiwei-openshift-gtsth-worker-b   1         1         1       1           35m
jiwei-openshift-gtsth-worker-c   1         1         1       1           35m
jiwei-openshift-gtsth-worker-f   0         0                             35m
$ oc get machinesets jiwei-openshift-gtsth-worker-a -n openshift-machine-api -oyaml > /tmp/ms1.yaml
$ sed -i 's/jiwei-openshift-gtsth-worker-a/hello-world/g' /tmp/ms1.yaml
$ vim /tmp/ms1.yaml
$ oc create -f /tmp/ms1.yaml
machineset.machine.openshift.io/hello-world created
$ oc get machinesets -n openshift-machine-api
NAME                             DESIRED   CURRENT   READY   AVAILABLE   AGE
hello-world                      1         1                             59s
jiwei-openshift-gtsth-worker-a   1         1         1       1           38m
jiwei-openshift-gtsth-worker-b   1         1         1       1           38m
jiwei-openshift-gtsth-worker-c   1         1         1       1           38m
jiwei-openshift-gtsth-worker-f   0         0                             38m
$ oc get machines -n openshift-machine-api
NAME                                   PHASE         TYPE            REGION        ZONE            AGE
hello-world-wjg6h                      Provisioned   n1-standard-4   us-central1   us-central1-a   68s
jiwei-openshift-gtsth-master-0         Running       n1-standard-4   us-central1   us-central1-a   38m
jiwei-openshift-gtsth-master-1         Running       n1-standard-4   us-central1   us-central1-b   38m
jiwei-openshift-gtsth-master-2         Running       n1-standard-4   us-central1   us-central1-c   38m
jiwei-openshift-gtsth-worker-a-xqlqm   Running       n1-standard-4   us-central1   us-central1-a   34m
jiwei-openshift-gtsth-worker-b-7m86p   Running       n1-standard-4   us-central1   us-central1-b   34m
jiwei-openshift-gtsth-worker-c-7g5x8   Running       n1-standard-4   us-central1   us-central1-c   34m
$
$ oc get machines -n openshift-machine-api
NAME                                   PHASE     TYPE            REGION        ZONE            AGE
hello-world-wjg6h                      Running   n1-standard-4   us-central1   us-central1-a   3m24s
jiwei-openshift-gtsth-master-0         Running   n1-standard-4   us-central1   us-central1-a   40m
jiwei-openshift-gtsth-master-1         Running   n1-standard-4   us-central1   us-central1-b   40m
jiwei-openshift-gtsth-master-2         Running   n1-standard-4   us-central1   us-central1-c   40m
jiwei-openshift-gtsth-worker-a-xqlqm   Running   n1-standard-4   us-central1   us-central1-a   36m
jiwei-openshift-gtsth-worker-b-7m86p   Running   n1-standard-4   us-central1   us-central1-b   36m
jiwei-openshift-gtsth-worker-c-7g5x8   Running   n1-standard-4   us-central1   us-central1-c   36m
$ oc get nodes
NAME                                                           STATUS   ROLES    AGE   VERSION
hello-world-wjg6h.c.openshift-qe.internal                      Ready    worker   57s   v1.24.0+bb9c2f1
jiwei-openshift-gtsth-master-0.c.openshift-qe.internal         Ready    master   40m   v1.24.0+bb9c2f1
jiwei-openshift-gtsth-master-1.c.openshift-qe.internal         Ready    master   40m   v1.24.0+bb9c2f1
jiwei-openshift-gtsth-master-2.c.openshift-qe.internal         Ready    master   39m   v1.24.0+bb9c2f1
jiwei-openshift-gtsth-worker-a-xqlqm.c.openshift-qe.internal   Ready    worker   25m   v1.24.0+bb9c2f1
jiwei-openshift-gtsth-worker-b-7m86p.c.openshift-qe.internal   Ready    worker   24m   v1.24.0+bb9c2f1
jiwei-openshift-gtsth-worker-c-7g5x8.c.openshift-qe.internal   Ready    worker   25m   v1.24.0+bb9c2f1
$

> 3. destroy the cluster
https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-destroy/101055/ (SUCCESS)

> 4. check the cluster's resources on GCP: nothing is left over
$ ./gcp_res_check.sh jiwei-openshift-gtsth
>>gcloud compute instances list | grep jiwei-openshift-gtsth
>>gcloud compute instance-groups list | grep jiwei-openshift-gtsth
>>gcloud compute disks list | grep jiwei-openshift-gtsth
>>gcloud compute networks list | grep jiwei-openshift-gtsth
>>gcloud compute networks subnets list | grep jiwei-openshift-gtsth
>>gcloud compute routers list | grep jiwei-openshift-gtsth
>>gcloud compute firewall-rules list | grep jiwei-openshift-gtsth
To show all fields of the firewall, please show in JSON format: --format=json
To show all fields in table format, please see the examples in --help.
>>gcloud compute health-checks list | grep jiwei-openshift-gtsth
>>gcloud compute http-health-checks list | grep jiwei-openshift-gtsth
>>gcloud compute forwarding-rules list | grep jiwei-openshift-gtsth
>>gcloud compute addresses list | grep jiwei-openshift-gtsth
>>gcloud compute target-pools list | grep jiwei-openshift-gtsth
>>gcloud compute backend-services list | grep jiwei-openshift-gtsth
>>gcloud dns managed-zones list | grep jiwei-openshift-gtsth
>>gcloud dns record-sets list --zone qe | grep jiwei-openshift-gtsth
>>gcloud iam service-accounts list | grep jiwei-openshift-gtsth
>>gcloud compute images list | grep jiwei-openshift-gtsth
>>gsutil ls | grep jiwei-openshift-gtsth
>>gcloud deployment-manager deployments list | grep jiwei-openshift-gtsth
Mon Jun 6 14:14:32 CST 2022
$
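The scale-up steps used in both reproductions (dump a machineset to YAML, rename it, drop the status section, recreate it) can be collapsed into one small function. A sketch: the `clone_machineset` name is ours, and deleting everything from `status:` onward with sed stands in for the interactive vim edit, assuming `status:` is the last top-level key, as it is in `oc get -oyaml` output.

```shell
#!/bin/sh
# Sketch of the scale-up procedure from the transcripts above. The
# function name is ours; the sed deletion of the "status:" section
# replaces the interactive vim step (assumes status is the trailing
# top-level key in the "oc get -oyaml" output).
clone_machineset() {
    src="$1"; dst="$2"; ns="${3:-openshift-machine-api}"
    tmp=$(mktemp)
    oc get machineset "$src" -n "$ns" -oyaml > "$tmp" || return 1
    sed -i "s/$src/$dst/g" "$tmp"    # rename every occurrence, as in the transcript
    sed -i '/^status:/,$d' "$tmp"    # drop the status section
    oc create -f "$tmp"
}
# Example matching the transcript:
# clone_machineset jiwei-openshift-gtsth-worker-a hello-world
```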
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069