Description of problem:
A machine resource cannot be deleted if the machine has no labels or no providerSpec field.

Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION     AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.1   True        False         20h     Cluster version is 4.0.0-0.1

How reproducible:
Always

Steps to Reproduce:
1. Create a machine without labels

apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  finalizers:
  - machine.cluster.k8s.io
  name: machine-fail1
  namespace: openshift-cluster-api
spec:
  metadata:
    creationTimestamp: null
  providerSpec:
    value:
      ami:
        arn: null
        filters: null
        id: ami-085b89e82b74a76b5
      apiVersion: awsproviderconfig.k8s.io/v1alpha1
      credentialsSecret: null
      deviceIndex: 0
      iamInstanceProfile:
        arn: null
        filters: null
        id: qe-jialiu-worker-profile
      instanceType: m4.large
      keyName: null
      kind: AWSMachineProviderConfig
      loadBalancers: null
      metadata:
        creationTimestamp: null
      placement:
        availabilityZone: us-east-2a
        region: us-east-2
      publicIp: null
      securityGroups:
      - arn: null
        filters:
        - name: tag:Name
          values:
          - qe-jialiu_worker_sg
        id: null
      subnet:
        arn: null
        filters:
        - name: tag:Name
          values:
          - qe-jialiu-worker-us-east-2a
        id: null
      tags:
      - name: openshiftClusterID
        value: d9f17038-4b08-42e1-8773-cf36a4375a15
      - name: kubernetes.io/cluster/qe-jialiu
        value: owned
      userDataSecret:
        name: worker-user-data
  versions:
    kubelet: ""

2. Create a machine without providerSpec field

apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  finalizers:
  - machine.cluster.k8s.io
  name: machine-fail
  namespace: openshift-cluster-api
  labels:
    sigs.k8s.io/cluster-api-cluster: qe-jialiu
    sigs.k8s.io/cluster-api-machine-role: worker
    sigs.k8s.io/cluster-api-machine-type: worker
spec:
  metadata:
    creationTimestamp: null
  providerSpec: {}
  versions:
    kubelet: ""

3. Delete machine

Actual results:
A machine that has no labels or no providerSpec field cannot be deleted.
$ oc create -f machine-fail.yaml
machine.cluster.k8s.io/machine-fail created
$ oc create -f machine-fail1.yaml
machine.cluster.k8s.io/machine-fail1 created
$ oc get machine
NAME                                INSTANCE              STATE     TYPE       REGION      ZONE         AGE
machine-fail                                                                                            31s
machine-fail1                                                                                           10s
qe-jialiu-master-0                  i-0bec598c03bfd2867   running   m4.large   us-east-2   us-east-2a   20h
qe-jialiu-master-1                  i-02ca41986fcf4d381   running   m4.large   us-east-2   us-east-2b   20h
qe-jialiu-master-2                  i-0260216e16c48bca2   running   m4.large   us-east-2   us-east-2c   20h
qe-jialiu-worker-us-east-2a-ccq9h   i-0e3190e78fb9ba6b6   running   m4.large   us-east-2   us-east-2a   2m
qe-jialiu-worker-us-east-2a-z5zj6   i-0ffe4b01024c56625   running   m4.large   us-east-2   us-east-2a   19h
qe-jialiu-worker-us-east-2b-nq95l   i-087c675599175acc9   running   m4.large   us-east-2   us-east-2b   20h
qe-jialiu-worker-us-east-2c-bj6c4   i-0d3b7a7d9de2c3ca7   running   m4.large   us-east-2   us-east-2c   8m
$ oc delete machine machine-fail
machine.cluster.k8s.io "machine-fail" deleted
^C
$ oc delete machine machine-fail1
machine.cluster.k8s.io "machine-fail" deleted
^C
$ oc logs -f clusterapi-manager-controllers-5d7f7b954c-mfwlp -c machine-controller
I0116 02:11:37.155254 1 actuator.go:401] checking if machine exists
E0116 02:11:37.155280 1 actuator.go:438] error decoding MachineProviderConfig: unable to find machine provider config: neither Spec.ProviderSpec.Value nor Spec.ProviderSpec.ValueFrom set
E0116 02:11:37.155291 1 actuator.go:405] error getting running instances: unable to find machine provider config: neither Spec.ProviderSpec.Value nor Spec.ProviderSpec.ValueFrom set
E0116 02:11:37.155299 1 controller.go:166] Error checking existence of machine instance for machine object machine-fail; unable to find machine provider config: neither Spec.ProviderSpec.Value nor Spec.ProviderSpec.ValueFrom set
I0116 02:11:55.445145 1 actuator.go:236] deleting machine
E0116 02:11:55.445372 1 actuator.go:104] Machine error: error decoding MachineProviderConfig: unable to find machine provider config: neither Spec.ProviderSpec.Value nor Spec.ProviderSpec.ValueFrom set
E0116 02:11:55.448953 1 actuator.go:238] error deleting machine: error decoding MachineProviderConfig: unable to find machine provider config: neither Spec.ProviderSpec.Value nor Spec.ProviderSpec.ValueFrom set
E0116 02:11:55.449026 1 controller.go:141] Error deleting machine object machine-fail; error decoding MachineProviderConfig: unable to find machine provider config: neither Spec.ProviderSpec.Value nor Spec.ProviderSpec.ValueFrom set
I0116 02:11:57.635584 1 actuator.go:236] deleting machine
E0116 02:11:57.635649 1 actuator.go:104] Machine error: error decoding MachineProviderConfig: unable to find machine provider config: neither Spec.ProviderSpec.Value nor Spec.ProviderSpec.ValueFrom set
E0116 02:11:57.635658 1 actuator.go:238] error deleting machine: error decoding MachineProviderConfig: unable to find machine provider config: neither Spec.ProviderSpec.Value nor Spec.ProviderSpec.ValueFrom set
E0116 02:11:57.635668 1 controller.go:141] Error deleting machine object machine-fail; error decoding MachineProviderConfig: unable to find machine provider config: neither Spec.ProviderSpec.Value nor Spec.ProviderSpec.ValueFrom set
I0116 02:29:41.666344 1 actuator.go:401] checking if machine exists
E0116 02:29:41.666600 1 actuator.go:405] error getting running instances: unable to get cluster ID for machine: "machine-fail1"
E0116 02:29:41.666667 1 controller.go:166] Error checking existence of machine instance for machine object machine-fail1; unable to get cluster ID for machine: "machine-fail1"
I0116 02:29:46.537798 1 actuator.go:236] deleting machine
E0116 02:29:46.542046 1 actuator.go:310] error getting running instances: unable to get cluster ID for machine: "machine-fail1"
E0116 02:29:46.542118 1 actuator.go:238] error deleting machine: unable to get cluster ID for machine: "machine-fail1"
E0116 02:29:46.542172 1 controller.go:141] Error deleting machine object machine-fail1; unable to get cluster ID for machine: "machine-fail1"
I0116 02:29:51.907031 1 actuator.go:236] deleting machine
E0116 02:29:51.909967 1 actuator.go:310] error getting running instances: unable to get cluster ID for machine: "machine-fail1"
E0116 02:29:51.909996 1 actuator.go:238] error deleting machine: unable to get cluster ID for machine: "machine-fail1"
E0116 02:29:51.910009 1 controller.go:141] Error deleting machine object machine-fail1; unable to get cluster ID for machine: "machine-fail1"

Expected results:
The failed machine can be deleted.

Additional info:
Rather than allowing the removal of a machine with no provider spec set, we need to prevent such machines from being created. Upstream PR: https://github.com/openshift/machine-api-operator/pull/178
*** Bug 1666556 has been marked as a duplicate of this bug. ***
The only sane way to check for missing labels is webhook validation. Until that is available, this issue cannot be resolved: creating a machine with missing labels is currently accepted, because labels cannot be checked at the level of the machine CRD definition.
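For illustration, the two checks discussed in this bug (providerSpec value set, cluster label present) could be sketched in Go roughly as follows. This is not the code from the upstream PR or from any webhook: the Machine and ProviderSpec types are simplified, hypothetical stand-ins for the real API types, validateMachine is a made-up helper name, and the label key is the one that appears in the FailedValidate events reported later in this bug.

```go
package main

import "fmt"

// clusterLabel is the label key named in the FailedValidate events in this bug.
const clusterLabel = "machine.openshift.io/cluster-api-cluster"

// ProviderSpec is a hypothetical, simplified stand-in for the real API type.
type ProviderSpec struct {
	Value []byte // raw provider config; nil when the field is unset
}

// Machine is a hypothetical, simplified stand-in for the real Machine object.
type Machine struct {
	Name         string
	Labels       map[string]string
	ProviderSpec ProviderSpec
}

// validateMachine returns one error for each rule the machine violates.
func validateMachine(m Machine) []error {
	var errs []error
	if m.ProviderSpec.Value == nil {
		errs = append(errs, fmt.Errorf("spec.providerSpec: value field must be set"))
	}
	// Reading from a nil map is safe in Go and yields the empty string.
	if m.Labels[clusterLabel] == "" {
		errs = append(errs, fmt.Errorf("metadata.labels: missing %s label", clusterLabel))
	}
	return errs
}

func main() {
	machines := []Machine{
		// Labels set, providerSpec empty (like machine-fail).
		{Name: "machine-fail", Labels: map[string]string{clusterLabel: "qe-jialiu"}},
		// providerSpec set, no labels (like machine-fail1).
		{Name: "machine-fail1", ProviderSpec: ProviderSpec{Value: []byte(`{"instanceType":"m4.large"}`)}},
	}
	for _, m := range machines {
		for _, err := range validateMachine(m) {
			fmt.Printf("%s: machine validation failed: %v\n", m.Name, err)
		}
	}
}
```

Running such a check before the actuator is invoked would let a controller or webhook report the invalid object instead of retrying create or delete calls that can never succeed.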
Moving to 4.2 to include in webhook validations.
Verified. Created machines with no labels or no providerSpec field.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-07-10-210957   True        False         3h48m   Cluster version is 4.1.0-0.nightly-2019-07-10-210957

$ oc get machine
NAME                                            INSTANCE              STATE     TYPE        REGION           ZONE              AGE
qe-zhsun-1-2wd26-master-0                       i-0c8dd7b50f63d2125   running   m4.xlarge   ap-northeast-1   ap-northeast-1a   4h6m
qe-zhsun-1-2wd26-master-1                       i-0e7d28874a7c9dc0e   running   m4.xlarge   ap-northeast-1   ap-northeast-1c   4h6m
qe-zhsun-1-2wd26-master-2                       i-0e46254f17deca874   running   m4.xlarge   ap-northeast-1   ap-northeast-1d   4h6m
qe-zhsun-1-2wd26-worker-ap-northeast-1a                                                                                        42s
qe-zhsun-1-2wd26-worker-ap-northeast-1a-5xvff   i-07a97a1699563a4aa   running   m5.xlarge   ap-northeast-1   ap-northeast-1a   4h4m
qe-zhsun-1-2wd26-worker-ap-northeast-1c-d2tzt   i-07bc8e9556cfd5a5b   running   m5.xlarge   ap-northeast-1   ap-northeast-1c   4h4m
qe-zhsun-1-2wd26-worker-ap-northeast-1d-wm5s9   i-01286f8d41249b7bc   running   m5.xlarge   ap-northeast-1   ap-northeast-1d   94m
qe-zhsun-1-2wd26-worker-ap-northeast-1d                                         m5.xlarge   ap-northeast-1   ap-northeast-1d   6s

$ oc delete machine qe-zhsun-1-2wd26-worker-ap-northeast-1a
machine.machine.openshift.io "qe-zhsun-1-2wd26-worker-ap-northeast-1a" deleted
$ oc delete machine qe-zhsun-1-2wd26-worker-ap-northeast-1d
machine.machine.openshift.io "qe-zhsun-1-2wd26-worker-ap-northeast-1d" deleted

$ oc get event
LAST SEEN   TYPE      REASON           OBJECT                                                    MESSAGE
4m19s       Normal    Updated          machine/qe-zhsun-1-2wd26-master-0                         Updated machine qe-zhsun-1-2wd26-master-0
4m18s       Normal    Updated          machine/qe-zhsun-1-2wd26-master-1                         Updated machine qe-zhsun-1-2wd26-master-1
4m17s       Normal    Updated          machine/qe-zhsun-1-2wd26-master-2                         Updated machine qe-zhsun-1-2wd26-master-2
4m17s       Normal    Updated          machine/qe-zhsun-1-2wd26-worker-ap-northeast-1a-5xvff     Updated machine qe-zhsun-1-2wd26-worker-ap-northeast-1a-5xvff
50s         Normal    Created          machine/qe-zhsun-1-2wd26-worker-ap-northeast-1a-a-2rktq   Created Machine qe-zhsun-1-2wd26-worker-ap-northeast-1a-a-2rktq
29s         Normal    Updated          machine/qe-zhsun-1-2wd26-worker-ap-northeast-1a-a-2rktq   Updated machine qe-zhsun-1-2wd26-worker-ap-northeast-1a-a-2rktq
8m57s       Warning   FailedValidate   machine/qe-zhsun-1-2wd26-worker-ap-northeast-1a           "qe-zhsun-1-2wd26-worker-ap-northeast-1a" machine validation failed: spec.spec.providerspec: Invalid value: v1beta1.ProviderSpec{Value:(*runtime.RawExtension)(nil)}: value field must be set
7m31s       Warning   FailedValidate   machine/qe-zhsun-1-2wd26-worker-ap-northeast-1a           "qe-zhsun-1-2wd26-worker-ap-northeast-1a" machine validation failed: spec.spec.providerspec: Invalid value: v1beta1.ProviderSpec{Value:(*runtime.RawExtension)(nil)}: value field must be set
4m17s       Normal    Updated          machine/qe-zhsun-1-2wd26-worker-ap-northeast-1c-d2tzt     Updated machine qe-zhsun-1-2wd26-worker-ap-northeast-1c-d2tzt
104m        Normal    Updated          machine/qe-zhsun-1-2wd26-worker-ap-northeast-1d-klfjx     Updated machine qe-zhsun-1-2wd26-worker-ap-northeast-1d-klfjx
101m        Normal    Deleted          machine/qe-zhsun-1-2wd26-worker-ap-northeast-1d-klfjx     Deleted machine qe-zhsun-1-2wd26-worker-ap-northeast-1d-klfjx
84m         Warning   FailedCreate     machine/qe-zhsun-1-2wd26-worker-ap-northeast-1d-wm5s9     CreateError
84m         Normal    Created          machine/qe-zhsun-1-2wd26-worker-ap-northeast-1d-wm5s9     Created Machine qe-zhsun-1-2wd26-worker-ap-northeast-1d-wm5s9
4m21s       Normal    Updated          machine/qe-zhsun-1-2wd26-worker-ap-northeast-1d-wm5s9     Updated machine qe-zhsun-1-2wd26-worker-ap-northeast-1d-wm5s9
6m3s        Warning   FailedValidate   machine/qe-zhsun-1-2wd26-worker-ap-northeast-1d           "qe-zhsun-1-2wd26-worker-ap-northeast-1d" machine validation failed: spec.labels: Invalid value: map[string]string(nil): missing machine.openshift.io/cluster-api-cluster label.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922