Bug 1666558 - [cloud] Failed to delete machine that has no label or providerSpec
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.2.0
Assignee: Jan Chaloupka
QA Contact: sunzhaohua
URL:
Whiteboard:
Duplicates: 1666556
Depends On:
Blocks:
Reported: 2019-01-16 03:09 UTC by sunzhaohua
Modified: 2019-10-16 06:27 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:27:41 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:2922 0 None None None 2019-10-16 06:27:56 UTC

Description sunzhaohua 2019-01-16 03:09:32 UTC
Description of problem:
A machine resource cannot be deleted if the machine has no labels or no providerSpec field.

Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION     AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.1   True        False         20h       Cluster version is 4.0.0-0.1

How reproducible:
Always

Steps to Reproduce:
1. Create a machine without labels
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  finalizers:
  - machine.cluster.k8s.io
  name: machine-fail1
  namespace: openshift-cluster-api
spec:
  metadata:
    creationTimestamp: null
  providerSpec:
    value:
      ami:
        arn: null
        filters: null
        id: ami-085b89e82b74a76b5
      apiVersion: awsproviderconfig.k8s.io/v1alpha1
      credentialsSecret: null
      deviceIndex: 0
      iamInstanceProfile:
        arn: null
        filters: null
        id: qe-jialiu-worker-profile
      instanceType: m4.large
      keyName: null
      kind: AWSMachineProviderConfig
      loadBalancers: null
      metadata:
        creationTimestamp: null
      placement:
        availabilityZone: us-east-2a
        region: us-east-2
      publicIp: null
      securityGroups:
      - arn: null
        filters:
        - name: tag:Name
          values:
          - qe-jialiu_worker_sg
        id: null
      subnet:
        arn: null
        filters:
        - name: tag:Name
          values:
          - qe-jialiu-worker-us-east-2a
        id: null
      tags:
      - name: openshiftClusterID
        value: d9f17038-4b08-42e1-8773-cf36a4375a15
      - name: kubernetes.io/cluster/qe-jialiu
        value: owned
      userDataSecret:
        name: worker-user-data
  versions:
    kubelet: ""
    
2. Create a machine without providerSpec field 
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  finalizers:
  - machine.cluster.k8s.io
  name: machine-fail
  namespace: openshift-cluster-api
  labels:
    sigs.k8s.io/cluster-api-cluster: qe-jialiu
    sigs.k8s.io/cluster-api-machine-role: worker
    sigs.k8s.io/cluster-api-machine-type: worker
spec:
  metadata:
    creationTimestamp: null
  providerSpec: {}
  versions:
    kubelet: ""
    
3. Delete machine 

Actual results:
A machine that has no labels or no providerSpec field cannot be deleted.


$ oc create -f machine-fail.yaml 
machine.cluster.k8s.io/machine-fail created
$ oc create -f machine-fail1.yaml 
machine.cluster.k8s.io/machine-fail1 created

$ oc get machine
NAME                                INSTANCE              STATE     TYPE       REGION      ZONE         AGE
machine-fail                                                                                            31s
machine-fail1                                                                                           10s
qe-jialiu-master-0                  i-0bec598c03bfd2867   running   m4.large   us-east-2   us-east-2a   20h
qe-jialiu-master-1                  i-02ca41986fcf4d381   running   m4.large   us-east-2   us-east-2b   20h
qe-jialiu-master-2                  i-0260216e16c48bca2   running   m4.large   us-east-2   us-east-2c   20h
qe-jialiu-worker-us-east-2a-ccq9h   i-0e3190e78fb9ba6b6   running   m4.large   us-east-2   us-east-2a   2m
qe-jialiu-worker-us-east-2a-z5zj6   i-0ffe4b01024c56625   running   m4.large   us-east-2   us-east-2a   19h
qe-jialiu-worker-us-east-2b-nq95l   i-087c675599175acc9   running   m4.large   us-east-2   us-east-2b   20h
qe-jialiu-worker-us-east-2c-bj6c4   i-0d3b7a7d9de2c3ca7   running   m4.large   us-east-2   us-east-2c   8m
 
$ oc delete machine machine-fail
machine.cluster.k8s.io "machine-fail" deleted
^C

$ oc delete machine machine-fail1
machine.cluster.k8s.io "machine-fail1" deleted
^C

$ oc logs -f clusterapi-manager-controllers-5d7f7b954c-mfwlp -c machine-controller
I0116 02:11:37.155254       1 actuator.go:401] checking if machine exists
E0116 02:11:37.155280       1 actuator.go:438] error decoding MachineProviderConfig: unable to find machine provider config: neither Spec.ProviderSpec.Value nor Spec.ProviderSpec.ValueFrom set
E0116 02:11:37.155291       1 actuator.go:405] error getting running instances: unable to find machine provider config: neither Spec.ProviderSpec.Value nor Spec.ProviderSpec.ValueFrom set
E0116 02:11:37.155299       1 controller.go:166] Error checking existence of machine instance for machine object machine-fail; unable to find machine provider config: neither Spec.ProviderSpec.Value nor Spec.ProviderSpec.ValueFrom set
I0116 02:11:55.445145       1 actuator.go:236] deleting machine
E0116 02:11:55.445372       1 actuator.go:104] Machine error: error decoding MachineProviderConfig: unable to find machine provider config: neither Spec.ProviderSpec.Value nor Spec.ProviderSpec.ValueFrom set
E0116 02:11:55.448953       1 actuator.go:238] error deleting machine: error decoding MachineProviderConfig: unable to find machine provider config: neither Spec.ProviderSpec.Value nor Spec.ProviderSpec.ValueFrom set
E0116 02:11:55.449026       1 controller.go:141] Error deleting machine object machine-fail; error decoding MachineProviderConfig: unable to find machine provider config: neither Spec.ProviderSpec.Value nor Spec.ProviderSpec.ValueFrom set
I0116 02:11:57.635584       1 actuator.go:236] deleting machine
E0116 02:11:57.635649       1 actuator.go:104] Machine error: error decoding MachineProviderConfig: unable to find machine provider config: neither Spec.ProviderSpec.Value nor Spec.ProviderSpec.ValueFrom set
E0116 02:11:57.635658       1 actuator.go:238] error deleting machine: error decoding MachineProviderConfig: unable to find machine provider config: neither Spec.ProviderSpec.Value nor Spec.ProviderSpec.ValueFrom set
E0116 02:11:57.635668       1 controller.go:141] Error deleting machine object machine-fail; error decoding MachineProviderConfig: unable to find machine provider config: neither Spec.ProviderSpec.Value nor Spec.ProviderSpec.ValueFrom set


I0116 02:29:41.666344       1 actuator.go:401] checking if machine exists
E0116 02:29:41.666600       1 actuator.go:405] error getting running instances: unable to get cluster ID for machine: "machine-fail1"
E0116 02:29:41.666667       1 controller.go:166] Error checking existence of machine instance for machine object machine-fail1; unable to get cluster ID for machine: "machine-fail1"
I0116 02:29:46.537798       1 actuator.go:236] deleting machine
E0116 02:29:46.542046       1 actuator.go:310] error getting running instances: unable to get cluster ID for machine: "machine-fail1"
E0116 02:29:46.542118       1 actuator.go:238] error deleting machine: unable to get cluster ID for machine: "machine-fail1"
E0116 02:29:46.542172       1 controller.go:141] Error deleting machine object machine-fail1; unable to get cluster ID for machine: "machine-fail1"
I0116 02:29:51.907031       1 actuator.go:236] deleting machine
E0116 02:29:51.909967       1 actuator.go:310] error getting running instances: unable to get cluster ID for machine: "machine-fail1"
E0116 02:29:51.909996       1 actuator.go:238] error deleting machine: unable to get cluster ID for machine: "machine-fail1"
E0116 02:29:51.910009       1 controller.go:141] Error deleting machine object machine-fail1; unable to get cluster ID for machine: "machine-fail1"
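
The repeated "Error deleting machine object" lines, together with the hanging `oc delete`, are consistent with a reconcile loop that removes the `machine.cluster.k8s.io` finalizer only after the provider-side delete succeeds. The following is a minimal Python sketch of that behavior under that assumption, not the actual controller code; `decode_provider_spec`, `reconcile_delete`, and the dictionary layout are hypothetical names chosen for illustration.

```python
FINALIZER = "machine.cluster.k8s.io"


class ProviderSpecError(Exception):
    """Raised when a machine carries no usable provider configuration."""


def decode_provider_spec(machine):
    # Hypothetical stand-in for the actuator's decoding step.
    provider_spec = (machine.get("spec") or {}).get("providerSpec") or {}
    config = provider_spec.get("value") or provider_spec.get("valueFrom")
    if not config:
        raise ProviderSpecError(
            "neither Spec.ProviderSpec.Value nor Spec.ProviderSpec.ValueFrom set"
        )
    return config


def reconcile_delete(machine):
    """Return True once the finalizer is removed, False if a retry is needed."""
    try:
        decode_provider_spec(machine)
    except ProviderSpecError:
        # The error surfaces before the finalizer is touched, so the object
        # stays in the API server and the delete request keeps waiting.
        return False
    machine["metadata"]["finalizers"].remove(FINALIZER)
    return True
```

With an empty `providerSpec`, every call returns False and the finalizer list is unchanged, which matches the retries seen in the log.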


Expected results:
The failed machine can be deleted.

Additional info:

Comment 1 Jan Chaloupka 2019-01-16 11:07:54 UTC
Instead of allowing removal of a machine with no provider spec set, we need to avoid creating such machines.

Upstream PR: https://github.com/openshift/machine-api-operator/pull/178

Comment 2 Samuel Padgett 2019-01-17 20:30:06 UTC
*** Bug 1666556 has been marked as a duplicate of this bug. ***

Comment 4 Jan Chaloupka 2019-04-01 15:28:20 UTC
The only sane way to check for missing labels is webhook validation. Until that is available, this issue cannot be resolved: creating a machine with missing labels is accepted, because labels cannot be checked at the machine CRD definition level.
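
As a rough illustration of the kind of check a validating webhook could perform, the Python sketch below rejects a Machine manifest that lacks the cluster label or a providerSpec value. It is not the machine-api-operator implementation; the function name `validate_machine` and the wording of the returned messages are assumptions, and the default label key is the one used in the reproduction manifests in this report.

```python
# Label key taken from the reproduction manifests in this report.
CLUSTER_LABEL = "sigs.k8s.io/cluster-api-cluster"


def validate_machine(machine, cluster_label=CLUSTER_LABEL):
    """Return a list of validation errors; an empty list means the object passes."""
    errors = []

    labels = (machine.get("metadata") or {}).get("labels") or {}
    if cluster_label not in labels:
        errors.append(f"metadata.labels: missing {cluster_label} label")

    provider_spec = (machine.get("spec") or {}).get("providerSpec") or {}
    if not provider_spec.get("value") and not provider_spec.get("valueFrom"):
        errors.append("spec.providerSpec: value field must be set")

    return errors
```

An admission webhook built around such a function would deny the create request whenever the list is non-empty, so an invalid object would never reach the machine controller.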

Comment 5 Alberto 2019-04-08 13:08:13 UTC
Moving to 4.2 to include in webhook validations.

Comment 6 sunzhaohua 2019-07-12 08:06:52 UTC
Verified.

Created machines with no labels or no providerSpec field.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-07-10-210957   True        False         3h48m   Cluster version is 4.1.0-0.nightly-2019-07-10-210957

$ oc get machine
NAME                                            INSTANCE              STATE     TYPE        REGION           ZONE              AGE
qe-zhsun-1-2wd26-master-0                       i-0c8dd7b50f63d2125   running   m4.xlarge   ap-northeast-1   ap-northeast-1a   4h6m
qe-zhsun-1-2wd26-master-1                       i-0e7d28874a7c9dc0e   running   m4.xlarge   ap-northeast-1   ap-northeast-1c   4h6m
qe-zhsun-1-2wd26-master-2                       i-0e46254f17deca874   running   m4.xlarge   ap-northeast-1   ap-northeast-1d   4h6m
qe-zhsun-1-2wd26-worker-ap-northeast-1a                                                                                        42s
qe-zhsun-1-2wd26-worker-ap-northeast-1a-5xvff   i-07a97a1699563a4aa   running   m5.xlarge   ap-northeast-1   ap-northeast-1a   4h4m
qe-zhsun-1-2wd26-worker-ap-northeast-1c-d2tzt   i-07bc8e9556cfd5a5b   running   m5.xlarge   ap-northeast-1   ap-northeast-1c   4h4m
qe-zhsun-1-2wd26-worker-ap-northeast-1d-wm5s9   i-01286f8d41249b7bc   running   m5.xlarge   ap-northeast-1   ap-northeast-1d   94m
qe-zhsun-1-2wd26-worker-ap-northeast-1d                                         m5.xlarge   ap-northeast-1   ap-northeast-1d   6s

$ oc delete machine qe-zhsun-1-2wd26-worker-ap-northeast-1a
machine.machine.openshift.io "qe-zhsun-1-2wd26-worker-ap-northeast-1a" deleted


$ oc delete machine qe-zhsun-1-2wd26-worker-ap-northeast-1d
machine.machine.openshift.io "qe-zhsun-1-2wd26-worker-ap-northeast-1d" deleted


$ oc get event
LAST SEEN   TYPE      REASON           OBJECT                                                    MESSAGE
4m19s       Normal    Updated          machine/qe-zhsun-1-2wd26-master-0                         Updated machine qe-zhsun-1-2wd26-master-0
4m18s       Normal    Updated          machine/qe-zhsun-1-2wd26-master-1                         Updated machine qe-zhsun-1-2wd26-master-1
4m17s       Normal    Updated          machine/qe-zhsun-1-2wd26-master-2                         Updated machine qe-zhsun-1-2wd26-master-2
4m17s       Normal    Updated          machine/qe-zhsun-1-2wd26-worker-ap-northeast-1a-5xvff     Updated machine qe-zhsun-1-2wd26-worker-ap-northeast-1a-5xvff
50s         Normal    Created          machine/qe-zhsun-1-2wd26-worker-ap-northeast-1a-a-2rktq   Created Machine qe-zhsun-1-2wd26-worker-ap-northeast-1a-a-2rktq
29s         Normal    Updated          machine/qe-zhsun-1-2wd26-worker-ap-northeast-1a-a-2rktq   Updated machine qe-zhsun-1-2wd26-worker-ap-northeast-1a-a-2rktq
8m57s       Warning   FailedValidate   machine/qe-zhsun-1-2wd26-worker-ap-northeast-1a           "qe-zhsun-1-2wd26-worker-ap-northeast-1a" machine validation failed: spec.spec.providerspec: Invalid value: v1beta1.ProviderSpec{Value:(*runtime.RawExtension)(nil)}: value field must be set
7m31s       Warning   FailedValidate   machine/qe-zhsun-1-2wd26-worker-ap-northeast-1a           "qe-zhsun-1-2wd26-worker-ap-northeast-1a" machine validation failed: spec.spec.providerspec: Invalid value: v1beta1.ProviderSpec{Value:(*runtime.RawExtension)(nil)}: value field must be set
4m17s       Normal    Updated          machine/qe-zhsun-1-2wd26-worker-ap-northeast-1c-d2tzt     Updated machine qe-zhsun-1-2wd26-worker-ap-northeast-1c-d2tzt
104m        Normal    Updated          machine/qe-zhsun-1-2wd26-worker-ap-northeast-1d-klfjx     Updated machine qe-zhsun-1-2wd26-worker-ap-northeast-1d-klfjx
101m        Normal    Deleted          machine/qe-zhsun-1-2wd26-worker-ap-northeast-1d-klfjx     Deleted machine qe-zhsun-1-2wd26-worker-ap-northeast-1d-klfjx
84m         Warning   FailedCreate     machine/qe-zhsun-1-2wd26-worker-ap-northeast-1d-wm5s9     CreateError
84m         Normal    Created          machine/qe-zhsun-1-2wd26-worker-ap-northeast-1d-wm5s9     Created Machine qe-zhsun-1-2wd26-worker-ap-northeast-1d-wm5s9
4m21s       Normal    Updated          machine/qe-zhsun-1-2wd26-worker-ap-northeast-1d-wm5s9     Updated machine qe-zhsun-1-2wd26-worker-ap-northeast-1d-wm5s9
6m3s        Warning   FailedValidate   machine/qe-zhsun-1-2wd26-worker-ap-northeast-1d           "qe-zhsun-1-2wd26-worker-ap-northeast-1d" machine validation failed: spec.labels: Invalid value: map[string]string(nil): missing machine.openshift.io/cluster-api-cluster label.

Comment 8 errata-xmlrpc 2019-10-16 06:27:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

