Bug 1744049

Summary: [AWS] machine-controller can not delete a machine when the machine's providerSpec is malformed
Product: OpenShift Container Platform Reporter: Jianwei Hou <jhou>
Component: Cloud Compute Assignee: Michael Gugino <mgugino>
Status: CLOSED ERRATA QA Contact: Jianwei Hou <jhou>
Severity: low Docs Contact:
Priority: medium    
Version: 4.2.0 CC: agarcial, jchaloup
Target Milestone: ---   
Target Release: 4.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: aws
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-10-16 06:36:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jianwei Hou 2019-08-21 08:17:45 UTC
Description of problem:
On an AWS UPI cluster, update a machineset to manage node provisioning. Then update machineset.spec.providerSpec.value to a malformed format such as the following:
```
      securityGroups:
      - filters:
          id: sg-0aa9d32d71a356b25
      subnet:
        filters:
          id: subnet-0c5ce5241159a7676
```
The update is accepted. A new machine is created when the machineset scales up, but the machine has no events and cannot be deleted after the machineset is scaled to 0.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-08-20-213632

How reproducible:
Always

Steps to Reproduce:
1. Run `oc edit machineset` and set a malformed securityGroups and subnet:

      securityGroups:
      - filters:
          id: sg-0aa9d32d71a356b25
      subnet:
        filters:
          id: subnet-0c5ce5241159a7676
2. The machineset is updated. Scale the machineset from 0 to 1.
3. A new machine is created.
4. Scale the machineset from 1 to 0.

Actual results:
The machine cannot be deleted by machine-controller:

I0821 07:57:55.020902       1 controller.go:205] Reconciling machine "qe-jhou-0821-ch89b-worker-us-east-2a-pz9ps" triggers delete
I0821 07:57:55.020910       1 actuator.go:333] qe-jhou-0821-ch89b-worker-us-east-2a-pz9ps: deleting machine
E0821 07:57:55.021015       1 actuator.go:107] qe-jhou-0821-ch89b-worker-us-east-2a-pz9ps: Machine error: error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.SecurityGroups: []v1beta1.AWSResourceReference: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"sg-0|..., bigger context ...|2"},"publicIp":null,"securityGroups":[{"filters":{"id":"sg-0aa9d32d71a356b25"}}],"subnet":{"filters"|...
E0821 07:57:55.021026       1 actuator.go:335] qe-jhou-0821-ch89b-worker-us-east-2a-pz9ps: error deleting machine: error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.SecurityGroups: []v1beta1.AWSResourceReference: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"sg-0|..., bigger context ...|2"},"publicIp":null,"securityGroups":[{"filters":{"id":"sg-0aa9d32d71a356b25"}}],"subnet":{"filters"|...
E0821 07:57:55.021035       1 controller.go:220] Failed to delete machine "qe-jhou-0821-ch89b-worker-us-east-2a-pz9ps": error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.SecurityGroups: []v1beta1.AWSResourceReference: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"sg-0|..., bigger context ...|2"},"publicIp":null,"securityGroups":[{"filters":{"id":"sg-0aa9d32d71a356b25"}}],"subnet":{"filters"|...


Also `oc delete machine` hangs forever.


Expected results:
The machine can be deleted, or the malformed machineset is rejected.

Additional info:
A workaround for deleting the machine is to update its providerSpec to the correct format:

       securityGroups:
       - filters:
           - name: tag:Name
             values:
             - sg-0aa9d32d71a356b25
       subnet:
         filters:
           - name: tag:Name
             values:
             - subnet-0c5ce5241159a7676
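The difference between the two formats can be illustrated with a minimal Go sketch. It assumes simplified stand-ins for the `Filter` and `AWSResourceReference` types named in the error log (the field sets here are hypothetical, not the real v1beta1 definitions) and uses the standard `encoding/json` package, so the error text will differ from the controller's. The point is only that `filters` must be a list, so an object in that position fails to decode.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Filter is a simplified stand-in for the Filter type named in the log.
// The real v1beta1 type may carry different fields.
type Filter struct {
	Name   string   `json:"name"`
	Values []string `json:"values,omitempty"`
}

// AWSResourceReference is a simplified stand-in that keeps only the
// Filters field, which the log shows to be a slice of Filter.
type AWSResourceReference struct {
	Filters []Filter `json:"filters,omitempty"`
}

// decodeRef tries to decode a single resource reference from JSON.
func decodeRef(raw string) (AWSResourceReference, error) {
	var ref AWSResourceReference
	err := json.Unmarshal([]byte(raw), &ref)
	return ref, err
}

// malformedRef puts an object where a list of filters is expected.
const malformedRef = `{"filters":{"id":"subnet-0c5ce5241159a7676"}}`

// validRef uses the list-of-filters form from the workaround.
const validRef = `{"filters":[{"name":"tag:Name","values":["subnet-0c5ce5241159a7676"]}]}`

func main() {
	if _, err := decodeRef(malformedRef); err != nil {
		fmt.Println("malformed reference rejected:", err)
	}
	if ref, err := decodeRef(validRef); err == nil && len(ref.Filters) > 0 {
		fmt.Println("valid reference decoded, first filter:", ref.Filters[0].Name)
	}
}
```

Because the decode happens at the start of the delete path, the same failure that blocks creation also blocks deletion until the providerSpec is corrected.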

Comment 1 Jan Chaloupka 2019-08-28 11:21:01 UTC
Unfortunately, any machine provider config can be malformed. Given the provider config is a raw string, we can't do any validation outside of an actuator.

Michael Gugino's PR, which removes a machine that has an invalid machine config and no node addresses, is the closest fix we can provide right now: https://github.com/openshift/cluster-api/pull/67.

We can't do any machine provider config validation until it has its own CRD definition. Definitely not something to be done in 4.2.

Comment 3 Jianwei Hou 2019-09-03 09:10:15 UTC
Verified in 4.2.0-0.nightly-2019-09-02-172410.

A machine with a malformed provider config can now be deleted.

```
I0903 09:09:40.511340       1 actuator.go:333] test: deleting machine
E0903 09:09:40.511488       1 actuator.go:107] test: Machine error: error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.Subnet: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"subn|..., bigger context ...|"jhou1-7gf6r-worker-sg"]}]}],"subnet":{"filters":{"id":"subnet-0c5ce5241159a7676"}},"tags":[{"name":|...
E0903 09:09:40.511507       1 actuator.go:335] test: error deleting machine: error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.Subnet: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"subn|..., bigger context ...|"jhou1-7gf6r-worker-sg"]}]}],"subnet":{"filters":{"id":"subnet-0c5ce5241159a7676"}},"tags":[{"name":|...
I0903 09:09:40.511520       1 controller.go:384] Actuator returned invalid configuration error: error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.Subnet: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"subn|..., bigger context ...|"jhou1-7gf6r-worker-sg"]}]}],"subnet":{"filters":{"id":"subnet-0c5ce5241159a7676"}},"tags":[{"name":|...
I0903 09:09:40.525402       1 controller.go:249] Machine "test" deletion successful
```

Comment 4 errata-xmlrpc 2019-10-16 06:36:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922