Bug 1744049 - [AWS] machine-controller can not delete a machine when the machine's providerSpec is malformed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: 4.2.0
Assignee: Michael Gugino
QA Contact: Jianwei Hou
URL:
Whiteboard: aws
Reported: 2019-08-21 08:17 UTC by Jianwei Hou
Modified: 2019-10-16 06:37 UTC (History)

Last Closed: 2019-10-16 06:36:58 UTC

Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-api-provider-aws pull 256 | 0 | None | closed | bug 1744049: Vendor cluster-api to update InvalidConfigurationMachineError | 2020-07-09 02:37:02 UTC
Red Hat Product Errata | RHBA-2019:2922 | 0 | None | None | None | 2019-10-16 06:37:07 UTC

Description Jianwei Hou 2019-08-21 08:17:45 UTC
Description of problem:
On an AWS UPI cluster, a machineset is updated to manage node provisioning. The machineset.spec.providerSpec.value is set to a malformed value such as the following:
```
      securityGroups:
      - filters:
          id: sg-0aa9d32d71a356b25
      subnet:
        filters:
          id: subnet-0c5ce5241159a7676
```
The update is accepted. A new machine is created when the machineset scales up, but the machine has no events and cannot be deleted after the machineset is scaled to 0.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-08-20-213632

How reproducible:
Always

Steps to Reproduce:
1. Run `oc edit machineset` and set a malformed securityGroups and subnet:

      securityGroups:
      - filters:
          id: sg-0aa9d32d71a356b25
      subnet:
        filters:
          id: subnet-0c5ce5241159a7676
2. The machineset is updated. Scale the machineset from 0 to 1
3. A new machine is created
4. Scale the machineset from 1 to 0

Actual results:
The machine cannot be deleted by the machine-controller:

I0821 07:57:55.020902       1 controller.go:205] Reconciling machine "qe-jhou-0821-ch89b-worker-us-east-2a-pz9ps" triggers delete
I0821 07:57:55.020910       1 actuator.go:333] qe-jhou-0821-ch89b-worker-us-east-2a-pz9ps: deleting machine
E0821 07:57:55.021015       1 actuator.go:107] qe-jhou-0821-ch89b-worker-us-east-2a-pz9ps: Machine error: error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.SecurityGroups: []v1beta1.AWSResourceReference: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"sg-0|..., bigger context ...|2"},"publicIp":null,"securityGroups":[{"filters":{"id":"sg-0aa9d32d71a356b25"}}],"subnet":{"filters"|...
E0821 07:57:55.021026       1 actuator.go:335] qe-jhou-0821-ch89b-worker-us-east-2a-pz9ps: error deleting machine: error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.SecurityGroups: []v1beta1.AWSResourceReference: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"sg-0|..., bigger context ...|2"},"publicIp":null,"securityGroups":[{"filters":{"id":"sg-0aa9d32d71a356b25"}}],"subnet":{"filters"|...
E0821 07:57:55.021035       1 controller.go:220] Failed to delete machine "qe-jhou-0821-ch89b-worker-us-east-2a-pz9ps": error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.SecurityGroups: []v1beta1.AWSResourceReference: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"sg-0|..., bigger context ...|2"},"publicIp":null,"securityGroups":[{"filters":{"id":"sg-0aa9d32d71a356b25"}}],"subnet":{"filters"|...


Also `oc delete machine` hangs forever.


Expected results:
Either the machine can be deleted, or the malformed machineset is rejected.

Additional info:
A workaround for deleting the machine is to update its providerSpec to the correct format:

       securityGroups:
       - filters:
           - name: tag:Name
             values:
             - sg-0aa9d32d71a356b25
       subnet:
         filters:
           - name: tag:Name
             values:
             - subnet-0c5ce5241159a7676
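The difference between the malformed and the corrected spec comes down to the type of `filters`: the decoder expects a list of filter entries (or null), which is what "expect [ or n, but found {" in the log refers to, while the malformed spec supplies a mapping. The following is only a minimal sketch of that shape check, assuming the providerSpec value has already been parsed into plain dicts and lists. The function name `malformed_filter_paths` is hypothetical and is not part of the machine-controller or any OpenShift tooling.

```python
def malformed_filter_paths(provider_spec):
    """Return paths whose `filters` value is not a list of mappings (hypothetical helper)."""
    problems = []
    # Collect every resource reference that may carry `filters`.
    refs = [(f"securityGroups[{i}]", ref)
            for i, ref in enumerate(provider_spec.get("securityGroups") or [])]
    if provider_spec.get("subnet") is not None:
        refs.append(("subnet", provider_spec["subnet"]))

    for path, ref in refs:
        if not isinstance(ref, dict):
            problems.append(path)
            continue
        filters = ref.get("filters")
        if filters is None:
            continue  # a missing or null value is accepted
        if not isinstance(filters, list):
            problems.append(f"{path}.filters")  # e.g. a mapping such as {"id": ...}
            continue
        for j, item in enumerate(filters):
            if not isinstance(item, dict):
                problems.append(f"{path}.filters[{j}]")
    return problems


# The malformed value from this report.
malformed = {
    "securityGroups": [{"filters": {"id": "sg-0aa9d32d71a356b25"}}],
    "subnet": {"filters": {"id": "subnet-0c5ce5241159a7676"}},
}
# The corrected value from the workaround.
corrected = {
    "securityGroups": [
        {"filters": [{"name": "tag:Name", "values": ["sg-0aa9d32d71a356b25"]}]}
    ],
    "subnet": {
        "filters": [{"name": "tag:Name", "values": ["subnet-0c5ce5241159a7676"]}]
    },
}

print(malformed_filter_paths(malformed))
print(malformed_filter_paths(corrected))
```

A check like this only covers the two fields involved in this report. It is not a substitute for real providerSpec validation.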

Comment 1 Jan Chaloupka 2019-08-28 11:21:01 UTC
Unfortunately, any machine provider config can be malformed. Given the provider config is a raw string, we can't do any validation outside of an actuator.

Michael Gugino's PR, which removes a machine that has an invalid machine config and no node addresses, is the closest fix we can provide right now: https://github.com/openshift/cluster-api/pull/67

We can't do any machine provider config validation until it has its own CRD definition. Definitely not something to be done in 4.2.
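
The delete behaviour described in this comment can be sketched roughly as follows. This is an illustration in Python, not the controller's actual Go code. The names `InvalidMachineConfiguration` and `can_finish_delete` and the dict layout of the machine are assumptions made for the example. The idea is that an invalid-configuration error from the actuator stops blocking deletion only when the machine has no node addresses. Any other error, or an invalid config on a machine that does have addresses, still fails the delete.

```python
class InvalidMachineConfiguration(Exception):
    """Stand-in for the actuator's invalid-configuration error (hypothetical)."""


def can_finish_delete(machine, actuator_delete):
    """Return True when the Machine object may be removed.

    `machine` is a plain dict and `actuator_delete` is any callable that
    deletes the backing instance; both are assumptions for this sketch.
    """
    try:
        actuator_delete(machine)
    except InvalidMachineConfiguration:
        addresses = (machine.get("status") or {}).get("addresses")
        if addresses:
            # An instance may exist behind this machine, so keep failing.
            raise
        # Invalid config and no node addresses: treat the machine as removable.
    return True
```

The address check reflects the wording above ("invalid machine config and no node addresses"): a machine that has reported addresses may have a backing instance, so it is not removed without a successful actuator delete.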

Comment 3 Jianwei Hou 2019-09-03 09:10:15 UTC
Verified in 4.2.0-0.nightly-2019-09-02-172410.

Machine can be deleted with a malformed provider config.

```
I0903 09:09:40.511340       1 actuator.go:333] test: deleting machine
E0903 09:09:40.511488       1 actuator.go:107] test: Machine error: error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.Subnet: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"subn|..., bigger context ...|"jhou1-7gf6r-worker-sg"]}]}],"subnet":{"filters":{"id":"subnet-0c5ce5241159a7676"}},"tags":[{"name":|...
E0903 09:09:40.511507       1 actuator.go:335] test: error deleting machine: error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.Subnet: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"subn|..., bigger context ...|"jhou1-7gf6r-worker-sg"]}]}],"subnet":{"filters":{"id":"subnet-0c5ce5241159a7676"}},"tags":[{"name":|...
I0903 09:09:40.511520       1 controller.go:384] Actuator returned invalid configuration error: error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.Subnet: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"subn|..., bigger context ...|"jhou1-7gf6r-worker-sg"]}]}],"subnet":{"filters":{"id":"subnet-0c5ce5241159a7676"}},"tags":[{"name":|...
I0903 09:09:40.525402       1 controller.go:249] Machine "test" deletion successful
```

Comment 4 errata-xmlrpc 2019-10-16 06:36:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

