Description of problem:
On an AWS UPI cluster, update a machineset to manage node provisioning. When machineset.spec.providerSpec.value is updated to a malformed format like the following:
```
securityGroups:
- filters:
    id: sg-0aa9d32d71a356b25
subnet:
  filters:
    id: subnet-0c5ce5241159a7676
```
the update is accepted. A new machine is created when the machineset scales up, but the machine has no events and cannot be deleted after the machineset is scaled to 0.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-08-20-213632

How reproducible:
Always

Steps to Reproduce:
1. oc edit machineset, set a malformed securityGroups and subnet:
    securityGroups:
    - filters:
        id: sg-0aa9d32d71a356b25
    subnet:
      filters:
        id: subnet-0c5ce5241159a7676
2. The machineset is updated; then scale the machineset from 0 to 1
3. A new machine is created
4. Scale the machineset from 1 to 0

Actual results:
The machine cannot be deleted by the machine-controller.

I0821 07:57:55.020902 1 controller.go:205] Reconciling machine "qe-jhou-0821-ch89b-worker-us-east-2a-pz9ps" triggers delete
I0821 07:57:55.020910 1 actuator.go:333] qe-jhou-0821-ch89b-worker-us-east-2a-pz9ps: deleting machine
E0821 07:57:55.021015 1 actuator.go:107] qe-jhou-0821-ch89b-worker-us-east-2a-pz9ps: Machine error: error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.SecurityGroups: []v1beta1.AWSResourceReference: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"sg-0|..., bigger context ...|2"},"publicIp":null,"securityGroups":[{"filters":{"id":"sg-0aa9d32d71a356b25"}}],"subnet":{"filters"|...
E0821 07:57:55.021026 1 actuator.go:335] qe-jhou-0821-ch89b-worker-us-east-2a-pz9ps: error deleting machine: error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.SecurityGroups: []v1beta1.AWSResourceReference: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"sg-0|..., bigger context ...|2"},"publicIp":null,"securityGroups":[{"filters":{"id":"sg-0aa9d32d71a356b25"}}],"subnet":{"filters"|...
E0821 07:57:55.021035 1 controller.go:220] Failed to delete machine "qe-jhou-0821-ch89b-worker-us-east-2a-pz9ps": error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.SecurityGroups: []v1beta1.AWSResourceReference: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"sg-0|..., bigger context ...|2"},"publicIp":null,"securityGroups":[{"filters":{"id":"sg-0aa9d32d71a356b25"}}],"subnet":{"filters"|...

Also, `oc delete machine` hangs forever.

Expected results:
Either the machine can be deleted, or the malformed machineset is rejected.

Additional info:
A workaround for deleting the machine is to update its providerSpec to the correct format:
    securityGroups:
    - filters:
      - name: tag:Name
        values:
        - sg-0aa9d32d71a356b25
    subnet:
      filters:
      - name: tag:Name
        values:
        - subnet-0c5ce5241159a7676
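
The decoding failure reported above can be reproduced in isolation. The following is a minimal sketch, not the provider's actual code: the `Filter`, `ResourceReference`, and `ProviderConfig` structs are simplified stand-ins for the v1beta1 types named in the log, `decodeProviderConfig` is a hypothetical helper, and the standard library `encoding/json` decoder is used in place of whatever decoder the actuator uses, so the error text will differ from the log. It only illustrates why `filters` given as a map instead of a list cannot be decoded, while the workaround format can.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Filter is a simplified stand-in for v1beta1.Filter (assumption).
type Filter struct {
	Name   string   `json:"name"`
	Values []string `json:"values,omitempty"`
}

// ResourceReference is a simplified stand-in for v1beta1.AWSResourceReference.
// Filters is a slice, so the JSON value must be an array.
type ResourceReference struct {
	ID      *string  `json:"id,omitempty"`
	Filters []Filter `json:"filters,omitempty"`
}

// ProviderConfig holds only the two fields relevant to this report.
type ProviderConfig struct {
	SecurityGroups []ResourceReference `json:"securityGroups,omitempty"`
	Subnet         ResourceReference   `json:"subnet,omitempty"`
}

// decodeProviderConfig is a hypothetical helper that decodes the raw
// provider config and wraps any decoding error.
func decodeProviderConfig(raw []byte) (*ProviderConfig, error) {
	var cfg ProviderConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, fmt.Errorf("error decoding provider config: %w", err)
	}
	return &cfg, nil
}

// malformed mirrors the JSON fragment shown in the log: filters is an object.
const malformed = `{"securityGroups":[{"filters":{"id":"sg-0aa9d32d71a356b25"}}],"subnet":{"filters":{"id":"subnet-0c5ce5241159a7676"}}}`

// wellFormed mirrors the workaround format: filters is a list of name/values.
const wellFormed = `{"securityGroups":[{"filters":[{"name":"tag:Name","values":["sg-0aa9d32d71a356b25"]}]}],"subnet":{"filters":[{"name":"tag:Name","values":["subnet-0c5ce5241159a7676"]}]}}`

func main() {
	if _, err := decodeProviderConfig([]byte(malformed)); err != nil {
		fmt.Println("malformed config rejected:", err)
	}
	cfg, err := decodeProviderConfig([]byte(wellFormed))
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Println("well-formed subnet filter:", cfg.Subnet.Filters[0].Name, cfg.Subnet.Filters[0].Values)
}
```

Because the failure happens while decoding, every actuator operation that starts by decoding the provider config, including delete, fails in the same way until the config is corrected.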
Unfortunately, any machine provider config can be malformed. Because the provider config is a raw string, we can't do any validation outside of an actuator. Michael Gugino's PR, which removes a machine that has an invalid machine config and no node addresses, is the closest fix we can provide right now: https://github.com/openshift/cluster-api/pull/67. We can't do any machine provider config validation until it has its own CRD definition, and that is definitely not something to be done in 4.2.
Verified in 4.2.0-0.nightly-2019-09-02-172410. The machine can be deleted with a malformed provider config.
```
I0903 09:09:40.511340 1 actuator.go:333] test: deleting machine
E0903 09:09:40.511488 1 actuator.go:107] test: Machine error: error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.Subnet: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"subn|..., bigger context ...|"jhou1-7gf6r-worker-sg"]}]}],"subnet":{"filters":{"id":"subnet-0c5ce5241159a7676"}},"tags":[{"name":|...
E0903 09:09:40.511507 1 actuator.go:335] test: error deleting machine: error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.Subnet: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"subn|..., bigger context ...|"jhou1-7gf6r-worker-sg"]}]}],"subnet":{"filters":{"id":"subnet-0c5ce5241159a7676"}},"tags":[{"name":|...
I0903 09:09:40.511520 1 controller.go:384] Actuator returned invalid configuration error: error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.Subnet: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"subn|..., bigger context ...|"jhou1-7gf6r-worker-sg"]}]}],"subnet":{"filters":{"id":"subnet-0c5ce5241159a7676"}},"tags":[{"name":|...
I0903 09:09:40.525402 1 controller.go:249] Machine "test" deletion successful
```
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922