Bug 1744049
| Summary: | [AWS] machine-controller can not delete a machine when the machine's providerSpec is malformed | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jianwei Hou <jhou> |
| Component: | Cloud Compute | Assignee: | Michael Gugino <mgugino> |
| Status: | CLOSED ERRATA | QA Contact: | Jianwei Hou <jhou> |
| Severity: | low | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4.2.0 | CC: | agarcial, jchaloup |
| Target Milestone: | --- | ||
| Target Release: | 4.2.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | aws | ||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-10-16 06:36:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Unfortunately, any machine provider config can be malformed. Because the provider config is a raw string, we can't validate it outside of an actuator. Michael Gugino's PR, which removes a machine that has an invalid machine config and no node addresses, is the closest fix we can provide right now: https://github.com/openshift/cluster-api/pull/67. We can't do any machine provider config validation until it has its own CRD definition, which is definitely not something to be done in 4.2.

Verified in 4.2.0-0.nightly-2019-09-02-172410. A machine can be deleted with a malformed provider config.
```
I0903 09:09:40.511340 1 actuator.go:333] test: deleting machine
E0903 09:09:40.511488 1 actuator.go:107] test: Machine error: error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.Subnet: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"subn|..., bigger context ...|"jhou1-7gf6r-worker-sg"]}]}],"subnet":{"filters":{"id":"subnet-0c5ce5241159a7676"}},"tags":[{"name":|...
E0903 09:09:40.511507 1 actuator.go:335] test: error deleting machine: error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.Subnet: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"subn|..., bigger context ...|"jhou1-7gf6r-worker-sg"]}]}],"subnet":{"filters":{"id":"subnet-0c5ce5241159a7676"}},"tags":[{"name":|...
I0903 09:09:40.511520 1 controller.go:384] Actuator returned invalid configuration error: error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.Subnet: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"subn|..., bigger context ...|"jhou1-7gf6r-worker-sg"]}]}],"subnet":{"filters":{"id":"subnet-0c5ce5241159a7676"}},"tags":[{"name":|...
I0903 09:09:40.525402 1 controller.go:249] Machine "test" deletion successful
```
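The behaviour in the log above (the actuator reports an invalid configuration, and the controller still completes the deletion) can be sketched roughly as follows. This is only an illustration of the rule described in the comment, "invalid machine config and no node addresses", and not the code from the referenced PR. The names `Machine`, `InvalidConfigurationError`, `actuator_delete`, and `reconcile_delete` are hypothetical, and the idea that a machine without node addresses has nothing left to clean up is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


class InvalidConfigurationError(Exception):
    """Hypothetical error: the actuator could not decode the provider config."""


@dataclass
class Machine:
    # Hypothetical, simplified stand-in for a Machine object.
    name: str
    node_addresses: List[str] = field(default_factory=list)
    config_is_valid: bool = True


def actuator_delete(machine: Machine) -> None:
    """Stand-in for the actuator's delete call; fails on an undecodable config."""
    if not machine.config_is_valid:
        raise InvalidConfigurationError("error decoding MachineProviderConfig")


def reconcile_delete(machine: Machine) -> bool:
    """Return True if deletion may complete, False if it has to be retried."""
    try:
        actuator_delete(machine)
    except InvalidConfigurationError:
        # Assumption: an invalid config combined with no node addresses means
        # there is nothing for the actuator to clean up, so deletion proceeds.
        return not machine.node_addresses
    return True


print(reconcile_delete(Machine("test", [], config_is_valid=False)))
```

Under this sketch, a machine with a malformed config that does report node addresses would still be retried rather than removed.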
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922
Description of problem:
On an AWS UPI cluster, update a machineSet to manage node provisioning. When machineset.spec.providerSpec.value is updated to a malformed format like the following:
```
securityGroups:
- filters:
    id: sg-0aa9d32d71a356b25
subnet:
  filters:
    id: subnet-0c5ce5241159a7676
```
the update is accepted. A new machine can be created as the machineset scales, but the machine has no events and cannot be deleted after the machineset is scaled to 0.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-08-20-213632

How reproducible:
Always

Steps to Reproduce:
1. Run oc edit machineset and set the malformed securityGroups and subnet shown above.
2. The machineset is updated; then scale the machineset from 0 to 1.
3. A new machine is created.
4. Scale the machineset from 1 to 0.

Actual results:
The machine cannot be deleted by machine-controller:

I0821 07:57:55.020902 1 controller.go:205] Reconciling machine "qe-jhou-0821-ch89b-worker-us-east-2a-pz9ps" triggers delete
I0821 07:57:55.020910 1 actuator.go:333] qe-jhou-0821-ch89b-worker-us-east-2a-pz9ps: deleting machine
E0821 07:57:55.021015 1 actuator.go:107] qe-jhou-0821-ch89b-worker-us-east-2a-pz9ps: Machine error: error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.SecurityGroups: []v1beta1.AWSResourceReference: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"sg-0|..., bigger context ...|2"},"publicIp":null,"securityGroups":[{"filters":{"id":"sg-0aa9d32d71a356b25"}}],"subnet":{"filters"|...
E0821 07:57:55.021026 1 actuator.go:335] qe-jhou-0821-ch89b-worker-us-east-2a-pz9ps: error deleting machine: error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.SecurityGroups: []v1beta1.AWSResourceReference: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"sg-0|..., bigger context ...|2"},"publicIp":null,"securityGroups":[{"filters":{"id":"sg-0aa9d32d71a356b25"}}],"subnet":{"filters"|...
E0821 07:57:55.021035 1 controller.go:220] Failed to delete machine "qe-jhou-0821-ch89b-worker-us-east-2a-pz9ps": error decoding MachineProviderConfig: decoding failure: v1beta1.AWSMachineProviderConfig.SecurityGroups: []v1beta1.AWSResourceReference: v1beta1.AWSResourceReference.Filters: []v1beta1.Filter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|filters":{"id":"sg-0|..., bigger context ...|2"},"publicIp":null,"securityGroups":[{"filters":{"id":"sg-0aa9d32d71a356b25"}}],"subnet":{"filters"|...

Also, `oc delete machine` hangs forever.

Expected results:
The machine can be deleted, or the malformed machineset is rejected.

Additional info:
A workaround for deleting the machine is to update its providerSpec to the correct format:

    securityGroups:
    - filters:
      - name: tag:Name
        values:
        - sg-0aa9d32d71a356b25
    subnet:
      filters:
      - name: tag:Name
        values:
        - subnet-0c5ce5241159a7676
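
The malformed and corrected providerSpec fragments in this report differ only in the shape of `filters`: the decoder expects a list of entries with `name` and `values`, while the malformed spec supplies a single mapping. A minimal sketch of a pre-flight check for that shape is given below. The function `filter_errors` is hypothetical and is not part of the machine-controller or the `oc` client; it covers only the `securityGroups` and `subnet` fields discussed here, with the providerSpec already loaded into plain Python dictionaries.

```python
from typing import List


def filter_errors(provider_spec: dict) -> List[str]:
    """Return problems found in the `filters` fields; an empty list means well-formed."""
    errors = []
    # Collect every resource reference that may carry a `filters` field.
    refs = [("subnet", provider_spec.get("subnet"))]
    for i, sg in enumerate(provider_spec.get("securityGroups") or []):
        refs.append((f"securityGroups[{i}]", sg))

    for path, ref in refs:
        if not isinstance(ref, dict) or "filters" not in ref:
            continue  # reference is absent or does not use filters
        filters = ref["filters"]
        if not isinstance(filters, list):
            errors.append(
                f"{path}.filters: expected a list, got {type(filters).__name__}"
            )
            continue
        for j, item in enumerate(filters):
            well_formed = (
                isinstance(item, dict)
                and isinstance(item.get("name"), str)
                and isinstance(item.get("values"), list)
            )
            if not well_formed:
                errors.append(f"{path}.filters[{j}]: expected name and values")
    return errors


# The malformed spec from this report: `filters` is a mapping, not a list.
malformed = {
    "securityGroups": [{"filters": {"id": "sg-0aa9d32d71a356b25"}}],
    "subnet": {"filters": {"id": "subnet-0c5ce5241159a7676"}},
}

# The corrected spec from the workaround.
corrected = {
    "securityGroups": [
        {"filters": [{"name": "tag:Name", "values": ["sg-0aa9d32d71a356b25"]}]}
    ],
    "subnet": {
        "filters": [{"name": "tag:Name", "values": ["subnet-0c5ce5241159a7676"]}]
    },
}

for problem in filter_errors(malformed):
    print(problem)
print(filter_errors(corrected))
```

A check like this could be run against a machineset's providerSpec before it is applied, since the report shows that the update itself is accepted without validation.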