Bug 1826017
| Summary: | [vsphere]Machine status should be "Failed" with an invalid configuration | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Joel Speed <jspeed> | 
| Component: | Cloud Compute | Assignee: | Joel Speed <jspeed> | 
| Cloud Compute sub component: | Other Providers | QA Contact: | Milind Yadav <miyadav> | 
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | agarcial, jhou, jspeed, mgugino, miyadav, zhsun | 
| Version: | 4.5 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.5.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Cause: Errors returned from the cloud-provider actuator no longer matched the expected type because they were wrapped using github.com/pkg/errors.
Consequence: The Machine controller could not determine that the Machine should be marked as Failed.
Fix: Use error wrapping from the standard library to check the error types.
Result: The Machine controller can now determine when Machines should be marked Failed. | Story Points: | --- | 
| Clone Of: | 1824497 | Environment: | |
| Last Closed: | 2020-07-13 17:29:09 UTC | Type: | --- | 
| Regression: | --- | Mount Type: | --- | 
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1824497, 1833256 | ||
| Bug Blocks: | |||
| 
        
Description, Joel Speed, 2020-04-20 17:09:37 UTC
        
Description of problem:
Machine status should be "Failed" when creating a machineset with an invalid configuration.
 
Version-Release number of selected component (if applicable):
Cluster version is 4.5.0-0.nightly-2020-05-08-015855
How reproducible:
Always
1. Create a machineset with invalid specs:
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  annotations:
    autoscaling.openshift.io/machineautoscaler: openshift-machine-api/machineautoscaler
    machine.openshift.io/cluster-api-autoscaler-node-group-max-size: "3"
    machine.openshift.io/cluster-api-autoscaler-node-group-min-size: "1"
  creationTimestamp: "2020-05-12T09:18:42Z"
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: miyadav-12ipi-vhcs5
  name: miyadav-12ipi-vhcs5-worker-invalid
  namespace: openshift-machine-api
  resourceVersion: "161152"
  selfLink: /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machinesets/miyadav-12ipi-vhcs5-worker-invalid
  uid: 0254a43b-637c-45ff-8b83-f910b1ebda9e
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: miyadav-12ipi-vhcs5
      machine.openshift.io/cluster-api-machineset: miyadav-12ipi-vhcs5-worker
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: miyadav-12ipi-vhcs5
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: miyadav-12ipi-vhcs5-worker
    spec:
      metadata: {}
      providerSpec:
        value:
          apiVersion: vsphereprovider.openshift.io/v1beta1
          credentialsSecret:
            name: vsphere-cloud-credentials
          diskGiB: 50
          kind: VSphereMachineProviderSpec
          memoryMiB: 8192
          metadata:
            creationTimestamp: null
          network:
            devices:
            - networkName: VM Network
          numCPUs: 4
          numCoresPerSocket: 1
          snapshot: ""
          spotMarketOptions:
            maxPrice: "0.01"
          template: miyadav-12ipi-vhcs5-rhcos
          userDataSecret:
            name: worker-user-data
          workspace:
            datacenter: dc1
            datastore: nvme-ds1
            folder: /dc1/vm/miyadav-12ipi-vhcs5
            server: vcsa-qe.vmware.devcluster.openshift.com
2. oc create -f <invalid-machineset.yml>
Actual: machineset created successfully.
[miyadav@miyadav ManualRun]$ oc get machineset
NAME                                 DESIRED   CURRENT   READY   AVAILABLE   AGE
miyadav-12ipi-vhcs5-worker           2         2         2       2           6h22m
miyadav-12ipi-vhcs5-worker-invalid   2         2                             5m22s
3. [miyadav@miyadav ManualRun]$ oc get machines
NAME                                       PHASE          TYPE   REGION   ZONE   AGE
miyadav-12ipi-vhcs5-master-0               Running                               6h32m
miyadav-12ipi-vhcs5-master-1               Running                               6h32m
miyadav-12ipi-vhcs5-master-2               Running                               6h32m
miyadav-12ipi-vhcs5-worker-g79rl           Running                               6h20m
miyadav-12ipi-vhcs5-worker-invalid-n29vv   Provisioning                          15m
miyadav-12ipi-vhcs5-worker-qfj77           Running                               6h20m
Actual: machines are not in Failed status, but stuck in Provisioning.
Expected: machine should be in Failed status.
Additional info:
machine-controller logs:
.
.
.
ta:{}} DeviceName:VM Network UseAutoDetect:<nil>} Network:<nil> InPassthroughMode:<nil>}
I0512 09:47:05.144773       1 reconciler.go:523] miyadav-12ipi-vhcs5-worker-invalid-n29vv: running task: task-135147
I0512 09:47:05.144861       1 reconciler.go:620] miyadav-12ipi-vhcs5-worker-invalid-n29vv: Updating provider status
I0512 09:47:05.144907       1 machine_scope.go:99] miyadav-12ipi-vhcs5-worker-invalid-n29vv: patching machine
I0512 09:47:05.156276       1 controller.go:325] miyadav-12ipi-vhcs5-worker-invalid-n29vv: created instance, requeuing
I0512 09:47:05.156429       1 controller.go:169] miyadav-12ipi-vhcs5-worker-invalid-n29vv: reconciling Machine
I0512 09:47:05.156462       1 actuator.go:80] miyadav-12ipi-vhcs5-worker-invalid-n29vv: actuator checking if machine exists
I0512 09:47:05.163046       1 session.go:114] Find template by instance uuid: a051b036-fffd-4848-bf0f-7dcd5a0e2f7a
I0512 09:47:05.181298       1 reconciler.go:155] miyadav-12ipi-vhcs5-worker-invalid-n29vv: does not exist
I0512 09:47:05.181405       1 controller.go:313] miyadav-12ipi-vhcs5-worker-invalid-n29vv: reconciling machine triggers idempotent create
I0512 09:47:05.181430       1 actuator.go:59] miyadav-12ipi-vhcs5-worker-invalid-n29vv: actuator creating machine
I0512 09:47:05.190393       1 reconciler.go:604] task: task-135148, state: error, description-id: VirtualMachine.clone
I0512 09:47:05.190475       1 session.go:114] Find template by instance uuid: a051b036-fffd-4848-bf0f-7dcd5a0e2f7a
I0512 09:47:05.208697       1 reconciler.go:83] miyadav-12ipi-vhcs5-worker-invalid-n29vv: cloning
I0512 09:47:05.208738       1 session.go:111] Invalid UUID for VM "miyadav-12ipi-vhcs5-rhcos": , trying to find by name
I0512 09:47:05.227797       1 reconciler.go:399] miyadav-12ipi-vhcs5-worker-invalid-n29vv: no snapshot name provided, getting snapshot using template
I0512 09:47:05.249183       1 reconciler.go:478] Getting network devices
I0512 09:47:05.249283       1 reconciler.go:555] Adding device: VM Network
I0512 09:47:05.254066       1 reconciler.go:584] Adding device: eth card type: vmxnet3, network spec: &{NetworkName:VM Network}, device info: &{VirtualDeviceDeviceBackingInfo:{VirtualDeviceBackingInfo:{DynamicData:{}} DeviceName:VM Network UseAutoDetect:<nil>} Network:<nil> InPassthroughMode:<nil>}
I0512 09:47:05.257745       1 reconciler.go:523] miyadav-12ipi-vhcs5-worker-invalid-n29vv: running task: task-135148
I0512 09:47:05.257811       1 reconciler.go:620] miyadav-12ipi-vhcs5-worker-invalid-n29vv: Updating provider status
I0512 09:47:05.257844       1 machine_scope.go:99] miyadav-12ipi-vhcs5-worker-invalid-n29vv: patching machine
I0512 09:47:05.269717       1 controller.go:325] miyadav-12ipi-vhcs5-worker-invalid-n29vv: created instance, requeuing
I0512 09:47:05.269816       1 controller.go:169] miyadav-12ipi-vhcs5-worker-invalid-n29vv: reconciling Machine
I0512 09:47:05.269842       1 actuator.go:80] miyadav-12ipi-vhcs5-worker-invalid-n29vv: actuator checking if machine exists
I0512 09:47:05.277706       1 session.go:114] Find template by instance uuid: a051b036-fffd-4848-bf0f-7dcd5a0e2f7a
I0512 09:47:05.297275       1 reconciler.go:155] miyadav-12ipi-vhcs5-worker-invalid-n29vv: does not exist
.
.
.
This bug does not seem to be directly related to the bug from which it is cloned. Not everything that fails to provision is necessarily going to trigger a 'failure.' Please describe what it is you're attempting to do in more detail.

As per the description, I tried to use an invalid configuration, but it seems that the invalid options are dropped gracefully and the machines are provisioned. After checking with Joel, I did the following:
Removed the machine.openshift.io/cluster-api-cluster label from spec.template.metadata.labels.
After that, it never created any machines or logs.
[miyadav@miyadav bugvsphere]$ oc get machineset
NAME                              DESIRED   CURRENT   READY   AVAILABLE   AGE
miyadav-11-rdnr9-worker           2         2         2       2           134m
miyadav-11-rdnr9-worker-invalid   1                                       56s
miyadav-11-rdnr9-worker-rz        1         1         1       1           26m
.
.
Then I manually tried creating a machine that doesn't have the label, rather than using a machineset.
Got this:
.
.
.
E0513 08:47:29.127626       1 controller.go:173] miyadav-11-rdnr9-worker-sam: machine validation failed: spec.labels: Invalid value: map[string]string{"machine.openshift.io/cluster-api-machine-role":"worker", "machine.openshift.io/cluster-api-machine-type":"worker", "machine.openshift.io/cluster-api-machineset":"miyadav-11-rdnr9-worker", "machine.openshift.io/region":"", "machine.openshift.io/zone":""}: missing machine.openshift.io/cluster-api-cluster label.
So I would need more help to understand the reason this bug was created.
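The validation failure in the log above can be illustrated with a short Go sketch. This is an assumption about the shape of such a check, not the controller's real code: `validateLabels` and `clusterLabel` are hypothetical names, and only the label key and the wording of the message are taken from the log line.

```go
package main

import (
	"fmt"
)

// clusterLabel is the label key named in the validation error above.
const clusterLabel = "machine.openshift.io/cluster-api-cluster"

// validateLabels is a hypothetical helper that rejects a Machine whose
// labels do not include the cluster label.
func validateLabels(labels map[string]string) error {
	if _, ok := labels[clusterLabel]; !ok {
		return fmt.Errorf("spec.labels: missing %s label", clusterLabel)
	}
	return nil
}

func main() {
	// Labels copied from the manually created machine in the log,
	// which lacks the cluster label.
	labels := map[string]string{
		"machine.openshift.io/cluster-api-machine-role": "worker",
		"machine.openshift.io/cluster-api-machine-type": "worker",
		"machine.openshift.io/cluster-api-machineset":   "miyadav-11-rdnr9-worker",
	}
	if err := validateLabels(labels); err != nil {
		fmt.Println("machine validation failed:", err)
	}
}
```

A check of this kind rejects the Machine before any provider call is made, which would explain why no create or clone activity appears in the logs for that machine.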
This will be easier to validate once BZ#1833256 is merged. Perhaps we could add that as a dependency of this and hold off verifying this for now. If that BZ can be verified, then this one is also working.

Making this depend on BZ#1833256 as we need it before we can verify this.

[miyadav@miyadav bugvsphere]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-05-26-051016   True        False         37m     Cluster version is 4.5.0-0.nightly-2020-05-26-051016

Validated both scenarios as mentioned above; machines never got stuck in the Provisioning phase.

After checking with Joel: the invalid configuration is reproduced by the dependent bug mentioned in this one, https://bugzilla.redhat.com/show_bug.cgi?id=1833256. With that bug's machine going to Failed with a valid "invalid configuration", this bug is validated as well, since the invalid configurations used in this bug to reproduce are not something that would be created during installation or on a running cluster.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409 |