Description of problem:
When updating a worker machine, the machine-controller logs "master inplace update failed: not support master in place update now", even though the machine is not a master. The machine event message is "Updated machine zhsun-8p6x5-worker-aaa%!(EXTRA string=Update)", which suggests that the format string and its arguments are mismatched.

Version-Release number of selected component (if applicable):
$ ./oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-09-11-202233   True        False         64m     Cluster version is 4.2.0-0.nightly-2019-09-11-202233

How reproducible:
Always

Steps to Reproduce:
1. Update a machine's Flavor to "ci.m1.xlarge-invalid"
2. Check machine-controller logs
3. Check machine event

Actual results:
$ ./oc logs -f machine-api-controllers-56756968cf-tjf9r -c machine-controller
I0912 02:07:15.491373 1 controller.go:238] Reconciling machine "zhsun-8p6x5-worker-aaa" triggers idempotent update
I0912 02:15:19.342174 1 controller.go:129] Reconciling Machine "zhsun-8p6x5-worker-aaa"
I0912 02:15:19.342347 1 controller.go:298] Machine "zhsun-8p6x5-worker-aaa" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0912 02:15:19.896148 1 controller.go:238] Reconciling machine "zhsun-8p6x5-worker-aaa" triggers idempotent update
E0912 02:15:19.896390 1 actuator.go:368] master inplace update failed: not support master in place update now

$ ./oc describe machine zhsun-8p6x5-worker-aaa
Events:
  Type    Reason   Age               From                  Message
  ----    ------   ----              ----                  -------
  Normal  Created  22m               openstack_controller  Created Machine zhsun-8p6x5-worker-aaa
  Normal  Updated  8s (x2 over 10m)  openstack_controller  Updated machine zhsun-8p6x5-worker-aaa%!(EXTRA string=Update)

Expected results:
Correct log and event messages are output.

Additional info:
We decided to move this to 4.3, since we can't fix the bug properly without the machine-healthcheck-controller enabled. Related to: https://bugzilla.redhat.com/show_bug.cgi?id=1746369
Description:
When updating a worker machine with an invalid flavor, the machine gets deleted.

Version:
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-02-03-005212   True        False         3h48m   Cluster version is 4.4.0-0.nightly-2020-02-03-005212

How reproducible:
Always

Steps to Reproduce:
1. Update a machine's Flavor to "ci.m1.xlarge-invalid"
2. Check machine-controller logs
3. Check machine event

Actual results:
Phase: Failed
Events:
  Type    Reason   Age    From                  Message
  ----    ------   ----   ----                  -------
  Normal  Created  39m    openstack_controller  Created Machine miyadav-4pzrg-worker-4d254
  Normal  Deleted  5m28s  openstack_controller  Deleted machine miyadav-4pzrg-worker-4d254

The machine is also deleted from the OpenStack cloud (confirmed in the UI as well).

Expected:
Only an update event should be triggered; the machine should not be deleted.
I am not sure I understand what the expected behaviour is. Are you referring to the error message inconsistency (i.e. "it is an update event, the error message should not mention 'Delete'")? Or do you think that the actual behaviour is wrong (i.e. "the machine should not be deleted if the new flavor is invalid")?
Both issues need to be resolved: 'describe machine' should report that the update failed, and the machine should not be deleted.
I have addressed "the machine should not be deleted if the new flavor is invalid". Please report the logging issue as a separate bug, so that we can analyse and prioritise separately. Thanks!
Tested on 4.4.0-0.nightly-2020-02-10-013941.

Steps:
1. Update a worker machine's providerSpec to an invalid flavor, for example "ci.m1.xlarge.invalid".
2. Monitor the machine-controller. The update failed, but the machine went into the 'Failed' phase.
3. Correct the invalid flavor. The machine does not come back, because a machine in the 'Failed' phase is not reconciled.

According to https://github.com/openshift/enhancements/blob/master/enhancements/machine-api/machine-instance-lifecycle.md#failed, the machine is not expected to be 'Failed' in this scenario.

machine-controller log after step 2:
```
I0210 06:49:16.361944 1 controller.go:284] Reconciling machine "miyadav-1002-vwppm-worker-b6kwn" triggers idempotent update
I0210 06:57:35.763393 1 controller.go:164] Reconciling Machine "miyadav-1002-vwppm-worker-9pt2f"
I0210 06:57:35.763592 1 controller.go:376] Machine "miyadav-1002-vwppm-worker-9pt2f" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0210 06:57:35.776324 1 machineservice.go:229] Cloud provider CA cert not provided, using system trust bundle
I0210 06:57:36.334016 1 controller.go:284] Reconciling machine "miyadav-1002-vwppm-worker-9pt2f" triggers idempotent update
I0210 06:57:36.334230 1 actuator.go:373] re-creating machine miyadav-1002-vwppm-worker-9pt2f for update.
I0210 06:57:36.346211 1 machineservice.go:229] Cloud provider CA cert not provided, using system trust bundle
I0210 06:57:36.399681 1 machineservice.go:229] Cloud provider CA cert not provided, using system trust bundle
I0210 06:57:36.769712 1 actuator.go:146] Skipped creating a VM that already exists.
I0210 06:57:36.777235 1 machineservice.go:229] Cloud provider CA cert not provided, using system trust bundle
I0210 06:57:36.840589 1 machineservice.go:229] Cloud provider CA cert not provided, using system trust bundle
E0210 06:58:00.224892 1 actuator.go:382] delete machine miyadav-1002-vwppm-worker-9pt2f for update failed: unable to update machine status: Operation cannot be fulfilled on machines.machine.openshift.io "miyadav-1002-vwppm-worker-9pt2f": the object has been modified; please apply your changes to the latest version and try again
E0210 06:58:00.225227 1 controller.go:286] Error updating machine "openshift-machine-api/miyadav-1002-vwppm-worker-9pt2f": Cannot delete machine miyadav-1002-vwppm-worker-9pt2f: unable to update machine status: Operation cannot be fulfilled on machines.machine.openshift.io "miyadav-1002-vwppm-worker-9pt2f": the object has been modified; please apply your changes to the latest version and try again
I0210 06:58:01.225786 1 controller.go:164] Reconciling Machine "miyadav-1002-vwppm-worker-9pt2f"
I0210 06:58:01.226038 1 controller.go:376] Machine "miyadav-1002-vwppm-worker-9pt2f" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0210 06:58:01.240726 1 machineservice.go:229] Cloud provider CA cert not provided, using system trust bundle
I0210 06:58:06.590669 1 controller.go:284] Reconciling machine "miyadav-1002-vwppm-worker-9pt2f" triggers idempotent update
I0210 06:58:06.591051 1 actuator.go:373] re-creating machine miyadav-1002-vwppm-worker-9pt2f for update.
I0210 06:58:06.605211 1 machineservice.go:229] Cloud provider CA cert not provided, using system trust bundle
I0210 06:58:06.651868 1 machineservice.go:229] Cloud provider CA cert not provided, using system trust bundle
I0210 06:58:11.898942 1 actuator.go:146] Skipped creating a VM that already exists.
I0210 06:58:11.906541 1 machineservice.go:229] Cloud provider CA cert not provided, using system trust bundle
I0210 06:58:11.958761 1 machineservice.go:229] Cloud provider CA cert not provided, using system trust bundle
I0210 06:58:12.974534 1 machineservice.go:229] Cloud provider CA cert not provided, using system trust bundle
I0210 06:58:23.293494 1 machineservice.go:229] Cloud provider CA cert not provided, using system trust bundle
I0210 06:58:23.518163 1 actuator.go:398] Successfully updated machine miyadav-1002-vwppm-worker-9pt2f
I0210 06:58:23.518285 1 controller.go:164] Reconciling Machine "miyadav-1002-vwppm-worker-9pt2f"
I0210 06:58:23.518300 1 controller.go:376] Machine "miyadav-1002-vwppm-worker-9pt2f" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0210 06:58:23.528024 1 machineservice.go:229] Cloud provider CA cert not provided, using system trust bundle
I0210 06:58:23.770735 1 controller.go:428] Machine "miyadav-1002-vwppm-worker-9pt2f" going into phase "Failed"
I0210 06:58:23.783438 1 controller.go:164] Reconciling Machine "miyadav-1002-vwppm-worker-9pt2f"
I0210 06:58:23.783462 1 controller.go:376] Machine "miyadav-1002-vwppm-worker-9pt2f" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
W0210 06:58:23.783473 1 controller.go:273] Machine "miyadav-1002-vwppm-worker-9pt2f" has gone "Failed" phase. It won't reconcile
```
Milind, Jianwei,

This BZ originally reported two problems:
- When updating a worker machine, the machine-controller logs "master inplace update failed: not support master in place update now".
- The machine event message is "Updated machine zhsun-8p6x5-worker-aaa%!(EXTRA string=Update)", which suggests that the format string and its arguments are mismatched.

Both issues were addressed in the linked GitHub pull requests. Please report any outstanding issues as separate BZs, so that we can properly track them. I understand that from a user perspective they are closely linked; however, in code they have to be addressed separately. Thank you!
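For reference, the `%!(EXTRA string=Update)` suffix is what Go's `fmt` package appends when a call receives more arguments than its format string has verbs. The sketch below reproduces the reported event text with `fmt.Sprintf`. It assumes the message was built from a template like `"Updated machine %s"` plus an extra `"Update"` argument, since the actual call is not shown in this report. The names `updatedTemplate`, `buggyEventMessage` and `fixedEventMessage` are hypothetical.

```go
package main

import "fmt"

// updatedTemplate is a hypothetical message template with a single %s verb.
// It is a variable rather than a constant so that vet's printf check does
// not flag the deliberately mismatched call below.
var updatedTemplate = "Updated machine %s"

// buggyEventMessage passes two arguments for one verb, so fmt appends
// "%!(EXTRA string=Update)" for the unused argument.
func buggyEventMessage(machineName string) string {
	return fmt.Sprintf(updatedTemplate, machineName, "Update")
}

// fixedEventMessage passes exactly one argument per verb.
func fixedEventMessage(machineName string) string {
	return fmt.Sprintf(updatedTemplate, machineName)
}

func main() {
	fmt.Println(buggyEventMessage("zhsun-8p6x5-worker-aaa"))
	fmt.Println(fixedEventMessage("zhsun-8p6x5-worker-aaa"))
}
```

With a constant format string passed directly to `fmt`, `go vet`'s printf check reports this kind of arity mismatch before the code ships.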
Thanks Pierre. I think the PRs fix this bug, so I'll move it to VERIFIED. The issue of the machine going into the 'Failed' phase will be tracked in bug 1820421.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581