Description of problem:
With 2 Windows nodes created by one machineset, replacing the private key deletes both Windows nodes immediately and then recreates them. It does not follow the maxUnhealthyCount rule, which causes a service outage.

$ oc logs -f deployment.apps/windows-machine-config-operator -n openshift-windows-machine-config-operator
...
{"level":"info","ts":1634569171.675713,"logger":"controller.windowsmachine","msg":"deleting machine","machine":"openshift-machine-api/winworker-hwgqc"}
{"level":"info","ts":1634569171.685051,"logger":"controller.secret","msg":"updating secret","secret":"openshift-windows-machine-config-operator/cloud-private-key","name":"windows-user-data"}
{"level":"info","ts":1634569171.7818367,"logger":"controller.windowsmachine","msg":"unhealthy machine count for machineset","name":"winworker","total":2,"unhealthy":0}
{"level":"info","ts":1634569171.7961621,"logger":"controller.windowsmachine","msg":"machine has been remediated by deletion","name":"winworker-hwgqc"}
{"level":"info","ts":1634569171.7965193,"logger":"controller.windowsmachine","msg":"deleting machine","machine":"openshift-machine-api/winworker-nch44"}
{"level":"info","ts":1634569171.8028684,"logger":"controller.windowsmachine","msg":"unhealthy machine count for machineset","name":"winworker","total":2,"unhealthy":0}
{"level":"info","ts":1634569171.8165264,"logger":"controller.windowsmachine","msg":"machine has been remediated by deletion","name":"winworker-nch44"}
{"level":"info","ts":1634569183.5262113,"logger":"metrics","msg":"Prometheus configured","endpoints":"windows-exporter","port":9182,"name":"metrics"}
{"level":"info","ts":1634569186.2801213,"logger":"metrics","msg":"Prometheus configured","endpoints":"windows-exporter","port":9182,"name":"metrics"}
{"level":"info","ts":1634569278.3233094,"logger":"controller.windowsmachine","msg":"processing","machine":"openshift-machine-api/winworker-r7nvc","address":"172.31.249.29"}
...
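
To check a WMCO log for this symptom, the JSON entries can be filtered for "deleting machine" events and their timestamps compared. The following is only a minimal sketch: the helper name deletion_events is hypothetical, and the sample lines are four entries copied from the log above.

```python
import json

# Four entries copied verbatim from the WMCO log above.
LOG_LINES = [
    '{"level":"info","ts":1634569171.675713,"logger":"controller.windowsmachine","msg":"deleting machine","machine":"openshift-machine-api/winworker-hwgqc"}',
    '{"level":"info","ts":1634569171.7818367,"logger":"controller.windowsmachine","msg":"unhealthy machine count for machineset","name":"winworker","total":2,"unhealthy":0}',
    '{"level":"info","ts":1634569171.7965193,"logger":"controller.windowsmachine","msg":"deleting machine","machine":"openshift-machine-api/winworker-nch44"}',
    '{"level":"info","ts":1634569171.8028684,"logger":"controller.windowsmachine","msg":"unhealthy machine count for machineset","name":"winworker","total":2,"unhealthy":0}',
]


def deletion_events(lines):
    """Return (timestamp, machine) for every 'deleting machine' entry.

    Hypothetical helper for illustration; not part of WMCO.
    """
    events = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip "..." and any other non-JSON lines
        entry = json.loads(line)
        if entry.get("msg") == "deleting machine":
            events.append((entry["ts"], entry["machine"]))
    return events


events = deletion_events(LOG_LINES)
gap = events[1][0] - events[0][0]
print(f"{len(events)} deletions, {gap:.3f}s apart")
```

Both deletions are issued within a fraction of a second, and the unhealthy count is still reported as 0 after each one.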

Version-Release number of selected component (if applicable):
OCP version: 4.9.0-0.nightly-2021-10-16-173626
WMCO version: 4.0.0+7991f6f0

How reproducible:
Always

Steps to Reproduce:
1. Scale up 2 Windows nodes with one machineset
2. Replace the private key, e.g. change openshift-qe.pem to openshift-dev.pem
3. Check the WMCO log

Actual results:
Both Windows nodes are deleted immediately and then recreated.

Expected results:
The 2 Windows nodes should be deleted one by one, following the maxUnhealthyCount rule.

Additional info:
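
The expected one-by-one behavior can be illustrated with a small model. This is a sketch under stated assumptions, not WMCO's actual code: the Machine class and request_deletion function are hypothetical, the limit of 1 is assumed from the "one by one" expectation, and a machine that is already being deleted is assumed to count as unhealthy.

```python
from dataclasses import dataclass

# Assumed limit: at most one machine of a machineset may be unavailable.
MAX_UNHEALTHY_COUNT = 1


@dataclass
class Machine:
    """Hypothetical stand-in for a Machine object in a machineset."""
    name: str
    healthy: bool = True
    deleting: bool = False


def request_deletion(machines, name):
    """Mark the named machine for deletion only if the limit allows it.

    Machines that are unhealthy or already being deleted count against
    the limit, so a second request is refused until the first machine
    has been replaced by a healthy one.
    """
    target = next(m for m in machines if m.name == name)
    others_unavailable = sum(
        1 for m in machines
        if m is not target and (m.deleting or not m.healthy)
    )
    if others_unavailable >= MAX_UNHEALTHY_COUNT:
        return False  # caller should retry later
    target.deleting = True
    return True


machines = [Machine("winworker-hwgqc"), Machine("winworker-nch44")]
first = request_deletion(machines, "winworker-hwgqc")
second = request_deletion(machines, "winworker-nch44")

# Once the first machine has been replaced by a healthy one, retry.
machines[0] = Machine("winworker-r7nvc")
third = request_deletion(machines, "winworker-nch44")
print(first, second, third)
```

In this model the second deletion is refused while the first machine is still being replaced, which is the ordering the expected results describe. In the log above, by contrast, the count reads "unhealthy":0 after the first deletion was already issued.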
Marking as VERIFIED so that the release-4.9 PR can merge; will revert afterwards.
Since this bug has been verified on OCP 4.9 (Bug 2017822), marking this bug as VERIFIED, thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Windows Container Support for Red Hat OpenShift 5.0.0 [security update]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0577