Bug 2015772

Summary: Replacing private key reconcile 2 Windows nodes in parallel
Product: OpenShift Container Platform Reporter: gaoshang <sgao>
Component: Windows Containers Assignee: Mansi Kulkarni <mankulka>
Status: CLOSED ERRATA QA Contact: Ronnie Rasouli <rrasouli>
Severity: high Docs Contact:
Priority: high    
Version: 4.9 CC: aos-bugs, mankulka
Target Milestone: ---   
Target Release: 4.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 2017822 Environment:
Last Closed: 2022-03-28 09:36:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2017822    

Description gaoshang 2021-10-20 05:16:57 UTC
Description of problem: With 2 Windows nodes created by one machineset, replacing the private key deletes both Windows nodes at the same time and then recreates them. It does not follow the maxUnhealthyCount rule, so the service is interrupted.

$ oc logs -f deployment.apps/windows-machine-config-operator -n openshift-windows-machine-config-operator
...
{"level":"info","ts":1634569171.675713,"logger":"controller.windowsmachine","msg":"deleting machine","machine":"openshift-machine-api/winworker-hwgqc"}
{"level":"info","ts":1634569171.685051,"logger":"controller.secret","msg":"updating secret","secret":"openshift-windows-machine-config-operator/cloud-private-key","name":"windows-user-data"}
{"level":"info","ts":1634569171.7818367,"logger":"controller.windowsmachine","msg":"unhealthy machine count for machineset","name":"winworker","total":2,"unhealthy":0}
{"level":"info","ts":1634569171.7961621,"logger":"controller.windowsmachine","msg":"machine has been remediated by deletion","name":"winworker-hwgqc"}
{"level":"info","ts":1634569171.7965193,"logger":"controller.windowsmachine","msg":"deleting machine","machine":"openshift-machine-api/winworker-nch44"}
{"level":"info","ts":1634569171.8028684,"logger":"controller.windowsmachine","msg":"unhealthy machine count for machineset","name":"winworker","total":2,"unhealthy":0}
{"level":"info","ts":1634569171.8165264,"logger":"controller.windowsmachine","msg":"machine has been remediated by deletion","name":"winworker-nch44"}
{"level":"info","ts":1634569183.5262113,"logger":"metrics","msg":"Prometheus configured","endpoints":"windows-exporter","port":9182,"name":"metrics"}
{"level":"info","ts":1634569186.2801213,"logger":"metrics","msg":"Prometheus configured","endpoints":"windows-exporter","port":9182,"name":"metrics"}
{"level":"info","ts":1634569278.3233094,"logger":"controller.windowsmachine","msg":"processing","machine":"openshift-machine-api/winworker-r7nvc","address":"172.31.249.29"}
...
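
To make the parallel deletion easier to see, here is a minimal sketch that parses three entries copied from the excerpt above and measures the time between the two "deleting machine" events. The helper names (deletion_events, gaps_between) are hypothetical and are not part of WMCO; the script only reads the JSON fields already present in the log.

```python
import json

# Three entries copied verbatim from the WMCO log excerpt above.
LOG_LINES = [
    '{"level":"info","ts":1634569171.675713,"logger":"controller.windowsmachine","msg":"deleting machine","machine":"openshift-machine-api/winworker-hwgqc"}',
    '{"level":"info","ts":1634569171.7818367,"logger":"controller.windowsmachine","msg":"unhealthy machine count for machineset","name":"winworker","total":2,"unhealthy":0}',
    '{"level":"info","ts":1634569171.7965193,"logger":"controller.windowsmachine","msg":"deleting machine","machine":"openshift-machine-api/winworker-nch44"}',
]


def deletion_events(lines):
    """Return (timestamp, machine) pairs for every "deleting machine" entry."""
    events = []
    for line in lines:
        entry = json.loads(line)
        if entry.get("msg") == "deleting machine":
            events.append((entry["ts"], entry["machine"]))
    return sorted(events)


def gaps_between(events):
    """Return the seconds elapsed between each pair of consecutive deletions."""
    return [later[0] - earlier[0] for earlier, later in zip(events, events[1:])]


if __name__ == "__main__":
    events = deletion_events(LOG_LINES)
    for ts, machine in events:
        print(f"{ts:.3f}  {machine}")
    print("gaps (s):", gaps_between(events))
```

Both deletions land well within the same second, and the count logged between them still reports "unhealthy":0, which matches the behaviour described in this report.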

Version-Release number of selected component (if applicable):
OCP version: 4.9.0-0.nightly-2021-10-16-173626
WMCO version: 4.0.0+7991f6f0

How reproducible:
Always

Steps to Reproduce:
1. Scale up 2 Windows nodes with one machineset
2. Replace the private key, e.g. change openshift-qe.pem to openshift-dev.pem
3. Check the WMCO log

Actual results:
Both Windows nodes are deleted at the beginning and then recreated

Expected results:
The 2 Windows nodes should be deleted one by one, following the maxUnhealthyCount rule
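
The expected one-by-one behaviour can be illustrated with a small sketch. This is not WMCO's implementation (the operator is written in Go, and the actual fix is not shown in this report). The Machine class, the remediate function, and the limit of 1 are assumptions made for illustration. The idea is that a machine already marked for deletion counts as unhealthy, so a second deletion is refused until a replacement is healthy.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical stand-in for a Machine object in a machineset."""
    name: str
    healthy: bool = True
    deleting: bool = False


def unhealthy_count(machines):
    """Count machines that are unhealthy or already marked for deletion."""
    return sum(1 for m in machines if m.deleting or not m.healthy)


def remediate(machines, needs_replacement, max_unhealthy=1):
    """Mark machines for deletion without reaching past max_unhealthy.

    Returns the names marked for deletion in this pass. The default of 1
    is an assumed value for the maxUnhealthyCount rule.
    """
    deleted = []
    for m in machines:
        if m.name not in needs_replacement or m.deleting:
            continue
        if unhealthy_count(machines) >= max_unhealthy:
            break  # limit reached: wait for a replacement to become healthy
        m.deleting = True
        deleted.append(m.name)
    return deleted


if __name__ == "__main__":
    nodes = [Machine("winworker-hwgqc"), Machine("winworker-nch44")]
    stale = {"winworker-hwgqc", "winworker-nch44"}
    print(remediate(nodes, stale))  # first pass
    print(remediate(nodes, stale))  # second pass, before any replacement exists
```

With this gating, the first pass marks only one machine and later passes do nothing until the deleted machine has been replaced by a healthy one, so at least one Windows node keeps serving throughout the key rotation.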

Additional info:

Comment 3 Mansi Kulkarni 2021-10-27 13:31:25 UTC
Marking as VERIFIED so that the release-4.9 PR can merge; will revert afterwards.

Comment 4 gaoshang 2021-10-28 02:27:08 UTC
Since this bug has been verified on OCP 4.9 (Bug 2017822), marking this bug as VERIFIED, thanks.

Comment 7 errata-xmlrpc 2022-03-28 09:36:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Windows Container Support for Red Hat OpenShift 5.0.0 [security update]), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0577