Bug 2015772 - Replacing private key reconcile 2 Windows nodes in parallel
Summary: Replacing private key reconcile 2 Windows nodes in parallel
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Windows Containers
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Mansi Kulkarni
QA Contact: Ronnie Rasouli
URL:
Whiteboard:
Depends On:
Blocks: 2017822
Reported: 2021-10-20 05:16 UTC by gaoshang
Modified: 2022-03-28 09:36 UTC
CC List: 2 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2017822
Environment:
Last Closed: 2022-03-28 09:36:28 UTC
Target Upstream Version:
Embargoed:




Links
- GitHub openshift/windows-machine-config-operator pull 761 (status: open): "Bug 2015772: [wc_controller] Fix upgrading nodes one at a time", last updated 2021-10-26 19:42:20 UTC
- Red Hat Product Errata RHSA-2022:0577, last updated 2022-03-28 09:36:45 UTC

Description gaoshang 2021-10-20 05:16:57 UTC
Description of problem: With two Windows nodes created by one machineset, replacing the private key deletes both Windows nodes at the beginning and only then recreates them. This does not follow the maxUnhealthyCount rule and will cause a service outage.

$ oc logs -f deployment.apps/windows-machine-config-operator -n openshift-windows-machine-config-operator
...
{"level":"info","ts":1634569171.675713,"logger":"controller.windowsmachine","msg":"deleting machine","machine":"openshift-machine-api/winworker-hwgqc"}
{"level":"info","ts":1634569171.685051,"logger":"controller.secret","msg":"updating secret","secret":"openshift-windows-machine-config-operator/cloud-private-key","name":"windows-user-data"}
{"level":"info","ts":1634569171.7818367,"logger":"controller.windowsmachine","msg":"unhealthy machine count for machineset","name":"winworker","total":2,"unhealthy":0}
{"level":"info","ts":1634569171.7961621,"logger":"controller.windowsmachine","msg":"machine has been remediated by deletion","name":"winworker-hwgqc"}
{"level":"info","ts":1634569171.7965193,"logger":"controller.windowsmachine","msg":"deleting machine","machine":"openshift-machine-api/winworker-nch44"}
{"level":"info","ts":1634569171.8028684,"logger":"controller.windowsmachine","msg":"unhealthy machine count for machineset","name":"winworker","total":2,"unhealthy":0}
{"level":"info","ts":1634569171.8165264,"logger":"controller.windowsmachine","msg":"machine has been remediated by deletion","name":"winworker-nch44"}
{"level":"info","ts":1634569183.5262113,"logger":"metrics","msg":"Prometheus configured","endpoints":"windows-exporter","port":9182,"name":"metrics"}
{"level":"info","ts":1634569186.2801213,"logger":"metrics","msg":"Prometheus configured","endpoints":"windows-exporter","port":9182,"name":"metrics"}
{"level":"info","ts":1634569278.3233094,"logger":"controller.windowsmachine","msg":"processing","machine":"openshift-machine-api/winworker-r7nvc","address":"172.31.249.29"}
...

Version-Release number of selected component (if applicable):
OCP version: 4.9.0-0.nightly-2021-10-16-173626
WMCO version: 4.0.0+7991f6f0

How reproducible:
Always

Steps to Reproduce:
1. Scale up one machineset to 2 Windows nodes.
2. Replace the private key, e.g. change openshift-qe.pem to openshift-dev.pem.
3. Check the WMCO log.
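Step 3 can be done mechanically by pulling the "deleting machine" events out of the WMCO log. The Go program below is a minimal sketch and is not part of WMCO: it assumes the one-JSON-object-per-line format shown in the excerpt above and reads only the `ts`, `logger`, `msg` and `machine` fields seen there. The helper names `deletionEvents`, `deletion` and `logEntry` are hypothetical. It reads log text on stdin and prints the time between consecutive deletions.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

// logEntry holds the subset of WMCO's JSON log fields used here
// (field names taken from the log excerpt in this report).
type logEntry struct {
	TS      float64 `json:"ts"`
	Logger  string  `json:"logger"`
	Msg     string  `json:"msg"`
	Machine string  `json:"machine"`
}

// deletion is one "deleting machine" event (hypothetical helper type).
type deletion struct {
	TS      float64
	Machine string
}

// deletionEvents returns every "deleting machine" event in the log text.
// Lines that are not valid JSON (e.g. "..." elisions) are skipped.
func deletionEvents(logText string) []deletion {
	var out []deletion
	sc := bufio.NewScanner(strings.NewReader(logText))
	for sc.Scan() {
		var e logEntry
		if err := json.Unmarshal([]byte(sc.Text()), &e); err != nil {
			continue // not a JSON log line
		}
		if e.Logger == "controller.windowsmachine" && e.Msg == "deleting machine" {
			out = append(out, deletion{TS: e.TS, Machine: e.Machine})
		}
	}
	return out
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	events := deletionEvents(string(data))
	for i, d := range events {
		if i == 0 {
			fmt.Printf("%s: first deletion\n", d.Machine)
			continue
		}
		// A gap of well under a second means the previous node had no
		// time to be replaced before this one was removed.
		fmt.Printf("%s: %.3fs after previous deletion\n", d.Machine, d.TS-events[i-1].TS)
	}
}
```

For example, pipe the output of `oc logs deployment.apps/windows-machine-config-operator -n openshift-windows-machine-config-operator` into `go run`. On the excerpt in this report, the second deletion comes about 0.12 s after the first, well before the first `processing` entry for a new machine.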

Actual results:
Both Windows nodes are deleted at the same time at the beginning and then recreated.

Expected results:
The two Windows nodes should be deleted one at a time, following the maxUnhealthyCount rule.
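The expected behaviour amounts to a simple gate: a machine is deleted only while the number of unhealthy machines in the machineset is below the limit, and a machine chosen for deletion counts as unhealthy straight away. The log above reports `"unhealthy":0` both before and after `winworker-hwgqc` was deleted, which is consistent with the second deletion not seeing the first. The Go sketch below illustrates that gate under stated assumptions: the `machine` type, `selectForDeletion`, and a limit of 1 are hypothetical simplifications, not WMCO's actual code or the change in pull 761.

```go
package main

import "fmt"

// machine is a hypothetical, simplified view of a Windows Machine.
type machine struct {
	Name           string
	NeedsReplacing bool // e.g. configured with the old private key
	Unhealthy      bool // not Ready, or already being deleted
}

// selectForDeletion returns the machines that may be deleted in this
// reconcile pass without exceeding maxUnhealthy. Each machine selected
// is counted as unhealthy immediately, so later machines in the same
// pass see the updated count.
func selectForDeletion(machines []machine, maxUnhealthy int) []string {
	unhealthy := 0
	for _, m := range machines {
		if m.Unhealthy {
			unhealthy++
		}
	}
	var toDelete []string
	for _, m := range machines {
		if !m.NeedsReplacing || m.Unhealthy {
			continue // nothing to do, or already on its way out
		}
		if unhealthy >= maxUnhealthy {
			break // wait for a later pass once a replacement is healthy
		}
		toDelete = append(toDelete, m.Name)
		unhealthy++ // the selected machine is unhealthy from now on
	}
	return toDelete
}

func main() {
	ms := []machine{
		{Name: "winworker-hwgqc", NeedsReplacing: true},
		{Name: "winworker-nch44", NeedsReplacing: true},
	}
	// With an assumed limit of 1, only the first machine is selected.
	fmt.Println(selectForDeletion(ms, 1)) // [winworker-hwgqc]
}
```

Counting the selected machine as unhealthy at the moment it is chosen, rather than relying on state that has not caught up yet, is what stops a single pass from deleting both nodes; the second machine is picked up in a later pass once the first replacement is healthy.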

Additional info:

Comment 3 Mansi Kulkarni 2021-10-27 13:31:25 UTC
Marking VERIFIED so the release-4.9 PR can merge; will revert afterwards.

Comment 4 gaoshang 2021-10-28 02:27:08 UTC
Since this bug has been verified on OCP 4.9 (Bug 2017822), marking this bug as VERIFIED, thanks.

Comment 7 errata-xmlrpc 2022-03-28 09:36:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Windows Container Support for Red Hat OpenShift 5.0.0 [security update]), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0577

