Bug 1889589

Summary: Please highlight and warn of the automated reboots triggered by MCO
Product: OpenShift Container Platform
Reporter: yaoli
Component: Documentation
Assignee: Michael Burke <mburke>
Status: CLOSED CURRENTRELEASE
QA Contact: Micah Abbott <miabbott>
Severity: high
Docs Contact: Vikram Goyal <vigoyal>
Priority: high
Version: 4.5
CC: aos-bugs, jizhu, jokerman, miabbott
Target Milestone: ---
Target Release: 4.3.z
Hardware: x86_64
OS: Linux
Doc Type: If docs needed, set a value
Last Closed: 2021-01-05 18:35:30 UTC
Type: Bug

Description yaoli 2020-10-20 06:32:48 UTC
Document URL: 

https://docs.openshift.com/container-platform/4.5/architecture/control-plane.html#understanding-machine-config-operator_control-plane

Section Number and Name: 

Understanding the Machine Config Operator

Describe the issue: 

By design of the MCO in 4.x, configuration changes cause node reboots. We understand that this is the current design and that, if nodes are deployed in a high-availability (HA) configuration, the reboots should cause no service downtime. However, any automated reboot is considered a major impact in China. So our request is: can we add a prominent, bold warning to the documentation about the automatic reboots, so that customers are aware of them and can better prepare?

To share more background: many customers in China do not know about the automatic reboots triggered by the MCO, so they are sometimes caught off guard when the reboots happen. This has caused concern among many customers in China, so we need a warning that is easier for customers to notice.

This request is targeted to all 4.x versions.

Thanks

Comment 2 Michael Burke 2020-10-28 12:22:50 UTC
Yao --

You filed a similar BZ [1], which is being addressed by another writer. Does this warning cover your concerns here? [2]

"To prevent control plane nodes from autorebooting after machine config changes are applied, you must pause the autoreboot process by setting the `spec.paused` field to `true` in the machine pool config."

Michael


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1886712

[2] https://github.com/openshift/openshift-docs/pull/26829/files#diff-0301ee03744fc7c0a41c613967fe9a86f19a8089bbc40e7a4ed15b17bd35abbeR42-R45

Comment 3 Michael Burke 2020-11-05 18:24:06 UTC
Yao --

I added the first sentence to the note I mentioned in comment 2:

[IMPORTANT]
====
Changes to the machine configuration cause automatic nodes reboots.

To prevent control plane nodes from autorebooting after machine config changes are applied, you must pause the autoreboot process by setting the `spec.paused` field to `true` in the machine pool config.
====
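A minimal sketch, assuming Python 3.8+ and the `oc` CLI, of what "setting the `spec.paused` field to `true`" looks like in practice: a JSON merge patch against a MachineConfigPool. The helper name `mcp_pause_command` is hypothetical and only builds the command line; it does not run it. The pool name `master` is the default control plane pool; `worker` is the default compute pool.

```python
import json
import shlex
from typing import List


def mcp_pause_command(pool: str, paused: bool = True) -> List[str]:
    """Build an `oc patch` argv that sets spec.paused on a MachineConfigPool.

    Hypothetical helper for illustration. A JSON merge patch (--type=merge)
    changes only spec.paused and leaves the rest of the pool spec untouched.
    """
    # Compact JSON: {"spec":{"paused":true}} or {"spec":{"paused":false}}
    patch = json.dumps({"spec": {"paused": paused}}, separators=(",", ":"))
    return [
        "oc", "patch",
        "--type=merge",
        f"--patch={patch}",
        f"machineconfigpool/{pool}",
    ]


if __name__ == "__main__":
    # Pause the control plane pool before applying machine config changes...
    print(shlex.join(mcp_pause_command("master", True)))
    # ...then unpause it when you are ready for the nodes to update and reboot.
    print(shlex.join(mcp_pause_command("master", False)))
```

While a pool is paused, pending machine config changes are not rolled out to its nodes, so they take effect (with the associated reboots) only after the pool is unpaused.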

Please take a look. Thank you.

https://github.com/openshift/openshift-docs/pull/27118

Comment 4 Judith Zhu 2020-11-11 04:48:12 UTC
Hi Michael,

This is Judith. Thanks a lot for your support on this bug. 

We see that the product documentation has been updated (https://docs.openshift.com/container-platform/4.6/architecture/control-plane.html#understanding-machine-config-operator_control-plane). However, the sentence "Changes to the machine configuration cause automatic nodes reboots." is missing. Can you please also include it? It is also important.

Regards,
Judith

Comment 5 Judith Zhu 2020-11-11 06:45:24 UTC
Also, after "you must pause the autoreboot process by setting the `spec.paused` field to `true` in the machine pool config.", can we add something like "Refer to the 'Disabling Machine Config Operator from automatically rebooting' section (https://docs.openshift.com/container-platform/4.6/support/troubleshooting/troubleshooting-operator-issues.html#troubleshooting-disabling-autoreboot-mco_troubleshooting-operator-issues)"? That would make it easier for customers to understand how to pause and unpause the autoreboot process.

Comment 6 Michael Burke 2020-11-11 16:21:50 UTC
Judith --

I added a reference to the "Disabling Machine Config Operator from automatically rebooting" section (we are not allowed to add links to the docs at the level where the note is). However, this section was added in 4.5 and does not appear in 4.4; also, the `spec.paused` field is not mentioned in the 4.4 docs. 

Would it be acceptable to make these changes in 4.5+?

Michael

https://github.com/openshift/openshift-docs/pull/27118/files

Comment 7 Judith Zhu 2020-11-12 10:37:02 UTC
Hi Michael, Please go ahead. Thank you for your quick actions!

Comment 8 Micah Abbott 2020-11-13 21:27:12 UTC
I commented on the PR and tagged some MCO folks for additional input.

Comment 10 Red Hat Bugzilla 2023-09-15 00:49:54 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days