Bug 1889589 - Please highlight and warn of the automated reboots triggered by MCO [NEEDINFO]
Keywords:
Status: ON_QA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Documentation
Version: 4.5
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.3.z
Assignee: Michael Burke
QA Contact: Micah Abbott
Docs Contact: Vikram Goyal
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-10-20 06:32 UTC by yaoli
Modified: 2020-11-13 21:27 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
mburke: needinfo? (yaoli)



Description yaoli 2020-10-20 06:32:48 UTC
Document URL: 

https://docs.openshift.com/container-platform/4.5/architecture/control-plane.html#understanding-machine-config-operator_control-plane

Section Number and Name: 

Understanding the Machine Config Operator

Describe the issue: 

By design, MCO configuration changes in 4.x cause node reboots. We understand that this is the current design and that, if nodes are deployed with HA, it should cause no service downtime. However, any automated reboot is considered high-impact in China. Our request: can we add a prominent, bold warning to the documentation about the automatic reboots, so that customers are aware of them and can prepare accordingly?

To share more background: many customers in China do not know about the automatic reboots triggered by MCO, so they have sometimes been caught by surprise when nodes rebooted. This has caused concern among many customers in China. We therefore need a warning that customers can easily notice.

This request applies to all 4.x versions.

Thanks

Comment 2 Michael Burke 2020-10-28 12:22:50 UTC
Yao --

You filed a similar BZ [1] which is being addressed by another writer. Does this warning cover your concerns here? [2]

"To prevent control plane nodes from autorebooting after machine config changes are applied, you must pause the autoreboot process by setting the `spec.paused` field to `true` in the machine pool config."

Michael


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1886712

[2] https://github.com/openshift/openshift-docs/pull/26829/files#diff-0301ee03744fc7c0a41c613967fe9a86f19a8089bbc40e7a4ed15b17bd35abbeR42-R45

Comment 3 Michael Burke 2020-11-05 18:24:06 UTC
Yao --

I added the first sentence to the note I mentioned in comment 2:

[IMPORTANT]
====
Changes to the machine configuration cause automatic nodes reboots.

To prevent control plane nodes from autorebooting after machine config changes are applied, you must pause the autoreboot process by setting the `spec.paused` field to `true` in the machine pool config.
====

Please take a look. Thank you.

https://github.com/openshift/openshift-docs/pull/27118

Comment 4 Judith Zhu 2020-11-11 04:48:12 UTC
Hi Michael,

This is Judith. Thanks a lot for your support on this bug. 

We see that the product documentation has been updated (https://docs.openshift.com/container-platform/4.6/architecture/control-plane.html#understanding-machine-config-operator_control-plane). However, the sentence "Changes to the machine configuration cause automatic nodes reboots." is missing. Could you please include it as well? It is also important.

Regards,
Judith

Comment 5 Judith Zhu 2020-11-11 06:45:24 UTC
Also, after "you must pause the autoreboot process by setting the `spec.paused` field to `true` in the machine pool config.", can we add something like: "Please refer to the section 'Disabling Machine Config Operator from automatically rebooting' (https://docs.openshift.com/container-platform/4.6/support/troubleshooting/troubleshooting-operator-issues.html#troubleshooting-disabling-autoreboot-mco_troubleshooting-operator-issues)"? It would make it easier for customers to understand how to pause and unpause.
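
For context, pausing and unpausing come down to toggling the `spec.paused` field on a machine config pool. The following is a minimal Python sketch, not the documented procedure itself: it only builds the JSON merge patch and an `oc patch` command line for a pool. The helper names `pause_patch` and `oc_patch_command` are hypothetical, the pool name `master` is assumed to be the default control plane pool, and the exact command form is an assumption that should be checked against the referenced section for your version.

```python
import json


def pause_patch(paused):
    """Return a JSON merge patch that sets spec.paused (hypothetical helper)."""
    # Compact separators keep the patch on one line without extra spaces.
    return json.dumps({"spec": {"paused": paused}}, separators=(",", ":"))


def oc_patch_command(pool, paused):
    """Build the argv for an `oc patch` call against one pool (hypothetical helper).

    Assumption: the pool is addressed as machineconfigpool/<name>.
    """
    return [
        "oc",
        "patch",
        "--type=merge",
        "--patch=" + pause_patch(paused),
        "machineconfigpool/" + pool,
    ]
```

Calling `oc_patch_command("master", True)` yields the arguments to pause the pool, and the same call with `False` yields the arguments to unpause it; the list can be passed to `subprocess.run` on a host where `oc` is logged in to the cluster.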

Comment 6 Michael Burke 2020-11-11 16:21:50 UTC
Judith --

I added a reference to the "Disabling Machine Config Operator from automatically rebooting" section (we are not allowed to add links to the docs at the level where the note is). However, this section was added in 4.5 and does not appear in 4.4; also, the `spec.paused` field is not mentioned in the 4.4 docs. 

Would it be acceptable to make these changes in 4.5+?

Michael

https://github.com/openshift/openshift-docs/pull/27118/files

Comment 7 Judith Zhu 2020-11-12 10:37:02 UTC
Hi Michael, please go ahead. Thank you for your quick action!

Comment 8 Micah Abbott 2020-11-13 21:27:12 UTC
I commented on the PR and tagged some MCO folks for additional input.

