Bug 1956439 - The cluster update documentation does not describe checking the node status.
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Documentation
Version: 4.6
Hardware: x86_64
OS: Linux
Target Milestone: ---
Assignee: Cody Hoag
QA Contact: liujia
Docs Contact: Vikram Goyal
Depends On:
Reported: 2021-05-03 17:15 UTC by Vinu K
Modified: 2021-05-06 13:42 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2021-05-06 13:42:39 UTC
Target Upstream Version:


Description Vinu K 2021-05-03 17:15:06 UTC
Document URL: https://docs.openshift.com/container-platform/4.6/updating/updating-cluster-cli.html#update-upgrading-cli_updating-cluster-cli

Section Number and Name: 7

Describe the issue: The procedure does not describe checking the nodes' status after running 'oc get clusterversion', which shows only the control plane's update progress.

Suggestions for improvement: Add an eighth procedure step to check each node's status and version.

Additional information: https://coreos.slack.com/archives/C2ZA5QGMV/p1620053053198000

Comment 1 Cody Hoag 2021-05-03 18:52:08 UTC
@Trevor can you confirm that an additional step for verifying node status during a cluster upgrade (oc get nodes) is useful on top of the current recommendation to check the cluster version (oc get clusterversion)?



Comment 2 W. Trevor King 2021-05-03 20:53:38 UTC
Querying nodes (or MachineConfigPools, for machine-config-managed nodes) might be useful in some cases (e.g. you want to deploy a workload that requires your compute to all be v1.2.3 or greater).  I'm fuzzy on how bring-your-own RHEL and such fit in, but for MachineConfigPools, the machine-config operator will complain about them if they get stuck.  So if folks are blocking some action on "wait until $POOL reaches $VERSION", then yeah, some kind of polling/waiting command suggestion seems useful.  But for folks who are not blocking an action, it seems easier to wait and react to alert push-notifications.  Maybe ask the machine-config folks if they have opinions?
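
For readers who do need to block on a pool, one possible shape for such a waiting command is sketched below. It is only a sketch: the `wait_for_pool` function name, the default pool name `worker`, and the default timeout are assumptions made for illustration. It relies on `oc wait` against the pool's `Updated` condition, which signals that the pool has reached its current target configuration rather than a specific version string, so any version comparison would still be a separate check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until a MachineConfigPool reports the
# Updated condition, or until the timeout expires.
# Arguments (both optional, both assumptions for this sketch):
#   $1  pool name, defaults to "worker"
#   $2  timeout in `oc wait` duration syntax, defaults to "30m"
wait_for_pool() {
  local pool="${1:-worker}"
  local timeout="${2:-30m}"
  # `oc wait` exits non-zero if the condition is not met in time.
  oc wait "machineconfigpool/${pool}" \
    --for=condition=Updated \
    --timeout="${timeout}"
}

# Example (run manually against a cluster):
#   wait_for_pool worker 45m && echo "worker pool is updated"
```

A script that only needs notification, not blocking, would skip this and rely on alerts instead.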

Comment 3 Cody Hoag 2021-05-04 15:08:25 UTC
@Jerry do you have any opinions on the question I posed in comment #1, or suggestions for use cases where checking nodes would be helpful beyond the generic 'oc get clusterversion' upgrade confirmation? Thanks!

Comment 4 Yu Qi Zhang 2021-05-05 00:16:40 UTC
So as of 4.8, the MCO team has changed that a bit so that a degraded worker pool will now block upgrade completion; in that sense "oc get clusterversion" will also report if workers fail. In the future the worker pool may also be considered required for upgrade, though that is still in discussion.

With that in mind, I'm OK either way. For older versions, `oc get nodes` will serve as a double check that the nodes have fully completed the update, so maybe it's worth having there just in case.
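
As an illustration of that double check, the sketch below wraps `oc get nodes` in a small filter. The `nodes_not_updated` function name and the expected-version argument are hypothetical, and the parsing assumes the default column order of `oc get nodes` (NAME, STATUS, ROLES, AGE, VERSION). It lists any node that is not plainly `Ready` (for example, one still cordoned as `Ready,SchedulingDisabled`) or whose reported version differs from the one passed in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print nodes that are not Ready or whose reported
# version differs from the expected one. Exits non-zero if any are found.
# Assumes the default `oc get nodes` columns:
#   NAME STATUS ROLES AGE VERSION
nodes_not_updated() {
  local expected="$1"
  oc get nodes --no-headers | awk -v want="$expected" '
    # Flag nodes that are cordoned, NotReady, or on another version.
    $2 != "Ready" || $5 != want { print $1, $2, $5; bad = 1 }
    END { exit bad + 0 }
  '
}

# Example (run manually; the version value is a placeholder you supply):
#   nodes_not_updated "$EXPECTED_NODE_VERSION" || echo "some nodes are not done"
```

No output and a zero exit status would suggest that every node is Ready and reports the expected version.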

Comment 5 Cody Hoag 2021-05-05 18:36:39 UTC
@Jia can you confirm this doc change? https://github.com/openshift/openshift-docs/pull/32232. Thanks!

Comment 6 liujia 2021-05-06 02:21:50 UTC
Yes, the PR LGTM.

Comment 7 Cody Hoag 2021-05-06 13:15:30 UTC
This has been merged. I'll provide the live links when they're available.
