Bug 1956439

Summary: The cluster update documentation does not describe checking the node status.
Product: OpenShift Container Platform
Reporter: Vinu K <vkochuku>
Component: Documentation
Assignee: Cody Hoag <choag>
Status: CLOSED CURRENTRELEASE
QA Contact: liujia <jiajliu>
Severity: medium
Docs Contact: Vikram Goyal <vigoyal>
Priority: unspecified
Version: 4.6
CC: aos-bugs, choag, jerzhang, jiajliu, jokerman, wking
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Last Closed: 2021-05-06 13:42:39 UTC
Type: Bug

Description Vinu K 2021-05-03 17:15:06 UTC
Document URL: https://docs.openshift.com/container-platform/4.6/updating/updating-cluster-cli.html#update-upgrading-cli_updating-cluster-cli

Section Number and Name: 7

Describe the issue: The procedure does not describe checking node status after running the 'oc get clusterversion' command, which only shows the control plane's update progress.

Suggestions for improvement: Add an eighth procedure step to check each node's status and version.

Additional information: https://coreos.slack.com/archives/C2ZA5QGMV/p1620053053198000

Comment 1 Cody Hoag 2021-05-03 18:52:08 UTC
@Trevor can you confirm that adding an additional step for verifying the node statuses during a cluster upgrade (oc get nodes) is useful on top of the current recommendation of checking the cluster version (oc get clusterversion)?

https://docs.openshift.com/container-platform/4.6/updating/updating-cluster-cli.html#update-upgrading-cli_updating-cluster-cli

Thanks!

Comment 2 W. Trevor King 2021-05-03 20:53:38 UTC
Querying nodes (or MachineConfigPools, for machine-config-managed nodes) might be useful in some cases (e.g. you want to deploy a workload that requires your compute to all be v1.2.3 or greater).  I'm fuzzy on how bring-your-own RHEL and such fit in, but for MachineConfigPools, the machine-config operator will complain about them if they get stuck.  So if folks are blocking some action on "wait until $POOL reaches $VERSION", then yeah, some kind of polling/waiting command suggestion seems useful.  But for folks who are not blocking an action, it seems easier to wait and react to alert push-notifications.  Maybe ask the machine-config folks if they have opinions?
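
For the "wait until $POOL is updated" case, a rough polling sketch in Python might look like the following. It is only an illustration, not something from the docs or the PR: the helper names (pool_is_updated, wait_for_pool) and the default timeout and interval are made up here, and it assumes the pool JSON from `oc get machineconfigpool <name> -o json` carries the usual `Updated` and `Degraded` conditions plus `machineCount` and `updatedMachineCount` under `status`.

```python
import json
import subprocess
import time


def pool_is_updated(pool: dict) -> bool:
    """Return True when a MachineConfigPool object reports itself fully updated.

    Assumes status.conditions uses the types "Updated" and "Degraded" and that
    status.machineCount / status.updatedMachineCount are present.
    """
    status = pool.get("status", {})
    conditions = {c["type"]: c["status"] for c in status.get("conditions", [])}
    # Missing counts deliberately compare unequal, so an empty status is "not updated".
    all_machines_updated = (
        status.get("machineCount", 0) == status.get("updatedMachineCount", -1)
    )
    return (
        conditions.get("Updated") == "True"
        and conditions.get("Degraded") != "True"
        and all_machines_updated
    )


def wait_for_pool(name: str, timeout_s: int = 3600, interval_s: int = 30) -> bool:
    """Poll the named pool until it is updated; return False on timeout.

    Hypothetical helper: the timeout and interval defaults are arbitrary.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["oc", "get", "machineconfigpool", name, "-o", "json"],
            capture_output=True,
            text=True,
            check=False,
        )
        # A failed `oc` call (for example, API briefly unavailable) is retried.
        if result.returncode == 0 and pool_is_updated(json.loads(result.stdout)):
            return True
        time.sleep(interval_s)
    return False
```

Something like `wait_for_pool("worker")` would then block the follow-up action until the worker pool settles. This only covers machine-config-managed nodes; bring-your-own RHEL nodes would need a different check.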

Comment 3 Cody Hoag 2021-05-04 15:08:25 UTC
@Jerry do you have any opinions on the question I posed in comment#1? And/or suggestions for use cases where checking nodes would be helpful beyond the generic 'oc get clusterversion' upgrade confirmation? Thanks!

Comment 4 Yu Qi Zhang 2021-05-05 00:16:40 UTC
So as of 4.8, the MCO team has modified that a bit such that a degraded worker pool will now block upgrade completion, so in that sense "oc get clusterversion" will also be reporting if workers fail. In the future the worker pool will also be considered required for upgrade, or at least that is still in discussion.

With that in mind, I think I'm OK either way. For older versions, `oc get nodes` will serve to double-check that nodes have fully completed the update, so maybe it's worth having there as a just-in-case.
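
As a sketch of what that double check could look like when scripted, the Python below summarizes the output of `oc get nodes -o json`. The function names (node_report, update_looks_complete) are hypothetical, and the "all nodes Ready and reporting one kubelet version" rule is an assumption for illustration; clusters with paused pools or bring-your-own RHEL nodes may legitimately show mixed versions.

```python
def node_report(node_list: dict) -> list:
    """Return (name, ready, kubelet_version) for each node in a NodeList object."""
    report = []
    for node in node_list.get("items", []):
        status = node.get("status", {})
        # A node counts as ready only if its "Ready" condition has status "True".
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in status.get("conditions", [])
        )
        version = status.get("nodeInfo", {}).get("kubeletVersion", "unknown")
        report.append((node["metadata"]["name"], ready, version))
    return report


def update_looks_complete(node_list: dict) -> bool:
    """Heuristic: every node is Ready and all nodes report the same kubelet version."""
    report = node_report(node_list)
    versions = {version for _, _, version in report}
    return bool(report) and all(ready for _, ready, _ in report) and len(versions) == 1
```

The input would be the parsed JSON, for example `json.load(sys.stdin)` in a small script fed by `oc get nodes -o json`.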

Comment 5 Cody Hoag 2021-05-05 18:36:39 UTC
@Jia can you confirm this doc change? https://github.com/openshift/openshift-docs/pull/32232. Thanks!

Comment 6 liujia 2021-05-06 02:21:50 UTC
Yes, the pr lgtm.

Comment 7 Cody Hoag 2021-05-06 13:15:30 UTC
This has been merged. I'll provide the live links when they're available.