Description of problem:
Upgrading from 4.2.12 to 4.3.0 may leave RHEL 7.7 worker nodes NotReady,SchedulingDisabled.

How reproducible:
Sometimes

Steps to Reproduce:

Initial status: cluster with 3 RHEL 7.7 worker nodes attached.

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.12    True        False         85m     Cluster version is 4.2.12

$ oc get no
NAME                                 STATUS   ROLES    AGE    VERSION
weinliu-1223-7674g-compute-0         Ready    worker   94m    v1.14.6+cebabbf4a
weinliu-1223-7674g-compute-1         Ready    worker   94m    v1.14.6+cebabbf4a
weinliu-1223-7674g-control-plane-0   Ready    master   107m   v1.14.6+cebabbf4a
weinliu-1223-7674g-control-plane-1   Ready    master   107m   v1.14.6+cebabbf4a
weinliu-1223-7674g-control-plane-2   Ready    master   107m   v1.14.6+cebabbf4a
weinliu-1223-7674g-rhel-0            Ready    worker   63m    v1.14.6+b69672ada
weinliu-1223-7674g-rhel-1            Ready    worker   63m    v1.14.6+b69672ada
weinliu-1223-7674g-rhel-2            Ready    worker   63m    v1.14.6+b69672ada

1. Perform the upgrade:

$ oc adm upgrade --to-image="registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-12-22-223447" --allow-explicit-upgrade --force
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-12-22-223447

Actual results:
Worker nodes failed to get upgraded.
Operators got upgraded, but the upgrade failed on the worker nodes:

$ oc get no
NAME                                 STATUS                        ROLES    AGE   VERSION
weinliu-1223-7674g-compute-0         Ready                         worker   22h   v1.16.2
weinliu-1223-7674g-compute-1         Ready                         worker   22h   v1.16.2
weinliu-1223-7674g-control-plane-0   Ready                         master   23h   v1.16.2
weinliu-1223-7674g-control-plane-1   Ready                         master   23h   v1.16.2
weinliu-1223-7674g-control-plane-2   Ready                         master   23h   v1.16.2
weinliu-1223-7674g-rhel-0            NotReady,SchedulingDisabled   worker   22h   v1.14.6+b69672ada
weinliu-1223-7674g-rhel-1            Ready                         worker   22h   v1.14.6+b69672ada
weinliu-1223-7674g-rhel-2            Ready                         worker   22h   v1.14.6+b69672ada

$ oc get no -o wide
NAME                                 STATUS                        ROLES    AGE   VERSION             INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                CONTAINER-RUNTIME
weinliu-1223-7674g-compute-0         Ready                         worker   22h   v1.16.2             10.0.98.54    <none>        Red Hat Enterprise Linux CoreOS 43.81.201912221553.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.16.1-7.dev.rhaos4.3.gitcee3d66.el8
weinliu-1223-7674g-compute-1         Ready                         worker   22h   v1.16.2             10.0.97.252   <none>        Red Hat Enterprise Linux CoreOS 43.81.201912221553.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.16.1-7.dev.rhaos4.3.gitcee3d66.el8
weinliu-1223-7674g-control-plane-0   Ready                         master   23h   v1.16.2             10.0.97.217   <none>        Red Hat Enterprise Linux CoreOS 43.81.201912221553.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.16.1-7.dev.rhaos4.3.gitcee3d66.el8
weinliu-1223-7674g-control-plane-1   Ready                         master   23h   v1.16.2             10.0.98.89    <none>        Red Hat Enterprise Linux CoreOS 43.81.201912221553.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.16.1-7.dev.rhaos4.3.gitcee3d66.el8
weinliu-1223-7674g-control-plane-2   Ready                         master   23h   v1.16.2             10.0.98.160   <none>        Red Hat Enterprise Linux CoreOS 43.81.201912221553.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.16.1-7.dev.rhaos4.3.gitcee3d66.el8
weinliu-1223-7674g-rhel-0            NotReady,SchedulingDisabled   worker   22h   v1.14.6+b69672ada   10.0.98.83    <none>        Red Hat Enterprise Linux Server 7.7 (Maipo)                    3.10.0-1062.9.1.el7.x86_64    cri-o://1.14.11-2.dev.rhaos4.2.git179ea6b.el7
weinliu-1223-7674g-rhel-1            Ready                         worker   22h   v1.14.6+b69672ada   10.0.98.170   <none>        Red Hat Enterprise Linux Server 7.7 (Maipo)                    3.10.0-1062.9.1.el7.x86_64    cri-o://1.14.11-2.dev.rhaos4.2.git179ea6b.el7
weinliu-1223-7674g-rhel-2            Ready                         worker   22h   v1.14.6+b69672ada   10.0.98.65    <none>        Red Hat Enterprise Linux Server 7.7 (Maipo)                    3.10.0-1062.9.1.el7.x86_64    cri-o://1.14.11-2.dev.rhaos4.2.git179ea6b.el7

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2019-12-22-223447   True        False         20h     Cluster version is 4.3.0-0.nightly-2019-12-22-223447

$ oc get clusteroperator
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.3.0-0.nightly-2019-12-22-223447   True        False         False      20h
cloud-credential                           4.3.0-0.nightly-2019-12-22-223447   True        False         False      21h
cluster-autoscaler                         4.3.0-0.nightly-2019-12-22-223447   True        False         False      20h
console                                    4.3.0-0.nightly-2019-12-22-223447   True        False         False      18h
dns                                        4.3.0-0.nightly-2019-12-22-223447   True        False         False      21h
image-registry                             4.3.0-0.nightly-2019-12-22-223447   True        False         False      20h
ingress                                    4.3.0-0.nightly-2019-12-22-223447   True        False         False      20h
insights                                   4.3.0-0.nightly-2019-12-22-223447   True        False         False      21h
kube-apiserver                             4.3.0-0.nightly-2019-12-22-223447   True        False         False      21h
kube-controller-manager                    4.3.0-0.nightly-2019-12-22-223447   True        False         False      21h
kube-scheduler                             4.3.0-0.nightly-2019-12-22-223447   True        False         False      21h
machine-api                                4.3.0-0.nightly-2019-12-22-223447   True        False         False      21h
machine-config                             4.3.0-0.nightly-2019-12-22-223447   True        False         False      18h
marketplace                                4.3.0-0.nightly-2019-12-22-223447   True        False         False      18h
monitoring                                 4.3.0-0.nightly-2019-12-22-223447   False       True          True       18h
network                                    4.3.0-0.nightly-2019-12-22-223447   True        True          True       21h
node-tuning                                4.3.0-0.nightly-2019-12-22-223447   True        False         False      18h
openshift-apiserver                        4.3.0-0.nightly-2019-12-22-223447   True        False         False      18h
openshift-controller-manager               4.3.0-0.nightly-2019-12-22-223447   True        False         False      21h
openshift-samples                          4.3.0-0.nightly-2019-12-22-223447   True        False         False      18h
operator-lifecycle-manager                 4.3.0-0.nightly-2019-12-22-223447   True        False         False      21h
operator-lifecycle-manager-catalog         4.3.0-0.nightly-2019-12-22-223447   True        False         False      21h
operator-lifecycle-manager-packageserver   4.3.0-0.nightly-2019-12-22-223447   True        False         False      125m
service-ca                                 4.3.0-0.nightly-2019-12-22-223447   True        False         False      21h
service-catalog-apiserver                  4.3.0-0.nightly-2019-12-22-223447   True        False         False      18h
service-catalog-controller-manager         4.3.0-0.nightly-2019-12-22-223447   True        False         False      20h
storage                                    4.3.0-0.nightly-2019-12-22-223447   True        False         False      19h

Expected results:
Upgrade succeeds without errors.

Additional info:

Kubelet logs on the RHEL worker:

Dec 24 00:33:52 weinliu-1223-7674g-rhel-0 hyperkube[19781]: W1224 00:33:52.415479   19781 options.go:263] unknown 'kubernetes.io' or 'k8s.io' labels specified with --node-labels: [node-role.kubernetes.io/worker]
Dec 24 00:33:52 weinliu-1223-7674g-rhel-0 hyperkube[19781]: W1224 00:33:52.415488   19781 options.go:264] in 1.16, --node-labels in the 'kubernetes.io' namespace must begin with an allowed prefix (kubelet.kubernetes.io, node.kubernetes.io) or be in the specifically allowed set (beta.kubernetes.io/arch, beta.kubernetes.io/instance-type, beta.kubernetes.io/os, failure-domain.beta.kubernetes.io/region, failure-domain.beta.kubernetes.io/zone, failure-domain.kubernetes.io/region, failure-domain.kubernetes.io/zone, kubernetes.io/arch, kubernetes.io/hostname, kubernetes.io/instance-type, kubernetes.io/os)
Dec 24 00:33:52 weinliu-1223-7674g-rhel-0 hyperkube[19781]: Flag --minimum-container-ttl-duration has been deprecated, Use --eviction-hard or --eviction-soft instead. Will be removed in a future version.
Dec 24 00:33:52 weinliu-1223-7674g-rhel-0 hyperkube[19781]: F1224 00:33:52.417436   19781 server.go:206] unrecognized feature gate: LegacyNodeRoleBehavior
Dec 24 00:34:02 weinliu-1223-7674g-rhel-0 systemd[1]: kubelet.service holdoff time over, scheduling restart.
Dec 24 00:34:02 weinliu-1223-7674g-rhel-0 systemd[1]: Stopped Kubernetes Kubelet.
-- Subject: Unit kubelet.service has finished shutting down
Dec 24 00:33:52 weinliu-1223-7674g-rhel-0 hyperkube[19781]: F1224 00:33:52.417436   19781 server.go:206] unrecognized feature gate: LegacyNodeRoleBehavior

Ryan, can you take a look at this? I'm not sure whether it's the cause of the upgrade failure on the RHEL nodes, but it's worth checking, as I don't see anything wrong MCO-wise.
openshift/api PR that introduced this feature gate (new in 4.3): https://github.com/openshift/api/pull/467

The kubelet config controller in the MCO currently assumes that the set of OCP feature gates is equal to the set of kube feature gates. LegacyNodeRoleBehavior, and the others introduced in that PR, are the first to introduce an OCP feature gate that is not a kube feature gate, leading to this issue. https://github.com/openshift/machine-config-operator/blob/master/pkg/controller/kubelet-config/kubelet_config_features.go#L189-L212
Wow, OK, that was completely wrong. LegacyNodeRoleBehavior _is_ an upstream feature gate, introduced in 1.16: https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/

So this is a skew issue. The kubelet needs to be updated on the node before the machine-config operator is updated. Updating the MCO updates the MCC, and thus the kubelet-config controller, which imports the new openshift/api and will include new feature gates in the config that may be (and in this case are) incompatible with the old kubelet. https://github.com/openshift/api/blob/master/config/v1/types_feature.go#L113-L124
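On an affected node, the skew can be confirmed directly. A hedged sketch (the unit name and package name follow the usual 4.x RHEL worker conventions; verify them on your cluster):

```shell
# On the NotReady RHEL7 worker: the old kubelet should be crash-looping on a
# feature gate that the already-upgraded MCO rendered into its config.
journalctl -u kubelet --no-pager | grep "unrecognized feature gate"

# The node should still have the 4.2-era (1.14) kubelet package installed.
rpm -q openshift-hyperkube
```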
Additionally, we test this upgrade with RHCOS workers in CI and it works; this issue only affects RHEL workers.

I'm pretty sure that when the MachineConfig changes during an RHCOS upgrade, we pivot the ostree first, upgrading the kubelet, then reboot and pick up the new files, including the new kubelet config file. That explains why we don't see this in the RHCOS worker case.

Antonio, can you confirm this?
(In reply to Seth Jennings from comment #9)
> Additionally, we test this upgrade with RHCOS workers in CI and it works.
> This issue is only for RHEL workers.
>
> I'm pretty sure that when the MachineConfig changes during RHCOS upgrade, we
> pivot the ostree first, upgrading the kubelet, then reboot and get the new
> files, including the new kubelet config file. That explains why we don't
> see this in the RHCOS worker case.
>
> Antonio, can you confirm this?

That is indeed the case, and it explains why we only see this on RHEL7 workers. Is this something we need to take care of in the MCC kubelet-config controller, or is it RHEL/Ansible's responsibility to handle?
It seems to me that the RHEL worker upgrade pattern should be to upgrade the workers first, then upgrade the cluster. The newer kubelet will be compatible with the older config, at least within n-1 skew. Am I missing some obvious issue with that?
Attempting this: the playbook currently reads the running cluster version and will only install openshift rpms that match the cluster version, even if a repo that has the newer version is installed. https://github.com/openshift/openshift-ansible/blob/91645ed18b8e0b6c84dcc0229d02aee77db3fae2/roles/openshift_node/tasks/install.yml#L23-L60

This currently forces the "upgrade cluster, then upgrade workers" ordering.
(In reply to Seth Jennings from comment #11)
> It seems to me that the RHEL worker upgrade pattern should be to upgrade the
> workers first, then upgrade the cluster. Newer kubelet will be compatible
> with the older config, at least n-1 skewed.
>
> Am I missing some obvious issue with that?

Just that we'd have to ensure the API is upgraded before the kubelet, because we don't support kubelet > api.
The behavior during a 4.2 to 4.3 upgrade is as follows: when the MCO rolls out the new configuration, it cordons and marks unavailable the number of hosts specified by the `maxUnavailable` field on the machine config pool, then applies the new configuration and reboots each host. On a RHEL worker this process does not update the kubelet, so configuration generated for 4.3 is applied to a 4.2 kubelet, and the host never returns to the Ready state.

This stops the rollout until that host becomes available again. Assuming `maxUnavailable` (which defaults to 1) has been set at a level that preserves normal cluster operation, this should not be seen as a critical situation.

Therefore, we will amend the documentation to make it clear that this will happen during 4.2 to 4.3 upgrades in clusters with RHEL workers, and that the admin will need to run the RHEL worker upgrade playbooks to complete the upgrade. Running the upgrade playbooks updates the kubelet on all specified RHEL workers and reboots them one by one. Once a RHEL worker has been updated, it returns to the Ready state and the upgrade completes as expected. This ordering also ensures that the API has been upgraded before the kubelets, whereas other patterns may not.

We will evaluate additional changes in the future to make this a more seamless upgrade.
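The remediation described above amounts to running the RHEL worker upgrade playbook from the machine that carries openshift-ansible. A sketch, assuming openshift-ansible has already been updated to the 4.3 level (the inventory path is illustrative):

```shell
# Run the worker upgrade playbook; it updates the kubelet on each RHEL worker
# listed in the inventory and reboots them one at a time.
cd /usr/share/ansible/openshift-ansible
ansible-playbook -i /path/to/inventory/hosts playbooks/upgrade.yml
```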
Junqi, did you install the 4.3 repo on the RHEL worker? I would have thought the upgrade playbook would fail if you had not, but maybe it doesn't.
(In reply to Seth Jennings from comment #18)
> Junqi, did you install the 4.3 repo on the RHEL worker? I would have thought
> the upgrade playbook would fail if you had not, but maybe not

Hmm, I'd assumed that was in the docs for the RHEL worker upgrade, but I'm not finding it. We need to make sure the repo-toggling bits are added too, roughly the same as the subscription-manager snippet here: https://docs.openshift.com/container-platform/3.11/upgrading/automated_upgrades.html#preparing-for-an-automated-upgrade
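For the repo toggling, something along these lines would be run on each RHEL7 worker before the upgrade playbook. This is a sketch: the repo IDs follow the OCP 4.x naming scheme and should be verified against the release documentation.

```shell
# Swap the 4.2 rpm repo for the 4.3 one, then clear cached yum metadata.
subscription-manager repos --disable=rhel-7-server-ose-4.2-rpms \
                           --enable=rhel-7-server-ose-4.3-rpms
yum clean all
```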
Doc PR is here: https://github.com/openshift/openshift-docs/pull/19059 Gaoyun Pei, can you PTAL?
Added a comment to the doc PR.
The proposed doc PR LGTM; moving this bug to VERIFIED for 4.3.0. I have also cloned the bug to 4.4.0 to see if we can come up with a better solution.
This change is live on docs.openshift.com: https://docs.openshift.com/container-platform/4.3/updating/updating-cluster-rhel-compute.html#rhel-compute-updating_updating-cluster-rhel-compute

And on the portal: https://access.redhat.com/documentation/en-us/openshift_container_platform/4.3/html-single/updating_clusters/index#rhel-compute-updating_updating-cluster-rhel-compute