Bug 1998552 - Enforce OpenShift's defined kubelet version skew policies
Summary: Enforce OpenShift's defined kubelet version skew policies
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Luis Sanchez
QA Contact: Rahul Gangwar
URL:
Whiteboard:
Depends On:
Blocks: 2001244
 
Reported: 2021-08-27 14:45 UTC by Luis Sanchez
Modified: 2021-10-29 23:06 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:49:29 UTC
Target Upstream Version:
Embargoed:



Links
System                  ID                                                   Private  Priority  Status  Summary  Last Updated
Github                  openshift cluster-kube-apiserver-operator pull 1199  0        None      None    None     2021-08-27 14:46:02 UTC
Red Hat Product Errata  RHSA-2021:3759                                       0        None      None    None     2021-10-18 17:49:47 UTC

Description Luis Sanchez 2021-08-27 14:45:09 UTC
The API Server Operator will set Upgradeable=False whenever any of the nodes within the cluster are at the skew limit; that is, when an upgrade of the API Server would exceed the allowable kubelet version skew.
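
A minimal sketch of the check described above, written in Python rather than the operator's actual Go code. The function names and the max_skew parameter are hypothetical and chosen for illustration only; the real limits are defined in the linked pull request.

```python
import re


def minor_version(version: str) -> int:
    """Return the minor component of a version such as 'v1.20.0+9689d22' or '1.22.1'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(2))


def kubelet_skew(api_server_version: str, kubelet_version: str) -> int:
    """Minor versions the kubelet is behind the API server (negative if it is ahead)."""
    return minor_version(api_server_version) - minor_version(kubelet_version)


def upgradeable(api_server_version, kubelet_versions, max_skew):
    """True when no kubelet is more than max_skew minor versions behind.

    max_skew is an assumed input, not a value taken from the operator.
    Kubelets that are ahead of the API server are not handled specially here.
    """
    return all(
        kubelet_skew(api_server_version, version) <= max_skew
        for version in kubelet_versions
    )
```

With the versions reported later in this bug, upgradeable("1.22.1", ["v1.20.0+9689d22"], max_skew=1) evaluates to False, which corresponds to Upgradeable=False.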

Comment 29 W. Trevor King 2021-09-15 18:01:17 UTC
One possible verification sketch:

1. Install $VERSION_1
2. Pause the compute MachineConfigPool.
3. Update to $VERSION_2 -> $VERSION_3 -> 4.9.0-rc.1 (or other recent 4.9).
4. Check Upgradeable on the kube-apiserver ClusterOperator.

For example, installing 4.7.30, pausing the pool, updating to 4.8.11, and then updating to 4.9.0-rc.1 would give you a skew of 2, which for the odd 4.9 release is outside the acceptable skew of 0 or 1 [1], so it should get Upgradeable=False, reason=KubeletMinorVersionUnsupported, with a message like:

  Unsupported kubelet minor versions on nodes $NODES are too far behind the target API server version ($4_9_API_SERVER_VERSION).

Getting at KubeletVersionUnknown might be harder; maybe the node folks have ideas on how you could set bogus information in the Node resource?

[1]: https://github.com/openshift/cluster-kube-apiserver-operator/pull/1199/files#diff-22001281e3b968448f2558fd87069f7dbe886ce349047d0270433e17ece4372aR37
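
The skew arithmetic in the example above can be sketched as follows. This is an illustration only: it assumes that OpenShift 4.y ships Kubernetes 1.(y+13), which matches the versions seen in this bug (4.9 with API server 1.22, nodes installed from 4.7 with kubelet 1.20). The function names are hypothetical.

```python
import re


def kube_minor_for_openshift(ocp_version: str) -> int:
    """Map an OpenShift 4.y version string to its assumed Kubernetes minor version."""
    match = re.match(r"4\.(\d+)", ocp_version)
    if match is None:
        raise ValueError(f"not an OpenShift 4.y version: {ocp_version!r}")
    # Assumption: OpenShift 4.y is based on Kubernetes 1.(y + 13).
    return int(match.group(1)) + 13


def expected_skew(paused_pool_version: str, cluster_version: str) -> int:
    """Expected kubelet skew when a pool stays at one release while the cluster moves on."""
    return kube_minor_for_openshift(cluster_version) - kube_minor_for_openshift(
        paused_pool_version
    )


# The path from the comment: install 4.7.30, pause the pool, end at 4.9.0-rc.1.
print(expected_skew("4.7.30", "4.9.0-rc.1"))
```

Pausing the pool at 4.8.11 instead would give an expected skew of 1, which stays inside the 0-or-1 range mentioned for 4.9.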

Comment 31 Rahul Gangwar 2021-09-17 16:28:14 UTC
The upgrade from 4.7 to 4.9 succeeded with the worker MachineConfigPool paused, and the kube-apiserver ClusterOperator reports the expected message for an odd OCP version:

'KubeletMinorVersionUpgradeable: Unsupported kubelet minor versions on nodes ip-10-0-134-126.us-east-2.compute.internal, ip-10-0-177-196.us-east-2.compute.internal, and ip-10-0-197-69.us-east-2.compute.internal are too far behind the target API server version (1.22.1).'
    reason: KubeletMinorVersion_KubeletMinorVersionUnsupported
    status: "False"

NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-rc.1   True        False         80m     Cluster version is 4.9.0-rc.1

oc get node -A                  
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-131-50.us-east-2.compute.internal    Ready    master   5h1m    v1.22.0-rc.0+75ee307
ip-10-0-134-126.us-east-2.compute.internal   Ready    worker   4h56m   v1.20.0+9689d22
ip-10-0-170-97.us-east-2.compute.internal    Ready    master   5h2m    v1.22.0-rc.0+75ee307
ip-10-0-177-196.us-east-2.compute.internal   Ready    worker   4h57m   v1.20.0+9689d22
ip-10-0-197-69.us-east-2.compute.internal    Ready    worker   4h56m   v1.20.0+9689d22
ip-10-0-216-152.us-east-2.compute.internal   Ready    master   5h2m    v1.22.0-rc.0+75ee307
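
As a rough cross-check of the node listing above, the lagging nodes can be picked out of the table with a short script. This is a sketch, not part of the operator: lagging_nodes and max_skew are hypothetical names, and the limit of 1 is taken from the 0-or-1 skew mentioned in comment 29.

```python
import re
from typing import List

NODE_TABLE = """\
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-131-50.us-east-2.compute.internal    Ready    master   5h1m    v1.22.0-rc.0+75ee307
ip-10-0-134-126.us-east-2.compute.internal   Ready    worker   4h56m   v1.20.0+9689d22
ip-10-0-170-97.us-east-2.compute.internal    Ready    master   5h2m    v1.22.0-rc.0+75ee307
ip-10-0-177-196.us-east-2.compute.internal   Ready    worker   4h57m   v1.20.0+9689d22
ip-10-0-197-69.us-east-2.compute.internal    Ready    worker   4h56m   v1.20.0+9689d22
ip-10-0-216-152.us-east-2.compute.internal   Ready    master   5h2m    v1.22.0-rc.0+75ee307
"""


def minor(version: str) -> int:
    """Extract the minor version from strings like 'v1.20.0+9689d22' or '1.22.1'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(2))


def lagging_nodes(table: str, api_server_version: str, max_skew: int) -> List[str]:
    """Return names of nodes whose kubelet is more than max_skew minors behind."""
    api_minor = minor(api_server_version)
    lagging = []
    for line in table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        name, kubelet_version = fields[0], fields[-1]
        if api_minor - minor(kubelet_version) > max_skew:
            lagging.append(name)
    return lagging


for node in lagging_nodes(NODE_TABLE, "1.22.1", max_skew=1):
    print(node)
```

The three worker nodes it reports are the same ones named in the Upgradeable message.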

 oc get co kube-apiserver -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  annotations:
    exclude.release.openshift.io/internal-openshift-hosted: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
  creationTimestamp: "2021-09-17T10:58:58Z"
  generation: 1
  managedFields:
  - apiVersion: config.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:exclude.release.openshift.io/internal-openshift-hosted: {}
          f:include.release.openshift.io/self-managed-high-availability: {}
          f:include.release.openshift.io/single-node-developer: {}
      f:spec: {}
      f:status:
        .: {}
        f:extension: {}
    manager: cluster-version-operator
    operation: Update
    time: "2021-09-17T10:58:58Z"
  - apiVersion: config.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:relatedObjects: {}
    manager: cluster-kube-apiserver-operator
    operation: Update
    time: "2021-09-17T11:54:03Z"
  - apiVersion: config.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions: {}
        f:versions: {}
    manager: cluster-kube-apiserver-operator
    operation: Update
    subresource: status
    time: "2021-09-17T13:46:53Z"
  name: kube-apiserver
  resourceVersion: "135703"
  uid: ac30d522-6486-44be-97fc-aa77f283ae12
spec: {}
status:
  conditions:
  - lastTransitionTime: "2021-09-17T14:48:44Z"
    message: 'NodeControllerDegraded: All master nodes are ready'
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2021-09-17T13:46:53Z"
    message: 'NodeInstallerProgressing: 3 nodes are at revision 10'
    reason: AsExpected
    status: "False"
    type: Progressing
  - lastTransitionTime: "2021-09-17T11:10:17Z"
    message: 'StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 10'
    reason: AsExpected
    status: "True"
    type: Available
  - lastTransitionTime: "2021-09-17T13:27:45Z"
    message: 'KubeletMinorVersionUpgradeable: Unsupported kubelet minor versions on nodes ip-10-0-134-126.us-east-2.compute.internal, ip-10-0-177-196.us-east-2.compute.internal, and ip-10-0-197-69.us-east-2.compute.internal are too far behind the target API server version (1.22.1).'
    reason: KubeletMinorVersion_KubeletMinorVersionUnsupported
    status: "False"
    type: Upgradeable
  extension: null
  relatedObjects:
  - group: operator.openshift.io
    name: cluster
    resource: kubeapiservers
  - group: apiextensions.k8s.io
    name: ""
    resource: customresourcedefinitions
  - group: security.openshift.io
    name: ""
    resource: securitycontextconstraints
  - group: ""
    name: openshift-config
    resource: namespaces
  - group: ""
    name: openshift-config-managed
    resource: namespaces
  - group: ""
    name: openshift-kube-apiserver-operator
    resource: namespaces
  - group: ""
    name: openshift-kube-apiserver
    resource: namespaces
  - group: admissionregistration.k8s.io
    name: ""
    resource: mutatingwebhookconfigurations
  - group: admissionregistration.k8s.io
    name: ""
    resource: validatingwebhookconfigurations
  - group: controlplane.operator.openshift.io
    name: ""
    namespace: openshift-kube-apiserver
    resource: podnetworkconnectivitychecks
  - group: apiserver.openshift.io
    name: ""
    resource: apirequestcounts
  versions:
  - name: raw-internal
    version: 4.9.0-rc.1
  - name: kube-apiserver
    version: 1.22.1
  - name: operator
    version: 4.9.0-rc.1
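
For scripted verification, the Upgradeable condition can be pulled out of the ClusterOperator status rather than read by eye. The sketch below assumes the object has already been loaded into a Python dict (for example from the output of oc get co kube-apiserver -o json); find_condition is a hypothetical helper, and the sample data is trimmed from the output above.

```python
from typing import Optional

# Trimmed copy of the status shown above; only two conditions are kept.
CLUSTER_OPERATOR = {
    "kind": "ClusterOperator",
    "metadata": {"name": "kube-apiserver"},
    "status": {
        "conditions": [
            {
                "type": "Degraded",
                "status": "False",
                "reason": "AsExpected",
            },
            {
                "type": "Upgradeable",
                "status": "False",
                "reason": "KubeletMinorVersion_KubeletMinorVersionUnsupported",
            },
        ]
    },
}


def find_condition(cluster_operator: dict, condition_type: str) -> Optional[dict]:
    """Return the first status condition with the given type, or None if absent."""
    conditions = cluster_operator.get("status", {}).get("conditions", [])
    for condition in conditions:
        if condition.get("type") == condition_type:
            return condition
    return None


condition = find_condition(CLUSTER_OPERATOR, "Upgradeable")
print(condition["status"], condition["reason"])
```

In a live check, the dict would come from json.loads applied to the command output instead of the inline sample.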

Comment 32 Rahul Gangwar 2021-09-17 16:38:18 UTC
After unpausing the worker MachineConfigPool, the skew drops from 2 to 0 and the operator gradually returns to Upgradeable=True.

NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-131-50.us-east-2.compute.internal    Ready    master   5h20m   v1.22.0-rc.0+75ee307
ip-10-0-134-126.us-east-2.compute.internal   Ready    worker   5h15m   v1.22.0-rc.0+75ee307
ip-10-0-170-97.us-east-2.compute.internal    Ready    master   5h21m   v1.22.0-rc.0+75ee307
ip-10-0-177-196.us-east-2.compute.internal   Ready    worker   5h15m   v1.22.0-rc.0+75ee307
ip-10-0-197-69.us-east-2.compute.internal    Ready    worker   5h15m   v1.22.0-rc.0+75ee307
ip-10-0-216-152.us-east-2.compute.internal   Ready    master   5h21m   v1.22.0-rc.0+75ee307

oc get co kube-apiserver -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  annotations:
    exclude.release.openshift.io/internal-openshift-hosted: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
  creationTimestamp: "2021-09-17T10:58:58Z"
  generation: 1
  managedFields:
  - apiVersion: config.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:exclude.release.openshift.io/internal-openshift-hosted: {}
          f:include.release.openshift.io/self-managed-high-availability: {}
          f:include.release.openshift.io/single-node-developer: {}
      f:spec: {}
      f:status:
        .: {}
        f:extension: {}
    manager: cluster-version-operator
    operation: Update
    time: "2021-09-17T10:58:58Z"
  - apiVersion: config.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:relatedObjects: {}
    manager: cluster-kube-apiserver-operator
    operation: Update
    time: "2021-09-17T11:54:03Z"
  - apiVersion: config.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions: {}
        f:versions: {}
    manager: cluster-kube-apiserver-operator
    operation: Update
    subresource: status
    time: "2021-09-17T13:46:53Z"
  name: kube-apiserver
  resourceVersion: "165739"
  uid: ac30d522-6486-44be-97fc-aa77f283ae12
spec: {}
status:
  conditions:
  - lastTransitionTime: "2021-09-17T14:48:44Z"
    message: 'NodeControllerDegraded: All master nodes are ready'
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2021-09-17T13:46:53Z"
    message: 'NodeInstallerProgressing: 3 nodes are at revision 10'
    reason: AsExpected
    status: "False"
    type: Progressing
  - lastTransitionTime: "2021-09-17T11:10:17Z"
    message: 'StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 10'
    reason: AsExpected
    status: "True"
    type: Available
  - lastTransitionTime: "2021-09-17T16:25:24Z"
    message: 'KubeletMinorVersionUpgradeable: Kubelet and API server minor versions are synced.'
    reason: AsExpected
    status: "True"
    type: Upgradeable
  extension: null
  relatedObjects:
  - group: operator.openshift.io
    name: cluster
    resource: kubeapiservers
  - group: apiextensions.k8s.io
    name: ""
    resource: customresourcedefinitions
  - group: security.openshift.io
    name: ""
    resource: securitycontextconstraints
  - group: ""
    name: openshift-config
    resource: namespaces
  - group: ""
    name: openshift-config-managed
    resource: namespaces
  - group: ""
    name: openshift-kube-apiserver-operator
    resource: namespaces
  - group: ""
    name: openshift-kube-apiserver
    resource: namespaces
  - group: admissionregistration.k8s.io
    name: ""
    resource: mutatingwebhookconfigurations
  - group: admissionregistration.k8s.io
    name: ""
    resource: validatingwebhookconfigurations
  - group: controlplane.operator.openshift.io
    name: ""
    namespace: openshift-kube-apiserver
    resource: podnetworkconnectivitychecks
  - group: apiserver.openshift.io
    name: ""
    resource: apirequestcounts
  versions:
  - name: raw-internal
    version: 4.9.0-rc.1
  - name: kube-apiserver
    version: 1.22.1
  - name: operator
    version: 4.9.0-rc.1

Comment 35 errata-xmlrpc 2021-10-18 17:49:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

