Bug 1772900 - MCO: Stale feature flags in kubelet.conf extracted from ignition file
Summary: MCO: Stale feature flags in kubelet.conf extracted from ignition file
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.3.0
Assignee: Antonio Murdaca
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks: 1771083
Reported: 2019-11-15 14:11 UTC by ravig
Modified: 2020-01-23 11:13 UTC
CC: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-23 11:13:05 UTC
Target Upstream Version:
Embargoed:




Links
- GitHub: openshift/machine-config-operator pull 1257 (closed): "Bug 1772900: bump mco deps for TLSSecurityProfile and ExperimentalCriticalPodAnnotation featuregate removal" (last updated 2020-12-01 17:24:54 UTC)
- Red Hat Product Errata RHBA-2020:0062 (last updated 2020-01-23 11:13:26 UTC)

Description ravig 2019-11-15 14:11:35 UTC
Description of problem:

Stale feature gates in the ignition file cause the kubelet to fail to come up on Windows nodes.
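For context, the stale entries would look roughly like this in the rendered kubelet configuration. This is a hypothetical fragment: the two gate names come from this report, while the surrounding keys and values are illustrative.

```yaml
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
featureGates:
  TLSSecurityProfile: true                 # removed upstream; stale here
  ExperimentalCriticalPodAnnotation: true  # removed upstream; stale here
  RotateKubeletServerCertificate: true
```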


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Start an upstream 1.16 kubelet on a Windows node

Actual results:

The feature gates in question were TLSSecurityProfile and ExperimentalCriticalPodAnnotation.

PRs removing these gates were merged at https://github.com/openshift/api/pull/515 and https://github.com/openshift/api/pull/516.

A PR has been opened for the MCO at https://github.com/openshift/machine-config-operator/pull/1257.

Until the MCO PR is merged and part of the payload, the feature gate mappings for TLSSecurityProfile and ExperimentalCriticalPodAnnotation need to be manually removed from the worker ignition file before running the bootstrapper.
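The manual workaround above could be sketched as follows. This is a hypothetical helper, not part of the MCO: the file path and the Ignition 2.2 layout ("storage.files" entries carrying a base64 data URL in "contents.source") are assumptions; the gate names come from this report.

```python
import base64

# Gate names taken from this bug report.
STALE_GATES = ("TLSSecurityProfile", "ExperimentalCriticalPodAnnotation")
B64_PREFIX = "data:text/plain;charset=utf-8;base64,"

def strip_stale_gates(ignition: dict,
                      path: str = "/etc/kubernetes/kubelet.conf") -> dict:
    """Remove stale feature-gate lines from the kubelet config embedded
    in a worker ignition config (assumed path; Ignition spec 2.2 layout)."""
    for f in ignition.get("storage", {}).get("files", []):
        if f.get("path") != path:
            continue
        source = f["contents"]["source"]
        if not source.startswith(B64_PREFIX):
            continue  # this sketch only handles base64 data URLs
        text = base64.b64decode(source[len(B64_PREFIX):]).decode("utf-8")
        # Drop any line of the featureGates map mentioning a stale gate.
        kept = [line for line in text.splitlines()
                if not any(gate in line for gate in STALE_GATES)]
        cleaned = "\n".join(kept) + "\n"
        f["contents"]["source"] = B64_PREFIX + base64.b64encode(
            cleaned.encode("utf-8")).decode("ascii")
    return ignition
```

After running this over the worker ignition JSON, the bootstrapper would be started against the cleaned file.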


The problem occurred because of https://github.com/kubernetes/kubernetes/blob/b3875556b0edf3b5eaea32c69678edcf4117d316/cmd/kubelet/app/server.go#L206, which causes the kubelet to exit when it encounters an unrecognized feature gate; this does not appear to be a problem on Linux nodes.
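The fail-fast behavior linked above can be mimicked in a minimal sketch (this is not kubelet code): the feature-gate map is validated against the set of known gates, and any unrecognized name produces an error that aborts startup. KNOWN_GATES here is an illustrative subset.

```python
# Illustrative subset of gates the kubelet would recognize.
KNOWN_GATES = {"RotateKubeletServerCertificate", "CSIMigration"}

def set_from_map(requested: dict) -> None:
    """Reject any feature gate not in the known set, mirroring the
    'unrecognized feature gate' error that makes the kubelet exit."""
    for name in requested:
        if name not in KNOWN_GATES:
            raise ValueError(f"unrecognized feature gate: {name}")

set_from_map({"RotateKubeletServerCertificate": True})  # accepted
try:
    # A stale gate left in kubelet.conf trips the validation.
    set_from_map({"TLSSecurityProfile": True})
except ValueError as err:
    print(err)
```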



Expected results:

The kubelet comes up fine.


Additional info:

Comment 2 Michael Nguyen 2019-11-21 16:49:58 UTC
I verified that the kubelet configuration no longer has the TLSSecurityProfile or ExperimentalCriticalPodAnnotation feature gates.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2019-11-21-122827   True        False         16m     Cluster version is 4.3.0-0.nightly-2019-11-21-122827

$ oc get mc
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   CREATED
00-master                                                   ea51a9f88458a528a5782ca170090c201dd32ace   2.2.0             38m
00-worker                                                   ea51a9f88458a528a5782ca170090c201dd32ace   2.2.0             38m
01-master-container-runtime                                 ea51a9f88458a528a5782ca170090c201dd32ace   2.2.0             38m
01-master-kubelet                                           ea51a9f88458a528a5782ca170090c201dd32ace   2.2.0             38m
01-worker-container-runtime                                 ea51a9f88458a528a5782ca170090c201dd32ace   2.2.0             38m
01-worker-kubelet                                           ea51a9f88458a528a5782ca170090c201dd32ace   2.2.0             38m
99-master-6ca42d02-703b-48b2-89f0-de35f35ebb52-registries   ea51a9f88458a528a5782ca170090c201dd32ace   2.2.0             38m
99-master-ssh                                                                                          2.2.0             40m
99-worker-9a3f484b-a918-4545-8122-850ae33743d8-registries   ea51a9f88458a528a5782ca170090c201dd32ace   2.2.0             38m
99-worker-ssh                                                                                          2.2.0             40m
rendered-master-d0103688cb103a9be19d54ec47672145            ea51a9f88458a528a5782ca170090c201dd32ace   2.2.0             38m
rendered-worker-c79ebfdfe68a65a53f8106ad418fba98            ea51a9f88458a528a5782ca170090c201dd32ace   2.2.0             38m

$ oc get mc 01-master-kubelet -o yaml | grep -e  TLSSecurityProfile -e ExperimentalCriticalPodAnnotation
$ oc get mc 01-worker-kubelet -o yaml | grep -e  TLSSecurityProfile -e ExperimentalCriticalPodAnnotation

@ravig I don't have access to Windows nodes. Can you verify that upstream kube works with OpenShift 4.3.0-0.nightly-2019-11-21-122827 or later? If not, I will close this as verified based on the checks above.

Comment 3 Sebastian Soto 2019-11-21 18:16:13 UTC
I was able to run kubelet v1.16.2 on a Windows node.

__________________________
$ oc get clusterversion
NAME      VERSION                        AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.ci-2019-11-21-103638   True        False         81m     Cluster version is 4.3.0-0.ci-2019-11-21-103638

$ oc describe no ip-10-0-40-142.ec2.internal
Name:               ip-10-0-40-142.ec2.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m4.large
                    beta.kubernetes.io/os=windows
                    failure-domain.beta.kubernetes.io/region=us-east-1
                    failure-domain.beta.kubernetes.io/zone=us-east-1c
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ec2amaz-qqihbh7
                    kubernetes.io/os=windows
                    node.openshift.io/os_id=Windows
Annotations:        k8s.ovn.org/hybrid-overlay-hostsubnet: 10.132.3.0/24
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 21 Nov 2019 13:05:32 -0500
Taints:             os=Windows:NoSchedule
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Thu, 21 Nov 2019 13:13:32 -0500   Thu, 21 Nov 2019 13:05:32 -0500   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Thu, 21 Nov 2019 13:13:32 -0500   Thu, 21 Nov 2019 13:05:32 -0500   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Thu, 21 Nov 2019 13:13:32 -0500   Thu, 21 Nov 2019 13:05:32 -0500   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Thu, 21 Nov 2019 13:13:32 -0500   Thu, 21 Nov 2019 13:05:32 -0500   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   10.0.40.142
  ExternalIP:   3.231.95.32
  Hostname:     ip-10-0-40-142.ec2.internal
  InternalDNS:  ip-10-0-40-142.ec2.internal
  ExternalDNS:  ec2-3-231-95-32.compute-1.amazonaws.com
Capacity:
 attachable-volumes-aws-ebs:  39
 cpu:                         2
 ephemeral-storage:           31455228Ki
 memory:                      8388208Ki
 pods:                        250
Allocatable:
 attachable-volumes-aws-ebs:  39
 cpu:                         1500m
 ephemeral-storage:           28989138077
 memory:                      7773808Ki
 pods:                        250
System Info:
 Machine ID:                 EC2AMAZ-QQIHBH7
 System UUID:                EC2146A1-4CFC-58B6-83A6-9805E647814F
 Boot ID:                    
 Kernel Version:             10.0.17763.737
 OS Image:                   Windows Server 2019 Datacenter
 Operating System:           windows
 Architecture:               amd64
 Container Runtime Version:  docker://19.3.2
 Kubelet Version:            v1.16.2
 Kube-Proxy Version:         v1.16.2
ProviderID:                  aws:///us-east-1c/i-00b95614fca1588e1
Non-terminated Pods:         (0 in total)
  Namespace                  Name    CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----    ------------  ----------  ---------------  -------------  ---
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests  Limits
  --------                    --------  ------
  cpu                         0 (0%)    0 (0%)
  memory                      0 (0%)    0 (0%)
  ephemeral-storage           0 (0%)    0 (0%)
  attachable-volumes-aws-ebs  0         0
Events:                       <none>


_______________
The bug is fixed.

Comment 4 Michael Nguyen 2019-11-21 21:50:14 UTC
Thank you Sebastian! Closing as verified.

Comment 6 errata-xmlrpc 2020-01-23 11:13:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

