Bug 1718726

Summary: Worker node Allocatable pods capacity reverted from 500 back to 250 after update and reboot
Product: OpenShift Container Platform
Reporter: Walid A. <wabouham>
Component: Node
Assignee: Robert Krawitz <rkrawitz>
Status: CLOSED ERRATA
QA Contact: Weinan Liu <weinliu>
Severity: high
Priority: high
Version: 4.1.0
CC: amurdaca, aos-bugs, jokerman, mifiedle, mmccomas, rkrawitz, walters
Target Release: 4.2.0
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Doc Text:
Cause: The controller for the kubeletConfig was incorrect. Consequence: if the administrator customized the kubeletConfig, the changes were reverted when the cluster was upgraded to a version that uses a different OS release. Fix: specify the correct controller in the source. Result: customizations to the kubeletConfig are retained.
Last Closed: 2019-10-16 06:31:35 UTC
Type: Bug
Attachments:
- Upgrade trace to 4.2.0-0.okd-2019-07-29-205431
- Node log from worker node corresponding to upgrade trace
- journalctl -u kubelet
- MCO log
- machineconfig for the worker

Comment 1 Seth Jennings 2019-06-10 13:24:42 UTC
Ryan, can you take a look?  Not sure if this is a KubeletConfig controller issue or an MC issue.

Comment 2 Ryan Phillips 2019-06-10 13:49:46 UTC
From looking at this on Friday, it appears there was an RHCOS upgrade.

Next time this issue comes up we need the MCC logs and `oc get kubeletconfigs` output.
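
For reference, a sketch of gathering both (assuming the default MCO namespace and controller deployment name):

$ oc -n openshift-machine-config-operator logs deploy/machine-config-controller > mcc.log
$ oc get kubeletconfigs -o yaml > kubeletconfigs.yaml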

Comment 4 Colin Walters 2019-07-24 15:15:13 UTC
Offhand I think the most likely thing here is a race in the kubelet/render controller which caused the rendered MC to transiently drop the kubelet config.

I'd look at:
`oc -n openshift-machine-config-operator logs deploy/machine-config-controller`
to check.
Or you can look at `oc describe machineconfig/rendered-blah` to see what MCs it used as input.  If it's missing the kubeletconfig one, there's your smoking gun.
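
To pull just the input list, a jsonpath along these lines should work (assuming the 4.x MachineConfigPool schema, where spec.configuration.source lists the source MCs):

$ oc get machineconfigpool worker -o jsonpath='{range .spec.configuration.source[*]}{.name}{"\n"}{end}'

If the kubeletconfig-generated MC is missing from that list, that's the smoking gun.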

Comment 6 Robert Krawitz 2019-07-29 14:26:55 UTC
Did not happen after install of 4.2.0-0.nightly-2019-07-26-234318 and upgrade to 4.2.0-0.nightly-2019-07-28-222114.

Comment 7 Robert Krawitz 2019-07-29 22:32:17 UTC
Reproduced upgrading from 4.2.0-0.okd-2019-07-28-045558 to 4.2.0-0.okd-2019-07-29-205431 (which crosses an OS boundary).

Comment 8 Robert Krawitz 2019-07-29 22:48:40 UTC
The node is labeled correctly as a worker, but has a capacity of only 250 pods:

$ oc describe node ip-10-0-152-211.us-east-2.compute.internal
Name:               ip-10-0-152-211.us-east-2.compute.internal
Roles:              worker
...
Capacity:
 attachable-volumes-aws-ebs:  39
 cpu:                         2
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      8162900Ki
 pods:                        250


The machineconfigpool is labeled correctly:

$ oc describe machineconfigpool worker
Name:         worker
Namespace:    
Labels:       custom-kubelet=large-pods
              machineconfiguration.openshift.io/mco-built-in=
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         MachineConfigPool


The KubeletConfig is correct:

$ oc describe kubeletconfig set-max-pods
Name:         set-max-pods
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         KubeletConfig
Metadata:
  Creation Timestamp:  2019-07-29T21:31:53Z
  Finalizers:
    99-worker-cff08ded-b23d-11e9-804f-02ab32c68f4a-kubelet
    99-worker-cff08ded-b23d-11e9-804f-02ab32c68f4a-kubelet
    99-worker-cff08ded-b23d-11e9-804f-02ab32c68f4a-kubelet
  Generation:        1
  Resource Version:  56071
  Self Link:         /apis/machineconfiguration.openshift.io/v1/kubeletconfigs/set-max-pods
  UID:               48a1237e-b248-11e9-a6e4-0a033c674f6c
Spec:
  Kubelet Config:
    Kube API Burst:  30
    Kube APIQPS:     15
    Max Pods:        750
  Machine Config Pool Selector:
    Match Labels:
      Custom - Kubelet:  large-pods
Status:
  Conditions:
    Last Transition Time:  2019-07-29T21:31:53Z
    Message:               Success
    Status:                True
    Type:                  Success
    Last Transition Time:  2019-07-29T22:17:26Z
    Message:               Success
    Status:                True
    Type:                  Success
    Last Transition Time:  2019-07-29T22:23:52Z
    Message:               Success
    Status:                True
    Type:                  Success
Events:                    <none>

Comment 9 Robert Krawitz 2019-07-29 22:50:49 UTC
Created attachment 1594427 [details]
Upgrade trace to 4.2.0-0.okd-2019-07-29-205431

Comment 10 Antonio Murdaca 2019-07-29 22:51:56 UTC
Robert, can you grab the machine-config-controller pod logs as Colin suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1718726#c4? That could tell us whether the render controller ran without the kubelet config (even though it should eventually generate a new rendered config including it and start a rollout, AFAICT).

Comment 11 Antonio Murdaca 2019-07-29 22:54:18 UTC
The kubelet config MC seems to be there, though:


      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-kubelet

The MCC logs would tell us more for sure

Comment 12 Antonio Murdaca 2019-07-29 22:57:11 UTC
From Robert's output, though, there do indeed seem to be rendered MCs without the kubelet config.

Comment 13 Robert Krawitz 2019-07-29 22:58:59 UTC
Created attachment 1594428 [details]
Node log from worker node corresponding to upgrade trace.

Note that the transition from 750 to 250 pods occurred across the reboot around 22:36 UTC.
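
A way to confirm what the node actually booted with would be to read the rendered kubelet configuration on the host; the /etc/kubernetes/kubelet.conf path below is an assumption about where the MCO writes it:

$ oc debug node/ip-10-0-152-211.us-east-2.compute.internal -- chroot /host grep -i maxpods /etc/kubernetes/kubelet.conf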

Comment 14 Robert Krawitz 2019-07-29 23:02:39 UTC
Created attachment 1594429 [details]
journalctl -u kubelet

Comment 15 Robert Krawitz 2019-07-29 23:03:13 UTC
Created attachment 1594430 [details]
MCO log

Comment 16 Robert Krawitz 2019-07-29 23:03:47 UTC
Created attachment 1594431 [details]
machineconfig for the worker

Comment 17 Antonio Murdaca 2019-07-29 23:48:06 UTC
```
1:40:07 [~] oc get machineconfigs
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   CREATED
00-master                                                   093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
00-worker                                                   093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
01-master-container-runtime                                 093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
01-master-kubelet                                           093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
01-worker-container-runtime                                 093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
01-worker-kubelet                                           093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
99-master-cfef54fd-b23d-11e9-804f-02ab32c68f4a-registries   093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
99-master-ssh                                                                                          2.2.0             3h25m
99-worker-cff08ded-b23d-11e9-804f-02ab32c68f4a-registries   093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
99-worker-ssh                                                                                          2.2.0             3h24m
rendered-master-28f01d3937cd8a942520c72df62f36ea            093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
rendered-master-2ff3158a4ce7ed781e4a0cfda3f429c3            093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             84m
rendered-worker-37078352b9543dc667ea867ea2a3a65c            093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             129m
rendered-worker-a046d622eb6041e65084db6dcbeb9ae2            093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
rendered-worker-cf3ba9c821b86b58407223b59e539b48            093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             73m
rendered-worker-e456022d72042350c47c3828e4797f2b            093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             84m
```

There's no `-kubelet`-suffixed MC generated by the kubelet config controller, it seems (why not?).

The worker MCP is targeting the resulting rendered MC, which doesn't contain the expected kubelet config MC.
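
A quick way to confirm, given the 99-worker-<uid>-kubelet naming visible in the finalizers in comment 8:

$ oc get machineconfigs | grep kubelet

Only the 01-*-kubelet template MCs appear in the listing above; the generated 99-worker-*-kubelet MC is absent.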

Comment 18 Antonio Murdaca 2019-07-30 00:14:10 UTC
It looks from the other rendered-worker MCs like the maxPods setting was indeed there at one point, but the MC carrying it somehow got lost.

Comment 19 Robert Krawitz 2019-08-01 21:58:11 UTC
Fixed via https://github.com/openshift/machine-config-operator/pull/1022.

To test:

1) Install an image with underlying OS _older_ than the OS in the current release.

2) Check the OS version of this release:

$ oc image info -o json $(oc adm release info --image-for=machine-os-content) |jq '.config.config.Labels.version'
"42.80.20190731.2"

3) Increase the max pods (for example) of some or all nodes:

$ oc label machineconfigpool worker custom-kubelet=large-pods
$ oc create -f - <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: large-pods
  kubeletConfig:
    maxPods: 750
    kubeAPIBurst: 30
    kubeAPIQPS: 15
EOF
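
After the KubeletConfig is created, the worker pool should roll out a new rendered config; watching the pool (plain `oc get -w`, nothing bug-specific) is a simple way to see when it settles:

$ oc get machineconfigpool worker -w

Wait until the pool reports updated before checking the nodes.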

4) Wait for the pod capacity on the relevant nodes to increase, and for the nodes to be ready:

$ oc describe node |grep pods:
 pods:                        750
 pods:                        750
 pods:                        250
 pods:                        250
 pods:                        750
 pods:                        750
 pods:                        250
 pods:                        250
 pods:                        750
 pods:                        750
 pods:                        250
 pods:                        250
$ oc get nodes
NAME                                         STATUS   ROLES    AGE    VERSION
ip-10-0-138-155.us-east-2.compute.internal   Ready    worker   160m   v1.14.0+ff340d215
ip-10-0-140-107.us-east-2.compute.internal   Ready    master   165m   v1.14.0+ff340d215
ip-10-0-144-221.us-east-2.compute.internal   Ready    worker   160m   v1.14.0+ff340d215
ip-10-0-147-172.us-east-2.compute.internal   Ready    master   165m   v1.14.0+ff340d215
ip-10-0-163-193.us-east-2.compute.internal   Ready    worker   160m   v1.14.0+ff340d215
ip-10-0-174-195.us-east-2.compute.internal   Ready    master   165m   v1.14.0+ff340d215

5) Perform the upgrade.

6) Wait for the upgrade to complete.  Verify that the OS is different from the old one:

$ oc image info -o json $(oc adm release info --image-for=machine-os-content) |jq '.config.config.Labels.version'
"42.80.20190801.0"

*If the image you upgraded to has the same OS build as the previous image (hence the node OS will not be upgraded), it will not correctly test the bug.*

7) Repeat the check from step (4) -- if the bug is fixed, you'll still see the adjusted max pods.
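
For a quick per-node check (jsonpath only; assumes the workers carry the standard node-role.kubernetes.io/worker label):

$ oc get nodes -l node-role.kubernetes.io/worker -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.capacity.pods}{"\n"}{end}'

Each worker should still report 750.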

Comment 20 Weinan Liu 2019-08-19 12:17:36 UTC
I'm still working on this. Finding an image with an "OS _older_ than the OS in the current release", and getting it to build successfully, will take some time.

Comment 25 errata-xmlrpc 2019-10-16 06:31:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922