Bug 1718726 - Worker node Allocatable pods capacity reverted from 500 back to 250 after update and reboot
Summary: Worker node Allocatable pods capacity reverted from 500 back to 250 after update and reboot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.1.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Robert Krawitz
QA Contact: Weinan Liu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-06-10 05:51 UTC by Walid A.
Modified: 2019-10-28 13:38 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The controller for the kubeletConfig was incorrect. Consequence: If the kubeletConfig was customized by the administrator, the changes were reverted when the cluster was upgraded to a version that uses a different OS release. Fix: Specify the correct controller in the source. Result: Customizations to the kubeletConfig are retained across upgrades.
Clone Of:
Environment:
Last Closed: 2019-10-16 06:31:35 UTC
Target Upstream Version:
Embargoed:


Attachments
Upgrade trace to 4.2.0-0.okd-2019-07-29-205431 (643.52 KB, text/plain)
2019-07-29 22:50 UTC, Robert Krawitz
Node log from worker node corresponding to upgrade trace. (3.93 MB, text/plain)
2019-07-29 22:58 UTC, Robert Krawitz
journalctl -u kubelet (17.51 MB, text/plain)
2019-07-29 23:02 UTC, Robert Krawitz
MCO log (12.64 KB, text/plain)
2019-07-29 23:03 UTC, Robert Krawitz
machineconfig for the worker (82.85 KB, text/plain)
2019-07-29 23:03 UTC, Robert Krawitz


Links
Red Hat Product Errata RHBA-2019:2922 (last updated 2019-10-16 06:31:52 UTC)

Comment 1 Seth Jennings 2019-06-10 13:24:42 UTC
Ryan, can you take a look?  Not sure if this is a KubeletConfig controller issue or an MC issue.

Comment 2 Ryan Phillips 2019-06-10 13:49:46 UTC
From what I can tell after looking at this on Friday, it looks like there was an RHCOS upgrade.

Next time this issue comes up we need the MCC logs and `oc get kubeletconfigs` output.
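For reference, a minimal sketch of collecting that data (the output file names are just illustrative):

# Machine-config-controller logs:
$ oc -n openshift-machine-config-operator logs deploy/machine-config-controller > mcc.log
# KubeletConfig objects, with full spec and status:
$ oc get kubeletconfigs -o yaml > kubeletconfigs.yaml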

Comment 4 Colin Walters 2019-07-24 15:15:13 UTC
Offhand I think the most likely thing here is a race in the kubelet/render controller which caused the rendered MC to transiently drop the kubelet config.

I'd look at:
`oc -n openshift-machine-config-operator logs deploy/machine-config-controller`
to check.
Or you can look at `oc describe machineconfig/rendered-blah` to see what MCs it used as input.  If it's missing the kubeletconfig one, there's your smoking gun.
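Concretely, something like this (a sketch; the inputs to the rendered MC are also recorded in the worker pool's status, assuming the usual MachineConfigPool status layout):

# Rendered MC the worker pool is currently on:
$ oc get machineconfigpool worker -o jsonpath='{.status.configuration.name}{"\n"}'
# MCs merged into it; the kubeletconfig-generated 99-worker-*-kubelet MC should be in this list:
$ oc get machineconfigpool worker -o jsonpath='{range .status.configuration.source[*]}{.name}{"\n"}{end}'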

Comment 6 Robert Krawitz 2019-07-29 14:26:55 UTC
Did not happen after install of 4.2.0-0.nightly-2019-07-26-234318 and upgrade to 4.2.0-0.nightly-2019-07-28-222114.

Comment 7 Robert Krawitz 2019-07-29 22:32:17 UTC
Reproduced upgrading from 4.2.0-0.okd-2019-07-28-045558 to 4.2.0-0.okd-2019-07-29-205431 (which crosses an OS boundary).

Comment 8 Robert Krawitz 2019-07-29 22:48:40 UTC
The node is labeled correctly as a worker, but has a capacity of only 250 pods:

$ oc describe node ip-10-0-152-211.us-east-2.compute.internal
Name:               ip-10-0-152-211.us-east-2.compute.internal
Roles:              worker
...
Capacity:
 attachable-volumes-aws-ebs:  39
 cpu:                         2
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      8162900Ki
 pods:                        250


The machineconfigpool is labeled correctly:

$ oc describe machineconfigpool worker
Name:         worker
Namespace:    
Labels:       custom-kubelet=large-pods
              machineconfiguration.openshift.io/mco-built-in=
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         MachineConfigPool


The KubeletConfig is correct:

$ oc describe kubeletconfig set-max-pods
Name:         set-max-pods
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         KubeletConfig
Metadata:
  Creation Timestamp:  2019-07-29T21:31:53Z
  Finalizers:
    99-worker-cff08ded-b23d-11e9-804f-02ab32c68f4a-kubelet
    99-worker-cff08ded-b23d-11e9-804f-02ab32c68f4a-kubelet
    99-worker-cff08ded-b23d-11e9-804f-02ab32c68f4a-kubelet
  Generation:        1
  Resource Version:  56071
  Self Link:         /apis/machineconfiguration.openshift.io/v1/kubeletconfigs/set-max-pods
  UID:               48a1237e-b248-11e9-a6e4-0a033c674f6c
Spec:
  Kubelet Config:
    Kube API Burst:  30
    Kube APIQPS:     15
    Max Pods:        750
  Machine Config Pool Selector:
    Match Labels:
      Custom - Kubelet:  large-pods
Status:
  Conditions:
    Last Transition Time:  2019-07-29T21:31:53Z
    Message:               Success
    Status:                True
    Type:                  Success
    Last Transition Time:  2019-07-29T22:17:26Z
    Message:               Success
    Status:                True
    Type:                  Success
    Last Transition Time:  2019-07-29T22:23:52Z
    Message:               Success
    Status:                True
    Type:                  Success
Events:                    <none>

Comment 9 Robert Krawitz 2019-07-29 22:50:49 UTC
Created attachment 1594427 [details]
Upgrade trace to 4.2.0-0.okd-2019-07-29-205431

Comment 10 Antonio Murdaca 2019-07-29 22:51:56 UTC
Robert, can you grab the machine-config-controller pod logs, as Colin suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1718726#c4? That could tell us whether the render controller ran without the kubelet config (even though, AFAICT, it should at some point generate a new rendered MC with it and start a rollout).

Comment 11 Antonio Murdaca 2019-07-29 22:54:18 UTC
The kubelet config MC does seem to be there, though:


      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-kubelet

The MCC logs would tell us more for sure

Comment 12 Antonio Murdaca 2019-07-29 22:57:11 UTC
From Robert's output, though, there do indeed seem to be rendered MCs without the kubelet config.

Comment 13 Robert Krawitz 2019-07-29 22:58:59 UTC
Created attachment 1594428 [details]
Node log from worker node corresponding to upgrade trace.

Note that the transition from 750 to 250 pods occurred across the reboot around 22:36 UTC.
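To see which value the node actually came up with after the reboot, one could read the kubelet config file on the node itself (a sketch; /etc/kubernetes/kubelet.conf is assumed to be where the MCO-managed kubelet config lives on these nodes):

$ oc debug node/ip-10-0-152-211.us-east-2.compute.internal -- chroot /host grep maxPods /etc/kubernetes/kubelet.conf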

Comment 14 Robert Krawitz 2019-07-29 23:02:39 UTC
Created attachment 1594429 [details]
journalctl -u kubelet

Comment 15 Robert Krawitz 2019-07-29 23:03:13 UTC
Created attachment 1594430 [details]
MCO log

Comment 16 Robert Krawitz 2019-07-29 23:03:47 UTC
Created attachment 1594431 [details]
machineconfig for the worker

Comment 17 Antonio Murdaca 2019-07-29 23:48:06 UTC
```
1:40:07 [~] oc get machineconfigs
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   CREATED
00-master                                                   093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
00-worker                                                   093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
01-master-container-runtime                                 093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
01-master-kubelet                                           093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
01-worker-container-runtime                                 093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
01-worker-kubelet                                           093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
99-master-cfef54fd-b23d-11e9-804f-02ab32c68f4a-registries   093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
99-master-ssh                                                                                          2.2.0             3h25m
99-worker-cff08ded-b23d-11e9-804f-02ab32c68f4a-registries   093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
99-worker-ssh                                                                                          2.2.0             3h24m
rendered-master-28f01d3937cd8a942520c72df62f36ea            093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
rendered-master-2ff3158a4ce7ed781e4a0cfda3f429c3            093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             84m
rendered-worker-37078352b9543dc667ea867ea2a3a65c            093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             129m
rendered-worker-a046d622eb6041e65084db6dcbeb9ae2            093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
rendered-worker-cf3ba9c821b86b58407223b59e539b48            093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             73m
rendered-worker-e456022d72042350c47c3828e4797f2b            093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             84m
```

There's no `-kubelet`-suffixed MC generated by the kubelet config controller, it seems (why?).

The worker MCP is targeting the resulting MC which doesn't contain the expected kubelet config MC.

Comment 18 Antonio Murdaca 2019-07-30 00:14:10 UTC
Looking at the other rendered-worker MCs, it looks like the maxPods setting was indeed there at one point, but somehow the MC carrying it got lost.
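A crude way to check which rendered-worker MCs still carry the setting (a sketch; it assumes the embedded kubelet.conf is URL-encoded plainly enough for grep to find the field name):

$ for mc in $(oc get machineconfigs -o name | grep rendered-worker); do
    echo -n "$mc: "; oc get "$mc" -o yaml | grep -c maxPods
  done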

Comment 19 Robert Krawitz 2019-08-01 21:58:11 UTC
Fixed via https://github.com/openshift/machine-config-operator/pull/1022.

To test:

1) Install a release image whose underlying OS is _older_ than the OS in the current release (the one you will upgrade to).

2) Check the OS version of this release:

$ oc image info -o json $(oc adm release info --image-for=machine-os-content) |jq '.config.config.Labels.version'
"42.80.20190731.2"

3) Increase the max pods (for example) of some or all nodes:

$ oc label machineconfigpool worker custom-kubelet=large-pods
$ oc create -f - <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: large-pods
  kubeletConfig:
    maxPods: 750
    kubeAPIBurst: 30
    kubeAPIQPS: 15
EOF
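
Before moving on, it can help to watch the change roll out (a sketch; the UPDATED/UPDATING columns are part of the standard machineconfigpool listing):

$ oc get kubeletconfig set-max-pods
$ oc get machineconfigpool worker    # wait until UPDATED goes back to True and UPDATING to False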

4) Wait for the pods capacity on the relevant nodes to increase, and for the nodes to be Ready:

$ oc describe node |grep pods:
 pods:                        750
 pods:                        750
 pods:                        250
 pods:                        250
 pods:                        750
 pods:                        750
 pods:                        250
 pods:                        250
 pods:                        750
 pods:                        750
 pods:                        250
 pods:                        250
$ oc get nodes
NAME                                         STATUS   ROLES    AGE    VERSION
ip-10-0-138-155.us-east-2.compute.internal   Ready    worker   160m   v1.14.0+ff340d215
ip-10-0-140-107.us-east-2.compute.internal   Ready    master   165m   v1.14.0+ff340d215
ip-10-0-144-221.us-east-2.compute.internal   Ready    worker   160m   v1.14.0+ff340d215
ip-10-0-147-172.us-east-2.compute.internal   Ready    master   165m   v1.14.0+ff340d215
ip-10-0-163-193.us-east-2.compute.internal   Ready    worker   160m   v1.14.0+ff340d215
ip-10-0-174-195.us-east-2.compute.internal   Ready    master   165m   v1.14.0+ff340d215

5) Perform the upgrade.
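For example (a sketch; the release image pullspec is a placeholder, and `--force` is typically needed for unsigned nightly/OKD payloads):

$ oc adm upgrade --to-image=<target-release-image-pullspec> --force
# Watch progress:
$ oc get clusterversion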

6) Wait for the upgrade to complete.  Verify that the OS is different from the old one:

$ oc image info -o json $(oc adm release info --image-for=machine-os-content) |jq '.config.config.Labels.version'
"42.80.20190801.0"

*If the image you upgraded to has the same OS build as the previous image (hence the node OS will not be upgraded), it will not correctly test the bug.*

7) Repeat the check from step (4) -- if the bug is fixed, you'll still see the adjusted max pods.
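For example, a sketch of the post-upgrade check (the second command just confirms the generated kubelet MC survived the upgrade):

$ oc describe node |grep pods:                 # worker nodes should still report 750
$ oc get machineconfigs | grep -- '-kubelet'   # 01-*-kubelet plus the generated 99-worker-*-kubelet MC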

Comment 20 Weinan Liu 2019-08-19 12:17:36 UTC
I'm still working on this. Finding an "OS _older_ than the OS in the current release" and getting it built successfully will take some time.

Comment 25 errata-xmlrpc 2019-10-16 06:31:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

