Bug 1718726

Summary: Worker node Allocatable pods capacity reverted from 500 back to 250 after update and reboot
Product: OpenShift Container Platform
Reporter: Walid A. <wabouham>
Component: Node
Assignee: Robert Krawitz <rkrawitz>
Status: CLOSED ERRATA
QA Contact: Weinan Liu <weinliu>
Severity: high
Priority: high
Version: 4.1.0
CC: amurdaca, aos-bugs, jokerman, mifiedle, mmccomas, rkrawitz, walters
Target Release: 4.2.0
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Doc Text:
Cause: The controller for the kubeletConfig was incorrect. Consequence: if the administrator customized the kubeletConfig, the changes were reverted when the cluster was upgraded to a version that uses a different OS release. Fix: specify the correct controller in the source. Result: customizations to the kubeletConfig are retained.
Last Closed: 2019-10-16 06:31:35 UTC
Type: Bug
Attachments:
- Upgrade trace to 4.2.0-0.okd-2019-07-29-205431
- Node log from worker node corresponding to upgrade trace
- journalctl -u kubelet
- MCO log
- machineconfig for the worker

Comment 1 Seth Jennings 2019-06-10 13:24:42 UTC
Ryan, can you take a look?  Not sure if this is a KubeletConfig controller issue or an MC issue.

Comment 2 Ryan Phillips 2019-06-10 13:49:46 UTC
From looking at this on Friday, it appears there was an RHCOS upgrade.

Next time this issue comes up we need the MCC logs and `oc get kubeletconfigs` output.
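
For reference, a sketch of gathering both (assuming the default MCO namespace and controller deployment name):

$ oc -n openshift-machine-config-operator logs deploy/machine-config-controller > mcc.log
$ oc get kubeletconfigs -o yaml > kubeletconfigs.yaml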

Comment 4 Colin Walters 2019-07-24 15:15:13 UTC
Offhand I think the most likely thing here is a race in the kubelet/render controller which caused the rendered MC to transiently drop the kubelet config.

I'd look at:
`oc -n openshift-machine-config-operator logs deploy/machine-config-controller`
to check.
Or you can look at `oc describe machineconfig/rendered-blah` to see what MCs it used as input.  If it's missing the kubeletconfig one, there's your smoking gun.
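
To pull just the input list, a jsonpath along these lines should work (assuming the 4.x MachineConfigPool schema, where spec.configuration.source lists the source MCs):

$ oc get machineconfigpool worker -o jsonpath='{range .spec.configuration.source[*]}{.name}{"\n"}{end}'

If the kubeletconfig-generated MC is missing from that list, that's the smoking gun.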

Comment 6 Robert Krawitz 2019-07-29 14:26:55 UTC
Did not happen after install of 4.2.0-0.nightly-2019-07-26-234318 and upgrade to 4.2.0-0.nightly-2019-07-28-222114.

Comment 7 Robert Krawitz 2019-07-29 22:32:17 UTC
Reproduced upgrading from 4.2.0-0.okd-2019-07-28-045558 to 4.2.0-0.okd-2019-07-29-205431 (which crosses an OS boundary).

Comment 8 Robert Krawitz 2019-07-29 22:48:40 UTC
The node is labeled correctly as a worker, but has a capacity of only 250 pods:

$ oc describe node ip-10-0-152-211.us-east-2.compute.internal
Name:               ip-10-0-152-211.us-east-2.compute.internal
Roles:              worker
...
Capacity:
 attachable-volumes-aws-ebs:  39
 cpu:                         2
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      8162900Ki
 pods:                        250


The machineconfigpool is labeled correctly:

$ oc describe machineconfigpool worker
Name:         worker
Namespace:    
Labels:       custom-kubelet=large-pods
              machineconfiguration.openshift.io/mco-built-in=
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         MachineConfigPool


The KubeletConfig is correct:

$ oc describe kubeletconfig set-max-pods
Name:         set-max-pods
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         KubeletConfig
Metadata:
  Creation Timestamp:  2019-07-29T21:31:53Z
  Finalizers:
    99-worker-cff08ded-b23d-11e9-804f-02ab32c68f4a-kubelet
    99-worker-cff08ded-b23d-11e9-804f-02ab32c68f4a-kubelet
    99-worker-cff08ded-b23d-11e9-804f-02ab32c68f4a-kubelet
  Generation:        1
  Resource Version:  56071
  Self Link:         /apis/machineconfiguration.openshift.io/v1/kubeletconfigs/set-max-pods
  UID:               48a1237e-b248-11e9-a6e4-0a033c674f6c
Spec:
  Kubelet Config:
    Kube API Burst:  30
    Kube APIQPS:     15
    Max Pods:        750
  Machine Config Pool Selector:
    Match Labels:
      Custom - Kubelet:  large-pods
Status:
  Conditions:
    Last Transition Time:  2019-07-29T21:31:53Z
    Message:               Success
    Status:                True
    Type:                  Success
    Last Transition Time:  2019-07-29T22:17:26Z
    Message:               Success
    Status:                True
    Type:                  Success
    Last Transition Time:  2019-07-29T22:23:52Z
    Message:               Success
    Status:                True
    Type:                  Success
Events:                    <none>

Comment 9 Robert Krawitz 2019-07-29 22:50:49 UTC
Created attachment 1594427 [details]
Upgrade trace to 4.2.0-0.okd-2019-07-29-205431

Comment 10 Antonio Murdaca 2019-07-29 22:51:56 UTC
Robert, can you grab the machine-config-controller pod logs as Colin suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1718726#c4? That could tell us whether the render controller ran without the kubelet config (even though it should eventually generate a new rendered config including it and start a rollout, AFAICT).

Comment 11 Antonio Murdaca 2019-07-29 22:54:18 UTC
The kubelet config MC seems to be there, though:


      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-kubelet

The MCC logs would tell us more for sure

Comment 12 Antonio Murdaca 2019-07-29 22:57:11 UTC
From Robert's output, though, there do indeed seem to be rendered MCs without the kubelet config.

Comment 13 Robert Krawitz 2019-07-29 22:58:59 UTC
Created attachment 1594428 [details]
Node log from worker node corresponding to upgrade trace.

Note that the transition from 750 to 250 pods occurred across the reboot around 22:36 UTC.
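
A way to confirm what the node actually booted with would be to read the rendered kubelet configuration on the host; the /etc/kubernetes/kubelet.conf path below is an assumption about where the MCO writes it:

$ oc debug node/ip-10-0-152-211.us-east-2.compute.internal -- chroot /host grep -i maxpods /etc/kubernetes/kubelet.conf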

Comment 14 Robert Krawitz 2019-07-29 23:02:39 UTC
Created attachment 1594429 [details]
journalctl -u kubelet

Comment 15 Robert Krawitz 2019-07-29 23:03:13 UTC
Created attachment 1594430 [details]
MCO log

Comment 16 Robert Krawitz 2019-07-29 23:03:47 UTC
Created attachment 1594431 [details]
machineconfig for the worker

Comment 17 Antonio Murdaca 2019-07-29 23:48:06 UTC
```
1:40:07 [~] oc get machineconfigs
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   CREATED
00-master                                                   093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
00-worker                                                   093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
01-master-container-runtime                                 093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
01-master-kubelet                                           093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
01-worker-container-runtime                                 093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
01-worker-kubelet                                           093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
99-master-cfef54fd-b23d-11e9-804f-02ab32c68f4a-registries   093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
99-master-ssh                                                                                          2.2.0             3h25m
99-worker-cff08ded-b23d-11e9-804f-02ab32c68f4a-registries   093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
99-worker-ssh                                                                                          2.2.0             3h24m
rendered-master-28f01d3937cd8a942520c72df62f36ea            093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
rendered-master-2ff3158a4ce7ed781e4a0cfda3f429c3            093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             84m
rendered-worker-37078352b9543dc667ea867ea2a3a65c            093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             129m
rendered-worker-a046d622eb6041e65084db6dcbeb9ae2            093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             3h24m
rendered-worker-cf3ba9c821b86b58407223b59e539b48            093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             73m
rendered-worker-e456022d72042350c47c3828e4797f2b            093e96ef4cdbd15ecda18323dadf4d552fcfd327   2.2.0             84m
```

There's no `-kubelet`-suffixed MC generated by the kubelet config controller, it seems (why not?).

The worker MCP is targeting the resulting rendered MC, which doesn't contain the expected kubelet config MC.
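
A quick way to confirm, given the 99-worker-<uid>-kubelet naming visible in the finalizers in comment 8:

$ oc get machineconfigs | grep kubelet

Only the 01-*-kubelet template MCs appear in the listing above; the generated 99-worker-*-kubelet MC is absent.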

Comment 18 Antonio Murdaca 2019-07-30 00:14:10 UTC
It looks from the other rendered-worker MCs like the maxPods setting was indeed there at one point, but the MC carrying it somehow got lost.

Comment 19 Robert Krawitz 2019-08-01 21:58:11 UTC
Fixed via https://github.com/openshift/machine-config-operator/pull/1022.

To test:

1) Install an image with underlying OS _older_ than the OS in the current release.

2) Check the OS version of this release:

$ oc image info -o json $(oc adm release info --image-for=machine-os-content) |jq '.config.config.Labels.version'
"42.80.20190731.2"

3) Increase the max pods (for example) of some or all nodes:

$ oc label machineconfigpool worker custom-kubelet=large-pods
$ oc create -f - <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: large-pods
  kubeletConfig:
    maxPods: 750
    kubeAPIBurst: 30
    kubeAPIQPS: 15
EOF
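
After the KubeletConfig is created, the worker pool should roll out a new rendered config; watching the pool (plain `oc get -w`, nothing bug-specific) is a simple way to see when it settles:

$ oc get machineconfigpool worker -w

Wait until the pool reports updated before checking the nodes.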

4) Wait for the pod capacity on the relevant nodes to increase, and for the nodes to be ready:

$ oc describe node |grep pods:
 pods:                        750
 pods:                        750
 pods:                        250
 pods:                        250
 pods:                        750
 pods:                        750
 pods:                        250
 pods:                        250
 pods:                        750
 pods:                        750
 pods:                        250
 pods:                        250
$ oc get nodes
NAME                                         STATUS   ROLES    AGE    VERSION
ip-10-0-138-155.us-east-2.compute.internal   Ready    worker   160m   v1.14.0+ff340d215
ip-10-0-140-107.us-east-2.compute.internal   Ready    master   165m   v1.14.0+ff340d215
ip-10-0-144-221.us-east-2.compute.internal   Ready    worker   160m   v1.14.0+ff340d215
ip-10-0-147-172.us-east-2.compute.internal   Ready    master   165m   v1.14.0+ff340d215
ip-10-0-163-193.us-east-2.compute.internal   Ready    worker   160m   v1.14.0+ff340d215
ip-10-0-174-195.us-east-2.compute.internal   Ready    master   165m   v1.14.0+ff340d215

5) Perform the upgrade.

6) Wait for the upgrade to complete.  Verify that the OS is different from the old one:

$ oc image info -o json $(oc adm release info --image-for=machine-os-content) |jq '.config.config.Labels.version'
"42.80.20190801.0"

*If the image you upgraded to has the same OS build as the previous image (hence the node OS will not be upgraded), it will not correctly test the bug.*

7) Repeat the check from step (4) -- if the bug is fixed, you'll still see the adjusted max pods.
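
For a quick per-node check (jsonpath only; assumes the workers carry the standard node-role.kubernetes.io/worker label):

$ oc get nodes -l node-role.kubernetes.io/worker -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.capacity.pods}{"\n"}{end}'

Each worker should still report 750.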

Comment 20 Weinan Liu 2019-08-19 12:17:36 UTC
I'm still working on this. Finding an image with an "OS _older_ than the OS in the current release", and getting it to build successfully, will take some time.

Comment 25 errata-xmlrpc 2019-10-16 06:31:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922