Description of problem:

Using a bootstrap node to create a SNO cluster [4.8.0-0.nightly-2021-04-08-124424], I am seeing the following issues:

1. The MCD is running twice on the single node.
2. On install A, there was a rendered config in the 'worker' pool and nothing in the 'master' pool. Custom configurations (KubeletConfigs, etc.) would not be able to create a new machine config, since the pool didn't have any rendered configs.
3. On install B (a subsequent install), both the master and worker pools were missing rendered configs. [1]

[1] https://gist.githubusercontent.com/rphillips/ebd65549fb7ea446906ad9d29d49f6c1/raw/9fdad6d0f4572714f950a6ac0e4175bb5f5366dc/gistfile1.txt

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-04-08-124424

How reproducible:
Behavior differed across 3 separate installs.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
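For reference, a custom configuration of the kind mentioned in issue 2 would look roughly like the following KubeletConfig manifest. This is only an illustrative sketch: the manifest name and maxPods value are made up; the pool selector label is the standard one the MCO puts on the master pool.

# hypothetical KubeletConfig manifest pre-applied at install time
cat <<'EOF' > 99-master-custom-kubelet.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: custom-kubelet            # hypothetical name
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/master: ""
  kubeletConfig:
    maxPods: 500                  # example value only
EOF

With no rendered config in the pool, the kubelet config controller has no base config to layer this on top of, so no generated machine config can be produced.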
So just installing a regular SNO cluster with your reported version, I can see:

[root@yzhang tmp]# oc get nodes
NAME                                        STATUS   ROLES           AGE   VERSION
ip-10-0-128-18.us-west-2.compute.internal   Ready    master,worker   34m   v1.20.0+5f82cdb

[root@yzhang tmp]# oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-722813928e4dc89c4278d6e66ccb1488   True      False      False      1              1                   1                      0                      33m
worker   rendered-worker-349538575201cc727fd5db23354ff784   True      False      False      0              0                   0                      0                      33m

[root@yzhang tmp]# mcopods
NAME                                         READY   STATUS    RESTARTS   AGE
machine-config-controller-5954c58ff6-bprps   1/1     Running   2          31m
machine-config-daemon-264bp                  2/2     Running   0          33m
machine-config-operator-776c745668-zg7xx     1/1     Running   2          38m
machine-config-server-wwvgg                  1/1     Running   0          31m

So in regular operation, there should be:
1. one MCD running
2. rendered configs for both master and worker
3. an empty worker pool

It sounds to me like the kubeletconfig likely failed generation, which in turn caused the MCC to fail to generate the rendered config. Do you have any logs from the failed runs?
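Something along these lines should capture the relevant state (a sketch using standard oc commands; the deployment and daemonset names are the usual MCO ones):

oc -n openshift-machine-config-operator logs deployment/machine-config-controller
oc -n openshift-machine-config-operator logs ds/machine-config-daemon -c machine-config-daemon
oc get kubeletconfig -o yaml
oc get mcp,mc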
I tried pre-applying the kubeletconfig manifest before install and I was able to reproduce:

[root@yzhang 04-08]# oc get nodes
NAME                                         STATUS   ROLES           AGE   VERSION
ip-10-0-153-126.us-west-1.compute.internal   Ready    master,worker   43m   v1.20.0+5f82cdb

[root@yzhang 04-08]# oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master                                                      False     True       True       1              0                   0                      1                      41m
worker   rendered-worker-1eecc621170fe6d6fa350361bef8b00e   True      False      False      0              0                   0                      0                      41m

[root@yzhang 04-08]# mcopods
NAME                                         READY   STATUS    RESTARTS   AGE
machine-config-controller-5954c58ff6-k95bz   1/1     Running   3          39m
machine-config-daemon-lmpqr                  2/2     Running   0          41m
machine-config-operator-66c4ddf8df-n6njp     1/1     Running   3          49m
machine-config-server-qk4pk                  1/1     Running   0          39m

The master pool shows no config, but one does in fact get generated:

# oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          ca8fcb887f153604556f2706cee8a33165879797   3.2.0             42m
00-worker                                          ca8fcb887f153604556f2706cee8a33165879797   3.2.0             42m
01-master-container-runtime                        ca8fcb887f153604556f2706cee8a33165879797   3.2.0             42m
01-master-kubelet                                  ca8fcb887f153604556f2706cee8a33165879797   3.2.0             42m
01-worker-container-runtime                        ca8fcb887f153604556f2706cee8a33165879797   3.2.0             42m
01-worker-kubelet                                  ca8fcb887f153604556f2706cee8a33165879797   3.2.0             42m
99-master-generated-kubelet                        ca8fcb887f153604556f2706cee8a33165879797   3.2.0             42m
99-master-generated-registries                     ca8fcb887f153604556f2706cee8a33165879797   3.2.0             42m
99-master-ssh                                                                                 3.2.0             52m
99-worker-generated-registries                     ca8fcb887f153604556f2706cee8a33165879797   3.2.0             42m
99-worker-ssh                                                                                 3.2.0             52m
rendered-master-4d4cda0f8a9e404f0e3d170f9e48201f   ca8fcb887f153604556f2706cee8a33165879797   3.2.0             42m
rendered-worker-1eecc621170fe6d6fa350361bef8b00e   ca8fcb887f153604556f2706cee8a33165879797   3.2.0             42m

The pool just has no config to report since the node isn't on any config. This is due to the kubeletconfig failing to get rendered during bootstrap. In the MCD logs I see:

I0409 00:14:22.993466    9851 daemon.go:769] In bootstrap mode
E0409 00:14:22.993573    9851 writer.go:135] Marking Degraded due to: machineconfig.machineconfiguration.openshift.io "rendered-master-1551420a8a62d1a81f239737550b6cfd" not found

So this seems more in line with your first try. I still believe the correct way to solve this is to have the kubeletconfigcontroller handle bootstrap manifests. We have this for the containerruntimecontroller but not the kubeletcontroller (see https://github.com/openshift/machine-config-operator/pull/1866).

Moving over to the Node team to take a look.
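One way to confirm this failure mode (a sketch; <node-name> is a placeholder) is to compare the config the MCD is looking for with the rendered configs that actually exist:

# the desiredConfig annotation is what the MCD tries to find
oc get node <node-name> -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/desiredConfig}{"\n"}'
# list what the controller actually rendered
oc get mc | grep rendered-master

In the failing run above, the daemon was looking for rendered-master-1551420a8a62d1a81f239737550b6cfd while only rendered-master-4d4cda0f8a9e404f0e3d170f9e48201f exists, so it marks the node degraded.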
Verified on version: 4.8.0-0.nightly-2021-04-18-101412

# oc get node
NAME      STATUS   ROLES           AGE     VERSION
sno-0-0   Ready    master,worker   3h45m   v1.21.0-rc.0+2993be8

# oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-768ee516526793d07d6875602ae3e2c6   True      False      False      1              1                   1                      0                      3h43m
worker   rendered-worker-c3a3ddff445e0a8fcecd38e7225b3d85   True      False      False      0              0                   0                      0                      3h43m

# oc get pod -n openshift-machine-config-operator
NAME                                         READY   STATUS    RESTARTS   AGE
machine-config-controller-8676847cf8-8wzhd   1/1     Running   4          4h5m
machine-config-daemon-7tq92                  2/2     Running   0          4h7m
machine-config-operator-84bd546ffc-48qsk     1/1     Running   5          4h18m
machine-config-server-hdqjp                  1/1     Running   0          3h54m
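For anyone re-verifying, a quick post-install check (a sketch using standard oc commands; 99-master-generated-kubelet is the name the controller gives the MC generated from a master KubeletConfig):

# the pool should report Degraded=False and carry a rendered config
oc get mcp master -o jsonpath='{.status.conditions[?(@.type=="Degraded")].status}{"\n"}'
# the generated kubelet MC should exist when a KubeletConfig was pre-applied
oc get mc 99-master-generated-kubelet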
*** Bug 1951009 has been marked as a duplicate of this bug. ***
Can this also be fixed for 4.7?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438