Description of problem:
Some tuned pods could not be started in the cluster (reported by anli, Jan 16, OCP upgrade testing: 4.1 nightly -> 4.2 nightly -> 4.3.0 RC). From the log, the pods were trying to mount a tuned secret that does not exist.

Version-Release number of selected component (if applicable):

How reproducible:
Seen only once so far.
[anli@preserve-docker-slave 77533]$ oc get pods
NAME                                            READY   STATUS              RESTARTS   AGE
cluster-node-tuning-operator-7876c9588f-zvrbq   1/1     Running             1          132m
tuned-5cv7v                                     0/1     ContainerCreating   0          169m
tuned-79c8s                                     1/1     Running             1          169m
tuned-9hb29                                     0/1     ContainerCreating   0          169m
tuned-fjngv                                     0/1     ContainerCreating   0          169m
tuned-fk2l9                                     1/1     Running             1          169m
tuned-hz2cx                                     1/1     Running             1          169m
tuned-lz52m                                     1/1     Running             1          169m
tuned-qcjdh                                     0/1     ContainerCreating   0          169m
tuned-rzk77                                     0/1     ContainerCreating   0          169m

Normal   Created      169m                   kubelet, ip-10-0-60-171.us-east-2.compute.internal  Created container tuned
Normal   Started      169m                   kubelet, ip-10-0-60-171.us-east-2.compute.internal  Started container tuned
Warning  FailedMount  101m (x22 over 129m)   kubelet, ip-10-0-60-171.us-east-2.compute.internal  MountVolume.SetUp failed for volume "tuned-token-ml7gx" : secret "tuned-token-ml7gx" not found
Warning  FailedMount  78m                    kubelet, ip-10-0-60-171.us-east-2.compute.internal  Unable to attach or mount volumes: unmounted volumes=[tuned-token-ml7gx], unattached volumes=[tuned-token-ml7gx etc-tuned-recommend var-lib-tuned-profiles-data sys var-run-dbus run-systemd-system lib-modules]: timed out waiting for the condition
Warning  FailedMount  58m (x4 over 92m)      kubelet, ip-10-0-60-171.us-east-2.compute.internal  Unable to attach or mount volumes: unmounted volumes=[tuned-token-ml7gx], unattached volumes=[etc-tuned-recommend var-lib-tuned-profiles-data sys var-run-dbus run-systemd-system lib-modules tuned-token-ml7gx]: timed out waiting for the condition
Warning  FailedMount  53m (x5 over 94m)      kubelet, ip-10-0-60-171.us-east-2.compute.internal  Unable to attach or mount volumes: unmounted volumes=[tuned-token-ml7gx], unattached volumes=[var-run-dbus run-systemd-system lib-modules tuned-token-ml7gx etc-tuned-recommend var-lib-tuned-profiles-data sys]: timed out waiting for the condition
Warning  FailedMount  28m (x5 over 87m)      kubelet, ip-10-0-60-171.us-east-2.compute.internal  Unable to attach or mount volumes: unmounted volumes=[tuned-token-ml7gx], unattached volumes=[var-lib-tuned-profiles-data sys var-run-dbus run-systemd-system lib-modules tuned-token-ml7gx etc-tuned-recommend]: timed out waiting for the condition
Warning  FailedMount  13m (x50 over 99m)     kubelet, ip-10-0-60-171.us-east-2.compute.internal  MountVolume.SetUp failed for volume "tuned-token-ml7gx" : secret "tuned-token-ml7gx" not found
Warning  FailedMount  8m25s (x7 over 97m)    kubelet, ip-10-0-60-171.us-east-2.compute.internal  Unable to attach or mount volumes: unmounted volumes=[tuned-token-ml7gx], unattached volumes=[run-systemd-system lib-modules tuned-token-ml7gx etc-tuned-recommend var-lib-tuned-profiles-data sys var-run-dbus]: timed out waiting for the condition
Warning  FailedMount  3m50s (x6 over 83m)    kubelet, ip-10-0-60-171.us-east-2.compute.internal  Unable to attach or mount volumes: unmounted volumes=[tuned-token-ml7gx], unattached volumes=[sys var-run-dbus run-systemd-system lib-modules tuned-token-ml7gx etc-tuned-recommend var-lib-tuned-profiles-data]: timed out waiting for the condition
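One way to confirm that the events point at a missing secret is sketched below. This is only an illustration, not a command from the original report: missing_secrets is a hypothetical helper that pulls secret names out of "secret ... not found" event messages, the namespace is the one named in the release note in this bug, and <tuned-pod> is a placeholder for one of the stuck pods.

```shell
#!/bin/sh
# Namespace as given in the release note in this bug (assumption: the
# pods listed above live there).
NS=openshift-cluster-node-tuning-operator

# missing_secrets (hypothetical helper): reads event text on stdin and
# prints each distinct secret name reported as 'secret "NAME" not found'.
missing_secrets() {
    sed -n 's/.*secret "\([^"]*\)" not found.*/\1/p' | sort -u
}

# Example usage (the pod name is a placeholder, fill in a stuck pod):
#   oc describe pod <tuned-pod> -n "$NS" | missing_secrets |
#   while read -r s; do
#       oc get secret "$s" -n "$NS"   # an error here confirms it is gone
#   done
```

If oc get secret reports the token secret as missing while the pod spec still references it, that matches the FailedMount events shown above.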
Workaround:
Delete the tuned daemonset to get working pods.

oc delete ds/tuned
Release note:
When upgrading a cluster from 4.1 to 4.2 to 4.3, Node Tuning Operator tuned pods can get stuck in the ContainerCreating state.

Confirming the issue:
- oc get pods -n openshift-cluster-node-tuning-operator
- One or more tuned pods are stuck in the ContainerCreating state

Workaround to resolve the issue:
- oc delete daemonset/tuned -n openshift-cluster-node-tuning-operator
- oc get daemonset/tuned -n openshift-cluster-node-tuning-operator
- oc get pods -n openshift-cluster-node-tuning-operator
- Verify the pods are now in the Running state
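The confirm-and-workaround steps above could be wrapped in a small script along these lines. This is a rough sketch, not an official procedure: stuck_tuned_pods is a hypothetical helper, the "apply" argument is an invented guard so nothing is deleted by accident, and the column positions assume the default oc get pods output (NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# Namespace taken from the release note above.
NS=openshift-cluster-node-tuning-operator

# stuck_tuned_pods (hypothetical helper): reads `oc get pods --no-headers`
# output on stdin and prints the names of tuned pods whose STATUS column
# (third field) is ContainerCreating.
stuck_tuned_pods() {
    awk '$1 ~ /^tuned-/ && $3 == "ContainerCreating" { print $1 }'
}

# Only touch the cluster when run as: ./script.sh apply
if [ "${1:-}" = "apply" ]; then
    stuck=$(oc get pods -n "$NS" --no-headers | stuck_tuned_pods)
    if [ -n "$stuck" ]; then
        echo "Stuck tuned pods:"
        echo "$stuck"
        # Workaround from the release note: delete the daemonset, then
        # check the daemonset and its pods again.
        oc delete daemonset/tuned -n "$NS"
        oc get daemonset/tuned -n "$NS"
        oc get pods -n "$NS"
    else
        echo "No tuned pods stuck in ContainerCreating."
    fi
fi
```

The final oc get pods only shows the state right after the delete; the pods may need a moment to reach Running, so the last verification step from the release note still has to be done by hand.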
Hit this issue again after upgrading from 4.2.15 to 4.3.0-rc.3.
(In reply to Anping Li from comment #7)
> Hit this issue again after upgrading from 4.2.15 to 4.3.0-rc.3.

Correction: the upgrade was from 4.2.16 to 4.3.0-rc.3.
*** Bug 1793714 has been marked as a duplicate of this bug. ***
I have seen this issue too. After updating from 4.3.0-0.nightly-s390x-2020-06-02-081204 to 4.4.0-0.nightly-s390x-2020-06-01-021037, the tuned pods started. However, after one restart of the cluster, the tuned pods did not start. Your workaround resolved the issue. Restarting the cluster was successful.
Any logs from the operator/tuned pods we could take a look at?
Unfortunately there are no logs available, only some eventlogs from a tuned-pod that is not spinning up. I attach them below, hoping that they will help at least a little. Unfortunately again, I already had to reinstall the cluster to a different version.

Pod tuned-5zjpq, Namespace openshift-cluster-node-tuning-operator, a few seconds ago
Generated from kubelet on master-0.domain
Unable to attach or mount volumes: unmounted volumes=[var-lib-tuned-profiles-data], unattached volumes=[tuned-token-jdnc7 var-lib-tuned-profiles-data etc sys var-run-dbus run-systemd-system lib-modules host]: timed out waiting for the condition

Pod tuned-5zjpq, Namespace openshift-cluster-node-tuning-operator, 2 minutes ago
Generated from kubelet on master-0.domain, 15 times in the last 16 minutes
MountVolume.SetUp failed for volume "var-lib-tuned-profiles-data" : stat /var/lib/kubelet/pods/0cc325bd-2434-4fbf-a288-2af9fed667c3/volumes/kubernetes.io~configmap/var-lib-tuned-profiles-data: no such file or directory

Pod tuned-5zjpq, Namespace openshift-cluster-node-tuning-operator, 2 minutes ago
Generated from kubelet on master-0.domain
Unable to attach or mount volumes: unmounted volumes=[var-lib-tuned-profiles-data], unattached volumes=[host tuned-token-jdnc7 var-lib-tuned-profiles-data etc sys var-run-dbus run-systemd-system lib-modules]: timed out waiting for the condition

Pod tuned-5zjpq, Namespace openshift-cluster-node-tuning-operator, 5 minutes ago
Generated from kubelet on master-0.domain
Unable to attach or mount volumes: unmounted volumes=[var-lib-tuned-profiles-data], unattached volumes=[var-lib-tuned-profiles-data etc sys var-run-dbus run-systemd-system lib-modules host tuned-token-jdnc7]: timed out waiting for the condition

Pod tuned-5zjpq, Namespace openshift-cluster-node-tuning-operator, 7 minutes ago
Generated from kubelet on master-0.domain
Unable to attach or mount volumes: unmounted volumes=[var-lib-tuned-profiles-data], unattached volumes=[sys var-run-dbus run-systemd-system lib-modules host tuned-token-jdnc7 var-lib-tuned-profiles-data etc]: timed out waiting for the condition

Pod tuned-5zjpq, Namespace openshift-cluster-node-tuning-operator, 9 minutes ago
Generated from kubelet on master-0.domain
Unable to attach or mount volumes: unmounted volumes=[var-lib-tuned-profiles-data], unattached volumes=[lib-modules host tuned-token-jdnc7 var-lib-tuned-profiles-data etc sys var-run-dbus run-systemd-system]: timed out waiting for the condition

Pod tuned-5zjpq, Namespace openshift-cluster-node-tuning-operator, Jun 5, 3:11 pm
Generated from kubelet on master-0.domain
Unable to attach or mount volumes: unmounted volumes=[var-lib-tuned-profiles-data], unattached volumes=[etc sys var-run-dbus run-systemd-system lib-modules host tuned-token-jdnc7 var-lib-tuned-profiles-data]: timed out waiting for the condition

Pod tuned-5zjpq, Namespace openshift-cluster-node-tuning-operator, Jun 5, 3:09 pm
Generated from kubelet on master-0.domain
Unable to attach or mount volumes: unmounted volumes=[var-lib-tuned-profiles-data], unattached volumes=[run-systemd-system lib-modules host tuned-token-jdnc7 var-lib-tuned-profiles-data etc sys var-run-dbus]: timed out waiting for the condition
The last report of any issues was during an upgrade from 4.3 to 4.4. 4.4 has seen a large rewrite and none of these issues were reported since. As 4.3 is end-of-life, I'm closing this BZ. If the issues persist in the supported versions, please create a new BZ.