Bug 1791916

Summary: After upgrading 4.1->4.2->4.3 some tuned pods couldn't be started.
Product: OpenShift Container Platform
Reporter: Simon <skordas>
Component: Node Tuning Operator
Assignee: Sebastian Jug <sejug>
Status: CLOSED EOL
QA Contact: Simon <skordas>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.3.0
CC: akamra, anli, jmencak, jparrill, juzhao, nnosenzo, scuppett, sejug, wvoesch, zisis.lianas
Target Milestone: ---   
Target Release: 4.3.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-11-25 12:10:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1795665, 1795671    
Bug Blocks:    

Description Simon 2020-01-16 17:48:46 UTC
Description of problem:
From anli

Some tuned pods could not be started in the cluster (Jan 16, OCP upgrade
testing: 4.1 nightly -> 4.2 nightly -> 4.3.0 RC). According to the log, the
pods were trying to mount a tuned secret that does not exist.


Version-Release number of selected component (if applicable):


How reproducible:
For now only once.

Comment 1 Simon 2020-01-16 17:51:11 UTC
[anli@preserve-docker-slave 77533]$ oc get pods
NAME                                            READY   STATUS             RESTARTS   AGE
cluster-node-tuning-operator-7876c9588f-zvrbq   1/1     Running             1          132m
tuned-5cv7v                                     0/1     ContainerCreating   0          169m
tuned-79c8s                                     1/1     Running             1          169m
tuned-9hb29                                     0/1     ContainerCreating   0          169m
tuned-fjngv                                     0/1     ContainerCreating   0          169m
tuned-fk2l9                                     1/1     Running             1          169m
tuned-hz2cx                                     1/1     Running             1          169m
tuned-lz52m                                     1/1     Running             1          169m
tuned-qcjdh                                     0/1     ContainerCreating   0          169m
tuned-rzk77                                     0/1     ContainerCreating   0          169m



  Normal   Created      169m                  kubelet,
ip-10-0-60-171.us-east-2.compute.internal  Created container tuned
  Normal   Started      169m                  kubelet,
ip-10-0-60-171.us-east-2.compute.internal  Started container tuned
  Warning  FailedMount  101m (x22 over 129m)  kubelet,
ip-10-0-60-171.us-east-2.compute.internal  MountVolume.SetUp failed
for volume "tuned-token-ml7gx" : secret "tuned-token-ml7gx" not found
  Warning  FailedMount  78m                   kubelet,
ip-10-0-60-171.us-east-2.compute.internal  Unable to attach or mount
volumes: unmounted volumes=[tuned-token-ml7gx], unattached
volumes=[tuned-token-ml7gx etc-tuned-recommend
var-lib-tuned-profiles-data sys var-run-dbus run-systemd-system
lib-modules]: timed out waiting for the condition
  Warning  FailedMount  58m (x4 over 92m)     kubelet,
ip-10-0-60-171.us-east-2.compute.internal  Unable to attach or mount
volumes: unmounted volumes=[tuned-token-ml7gx], unattached
volumes=[etc-tuned-recommend var-lib-tuned-profiles-data sys
var-run-dbus run-systemd-system lib-modules tuned-token-ml7gx]: timed
out waiting for the condition
  Warning  FailedMount  53m (x5 over 94m)     kubelet,
ip-10-0-60-171.us-east-2.compute.internal  Unable to attach or mount
volumes: unmounted volumes=[tuned-token-ml7gx], unattached
volumes=[var-run-dbus run-systemd-system lib-modules tuned-token-ml7gx
etc-tuned-recommend var-lib-tuned-profiles-data sys]: timed out
waiting for the condition
  Warning  FailedMount  28m (x5 over 87m)     kubelet,
ip-10-0-60-171.us-east-2.compute.internal  Unable to attach or mount
volumes: unmounted volumes=[tuned-token-ml7gx], unattached
volumes=[var-lib-tuned-profiles-data sys var-run-dbus
run-systemd-system lib-modules tuned-token-ml7gx etc-tuned-recommend]:
timed out waiting for the condition
  Warning  FailedMount  13m (x50 over 99m)    kubelet,
ip-10-0-60-171.us-east-2.compute.internal  MountVolume.SetUp failed
for volume "tuned-token-ml7gx" : secret "tuned-token-ml7gx" not found
  Warning  FailedMount  8m25s (x7 over 97m)   kubelet,
ip-10-0-60-171.us-east-2.compute.internal  Unable to attach or mount
volumes: unmounted volumes=[tuned-token-ml7gx], unattached
volumes=[run-systemd-system lib-modules tuned-token-ml7gx
etc-tuned-recommend var-lib-tuned-profiles-data sys var-run-dbus]:
timed out waiting for the condition
  Warning  FailedMount  3m50s (x6 over 83m)   kubelet,
ip-10-0-60-171.us-east-2.compute.internal  Unable to attach or mount
volumes: unmounted volumes=[tuned-token-ml7gx], unattached
volumes=[sys var-run-dbus run-systemd-system lib-modules
tuned-token-ml7gx etc-tuned-recommend var-lib-tuned-profiles-data]:
timed out waiting for the condition
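
When triaging events like the ones above, it can help to pull out just the names of the secrets the kubelet reports as missing. The following is a minimal sketch, not part of any OpenShift tooling; the function name `missing_secrets` is hypothetical, and it assumes the event text has been saved from `oc describe pod` output in the hard-wrapped form shown in this comment.

```python
import re
from typing import List


def missing_secrets(event_text: str) -> List[str]:
    """Return distinct secret names reported as 'not found', in first-seen order."""
    # Pasted event output is often hard-wrapped, so collapse whitespace runs first.
    flat = " ".join(event_text.split())
    names = re.findall(r'secret "([^"]+)" not found', flat)
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(names))


# Two of the FailedMount events from this comment, wrapped as pasted.
sample = '''
  Warning  FailedMount  101m (x22 over 129m)  kubelet,
ip-10-0-60-171.us-east-2.compute.internal  MountVolume.SetUp failed
for volume "tuned-token-ml7gx" : secret "tuned-token-ml7gx" not found
  Warning  FailedMount  13m (x50 over 99m)    kubelet,
ip-10-0-60-171.us-east-2.compute.internal  MountVolume.SetUp failed
for volume "tuned-token-ml7gx" : secret "tuned-token-ml7gx" not found
'''

print(missing_secrets(sample))  # ['tuned-token-ml7gx']
```

A single repeated name, as here, suggests every failing mount points at the same stale service account token secret.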

Comment 4 Simon 2020-01-16 20:23:05 UTC
Workaround: Delete the tuned daemonset to get working pods.
oc delete ds/tuned

Comment 5 Mike Fiedler 2020-01-16 20:47:01 UTC
Release note:

When upgrading a cluster from 4.1 to 4.2 to 4.3, the Node Tuning Operator's tuned pods may get stuck in the ContainerCreating state.

Confirming the issue:

- oc get pods -n openshift-cluster-node-tuning-operator
- one or more tuned pods are stuck in ContainerCreating state

Workaround to resolve the issue:

- oc delete daemonset/tuned -n openshift-cluster-node-tuning-operator
- oc get daemonset/tuned -n openshift-cluster-node-tuning-operator
- oc get pods -n openshift-cluster-node-tuning-operator
- Verify the pods are now in Running state
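
The confirmation step above can be scripted. The following is a minimal sketch that scans the text output of `oc get pods` for pods in ContainerCreating state. The function name `stuck_pods` is hypothetical, and the sketch assumes the default table output with a header row containing NAME and STATUS columns; it does not call `oc` itself.

```python
from typing import List


def stuck_pods(oc_get_pods_output: str, status: str = "ContainerCreating") -> List[str]:
    """Return the names of pods whose STATUS column equals `status`."""
    lines = [ln for ln in oc_get_pods_output.splitlines() if ln.strip()]
    if not lines:
        return []
    # Locate the columns from the header instead of hard-coding positions.
    header = lines[0].split()
    name_col = header.index("NAME")
    status_col = header.index("STATUS")
    stuck = []
    for ln in lines[1:]:
        cols = ln.split()
        if len(cols) > max(name_col, status_col) and cols[status_col] == status:
            stuck.append(cols[name_col])
    return stuck


# Excerpt of the pod listing from comment 1.
sample = """\
NAME                                            READY   STATUS             RESTARTS   AGE
cluster-node-tuning-operator-7876c9588f-zvrbq   1/1     Running             1          132m
tuned-5cv7v                                     0/1     ContainerCreating   0          169m
tuned-79c8s                                     1/1     Running             1          169m
tuned-9hb29                                     0/1     ContainerCreating   0          169m
"""

print(stuck_pods(sample))  # ['tuned-5cv7v', 'tuned-9hb29']
```

To use it, feed it the saved output of `oc get pods -n openshift-cluster-node-tuning-operator`. An empty result after applying the workaround corresponds to the final verification step.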

Comment 7 Anping Li 2020-01-20 10:32:34 UTC
Hit this issue again after upgrading from 4.2.15 to 4.3.0-rc.3.

Comment 8 Junqi Zhao 2020-01-20 10:35:59 UTC
(In reply to Anping Li from comment #7)
> Hit this issue again after upgrading from 4.2.15 to 4.3.0-rc.3.

Correction: it was upgraded from 4.2.16 to 4.3.0-rc.3.

Comment 10 Jiří Mencák 2020-03-03 13:22:33 UTC
*** Bug 1793714 has been marked as a duplicate of this bug. ***

Comment 12 wvoesch 2020-06-08 11:50:02 UTC
I have seen this issue too. 

After updating from 4.3.0-0.nightly-s390x-2020-06-02-081204 to 4.4.0-0.nightly-s390x-2020-06-01-021037, tuned started. However, after one restart of the cluster, the tuned pods did not start.

Your workaround resolved the issue.

Restarting of the cluster was successful.

Comment 13 Jiří Mencák 2020-06-08 12:45:30 UTC
Any logs from the operator/tuned pods we could take a look at?

Comment 14 wvoesch 2020-06-08 14:22:35 UTC
Unfortunately, there are no logs available, only some event logs from a tuned pod that is not spinning up. I attach them below, hoping that they will help at least a little.

Also unfortunately, I have already had to reinstall the cluster with a different version.


Pod: tuned-5zjpq  Namespace: openshift-cluster-node-tuning-operator
a few seconds ago
Generated from kubelet on master-0.domain
Unable to attach or mount volumes: unmounted volumes=[var-lib-tuned-profiles-data], unattached volumes=[tuned-token-jdnc7 var-lib-tuned-profiles-data etc sys var-run-dbus run-systemd-system lib-modules host]: timed out waiting for the condition

Pod: tuned-5zjpq  Namespace: openshift-cluster-node-tuning-operator
2 minutes ago
Generated from kubelet on master-0.domain
15 times in the last 16 minutes
MountVolume.SetUp failed for volume "var-lib-tuned-profiles-data" : stat /var/lib/kubelet/pods/0cc325bd-2434-4fbf-a288-2af9fed667c3/volumes/kubernetes.io~configmap/var-lib-tuned-profiles-data: no such file or directory

Pod: tuned-5zjpq  Namespace: openshift-cluster-node-tuning-operator
2 minutes ago
Generated from kubelet on master-0.domain
Unable to attach or mount volumes: unmounted volumes=[var-lib-tuned-profiles-data], unattached volumes=[host tuned-token-jdnc7 var-lib-tuned-profiles-data etc sys var-run-dbus run-systemd-system lib-modules]: timed out waiting for the condition

Pod: tuned-5zjpq  Namespace: openshift-cluster-node-tuning-operator
5 minutes ago
Generated from kubelet on master-0.domain
Unable to attach or mount volumes: unmounted volumes=[var-lib-tuned-profiles-data], unattached volumes=[var-lib-tuned-profiles-data etc sys var-run-dbus run-systemd-system lib-modules host tuned-token-jdnc7]: timed out waiting for the condition

Pod: tuned-5zjpq  Namespace: openshift-cluster-node-tuning-operator
7 minutes ago
Generated from kubelet on master-0.domain
Unable to attach or mount volumes: unmounted volumes=[var-lib-tuned-profiles-data], unattached volumes=[sys var-run-dbus run-systemd-system lib-modules host tuned-token-jdnc7 var-lib-tuned-profiles-data etc]: timed out waiting for the condition

Pod: tuned-5zjpq  Namespace: openshift-cluster-node-tuning-operator
9 minutes ago
Generated from kubelet on master-0.domain
Unable to attach or mount volumes: unmounted volumes=[var-lib-tuned-profiles-data], unattached volumes=[lib-modules host tuned-token-jdnc7 var-lib-tuned-profiles-data etc sys var-run-dbus run-systemd-system]: timed out waiting for the condition

Pod: tuned-5zjpq  Namespace: openshift-cluster-node-tuning-operator
Jun 5, 3:11 pm
Generated from kubelet on master-0.domain
Unable to attach or mount volumes: unmounted volumes=[var-lib-tuned-profiles-data], unattached volumes=[etc sys var-run-dbus run-systemd-system lib-modules host tuned-token-jdnc7 var-lib-tuned-profiles-data]: timed out waiting for the condition

Pod: tuned-5zjpq  Namespace: openshift-cluster-node-tuning-operator
Jun 5, 3:09 pm
Generated from kubelet on master-0.domain
Unable to attach or mount volumes: unmounted volumes=[var-lib-tuned-profiles-data], unattached volumes=[run-systemd-system lib-modules host tuned-token-jdnc7 var-lib-tuned-profiles-data etc sys var-run-dbus]: timed out waiting for the condition
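
To see at a glance which volume is blocking a pod, the kubelet messages can be tallied by the contents of their `unmounted volumes=[...]` field. This is a rough sketch only; the function name `unmounted_volume_counts` is hypothetical, and it assumes the message wording shown in this bug, with volume names separated by whitespace.

```python
import re
from collections import Counter
from typing import Iterable


def unmounted_volume_counts(messages: Iterable[str]) -> Counter:
    """Count how often each volume appears in an 'unmounted volumes=[...]' field."""
    counts = Counter()
    for msg in messages:
        # Only the 'unmounted' list is of interest; the 'unattached' list names
        # every volume of the pod and is ignored here.
        match = re.search(r"unmounted volumes=\[([^\]]*)\]", msg)
        if match:
            counts.update(match.group(1).split())
    return counts


# One message from this comment and one from comment 1.
messages = [
    "Unable to attach or mount volumes: unmounted volumes=[var-lib-tuned-profiles-data], "
    "unattached volumes=[tuned-token-jdnc7 var-lib-tuned-profiles-data etc sys var-run-dbus "
    "run-systemd-system lib-modules host]: timed out waiting for the condition",
    "Unable to attach or mount volumes: unmounted volumes=[tuned-token-ml7gx], "
    "unattached volumes=[tuned-token-ml7gx etc-tuned-recommend var-lib-tuned-profiles-data "
    "sys var-run-dbus run-systemd-system lib-modules]: timed out waiting for the condition",
]

for volume, count in unmounted_volume_counts(messages).items():
    print(volume, count)
```

Run over the two sample messages, the tally shows that the failing volume in this comment is the `var-lib-tuned-profiles-data` configmap, whereas in comment 1 it was the `tuned-token-ml7gx` secret.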

Comment 15 Jiří Mencák 2020-11-25 12:10:12 UTC
The last report of any issues was during an upgrade from 4.3 to 4.4. 4.4 has seen a large rewrite, and none of these issues have been reported since. As 4.3 is end-of-life, I'm closing this BZ. If the issues persist in the supported versions, please create a new BZ.