Bug 1791916 - After upgrading 4.1->4.2->4.3 some tuned pods couldn't be started.
Summary: After upgrading 4.1->4.2->4.3 some tuned pods couldn't be started.
Keywords:
Status: ASSIGNED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Tuning Operator
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.3.z
Assignee: Sebastian Jug
QA Contact: Simon
URL:
Whiteboard:
Duplicates: 1793714
Depends On: 1795665 1795671
Blocks:
 
Reported: 2020-01-16 17:48 UTC by Simon
Modified: 2020-08-12 07:02 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:



Description Simon 2020-01-16 17:48:46 UTC
Description of problem:
From anli@redhat.com

Some tuned pods could not be started in the cluster (Jan 16 OCP_Upgrade Testing_4.1 nightly -> 4.2 nightly -> 4.3.0 RC). From the log, the pods were trying to mount a tuned secret that does not exist.
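To confirm which secret a stuck pod is waiting for, you can compare the secret volumes in its spec against the secrets that actually exist. Below is a minimal sketch. It assumes `oc` is logged in to the cluster. The function name `missing_pod_secrets` is hypothetical, and the default namespace comes from the release note in Comment 5.

```shell
# Hypothetical helper: print every secret referenced by a pod's volumes
# that cannot be found in the pod's namespace.
# Usage: missing_pod_secrets POD [NAMESPACE]
missing_pod_secrets() {
  local pod="$1" ns="${2:-openshift-cluster-node-tuning-operator}"
  local s
  # List the secretName of every secret-backed volume in the pod spec.
  for s in $(oc get pod "$pod" -n "$ns" -o jsonpath='{.spec.volumes[*].secret.secretName}'); do
    # A failing "oc get secret" is treated as "secret is missing".
    oc get secret "$s" -n "$ns" >/dev/null 2>&1 || echo "$s"
  done
}
```

For example, `missing_pod_secrets tuned-5cv7v` prints the names of any secret volumes of that pod that no longer exist in the namespace.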


Version-Release number of selected component (if applicable):


How reproducible:
For now only once.

Comment 1 Simon 2020-01-16 17:51:11 UTC
[anli@preserve-docker-slave 77533]$ oc get pods
NAME                                            READY   STATUS             RESTARTS   AGE
cluster-node-tuning-operator-7876c9588f-zvrbq   1/1     Running             1          132m
tuned-5cv7v                                     0/1     ContainerCreating   0          169m
tuned-79c8s                                     1/1     Running             1          169m
tuned-9hb29                                     0/1     ContainerCreating   0          169m
tuned-fjngv                                     0/1     ContainerCreating   0          169m
tuned-fk2l9                                     1/1     Running             1          169m
tuned-hz2cx                                     1/1     Running             1          169m
tuned-lz52m                                     1/1     Running             1          169m
tuned-qcjdh                                     0/1     ContainerCreating   0          169m
tuned-rzk77                                     0/1     ContainerCreating   0          169m
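You can pick the stuck pods out of a listing like the one above mechanically. The sketch below filters `oc get pods` output. It assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE), and `stuck_tuned_pods` is a hypothetical helper name.

```shell
# Hypothetical helper: read `oc get pods` table output on stdin, skip the
# header row, and print the NAME of every tuned-* pod whose STATUS column
# is ContainerCreating.
stuck_tuned_pods() {
  awk 'NR > 1 && $1 ~ /^tuned-/ && $3 == "ContainerCreating" { print $1 }'
}
```

Usage: `oc get pods -n openshift-cluster-node-tuning-operator | stuck_tuned_pods`.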



  Normal   Created      169m                  kubelet, ip-10-0-60-171.us-east-2.compute.internal  Created container tuned
  Normal   Started      169m                  kubelet, ip-10-0-60-171.us-east-2.compute.internal  Started container tuned
  Warning  FailedMount  101m (x22 over 129m)  kubelet, ip-10-0-60-171.us-east-2.compute.internal  MountVolume.SetUp failed for volume "tuned-token-ml7gx" : secret "tuned-token-ml7gx" not found
  Warning  FailedMount  78m                   kubelet, ip-10-0-60-171.us-east-2.compute.internal  Unable to attach or mount volumes: unmounted volumes=[tuned-token-ml7gx], unattached volumes=[tuned-token-ml7gx etc-tuned-recommend var-lib-tuned-profiles-data sys var-run-dbus run-systemd-system lib-modules]: timed out waiting for the condition
  Warning  FailedMount  58m (x4 over 92m)     kubelet, ip-10-0-60-171.us-east-2.compute.internal  Unable to attach or mount volumes: unmounted volumes=[tuned-token-ml7gx], unattached volumes=[etc-tuned-recommend var-lib-tuned-profiles-data sys var-run-dbus run-systemd-system lib-modules tuned-token-ml7gx]: timed out waiting for the condition
  Warning  FailedMount  53m (x5 over 94m)     kubelet, ip-10-0-60-171.us-east-2.compute.internal  Unable to attach or mount volumes: unmounted volumes=[tuned-token-ml7gx], unattached volumes=[var-run-dbus run-systemd-system lib-modules tuned-token-ml7gx etc-tuned-recommend var-lib-tuned-profiles-data sys]: timed out waiting for the condition
  Warning  FailedMount  28m (x5 over 87m)     kubelet, ip-10-0-60-171.us-east-2.compute.internal  Unable to attach or mount volumes: unmounted volumes=[tuned-token-ml7gx], unattached volumes=[var-lib-tuned-profiles-data sys var-run-dbus run-systemd-system lib-modules tuned-token-ml7gx etc-tuned-recommend]: timed out waiting for the condition
  Warning  FailedMount  13m (x50 over 99m)    kubelet, ip-10-0-60-171.us-east-2.compute.internal  MountVolume.SetUp failed for volume "tuned-token-ml7gx" : secret "tuned-token-ml7gx" not found
  Warning  FailedMount  8m25s (x7 over 97m)   kubelet, ip-10-0-60-171.us-east-2.compute.internal  Unable to attach or mount volumes: unmounted volumes=[tuned-token-ml7gx], unattached volumes=[run-systemd-system lib-modules tuned-token-ml7gx etc-tuned-recommend var-lib-tuned-profiles-data sys var-run-dbus]: timed out waiting for the condition
  Warning  FailedMount  3m50s (x6 over 83m)   kubelet, ip-10-0-60-171.us-east-2.compute.internal  Unable to attach or mount volumes: unmounted volumes=[tuned-token-ml7gx], unattached volumes=[sys var-run-dbus run-systemd-system lib-modules tuned-token-ml7gx etc-tuned-recommend var-lib-tuned-profiles-data]: timed out waiting for the condition
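All of the FailedMount events above name the same volume. The small sketch below extracts the distinct volume names listed in `unmounted volumes=[...]` from event text, such as the output of `oc describe pod`. The helper name `unmounted_volumes` is hypothetical. It assumes each `unmounted volumes=[...]` fragment sits on a single line.

```shell
# Hypothetical helper: read event text on stdin and print each volume named
# in an "unmounted volumes=[...]" fragment exactly once, sorted.
unmounted_volumes() {
  grep -o 'unmounted volumes=\[[^]]*]' |
    sed -e 's/^unmounted volumes=\[//' -e 's/]$//' |
    tr ' ' '\n' |
    sort -u
}
```

Usage: `oc describe pod tuned-5cv7v -n openshift-cluster-node-tuning-operator | unmounted_volumes`.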

Comment 4 Simon 2020-01-16 20:23:05 UTC
Workaround: delete the tuned daemonset to get working pods.
oc delete ds/tuned

Comment 5 Mike Fiedler 2020-01-16 20:47:01 UTC
Release note:

When a cluster is upgraded from 4.1 to 4.2 and then to 4.3, Node Tuning Operator tuned pods may get stuck in the ContainerCreating state.

Confirming the issue:

- oc get pods -n openshift-cluster-node-tuning-operator
- one or more tuned pods are stuck in ContainerCreating state

Workaround to resolve the issue:

- oc delete daemonset/tuned -n openshift-cluster-node-tuning-operator
- oc get daemonset/tuned -n openshift-cluster-node-tuning-operator (confirm that the daemonset has been recreated)
- oc get pods -n openshift-cluster-node-tuning-operator
- Verify that the pods are now in the Running state
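The delete-and-recreate workaround can be scripted roughly as follows. This is a minimal sketch. The function name `apply_tuned_workaround`, the retry count and the `TUNED_POLL_INTERVAL` variable are hypothetical. It also assumes, as the steps above imply, that the Node Tuning Operator recreates `daemonset/tuned` after it is deleted.

```shell
# Hypothetical helper: delete daemonset/tuned, then poll until the operator
# has recreated it. Returns non-zero if it does not reappear in time.
# Usage: apply_tuned_workaround [NAMESPACE] [TRIES]
apply_tuned_workaround() {
  local ns="${1:-openshift-cluster-node-tuning-operator}"
  local tries="${2:-30}"
  local i=0
  oc delete daemonset/tuned -n "$ns" || return 1
  while [ "$i" -lt "$tries" ]; do
    # A successful "oc get" means the daemonset exists again.
    if oc get daemonset/tuned -n "$ns" >/dev/null 2>&1; then
      echo "daemonset/tuned recreated"
      return 0
    fi
    sleep "${TUNED_POLL_INTERVAL:-10}"
    i=$((i + 1))
  done
  echo "daemonset/tuned was not recreated" >&2
  return 1
}
```

After it succeeds, still check `oc get pods -n openshift-cluster-node-tuning-operator` to confirm that the new tuned pods reach the Running state.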

Comment 7 Anping Li 2020-01-20 10:32:34 UTC
Hit this issue again after upgrading from 4.2.15 to 4.3.0-rc.3.

Comment 8 Junqi Zhao 2020-01-20 10:35:59 UTC
(In reply to Anping Li from comment #7)
> Hit this issue again after upgrading from 4.2.15 to 4.3.0-rc.3.

It was upgraded from 4.2.16, not 4.2.15, to 4.3.0-rc.3.

Comment 10 jmencak 2020-03-03 13:22:33 UTC
*** Bug 1793714 has been marked as a duplicate of this bug. ***

Comment 12 wvoesch 2020-06-08 11:50:02 UTC
I have seen this issue too. 

After updating from 4.3.0-0.nightly-s390x-2020-06-02-081204 to 4.4.0-0.nightly-s390x-2020-06-01-021037, tuned started. However, after one restart of the cluster, the tuned pods did not start.

Your workaround resolved the issue.

Restarting the cluster was successful.

Comment 13 jmencak 2020-06-08 12:45:30 UTC
Any logs from the operator/tuned pods we could take a look at?

Comment 14 wvoesch 2020-06-08 14:22:35 UTC
Unfortunately, no logs are available, only some event logs from a tuned pod that is not starting. I have attached them below in the hope that they help at least a little.

Unfortunately, I have also already had to reinstall the cluster with a different version.


Pod tuned-5zjpq  Namespace openshift-cluster-node-tuning-operator
a few seconds ago
Generated from kubelet on master-0.domain
Unable to attach or mount volumes: unmounted volumes=[var-lib-tuned-profiles-data], unattached volumes=[tuned-token-jdnc7 var-lib-tuned-profiles-data etc sys var-run-dbus run-systemd-system lib-modules host]: timed out waiting for the condition

Pod tuned-5zjpq  Namespace openshift-cluster-node-tuning-operator
2 minutes ago
Generated from kubelet on master-0.domain
15 times in the last 16 minutes
MountVolume.SetUp failed for volume "var-lib-tuned-profiles-data" : stat /var/lib/kubelet/pods/0cc325bd-2434-4fbf-a288-2af9fed667c3/volumes/kubernetes.io~configmap/var-lib-tuned-profiles-data: no such file or directory

Pod tuned-5zjpq  Namespace openshift-cluster-node-tuning-operator
2 minutes ago
Generated from kubelet on master-0.domain
Unable to attach or mount volumes: unmounted volumes=[var-lib-tuned-profiles-data], unattached volumes=[host tuned-token-jdnc7 var-lib-tuned-profiles-data etc sys var-run-dbus run-systemd-system lib-modules]: timed out waiting for the condition

Pod tuned-5zjpq  Namespace openshift-cluster-node-tuning-operator
5 minutes ago
Generated from kubelet on master-0.domain
Unable to attach or mount volumes: unmounted volumes=[var-lib-tuned-profiles-data], unattached volumes=[var-lib-tuned-profiles-data etc sys var-run-dbus run-systemd-system lib-modules host tuned-token-jdnc7]: timed out waiting for the condition

Pod tuned-5zjpq  Namespace openshift-cluster-node-tuning-operator
7 minutes ago
Generated from kubelet on master-0.domain
Unable to attach or mount volumes: unmounted volumes=[var-lib-tuned-profiles-data], unattached volumes=[sys var-run-dbus run-systemd-system lib-modules host tuned-token-jdnc7 var-lib-tuned-profiles-data etc]: timed out waiting for the condition

Pod tuned-5zjpq  Namespace openshift-cluster-node-tuning-operator
9 minutes ago
Generated from kubelet on master-0.domain
Unable to attach or mount volumes: unmounted volumes=[var-lib-tuned-profiles-data], unattached volumes=[lib-modules host tuned-token-jdnc7 var-lib-tuned-profiles-data etc sys var-run-dbus run-systemd-system]: timed out waiting for the condition

Pod tuned-5zjpq  Namespace openshift-cluster-node-tuning-operator
Jun 5, 3:11 pm
Generated from kubelet on master-0.domain
Unable to attach or mount volumes: unmounted volumes=[var-lib-tuned-profiles-data], unattached volumes=[etc sys var-run-dbus run-systemd-system lib-modules host tuned-token-jdnc7 var-lib-tuned-profiles-data]: timed out waiting for the condition

Pod tuned-5zjpq  Namespace openshift-cluster-node-tuning-operator
Jun 5, 3:09 pm
Generated from kubelet on master-0.domain
Unable to attach or mount volumes: unmounted volumes=[var-lib-tuned-profiles-data], unattached volumes=[run-systemd-system lib-modules host tuned-token-jdnc7 var-lib-tuned-profiles-data etc sys var-run-dbus]: timed out waiting for the condition

