Bug 1834693 - The tuned pod on RHEL worker couldn't set max virtual memory areas vm.max_map_count for elasticsearch pod
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Tuning Operator
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Jiří Mencák
QA Contact: Simon
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-05-12 09:10 UTC by Qiaoling Tang
Modified: 2020-07-13 17:37 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-13 17:37:39 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-node-tuning-operator pull 132 | 0 | None | closed | Bug 1834693: Fix issues with profile application on non-RHCOS platforms with host tuned. | 2020-06-24 02:20:47 UTC
Red Hat Product Errata | RHBA-2020:2409 | 0 | None | None | None | 2020-07-13 17:37:53 UTC

Description Qiaoling Tang 2020-05-12 09:10:46 UTC
Description of problem:

The tuned pod on a RHEL worker could not set the maximum number of virtual memory areas (vm.max_map_count) required by the Elasticsearch pod, so Elasticsearch fails its bootstrap checks:

[2020-05-12T08:58:56,086][INFO ][o.e.b.BootstrapChecks    ] [elasticsearch-cdm-st2q5ar0-1] bound or publishing to a non-loopback address, enforcing bootstrap checks
ERROR: [1] bootstrap checks failed
[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2020-05-12T08:58:56,097][INFO ][o.e.n.Node               ] [elasticsearch-cdm-st2q5ar0-1] stopping ...
[2020-05-12T08:58:56,111][INFO ][c.a.o.s.a.s.SinkProvider ] [elasticsearch-cdm-st2q5ar0-1] Closing DebugSink
[2020-05-12T08:58:56,132][INFO ][o.e.n.Node               ] [elasticsearch-cdm-st2q5ar0-1] stopped
[2020-05-12T08:58:56,132][INFO ][o.e.n.Node               ] [elasticsearch-cdm-st2q5ar0-1] closing ...
[2020-05-12T08:58:56,154][INFO ][o.e.n.Node               ] [elasticsearch-cdm-st2q5ar0-1] closed
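
The bootstrap check in the log above compares the node's vm.max_map_count against a floor of 262144. A minimal shell sketch of the same comparison, which could be run on a node to confirm whether the tuning was applied. The function name max_map_count_ok is a hypothetical helper for illustration, not part of Elasticsearch or the Node Tuning Operator:

```shell
#!/bin/sh
# Minimum enforced by the Elasticsearch bootstrap check, taken from the log above.
REQUIRED_MAX_MAP_COUNT=262144

# Hypothetical helper: succeed only if vm.max_map_count meets the minimum.
max_map_count_ok() {
    # $1: value to check; defaults to the kernel's live setting read from /proc.
    current="${1:-$(cat /proc/sys/vm/max_map_count)}"
    if [ "$current" -ge "$REQUIRED_MAX_MAP_COUNT" ]; then
        echo "ok: vm.max_map_count=$current"
    else
        echo "too low: vm.max_map_count=$current, need >= $REQUIRED_MAX_MAP_COUNT"
        return 1
    fi
}
```

Called without an argument on the affected RHEL worker, the helper would read the live kernel value; passing a number checks that value instead.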

$ oc get pod -owide
NAME                                            READY   STATUS    RESTARTS   AGE    IP            NODE                               NOMINATED NODE   READINESS GATES
cluster-logging-operator-6b5f95b55f-nwv7m       1/1     Running   0          69m    10.130.2.5    qeci-bmt45-fkpxh-rhelx-2           <none>           <none>
elasticsearch-cdm-st2q5ar0-1-84f489b5b4-nxmdp   1/2     Error     3          110s   10.129.2.17   qeci-bmt45-fkpxh-rhelx-0           <none>           <none>
elasticsearch-cdm-st2q5ar0-2-6456f7d56-j2s5v    1/2     Error     3          110s   10.130.2.11   qeci-bmt45-fkpxh-rhelx-2           <none>           <none>
elasticsearch-cdm-st2q5ar0-3-8445fc9ccb-kk74f   0/2     Pending   0          110s   <none>        <none>                             <none>           <none>
fluentd-42vz2                                   1/1     Running   0          68m    10.128.0.6    qeci-bmt45-fkpxh-control-plane-0   <none>           <none>
fluentd-fj7n4                                   1/1     Running   0          68m    10.128.2.14   qeci-bmt45-fkpxh-rhelx-1           <none>           <none>
fluentd-gth4j                                   1/1     Running   0          68m    10.130.2.8    qeci-bmt45-fkpxh-rhelx-2           <none>           <none>
fluentd-s5mcq                                   1/1     Running   0          68m    10.130.0.49   qeci-bmt45-fkpxh-control-plane-1   <none>           <none>
fluentd-xg5j2                                   1/1     Running   0          68m    10.129.2.14   qeci-bmt45-fkpxh-rhelx-0           <none>           <none>
fluentd-zxclq                                   1/1     Running   0          68m    10.129.0.44   qeci-bmt45-fkpxh-control-plane-2   <none>           <none>
kibana-65bc4bdb89-smgfj                         2/2     Running   0          68m    10.130.2.7    qeci-bmt45-fkpxh-rhelx-2           <none>           <none>

$ oc get pod -n openshift-cluster-node-tuning-operator -owide
NAME                                            READY   STATUS    RESTARTS   AGE     IP            NODE                               NOMINATED NODE   READINESS GATES
cluster-node-tuning-operator-678789c7d8-jth4x   1/1     Running   0          4h16m   10.130.0.3    qeci-bmt45-fkpxh-control-plane-1   <none>           <none>
tuned-2ws95                                     1/1     Running   1          4h12m   10.0.99.39    qeci-bmt45-fkpxh-control-plane-1   <none>           <none>
tuned-76x6c                                     1/1     Running   0          4h12m   10.0.97.235   qeci-bmt45-fkpxh-control-plane-0   <none>           <none>
tuned-hs6vx                                     1/1     Running   0          5m15s   10.0.99.83    qeci-bmt45-fkpxh-rhelx-0           <none>           <none>
tuned-mtqv2                                     1/1     Running   0          4h12m   10.0.97.247   qeci-bmt45-fkpxh-control-plane-2   <none>           <none>
tuned-p7v2f                                     1/1     Running   0          5m7s    10.0.98.239   qeci-bmt45-fkpxh-rhelx-2           <none>           <none>
tuned-rxwnf                                     1/1     Running   0          5m16s   10.0.99.91    qeci-bmt45-fkpxh-rhelx-1           <none>           <none>


$ oc logs -n openshift-cluster-node-tuning-operator  tuned-hs6vx
I0512 08:56:03.417919   21802 tuned.go:706] started events processor
I0512 08:56:03.418197   21802 tuned.go:271] extracting tuned profiles
I0512 08:56:03.420100   21802 tuned.go:749] started controller
I0512 08:56:03.808869   21802 tuned.go:349] written "/etc/tuned/recommend.d/50-openshift.conf" to set tuned profile openshift-node
I0512 08:56:04.804789   21802 tuned.go:519] active profile () != recommended profile (virtual-guest)
E0512 08:56:04.804836   21802 tuned.go:525] tuned profile directory "/etc/tuned/virtual-guest" does not exist; was "virtual-guest" defined?
I0512 08:56:05.775733   21802 tuned.go:519] active profile () != recommended profile (virtual-guest)
E0512 08:56:05.775769   21802 tuned.go:525] tuned profile directory "/etc/tuned/virtual-guest" does not exist; was "virtual-guest" defined?
I0512 08:56:06.777882   21802 tuned.go:519] active profile () != recommended profile (virtual-guest)
E0512 08:56:06.777916   21802 tuned.go:525] tuned profile directory "/etc/tuned/virtual-guest" does not exist; was "virtual-guest" defined?
I0512 08:56:07.780416   21802 tuned.go:519] active profile () != recommended profile (virtual-guest)
E0512 08:56:07.780465   21802 tuned.go:525] tuned profile directory "/etc/tuned/virtual-guest" does not exist; was "virtual-guest" defined?
I0512 08:56:08.771657   21802 tuned.go:519] active profile () != recommended profile (virtual-guest)
E0512 08:56:08.771704   21802 tuned.go:525] tuned profile directory "/etc/tuned/virtual-guest" does not exist; was "virtual-guest" defined?
I0512 08:56:09.789677   21802 tuned.go:519] active profile () != recommended profile (virtual-guest)
E0512 08:56:09.789725   21802 tuned.go:525] tuned profile directory "/etc/tuned/virtual-guest" does not exist; was "virtual-guest" defined?
I0512 08:56:10.767895   21802 tuned.go:519] active profile () != recommended profile (virtual-guest)
E0512 08:56:10.767935   21802 tuned.go:525] tuned profile directory "/etc/tuned/virtual-guest" does not exist; was "virtual-guest" defined?
I0512 08:56:11.767774   21802 tuned.go:519] active profile () != recommended profile (virtual-guest)
E0512 08:56:11.768257   21802 tuned.go:525] tuned profile directory "/etc/tuned/virtual-guest" does not exist; was "virtual-guest" defined?
I0512 08:56:12.763399   21802 tuned.go:519] active profile () != recommended profile (virtual-guest)
E0512 08:56:12.763442   21802 tuned.go:525] tuned profile directory "/etc/tuned/virtual-guest" does not exist; was "virtual-guest" defined?
I0512 08:56:13.775537   21802 tuned.go:519] active profile () != recommended profile (virtual-guest)
E0512 08:56:13.776125   21802 tuned.go:525] tuned profile directory "/etc/tuned/virtual-guest" does not exist; was "virtual-guest" defined?
I0512 08:56:14.802621   21802 tuned.go:519] active profile () != recommended profile (virtual-guest)
E0512 08:56:14.802660   21802 tuned.go:525] tuned profile directory "/etc/tuned/virtual-guest" does not exist; was "virtual-guest" defined?
I0512 08:56:15.802654   21802 tuned.go:519] active profile () != recommended profile (virtual-guest)
E0512 08:56:15.802688   21802 tuned.go:525] tuned profile directory "/etc/tuned/virtual-guest" does not exist; was "virtual-guest" defined?
I0512 08:56:16.769580   21802 tuned.go:519] active profile () != recommended profile (virtual-guest)
E0512 08:56:16.769618   21802 tuned.go:525] tuned profile directory "/etc/tuned/virtual-guest" does not exist; was "virtual-guest" defined?
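
The operand log above repeats the same error about once per second. A small sketch for summarizing such a log, assuming the message format shown above stays the same; missing_profile_dirs is a hypothetical helper name, not something shipped with the operator:

```shell
#!/bin/sh
# Hypothetical helper: read tuned operand log lines on stdin and print each
# distinct profile directory reported as missing, followed by how often it occurs.
missing_profile_dirs() {
    # Keep only the quoted directory from lines matching the error message.
    sed -n 's/.*tuned profile directory "\([^"]*\)" does not exist.*/\1/p' \
        | sort | uniq -c | awk '{ print $2 ": " $1 }'
}
```

It can be fed from the same command used above, for example: oc logs -n openshift-cluster-node-tuning-operator tuned-hs6vx | missing_profile_dirs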

Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-05-11-225223   True        False         3h46m   Cluster version is 4.5.0-0.nightly-2020-05-11-225223


How reproducible:
Always

Steps to Reproduce:
1. Launch a 4.5 cluster with RHEL workers.
2. Deploy logging on this cluster.

Actual results:


Expected results:


Additional info:
I checked a cluster with RHCOS workers; it did not have this issue.

Comment 1 Jiří Mencák 2020-05-12 11:04:49 UTC
Can you check on the host that tuned is *not* running there?

Comment 2 Jiří Mencák 2020-05-12 13:57:34 UTC
If the tuned daemon is running on the host as a service, a workaround until this is addressed is to disable it on the host. For example:

Starting pod/qitang-debug-hrnvb-rhel-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.98.166
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.2# systemctl status tuned
● tuned.service - Dynamic System Tuning Daemon
   Loaded: loaded (/usr/lib/systemd/system/tuned.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2020-05-12 07:40:49 EDT; 2h 11min ago
     Docs: man:tuned(8)
           man:tuned.conf(5)
           man:tuned-adm(8)
 Main PID: 1329 (tuned)
   Memory: 17.3M
   CGroup: /system.slice/tuned.service
           └─1329 /usr/bin/python2 -Es /usr/sbin/tuned -l -P

May 12 07:40:48 qitang-debug-hrnvb-rhel-1 systemd[1]: Starting Dynamic System Tuning Daemon...
May 12 07:40:49 qitang-debug-hrnvb-rhel-1 systemd[1]: Started Dynamic System Tuning Daemon.
sh-4.2# systemctl disable tuned --now
Removed symlink /etc/systemd/system/multi-user.target.wants/tuned.service.
sh-4.2#
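
The check and workaround from the transcript above can be combined into one step. A minimal sketch, assuming it runs on the host itself (for example after chroot /host in a debug pod); disable_host_tuned is a hypothetical helper name, and it uses only the systemctl subcommands shown above plus is-active:

```shell
#!/bin/sh
# Hypothetical helper: stop and disable the host tuned service if it is running,
# so that it no longer competes with the containerized tuned daemon.
disable_host_tuned() {
    if systemctl is-active --quiet tuned; then
        # Same command as in the transcript: stop now and disable at boot.
        systemctl disable --now tuned && echo "tuned: disabled on host"
    else
        echo "tuned: not running on host, nothing to do"
    fi
}
```

The helper only changes anything when the service is active, so repeating it on a node where tuned is already stopped does nothing.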

Comment 10 errata-xmlrpc 2020-07-13 17:37:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

