1739322 – The tuned pod doesn't set max virtual memory areas vm.max_map_count for elasticsearch pod in node.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node Tuning Operator
Sub Component:
Version:	4.1.z
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	4.1.z
Assignee:	Jiří Mencák
QA Contact:	Anping Li
Docs Contact:
URL:
Whiteboard:	4.1.12
Depends On:
Blocks:	1740558 1791997

Reported:	2019-08-09 03:39 UTC by Qiaoling Tang
Modified:	2023-10-06 18:28 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1740558
Environment:
Last Closed:	2019-08-28 19:54:50 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift openshift-tuned pull 22	0	'None'	closed	Bug 1739322: Fix plugin_sysctl for tuned 2.11	2021-01-29 07:29:45 UTC
Red Hat Product Errata	RHBA-2019:2547	0	None	None	None	2019-08-28 19:54:59 UTC

Internal Links: 1739418 1739563

Description Qiaoling Tang 2019-08-09 03:39:51 UTC

Description of problem:
After deploying logging, the ES pods fail to start. The ES pod logs show:
[2019-08-09T03:19:46,980][ERROR][o.e.b.Bootstrap          ] [elasticsearch-cdm-bhiy12w8-1] node validation exception
[1] bootstrap checks failed
[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2019-08-09T03:19:47,471][INFO ][o.e.n.Node               ] [elasticsearch-cdm-bhiy12w8-1] stopping ...
[2019-08-09T03:19:47,476][INFO ][o.e.m.j.JvmGcMonitorService] [elasticsearch-cdm-bhiy12w8-1] [gc][2] overhead, spent [312ms] collecting in the last [1s]
[2019-08-09T03:19:47,581][INFO ][i.f.e.p.OpenShiftElasticSearchService] Stopping the ACL expiration thread...
[2019-08-09T03:19:47,582][INFO ][o.e.n.Node               ] [elasticsearch-cdm-bhiy12w8-1] stopped
[2019-08-09T03:19:47,582][INFO ][o.e.n.Node               ] [elasticsearch-cdm-bhiy12w8-1] closing ...
[2019-08-09T03:19:47,672][INFO ][i.f.e.p.OpenShiftElasticSearchService] Stopping the ACL expiration thread...
[2019-08-09T03:19:47,680][INFO ][o.e.n.Node               ] [elasticsearch-cdm-bhiy12w8-1] closed
[2019-08-09T03:20:07,578][INFO ][o.e.n.Node               ] [elasticsearch-cdm-bhiy12w8-1] initializing ...
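
The failing bootstrap check reports both the current and the required value of vm.max_map_count. As a minimal sketch (the function name is hypothetical and the pattern only assumes the message wording quoted in the log above), the two numbers can be pulled out of such a line like this:

```python
import re

# Pattern for the Elasticsearch bootstrap-check message quoted in the log above.
# Assumption: the wording matches the line shown in this report.
_MAP_COUNT_RE = re.compile(
    r"vm\.max_map_count \[(\d+)\] is too low, increase to at least \[(\d+)\]"
)


def parse_map_count_failure(line):
    """Return (current, required) for a vm.max_map_count failure line, else None."""
    match = _MAP_COUNT_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    sample = (
        "[1]: max virtual memory areas vm.max_map_count [65530] "
        "is too low, increase to at least [262144]"
    )
    print(parse_map_count_failure(sample))
```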

The tuned pod logs in the openshift-cluster-node-tuning-operator namespace show an error while applying the openshift-node-es profile:
$ oc logs -n openshift-cluster-node-tuning-operator tuned-h67bj
I0809 03:18:01.052466    4456 openshift-tuned.go:424] Pod (openshift-logging/elasticsearch-cdm-bhiy12w8-1-5b995b86cc-75prz) labels changed node wide: true
I0809 03:18:04.928069    4456 openshift-tuned.go:282] Dumping labels to /var/lib/tuned/ocp-pod-labels.cfg
I0809 03:18:04.929748    4456 openshift-tuned.go:315] Getting recommended profile...
I0809 03:18:05.066743    4456 openshift-tuned.go:509] Active profile (openshift-node) != recommended profile (openshift-node-es)
I0809 03:18:05.066774    4456 openshift-tuned.go:215] Reloading tuned...
2019-08-09 03:18:05,246 INFO     tuned.daemon.application: dynamic tuning is globally disabled
2019-08-09 03:18:05,250 INFO     tuned.daemon.daemon: using sleep interval of 1 second(s)
2019-08-09 03:18:05,251 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2019-08-09 03:18:05,252 INFO     tuned.daemon.daemon: Using 'openshift-node-es' profile
2019-08-09 03:18:05,253 INFO     tuned.profiles.loader: loading profile: openshift-node-es
2019-08-09 03:18:05,290 WARNING  tuned.daemon.application: Using one shot no deamon mode, most of the functionality will be not available, it can be changed in global config
2019-08-09 03:18:05,290 INFO     tuned.daemon.controller: starting controller
2019-08-09 03:18:05,290 INFO     tuned.daemon.daemon: starting tuning
2019-08-09 03:18:05,297 INFO     tuned.plugins.base: instance cpu: assigning devices cpu0, cpu1
2019-08-09 03:18:05,298 INFO     tuned.plugins.plugin_cpu: We are running on an x86 GenuineIntel platform
2019-08-09 03:18:05,300 INFO     tuned.daemon.controller: terminating controller
2019-08-09 03:18:05,301 INFO     tuned.daemon.daemon: stopping tuning
2019-08-09 03:18:05,301 WARNING  tuned.plugins.plugin_cpu: your CPU doesn't support MSR_IA32_ENERGY_PERF_BIAS, ignoring CPU energy performance bias
2019-08-09 03:18:05,303 INFO     tuned.plugins.base: instance disk: assigning devices xvdba, xvda
2019-08-09 03:18:05,305 INFO     tuned.plugins.base: instance net: assigning devices ens3
2019-08-09 03:18:05,534 ERROR    tuned.units.manager: BUG: Unhandled exception in start_tuning: expected a character buffer object
2019-08-09 03:18:05,534 ERROR    tuned.units.manager: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tuned/units/manager.py", line 88, in _try_call
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/tuned/plugins/instance/instance.py", line 78, in apply_tuning
    self._plugin.instance_apply_tuning(self)
  File "/usr/lib/python2.7/site-packages/tuned/plugins/base.py", line 261, in instance_apply_tuning
    self._instance_apply_static(instance)
  File "/usr/lib/python2.7/site-packages/tuned/plugins/plugin_sysctl.py", line 62, in _instance_apply_static
    _write_sysctl(option, new_value)
  File "/usr/lib/python2.7/site-packages/tuned/plugins/plugin_sysctl.py", line 174, in _write_sysctl
    f.write(value)
TypeError: expected a character buffer object

2019-08-09 03:18:05,535 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-node-es' applied
2019-08-09 03:18:05,560 INFO     tuned.daemon.daemon: terminating Tuned in one-shot mode
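
The traceback ends in f.write(value) raising a TypeError, which is what file.write() does when it is handed something other than a string. The sketch below only illustrates that class of error and the usual remedy of coercing the value to text first. It is not the tuned code or the actual fix; write_sysctl_value and demo are hypothetical names, the value 262144 is used purely as an example (the traceback does not show the real value or its type), and on Python 3 the error message is worded differently from the Python 2.7 message in the log.

```python
import os
import tempfile


def write_sysctl_value(path, value):
    """Write a sysctl-style value to `path`, coercing it to text first.

    Hypothetical helper for illustration; not the tuned implementation.
    """
    with open(path, "w") as f:
        f.write(str(value))


def demo():
    """Return (raised, content): whether the raw write failed, and what the safe write stored."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Passing a non-string straight to write() raises TypeError.
        try:
            with open(path, "w") as f:
                f.write(262144)
            raised = False
        except TypeError:
            raised = True

        # Coercing to str before writing succeeds.
        write_sysctl_value(path, 262144)
        with open(path) as f:
            content = f.read()
        return raised, content
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```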

The ES pods are labeled tuned.openshift.io/elasticsearch: "true":
$ oc get pods --selector component=elasticsearch -o jsonpath='{..labels}'
map[cluster-name:elasticsearch component:elasticsearch es-node-client:true es-node-data:true es-node-master:true node-name:elasticsearch-cdm-bhiy12w8-1 pod-template-hash:5b995b86cc tuned.openshift.io/elasticsearch:true] map[cluster-name:elasticsearch component:elasticsearch es-node-client:true es-node-data:true es-node-master:true node-name:elasticsearch-cdm-bhiy12w8-2 pod-template-hash:77bb8df5db tuned.openshift.io/elasticsearch:true]

Version-Release number of selected component (if applicable):
4.1.0-0.nightly-2019-08-06-212225
4.1.0-0.nightly-2019-08-07-190748

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging.

Actual results:
The ES pods cannot start because the bootstrap check fails: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

Expected results:
The ES pods should be able to start.

Additional info:

Comment 1 Qiaoling Tang 2019-08-09 04:08:36 UTC

This bug is blocking Logging. With this bug, the ES pods cannot start and the Logging stack does not work.

Comment 2 Jiří Mencák 2019-08-09 07:03:51 UTC

This is very likely a bug associated with the recent switch of the base image for containerized tuned from RHEL 7.6 to 7.7.  RHEL 7.6 ships tuned 2.10, RHEL 7.7 tuned 2.11.  Investigating further.

Comment 3 Jiří Mencák 2019-08-09 09:33:19 UTC

Linking an associated tuned 2.11 BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1739418
This will also likely affect the Node Tuning Operator functionality in OCP 4.2 with the switch to a RHEL 7.7 container image.

Comment 5 Anping Li 2019-08-10 02:43:55 UTC

The workaround is to manually set vm.max_map_count=262144 on each node where an ES pod may be scheduled.
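
To confirm on a node whether the setting has taken effect, the current value can be compared with the minimum from the ES bootstrap check. This is a rough sketch under stated assumptions: it reads the standard Linux procfs location for this sysctl (/proc/sys/vm/max_map_count), and the function names are hypothetical.

```python
# Minimum quoted by the Elasticsearch bootstrap check in this report.
REQUIRED_MAX_MAP_COUNT = 262144


def meets_minimum(raw_value, required=REQUIRED_MAX_MAP_COUNT):
    """Return True if the sysctl value (as read from procfs) is at least `required`."""
    return int(raw_value.strip()) >= required


def node_map_count_ok(path="/proc/sys/vm/max_map_count"):
    """Read the node's current vm.max_map_count and check it against the minimum.

    Assumption: run directly on the node, where procfs exposes this sysctl.
    """
    with open(path) as f:
        return meets_minimum(f.read())


if __name__ == "__main__":
    print("vm.max_map_count sufficient:", node_map_count_ok())
```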

Comment 6 Anping Li 2019-08-12 09:50:21 UTC

The same tuned version (tuned-2.11.0-5.el7.noarch, based on RHEL 7.7) works on 4.2.
 
4.2    image -> quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1e4a2551fbabc2082bf64feaecad2ba69ad791f62e3413a8468f204b4b3b2422
4.1.10 image -> quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1e4a2551fbabc2082bf64feaecad2ba69ad791f62e3413a8468f204b4b3b2422

Comment 13 Anping Li 2019-08-16 09:38:32 UTC

Test blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1741753

Comment 15 Anping Li 2019-08-19 14:29:34 UTC

Verified in v4.1.12

Comment 18 errata-xmlrpc 2019-08-28 19:54:50 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2547
