Description of problem:
Deploy logging. The ES pods couldn't start. Check the logs:

[2019-08-09T03:19:46,980][ERROR][o.e.b.Bootstrap ] [elasticsearch-cdm-bhiy12w8-1] node validation exception
[1] bootstrap checks failed
[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2019-08-09T03:19:47,471][INFO ][o.e.n.Node ] [elasticsearch-cdm-bhiy12w8-1] stopping ...
[2019-08-09T03:19:47,476][INFO ][o.e.m.j.JvmGcMonitorService] [elasticsearch-cdm-bhiy12w8-1] [gc][2] overhead, spent [312ms] collecting in the last [1s]
[2019-08-09T03:19:47,581][INFO ][i.f.e.p.OpenShiftElasticSearchService] Stopping the ACL expiration thread...
[2019-08-09T03:19:47,582][INFO ][o.e.n.Node ] [elasticsearch-cdm-bhiy12w8-1] stopped
[2019-08-09T03:19:47,582][INFO ][o.e.n.Node ] [elasticsearch-cdm-bhiy12w8-1] closing ...
[2019-08-09T03:19:47,672][INFO ][i.f.e.p.OpenShiftElasticSearchService] Stopping the ACL expiration thread...
[2019-08-09T03:19:47,680][INFO ][o.e.n.Node ] [elasticsearch-cdm-bhiy12w8-1] closed
[2019-08-09T03:20:07,578][INFO ][o.e.n.Node ] [elasticsearch-cdm-bhiy12w8-1] initializing ...

Then check the tuned pod logs in the openshift-cluster-node-tuning-operator namespace. They show an error:

$ oc logs -n openshift-cluster-node-tuning-operator tuned-h67bj
I0809 03:18:01.052466 4456 openshift-tuned.go:424] Pod (openshift-logging/elasticsearch-cdm-bhiy12w8-1-5b995b86cc-75prz) labels changed node wide: true
I0809 03:18:04.928069 4456 openshift-tuned.go:282] Dumping labels to /var/lib/tuned/ocp-pod-labels.cfg
I0809 03:18:04.929748 4456 openshift-tuned.go:315] Getting recommended profile...
I0809 03:18:05.066743 4456 openshift-tuned.go:509] Active profile (openshift-node) != recommended profile (openshift-node-es)
I0809 03:18:05.066774 4456 openshift-tuned.go:215] Reloading tuned...
2019-08-09 03:18:05,246 INFO tuned.daemon.application: dynamic tuning is globally disabled
2019-08-09 03:18:05,250 INFO tuned.daemon.daemon: using sleep interval of 1 second(s)
2019-08-09 03:18:05,251 INFO tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2019-08-09 03:18:05,252 INFO tuned.daemon.daemon: Using 'openshift-node-es' profile
2019-08-09 03:18:05,253 INFO tuned.profiles.loader: loading profile: openshift-node-es
2019-08-09 03:18:05,290 WARNING tuned.daemon.application: Using one shot no deamon mode, most of the functionality will be not available, it can be changed in global config
2019-08-09 03:18:05,290 INFO tuned.daemon.controller: starting controller
2019-08-09 03:18:05,290 INFO tuned.daemon.daemon: starting tuning
2019-08-09 03:18:05,297 INFO tuned.plugins.base: instance cpu: assigning devices cpu0, cpu1
2019-08-09 03:18:05,298 INFO tuned.plugins.plugin_cpu: We are running on an x86 GenuineIntel platform
2019-08-09 03:18:05,300 INFO tuned.daemon.controller: terminating controller
2019-08-09 03:18:05,301 INFO tuned.daemon.daemon: stopping tuning
2019-08-09 03:18:05,301 WARNING tuned.plugins.plugin_cpu: your CPU doesn't support MSR_IA32_ENERGY_PERF_BIAS, ignoring CPU energy performance bias
2019-08-09 03:18:05,303 INFO tuned.plugins.base: instance disk: assigning devices xvdba, xvda
2019-08-09 03:18:05,305 INFO tuned.plugins.base: instance net: assigning devices ens3
2019-08-09 03:18:05,534 ERROR tuned.units.manager: BUG: Unhandled exception in start_tuning: expected a character buffer object
2019-08-09 03:18:05,534 ERROR tuned.units.manager: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tuned/units/manager.py", line 88, in _try_call
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/tuned/plugins/instance/instance.py", line 78, in apply_tuning
    self._plugin.instance_apply_tuning(self)
  File "/usr/lib/python2.7/site-packages/tuned/plugins/base.py", line 261, in instance_apply_tuning
    self._instance_apply_static(instance)
  File "/usr/lib/python2.7/site-packages/tuned/plugins/plugin_sysctl.py", line 62, in _instance_apply_static
    _write_sysctl(option, new_value)
  File "/usr/lib/python2.7/site-packages/tuned/plugins/plugin_sysctl.py", line 174, in _write_sysctl
    f.write(value)
TypeError: expected a character buffer object
2019-08-09 03:18:05,535 INFO tuned.daemon.daemon: static tuning from profile 'openshift-node-es' applied
2019-08-09 03:18:05,560 INFO tuned.daemon.daemon: terminating Tuned in one-shot mode

The ES pods are labeled tuned.openshift.io/elasticsearch: "true":

$ oc get pods --selector component=elasticsearch -o jsonpath='{..labels}'
map[cluster-name:elasticsearch component:elasticsearch es-node-client:true es-node-data:true es-node-master:true node-name:elasticsearch-cdm-bhiy12w8-1 pod-template-hash:5b995b86cc tuned.openshift.io/elasticsearch:true]
map[cluster-name:elasticsearch component:elasticsearch es-node-client:true es-node-data:true es-node-master:true node-name:elasticsearch-cdm-bhiy12w8-2 pod-template-hash:77bb8df5db tuned.openshift.io/elasticsearch:true]

Version-Release number of selected component (if applicable):
4.1.0-0.nightly-2019-08-06-212225
4.1.0-0.nightly-2019-08-07-190748

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging

Actual results:
The ES pods could not start because the bootstrap check failed: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

Expected results:
The ES pods should be able to start.

Additional info:
This bug is blocking Logging: the ES pods cannot start, so the Logging stack does not work.
This is very likely a bug associated with the recent switch of the base image for containerized tuned from RHEL 7.6 to 7.7. RHEL 7.6 ships tuned 2.10, RHEL 7.7 tuned 2.11. Investigating further.
Linking an associated tuned 2.11 BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1739418 This will also likely affect the Node Tuning Operator functionality in OCP 4.2 with a switch to RHEL 7.7 container image.
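For readers following the traceback in the description: "expected a character buffer object" is the message Python 2 raises when file.write() receives something that is not a string, which suggests the value handed to _write_sysctl was not a string at that point. The sketch below only illustrates that failure mode and one defensive way around it. The helper name write_sysctl_value is hypothetical, the temporary file stands in for the real sysctl path, and none of this is the actual tuned code or its eventual fix.

```python
import os
import tempfile


def write_sysctl_value(path, value):
    """Write a sysctl-style value to `path`, coercing it to text first.

    Hypothetical helper for illustration; not taken from tuned.
    """
    # file.write() only accepts strings, so an int (or any other
    # non-string) passed straight through would raise TypeError.
    text = str(value).strip()
    with open(path, "w") as f:
        f.write(text)
    return text


if __name__ == "__main__":
    # A temporary file stands in for /proc/sys/vm/max_map_count so the
    # sketch can run without root privileges.
    fd, tmp = tempfile.mkstemp()
    os.close(fd)
    try:
        write_sysctl_value(tmp, 262144)  # int input, written as text
        with open(tmp) as f:
            print(f.read())
    finally:
        os.remove(tmp)
```

Coercing at the write site is only one option; validating the value where the profile is parsed would be another.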
The workaround is to set vm.max_map_count=262144 manually on each node where an ES pod may be located.
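To confirm whether a node still needs the workaround, the current value can be read from /proc/sys/vm/max_map_count, the standard Linux location for this sysctl, and compared with the 262144 minimum from the Elasticsearch bootstrap check. The value itself is normally set at runtime with sysctl -w vm.max_map_count=262144, run as root on the node. The following is a minimal sketch of the check only; the function names are made up for illustration.

```python
import os

PROC_PATH = "/proc/sys/vm/max_map_count"
# Minimum required by the Elasticsearch bootstrap check quoted in this report.
REQUIRED_MAX_MAP_COUNT = 262144


def read_max_map_count(path=PROC_PATH):
    """Return the current vm.max_map_count as an integer."""
    with open(path) as f:
        return int(f.read().strip())


def is_sufficient(current, required=REQUIRED_MAX_MAP_COUNT):
    """True when the current value meets the required minimum."""
    return current >= required


if __name__ == "__main__":
    if os.path.exists(PROC_PATH):
        current = read_max_map_count()
        status = "ok" if is_sufficient(current) else "too low"
        print("vm.max_map_count = %d (%s)" % (current, status))
    else:
        print("%s not found; is this a Linux host?" % PROC_PATH)
```

Run it on the node itself (for example from a debug shell); inside an ordinary pod it reports the value of whichever node the pod is scheduled on.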
The same tuned version (tuned-2.11.0-5.el7.noarch, based on RHEL 7.7) works on 4.2.
4.2 image -> quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1e4a2551fbabc2082bf64feaecad2ba69ad791f62e3413a8468f204b4b3b2422
4.1.10 image -> quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1e4a2551fbabc2082bf64feaecad2ba69ad791f62e3413a8468f204b4b3b2422
Test blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1741753
Verified in v4.1.12
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2547