Bug 1557290

Summary:	Cannot allocate memory when redeploy logging
Product:	OpenShift Container Platform	Reporter:	Anping Li <anli>
Component:	Installer	Assignee:	Michael Gugino <mgugino>
Status:	CLOSED ERRATA	QA Contact:	Anping Li <anli>
Severity:	urgent	Docs Contact:
Priority:	unspecified
Version:	3.9.0	CC:	aos-bugs, boris.ruppert, dzhukous, eminguez, jokerman, mmccomas, nnosenzo, rmeggins, sdodson, wsun, zisis.lianas
Target Milestone:	---
Target Release:	3.9.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	1497421	Environment:
Last Closed:	2018-06-06 15:46:20 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1497421
Bug Blocks:

Comment 1 Anping Li 2018-03-16 11:24:01 UTC

Cloning this bug to track the issue in v3.9.

Cannot allocate memory when redeploying logging with openshift3/ose-ansible/images/v3.9.11-1

Ansible Host
free -h
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.6G        5.2G        121M        893M        5.6G
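
For anyone who wants to check headroom on the Ansible host before a run, here is a minimal sketch that reads the same figures from /proc/meminfo text. The helper names `parse_meminfo` and `headroom_gib` are made up for illustration and are not part of openshift-ansible; it assumes a kernel that reports `MemAvailable`, and it says nothing about how much memory a given playbook actually needs.

```python
def parse_meminfo(text):
    """Return a dict of field name -> value in kB parsed from /proc/meminfo text."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def headroom_gib(info):
    """MemAvailable plus SwapFree, converted from kB to GiB.

    Missing fields count as zero (hypothetical policy for this sketch).
    """
    kb = info.get("MemAvailable", 0) + info.get("SwapFree", 0)
    return kb / (1024.0 * 1024.0)
```

Possible usage on the host: `headroom_gib(parse_meminfo(open("/proc/meminfo").read()))`.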

Comment 2 Anping Li 2018-03-16 12:13:20 UTC

After adding an 8Gi swapfile, the deploy succeeded. 16G of memory is enough to deploy logging with both ops and eventrouter enabled.

[OSEv3:vars]
openshift_logging_install_eventrouter=true
openshift_logging_elasticsearch_kibana_index_mode=shared_ops
openshift_logging_es_ops_pvc_dynamic=true
openshift_logging_es_ops_memory_limit=2Gi
openshift_logging_use_ops=true
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_memory_limit=2Gi
openshift_logging_es_cluster_size=1
openshift_logging_namespace=logging
openshift_logging_image_prefix=registry.example.com/openshift3/
openshift_logging_install_logging=true


[masters]
master1.example.com
[nodes]
node1.example.com
node2.example.com
node3.example.com
node4.example.com

Comment 3 Anping Li 2018-03-27 05:37:33 UTC

The issue still exists with openshift3/ose-ansible/images/v3.9.14-3


TASK [openshift_logging_curator : Generate Curator deploymentconfig] ***********
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: OSError: [Errno 12] Cannot allocate memory
fatal: [host-8-246-167.host.centralci.eng.rdu2.redhat.com]: FAILED! => {"msg": "Unexpected failure during module execution.", "stdout": ""}

RUNNING HANDLER [openshift_logging_elasticsearch : Restarting logging-{{ _cluster_component }} cluster] ***

Comment 6 Scott Dodson 2018-04-16 20:58:00 UTC

There's suspicion that this is linked to a Python garbage collection bug [1] that was fixed in RHEL 7.5. If anyone is hitting this, we'd be interested to see whether updating to python-2.7.5-68.el7.x86_64 fixes the problem.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1468410

Comment 7 Nicolas Nosenzo 2018-04-25 07:58:15 UTC

Hitting this as well:

Env:
ansible-2.4.3.0-1.el7ae.noarch
python-2.7.5-68.el7.x86_64
openshift-ansible-3.9.14-1.git.3.c62bc34.el7.noarch
Red Hat Enterprise Linux Server release 7.5 
HW:
  Memory:
    Total: 8192 MiB (8 GiB)


        ....
        TASK [openshift_logging_fluentd : Label example.node for Fluentd deployment] ************************************************************************
        task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_fluentd/tasks/label_and_wait.yaml:2
        ERROR! Unexpected Exception, this is probably a bug: [Errno 12] Cannot allocate memory
        the full traceback was:

        Traceback (most recent call last):
          File "/usr/bin/ansible-playbook", line 106, in <module>
            exit_code = cli.run()
          File "/usr/lib/python2.7/site-packages/ansible/cli/playbook.py", line 122, in run
            results = pbex.run()
          File "/usr/lib/python2.7/site-packages/ansible/executor/playbook_executor.py", line 154, in run
            result = self._tqm.run(play=play)
          File "/usr/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py", line 290, in run
            play_return = strategy.run(iterator, play_context)
          File "/usr/lib/python2.7/site-packages/ansible/plugins/strategy/linear.py", line 277, in run
            self._queue_task(host, task, task_vars, play_context)
          File "/usr/lib/python2.7/site-packages/ansible/plugins/strategy/__init__.py", line 254, in _queue_task
            worker_prc.start()
          File "/usr/lib64/python2.7/multiprocessing/process.py", line 130, in start
            self._popen = Popen(self)
          File "/usr/lib64/python2.7/multiprocessing/forking.py", line 121, in __init__
            self.pid = os.fork()
        OSError: [Errno 12] Cannot allocate memory
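
The traceback above ends in `os.fork()` inside `multiprocessing`, so the failure surfaces as an `OSError` with errno 12 (`ENOMEM`) when a worker process is started. As a rough illustration of that failure point only, the sketch below wraps a `start()` call and retries on `ENOMEM`. The helper `start_with_retry` and its parameters are hypothetical; this is not what Ansible does and not the change made in the PRs linked from this bug.

```python
import errno
import time


def start_with_retry(proc, attempts=3, delay=1.0):
    """Call proc.start(), retrying when it fails with ENOMEM.

    `proc` is any object with a start() method, such as a
    multiprocessing.Process. Returns the number of attempts used.
    Any other OSError, or ENOMEM on the last attempt, is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            proc.start()
            return attempt
        except OSError as exc:
            if exc.errno != errno.ENOMEM or attempt == attempts:
                raise
            # Hypothetical policy: wait for memory pressure to ease, then retry.
            time.sleep(delay)
```

With a real worker this would be called as `start_with_retry(multiprocessing.Process(target=work))`, where `work` stands in for whatever function the worker runs.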


Variables used:

# Aggregated logging
openshift_logging_image_prefix=openshift3/
openshift_logging_image_version=v3.9
# No separate ops logging
openshift_logging_use_ops=False
openshift_logging_install_eventrouter=True
# run all logging on infra nodes
openshift_logging_curator_nodeselector={'region': 'infra', 'component': 'logging'}
openshift_logging_es_nodeselector={'region': 'infra', 'component': 'logging'}
openshift_logging_kibana_nodeselector={'region': 'infra', 'component': 'logging'}
openshift_logging_eventrouter_nodeselector={'region': 'infra', 'component': 'logging'}
openshift_logging_es_cluster_size=4
openshift_logging_es_number_of_replicas=1
openshift_logging_es_allows_cluster_reader=True
openshift_logging_es_pv_selector={'logging-infra':'es'}
openshift_logging_es_pvc_size=200G
openshift_logging_es_pvc_storage_class_name=""
openshift_logging_es_cpu_request=400m
openshift_logging_fluentd_cpu_request=50m
openshift_logging_kibana_cpu_request=50m
openshift_logging_kibana_proxy_cpu_request=50m
openshift_logging_curator_cpu_request=50m
openshift_logging_eventrouter_cpu_request=50m
openshift_logging_es_memory_limit=12Gi
openshift_logging_curator_run_timezone=Europe/Oslo
openshift_logging_curator_default_days=90

Comment 10 Michael Gugino 2018-04-26 17:34:12 UTC

PR created against master: https://github.com/openshift/openshift-ansible/pull/8165

Comment 11 Michael Gugino 2018-05-01 19:27:00 UTC

PR For 3.9: https://github.com/openshift/openshift-ansible/pull/8210

Comment 13 Anping Li 2018-05-31 09:07:35 UTC

The redeploy succeeded without the "Cannot allocate memory" error with openshift-ansible:v3.9.40, so moving the bug to verified.

Comment 15 errata-xmlrpc 2018-06-06 15:46:20 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1796