Bug 1557290 - Cannot allocate memory when redeploy logging
Summary: Cannot allocate memory when redeploy logging
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.9.z
Assignee: Michael Gugino
QA Contact: Anping Li
URL:
Whiteboard:
Depends On: 1497421
Blocks:
 
Reported: 2018-03-16 11:21 UTC by Anping Li
Modified: 2018-06-06 15:47 UTC
CC: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1497421
Environment:
Last Closed: 2018-06-06 15:46:20 UTC
Target Upstream Version:
Embargoed:




Links
System ID                              Priority     Status  Summary                                                         Last Updated
Red Hat Bugzilla 1558672               unspecified  CLOSED  openshift ansible node scaleup fails - Cannot allocate memory  2023-09-14 04:25:46 UTC
Red Hat Product Errata RHBA-2018:1796  None         None    None                                                            2018-06-06 15:47:03 UTC

Internal Links: 1558672

Comment 1 Anping Li 2018-03-16 11:24:01 UTC
Cloning this bug to track the issue in v3.9.

Cannot allocate memory when redeploying logging with openshift3/ose-ansible/images/v3.9.11-1

Ansible Host
free -h
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.6G        5.2G        121M        893M        5.6G

Comment 2 Anping Li 2018-03-16 12:13:20 UTC
After adding an 8 GiB swap file, the deploy succeeded. A total of 16 GB (RAM plus swap) is enough to deploy logging with both ops and eventrouter enabled.
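
For reference, a minimal sketch of one way to add a swap file of that size on a RHEL 7 Ansible host (the /swapfile path and exact commands are illustrative, not taken from this bug):

sudo dd if=/dev/zero of=/swapfile bs=1M count=8192   # or: sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile                             # swap files must not be readable by other users
sudo mkswap /swapfile                                # format the file as swap space
sudo swapon /swapfile                                # enable it for the current boot
free -h                                              # the Swap line should now show about 8.0G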

[OSEv3:vars]
openshift_logging_install_eventrouter=true
openshift_logging_elasticsearch_kibana_index_mode=shared_ops
openshift_logging_es_ops_pvc_dynamic=true
openshift_logging_es_ops_memory_limit=2Gi
openshift_logging_use_ops=true
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_memory_limit=2Gi
openshift_logging_es_cluster_size=1
openshift_logging_namespace=logging
openshift_logging_image_prefix=registry.example.com/openshift3/
openshift_logging_install_logging=true


[masters]
master1.example.com
[nodes]
node1.example.com
node2.example.com
node3.example.com
node4.example.com

Comment 3 Anping Li 2018-03-27 05:37:33 UTC
The issue still exists with openshift3/ose-ansible/images/v3.9.14-3.


TASK [openshift_logging_curator : Generate Curator deploymentconfig] ***********
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: OSError: [Errno 12] Cannot allocate memory
fatal: [host-8-246-167.host.centralci.eng.rdu2.redhat.com]: FAILED! => {"msg": "Unexpected failure during module execution.", "stdout": ""}

RUNNING HANDLER [openshift_logging_elasticsearch : Restarting logging-{{ _cluster_component }} cluster] ***

Comment 6 Scott Dodson 2018-04-16 20:58:00 UTC
There's suspicion that this is linked to a Python garbage collection bug [1] that was fixed in RHEL 7.5. For anyone hitting this, we'd be interested to see whether updating to python-2.7.5-68.el7.x86_64 fixes the problem.

1 - https://bugzilla.redhat.com/show_bug.cgi?id=1468410
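
A quick way to confirm which Python build the Ansible control host is running, and to update it (generic RHEL 7 commands, not taken from this bug):

rpm -q python            # compare against python-2.7.5-68.el7.x86_64
sudo yum update python   # pulls in the newer build mentioned above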

Comment 7 Nicolas Nosenzo 2018-04-25 07:58:15 UTC
Hitting this as well:

Env:
ansible-2.4.3.0-1.el7ae.noarch
python-2.7.5-68.el7.x86_64
openshift-ansible-3.9.14-1.git.3.c62bc34.el7.noarch
Red Hat Enterprise Linux Server release 7.5 
HW:
  Memory:
    Total: 8192 MiB (8 GiB)


        ....
        TASK [openshift_logging_fluentd : Label example.node for Fluentd deployment] ************************************************************************
        task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_fluentd/tasks/label_and_wait.yaml:2
        ERROR! Unexpected Exception, this is probably a bug: [Errno 12] Cannot allocate memory
        the full traceback was:

        Traceback (most recent call last):
          File "/usr/bin/ansible-playbook", line 106, in <module>
            exit_code = cli.run()
          File "/usr/lib/python2.7/site-packages/ansible/cli/playbook.py", line 122, in run
            results = pbex.run()
          File "/usr/lib/python2.7/site-packages/ansible/executor/playbook_executor.py", line 154, in run
            result = self._tqm.run(play=play)
          File "/usr/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py", line 290, in run
            play_return = strategy.run(iterator, play_context)
          File "/usr/lib/python2.7/site-packages/ansible/plugins/strategy/linear.py", line 277, in run
            self._queue_task(host, task, task_vars, play_context)
          File "/usr/lib/python2.7/site-packages/ansible/plugins/strategy/__init__.py", line 254, in _queue_task
            worker_prc.start()
          File "/usr/lib64/python2.7/multiprocessing/process.py", line 130, in start
            self._popen = Popen(self)
          File "/usr/lib64/python2.7/multiprocessing/forking.py", line 121, in __init__
            self.pid = os.fork()
        OSError: [Errno 12] Cannot allocate memory
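
As general Linux background (not a conclusion specific to this bug): the os.fork() in the last frame fails with ENOMEM when the kernel declines to commit address space for a child of the already-large ansible-playbook process, which depends on free RAM, swap, and the overcommit policy. A diagnostic sketch for the Ansible host:

cat /proc/sys/vm/overcommit_memory                 # 0 = heuristic, 1 = always allow, 2 = strict accounting
grep -E 'CommitLimit|Committed_AS' /proc/meminfo   # what the kernel will commit vs. what is already committed
free -h                                            # RAM and swap available when the fork fails
ps -o rss=,vsz= -C ansible-playbook                # resident and virtual size of the parent being forked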


Variables used:

# Aggregated logging
openshift_logging_image_prefix=openshift3/
openshift_logging_image_version=v3.9
# No separate ops logging
openshift_logging_use_ops=False
openshift_logging_install_eventrouter=True
# run all logging on infra nodes
openshift_logging_curator_nodeselector={'region': 'infra', 'component': 'logging'}
openshift_logging_es_nodeselector={'region': 'infra', 'component': 'logging'}
openshift_logging_kibana_nodeselector={'region': 'infra', 'component': 'logging'}
openshift_logging_eventrouter_nodeselector={'region': 'infra', 'component': 'logging'}
openshift_logging_es_cluster_size=4
openshift_logging_es_number_of_replicas=1
openshift_logging_es_allows_cluster_reader=True
openshift_logging_es_pv_selector={'logging-infra':'es'}
openshift_logging_es_pvc_size=200G
openshift_logging_es_pvc_storage_class_name=""
openshift_logging_es_cpu_request=400m
openshift_logging_fluentd_cpu_request=50m
openshift_logging_kibana_cpu_request=50m
openshift_logging_kibana_proxy_cpu_request=50m
openshift_logging_curator_cpu_request=50m
openshift_logging_eventrouter_cpu_request=50m
openshift_logging_es_memory_limit=12Gi
openshift_logging_curator_run_timezone=Europe/Oslo
openshift_logging_curator_default_days=90

Comment 10 Michael Gugino 2018-04-26 17:34:12 UTC
PR created against master: https://github.com/openshift/openshift-ansible/pull/8165

Comment 11 Michael Gugino 2018-05-01 19:27:00 UTC
PR For 3.9: https://github.com/openshift/openshift-ansible/pull/8210

Comment 13 Anping Li 2018-05-31 09:07:35 UTC
The redeploy succeeded without the memory allocation error with openshift-ansible:v3.9.40, so moving the bug to VERIFIED.
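
For anyone reproducing the verification, the redeploy amounts to re-running the logging playbook from the fixed openshift-ansible against the same inventory (playbook path assumed from the 3.9 layout; check it against your installed package):

ansible-playbook -i <inventory> /usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml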

Comment 15 errata-xmlrpc 2018-06-06 15:46:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1796

