Bug 1497421 - Cannot allocate memory when deploying logging on one environment
Summary: Cannot allocate memory when deploying logging on one environment
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.7.z
Assignee: Michael Gugino
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks: 1557290
 
Reported: 2017-09-30 09:34 UTC by Anping Li
Modified: 2023-09-14 04:09 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Cloned As: 1557290
Environment:
Last Closed: 2018-05-04 12:44:22 UTC
Target Upstream Version:
Embargoed:


Attachments
The memory allocation error (52.22 KB, application/x-gzip)
2017-09-30 09:35 UTC, Anping Li


Links
Red Hat Bugzilla 1558672 (CLOSED): openshift ansible node scaleup fails - Cannot allocate memory (last updated 2023-09-14 04:25:46 UTC)
Red Hat Product Errata RHBA-2018:0636 (last updated 2018-04-05 09:30:13 UTC)

Internal Links: 1558672

Description Anping Li 2017-09-30 09:34:49 UTC
Description of problem:
"Cannot allocate memory" was reported when deploying logging. It seems the free memory is enough for the tasks.


On the localhost:
# free -h
              total        used        free      shared  buff/cache   available
Mem:           3.7G        2.1G        984M        182M        675M        1.2G
Swap:            0B          0B          0B

On the first master:
        "              total        used        free      shared  buff/cache   available", 
        "Mem:           7.1G        883M        1.5G        796K        4.8G        5.9G", 
        "Swap:            0B          0B          0B"


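One thing to keep in mind when reading these numbers: whether fork() succeeds is governed by the kernel's commit accounting, not by what free -h reports as free or available, so on a swapless host a large ansible-playbook process can still fail to fork. A minimal diagnostic sketch (assuming a Linux control host; illustrative only) that prints the relevant overcommit settings and commit figures:

# Diagnostic sketch: compare what the kernel is willing to commit with what is
# already committed. fork() of a large ansible-playbook process can fail with
# ENOMEM even though MemAvailable looks healthy, especially with no swap or
# with vm.overcommit_memory=2.
def read_meminfo():
    info = {}
    with open("/proc/meminfo") as meminfo:
        for line in meminfo:
            key, value = line.split(":", 1)
            info[key] = value.strip()
    return info

if __name__ == "__main__":
    with open("/proc/sys/vm/overcommit_memory") as overcommit:
        print("vm.overcommit_memory = %s" % overcommit.read().strip())
    info = read_meminfo()
    for key in ("MemTotal", "MemAvailable", "SwapTotal", "CommitLimit", "Committed_AS"):
        print("%-14s %s" % (key, info.get(key, "n/a")))
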
Version-Release number of the following components:
openshift-ansible-3.7.0-0.134.0.git.0.6f43fc3.el7.noarch

How reproducible:
Always on one environment, but there is no such issue on the other environment.


Steps to Reproduce:
1. Deploy logging on OCP with the playbook

Actual results:
TASK [openshift_logging : copy] ***********************************************************************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging/tasks/generate_certs.yaml:106
skipping: [ec2-54-85-72-229.compute-1.amazonaws.com] => {
    "changed": false, 
    "skip_reason": "Conditional result was False", 
    "skipped": true
}

TASK [openshift_logging : Generate PEM certs] *********************************************************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging/tasks/generate_certs.yaml:115
included: /usr/share/ansible/openshift-ansible/roles/openshift_logging/tasks/generate_pems.yaml for ec2-54-85-72-229.compute-1.amazonaws.com
included: /usr/share/ansible/openshift-ansible/roles/openshift_logging/tasks/generate_pems.yaml for ec2-54-85-72-229.compute-1.amazonaws.com
included: /usr/share/ansible/openshift-ansible/roles/openshift_logging/tasks/generate_pems.yaml for ec2-54-85-72-229.compute-1.amazonaws.com
included: /usr/share/ansible/openshift-ansible/roles/openshift_logging/tasks/generate_pems.yaml for ec2-54-85-72-229.compute-1.amazonaws.com

TASK [openshift_logging : Checking for system.logging.fluentd.key] ************************************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging/tasks/generate_pems.yaml:3
ERROR! Unexpected Exception, this is probably a bug: [Errno 12] Cannot allocate memory
the full traceback was:

Traceback (most recent call last):
  File "/usr/bin/ansible-playbook", line 106, in <module>
    exit_code = cli.run()
  File "/usr/lib/python2.7/site-packages/ansible/cli/playbook.py", line 130, in run
    results = pbex.run()
  File "/usr/lib/python2.7/site-packages/ansible/executor/playbook_executor.py", line 154, in run
    result = self._tqm.run(play=play)
  File "/usr/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py", line 292, in run
    play_return = strategy.run(iterator, play_context)
  File "/usr/lib/python2.7/site-packages/ansible/plugins/strategy/linear.py", line 277, in run
    self._queue_task(host, task, task_vars, play_context)
  File "/usr/lib/python2.7/site-packages/ansible/plugins/strategy/__init__.py", line 222, in _queue_task
    worker_prc.start()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/usr/lib64/python2.7/multiprocessing/forking.py", line 121, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
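
The traceback bottoms out in os.fork() for an Ansible strategy worker, so it is the controller process failing to fork itself rather than the managed host running out of memory. A minimal sketch of the same failure mode, assuming a small swapless host (or vm.overcommit_memory=2) where the parent process has grown large:

import os
import sys

# Illustrative sketch: grow this process so that forking it requires committing
# a large additional address space. On a small host with no swap, or with
# vm.overcommit_memory=2, the fork below can fail with the same
# "[Errno 12] Cannot allocate memory" seen in the Ansible run.
ballast = bytearray(2 * 1024 ** 3)  # ~2 GiB; adjust so it fits in RAM but exceeds what stays free

try:
    pid = os.fork()
except OSError as err:
    print("fork failed: %s" % err)  # errno 12 is ENOMEM
    sys.exit(1)

if pid == 0:
    os._exit(0)        # child: exit immediately
os.waitpid(pid, 0)     # parent: reap the child
print("fork succeeded")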


Expected results:

Additional info:

Comment 1 Anping Li 2017-09-30 09:35:27 UTC
Created attachment 1332643
The memory allocation error

Comment 5 Anping Li 2018-02-04 01:17:52 UTC
I guess the bug should be fixed in openshift-ansible:v3.7.25 and later.

Comment 7 Anping Li 2018-02-06 08:36:33 UTC
The error appears again when installing OpenShift with logging using openshift-ansible-3.9.0-0.38.0.0.

On the Ansible slave:
free -h
              total        used        free      shared  buff/cache   available
Mem:           3.9G        1.5G        1.5G        1.2M        853M        1.9G


TASK [openshift_logging_fluentd : Generate logging-fluentd daemonset definition] ***
Tuesday 06 February 2018  07:46:31 +0000 (0:00:02.093)       0:19:01.565 ****** 
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: OSError: [Errno 12] Cannot allocate memory
fatal: [ec2-54-87-30-170.compute-1.amazonaws.com]: FAILED! => {"msg": "Unexpected failure during module execution.", "stdout": ""}

Comment 8 Jeff Cantrill 2018-02-06 20:22:10 UTC
Isn't this related to ansible and not logging specifically?

Comment 9 Anping Li 2018-02-07 02:02:33 UTC
We didn't hit this issue with the standalone logging deployment.

Comment 12 Jeff Cantrill 2018-02-09 15:55:06 UTC
@Scott,  do we advocate using ansible 2.4.x with ose-ansible 3.7?

Comment 13 Scott Dodson 2018-02-09 15:59:00 UTC
(In reply to Jeff Cantrill from comment #12)
> @Scott,  do we advocate using ansible 2.4.x with ose-ansible 3.7?

Ansible 2.4 is not required until OCP 3.9 but should work. If downgrading to ansible-2.3.2 as shipped in the OCP channel fixes the problem then that's a perfectly valid workaround and we can lower priority based on having a confirmed workaround.

Comment 14 Zisis Lianas 2018-03-02 13:13:25 UTC
I had the same issue with the OCP 3.7 installer:
openshift-ansible-3.7.23-1.git.0.bc406aa.el7.noarch

$ ansible --version
ansible 2.3.1.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides
  python version = 2.7.5 (default, May  3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]



Workaround was to increase the memory of the bastion/installer host (in my case from 2GB RAM to 8GB RAM).

Comment 15 Scott Dodson 2018-03-27 13:50:45 UTC
The current release-3.7 code no longer has include_tasks calls which lead to this problem. Can you please test the latest 3.7.z?

Comment 16 Michael Gugino 2018-03-27 14:02:40 UTC
We're not going to be able to apply the same workaround as we did for the node role in this case. The logging role requires the use of dynamic imports due to the way it's constructed.

For now, the workaround for logging is to increase memory sufficiently.
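
One rough way to size that increase (sketched below; it assumes a Linux control host and is illustrative only) is to poll the ansible-playbook process's VmRSS/VmSize from /proc while the logging play runs and see how large the controller grows before the failing fork:

import sys
import time

# Illustrative monitor: poll VmRSS/VmSize of a running ansible-playbook process
# so its growth can be observed before a worker fork fails with ENOMEM.
def vm_fields(pid):
    fields = {}
    with open("/proc/%s/status" % pid) as status:
        for line in status:
            if line.startswith(("VmSize", "VmRSS")):
                key, value = line.split(":", 1)
                fields[key] = value.strip()
    return fields

if __name__ == "__main__":
    pid = sys.argv[1]  # PID of the ansible-playbook process
    while True:
        try:
            print(vm_fields(pid))
        except IOError:  # the process has exited
            break
        time.sleep(5)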

Comment 17 Michael Gugino 2018-03-27 17:55:57 UTC
I have been unable to reproduce this with either the logging play (playbooks/openshift-logging/config.yml) or synthetically with contrived plays and tasks, using release-3.9 plus RPM-installed ansible 2.4.3 on a RHEL localhost.

Comment 18 Anping Li 2018-03-28 05:13:08 UTC
Logging can be installed with OCP and redeployed, and memory usage is reduced with openshift3/ose-ansible/images/v3.7.40-1, so moving to VERIFIED.

Comment 22 errata-xmlrpc 2018-04-05 09:29:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0636

Comment 25 Scott Dodson 2018-05-04 12:39:23 UTC
Please have your customer downgrade to ansible-2.3.2 and let us know if that improves the situation. That's the version that ships in the OCP 3.7 channels and it's the preferred version for use with 3.7.

Comment 26 Scott Dodson 2018-05-04 12:44:22 UTC
Also, if you find that doesn't resolve the issue please open a new bug. We don't re-open bugs that have an errata shipped for them.

Comment 27 Red Hat Bugzilla 2023-09-14 04:09:16 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

