Description of problem:
"Cannot allocate memory" was reported when deploying logging, even though free memory appears sufficient for the tasks.

On localhost:

# free -h
              total        used        free      shared  buff/cache   available
Mem:           3.7G        2.1G        984M        182M        675M        1.2G
Swap:            0B          0B          0B

On the first master:

              total        used        free      shared  buff/cache   available
Mem:           7.1G        883M        1.5G        796K        4.8G        5.9G
Swap:            0B          0B          0B

Version-Release number of the following components:
openshift-ansible-3.7.0-0.134.0.git.0.6f43fc3.el7.noarch

How reproducible:
Always on one environment, but no such issue on another environment.

Steps to Reproduce:
1. Deploy logging on OCP by playbook.

Actual results:

TASK [openshift_logging : copy] ***
task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging/tasks/generate_certs.yaml:106
skipping: [ec2-54-85-72-229.compute-1.amazonaws.com] => {
    "changed": false,
    "skip_reason": "Conditional result was False",
    "skipped": true
}

TASK [openshift_logging : Generate PEM certs] ***
task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging/tasks/generate_certs.yaml:115
included: /usr/share/ansible/openshift-ansible/roles/openshift_logging/tasks/generate_pems.yaml for ec2-54-85-72-229.compute-1.amazonaws.com
included: /usr/share/ansible/openshift-ansible/roles/openshift_logging/tasks/generate_pems.yaml for ec2-54-85-72-229.compute-1.amazonaws.com
included: /usr/share/ansible/openshift-ansible/roles/openshift_logging/tasks/generate_pems.yaml for ec2-54-85-72-229.compute-1.amazonaws.com
included: /usr/share/ansible/openshift-ansible/roles/openshift_logging/tasks/generate_pems.yaml for ec2-54-85-72-229.compute-1.amazonaws.com

TASK [openshift_logging : Checking for system.logging.fluentd.key] ***
task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging/tasks/generate_pems.yaml:3

ERROR! Unexpected Exception, this is probably a bug: [Errno 12] Cannot allocate memory
the full traceback was:

Traceback (most recent call last):
  File "/usr/bin/ansible-playbook", line 106, in <module>
    exit_code = cli.run()
  File "/usr/lib/python2.7/site-packages/ansible/cli/playbook.py", line 130, in run
    results = pbex.run()
  File "/usr/lib/python2.7/site-packages/ansible/executor/playbook_executor.py", line 154, in run
    result = self._tqm.run(play=play)
  File "/usr/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py", line 292, in run
    play_return = strategy.run(iterator, play_context)
  File "/usr/lib/python2.7/site-packages/ansible/plugins/strategy/linear.py", line 277, in run
    self._queue_task(host, task, task_vars, play_context)
  File "/usr/lib/python2.7/site-packages/ansible/plugins/strategy/__init__.py", line 222, in _queue_task
    worker_prc.start()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/usr/lib64/python2.7/multiprocessing/forking.py", line 121, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Expected results:
Logging is deployed without errors.

Additional info:
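The `os.fork()` failure above can occur even when `free` shows headroom: at fork time the kernel must be able to commit a copy of the parent's entire address space, and a large ansible-playbook process can exceed the commit limit, especially under strict overcommit. A minimal diagnostic sketch for the control host (not from this bug, just a way to confirm which limit is being hit):

```shell
#!/bin/sh
# Show the figures that matter for fork()-time ENOMEM on the control host.
# MemAvailable is what "free" reports as available; CommitLimit and
# Committed_AS govern allocation under strict overcommit.
awk '/^(MemAvailable|CommitLimit|Committed_AS)/ {print $1, $2, $3}' /proc/meminfo

# 0 = heuristic overcommit (default), 1 = always allow, 2 = strict.
# Strict mode makes fork() of a large process far more likely to fail
# with ENOMEM even when plenty of memory is technically free.
echo "vm.overcommit_memory = $(cat /proc/sys/vm/overcommit_memory)"
```

If Committed_AS is close to CommitLimit when the playbook forks its workers, that matches the failure mode seen here; the practical fixes reported in this bug (more RAM on the bastion, or the ansible downgrade below) remain the workarounds.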
Created attachment 1332643 [details] The memory allocate error
I guess the bug should be fixed in openshift-ansible:v3.7.25 and later.
The error appears again when installing OpenShift with logging using openshift-ansible-3.9.0-0.38.0.0.

On the ansible slave:

# free -h
              total        used        free      shared  buff/cache   available
Mem:           3.9G        1.5G        1.5G        1.2M        853M        1.9G

TASK [openshift_logging_fluentd : Generate logging-fluentd daemonset definition] ***
Tuesday 06 February 2018  07:46:31 +0000 (0:00:02.093)       0:19:01.565 ******
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: OSError: [Errno 12] Cannot allocate memory
fatal: [ec2-54-87-30-170.compute-1.amazonaws.com]: FAILED! => {"msg": "Unexpected failure during module execution.", "stdout": ""}
Isn't this related to ansible and not logging specifically?
We didn't hit this issue with the standalone logging deployment.
@Scott, do we advocate using ansible 2.4.x with ose-ansible 3.7?
(In reply to Jeff Cantrill from comment #12)
> @Scott, do we advocate using ansible 2.4.x with ose-ansible 3.7?

Ansible 2.4 is not required until OCP 3.9, but it should work. If downgrading to ansible-2.3.2 as shipped in the OCP channel fixes the problem, then that's a perfectly valid workaround and we can lower priority based on having a confirmed workaround.
I had the same issue with the OCP 3.7 installer:

openshift-ansible-3.7.23-1.git.0.bc406aa.el7.noarch

$ ansible --version
ansible 2.3.1.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides
  python version = 2.7.5 (default, May  3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

The workaround was to increase the memory of the bastion/installer host (in my case from 2 GB to 8 GB RAM).
The current release-3.7 code no longer has the include_tasks calls that led to this problem. Can you please test the latest 3.7.z?
We're not going to be able to apply the same workaround as we did for the node role in this case. The logging role requires the use of dynamic imports due to the way it's constructed. For now, the workaround for logging is to increase memory sufficiently.
I have been unsuccessful in replicating this, with either the logging play (playbooks/openshift-logging/config.yml) or synthetically with contrived plays and tasks. This was with release-3.9 plus RPM-installed ansible 2.4.3 on a RHEL localhost.
Logging can be installed with OCP and redeployed, and memory usage is reduced with openshift3/ose-ansible/images/v3.7.40-1. Moving to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0636
Please have your customer downgrade to ansible-2.3.2 and let us know if that improves the situation. That's the version that ships in the OCP 3.7 channels and it's the preferred version for use with 3.7.
Also, if you find that doesn't resolve the issue please open a new bug. We don't re-open bugs that have an errata shipped for them.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days