1575063 – OSError: [Errno 12] Cannot allocate memory when deploy logging

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	3.7.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	3.7.z
Assignee:	Scott Dodson
QA Contact:	Johnny Liu
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1610420

Reported:	2018-05-04 16:29 UTC by mmariyan
Modified:	2018-10-25 12:08 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1610420
Environment:
Last Closed:	2018-07-31 14:59:55 UTC
Target Upstream Version:
Embargoed:

Description mmariyan 2018-05-04 16:29:13 UTC

Description of problem:

Deploying logging fails with a "Cannot allocate memory" error. This is the same issue as Bugzilla [0]. The fix for that bug is available in 3.7.42-1, but it did not resolve the issue.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1497421

The workaround suggested by the Engineering team, using ansible 2.3.2, did not resolve the issue either.

Version-Release number of the following components:

openshift-ansible-3.7.42-1.git.2.9ee4e71.el7.noarch
openshift-ansible-playbooks-3.7.42-1.git.2.9ee4e71.el7.noarch
ansible-2.4.3.0-1.el7.noarch

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

TASK [openshift_logging_fluentd : include] ********************************************************************************************
included: /usr/share/ansible/openshift-ansible/roles/openshift_logging_fluentd/tasks/label_and_wait.yaml for xxxx
included: /usr/share/ansible/openshift-ansible/roles/openshift_logging_fluentd/tasks/label_and_wait.yaml for xxx
included: /usr/share/ansible/openshift-ansible/roles/openshift_logging_fluentd/tasks/label_and_wait.yaml for xxx
included: /usr/share/ansible/openshift-ansible/roles/openshift_logging_fluentd/tasks/label_and_wait.yaml for xxx
included: /usr/share/ansible/openshift-ansible/roles/openshift_logging_fluentd/tasks/label_and_wait.yaml for xxx

TASK [openshift_logging_fluentd : Label xxxxxxxxxx.xxx.xx Fluentd deployment] ************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: OSError: [Errno 12] Cannot allocate memory
fatal: [xxxx]: FAILED! => {"failed": true, "msg": "Unexpected failure during module execution.", "stdout": ""}
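
For reference, errno 12 is ENOMEM. A minimal Python sketch of how a wrapper script could recognise this class of failure; the function name `is_out_of_memory` is hypothetical and is not part of Ansible or openshift-ansible:

```python
import errno


def is_out_of_memory(exc):
    """Return True if exc is an OSError carrying ENOMEM (errno 12 on Linux)."""
    return isinstance(exc, OSError) and exc.errno == errno.ENOMEM


# The error from the task output above, reconstructed by hand for illustration.
err = OSError(errno.ENOMEM, "Cannot allocate memory")
print(is_out_of_memory(err))
```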

Expected results:

Logging should install without error.

Comment 1 Scott Dodson 2018-05-04 16:42:56 UTC

Needs backport of https://github.com/openshift/openshift-ansible/pull/8165

Comment 2 Michael Gugino 2018-05-07 14:43:39 UTC

PR Created: https://github.com/openshift/openshift-ansible/pull/8284

Comment 4 Anping Li 2018-05-10 03:04:25 UTC

Still Cannot allocate memory, the fix is not in openshift-ansible-3.7.46-1.git.0.37f607e.el7.noarch.

Comment 6 Scott Dodson 2018-05-14 14:50:43 UTC

(In reply to Anping Li from comment #4)
> Still Cannot allocate memory, the fix is not in
> openshift-ansible-3.7.46-1.git.0.37f607e.el7.noarch.

The fix is only in openshift-ansible-3.7.47-1 and newer.

Comment 7 Anping Li 2018-05-22 07:56:07 UTC

This time, when I redeployed logging, "Cannot allocate memory" was reported in the 'Generate Kibana DC template' task.

TASK [openshift_logging_kibana : Set Kibana Proxy secret] **********************
ok: [ec2-34-230-65-4.compute-1.amazonaws.com]

TASK [openshift_logging_kibana : Generate Kibana DC template] ******************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: OSError: [Errno 12] Cannot allocate memory
fatal: [ec2-34-230-65-4.compute-1.amazonaws.com]: FAILED! => {"failed": true, "msg": "Unexpected failure during module execution.", "stdout": ""}

RUNNING HANDLER [openshift_logging_elasticsearch : Restarting logging-{{ _cluster_component }} cluster] ***

RUNNING HANDLER [openshift_logging_elasticsearch : set_fact] *******************

Comment 8 Michael Gugino 2018-05-22 13:50:41 UTC

Please describe host details such as memory on both the ansible host and the target host.

This latest failure does not resemble previous scenarios.  There are no dynamic includes and no looping for that task.

Comment 9 Anping Li 2018-05-23 01:39:05 UTC

Ansible slave: 8Gi, openshift-ansible-3.7.48, running as docker containers.
Target hosts: 8Gi, in AWS.

Comment 10 Anping Li 2018-05-23 02:32:13 UTC

Logging inventory variables:

openshift_logging_fluentd_audit_container_engine=true
openshift_logging_install_eventrouter=true
openshift_logging_elasticsearch_kibana_index_mode=shared_ops
openshift_logging_es_allow_external=True
openshift_logging_es_ops_pvc_dynamic=true
openshift_logging_use_ops=true
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_number_of_shards=1
openshift_logging_es_number_of_replicas=1
openshift_logging_es_memory_limit=2Gi
openshift_logging_es_cluster_size=3
openshift_logging_image_prefix=registry.reg-aws.openshift.com:443/openshift3/
openshift_logging_install_logging=true

Comment 16 Michael Gugino 2018-07-06 18:18:56 UTC

I don't see a reason for this to be happening with this role.  Perhaps you have reverted to using a newer version of ansible with 3.7?  It's important to use a 2.3 release as in 2.4 'include_role' and similar statements are dynamic includes; in 2.3 those same statements would be static by default.

Also, it's possible that the host or container is running out of memory due to memory consumption by other processes.

You can try adding a temporary swap file on the host running ansible to increase memory, or you can try limiting the number of nodes in inventory when running the kibana plays.
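
To judge whether memory pressure on the Ansible host is plausible before adding swap, something like the following rough sketch could help. It parses text in the format of /proc/meminfo. The function names, the sample values, and the 1 GiB threshold are all assumptions made for illustration; nothing here comes from openshift-ansible.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: kilobytes}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def has_headroom(meminfo, required_kb):
    """True if available RAM plus free swap meets the assumed requirement."""
    available = meminfo.get("MemAvailable", meminfo.get("MemFree", 0))
    return available + meminfo.get("SwapFree", 0) >= required_kb


# Invented sample values; on a real host, read the text from /proc/meminfo.
sample = "MemTotal: 8009756 kB\nMemAvailable: 512000 kB\nSwapFree: 0 kB\n"
# 1 GiB is an arbitrary threshold chosen only for this example.
print(has_headroom(parse_meminfo(sample), 1024 * 1024))
```

If the check fails on the host running ansible, the temporary swap file or a smaller inventory suggested above would be the first things to try.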

Comment 17 Scott Dodson 2018-07-24 17:03:17 UTC

Anping, which version of ansible was used in your testing?

mmariyan, can you please confirm whether the problem is alleviated by running ansible 2.3? They should be able to simply run `yum downgrade ansible-2.3*` to get ansible 2.3 re-installed.

Comment 18 Scott Dodson 2018-07-24 17:20:18 UTC

(In reply to Scott Dodson from comment #17)
> Anping, which version of ansible was used in your testing?
> 
> mmariyan, can you please confirm whether the problem is
> alleviated by running ansible 2.3? They should be able to simply run `yum
> downgrade ansible-2.3*` to get ansible 2.3 re-installed.

Specifically, with Ansible 2.3 and openshift-ansible-3.7.47 or newer; the original bug was opened against openshift-ansible-3.7.42.
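
The combination asked for here could be checked with a small script along these lines. This is only a sketch: the function names are hypothetical, and it assumes package strings in the same name-version-release form as the ones listed in the description.

```python
import re


def rpm_version(nvr, name):
    """Extract the numeric version tuple from an RPM name-version-release string."""
    match = re.match(re.escape(name) + r"-(\d+(?:\.\d+)*)-", nvr)
    if not match:
        raise ValueError("unexpected package string: %r" % nvr)
    return tuple(int(part) for part in match.group(1).split("."))


def is_supported_combo(ansible_nvr, installer_nvr):
    """Check for Ansible 2.3.x together with openshift-ansible 3.7.47 or newer 3.7.z."""
    ansible = rpm_version(ansible_nvr, "ansible")
    installer = rpm_version(installer_nvr, "openshift-ansible")
    return (
        ansible[:2] == (2, 3)
        and installer[:2] == (3, 7)
        and installer >= (3, 7, 47)
    )


# The package versions from the original description.
print(is_supported_combo(
    "ansible-2.4.3.0-1.el7.noarch",
    "openshift-ansible-3.7.42-1.git.2.9ee4e71.el7.noarch",
))
```

The input strings can be taken from `rpm -q ansible openshift-ansible` on the host that runs the playbooks.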

Comment 19 Anping Li 2018-07-25 01:46:21 UTC

Scott, I am using the ose-ansible image, which should contain ansible 2.3.2.0.

Comment 21 Brenton Leanhardt 2018-07-31 14:52:35 UTC

For Documentation, could the 3.7 "known issues" page be updated to state that only the released version of Ansible 2.3 in the OCP channel should be used with OCP 3.7?

The challenge is that RHEL released Ansible 2.4 which means some customers install it.  They need to be instructed to downgrade if they are using OCP 3.7.

Comment 22 Brenton Leanhardt 2018-07-31 14:59:55 UTC

Sorry for the noise. I originally moved this to a documentation bug, but later thought it would be less confusing to clone the bug and change the title.
