Bug 1525577

Summary: ceph-ansible fork count should be a parameter
Product: Red Hat OpenStack Reporter: John Fulton <johfulto>
Component: openstack-tripleo-common Assignee: John Fulton <johfulto>
Status: CLOSED ERRATA QA Contact: Yogev Rabl <yrabl>
Severity: high Docs Contact:
Priority: high    
Version: 12.0 (Pike) CC: acanan, gfidente, jjoyce, jomurphy, jschluet, mburns, rhel-osp-director-maint, slinaber
Target Milestone: z2 Keywords: Triaged, ZStream
Target Release: 12.0 (Pike)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-common-7.6.9-1.el7ost openstack-tripleo-heat-templates-7.0.9-1.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-03-28 17:27:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description John Fulton 2017-12-13 15:44:07 UTC
By default, OSP12 sets the Ansible fork count to maximize the number of concurrent ceph-ansible processes [0], setting it equal to the number of nodes to be configured, up to a maximum of 100. However, if the undercloud doesn't have enough memory, this setting causes the undercloud to run out of RAM [1]. The user should be able to easily override this parameter instead of relying on the formula.
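A minimal Python sketch of the behaviour requested above: the default formula (one fork per node, capped at 100) plus an operator-supplied override that bypasses it. The function name `ceph_ansible_forks` and its `override` and `cap` arguments are hypothetical, chosen for illustration only; this is not the tripleo-common workbook code or the parameter introduced by the fix.

```python
from typing import Optional

# Cap described in this report: forks track node count up to 100.
DEFAULT_FORK_CAP = 100


def ceph_ansible_forks(node_count: int, override: Optional[int] = None,
                       cap: int = DEFAULT_FORK_CAP) -> int:
    """Return the Ansible fork count for a ceph-ansible run (hypothetical helper).

    If the operator supplies an explicit override (e.g. because the
    undercloud is short on RAM), use it unchanged; otherwise fall back to
    the default formula: one fork per node, capped at `cap`.
    """
    if override is not None:
        if override < 1:
            raise ValueError("fork count must be at least 1")
        return override
    # Assumption: never return 0, since Ansible needs at least one fork.
    return max(1, min(node_count, cap))


if __name__ == "__main__":
    print(ceph_ansible_forks(250))               # formula, capped -> 100
    print(ceph_ansible_forks(250, override=10))  # operator override -> 10
```

Each fork is a separate process started with `os.fork()`, which is where the traceback below fails, so lowering the count on a memory-constrained undercloud trades deployment speed for a smaller memory footprint.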

[0] https://github.com/openstack/tripleo-common/blob/fa0b9f52080580b7408dc6f5f2da6fc1dc07d500/workbooks/ceph-ansible.yaml#L25

[1] 
2017-12-13 04:05:53,887 p=7259 u=mistral |  ERROR! Unexpected Exception, this is probably a bug: [Errno 12] Cannot allocate memory
2017-12-13 04:05:53,893 p=7259 u=mistral |  to see the full traceback, use -vvv
2017-12-13 04:05:58,541 p=7259 u=mistral |  the full traceback was:

Traceback (most recent call last):
  File "/bin/ansible-playbook", line 106, in <module>
    exit_code = cli.run()
  File "/usr/lib/python2.7/site-packages/ansible/cli/playbook.py", line 130, in run
    results = pbex.run()
  File "/usr/lib/python2.7/site-packages/ansible/executor/playbook_executor.py", line 154, in run
    result = self._tqm.run(play=play)
  File "/usr/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py", line 302, in run
    play_return = strategy.run(iterator, play_context)
  File "/usr/lib/python2.7/site-packages/ansible/plugins/strategy/linear.py", line 277, in run
    self._queue_task(host, task, task_vars, play_context)
  File "/usr/lib/python2.7/site-packages/ansible/plugins/strategy/__init__.py", line 222, in _queue_task
    worker_prc.start()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/usr/lib64/python2.7/multiprocessing/forking.py", line 121, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Comment 3 John Fulton 2018-01-10 15:28:19 UTC
- Relevant patches merged upstream for Pike.
- A related, but different, bug is https://bugzilla.redhat.com/show_bug.cgi?id=1527205

Comment 10 Yogev Rabl 2018-03-13 19:18:15 UTC
Verified on openstack-tripleo-common-7.6.9-2.el7ost.noarch

Comment 13 errata-xmlrpc 2018-03-28 17:27:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0607