Bug 1527205 - ansible memory utilization
Summary: ansible memory utilization
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 12.0 (Pike)
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: z3
Target Release: 12.0 (Pike)
Assignee: John Fulton
QA Contact: Yogev Rabl
URL:
Whiteboard: PerfScale
Keywords: TestOnly, Triaged, ZStream
Depends On:
Blocks:
 
Reported: 2017-12-18 19:46 UTC by Joe Talerico
Modified: 2019-03-28 14:01 UTC
CC List: 19 users

Doc Text:
TripleO uses ceph-ansible to configure Ceph clients and servers. 
To reduce the undercloud memory requirement when deploying a large number of Compute nodes, the TripleO ceph-ansible fork count default was reduced from 50 to 25.
One result of the lower fork count is a reduction in the number of hosts that can be configured in parallel.

You can use a Heat environment file to override the default fork count. The following example sets the fork count to 10.

parameter_defaults:
  CephAnsibleEnvironmentVariables:
    DEFAULT_FORKS: '10'
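
For example, if the snippet above is saved as fork-count.yaml (an illustrative file name), it is passed to the deploy command like any other environment file:

openstack overcloud deploy --templates \
  -e fork-count.yaml \
  [remaining environment files]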
Clone Of:
Last Closed: 2018-08-20 12:58:39 UTC




External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:2331 None None None 2018-08-20 12:59 UTC
OpenStack gerrit 531616 None stable/pike: MERGED tripleo-common: Lowering the number of ansible forks to 25 (I7c57b641aa7fea02f865778321c98d1dd3d7e085) 2018-02-16 17:57 UTC

Description Joe Talerico 2017-12-18 19:46:32 UTC
Description of problem:

We are seeing high memory utilization while running ceph-ansible with OpenStack Director (TripleO).

- Results from: ControllerCount: 3 CephStorageCount: 18
R620ComputeCount: 6 6018RComputeCount: 2 1029PComputeCount: 3
+ https://snapshot.raintank.io/dashboard/snapshot/FkKSk62ntTGjJx9NnliDCo2qDEi5zzr1

- Results from: ControllerCount: 3 CephStorageCount: 18
R620ComputeCount: 6 6018RComputeCount: 2 1029PComputeCount: 57
+ https://snapshot.raintank.io/dashboard/snapshot/7NoI3ptoaRUic5kOA3AftPfA53zeiWxg?orgId=2

The first spike we see (@07:26) seems to be around this task:
2017-12-17 07:26:42,386 p=381800 u=mistral |  TASK [ceph-defaults : set_fact monitor_name ansible_hostname] ******************

This task runs across all the nodes (not just the Ceph nodes). Is it necessary to set this fact across all nodes? From a quick search, I don't see a reason for it to run on the compute nodes.

Trying to track this down further, the spike to 34 GB RSS @ 07:29 12/17 is around this task:
2017-12-17 07:29:17,892 p=381800 u=mistral |  TASK [ceph-docker-common : pull ceph/daemon image] *****************************

This task also seems to run across all the nodes; should it not be skipped unless the host is a Ceph node (a mon, or a node with OSDs)? Something like the guard sketched below would express that.
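
For illustration, a guard along these lines on the pull task would limit it to Ceph server nodes. The variable and group names assume ceph-ansible's usual conventions; this is a sketch, not a tested patch:

- name: pull ceph/daemon image
  command: docker pull {{ ceph_docker_registry }}/{{ ceph_docker_image }}:{{ ceph_docker_image_tag }}
  # only pull the container image on hosts that actually run Ceph daemons
  when: inventory_hostname in groups.get('mons', []) or
        inventory_hostname in groups.get('osds', [])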

Reviewing a compute node:

[root@overcloud-1029pcompute-7 heat-admin]# hostname; grep "docker pull" /var/log/messages
overcloud-1029pcompute-7
Dec 17 02:29:38 localhost ansible-command: Invoked with warn=True executable=None _uses_shell=False _raw_params=docker pull docker.io/ceph/daemon:tag-build-master-jewel-centos-7 removes=None creates=None chdir=None stdin=None

This shows the specific task I mentioned (docker pull) running on the
compute node.

Looking at docker on that compute node, we can see the Ceph image:

[root@overcloud-1029pcompute-7 heat-admin]# docker images
REPOSITORY                                                                           TAG                              IMAGE ID      CREATED      SIZE
docker.io/ceph/daemon                                                                tag-build-master-jewel-centos-7  fc66b4dad728  2 weeks ago  677.5 MB
docker-registry.engineering.redhat.com/rhosp12/openstack-nova-compute-docker         12.0-20171127.1                  27596daf8bf3  2 weeks ago  1.178 GB
docker-registry.engineering.redhat.com/rhosp12/openstack-ceilometer-central-docker   12.0-20171127.1                  867b7e52e622  2 weeks ago  699.4 MB
docker-registry.engineering.redhat.com/rhosp12/openstack-ceilometer-compute-docker   12.0-20171127.1                  828af4062894  2 weeks ago  699.4 MB
docker-registry.engineering.redhat.com/rhosp12/openstack-nova-libvirt-docker         12.0-20171127.1                  9c1b1840ab52  2 weeks ago  1.062 GB
docker-registry.engineering.redhat.com/rhosp12/openstack-cron-docker                 12.0-20171127.1                  66bed5ed2d94  2 weeks ago  341.1 MB
[root@overcloud-1029pcompute-7 heat-admin]#

Version-Release number of selected component (if applicable):
ceph-ansible-3.0.14-1.el7cp.noarch


Expected results:
Ansible tasks should only run on the nodes necessary for Ceph (i.e., not all 100+ compute nodes need to run docker pull).

Additional info:

Comment 3 John Fulton 2017-12-19 15:11:48 UTC
This might be addressed by the following if you're able to test it: 

 https://github.com/ceph/ceph-ansible/pull/2283

Comment 4 Joe Talerico 2017-12-20 14:52:43 UTC
Implementing the changes in Comment #3 didn't help: 

https://snapshot.raintank.io/dashboard/snapshot/84QQ2mEdJ2A7bzzNkceDIpsX0muLYgJY?orgId=2

Could something like the docker_image module help here?
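
For reference, a minimal sketch of what that might look like, assuming the stock docker_image module (untested here):

- name: pull ceph/daemon image
  # let the module handle idempotent pulls instead of shelling out
  docker_image:
    name: docker.io/ceph/daemon
    tag: tag-build-master-jewel-centos-7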

Comment 5 leseb 2017-12-20 16:46:46 UTC
Joe, did you run Ansible with "-e delegate_facts_host=False"?

Comment 6 John Fulton 2017-12-20 18:25:48 UTC
Seb,

Yes, we tested with ceph-ansible 3.0 using the backport from the linked PR, with that value defaulted to False in the site file. The site-docker.yaml file we used is online at: http://ix.io/Dc

  John

Comment 7 John Fulton 2017-12-20 18:27:25 UTC
Typo: that site-docker.yaml file is at http://ix.io/Dfc

Comment 8 leseb 2017-12-21 09:35:56 UTC
If this doesn't help then I'm not sure where the problem is.
What makes you think that the docker_image module will help?

Comment 9 Ken Dreyer (Red Hat) 2017-12-21 19:47:24 UTC
Does https://github.com/ceph/ceph-ansible/pull/2283 resolve this bug, or should we remove that PR from the External Trackers?

Comment 10 John Fulton 2017-12-21 21:08:11 UTC
(In reply to Ken Dreyer (Red Hat) from comment #9)
> Does https://github.com/ceph/ceph-ansible/pull/2283 resolve this bug, or
> should we remove that PR from the External Trackers?

Hi Ken, no Joe had the same problem using the PR so I've removed it from tracker.

Comment 11 Joe Talerico 2017-12-22 12:09:19 UTC
(In reply to leseb from comment #8)
> If this doesn't help then I'm not sure where the problem is.
> What makes you think that the docker_image module will help?

I'm trying multiple things; it is generally recommended to use the built-in modules when possible.

However, this did not seem to help. Using async did seem to help a little.
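
For reference, an async pull looks roughly like this; the timeout and poll values are illustrative:

- name: pull ceph/daemon image
  command: docker pull docker.io/ceph/daemon:tag-build-master-jewel-centos-7
  # fire off the pull and check back periodically instead of holding a connection
  async: 300
  poll: 10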

These memory spikes seem related to the set_fact calls that ceph-ansible makes across many tasks.

I have a failed deployment, but it _seems_ that setting forks to 25 helped with the memory utilization. 
Forks: 25 -> https://snapshot.raintank.io/dashboard/snapshot/p0bQAtzt7huo3hWCyoXmvrs3ZlSGN1Wk?panelId=81&fullscreen&orgId=2

Forks: 100 (84 in this deployment) -> https://snapshot.raintank.io/dashboard/snapshot/BaSVC5qpmW26Ea7Amt0mvDyz3FIDKC0Z?orgId=2

With forks at 25 I did have an overcloud failure, but it was not related to ceph-ansible. I am re-running the deploy.

If we feel that updating the forks calculation to 25 is reasonable, I will push a patchset.
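
For anyone reproducing this outside the director workflow, the fork count can be capped per run or globally; a minimal sketch:

ansible-playbook -f 25 site-docker.yml

# or persistently, in ansible.cfg
[defaults]
forks = 25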

Comment 12 John Fulton 2018-01-10 15:24:02 UTC
There is one more patch that needs to merge so that this can be backported to Pike: https://review.openstack.org/531616

Comment 15 John Fulton 2018-02-08 02:58:20 UTC
https://review.openstack.org/#/c/531616 merged

Comment 17 Yogev Rabl 2018-03-12 17:45:20 UTC
We don't have sufficient hardware to test this bug at this scale.

Comment 18 Yogev Rabl 2018-03-12 17:47:34 UTC
Joe, can you please verify it in the scale lab?

Comment 21 Ben England 2018-03-21 12:07:11 UTC
Joe made the observation that an entire container must be deployed to every RBD client (compute node) just to generate facts for ansible.  As Andrew Schoen notes in 

https://bugzilla.redhat.com/show_bug.cgi?id=1550977#c13 

these facts are not needed to generate ceph.conf for RBD clients, or to install the Ceph RPMs on those clients.  

Andrew suggested that we try to inhibit fact collection on nodes that are only clients: "If you don't need to update the ceph.conf on the client nodes it looks like you get around this by setting 'delegate_facts_host: false' and using '--skip-tags=ceph_update_config'".
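
Assembled from that suggestion, the invocation would presumably look something like this (a sketch, not a verified recipe):

ansible-playbook site-docker.yml \
  -e delegate_facts_host=false \
  --skip-tags=ceph_update_config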

I don't understand what "--skip-tags=ceph_update_config" does at all, but Andrew, are you proposing to add a tag "ceph_update_config" to the "gather and delegate facts" task in site-docker.yml.sample (and site.yml.sample)? How would this avoid impacting nodes where ansible *does* need to collect facts?

The root of the problem, it seems to me, is that we cannot easily express the set of hosts that play the "clients" role only, so we cannot decide at the top-level playbook whether we need to inhibit fact gathering or not.   

Can we push the decision about whether or not to gather facts into the per-role main.yml?  What consequences would this have?  Specifically, suppose we are talking about a hyperconverged system which is a member of [osds], [clients], [rgws], etc.  Would this trigger more fact gathering than before?

Comment 22 John Fulton 2018-03-21 15:19:40 UTC
(In reply to Ben England from comment #21)
> The root of the problem, it seems to me, is that we cannot easily express
> the set of hosts that play the "clients" role only, so we cannot decide at
> the top-level playbook whether we need to inhibit fact gathering or not.   
> 
> Can we push the decision about whether or not to gather facts into the
> per-role main.yml?  What consequences would this have?  Specifically,
> suppose we are talking about a hyperconverged system which is a member of
> [osds], [clients], [rgws], etc.  Would this trigger more fact gathering than
> before?

TripleO matches its roles to ceph-ansible roles. So if TripleO has an HCI role with the following:

  ServicesDefault:
    - OS::TripleO::Services::CephClient
    - OS::TripleO::Services::CephOSD

and the node at IP 192.168.1.42 is from this role, then Mistral will build an ansible inventory containing:

osds:
  hosts:
    192.168.1.42: {}

clients:
  hosts:
    192.168.1.42: {}

and ceph-ansible will "make it so" on the one node twice. 

In Joe's case he deployed many computes so we had:

clients:
  hosts:
    192.168.1.2: {}
    192.168.1.3: {}
    ...

If we could grab just the first client host, configure it in the computationally expensive way, and then copy the result to the rest of the client hosts, it would probably consume fewer resources (see the sketch below). At least that's the theory.
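
A minimal sketch of that idea, assuming ceph.conf is the expensive artifact and 'clients' is the inventory group shown above (illustrative only, not a tested patch):

- name: fetch ceph.conf from the first client
  fetch:
    src: /etc/ceph/ceph.conf
    dest: /tmp/ceph.conf
    flat: yes
  # do the expensive work on one host only
  run_once: true
  delegate_to: "{{ groups['clients'][0] }}"

- name: push the result to the remaining clients
  copy:
    src: /tmp/ceph.conf
    dest: /etc/ceph/ceph.conf
  when: inventory_hostname != groups['clients'][0]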

Note that this bug tracks the fix for lowering the fork count (and the fork count change is on QA) in OSPd. If necessary, we can clone it to a Ceph bug focused on the client optimization. The Ceph team hasn't asked for that yet, however, and the issue is being researched under BZ 1550977.

Comment 27 Lon Hohberger 2018-03-29 10:34:55 UTC
According to our records, this should be resolved by openstack-tripleo-common-7.6.9-3.el7ost.  This build is available now.

Comment 29 Yogev Rabl 2018-07-19 12:42:49 UTC
Verified on openstack-tripleo-common-7.6.9-3.el7ost.noarch

Comment 32 errata-xmlrpc 2018-08-20 12:58:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2331

