Description of problem:

We are seeing high memory utilization while running ceph-ansible with OpenStack Director (TripleO).

- Results from: ControllerCount: 3, CephStorageCount: 18, R620ComputeCount: 6, 6018RComputeCount: 2, 1029PComputeCount: 3
  https://snapshot.raintank.io/dashboard/snapshot/FkKSk62ntTGjJx9NnliDCo2qDEi5zzr1

- Results from: ControllerCount: 3, CephStorageCount: 18, R620ComputeCount: 6, 6018RComputeCount: 2, 1029PComputeCount: 57
  https://snapshot.raintank.io/dashboard/snapshot/7NoI3ptoaRUic5kOA3AftPfA53zeiWxg?orgId=2

The first spike we see (@07:26) appears to be around the task:

2017-12-17 07:26:42,386 p=381800 u=mistral | TASK [ceph-defaults : set_fact monitor_name ansible_hostname] ******************

which runs across all the nodes (not just the Ceph nodes). Is it necessary to set this fact across all nodes? Doing a quick search, I don't see a reason to have this run on the compute nodes.

Tracking this down further, the spike to 34 GB RSS at 07:29 on 12/17 is around the task:

2017-12-17 07:29:17,892 p=381800 u=mistral | TASK [ceph-docker-common : pull ceph/daemon image] *****************************

This task runs across all the nodes as well; should it not be skipped unless the node is a Ceph node (a mon or a node with OSDs)? (A sketch of such a restriction is shown below.)

Reviewing a compute node:

[root@overcloud-1029pcompute-7 heat-admin]# hostname; grep "docker pull" /var/log/messages
overcloud-1029pcompute-7
Dec 17 02:29:38 localhost ansible-command: Invoked with warn=True executable=None _uses_shell=False _raw_params=docker pull docker.io/ceph/daemon:tag-build-master-jewel-centos-7 removes=None creates=None chdir=None stdin=None

This shows the specific task I mentioned (docker pull) running on the compute node. Looking at Docker on that compute node, we can see the ceph image:

[root@overcloud-1029pcompute-7 heat-admin]# docker images
REPOSITORY                                                                            TAG                               IMAGE ID       CREATED       SIZE
docker.io/ceph/daemon                                                                 tag-build-master-jewel-centos-7   fc66b4dad728   2 weeks ago   677.5 MB
docker-registry.engineering.redhat.com/rhosp12/openstack-nova-compute-docker          12.0-20171127.1                   27596daf8bf3   2 weeks ago   1.178 GB
docker-registry.engineering.redhat.com/rhosp12/openstack-ceilometer-central-docker    12.0-20171127.1                   867b7e52e622   2 weeks ago   699.4 MB
docker-registry.engineering.redhat.com/rhosp12/openstack-ceilometer-compute-docker    12.0-20171127.1                   828af4062894   2 weeks ago   699.4 MB
docker-registry.engineering.redhat.com/rhosp12/openstack-nova-libvirt-docker          12.0-20171127.1                   9c1b1840ab52   2 weeks ago   1.062 GB
docker-registry.engineering.redhat.com/rhosp12/openstack-cron-docker                  12.0-20171127.1                   66bed5ed2d94   2 weeks ago   341.1 MB
[root@overcloud-1029pcompute-7 heat-admin]#

Version-Release number of selected component (if applicable):
ceph-ansible-3.0.14-1.el7cp.noarch

Expected results:
Ansible tasks only run on the nodes necessary for Ceph (i.e., not all 100+ compute nodes need to run docker pull).

Additional info:
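For illustration, a hedged sketch of the kind of restriction asked about above: guard the image pull with a group-membership check so it only runs on monitor and OSD nodes. The group names (mons, osds) follow the usual ceph-ansible inventory sections and the literal image reference is taken from the log output; this is not the actual ceph-docker-common task.

- name: pull ceph/daemon image
  command: docker pull docker.io/ceph/daemon:tag-build-master-jewel-centos-7
  # only pull the container image on nodes that actually run Ceph daemons
  when: inventory_hostname in groups.get('mons', []) or inventory_hostname in groups.get('osds', [])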
This might be addressed by the following if you're able to test it: https://github.com/ceph/ceph-ansible/pull/2283
Implementing the changes in Comment #3 didn't help: https://snapshot.raintank.io/dashboard/snapshot/84QQ2mEdJ2A7bzzNkceDIpsX0muLYgJY?orgId=2

Could something like the docker_image module help here?
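To make that concrete, a hedged sketch of what a docker_image-based task could look like. The variable names (ceph_docker_registry, ceph_docker_image, ceph_docker_image_tag) are illustrative ceph-ansible-style settings rather than necessarily the exact ones used by the role, and the defaults shown match the image from the original report:

- name: pull ceph/daemon image via the docker_image module
  docker_image:
    # e.g. docker.io/ceph/daemon:tag-build-master-jewel-centos-7
    name: "{{ ceph_docker_registry | default('docker.io') }}/{{ ceph_docker_image | default('ceph/daemon') }}"
    tag: "{{ ceph_docker_image_tag | default('tag-build-master-jewel-centos-7') }}"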
Joe, did you run Ansible with "-e delegate_facts_host=False"?
Seb, yes, we tested with ceph-ansible 3.0 using the backport from the linked PR, with that value defaulted to False in the site file. The site-docker.yaml file we used is online at: http://ix.io/Dc

John
Typo: that site-docker.yaml file is at http://ix.io/Dfc
If this doesn't help, then I'm not sure where the problem is. What makes you think that the docker_image module will help?
Does https://github.com/ceph/ceph-ansible/pull/2283 resolve this bug, or should we remove that PR from the External Trackers?
(In reply to Ken Dreyer (Red Hat) from comment #9)
> Does https://github.com/ceph/ceph-ansible/pull/2283 resolve this bug, or
> should we remove that PR from the External Trackers?

Hi Ken, no: Joe had the same problem while using the PR, so I've removed it from the tracker.
(In reply to leseb from comment #8)
> If this doesn't help, then I'm not sure where the problem is.
> What makes you think that the docker_image module will help?

We are trying multiple things, and it is generally recommended to use the built-in modules when possible. However, that did not seem to help. Using async did seem to help a little. These memory spikes seem related to the set_fact calls that ceph-ansible makes across many tasks.

I have a failed deployment, but it _seems_ that setting forks to 25 helped with the memory utilization.

Forks: 25 -> https://snapshot.raintank.io/dashboard/snapshot/p0bQAtzt7huo3hWCyoXmvrs3ZlSGN1Wk?panelId=81&fullscreen&orgId=2
Forks: 100 (84 in this deployment) -> https://snapshot.raintank.io/dashboard/snapshot/BaSVC5qpmW26Ea7Amt0mvDyz3FIDKC0Z?orgId=2

With forks set to 25 I did have an overcloud failure, but it was not related to ceph-ansible; I am re-running the deploy. If we feel that updating the forks calculation to 25 is reasonable, I will push a patchset.
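For reference, both mitigations mentioned here are simple to express. The fork count can be capped with "forks = 25" in the [defaults] section of ansible.cfg (or --forks 25 on the ansible-playbook command line), and the async experiment on the pull task was roughly of the shape below. This is a hedged sketch with illustrative timeout values, not the exact change that was tested:

- name: pull ceph/daemon image
  command: docker pull docker.io/ceph/daemon:tag-build-master-jewel-centos-7
  # run the pull as a background job and poll it, rather than holding the
  # task open for the entire download
  async: 600
  poll: 10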
There is one more patch that needs to merge so that this can be backported to Pike and it is: https://review.openstack.org/531616
https://review.openstack.org/#/c/531616 merged
We don't have sufficient hardware to test this bug at this scale.
Joe, can you please verify it in the scale lab?
Joe made the observation that an entire container must be deployed to every RBD client (compute node) just to generate facts for Ansible. As Andrew Schoen notes in https://bugzilla.redhat.com/show_bug.cgi?id=1550977#c13, these facts are not needed to generate ceph.conf for RBD clients, or to install the Ceph RPMs on those clients. Andrew suggested that we try to inhibit fact collection on nodes that are only clients: "If you don't need to update the ceph.conf on the client nodes it looks like you get around this by setting 'delegate_facts_host: false' and using '--skip-tags=ceph_update_config'".

I don't understand what "--skip-tags=ceph_update_config" does at all. Andrew, are you proposing to add a tag "ceph_update_config" to the "gather and delegate facts" task in site-docker.yml.sample (and site.yml.sample)? How would this avoid impacting nodes where Ansible *does* need to collect facts?

The root of the problem, it seems to me, is that we cannot easily express the set of hosts that play the "clients" role only, so we cannot decide at the top-level playbook whether we need to inhibit fact gathering.

Can we push the decision about whether or not to gather facts into the per-role main.yml? What consequences would that have? Specifically, suppose we are talking about a hyperconverged system that is a member of [osds], [clients], [rgws], etc. Would this trigger more fact gathering than before?
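For context, the combination Andrew describes would roughly correspond to a guarded, tagged fact-delegation task in the top-level playbook, along the lines of the hedged sketch below (not the exact site-docker.yml.sample contents). With such a task, '-e delegate_facts_host=false' skips the delegation loop, and '--skip-tags=ceph_update_config' skips whatever tasks carry that tag (for example the ceph.conf templating on client nodes):

- name: gather and delegate facts
  setup:
  delegate_to: "{{ item }}"
  delegate_facts: true
  with_items: "{{ groups['all'] }}"
  run_once: true
  when: delegate_facts_host | bool
  tags: [ceph_update_config]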
(In reply to Ben England from comment #21)
> The root of the problem, it seems to me, is that we cannot easily express
> the set of hosts that play the "clients" role only, so we cannot decide at
> the top-level playbook whether we need to inhibit fact gathering.
>
> Can we push the decision about whether or not to gather facts into the
> per-role main.yml? What consequences would that have? Specifically,
> suppose we are talking about a hyperconverged system that is a member of
> [osds], [clients], [rgws], etc. Would this trigger more fact gathering
> than before?

TripleO matches its roles to ceph-ansible roles. So if TripleO has an HCI role with the following:

ServicesDefault:
  - OS::TripleO::Services::CephClient
  - OS::TripleO::Services::CephOSD

and the node at IP 192.168.1.42 is from this role, then Mistral will build an Ansible inventory containing:

osds:
  hosts:
    192.168.1.42: {}
clients:
  hosts:
    192.168.1.42: {}

and ceph-ansible will "make it so" on the one node twice. In Joe's case he deployed many computes, so we had:

clients:
  hosts:
    192.168.1.2
    192.168.1.3
    ...

If we could just grab the first client host, configure it in the computationally expensive way, and then copy the result to the rest of the client hosts, it would probably consume fewer resources. At least that's the theory (a rough sketch follows below).

Note that this bug tracks the fix for lowering the fork count (and the fork count change is on QA) in OSPd. If necessary, we can clone it to a Ceph bug focused on the client optimization. The Ceph team hasn't asked for that yet, however, and the issue is being researched under BZ 1550977.
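A rough, hedged sketch of that "first client" idea, using hypothetical play and path names (and a hypothetical ceph-client role name) rather than actual ceph-ansible code: configure a single client, fetch the generated ceph.conf back to the Ansible controller, then push the same file to the remaining clients without gathering facts on them.

- hosts: clients[0]
  roles:
    - ceph-client            # hypothetical name for the fact-heavy client role
  tasks:
    - name: fetch the generated ceph.conf back to the controller
      fetch:
        src: /etc/ceph/ceph.conf
        dest: /tmp/ceph-client-conf/
        flat: true

- hosts: clients
  gather_facts: false
  tasks:
    - name: distribute the same ceph.conf to the remaining clients
      copy:
        src: /tmp/ceph-client-conf/ceph.conf
        dest: /etc/ceph/ceph.conf
      when: inventory_hostname != groups['clients'][0]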
According to our records, this should be resolved by openstack-tripleo-common-7.6.9-3.el7ost. This build is available now.
Verified on openstack-tripleo-common-7.6.9-3.el7ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2331