Description of problem:

Scaling from:

  parameter_defaults:
    DnsServers: ["10.16.36.29","10.11.5.19"]
    ControllerCount: 3
    CephStorageCount: 18
    R620ComputeCount: 23
    R630ComputeCount: 23
    6018RComputeCount: 1
    R930ComputeCount: 1
    1029pComputeCount: 0
    1029uComputeCount: 1
    1028rComputeCount: 1
    R730ComputeCount: 1
    ComputeCount: 0

To:

  parameter_defaults:
    DnsServers: ["10.16.36.29","10.11.5.19"]
    ControllerCount: 3
    CephStorageCount: 18
    R620ComputeCount: 77
    R630ComputeCount: 46
    6018RComputeCount: 1
    R930ComputeCount: 1
    1029pComputeCount: 0
    1029uComputeCount: 1
    1028rComputeCount: 1
    R730ComputeCount: 1
    ComputeCount: 0

resulted in:

  2018-03-02 11:58:37Z [overcloud-AllNodesDeploySteps-w5a6kxikdgwc.R620ComputeDeployment_Step1]: UPDATE_COMPLETE state changed
  2018-03-02 11:58:37Z [overcloud-AllNodesDeploySteps-w5a6kxikdgwc.WorkflowTasks_Step2_Execution]: UPDATE_IN_PROGRESS state changed
  2018-03-02 11:58:37Z [overcloud-AllNodesDeploySteps-w5a6kxikdgwc.WorkflowTasks_Step2_Execution]: UPDATE_COMPLETE The Resource WorkflowTasks_Step2_Execution requires replacement.
  2018-03-02 11:58:38Z [overcloud-AllNodesDeploySteps-w5a6kxikdgwc.WorkflowTasks_Step2_Execution]: CREATE_IN_PROGRESS state changed
  2018-03-02 12:44:46Z [overcloud-AllNodesDeploySteps-w5a6kxikdgwc.WorkflowTasks_Step2_Execution]: CREATE_FAILED resources.WorkflowTasks_Step2_Execution: ERROR
  2018-03-02 12:44:47Z [overcloud-AllNodesDeploySteps-w5a6kxikdgwc]: UPDATE_FAILED resources.WorkflowTasks_Step2_Execution: ERROR
  2018-03-02 12:44:48Z [AllNodesDeploySteps]: UPDATE_FAILED resources.AllNodesDeploySteps: resources.WorkflowTasks_Step2_Execution: ERROR
  2018-03-02 12:44:49Z [overcloud]: UPDATE_FAILED resources.AllNodesDeploySteps: resources.WorkflowTasks_Step2_Execution: ERROR

  Stack overcloud UPDATE_FAILED

  overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
    resource_type: OS::Mistral::ExternalResource
    physical_resource_id: c5c6b59a-7a03-4993-bad8-8ae0abb2a0e0
    status: CREATE_FAILED
    status_reason: |
      resources.WorkflowTasks_Step2_Execution: ERROR
  Fri Mar 2 12:45:25 UTC 2018

Looking at the ceph-ansible log, the last task that ran was:

  2018-03-02 12:05:20,644 p=27069 u=mistral | PLAY [mons,agents,osds,mdss,rgws,nfss,restapis,rbdmirrors,clients,iscsigws,mgrs] ***
  2018-03-02 12:05:21,544 p=27069 u=mistral | TASK [gather and delegate facts] ***********************************************

and that task ended with:

  2018-03-02 12:44:06,025 p=27069 u=mistral | ok: [192.168.25.51 -> 192.168.25.168] => (item=192.168.25.168)
  2018-03-02 12:44:06,136 p=27069 u=mistral | ok: [192.168.25.54 -> 192.168.25.169] => (item=192.168.25.169)
  2018-03-02 12:44:06,817 p=27069 u=mistral | ok: [192.168.25.171 -> 192.168.25.165] => (item=192.168.25.165)
  2018-03-02 12:44:07,263 p=27069 u=mistral | ERROR! A worker was found in a dead state
Possibly related? https://github.com/ansible/ansible/issues/32554
Not sure if we ran out of fd's... I suppose I would need to bump the verbosity of the ansible output to get more insight?

  [stack@b04-h01-1029p ~]$ sudo sysctl fs.file-nr
  fs.file-nr = 12928 0 26125814
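One quick way to tell whether it is the per-process limit, rather than the system-wide fs.file-nr ceiling, that is being approached; the PID below is a placeholder for the ansible-playbook process:

  # count open file descriptors held by the ansible-playbook process
  $ sudo ls /proc/<ansible-playbook-pid>/fd | wc -l

  # compare against that process's soft limit
  $ grep 'Max open files' /proc/<ansible-playbook-pid>/limits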
Try `ulimit -a` from the account running ceph-ansible and see if any resource limits would affect it; if so, change /etc/security/limits.*. Also see https://bugzilla.redhat.com/show_bug.cgi?id=1459891 for a discussion of which kernel parameters limit Ceph thread creation. Note that that problem goes away when we transition to RHCS 3.0, but we are still running RHCS 2.4 in RHOSP 12.
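For example, something along these lines; this is a minimal sketch only, and the `mistral` user and the values are assumptions — substitute whichever account actually runs ceph-ansible:

  # check the effective limits for the account running ceph-ansible
  $ ulimit -a

  # hypothetical /etc/security/limits.d/ceph-ansible.conf raising the
  # open-files and process limits for that account (illustrative values)
  mistral  soft  nofile  8192
  mistral  hard  nofile  65536
  mistral  soft  nproc   8192

Changes in limits.d only apply to new login sessions of that account; if the executor is started by systemd, its unit's LimitNOFILE/LimitNPROC settings would govern instead.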
cc'ing John Fulton, OpenStack-Ceph DFG lead.
Thanks Ben. Mistral is kicking off ceph-ansible, so:

  (overcloud) [stack@b04-h01-1029p ~]$ cat /proc/176989/limits
  Limit                     Soft Limit           Hard Limit           Units
  Max cpu time              unlimited            unlimited            seconds
  Max file size             unlimited            unlimited            bytes
  Max data size             unlimited            unlimited            bytes
  Max stack size            8388608              unlimited            bytes
  Max core file size        0                    unlimited            bytes
  Max resident set          unlimited            unlimited            bytes
  Max processes             1029496              1029496              processes
  Max open files            1024                 4096                 files
  Max locked memory         65536                65536                bytes
  Max address space         unlimited            unlimited            bytes
  Max file locks            unlimited            unlimited            locks
  Max pending signals       1029496              1029496              signals
  Max msgqueue size         819200               819200               bytes
  Max nice priority         0                    0
  Max realtime priority     0                    0
  Max realtime timeout      unlimited            unlimited            us
  (overcloud) [stack@b04-h01-1029p ~]$

I am trying to run through the same test again to see if I can get more detail on what is causing the failure.
- Moving this from an OSP bug to a Ceph ceph-ansible bug
- It seems, with a high number of clients, that ceph-ansible hits this issue
- Rather than throw more memory at the host running ceph-ansible, can ceph-ansible optimize the client configuration so that it can configure 89 nodes (of which only 3 are clients)
(In reply to John Fulton from comment #8)
> - Moving this from an OSP bug to a Ceph ceph-ansible bug
> - It seems, with a high number of clients, that ceph-ansible hits this issue
> - Rather than throw more memory at the host running ceph-ansible, can
> ceph-ansible optimize the client configuration so that it can configure 89
> nodes (of which only 3 are clients)

Typo: "of which only 3 are ceph OSD servers". The other 86 were just in the ansible inventory under the ceph-client role: https://github.com/ceph/ceph-ansible/tree/master/roles/ceph-client
The task we're failing on here is a very expensive one: https://github.com/ceph/ceph-ansible/blob/master/site.yml.sample#L57

I believe this was initially added so that we could support the --limit option of ansible-playbook. The issue is that to generate a ceph.conf we need facts from all nodes in the cluster. In this case we're having an issue because of the number of client nodes, but facts from client nodes are not needed to generate a ceph.conf. Perhaps ceph-ansible could find a way to avoid collecting facts from client nodes (or any other nodes not needed for conf generation) in that task.

If you don't need to update the ceph.conf on the client nodes, it looks like you can get around this by setting `delegate_facts_host: false` and using `--skip-tags=ceph_update_config`.
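When running the playbook by hand, the workaround would look roughly like this (a sketch only — the inventory path is illustrative, and `delegate_facts_host` can also be set in group_vars rather than passed as an extra var):

  ansible-playbook -i /path/to/inventory site.yml.sample \
      -e delegate_facts_host=false \
      --skip-tags=ceph_update_config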
Discussed in stand-up call today. OSP team has attempted delegate_facts_host: false and they still hit the memory problem. Guillaume and Joe are working to reproduce this today. Joe and Guillaume, would you please share the results of your testing?
It looks like `delegate_facts_host` does not exist in the stable-3.0 version of ceph-ansible upstream. This commit would need to be backported: https://github.com/ceph/ceph-ansible/commit/4596fbaac1322a4c670026bc018e3b5b061b072b
This commit would add `delegate_facts_host` to the site-docker.yml.sample playbook and it is not backported to stable-3.0 either: https://github.com/ceph/ceph-ansible/commit/c315f81dfe440945aaa90265cd3294fdea549942
(In reply to Andrew Schoen from comment #16)
> This commit would add `delegate_facts_host` to the site-docker.yml.sample
> playbook and it is not backported to stable-3.0 either:
>
> https://github.com/ceph/ceph-ansible/commit/c315f81dfe440945aaa90265cd3294fdea549942

I'm incorrect; that commit does exist in stable-3.0 upstream.
I tried to run the playbook on an admin node with only 200 MB of RAM and 60+ nodes in the inventory. After many tests I was only able to hit a memory issue, but not the one described in this BZ:

  An exception occurred during task execution. To see the full traceback, use -vvv. The error was: OSError: [Errno 12] Cannot allocate memory
  fatal: [osd17]: FAILED! => {}

  MSG:

  Unexpected failure during module execution.

Looks like I run out of memory before I can hit the issue reported. I'm not sure how I can reproduce this.

Joe, if you can reproduce this error in your env, could you link the playbook run log and keep the env running so I can take a look? Thanks!
In today's Ceph DFG meeting Guillaume told us that this PR is being tried: https://github.com/ceph/ceph-ansible/pull/2459

This might solve the problem if it works as intended. Question: does this change mean that a node which is only in the [clients] role will not need to deploy a container just to discover facts, put ceph.conf in place, and install RPMs? I think the cost of container deployment to potentially hundreds of compute nodes was part of Joe Talerico's concern. Thanks!
I was looking at Joe Talerico's results here, https://i.imgur.com/eppyWLW.png, with and without Guillaume's patch, wondering why compute nodes would still have to pull the docker image down with Guillaume's patch. Then I looked at roles/ceph-client/tasks/create_users_keys.yml, which shows that on compute nodes you still have to pull down the docker image to manufacture keys and create pools. What I don't get about it is this:

- aren't the keys the same on every client?
- so why can't they be manufactured on one of the clients and copied to the other ones?
- why are pools created ON THE CLIENTS? This should only be done once. Is it possible to do pool creation from the first client only?

If it were possible to do it this way, I think deployment would be greatly sped up, since we would avoid deploying a container on every client, as Joe suggested.
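To illustrate the suggestion (this is not the actual ceph-ansible tasks — the group name, key name, caps, and paths below are all assumptions, and it presumes the first client can reach the mons with an admin keyring), the keys could be generated once, fetched, and pushed to the remaining clients, with pool creation similarly restricted to a single run:

  # hypothetical sketch: generate the client keyring on one client only,
  # then copy the same keyring to every other client
  - hosts: clients
    tasks:
      - name: create the client keyring once, on the first client
        command: >
          ceph auth get-or-create client.openstack
          mon 'allow r' osd 'allow rwx pool=volumes'
          -o /etc/ceph/ceph.client.openstack.keyring
        run_once: true
        delegate_to: "{{ groups['clients'][0] }}"

      - name: fetch the keyring back to the machine running ansible
        fetch:
          src: /etc/ceph/ceph.client.openstack.keyring
          dest: /tmp/ceph.client.openstack.keyring
          flat: true
        run_once: true
        delegate_to: "{{ groups['clients'][0] }}"

      - name: distribute the same keyring to all clients
        copy:
          src: /tmp/ceph.client.openstack.keyring
          dest: /etc/ceph/ceph.client.openstack.keyring
          mode: "0600"

      # pool creation would likewise be a run_once task (e.g. "ceph osd pool
      # create volumes 128" delegated to a single node) rather than something
      # executed on every client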
Will be in 3.1
Would you please tag v3.1.0beta5 on master for this so OSP 13 can cross-ship this into their release?
There has been additional work on memory consumption and scalability of ceph-ansible; see https://github.com/ceph/ceph-ansible/issues/2553. I plan to test an 80-node deploy in Infrared; if we get the same result with PR 2560, which passed CI, then I think we can consider the memory issue resolved.
The memory issue is fixed, and the O(N^2) issue is fixed. Tested with 90 computes, 4 OSDs, and 3 mons using ceph-ansible-3.1.0-0.1.rc3.el7cp.noarch and ansible 2.4.3. The delegate-facts task takes 10 minutes to run, about 1/4 of the total ceph-ansible execution time; perhaps it is collecting facts from one host at a time. But this is not the original issue, and it certainly isn't worse than before in this respect. We may revisit the slowness later, and we may need to test ceph-ansible more extensively at this scale (e.g. in an HCI configuration, with more roles in play). See the ceph-ansible issue comment here: https://github.com/ceph/ceph-ansible/issues/2553#issuecomment-390020874
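For reference, the delegate-facts pattern in the site playbook has roughly the following shape (simplified here, not a verbatim copy), which is why fact collection is effectively serialized: one node loops over every inventory host, runs a delegated setup call per host, and holds all of the collected facts in a single process.

  # simplified sketch of the "gather and delegate facts" pattern
  - name: gather and delegate facts
    setup:
    delegate_to: "{{ item }}"
    delegate_facts: true
    with_items: "{{ groups['all'] }}"
    run_once: true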
OpenStack shipped ceph-ansible-3.1.0-0.1.rc9.el7cp first in http://access.redhat.com/errata/RHEA-2018:2086 . RHCEPH shipped ceph-ansible-3.1.5-1.el7cp in http://access.redhat.com/errata/RHBA-2018:2819 .