Bug 1651415 - ceph-ansible task "generate ceph configuration file: {{ cluster }}.conf" slow with large clusters
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 4.0
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 4.0
Assignee: Dimitri Savineau
QA Contact: Vasishta
Reported: 2018-11-20 02:32 UTC by Patrick Donnelly
Modified: 2019-09-26 17:58 UTC
CC List: 6 users

Last Closed: 2019-09-26 17:58:29 UTC


Description Patrick Donnelly 2018-11-20 02:32:49 UTC
Description of problem:

In a Linode cluster [1] configured with 256 ceph-osds, 0 clients, 3 ceph-mons and 2 ceph-mgrs, I have observed the task named in the summary [2] take a very long time to complete. Its runtime appears to grow linearly or quadratically with the size of the cluster.

[1] https://github.com/batrick/ceph-linode
[2] https://github.com/ceph/ceph-ansible/blob/098f42f2334c442bf418f09d3f4b3b99750c7ba0/roles/ceph-config/tasks/main.yml#L77-L93
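To tell linear from quadratic growth, one could time the task at a few cluster sizes and fit the slope of log(time) against log(OSD count): a slope near 1 suggests linear behavior, a slope near 2 quadratic. The following is a minimal sketch, assuming you have collected (osd_count, seconds) pairs yourself (for example from the profile_tasks callback output); the function name `estimate_scaling_exponent` is hypothetical, and the sample numbers are synthetic, not measurements from this bug.

```python
import math
from typing import Sequence, Tuple


def estimate_scaling_exponent(samples: Sequence[Tuple[float, float]]) -> float:
    """Hypothetical helper: least-squares slope of log(seconds) vs log(osd_count).

    A slope near 1.0 suggests linear growth, near 2.0 quadratic growth.
    `samples` holds (osd_count, seconds) pairs; all values must be positive.
    """
    if len(samples) < 2:
        raise ValueError("need at least two (osd_count, seconds) samples")
    xs = [math.log(n) for n, _ in samples]
    ys = [math.log(t) for _, t in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least squares on the log-log points: slope = cov(x, y) / var(x).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        raise ValueError("osd_count values must not all be equal")
    return cov / var


if __name__ == "__main__":
    # Synthetic data only: task times that grow as 0.01 * n**2.
    synthetic = [(n, 0.01 * n ** 2) for n in (32, 64, 128, 256)]
    print(round(estimate_scaling_exponent(synthetic), 2))  # prints 2.0
```

Timing at three or four sizes (e.g. doubling the OSD count each run) is usually enough to separate an exponent of 1 from 2; noise in real timings will make the slope approximate.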

Version-Release number of selected component (if applicable):

4.0 (master)

How reproducible:

100%

Steps to Reproduce:
1. Create a large cluster using the steps outlined in ceph-linode's README. 256 OSDs reliably reproduce the issue.

Other notes:

--forks=50 [3] is not the cause. I've observed the same issue with --forks=5.

[3] https://github.com/batrick/ceph-linode/blob/master/ansible-env.bash#L8

Comment 3 Dimitri Savineau 2019-02-26 20:16:47 UTC
Could you give us more information about the setup?
  - Ansible version
  - whether the deployment is containerized
  - any other useful configuration variables (ceph overrides, OSD scenarios, etc.)

When you say 256 OSDs, I assume you mean OSD devices rather than OSD nodes, right? If so, how many dedicated OSD nodes are you using?

I took a quick look and don't see any reason why the ceph.conf template generation would take longer as the number of OSDs grows.
I am more concerned about the effect of the OSD count on the ceph_volume task [1] than on the template creation.

It would also be interesting to run ceph-ansible with the configuration from its ansible.cfg [2] (which is overridden by ceph-linode's launch.sh script) and check the per-task timing via the profile_tasks callback.

[1] https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-config/tasks/main.yml#L22-L39
[2] https://github.com/ceph/ceph-ansible/blob/master/ansible.cfg
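One way to enable the profile_tasks callback without editing files is through environment variables, which take precedence over ansible.cfg. The following is a minimal sketch, assuming Ansible 2.x before 2.11 reads `ANSIBLE_CALLBACK_WHITELIST` and later releases read `ANSIBLE_CALLBACKS_ENABLED` with the plugin shipped in the ansible.posix collection; verify against the documentation for your Ansible version. The function name `profiling_env` and the config path are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional, Tuple


def profiling_env(
    base_env: Mapping[str, str],
    ansible_version: Tuple[int, int],
    config_path: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: copy base_env and enable the profile_tasks callback."""
    env = dict(base_env)
    if config_path is not None:
        # ANSIBLE_CONFIG tells ansible-playbook which ansible.cfg to read.
        env["ANSIBLE_CONFIG"] = config_path
    if ansible_version < (2, 11):
        # Assumption: pre-2.11 Ansible uses the callback_whitelist setting.
        var, plugin = "ANSIBLE_CALLBACK_WHITELIST", "profile_tasks"
    else:
        # Assumption: 2.11+ renamed the setting; plugin lives in ansible.posix.
        var, plugin = "ANSIBLE_CALLBACKS_ENABLED", "ansible.posix.profile_tasks"
    # Keep any callbacks already enabled via this variable, then add ours.
    enabled = [c.strip() for c in env.get(var, "").split(",") if c.strip()]
    if plugin not in enabled:
        enabled.append(plugin)
    env[var] = ",".join(enabled)
    return env


if __name__ == "__main__":
    # Hypothetical path to a ceph-ansible checkout's ansible.cfg.
    env = profiling_env(os.environ, (2, 8), config_path="ceph-ansible/ansible.cfg")
    print(env["ANSIBLE_CALLBACK_WHITELIST"])
```

The resulting dictionary can be passed as `env=` to `subprocess.run` when launching `ansible-playbook`. Because the environment variable replaces the ansible.cfg value instead of merging with it, any callbacks that ansible.cfg already enables should be added to the list as well.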

Comment 5 Giridhar Ramaraju 2019-08-05 13:09:37 UTC
Updating the QA Contact to Hemant. Hemant will be rerouting them to the appropriate QE Associate.

Regards,
Giri
