Bug 1664112
Summary: | Cache size is not created correctly in a hyperconverged installation when using the is_hci flag | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Eliad Cohen <elicohen> |
Component: | Ceph-Ansible | Assignee: | Dimitri Savineau <dsavinea> |
Status: | CLOSED ERRATA | QA Contact: | Vasishta <vashastr> |
Severity: | medium | Docs Contact: | Bara Ancincova <bancinco> |
Priority: | medium | ||
Version: | 3.2 | CC: | anharris, aschoen, ceph-eng-bugs, ceph-qe-bugs, dsavinea, edonnell, elicohen, gabrioux, gfidente, gmeno, jbrier, johfulto, mburrows, nojha, nthomas, nweinber, pasik, pnguyen, tchandra, tserlin |
Target Milestone: | z2 | ||
Target Release: | 3.3 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | RHEL: ceph-ansible-3.2.29-1.el7cp Ubuntu: ceph-ansible_3.2.29-2redhat1 | Doc Type: | Bug Fix |
Doc Text: |
.The value of `osd_memory_target` for HCI deployment is calculated properly
Previously, the calculation of the number of OSDs was not implemented for containerized deployment; the default value was `0`. Consequently, the calculation of the value of the BlueStore `osd_memory_target` option for Hyper-converged infrastructure (HCI) deployment was not correct. With this update, the number of OSDs is reported correctly for containerized deployment, and the value of `osd_memory_target` for the HCI configuration is calculated properly.
|
Story Points: | --- |
Clone Of: | | Environment: |
Last Closed: | 2019-12-19 17:59:09 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1578730, 1726135 |
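
The Doc Text above notes that the OSD count was not implemented for containerized deployments. As a rough illustration of what counting OSDs through the container runtime involves, here is a minimal sketch; the inventory group, container binary, image name, and volume mounts are assumptions, not the actual ceph-ansible tasks:

```yaml
# Illustration only: on a containerized OSD node there is typically no
# ceph-volume binary on the host, so the count has to come from inside a
# ceph container.
- name: count existing OSDs on a containerized node (sketch)
  hosts: osds                                   # assumed inventory group
  gather_facts: false
  vars:
    container_binary: docker                    # assumption; podman on newer hosts
    ceph_docker_image: rhceph/rhceph-3-rhel7    # assumption
  tasks:
    - name: run ceph-volume lvm list inside a ceph container
      command: >
        {{ container_binary }} run --rm --privileged
        -v /dev:/dev -v /etc/ceph:/etc/ceph -v /var/lib/ceph:/var/lib/ceph
        --entrypoint=ceph-volume {{ ceph_docker_image }} lvm list --format json
      register: ceph_volume_list
      changed_when: false

    - name: derive num_osds from the JSON report
      set_fact:
        num_osds: "{{ ceph_volume_list.stdout | from_json | length }}"

    - name: show the value the ceph.conf template would consume
      debug:
        var: num_osds
```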
Description
Eliad Cohen
2019-01-07 17:56:49 UTC
See also: ceph.conf result at: https://pastebin.com/raw/d4ePRFHG

Assigning to Neha since she worked on the initial implementation. Can we have someone from QE reproduce this?

Hi Neha, I think the issue here is that `num_osds` never got defined in the ceph-config role:

2019-01-04 15:30:51,087 p=7977 u=mistral | TASK [ceph-config : count number of osds for ceph-disk scenarios] **************
2019-01-04 15:30:51,088 p=7977 u=mistral | task path: /usr/share/ceph-ansible/roles/ceph-config/tasks/main.yml:16
2019-01-04 15:30:51,088 p=7977 u=mistral | Friday 04 January 2019 15:30:51 -0500 (0:00:00.035) 0:00:58.029 ********
2019-01-04 15:30:51,111 p=7977 u=mistral | skipping: [192.168.24.11] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}
2019-01-04 15:30:51,122 p=7977 u=mistral | TASK [ceph-config : count number of osds for lvm scenario] *********************
2019-01-04 15:30:51,122 p=7977 u=mistral | task path: /usr/share/ceph-ansible/roles/ceph-config/tasks/main.yml:23
2019-01-04 15:30:51,122 p=7977 u=mistral | Friday 04 January 2019 15:30:51 -0500 (0:00:00.034) 0:00:58.064 ********
2019-01-04 15:30:51,145 p=7977 u=mistral | skipping: [192.168.24.11] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}
2019-01-04 15:30:51,156 p=7977 u=mistral | TASK [ceph-config : run 'ceph-volume lvm batch --report' to see how many osds are to be created] ***
2019-01-04 15:30:51,156 p=7977 u=mistral | task path: /usr/share/ceph-ansible/roles/ceph-config/tasks/main.yml:30
2019-01-04 15:30:51,156 p=7977 u=mistral | Friday 04 January 2019 15:30:51 -0500 (0:00:00.033) 0:00:58.098 ********
2019-01-04 15:30:51,177 p=7977 u=mistral | skipping: [192.168.24.11] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}
2019-01-04 15:30:51,188 p=7977 u=mistral | TASK [ceph-config : set_fact num_osds from the output of 'ceph-volume lvm batch --report'] ***
2019-01-04 15:30:51,188 p=7977 u=mistral | task path: /usr/share/ceph-ansible/roles/ceph-config/tasks/main.yml:47
2019-01-04 15:30:51,188 p=7977 u=mistral | Friday 04 January 2019 15:30:51 -0500 (0:00:00.032) 0:00:58.130 ********
2019-01-04 15:30:51,211 p=7977 u=mistral | skipping: [192.168.24.11] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}
2019-01-04 15:30:51,222 p=7977 u=mistral | TASK [ceph-config : run 'ceph-volume lvm list' to see how many osds have already been created] ***
2019-01-04 15:30:51,222 p=7977 u=mistral | task path: /usr/share/ceph-ansible/roles/ceph-config/tasks/main.yml:55
2019-01-04 15:30:51,222 p=7977 u=mistral | Friday 04 January 2019 15:30:51 -0500 (0:00:00.034) 0:00:58.164 ********
2019-01-04 15:30:51,245 p=7977 u=mistral | skipping: [192.168.24.11] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}
2019-01-04 15:30:51,256 p=7977 u=mistral | TASK [ceph-config : set_fact num_osds from the output of 'ceph-volume lvm list'] ***
2019-01-04 15:30:51,256 p=7977 u=mistral | task path: /usr/share/ceph-ansible/roles/ceph-config/tasks/main.yml:66
2019-01-04 15:30:51,256 p=7977 u=mistral | Friday 04 January 2019 15:30:51 -0500 (0:00:00.033) 0:00:58.198 ********
2019-01-04 15:30:51,281 p=7977 u=mistral | skipping: [192.168.24.11] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}

So `num_osds` is set to its default value of 0 [1] (which is not a good default value, I think). That means the template never enters the conditions [2][3], and `osd_memory_target` keeps its default value [4].

[1] https://github.com/ceph/ceph-ansible/blob/v3.2.0/roles/ceph-config/templates/ceph.conf.j2#L155
[2] https://github.com/ceph/ceph-ansible/blob/v3.2.0/roles/ceph-config/templates/ceph.conf.j2#L157
[3] https://github.com/ceph/ceph-ansible/blob/v3.2.0/roles/ceph-config/templates/ceph.conf.j2#L162
[4] https://github.com/ceph/ceph-ansible/blob/v3.2.0/roles/ceph-config/templates/ceph.conf.j2#L168
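
To make the failure mode concrete, here is a minimal sketch of the kind of math the ceph.conf.j2 template performs for HCI nodes. It is not the verbatim template; the inventory group, safety-factor value, and task layout are assumptions. With `num_osds` stuck at 0, the condition is never met and the 4 GiB BlueStore default is written out unchanged:

```yaml
# Sketch only: mimics the osd_memory_target calculation described above.
# Because num_osds is 0, the `when` condition fails and the default survives.
- name: illustrate the HCI osd_memory_target calculation
  hosts: osds                       # assumed inventory group
  gather_facts: true                # provides ansible_memtotal_mb
  vars:
    is_hci: true
    num_osds: 0                     # what the bug produced on containerized nodes
    hci_safety_factor: 0.2          # assumed fraction of host RAM left to Ceph
    osd_memory_target: 4294967296   # BlueStore default (4 GiB)
  tasks:
    - name: split a share of the host memory across the OSDs on this node
      set_fact:
        osd_memory_target: "{{ (ansible_memtotal_mb * 1048576 * hci_safety_factor / num_osds) | int }}"
      when:
        - is_hci | bool
        - num_osds | int > 0        # false here, so the default is kept

    - name: show the value that ends up in ceph.conf
      debug:
        msg: "osd memory target = {{ osd_memory_target }}"
```

With a correct `num_osds` (for example 5 OSDs on a 128 GB node), the same expression yields a value above the 4 GiB default, which is what the HCI tuning is meant to achieve.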
I think Guillaume's comment here https://bugzilla.redhat.com/show_bug.cgi?id=1664112#c4 makes sense. So, we have two aspects:

1. Currently, if the number of OSDs is not determined correctly by the code, we default to 0. This does not permit any further automation to calculate the value of osd_memory_target (meaning: none of the math is done). To prevent this, we can default to 1 instead and at least allow the calculation to happen, though it might not be perfect. This is easy to get into 3.x.

2. We need to ensure that "num_osds" is populated correctly under all circumstances (and understand why it was not in this case). A solution to this will eradicate the problem. I am not sure about the timeline for this.

Guillaume, what are your thoughts?

Sticking to a default of 1 for num_osds isn't enough; it means there's an issue with the current implementation. We must understand how we can end up in a case where no OSDs are detected, and fix it. I can assist you to reproduce and figure out how to fix it, let me know.

Guillaume, sure, let's reproduce this. Do you already have an environment for it, or can Eliad help with that?

(In reply to Neha Ojha from comment #8)
> Guillaume, sure, let's reproduce this. Do you already have an environment for it, or can Eliad help with that?

Eliad, can you contact Neha to provide an env?

Updating the QA Contact to Hemant. Hemant will be rerouting the bug to the appropriate QE Associate.

Regards,
Giri

Tested this today with an OSP13 deployment with 3 controller and 3 hci-ceph-all nodes. Specified override.hcicephall.memory=131072 in infrared for a deployment with 5 OSDs per node, such that the memory target value on each node should have been larger than 4 GB. However, upon inspection of the ceph.conf file on the hci-ceph-all nodes, the value of osd memory target was 4294967296, i.e. the default value.

The Jordan (the infrared plugin we use for automated testing of ceph integration) patch with the test I used today can be found here: https://review.gerrithub.io/c/rhos-infra/jordan/+/468500

Some additional details:
ansible_memtotal_mb (found on hci-ceph-all nodes after deployment): 128773
Core Puddle: 2019-09-05.1
Ceph Image: 3-31
ceph-ansible version: ceph-ansible-3.2.24-1.el7cp.noarch
puppet-ceph: puppet-ceph-2.5.1-2.git372379b.el7ost.noarch

Should this be changed back to ASSIGNED?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:4353
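
For reference, a quick way to repeat the check described in the test comment above, once the fixed ceph-ansible build is in place, could look like the following sketch; the inventory group name and config path are assumptions:

```yaml
# Verification sketch: read the rendered ceph.conf on each HCI OSD node and
# fail if "osd memory target" still carries the 4 GiB BlueStore default,
# which would mean the HCI calculation did not run.
- name: verify osd memory target was recalculated on HCI nodes
  hosts: osds                      # assumed group holding the hci-ceph-all nodes
  gather_facts: false
  tasks:
    - name: read the osd memory target line from ceph.conf
      command: grep 'osd memory target' /etc/ceph/ceph.conf
      register: memory_target_line
      changed_when: false

    - name: fail if the default value is still in place
      assert:
        that:
          - memory_target_line.stdout.split('=')[1] | trim | int != 4294967296
        fail_msg: "osd memory target is still the default; the HCI calculation did not run"
        success_msg: "osd memory target was recalculated: {{ memory_target_line.stdout }}"
```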