1562172 – overcloud deployment with rgw failed with miscalculation in the validation of the number of osds in the cluster

Bug 1562172 - overcloud deployment with rgw failed with miscalculation in the validation of the number of osds in the cluster

Keywords:
Status:	CLOSED DUPLICATE of bug 1539852
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-tripleo
Sub Component:
Version:	13.0 (Queens)
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	---
Assignee:	James Slagle
QA Contact:	Arik Chernetsky
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-03-29 17:18 UTC by Yogev Rabl
Modified:	2018-04-11 12:29 UTC
CC List:	4 users
Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-04-11 12:29:23 UTC
Target Upstream Version:
Embargoed:

Description Yogev Rabl 2018-03-29 17:18:16 UTC

Description of problem:
A deployment of an overcloud with RGW failed with the error: 
2018-03-29 10:08:30,058 p=5245 u=mistral |  failed: [192.168.24.8] (item={u'rule_name': u'', u'pg_num': 32, u'name': u'vms'}) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-2", "ceph", "--cluster", "ceph", "osd", "pool", "create", "vms", "32", "32", "replicated"], "delta": "0:00:01.070260", "end": "2018-03-29 14:08:28.030349", "item": {"name": "vms", "pg_num": 32, "rule_name": ""}, "msg": "non-zero return code", "rc": 34, "start": "2018-03-29 14:08:26.960089", "stderr": "Error ERANGE:  pg_num 32 size 3 would mean 672 total pgs, which exceeds max 600 (mon_max_pg_per_osd 200 * num_in_osds 3)", "stderr_lines": ["Error ERANGE:  pg_num 32 size 3 would mean 672 total pgs, which exceeds max 600 (mon_max_pg_per_osd 200 * num_in_osds 3)"], "stdout": "", "stdout_lines": []}
2018-03-29 10:08:31,360 p=5245 u=mistral |  failed: [192.168.24.8] (item={u'rule_name': u'', u'pg_num': 32, u'name': u'volumes'}) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-2", "ceph", "--cluster", "ceph", "osd", "pool", "create", "volumes", "32", "32", "replicated"], "delta": "0:00:01.064665", "end": "2018-03-29 14:08:29.333740", "item": {"name": "volumes", "pg_num": 32, "rule_name": ""}, "msg": "non-zero return code", "rc": 34, "start": "2018-03-29 14:08:28.269075", "stderr": "Error ERANGE:  pg_num 32 size 3 would mean 672 total pgs, which exceeds max 600 (mon_max_pg_per_osd 200 * num_in_osds 3)", "stderr_lines": ["Error ERANGE:  pg_num 32 size 3 would mean 672 total pgs, which exceeds max 600 (mon_max_pg_per_osd 200 * num_in_osds 3)"], "stdout": "", "stdout_lines": []}

The number of OSDs used in the validation is wrong: it assumes that only 3 OSDs will be deployed.
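
The arithmetic in the ERANGE message can be checked with a minimal sketch. This is not Ceph's monitor code; it only mirrors the formula printed in the error (mon_max_pg_per_osd * num_in_osds). The function names are hypothetical, and the figure of 576 already-allocated PG replicas is inferred from the log as 672 - 32 * 3, not reported directly.

```python
# Simplified model of the limit quoted in the ERANGE message.
# All names here are illustrative and are not taken from Ceph.

def pg_limit(mon_max_pg_per_osd: int, num_in_osds: int) -> int:
    """Maximum total PG replicas, as printed in the error message."""
    return mon_max_pg_per_osd * num_in_osds


def projected_pgs(existing_pg_replicas: int, pg_num: int, size: int) -> int:
    """Total PG replicas if a pool with pg_num PGs and replica count size is added."""
    return existing_pg_replicas + pg_num * size


def pool_create_allowed(existing_pg_replicas: int, pg_num: int, size: int,
                        mon_max_pg_per_osd: int, num_in_osds: int) -> bool:
    """True when the projected total does not exceed the limit."""
    projected = projected_pgs(existing_pg_replicas, pg_num, size)
    return projected <= pg_limit(mon_max_pg_per_osd, num_in_osds)


# Inferred from the log: 672 projected total minus the new pool's 32 * 3 replicas.
EXISTING = 672 - 32 * 3

if __name__ == "__main__":
    # Values from the log: 3 OSDs counted, limit 600, request rejected.
    print(pool_create_allowed(EXISTING, 32, 3, 200, 3))
    # Same request if all 5 OSDs from the reproduction steps were counted.
    print(pool_create_allowed(EXISTING, 32, 3, 200, 5))
```

With the values from the log, the request is rejected when 3 OSDs are counted (672 > 600). Under the same formula it would be accepted if 5 OSDs were counted (672 <= 1000).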

Version-Release number of selected component (if applicable):
ansible-tripleo-ipsec-8.1.1-0.20180303222819.8f5369a.el7ost.noarch
openstack-tripleo-heat-templates-8.0.0-0.20180304031148.el7ost.noarch
openstack-tripleo-common-containers-8.5.1-0.20180304032202.e8d9da9.el7ost.noarch
openstack-tripleo-puppet-elements-8.0.0-0.20180304005217.dabb361.el7ost.noarch
python-tripleoclient-9.1.1-0.20180305094421.90727db.el7ost.noarch
openstack-tripleo-image-elements-8.0.0-0.20180304011935.e427c90.el7ost.noarch
puppet-tripleo-8.3.1-0.20180304033908.ed3285e.el7ost.noarch
openstack-tripleo-common-8.5.1-0.20180304032202.e8d9da9.el7ost.noarch
openstack-tripleo-validations-8.3.1-0.20180304031640.d5546cd.el7ost.noarch
ceph-ansible-3.1.0-0.1.beta4.el7cp.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy an overcloud with RGW and a single node that runs 5 OSDs

Actual results:
Deployment fails; the validation reports an insufficient number of OSDs in the cluster.

Expected results:
Deployment succeeds. The validation runs after the deployment has created the OSDs, so it sees the correct number of OSDs in the cluster.

Additional info:
Deployment command:
openstack overcloud deploy \
--timeout 100 \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
--environment-file /usr/share/openstack-tripleo-heat-templates/environments/cinder-backup.yaml \
--environment-file /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-rgw.yaml \
-e /home/stack/virt/internal.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/hostnames.yml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/ceph-single-host-mode.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /home/stack/virt/docker-images.yaml \
--log-file overcloud_deployment_13.log

Comment 1 John Fulton 2018-04-11 12:29:23 UTC

Hi Yogev,

Please see the duplicate bug and try to resolve this as described in the following comment: 

 https://bugzilla.redhat.com/show_bug.cgi?id=1539852#c17

  John

*** This bug has been marked as a duplicate of bug 1539852 ***
