Bug 1603551 - OSP13 deploy fails pg count exceeds max
Summary: OSP13 deploy fails pg count exceeds max
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: z2
Target Release: 3.3
Assignee: Guillaume Abrioux
QA Contact: Vasishta
Docs Contact: Bara Ancincova
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-07-19 15:21 UTC by Alex Krzos
Modified: 2019-12-19 17:59 UTC
CC List: 12 users

Fixed In Version: RHEL: ceph-ansible-3.2.30-1.el7cp Ubuntu: ceph-ansible_3.2.30-2redhat1
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-12-19 17:58:55 UTC
Embargoed:


Attachments
pgcalc for derived values for OpenStack Ceph Pools (32.46 KB, image/png)
2018-07-19 15:21 UTC, Alex Krzos


Links
Github ceph ceph-ansible pull 4525 (closed): osd: refact 'wait for all osd to be up' task (last updated 2020-02-12 23:29:39 UTC)
Red Hat Product Errata RHSA-2019:4353 (last updated 2019-12-19 17:59:11 UTC)

Description Alex Krzos 2018-07-19 15:21:52 UTC
Created attachment 1460856 [details]
pgcalc for derived values for OpenStack Ceph Pools

Description of problem:
Deployed OSP13 GA with Ceph Storage and configured the pg count for each of the 5 OpenStack Ceph pools with the following snippet from our templates:

  CephPools:
    - name: images
      pg_num: 256
      rule_name: ""
    - name: metrics
      pg_num: 16
      rule_name: ""
    - name: backups
      pg_num: 16
      rule_name: ""
    - name: vms
      pg_num: 1024
      rule_name: ""
    - name: volumes
      pg_num: 256
      rule_name: ""

That is a total of 1568 pgs. We have 20 configured OSDs in this testbed, so we should be able to have 200 * 20 = 4000 pgs.
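
For reference, here is that budget arithmetic spelled out as a minimal sketch. Only mon_max_pg_per_osd is a real Ceph option (its Luminous default is 200); the other names are illustrative, not actual ceph-ansible or Ceph identifiers:

  # Naive PG budget check, as expected for this testbed (sketch only)
  mon_max_pg_per_osd = 200   # Luminous default
  num_osds = 20              # OSDs configured in this testbed
  pools = {"images": 256, "metrics": 16, "backups": 16,
           "vms": 1024, "volumes": 256}

  requested = sum(pools.values())          # 1568
  budget = mon_max_pg_per_osd * num_osds   # 4000
  assert requested <= budget               # 1568 <= 4000, so this should fit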

The deploy failed on ceph ansible with:
https://gist.githubusercontent.com/akrzos/809f744fbc95110b0b89d7fae30082c0/raw/c5a67eb6686ad466f75f2d2045e637a73186c080/gistfile1.txt

I am not sure why it only showed 10 OSDs in the calculation (2000 = mon_max_pg_per_osd 200 * num_in_osds 10), or why the calculated required pgs (3936) was so much higher than what we actually configured (1568).
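The numbers in the error do line up if the monitor's check counts PG replicas (pg_num * pool size) across every pool that exists at the moment a pool is created, and if only 10 of the 20 OSDs were up/in when the check ran. Below is a sketch of that reading, assuming the default pool size of 3 and that pools are created in the order listed above; this is an interpretation of the error message, not something confirmed from the gist, and names other than mon_max_pg_per_osd are illustrative:

  # Reconstructing the failing check at the moment the "vms" pool is created
  mon_max_pg_per_osd = 200
  num_in_osds = 10                             # only half the OSDs were in
  max_pgs = mon_max_pg_per_osd * num_in_osds   # 2000, as in the error

  size = 3                                     # assumed replica size
  existing = {"images": 256, "metrics": 16, "backups": 16}
  creating_pg_num = 1024                       # the "vms" pool

  projected = sum(pg * size for pg in existing.values()) + creating_pg_num * size
  print(projected)             # 3936 -- the "required pgs" from the error
  print(projected > max_pgs)   # True, so the monitor rejects the pool

If that reading is right, this is a race: pools were created before all 20 OSDs had registered as up/in, which is consistent with the eventual fix being a refactor of the 'wait for all osd to be up' task (ceph-ansible pull 4525, linked above).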


Version-Release number of selected component (if applicable):
OSP13 with Ceph Storage - GA build

Undercloud:
(undercloud) [stack@b04-h01-1029p ~]$ rpm -qa | grep ceph
puppet-ceph-2.5.0-1.el7ost.noarch
ceph-ansible-3.1.0-0.1.rc9.el7cp.noarch

Controller:
[root@overcloud-controller-0 ~]# rpm -qa | grep ceph
collectd-ceph-5.8.0-10.el7ost.x86_64
puppet-ceph-2.5.0-1.el7ost.noarch
ceph-mds-12.2.4-10.el7cp.x86_64
libcephfs2-12.2.4-10.el7cp.x86_64
ceph-base-12.2.4-10.el7cp.x86_64
ceph-radosgw-12.2.4-10.el7cp.x86_64
python-cephfs-12.2.4-10.el7cp.x86_64
ceph-selinux-12.2.4-10.el7cp.x86_64
ceph-mon-12.2.4-10.el7cp.x86_64
ceph-common-12.2.4-10.el7cp.x86_64



How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:
Overcloud deployment failed

Expected results:
Ceph-Ansible should not block the deployment.

Additional info:
This bug seems similar to the following one, which was fixed in an earlier ceph-ansible version than the one used here:
https://bugzilla.redhat.com/show_bug.cgi?id=1578086
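
A possible mitigation while waiting for a fix (an assumption, not verified on this setup) would be to raise the monitor's cap through a ceph.conf override; in OSP13 that should be expressible via the CephConfigOverrides template parameter, e.g.:

  parameter_defaults:
    CephConfigOverrides:
      mon_max_pg_per_osd: 400   # example value; the Luminous default is 200

Note that this only widens the limit and masks the underlying race rather than fixing it.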

Comment 3 Ken Dreyer (Red Hat) 2018-07-26 16:38:21 UTC
Alex, would you please say why you see this issue as different than https://bugzilla.redhat.com/show_bug.cgi?id=1578086 ?

Comment 4 Alex Krzos 2018-07-27 14:56:22 UTC
(In reply to Ken Dreyer (Red Hat) from comment #3)
> Alex, would you please say why you see this issue as different than
> https://bugzilla.redhat.com/show_bug.cgi?id=1578086 ?

Hi Ken,

I am not positive it is a different issue; I was instructed to open a new bug while reporting this on the #ceph-dfg IRC channel. Since the original bug was already marked as fixed and I was using a build newer than the one that contained the fix, I didn't think opening a new bug would be a bad idea. If it is the same issue, by all means use the existing bug, or track this as a new one, whichever works best for you. Just let me know if you need to check anything on our systems to help fix it.

Thanks,

-Alex

Comment 6 Giridhar Ramaraju 2019-08-05 13:10:04 UTC
Updating the QA Contact to Hemant. Hemant will be rerouting them to the appropriate QE Associate.

Regards,
Giri

Comment 8 Giridhar Ramaraju 2019-08-20 07:17:12 UTC
Setting the severity of this defect to "High" with a bulk update. Please
refine it to a more accurate value, as defined by the severity definitions at
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity

Comment 14 Yogev Rabl 2019-11-26 16:37:08 UTC
Verified

Comment 18 errata-xmlrpc 2019-12-19 17:58:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:4353

