Bug 1603551

Summary: OSP13 deploy fails: pg count exceeds max
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Alex Krzos <akrzos>
Component: Ceph-Ansible
Assignee: Guillaume Abrioux <gabrioux>
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: high
Docs Contact: Bara Ancincova <bancinco>
Priority: unspecified
Version: 3.1
CC: aschoen, ceph-eng-bugs, ceph-qe-bugs, gabrioux, gmeno, hnallurv, kdreyer, nthomas, pasik, tchandra, tserlin, yrabl
Target Milestone: z2
Target Release: 3.3
Hardware: Unspecified
OS: Unspecified
Fixed In Version: RHEL: ceph-ansible-3.2.30-1.el7cp; Ubuntu: ceph-ansible_3.2.30-2redhat1
Doc Type: No Doc Update
Last Closed: 2019-12-19 17:58:55 UTC
Type: Bug
Attachments: pgcalc for derived values for OpenStack Ceph Pools

Description Alex Krzos 2018-07-19 15:21:52 UTC
Created attachment 1460856 [details]
pgcalc for derived values for OpenStack Ceph Pools

Description of problem:
Deployed OSP13 GA with Ceph Storage and configured the pg count for each of the 5 OpenStack Ceph pools with the following snippet from our templates:

  CephPools:
    - name: images
      pg_num: 256
      rule_name: ""
    - name: metrics
      pg_num: 16
      rule_name: ""
    - name: backups
      pg_num: 16
      rule_name: ""
    - name: vms
      pg_num: 1024
      rule_name: ""
    - name: volumes
      pg_num: 256
      rule_name: ""

That is a total of 1568 pgs.  We have a total of 20 configured OSDs in this testbed, so with mon_max_pg_per_osd = 200 we should be able to have 200 * 20 = 4000 pgs.

The deploy failed on ceph ansible with:
https://gist.githubusercontent.com/akrzos/809f744fbc95110b0b89d7fae30082c0/raw/c5a67eb6686ad466f75f2d2045e637a73186c080/gistfile1.txt

I am not sure why the failure only counted 10 OSDs in the calculation (2000 = mon_max_pg_per_osd 200 * num_in_osds 10), or why the calculated number of required pgs (3936) was much higher than what we actually configured (1568).
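For context, Ceph's mon_max_pg_per_osd guard appears to compare the projected number of PG instances (each pool's pg_num multiplied by its replica size) against mon_max_pg_per_osd * the number of "in" OSDs. Below is a minimal sketch of that arithmetic; the uniform replica size of 3 is an assumption, and the sketch reproduces the 2000 limit from the error message but not the 3936 figure, so it only illustrates the style of check, not the exact code path:

```python
# Sketch of a mon_max_pg_per_osd-style check. REPLICA_SIZE = 3 is an
# assumption; the real sizes come from the deployed pool definitions.
MON_MAX_PG_PER_OSD = 200
REPLICA_SIZE = 3

# pg_num per pool, as configured in the CephPools snippet above
pools = {"images": 256, "metrics": 16, "backups": 16, "vms": 1024, "volumes": 256}

raw_pgs = sum(pools.values())          # 1568 PGs as configured
pg_instances = raw_pgs * REPLICA_SIZE  # 4704 PG copies spread across OSDs

for num_in_osds in (20, 10):
    limit = MON_MAX_PG_PER_OSD * num_in_osds  # 4000 for 20 OSDs, 2000 for 10
    print(num_in_osds, limit, pg_instances > limit)
```

Counting replicas, even the 20-OSD capacity of 4000 is exceeded (4704 > 4000), which could explain why the deploy was blocked even though the raw pg count of 1568 looked safe; why the log reported only 10 in OSDs is a separate question.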


Version-Release number of selected component (if applicable):
OSP13 with Ceph Storage - GA build

Undercloud:
(undercloud) [stack@b04-h01-1029p ~]$ rpm -qa | grep ceph
puppet-ceph-2.5.0-1.el7ost.noarch
ceph-ansible-3.1.0-0.1.rc9.el7cp.noarch

Controller:
[root@overcloud-controller-0 ~]# rpm -qa | grep ceph
collectd-ceph-5.8.0-10.el7ost.x86_64
puppet-ceph-2.5.0-1.el7ost.noarch
ceph-mds-12.2.4-10.el7cp.x86_64
libcephfs2-12.2.4-10.el7cp.x86_64
ceph-base-12.2.4-10.el7cp.x86_64
ceph-radosgw-12.2.4-10.el7cp.x86_64
python-cephfs-12.2.4-10.el7cp.x86_64
ceph-selinux-12.2.4-10.el7cp.x86_64
ceph-mon-12.2.4-10.el7cp.x86_64
ceph-common-12.2.4-10.el7cp.x86_64




Actual results:
Overcloud failed

Expected results:
Ceph Ansible to not block deployment

Additional info:
This bug seems similar to the following one, which was fixed in an earlier version of ceph-ansible than the version used here:
https://bugzilla.redhat.com/show_bug.cgi?id=1578086

Comment 3 Ken Dreyer (Red Hat) 2018-07-26 16:38:21 UTC
Alex, would you please say why you see this issue as different than https://bugzilla.redhat.com/show_bug.cgi?id=1578086 ?

Comment 4 Alex Krzos 2018-07-27 14:56:22 UTC
(In reply to Ken Dreyer (Red Hat) from comment #3)
> Alex, would you please say why you see this issue as different than
> https://bugzilla.redhat.com/show_bug.cgi?id=1578086 ?

Hi Ken,

I am not positive it is a different issue; I was instructed to open a new bug while reporting this on the #ceph-dfg IRC channel. Since the original bug was already marked as fixed and I was using a build newer than the one containing the fix, I didn't think opening a new bug would be a bad idea. If it is the same issue, by all means use the existing bug, or track this as a new one, whichever works best for you. Just let me know if you need to check anything on our systems to help fix it.

Thanks,

-Alex

Comment 6 Giridhar Ramaraju 2019-08-05 13:10:04 UTC
Updating the QA Contact to Hemant. Hemant will be rerouting them to the appropriate QE Associate.

Regards,
Giri

Comment 8 Giridhar Ramaraju 2019-08-20 07:17:12 UTC
Level setting the severity of this defect to "High" with a bulk update. Please
refine it to a more accurate value, as defined by the severity definitions in
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity

Comment 14 Yogev Rabl 2019-11-26 16:37:08 UTC
Verified

Comment 18 errata-xmlrpc 2019-12-19 17:58:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:4353