Bug 1871035 - [Ceph-Ansible]: ceph-ansible (3.2) deployment fails on pool creation because of exceeding max pgs value
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 3.3z7
Assignee: Guillaume Abrioux
QA Contact: Ameena Suhani S H
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-08-21 08:01 UTC by RPietrzak
Modified: 2021-05-06 18:32 UTC
CC List: 6 users

Fixed In Version: RHEL: ceph-ansible-3.2.52-1.el7cp Ubuntu: ceph-ansible_3.2.52-2redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-06 18:32:04 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph-ansible pull 5704 0 None closed Remove 'run_once: true' from wait 'for all osd to be up' task in ceph… 2021-02-09 20:41:17 UTC
Red Hat Product Errata RHSA-2021:1518 0 None None None 2021-05-06 18:32:30 UTC

Description RPietrzak 2020-08-21 08:01:40 UTC
Description of problem:
Deployment of a new Ceph cluster fails with the following error:

TASK [ceph-osd : create openstack pool(s)]

["Error ERANGE:  pg_num 512 size 3 would mean 1728 total pgs, which exceeds max 750 (mon_max_pg_per_osd 250 * num_in_osds 3)"]`

Actual number of OSDs: 32
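
(For clarity on the numbers: the 750 cap in the error is mon_max_pg_per_osd * num_in_osds = 250 * 3. Because the wait task described below was skipped, only 3 of the 32 OSDs were counted as "in" at pool-creation time, so the projected 1728 PGs, 512 PGs at size 3 for the new pool plus the PGs of pools already present, exceed a cap that the full cluster would easily satisfy.)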

Cause:
The [ceph-osd : wait for all osd to be up] task is skipped because `run_once: true`, combined with the condition `inventory_hostname == ansible_play_hosts_all | last`, causes the task to be skipped whenever the `osds` group contains more than one host (see the sketch below).
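
For context, a minimal sketch of the problematic task shape in stable-3.2; the check command, success condition and delay are placeholders here, only the combination of `run_once`, `when`, `retries` and `delegate_to` matters:

```yaml
# Illustrative paraphrase of the stable-3.2 "wait for all osd to be up" task;
# the real check command and success condition live in the ceph-osd role.
- name: wait for all osd to be up
  command: "ceph --cluster {{ cluster }} osd stat --format json"  # placeholder check
  register: wait_for_all_osds_up
  until: wait_for_all_osds_up.rc == 0                             # placeholder success condition
  retries: 60
  delay: 10                                                       # assumed value
  delegate_to: "{{ groups[mon_group_name][0] }}"
  run_once: true                                              # evaluated only on the first OSD host of the play...
  when: inventory_hostname == ansible_play_hosts_all | last   # ...where this is false, so the task is skipped
```

With more than one host in `osds`, the first host of the play is never also the last one, so the condition is false on the only host where the task gets evaluated and the wait never runs. The fix linked above (PR 5704) drops `run_once: true`, so every host evaluates the condition and the last one actually performs the wait, delegated to the first monitor.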


Version-Release number of selected component (if applicable):
ceph-ansible-3.2.43-1.el7cp (stable-3.2)
ceph rhcs3.3z5 (12.2.12-115.el7cp)
ansible 2.6.18

How reproducible:
easy

Steps to Reproduce:
1. Deploy Ceph with more than one OSD host in the `osds` group.
2. To see the pg-max-exceeded error, create OpenStack pools with PG numbers, replica size, etc. large enough that they cannot be created while some OSDs are still down at pool-creation time; the failure is just a consequence of the wait-for-osd check not happening in the first place.

example config:
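
(The original report left the config out; as a hypothetical illustration, a group_vars snippet along these lines would trigger the reported error while only 3 OSDs are counted in. Variable and key names follow the common ceph-ansible openstack_pools layout and may differ between versions.)

```yaml
# Hypothetical pool definition matching the numbers in the error message
# (pg_num 512, size 3); variable and key names are illustrative only.
openstack_config: true
openstack_cinder_pool:
  name: "volumes"
  pg_num: 512
  size: 3
  application: "rbd"
openstack_pools:
  - "{{ openstack_cinder_pool }}"
```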

Actual results:
TASK [ceph-osd : wait for all osd to be up]
skipping: [OSD-1]

Expected results:
TASK [ceph-osd : wait for all osd to be up]
skipping: [OSD-1]
skipping: [OSD-2]
skipping: [OSD-3]
FAILED - RETRYING: wait for all osd to be up (60 retries left).
FAILED - RETRYING: wait for all osd to be up (59 retries left).
ok: [OSD-4 -> MON-1]

Additional info:
This issue is present only in ceph-ansible stable-3.2 (stable-4.0 and stable-5.0 look fine).

In the master branch it was fixed (along with some other changes) by:
https://github.com/ceph/ceph-ansible/commit/af6875706af93f133299156403f51d3ad48d17d3

Comment 1 RPietrzak 2020-08-21 08:17:22 UTC
https://github.com/ceph/ceph-ansible/pull/5704

Comment 7 errata-xmlrpc 2021-05-06 18:32:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 3.3 Security and Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1518

