Bug 1871035 - [Ceph-Ansible]: ceph-ansible (3.2) deployment fails on pool creation because of exceeding max pgs value
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 3.3z7
Assignee: Guillaume Abrioux
QA Contact: Ameena Suhani S H
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-08-21 08:01 UTC by RPietrzak
Modified: 2021-05-06 18:32 UTC
CC List: 6 users

Fixed In Version: RHEL: ceph-ansible-3.2.52-1.el7cp Ubuntu: ceph-ansible_3.2.52-2redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-06 18:32:04 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph-ansible pull 5704 0 None closed Remove 'run_once: true' from wait 'for all osd to be up' task in ceph… 2021-02-09 20:41:17 UTC
Red Hat Product Errata RHSA-2021:1518 0 None None None 2021-05-06 18:32:30 UTC

Description RPietrzak 2020-08-21 08:01:40 UTC
Description of problem:
Deployment of a new Ceph cluster fails with the following error:

TASK [ceph-osd : create openstack pool(s)]

["Error ERANGE:  pg_num 512 size 3 would mean 1728 total pgs, which exceeds max 750 (mon_max_pg_per_osd 250 * num_in_osds 3)"]`

Actual number of OSDs: 32
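
(For clarity on the numbers: the 750 cap in the error is mon_max_pg_per_osd * num_in_osds = 250 * 3. Because the wait task described below was skipped, only 3 of the 32 OSDs were counted as "in" at pool-creation time, so the projected 1728 PGs, 512 PGs at size 3 for the new pool plus the PGs of pools already present, exceed a cap that the full cluster would easily satisfy.)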

Cause:
The [ceph-osd : wait for all osd to be up] task is skipped because `run_once: true`, combined with the condition `inventory_hostname == ansible_play_hosts_all | last`, causes the task to be skipped whenever the `osds` group contains more than one host (see the sketch below).
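
For context, a minimal sketch of the problematic task shape in stable-3.2; the check command, success condition and delay are placeholders here, only the combination of `run_once`, `when`, `retries` and `delegate_to` matters:

```yaml
# Illustrative paraphrase of the stable-3.2 "wait for all osd to be up" task;
# the real check command and success condition live in the ceph-osd role.
- name: wait for all osd to be up
  command: "ceph --cluster {{ cluster }} osd stat --format json"  # placeholder check
  register: wait_for_all_osds_up
  until: wait_for_all_osds_up.rc == 0                             # placeholder success condition
  retries: 60
  delay: 10                                                       # assumed value
  delegate_to: "{{ groups[mon_group_name][0] }}"
  run_once: true                                              # evaluated only on the first OSD host of the play...
  when: inventory_hostname == ansible_play_hosts_all | last   # ...where this is false, so the task is skipped
```

With more than one host in `osds`, the first host of the play is never also the last one, so the condition is false on the only host where the task gets evaluated and the wait never runs. The fix linked above (PR 5704) drops `run_once: true`, so every host evaluates the condition and the last one actually performs the wait, delegated to the first monitor.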


Version-Release number of selected component (if applicable):
ceph-ansible-3.2.43-1.el7cp (stable-3.2)
ceph rhcs3.3z5 (12.2.12-115.el7cp)
ansible 2.6.18

How reproducible:
easy

Steps to Reproduce:
1. Deploy Ceph with more than one OSD host in the `osds` group.
2. To see the pg-max-exceeded error, create OpenStack pools with PG numbers, replica size, etc. large enough that they cannot be created while some OSDs are still down at pool-creation time; the failure is just a consequence of the wait-for-osd check not happening in the first place.

example config:
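
(The original report left the config out; as a hypothetical illustration, a group_vars snippet along these lines would trigger the reported error while only 3 OSDs are counted in. Variable and key names follow the common ceph-ansible openstack_pools layout and may differ between versions.)

```yaml
# Hypothetical pool definition matching the numbers in the error message
# (pg_num 512, size 3); variable and key names are illustrative only.
openstack_config: true
openstack_cinder_pool:
  name: "volumes"
  pg_num: 512
  size: 3
  application: "rbd"
openstack_pools:
  - "{{ openstack_cinder_pool }}"
```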

Actual results:
TASK [ceph-osd : wait for all osd to be up]
skipping: [OSD-1]

Expected results:
TASK [ceph-osd : wait for all osd to be up]
skipping: [OSD-1]
skipping: [OSD-2]
skipping: [OSD-3]
FAILED - RETRYING: wait for all osd to be up (60 retries left).
FAILED - RETRYING: wait for all osd to be up (59 retries left).
ok: [OSD-4 -> MON-1]

Additional info:
This issue is present only in ceph-ansible stable-3.2 (stable-4.0 and stable-5.0 look fine).

In the master branch it was fixed (along with some other changes) by:
https://github.com/ceph/ceph-ansible/commit/af6875706af93f133299156403f51d3ad48d17d3

Comment 1 RPietrzak 2020-08-21 08:17:22 UTC
https://github.com/ceph/ceph-ansible/pull/5704

Comment 7 errata-xmlrpc 2021-05-06 18:32:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 3.3 Security and Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1518

