Bug 1920507 - Creation of cephblockpool with compression failed on timeout
Summary: Creation of cephblockpool with compression failed on timeout
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: OCS 4.7.0
Assignee: Jose A. Rivera
QA Contact: Shay Rozen
URL:
Whiteboard:
Depends On:
Blocks: 1926154
 
Reported: 2021-01-26 12:39 UTC by Avi Liani
Modified: 2021-08-23 14:45 UTC (History)
10 users

Fixed In Version: 4.7.0-723.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-19 09:18:35 UTC
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2021:2041 0 None None None 2021-05-19 09:19:02 UTC

Description Avi Liani 2021-01-26 12:39:45 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

While trying to create a new pool with compression enabled, the operation failed on a timeout and the pool was not created on the back end.


Version of all relevant components (if applicable):

OCP version : 4.7.0-0.nightly-2021-01-26-044139
OCS Version : ocs-operator.v4.7.0-238.ci

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

Yes, it cannot be used for compressed data.

Is there any workaround available to the best of your knowledge?

No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

1

Is this issue reproducible?

Yes


Can this issue be reproduced from the UI?

Yes


If this is a regression, please provide more details to justify this:

Yes, it works fine on:
OCP: 4.7.0-0.nightly-2021-01-19-095812
OCS: 4.7.0-231.ci

Steps to Reproduce:
1. Deploy OCS
2. Try to create a new StorageClass with a new pool, with compression enabled (a CLI sketch of this step is shown below)
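
For reference, a minimal CLI equivalent of step 2 (the OCS console generates these objects for you). This is a sketch: the pool name cbp-test is hypothetical, and the clusterID and CSI secret names are assumed to be the defaults of a standard OCS install in the openshift-storage namespace.

cat <<EOF | oc create -f -
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: cbp-test                  # hypothetical pool name
  namespace: openshift-storage
spec:
  compressionMode: aggressive     # compression enabled, as in the failing scenario
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cmp-testing               # StorageClass name as used later in comment 23
provisioner: openshift-storage.rbd.csi.ceph.com
parameters:
  clusterID: openshift-storage    # assumption: default OCS cluster ID
  pool: cbp-test
  imageFeatures: layering
  csi.storage.k8s.io/fstype: ext4
  # Assumed default OCS CSI secrets; adjust if your install differs.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: openshift-storage
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: openshift-storage
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: openshift-storage
reclaimPolicy: Delete
EOF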



Actual results:

Pool creation failed on a timeout.

Expected results:

The new pool and new StorageClass are created.


Additional info:

Trying to create the pool again reports that the pool already exists.
Checking with the CLI command:

   oc get cephblockpool

shows that the pool exists, but checking directly in the rook-ceph-tools pod with the command:

   ceph osd lspools

shows that the pool does in fact not exist.
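
Both checks can be run from one terminal; a minimal sketch, assuming the default toolbox deployment name rook-ceph-tools in the openshift-storage namespace:

   # The CephBlockPool CR exists on the Kubernetes side...
   oc -n openshift-storage get cephblockpool
   # ...but the pool is missing on the Ceph back end:
   oc -n openshift-storage rsh deploy/rook-ceph-tools ceph osd lspools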

Comment 3 Avi Liani 2021-01-26 12:41:57 UTC
Collecting all must-gather logs; will upload them when ready.

Comment 5 Avi Liani 2021-02-04 06:58:19 UTC
Any update on this?

Tried it on:
OCP: 4.7.0-0.nightly-2021-02-01-145821
OCS: 4.7.0-251.ci

and it is still not working.

Comment 6 Jose A. Rivera 2021-02-08 14:05:01 UTC
I see the following error message in the rook-ceph-operator log:

ceph-block-pool-controller: failed to reconcile. failed to create pool "cbp-test-d1d1584a0ab54ce19887f7fd8f96929".: failed to create pool "cbp-test-d1d1584a0ab54ce19887f7fd8f96929".: failed to create pool "cbp-test-d1d1584a0ab54ce19887f7fd8f96929": failed to create replicated pool cbp-test-d1d1584a0ab54ce19887f7fd8f96929. Error ERANGE:  pg_num 32 size 3 would mean 912 total pgs, which exceeds max 900 (mon_max_pg_per_osd 300 * num_in_osds 3)
2021-01-26T12:46:43.501665171Z . : exit status 34

This seems pretty straightforward: we are hitting our configuration limits. I think we've hit this before. Travis, is this the same issue with the autoscaler we've seen?
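
For context, a rough sketch of the check behind that ERANGE, using the numbers from the error message above (running the query through the toolbox pod is an assumption about this cluster's setup):

   # limit  = mon_max_pg_per_osd * num_in_osds = 300 * 3 = 900 PG replicas
   # demand = PGs already projected for existing pools (912 - 96 = 816)
   #          + the new pool: pg_num 32 * replica size 3 = 96  -> 912 total
   # 912 > 900, so the mon rejects the pool creation with ERANGE (exit status 34).
   oc -n openshift-storage rsh deploy/rook-ceph-tools ceph config dump | grep mon_max_pg_per_osd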

Comment 7 Travis Nielsen 2021-02-08 14:20:34 UTC
Yes, this is the same error we have seen before from the autoscaler. We adjusted the PGs per OSD to 300 with this PR:
https://github.com/openshift/ocs-operator/pull/989

Now the question is what the auto-scaler has adjusted the PGs to in this cluster. 

What are the PGs per pool as seen with "ceph osd pool ls detail"?

Comment 8 Avi Liani 2021-02-08 14:46:30 UTC
(In reply to Travis Nielsen from comment #7)
> Yes, this is the same error we have seen before from the autoscaler. We
> adjusted the PGs per OSD to 300 with this PR:
> https://github.com/openshift/ocs-operator/pull/989
> 
> Now the question is what the auto-scaler has adjusted the PGs to in this
> cluster. 
> 
> What are the PGs per pool as seen with "ceph osd pool ls detail"?

The results from the latest cluster installed (today) are:

# ceph osd pool ls detail              
pool 1 'ocs-storagecluster-cephblockpool' replicated size 4 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 24 lfor 0/0/22 flags hashpspool,selfmanaged_snaps stripe_width 0 target_size_ratio 0.49 application rbd
	removed_snaps [1~3]

Comment 9 Jose A. Rivera 2021-02-08 15:07:29 UTC
A question: Is this related to any feature that is officially slated for OCS 4.7? My understanding is that we are not supporting the creation of additional CephBlockPools yet. If so, we should move this to OCS 4.8.

Comment 10 Avi Liani 2021-02-08 15:13:20 UTC
(In reply to Jose A. Rivera from comment #9)
> A question: Is this related to any feature that is officially slated for OCS
> 4.7? My understanding is that we are not supporting the creation of
> additional CephBlockPools yet. If so, we should move this to OCS 4.8.

This is mandatory for compression, and AFAIK we have supported compression since 4.6, so this is a regression and a blocker for 4.7.

Comment 11 Mudit Agarwal 2021-02-08 16:29:22 UTC
Hitting something similar in https://bugzilla.redhat.com/show_bug.cgi?id=1926312

Comment 12 Jose A. Rivera 2021-02-08 17:30:28 UTC
(In reply to Avi Liani from comment #10)
> (In reply to Jose A. Rivera from comment #9)
> > A question: Is this related to any feature that is officially slated for OCS
> > 4.7? My understanding is that we are not supporting the creation of
> > additional CephBlockPools yet. If so, we should move this to OCS 4.8.
> 
> This is mandatory for compression, and AFAIK we have supported compression
> since 4.6, so this is a regression and a blocker for 4.7.

I understand this is mandatory for compression, but that does not answer my question. Are we explicitly allowing *multiple* CephBlockPools outside the default one we create, or are we just supporting one CephBlockPool with compression enabled?

Comment 13 Shay Rozen 2021-02-08 18:20:12 UTC
We are allowing multiple CephBlockPools outside the default - https://issues.redhat.com/browse/KNIP-1462

I found something that causes the pg limit to be hit faster.

In 4.6 ocs-storagecluster-cephblockpool had 32 PGs.
In 4.7 ocs-storagecluster-cephblockpool has 128 PGs,

and because of that the pg limit is hit faster.

Now, how did the pool get 4x bigger, and why?

From 4.6
pool 2 'ocs-storagecluster-cephblockpool' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 14 flags hashpspool,selfmanaged_snaps stripe_width 0 target_size_ratio 0.49 application rbd

from 4.7 
pool 1 'ocs-storagecluster-cephblockpool' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 42 lfor 0/0/29 flags hashpspool,selfmanaged_snaps stripe_width 0 target_size_ratio 0.49 application rbd



In 4.6, multiple StorageClasses, and hence multiple pools with compression and replication, were introduced. The PG limit was already an issue after adding 2 RBD pools, and now a customer with 3 OSDs can't add any pool unless they add capacity.
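
Rough numbers behind "hitting the limit faster", based on the pool listings above and the 3-OSD cap from comment 6 (all other pools held equal):

   # 4.6 default RBD pool: pg_num 32,  size 3 ->  32 * 3 =  96 PG replicas
   # 4.7 default RBD pool: pg_num 128, size 3 -> 128 * 3 = 384 PG replicas
   # Against the cap of 300 * 3 = 900, 4.7 starts with 288 fewer PG replicas of
   # headroom for additional (e.g. compressed) pools.
   ceph osd pool ls detail   # run from the rook-ceph-tools pod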

Comment 14 Travis Nielsen 2021-02-08 21:32:51 UTC
Yes, the autoscaler is responsible for the higher PG counts in 4.7.
Agreed that this is the same root cause as https://bugzilla.redhat.com/show_bug.cgi?id=1926312 as mentioned by Mudit.

If we want to allow for more pools, we need to increase the PG count to 400 per OSD as mentioned in the other BZ. Otherwise, we get blocked when creating any additional pools, which would be a regression from 4.6.

Comment 15 Shay Rozen 2021-02-09 12:47:04 UTC
Will a new CephBlockPool that the user creates also have 128 PGs? If so, I don't think 400 per OSD will be enough; any pool that is created will add another 384 PGs to the cluster.
In 4.6 we could create 2 more pools with replica 3, and with replica 2 you could add around 3.

Comment 16 Shay Rozen 2021-02-09 13:00:46 UTC
I checked on a cluster to which capacity had been added so that I could create another CephBlockPool, and its pg_num is 32.
Why does the default CephBlockPool have 128 while the user-created one has only 32? They both have "autoscale_mode on".

Comment 17 Travis Nielsen 2021-02-09 19:18:29 UTC
The commit message on this PR explains the details of the PGs and autoscaling, which may help; it's all related to the target_size_ratio and the decisions the Ceph autoscaler makes:
https://github.com/openshift/ocs-operator/pull/1047

Comment 18 Mudit Agarwal 2021-02-10 11:07:46 UTC
Should be fixed via https://bugzilla.redhat.com/show_bug.cgi?id=1926312

Comment 21 Shay Rozen 2021-02-11 22:55:57 UTC
So why does the Ceph autoscaler decide, with no action taken on the pools, that the default pool gets 128 PGs while a user-created pool gets 32? What is the logic behind this?

Comment 22 Travis Nielsen 2021-02-15 19:10:21 UTC
@Shay The autoscaler takes into account many factors, such as cluster usage and the pool's target_size_ratio. The difference with user-created pools is that they don't set the target_size_ratio, but OCS does set it on the default pool.
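
A way to see those autoscaler inputs directly is sketched below (run from the rook-ceph-tools pod; the exact columns vary by Ceph release):

   # TARGET RATIO is populated (0.49) for the OCS-managed default pool, which is why
   # the autoscaler pre-allocates 128 PGs there; user-created pools leave it unset
   # and stay at 32 PGs until actual usage grows.
   ceph osd pool autoscale-status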

Comment 23 Avi Liani 2021-02-16 08:31:06 UTC
I just created a new StorageClass with a new compressed pool and replica 3 on:

Platform : Vmware-Dynamic
OCP Version : 4.7.0-rc.2
OCS Version : ocs-operator.v4.7.0-262.ci

# oc get sc
NAME                          PROVISIONER                             RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
cmp-testing                   openshift-storage.rbd.csi.ceph.com      Delete          Immediate           false                  117s


# oc get cephblockpool
NAME                               AGE
ocs-storagecluster-cephblockpool   49m
sc-pool-cmp                        16m


# oc get cephblockpool sc-pool-cmp -o yaml
...
spec:
  compressionMode: aggressive
  crushRoot: ""
  deviceClass: ""
  enableRBDStats: false
  erasureCoded:
    codingChunks: 0
    dataChunks: 0
  failureDomain: ""
  mirroring: {}
  parameters:
    compression_mode: aggressive
  replicated:
    requireSafeReplicaSize: false
    size: 3
    targetSizeRatio: 0
  statusCheck:
    mirror: {}
status:
  phase: Ready

So this BZ can be verified.
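
For an extra back-end check during verification (a suggestion, run from the rook-ceph-tools pod):

   ceph osd lspools                                 # the new pool should now be listed
   ceph osd pool get sc-pool-cmp compression_mode   # should report compression_mode: aggressive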

Comment 26 errata-xmlrpc 2021-05-19 09:18:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041

Comment 27 Jilju Joy 2021-08-23 14:44:20 UTC
Covered in ocs-ci test tests/manage/storageclass/test_create_2_sc_with_1_pool_comp_rep2.py

