Bug 1890135
| Summary: | When the PG limit is reached via pool creation, the pool is listed in oc get cephblockpool but is not created at the Ceph level |
|---|---|
| Product: | OpenShift Container Platform |
| Reporter: | Shay Rozen <srozen> |
| Component: | Console Storage Plugin |
| Assignee: | gowtham <gshanmug> |
| Status: | CLOSED WONTFIX |
| QA Contact: | Shay Rozen <srozen> |
| Severity: | medium |
| Docs Contact: | |
| Priority: | unspecified |
| Version: | 4.6 |
| CC: | afrahman, anbehl, aos-bugs, etamir, jdurgin, jefbrown, madam, muagarwa, nberry, nithin.thomas, nthomas, ocs-bugs, shan, smordech, tnielsen, vbadrina, ygalanti |
| Target Milestone: | --- |
| Keywords: | Reopened |
| Target Release: | 4.8.0 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Whiteboard: | |
| Fixed In Version: | |
| Doc Type: | If docs needed, set a value |
| Doc Text: | |
| Story Points: | --- |
| Clone Of: | |
| Environment: | |
| Last Closed: | 2021-04-15 10:22:03 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |
| Attachments: | ceph osd dump & ceph pg dump (attachment 1724833) |
Description
Shay Rozen
2020-10-21 14:01:13 UTC
The UI shows an error message if the CephBlockPool object created by the user gets into a failed state (due to some error during creation on the Ceph side). So even if pool creation fails at Ceph, the corresponding CephBlockPool k8s object will still exist in the CephBlockPool list (in failed status).

Nishanth, Travis, any chance this BZ is similar to bug 1748001? Also, the fact that the pool is not cleaned up and keeps being listed by oc get cephblockpool is a bug. Therefore, re-opening. Nishanth/Kanika, feel free to move to the correct component.

Moving the BZ to Rook for Travis to take a look. Doesn't belong to the UI.

Travis Nielsen (comment #7):
There are a couple of approaches to this issue:
1. The UI could stop allowing pool creation once the PG limit is hit.
2. Rook could automatically increase the PG limit if pool creation hits the limit.

#1 is disruptive to users creating pools, so clearly #2 is preferred.

@Josh Any concern with Rook increasing the max PG count automatically when needed to create a new pool? OCS users don't know anything about PG management.

Shay, please paste the output of the following:
- ceph osd dump
- ceph pg dump

Or else, provide the must-gather logs. Thanks.

(In reply to Travis Nielsen from comment #7)
> There are a couple of approaches to this issue:
> 1. The UI could stop allowing pool creation once the PG limit is hit.
> 2. Rook could automatically increase the PG limit if pool creation hits the limit.
>
> #1 is disruptive to users creating pools, so clearly #2 is preferred.
>
> @Josh Any concern with Rook increasing the max PG count automatically when
> needed to create a new pool? OCS users don't know anything about PG management.

Yes, PGs take up finite memory/CPU resources, so OCS should prevent users from taking up too many. Limiting the number of pools in the UI makes sense to me, and I thought that was the direction we were already headed for this.

Travis Nielsen (comment #10):
I do remember now that we were going to limit the number of pools. Creating three additional pools should cover production needs for now.

@Nithin @Eran In 4.6 can we restrict the number of pools created in the UI to 3 (or 4 if it's allowed)? Let's see in 4.6 if this is sufficient, or what feedback we get from customers.

@Josh remind me... in larger clusters could we support more PGs? Some customer will surely want to create more pools at some point.

Kanika Murarka (comment #11):
There was some discussion about having PG count validation in the admission controller (https://issues.redhat.com/browse/RHSTOR-1257), because this would provide the same experience in the CLI and the UI.

(In reply to Travis Nielsen from comment #10)
> I do remember now that we were going to limit the number of pools. Creating
> three additional pools should cover production needs for now.
>
> @Nithin @Eran In 4.6 can we restrict the number of pools created in the UI
> to 3 (or 4 if it's allowed)? Let's see in 4.6 if this is sufficient, or what
> feedback we get from customers.
>
> @Josh remind me... in larger clusters could we support more PGs? Some
> customer will surely want to create more pools at some point.

Yes, we're running into this particularly with OCS due to the small initial size. We target 100 PGs per OSD, with a hard cutoff at 300 by default. A larger cluster will allow more pools before hitting this limit.
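For reference, a minimal sketch of how the PG budget described above could be inspected from a Rook toolbox pod. The namespace, the app=rook-ceph-tools label, and the pod wiring are assumptions and vary per install; the ceph commands themselves are standard, but option defaults differ between Ceph releases.

```shell
# A minimal sketch, assuming a rook-ceph toolbox pod labelled app=rook-ceph-tools
# exists in the openshift-storage namespace (both are assumptions; adjust per install).
NS=openshift-storage
TOOLS=$(oc -n "$NS" get pod -l app=rook-ceph-tools -o name | head -n 1)

# How many OSDs exist and how many PGs each pool currently uses.
oc -n "$NS" exec "$TOOLS" -- ceph osd stat
oc -n "$NS" exec "$TOOLS" -- ceph osd pool ls detail   # shows pg_num per pool

# The per-OSD PG target and hard limit discussed above
# (defaults differ between Ceph releases).
oc -n "$NS" exec "$TOOLS" -- ceph config get mon mon_target_pg_per_osd
oc -n "$NS" exec "$TOOLS" -- ceph config get mon mon_max_pg_per_osd
```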
Travis Nielsen (comment #13):
(In reply to Kanika Murarka from comment #11)
> There was some discussion about having PG count validation in the admission
> controller (https://issues.redhat.com/browse/RHSTOR-1257), because this would
> provide the same experience in the CLI and the UI.

Right, the admission controller is needed for this. Until then, can another check be added to the UI, such as limiting the number of pools created in the UI to 3?

Moving it back to mgmt-console based on https://bugzilla.redhat.com/show_bug.cgi?id=1890135#c13

Created attachment 1724833 [details]: ceph osd dump & ceph pg dump
In this case, I would say that the pool was not created because we have reached the PG limit. Does that make sense?

I have opened https://bugzilla.redhat.com/show_bug.cgi?id=1946243 for the block pool under the ocs-operator page. Maybe the fix could cover both of them.

(In reply to Travis Nielsen from comment #13)
> Right, the admission controller is needed for this. Until then, can another
> check be added to the UI, such as limiting the number of pools created in
> the UI to 3?

3 sounds very low to me. Are we sure?

3 is certainly too limiting. Likely many more can be created; it just depends on the number of OSDs. As suggested in the linked BZ, it sounds better to show an alert that tells them they need to expand the cluster, rather than adding a hard limit.

@anbehl Yes, Rook just started adding events to CRs, so we are planning to add an event to the pools in case of failure too.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.
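Regarding the failure events mentioned above, a hedged way for a user to check whether a CephBlockPool actually failed on the Ceph side is to look at the CR status and any events referencing it. The pool name below is a placeholder, and the exact status fields and event wiring depend on the Rook/OCS version:

```shell
# "replicapool" is a placeholder pool name; status fields vary by Rook/OCS version.
oc -n openshift-storage get cephblockpool replicapool -o jsonpath='{.status.phase}{"\n"}'

# Any events Rook attaches to pool CRs (older versions may emit none).
oc -n openshift-storage get events --field-selector involvedObject.kind=CephBlockPool
oc -n openshift-storage describe cephblockpool replicapool
```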