Bug 1946243

Summary: No relevant error when pg limit is reached in block pools page
Product: OpenShift Container Platform
Reporter: Shay Rozen <srozen>
Component: Console Storage Plugin
Assignee: gowtham <gshanmug>
Status: CLOSED ERRATA
QA Contact: Shay Rozen <srozen>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.8
CC: afrahman, anbehl, aos-bugs, gshanmug, jefbrown, madam, nberry, nthomas, ocs-bugs
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1959715
Environment:
Last Closed: 2021-07-27 22:57:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1959715    
Attachments:
Description                            Flags
rbd pool create page after failure     none
storageclass create page failure       none
fixed_timed_out                        none

Description Shay Rozen 2021-04-05 13:46:56 UTC
Created attachment 1769267
rbd pool create page after failure

Description of problem (please be detailed as possible and provide log
snippets):
When the PG limit is reached by creating multiple pools, the console only states "Failure" instead of "Ready". I think the user should be notified when the PG limit is reached.
Also, in the storageclass creation flow you can create a pool, and there the error message is "Pool as09 creation timed out. Please check if ocs-operator and rook operator are running". Please let me know if you need a separate BZ for that.


Version of all relevant components (if applicable):
OCS 4.8
OCP 4.8

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?


Can this issue be reproduced from the UI?
Only from UI

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install OCP and OCS
2. Create RBD pools until creation fails



Actual results:
The user sees only the word "Failure" when the PG limit is reached.

Expected results:
The user should get an appropriate error stating that the PG limit has been reached.

Additional info:

Comment 1 Shay Rozen 2021-04-05 13:47:40 UTC
Created attachment 1769268
storageclass create page failure

Comment 3 Shay Rozen 2021-04-06 11:24:50 UTC
The same issue exists with pool creation on the storageclass page. Maybe the fix can cover both: https://bugzilla.redhat.com/show_bug.cgi?id=1890135

Comment 4 Yaniv Kaul 2021-04-06 12:22:36 UTC
I wonder what the user can do about such an alert.
How many pools are we talking about here?

Comment 6 Shay Rozen 2021-04-06 13:37:28 UTC
I reached 15 pools (with replication 2) before I got the failure. With replication 3 I think the limit will be hit sooner. Is 4.8.0-303.ci enough?
The alert can tell the user why the failure occurred and that they need to add capacity before creating more pools.
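
As a rough illustration of why a higher replication factor would hit the limit sooner, the arithmetic can be sketched as below. This assumes the cluster rejects a new pool once the projected number of PG replicas (pg_num times replica size, summed over all pools) would exceed a per-OSD cap multiplied by the OSD count. The cap of 250, the 3 OSDs, and the pg_num of 32 are assumed values for the example, not figures taken from this cluster, so the result is not expected to match the 15 pools observed here.

```typescript
// Rough estimate of how many equally sized pools fit under a PG cap.
// Every number passed in below is an assumption for illustration only.
interface ClusterAssumptions {
  osdCount: number;           // OSDs available to hold PGs
  maxPgPerOsd: number;        // per-OSD PG cap (assumed value)
  pgNumPerPool: number;       // pg_num of each new pool (assumed value)
  existingPgReplicas: number; // PG replicas already consumed by other pools
}

function maxNewPools(c: ClusterAssumptions, replicaSize: number): number {
  // Total PG replicas the cluster may hold, minus what is already used.
  const budget = c.osdCount * c.maxPgPerOsd - c.existingPgReplicas;
  // Each pool consumes pg_num PGs, each stored replicaSize times.
  const costPerPool = c.pgNumPerPool * replicaSize;
  return Math.max(0, Math.floor(budget / costPerPool));
}

const assumed: ClusterAssumptions = {
  osdCount: 3,
  maxPgPerOsd: 250,
  pgNumPerPool: 32,
  existingPgReplicas: 0,
};

const poolsWithReplica2 = maxNewPools(assumed, 2);
const poolsWithReplica3 = maxNewPools(assumed, 3);
console.log({ poolsWithReplica2, poolsWithReplica3 });
```

Under these assumptions, replica size 3 leaves room for fewer pools than replica size 2, which is consistent with the expectation stated above.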

Comment 7 gowtham 2021-04-07 08:11:23 UTC
This bug needs to be fixed in the backend first. The backend should pass a proper error message with the failure reason in the Ceph block pool YAML along with the phase "Failure". I agree the timeout is a UI bug, but I still can't think of a proper solution without the error message.

@Afreen, if an error message is found in the YAML, will the block pool list page (the default page created by OLM) display the error message along with the phase?
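
To make the question concrete, here is a minimal sketch of how a list page could show the phase together with a failure reason, if the backend provided one. Only the `phase` values "Ready" and "Failure" appear in this report; the `message` field and the `statusText` helper are hypothetical names used for illustration and are not taken from the Rook or console code.

```typescript
// Hypothetical status shape: `phase` is what the UI shows today;
// `message` is an ASSUMED field the backend would have to populate.
interface BlockPoolStatus {
  phase?: string;
  message?: string;
}

// Build the text for the status column: phase alone when no reason is
// available, otherwise "phase: reason".
function statusText(status: BlockPoolStatus | undefined): string {
  const phase = status?.phase ?? "Unknown";
  const message = status?.message?.trim();
  return message ? `${phase}: ${message}` : phase;
}

console.log(statusText({ phase: "Failure" }));
console.log(statusText({ phase: "Failure", message: "pg limit reached" }));
```

With no message from the backend, the sketch falls back to the bare phase, which is the behavior described in this bug.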

Comment 11 gowtham 2021-04-17 10:56:22 UTC
The UI is not receiving a proper error message from the backend indicating that the PG limit has been reached. To fix the timeout issue I have sent a PR: https://github.com/openshift/console/pull/8689. I am failing pool creation with a general error message: "Pool {poolName} got created with non ready state."

Pool CR creation is successful, but the status is a failure. Since the UI works at the CR level, I am reporting that pool creation completed with an error. Once PG capacity is available under the limit, the pool will become ready.
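
The behavior described above could look roughly like the following sketch. It is not the code from the PR; the `Outcome` type, the `evaluatePoolCreation` function, and the timeout parameters are hypothetical. Only the two phase values and the two message texts come from this report.

```typescript
// Possible results of checking a newly created pool CR.
type Outcome =
  | { kind: "ready" }
  | { kind: "pending" }
  | { kind: "error"; message: string };

// Decide what the create dialog should show, given the CR's current
// phase and how long the UI has been waiting (hypothetical helper).
function evaluatePoolCreation(
  poolName: string,
  phase: string | undefined,
  elapsedMs: number,
  timeoutMs: number,
): Outcome {
  if (phase === "Ready") {
    return { kind: "ready" };
  }
  if (phase === "Failure") {
    // The CR exists but is not usable: report this immediately
    // instead of waiting for the timeout.
    return {
      kind: "error",
      message: `Pool ${poolName} got created with non ready state.`,
    };
  }
  if (elapsedMs >= timeoutMs) {
    return {
      kind: "error",
      message: `Pool ${poolName} creation timed out. Please check if ocs-operator and rook operator are running`,
    };
  }
  return { kind: "pending" };
}

console.log(evaluatePoolCreation("as09", "Failure", 1000, 30000));
```

Checking the phase before the timeout is the point of the sketch: a pool that has already failed produces the generic error right away rather than the misleading timeout message.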

Comment 13 Shay Rozen 2021-05-31 12:36:01 UTC
@gowtham Why was this BZ moved to ON_QA? Is there a fix that shows an error message other than "Failure"?

Comment 14 gowtham 2021-05-31 13:34:33 UTC
The fix shows a failure error message, "pool {pool-name} got created with errors", instead of timing out the pool creation request. From the UI's point of view, the CR is created but its status is a failure.

Comment 15 Shay Rozen 2021-06-01 12:36:45 UTC
Where can I see the message? I hit the pool limit yesterday and didn't see any message. Can you elaborate?

Comment 16 gowtham 2021-06-01 13:21:03 UTC
You have mentioned two errors here:
   1. The block pool creation page under the OCS operator shows "Failure". (A failure can happen for various reasons, and the UI is not receiving any failure error message from the backend. Please raise a separate BZ against Rook.)
   2. Pool creation on the storage class creation page timed out without any error message. (I fixed this by showing a generic error message.)

Comment 17 gowtham 2021-06-01 13:25:13 UTC
Created attachment 1788508
fixed_timed_out

Comment 20 errata-xmlrpc 2021-07-27 22:57:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438