Bug 1847875

Summary: CephObjectStoreUser CRs are stuck in a "Created" phase in independent mode
Product: [Red Hat Storage] Red Hat OpenShift Container Storage Reporter: Ben Eli <belimele>
Component: ocs-operator Assignee: Sébastien Han <shan>
Status: CLOSED ERRATA QA Contact: Rachael <rgeorge>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.5 CC: ebenahar, jthottan, madam, nberry, ocs-bugs, rajasing, sabose, shan, sostapov, uchapaga
Target Milestone: --- Keywords: AutomationBackLog, TestBlocker
Target Release: OCS 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 4.5.0-477.ci Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1849663 Environment:
Last Closed: 2020-09-15 10:17:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1849663    

Description Ben Eli 2020-06-17 09:24:28 UTC
Description of problem (please be detailed as possible and provide log
snippets):
When I try to create a CephObjectStoreUser CR on a cluster that uses OCS independent mode, the CR stays stuck in the "Created" phase, and the corresponding configmap and secret are never created.

This also causes the NooBaa installation to fail in independent mode.
NooBaa tries to create a default backingstore, and for that it creates a new CephObjectStoreUser so it can create its target bucket on RGW. Since that request is never fulfilled, the backingstore creation fails, and NooBaa is stuck in the `Configuring` phase.

Version of all relevant components (if applicable):
OCS 4.5.0-447.ci
OCP 4.5.0-0.nightly-2020-06-03-215545

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes - I cannot create CephObjectStoreUsers, and NooBaa can't be installed properly.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Was not tested

Can this issue be reproduced from the UI?
NA

If this is a regression, please provide more details to justify this:
NA

Steps to Reproduce:
1. Deploy a cluster with OCP and OCS 4.5 on VMware/bare metal
2. Run `oc -n openshift-storage get noobaa` and see that its phase is `Configuring`
3. Run `oc -n openshift-storage get backingstore` and see that none exist
4. (Optional) Try to create a CephObjectStoreUser and see that it's stuck in the `Created` phase; no secret or configmap is created.
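
For step 4, a minimal sketch of a CephObjectStoreUser manifest, built in Python and printed as JSON. The user name `test-user` and the store name `my-store` are hypothetical placeholders, not values taken from this cluster; the store must name an object store that actually exists in the namespace.

```python
import json


def object_store_user_manifest(name, store, namespace="openshift-storage"):
    """Build a CephObjectStoreUser manifest as a plain dict."""
    return {
        "apiVersion": "ceph.rook.io/v1",
        "kind": "CephObjectStoreUser",
        "metadata": {"name": name, "namespace": namespace},
        # spec.store is the name of the object store the user is created in.
        "spec": {"store": store, "displayName": name},
    }


if __name__ == "__main__":
    # "test-user" and "my-store" are hypothetical placeholder names.
    manifest = object_store_user_manifest("test-user", "my-store")
    print(json.dumps(manifest, indent=2))
```

The output can be piped to `oc apply -f -`, after which `oc -n openshift-storage get cephobjectstoreuser` should show the phase reported in this bug.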


Actual results:
CephObjectStoreUser requests aren't fulfilled

Expected results:
CephObjectStoreUser requests are fulfilled, configmaps and secrets are created

Additional info:

Comment 4 Sahina Bose 2020-06-18 07:11:24 UTC
Can you look into this?

Comment 5 RAJAT SINGH 2020-06-18 09:33:24 UTC
Sure, will look into it.

Comment 6 RAJAT SINGH 2020-06-19 11:28:42 UTC
Hi Ben, when you say that you deployed OCS, which YAMLs did you apply?
I'm just curious about the exact steps you took to deploy OCS.
Thanks

Comment 10 Neha Berry 2020-07-03 08:32:34 UTC
Latest update:

As we all know, installing OCS 4.5 on the latest OCP builds is blocked. Hence, we tried installing OCS 4.5 on an older OCP build (in this case, the Jun 3rd nightly), and the issue with NooBaa still seems to be there.

The NooBaa issues are preventing the OCS CSV from reaching the Succeeded state. Aren't the fixes for Bug 1847875 and Bug 1849663 already part of the recent OCS 4.5 builds? When can we expect a working deployment of OCS (at least on older OCP builds, until Bug 1852865 is fixed)?


NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.5.0-470.ci   OpenShift Container Storage   4.5.0-470.ci              Installing


OCP build - 4.5.0-0.nightly-2020-06-03-215545 (old build from Jun 3)


status:
  conditions:
  - lastHeartbeatTime: "2020-07-03T08:18:17Z"
    lastTransitionTime: "2020-07-01T12:32:52Z"
    message: 'Error while reconciling: NooBaa.noobaa.io "noobaa" is invalid: metadata.labels:
      Invalid value: "MTAuMS44LjQxOjgwODA=": a valid label must be an empty string
      or consist of alphanumeric characters, ''-'', ''_'' or ''.'', and must start
      and end with an alphanumeric character (e.g. ''MyValue'',  or ''my_value'',  or
      ''12345'', regex used for validation is ''(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?'')'
    reason: ReconcileFailed

Comment 11 umanga 2020-07-03 09:50:27 UTC
(In reply to Neha Berry from comment #10)

@seb It looks like https://github.com/openshift/ocs-operator/pull/585 doesn't work either.
Stripping the port info and recreating looks like the only possible option.
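
A minimal sketch of the "strip the port" idea, assuming the label value is derived from a `host:port` endpoint such as the decoded value from comment #10. The helper names are hypothetical and this is not the code from the ocs-operator patch; the regex is the one quoted in the error message, and the 63-character limit is the standard Kubernetes limit for label values.

```python
import re

# Validation regex copied from the error message in comment #10.
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")


def is_valid_label_value(value):
    """Return True if value is acceptable as a Kubernetes label value."""
    return len(value) <= 63 and LABEL_VALUE_RE.fullmatch(value) is not None


def strip_port(endpoint):
    """Drop a trailing ':<port>' from a host:port endpoint (hypothetical helper).

    Bracketed IPv6 endpoints are not handled in this sketch.
    """
    host, sep, port = endpoint.rpartition(":")
    return host if sep and port.isdigit() else endpoint


if __name__ == "__main__":
    endpoint = "10.1.8.41:8080"
    for candidate in (endpoint, strip_port(endpoint)):
        print(candidate, is_valid_label_value(candidate))
```

With the port removed, the remaining address contains only digits and dots, so it passes the label validation that the raw endpoint fails.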

Comment 12 Michael Adam 2020-07-03 13:28:29 UTC
(In reply to Neha Berry from comment #10)
> Latest update:
> 
> We all know, that OCS 4.5 install on latest OCP builds is blocked. Hence, we
> tried installing OCS 4.5 on old builds of OCP (in this case Jun 3rd
> nightly). And it seems the issue with noobaa is still there.
> 
> The noobaa issues are preventing the OCS CSV to come to Succeeded state.
> Aren't fixes Bug 1847875 and Bug 1849663 already part of the recent OCS 4.5
> builds ?

I checked the BZs/PRs/branches/Builds, and it seems that all mentioned patches have been in the builds since https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/OCS%20Build%20Pipeline%204.5/49/ (that's 4.5.0-463.ci).


> When can we expect to have working deployment of OCS (at least on
> older OCP builds until Bug 1852865 is fixed)
> 
> 
> NAME                         DISPLAY                       VERSION        REPLACES   PHASE
> ocs-operator.v4.5.0-470.ci   OpenShift Container Storage   4.5.0-470.ci              Installing
> 
> 
> OCP build - 4.5.0-0.nightly-2020-06-03-215545 (old build from Jun 3)
> 
> 
> status:
>   conditions:
>   - lastHeartbeatTime: "2020-07-03T08:18:17Z"
>     lastTransitionTime: "2020-07-01T12:32:52Z"
>     message: 'Error while reconciling: NooBaa.noobaa.io "noobaa" is invalid: metadata.labels:
>       Invalid value: "MTAuMS44LjQxOjgwODA=": a valid label must be an empty string
>       or consist of alphanumeric characters, ''-'', ''_'' or ''.'', and must start
>       and end with an alphanumeric character (e.g. ''MyValue'',  or ''my_value'',  or
>       ''12345'', regex used for validation is ''(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?'')'
>     reason: ReconcileFailed

Using base64 encoding avoids the failure caused by the invalid character `:`, but now validation fails on `=`, which base64 appends as padding.

So the patches were not sufficient.
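
To illustrate the padding problem, here is a small sketch that encodes the endpoint decoded from the rejected label value and lists the characters the label validation would reject. The helper names are hypothetical, and the character set is taken from the error message above.

```python
import base64
import re


def encode_endpoint(endpoint):
    """Standard base64 encoding of an endpoint string, padding included."""
    return base64.b64encode(endpoint.encode("ascii")).decode("ascii")


def invalid_label_chars(value):
    """Characters in value that a Kubernetes label value may not contain."""
    return sorted(set(re.findall(r"[^A-Za-z0-9_.-]", value)))


if __name__ == "__main__":
    endpoint = "10.1.8.41:8080"
    encoded = encode_endpoint(endpoint)
    print(endpoint, invalid_label_chars(endpoint))
    print(encoded, invalid_label_chars(encoded))
```

The endpoint is 14 bytes long, which is not a multiple of 3, so the encoder appends one `=`. Standard base64 can also emit `+` and `/`, which label values reject as well, so dropping the padding alone would not be a general fix.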

Comment 15 Sébastien Han 2020-07-07 11:51:04 UTC
Merged and backported.

Comment 16 Michael Adam 2020-07-07 15:47:20 UTC
https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/OCS%20Build%20Pipeline%204.5/57/ has the fix.

4.5.0-477.ci

Comment 21 errata-xmlrpc 2020-09-15 10:17:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3754