Bug 1847875 - CephObjectStoreUser CRs are stuck in a "Created" phase in independent mode
Summary: CephObjectStoreUser CRs are stuck in a "Created" phase in independent mode
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 4.5.0
Assignee: Sébastien Han
QA Contact: Rachael
URL:
Whiteboard:
Depends On:
Blocks: 1849663
Reported: 2020-06-17 09:24 UTC by Ben Eli
Modified: 2020-09-23 09:07 UTC

Fixed In Version: 4.5.0-477.ci
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1849663
Environment:
Last Closed: 2020-09-15 10:17:44 UTC
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift ocs-operator pull 569 | 0 | None | closed | independent mode: expose rgw endpoint as a label | 2020-12-16 09:27:14 UTC
Github | openshift ocs-operator pull 585 | 0 | None | closed | Bug 1847875: [release-4.5] independent mode: expose rgw endpoint as a label | 2020-12-16 09:27:14 UTC
Github | openshift ocs-operator pull 610 | 0 | None | closed | do not encode rgw label endpoint | 2020-12-16 09:27:14 UTC
Red Hat Product Errata | RHBA-2020:3754 | 0 | None | None | None | 2020-09-15 10:18:06 UTC

Description Ben Eli 2020-06-17 09:24:28 UTC
Description of problem (please be as detailed as possible and provide log snippets):
When I try to create a CephObjectStoreUser CR on a cluster that uses OCS independent mode, it is stuck in the "Created" phase, and the expected configmap and secret aren't created.

This also leads to NooBaa failing to install on independent mode.
NooBaa tries to create a default backingstore, and for that it creates a new CephObjectStoreUser in order to create its target bucket on RGW. However, since the request isn't fulfilled, the backingstore creation fails, and NooBaa is stuck in the `Configuring` phase.

Version of all relevant components (if applicable):
OCS 4.5.0-447.ci
OCP 4.5.0-0.nightly-2020-06-03-215545

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes - I cannot create CephObjectStoreUsers, and NooBaa can't be installed properly.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Was not tested

Can this issue reproduce from the UI?
NA

If this is a regression, please provide more details to justify this:
NA

Steps to Reproduce:
1. Deploy a cluster with OCP and OCS 4.5 on VMWare/Bare-metal
2. Run `oc -n openshift-storage get noobaa` and see that its phase is Configuring
3. Run `oc -n openshift-storage get backingstore` - see that none exist
4. (Optional) Try to create a CephObjectStoreUser and see that it's stuck in the `Created` phase, with no secret or configmap created.


Actual results:
CephObjectStoreUser requests aren't fulfilled

Expected results:
CephObjectStoreUser requests are fulfilled, configmaps and secrets are created

Additional info:

Comment 4 Sahina Bose 2020-06-18 07:11:24 UTC
Can you look into this?

Comment 5 RAJAT SINGH 2020-06-18 09:33:24 UTC
Sure, will look into it.

Comment 6 RAJAT SINGH 2020-06-19 11:28:42 UTC
Hi Ben, when you say that you deployed OCS, which YAMLs did you apply?
I'm just curious about the exact steps you took to deploy OCS.
Thanks

Comment 10 Neha Berry 2020-07-03 08:32:34 UTC
Latest update:

As we all know, installing OCS 4.5 on the latest OCP builds is blocked. Hence, we tried installing OCS 4.5 on an older OCP build (in this case, the Jun 3rd nightly), and it seems the issue with NooBaa is still there.

The NooBaa issues are preventing the OCS CSV from reaching the Succeeded state. Aren't the fixes for Bug 1847875 and Bug 1849663 already part of the recent OCS 4.5 builds? When can we expect a working deployment of OCS (at least on older OCP builds, until Bug 1852865 is fixed)?


NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.5.0-470.ci   OpenShift Container Storage   4.5.0-470.ci              Installing


OCP build - 4.5.0-0.nightly-2020-06-03-215545 (old build from Jun 3)


status:
  conditions:
  - lastHeartbeatTime: "2020-07-03T08:18:17Z"
    lastTransitionTime: "2020-07-01T12:32:52Z"
    message: 'Error while reconciling: NooBaa.noobaa.io "noobaa" is invalid: metadata.labels:
      Invalid value: "MTAuMS44LjQxOjgwODA=": a valid label must be an empty string
      or consist of alphanumeric characters, ''-'', ''_'' or ''.'', and must start
      and end with an alphanumeric character (e.g. ''MyValue'',  or ''my_value'',  or
      ''12345'', regex used for validation is ''(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?'')'
    reason: ReconcileFailed

Comment 11 umanga 2020-07-03 09:50:27 UTC
(In reply to Neha Berry from comment #10)

@seb It looks like https://github.com/openshift/ocs-operator/pull/585 doesn't work either.
Stripping the port info and recreating the label looks like the only possible option.
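
As a rough illustration of the "strip port info" idea, here is a minimal Python sketch that splits an endpoint into host and port so that neither value contains a `:`. The function name and the label keys are hypothetical and are not taken from ocs-operator; the sketch also assumes an IPv4 address or hostname followed by a numeric port.

```python
def split_endpoint(endpoint: str) -> dict[str, str]:
    """Split "host:port" into two values that contain no ':' character.

    The label keys below are hypothetical placeholders, not the keys
    used by ocs-operator.
    """
    host, sep, port = endpoint.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {endpoint!r}")
    return {
        "example.com/rgw-endpoint-host": host,
        "example.com/rgw-endpoint-port": port,
    }


labels = split_endpoint("10.1.8.41:8080")
print(labels)
```

Both resulting values use only alphanumeric characters and `.`, so they fit the character set quoted in the error message above.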

Comment 12 Michael Adam 2020-07-03 13:28:29 UTC
(In reply to Neha Berry from comment #10)
> Latest update:
> 
> We all know, that OCS 4.5 install on latest OCP builds is blocked. Hence, we
> tried installing OCS 4.5 on old builds of OCP (in this case Jun 3rd
> nightly). And it seems the issue with noobaa is still there.
> 
> The noobaa issues are preventing the OCS CSV to come toSucceeded state.
> Aren't fixes Bug 1847875 and Bug 1849663 already part of the recent OCS 4.5
> builds ?

I checked the BZs/PRs/branches/Builds, and it seems that all mentioned patches have been in the builds since https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/OCS%20Build%20Pipeline%204.5/49/ (that's 4.5.0-463.ci).


By using base64 encoding, we avoid the failure due to the invalid char `:`, but now we have a failure due to `=`, which base64 appends as padding...

So the patches were not sufficient.
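
The failure can be reproduced outside the cluster. The following Python sketch applies the validation regex quoted in the error message to the raw endpoint and to its base64 encoding. The helper name `is_valid_label_value` is made up for this illustration, and the 63-character limit is the standard Kubernetes label value limit rather than something stated in this bug.

```python
import base64
import re

# Regex copied from the reconcile error message in comment 10.
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")
# Kubernetes label values are limited to 63 characters.
MAX_LABEL_VALUE_LENGTH = 63


def is_valid_label_value(value: str) -> bool:
    """Return True if value would pass the label value check (illustrative helper)."""
    return (
        len(value) <= MAX_LABEL_VALUE_LENGTH
        and LABEL_VALUE_RE.fullmatch(value) is not None
    )


endpoint = "10.1.8.41:8080"
encoded = base64.b64encode(endpoint.encode()).decode()

print(endpoint, is_valid_label_value(endpoint))  # rejected because of ':'
print(encoded, is_valid_label_value(encoded))    # rejected because of the '=' padding
```

Padding is not the only risk: the standard base64 alphabet also contains `+` and `/`, which the regex rejects as well, so an encoded value may fail even when no padding is needed.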

Comment 15 Sébastien Han 2020-07-07 11:51:04 UTC
Merged and backported.

Comment 16 Michael Adam 2020-07-07 15:47:20 UTC
https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/OCS%20Build%20Pipeline%204.5/57/ has the fix.

4.5.0-477.ci

Comment 21 errata-xmlrpc 2020-09-15 10:17:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3754

