Bug 1968606

Summary: OCS CSV Status moves to Failed and Installs again when a StorageCluster is created
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: N Balachandran <nibalach>
Component: ocs-operator
Assignee: Jose A. Rivera <jarrpa>
Status: CLOSED ERRATA
QA Contact: Shay Rozen <srozen>
Severity: medium
Docs Contact:
Priority: high
Version: 4.7
CC: ebenahar, godas, jarrpa, madam, muagarwa, ocs-bugs, odf-bz-bot, sostapov
Target Milestone: ---
Target Release: ODF 4.9.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-12-13 17:44:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1979168

Description N Balachandran 2021-06-07 16:09:24 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

Installing the OCS 4.7 operator moves the CSV status to Succeeded.
Creating the StorageCluster CR afterwards moves the status back to Failed, then through Pending and InstallReady to Installing again. This happens periodically if the StorageCluster takes a long time to come up, and it is particularly visible when the StorageCluster has an error and the operator never becomes ready.
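The phase cycle described above is easier to see when the polled output is reduced to transitions only. A minimal bash sketch, assuming one CSV phase per line on stdin (for example, the `{.status.phase}` values read by the reproduction script in Steps to Reproduce); `phase_transitions` is a hypothetical helper, not part of OCS or `oc`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read one CSV phase per line on stdin and print only
# the points where the phase changes, e.g. "Succeeded -> Failed".
phase_transitions() {
    local prev="" phase
    while IFS= read -r phase; do
        # Skip the very first reading and any repeat of the previous phase.
        if [[ -n "$prev" && "$phase" != "$prev" ]]; then
            printf '%s -> %s\n' "$prev" "$phase"
        fi
        prev="$phase"
    done
}

# Example input modelled on the phases reported after StorageCluster creation.
printf '%s\n' Succeeded Succeeded Failed Failed Pending InstallReady Installing |
    phase_transitions
```

For that input the sketch prints four lines, from `Succeeded -> Failed` through `InstallReady -> Installing`, which is the regression in compact form.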


Version of all relevant components (if applicable):


Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

1

Is this issue reproducible?
Yes

Can this issue reproduce from the UI?
Should be reproducible. We just need to query the CSV periodically.


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install OCS 4.7 on OCP
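2. Start the following script in a separate terminal:

#!/bin/bash
# Poll the phase of the OCS operator CSV and print it.

namespace="openshift-storage"
csv="ocs-operator.v4.7.0"

while true; do
    # The phase is empty while the CSV does not exist yet; oc reports NotFound on stderr.
    status="$(oc -n "${namespace}" get csv "${csv}" -o jsonpath='{.status.phase}')"
    if [[ "$status" == "Succeeded" ]]; then
        echo "ClusterServiceVersion (${csv}) is ready"
    else
        echo "ClusterServiceVersion is $status"
    fi
    # Query periodically rather than in a tight loop.
    sleep 1
done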

3. Create a StorageCluster.
4. Watch the output of the script.


Actual results:

After Operator Installation:
ClusterServiceVersion is
Error from server (NotFound): clusterserviceversions.operators.coreos.com "ocs-operator.v4.7.0" not found
ClusterServiceVersion is
Error from server (NotFound): clusterserviceversions.operators.coreos.com "ocs-operator.v4.7.0" not found
ClusterServiceVersion is
ClusterServiceVersion is Pending
ClusterServiceVersion is Pending
ClusterServiceVersion is Pending
ClusterServiceVersion is Pending
ClusterServiceVersion is Pending
ClusterServiceVersion is Pending
ClusterServiceVersion is Pending
ClusterServiceVersion is Pending
ClusterServiceVersion is Pending
ClusterServiceVersion is Pending
ClusterServiceVersion is InstallReady
ClusterServiceVersion is Installing
ClusterServiceVersion is Installing
ClusterServiceVersion is Installing
ClusterServiceVersion is Installing
ClusterServiceVersion is Installing
ClusterServiceVersion is Installing
ClusterServiceVersion is Installing
ClusterServiceVersion (ocs-operator.v4.7.0) is ready
ClusterServiceVersion (ocs-operator.v4.7.0) is ready
ClusterServiceVersion (ocs-operator.v4.7.0) is ready

...

After Storage Cluster creation:

ClusterServiceVersion (ocs-operator.v4.7.0) is ready
ClusterServiceVersion (ocs-operator.v4.7.0) is ready
ClusterServiceVersion (ocs-operator.v4.7.0) is ready
ClusterServiceVersion (ocs-operator.v4.7.0) is ready
ClusterServiceVersion (ocs-operator.v4.7.0) is ready
ClusterServiceVersion is Failed
ClusterServiceVersion is Failed
ClusterServiceVersion is Pending
ClusterServiceVersion is Pending
ClusterServiceVersion is InstallReady
ClusterServiceVersion is Installing
ClusterServiceVersion is Installing
ClusterServiceVersion is Installing
ClusterServiceVersion is Installing
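The log above can be turned into a pass/fail check: once the CSV has reached Succeeded, any later different phase is the regression reported here. A hedged bash sketch, assuming one phase per line on stdin; `detect_csv_regression` is a hypothetical name and not an existing OCS or OLM tool:

```bash
#!/usr/bin/env bash
# Hypothetical check: read one CSV phase per line on stdin. Once the phase
# has been "Succeeded", any later different phase (including an empty read)
# counts as a regression. Prints the first regression and returns 1;
# returns 0 if the CSV never leaves Succeeded after reaching it.
detect_csv_regression() {
    local seen_succeeded=0 phase
    while IFS= read -r phase; do
        if [[ "$phase" == "Succeeded" ]]; then
            seen_succeeded=1
        elif (( seen_succeeded )); then
            printf 'regressed: Succeeded -> %s\n' "$phase"
            return 1
        fi
    done
    return 0
}

# Example: the sequence from the actual results above.
printf '%s\n' Pending Installing Succeeded Succeeded Failed Pending |
    detect_csv_regression || echo "CSV left Succeeded"
```

Phases seen before the first Succeeded (Pending, InstallReady, Installing) are ignored, so a normal install passes the check.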





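Expected results:

The CSV stays in the Succeeded phase after the StorageCluster is created, while StorageCluster creation continues in the background.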

Additional info:

AI queries the CSV status to determine the state of both the operator deployment and the StorageCluster creation. Depending on when it polls, it can read Failed even though StorageCluster creation is still progressing.
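One possible client-side workaround is to avoid treating a single Failed reading as final. A minimal bash sketch, assuming the caller feeds it one polled phase per line; `csv_verdict` and its threshold argument are hypothetical, and this is neither how AI works nor how the ocs-operator fix works:

```bash
#!/usr/bin/env bash
# Hypothetical workaround sketch: read polled CSV phases (one per line) and
# report "failed" only if the phase stays "Failed" for N consecutive reads,
# so a transient Failed -> Pending -> Installing cycle is not treated as final.
# $1 is the threshold N (defaults to 5). Prints one verdict:
# failed (and returns 1), succeeded, or in-progress.
csv_verdict() {
    local threshold="${1:-5}" consecutive_failed=0 phase last=""
    while IFS= read -r phase; do
        last="$phase"
        if [[ "$phase" == "Failed" ]]; then
            (( consecutive_failed += 1 ))
            if (( consecutive_failed >= threshold )); then
                echo "failed"
                return 1
            fi
        else
            # Any non-Failed reading resets the streak.
            consecutive_failed=0
        fi
    done
    # No sustained failure: report based on the most recent reading.
    if [[ "$last" == "Succeeded" ]]; then
        echo "succeeded"
    else
        echo "in-progress"
    fi
}

# Two Failed reads followed by a reinstall stay below a threshold of 3.
printf '%s\n' Succeeded Failed Failed Pending InstallReady Installing | csv_verdict 3
```

The threshold trades responsiveness for robustness: a genuinely failed install is only reported after N consecutive Failed reads, so N times the polling interval is added to the time it takes to detect a real failure.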

Comment 2 Jose A. Rivera 2021-06-08 15:00:57 UTC
While this seems a little odd, given that it's a transient thing I don't think it's urgent enough to take into OCS 4.8 by itself. That said, is this causing problems for the Automated Installer or can you work around it?

Comment 3 Gobinda Das 2021-06-15 08:59:42 UTC
@Jose can this be targeted for 4.9?

Comment 4 N Balachandran 2021-06-17 04:04:33 UTC
(In reply to Jose A. Rivera from comment #2)
> While this seems a little odd, given that it's a transient thing I don't
> think it's urgent enough to take into OCS 4.8 by itself. That said, is this
> causing problems for the Automated Installer or can you work around it?

They are looking at working around it for now.

Comment 5 N Balachandran 2021-07-01 04:47:26 UTC
Gobinda, can this be moved to 4.8.z, or would you consider it a blocker for AI?

Comment 6 Gobinda Das 2021-07-01 06:02:56 UTC
Hi Nithya,
It is not a blocker for 4.8.0, as installation is not broken. We can move it to 4.8.z.

Comment 7 Jose A. Rivera 2021-07-27 13:07:16 UTC
We can certainly take this in. It would probably be bad UX for 4.9, in particular.

Comment 8 Priyanka 2021-08-19 11:30:57 UTC
Fix PR: https://github.com/openshift/ocs-operator/pull/1291

Comment 10 Shay Rozen 2021-09-13 13:20:43 UTC
The OCS operator stays in the ready state when the StorageSystem is installed (with StorageCluster installation continuing in the background).
Checked in ODF 4.9.132ci. Moving to VERIFIED.

Comment 17 errata-xmlrpc 2021-12-13 17:44:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.9.0 enhancement, security, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:5086