Bug 1968606 - OCS CSV Status moves to Failed and Installs again when a StorageCluster is created
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: ODF 4.9.0
Assignee: Jose A. Rivera
QA Contact: Shay Rozen
URL:
Whiteboard:
Depends On:
Blocks: 1979168
Reported: 2021-06-07 16:09 UTC by N Balachandran
Modified: 2023-08-09 17:00 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-13 17:44:31 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift ocs-operator pull 1291 0 None None None 2021-08-20 02:01:37 UTC
Red Hat Product Errata RHSA-2021:5086 0 None None None 2021-12-13 17:44:50 UTC

Internal Links: 1979168

Description N Balachandran 2021-06-07 16:09:24 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

Installing the OCS operator 4.7 moves the CSV status to Succeeded.
Creating the StorageCluster CR after that moves the status to Failed and then back to Installing. This happens periodically if the StorageCluster takes a long time to come up. It is particularly visible if there is an error in the StorageCluster and the operator never becomes ready.


Version of all relevant components (if applicable):


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

1

Is this issue reproducible?
Yes

Can this issue reproduce from the UI?
Should be reproducible. We just need to query the CSV periodically.


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install OCS 4.7 on OCP
2. Start the following script in a different terminal

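#!/bin/bash

# Poll the OCS operator CSV and print its phase on every iteration.
namespace="openshift-storage"
csv="ocs-operator.v4.7.0"

while true
do
    # .status.phase is empty until the CSV object exists.
    status="$(oc -n "${namespace}" get csv "${csv}" -o jsonpath='{.status.phase}')"
    if [[ "$status" == "Succeeded" ]]; then
        echo "ClusterServiceVersion (${csv}) is ready"
    else
        echo "ClusterServiceVersion is $status"
    fi
    sleep 1  # pause between polls instead of querying in a tight loop
done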

3. Create a StorageCluster.
4. Watch the output of the script.
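
When watching the output, the phase changes are easier to spot if repeated values are collapsed. The following is only a sketch and not part of the original reproduction steps: log_phase_transitions is a hypothetical helper that reads one phase per line on stdin and prints a line only when the phase changes. The "<empty>" placeholder for a blank phase is also an assumption made for readability.

```shell
#!/bin/bash
# Hypothetical helper (not from the original report): collapse a stream of
# CSV phases, one per line on stdin, into "old -> new" transition lines.
# The body runs in a subshell, so prev and phase do not leak to the caller.
log_phase_transitions() (
    prev=""
    while IFS= read -r phase; do
        # A blank line means the CSV had no phase yet (for example, NotFound).
        [ -n "$phase" ] || phase="<empty>"
        if [ "$phase" != "$prev" ]; then
            # Do not report a transition for the very first observation.
            if [ -n "$prev" ]; then
                echo "${prev} -> ${phase}"
            fi
            prev="$phase"
        fi
    done
)

# Possible usage with the same oc query as the script in step 2 (the extra
# echo supplies the newline that the jsonpath output lacks):
#   while true; do
#       oc -n openshift-storage get csv ocs-operator.v4.7.0 \
#           -o jsonpath='{.status.phase}'; echo; sleep 1
#   done | log_phase_transitions
```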


Actual results:

After Operator Installation:
ClusterServiceVersion is
Error from server (NotFound): clusterserviceversions.operators.coreos.com "ocs-operator.v4.7.0" not found
ClusterServiceVersion is
Error from server (NotFound): clusterserviceversions.operators.coreos.com "ocs-operator.v4.7.0" not found
ClusterServiceVersion is
ClusterServiceVersion is Pending
ClusterServiceVersion is Pending
ClusterServiceVersion is Pending
ClusterServiceVersion is Pending
ClusterServiceVersion is Pending
ClusterServiceVersion is Pending
ClusterServiceVersion is Pending
ClusterServiceVersion is Pending
ClusterServiceVersion is Pending
ClusterServiceVersion is Pending
ClusterServiceVersion is InstallReady
ClusterServiceVersion is Installing
ClusterServiceVersion is Installing
ClusterServiceVersion is Installing
ClusterServiceVersion is Installing
ClusterServiceVersion is Installing
ClusterServiceVersion is Installing
ClusterServiceVersion is Installing
ClusterServiceVersion (ocs-operator.v4.7.0) is ready
ClusterServiceVersion (ocs-operator.v4.7.0) is ready
ClusterServiceVersion (ocs-operator.v4.7.0) is ready

...

After Storage Cluster creation:

ClusterServiceVersion (ocs-operator.v4.7.0) is ready
ClusterServiceVersion (ocs-operator.v4.7.0) is ready
ClusterServiceVersion (ocs-operator.v4.7.0) is ready
ClusterServiceVersion (ocs-operator.v4.7.0) is ready
ClusterServiceVersion (ocs-operator.v4.7.0) is ready
ClusterServiceVersion is Failed
ClusterServiceVersion is Failed
ClusterServiceVersion is Pending
ClusterServiceVersion is Pending
ClusterServiceVersion is InstallReady
ClusterServiceVersion is Installing
ClusterServiceVersion is Installing
ClusterServiceVersion is Installing
ClusterServiceVersion is Installing





Expected results:


Additional info:

AI queries the CSV status to determine the status of the operator deployment and storage cluster creation. Depending on the value returned at the time, the status sometimes shows failed even though the Storage Cluster creation is still progressing.
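
A client that polls the CSV could tolerate this by treating Failed as final only after it has been observed several times in a row. The following is a minimal sketch of that idea, not the fix applied in ocs-operator and not the installer's actual logic: classify_csv_phases and its threshold argument are hypothetical names introduced for illustration.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not taken from any real installer):
# reads one CSV phase per line on stdin, oldest first, and prints
#   failed       if the last <threshold> observations were all "Failed"
#   ready        if the most recent observation was "Succeeded"
#   progressing  otherwise
# The body runs in a subshell, so its variables do not leak to the caller.
classify_csv_phases() (
    threshold="$1"
    last=""
    failed_run=0
    while IFS= read -r phase; do
        # Skip blank observations (the CSV did not exist at that poll).
        [ -n "$phase" ] || continue
        last="$phase"
        if [ "$phase" = "Failed" ]; then
            failed_run=$((failed_run + 1))
        else
            failed_run=0   # any other phase ends the run of failures
        fi
    done
    if [ "$failed_run" -ge "$threshold" ]; then
        echo "failed"
    elif [ "$last" = "Succeeded" ]; then
        echo "ready"
    else
        echo "progressing"
    fi
)
```

With a threshold of 3, the sequence shown under "After Storage Cluster creation" (two Failed observations followed by Pending) would be classified as progressing rather than failed. A suitable threshold depends on the polling interval and is left as an assumption here.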

Comment 2 Jose A. Rivera 2021-06-08 15:00:57 UTC
While this seems a little odd, given that it's a transient thing I don't think it's urgent enough to take into OCS 4.8 by itself. That said, is this causing problems for the Automated Installer or can you work around it?

Comment 3 Gobinda Das 2021-06-15 08:59:42 UTC
@Jose can this be targeted for 4.9?

Comment 4 N Balachandran 2021-06-17 04:04:33 UTC
(In reply to Jose A. Rivera from comment #2)
> While this seems a little odd, given that it's a transient thing I don't
> think it's urgent enough to take into OCS 4.8 by itself. That said, is this
> causing problems for the Automated Installer or can you work around it?

They are looking at working around it for now.

Comment 5 N Balachandran 2021-07-01 04:47:26 UTC
Gobinda, Can this be moved to 4.8.z or would you consider it a blocker for AI?

Comment 6 Gobinda Das 2021-07-01 06:02:56 UTC
Hi Nithya,
 It is not a blocker for 4.8.0 as installation is not broken. We can move it to 4.8.z.

Comment 7 Jose A. Rivera 2021-07-27 13:07:16 UTC
We can certainly take this in. It would probably be bad UX for 4.9, in particular.

Comment 8 Priyanka 2021-08-19 11:30:57 UTC
Fix PR: https://github.com/openshift/ocs-operator/pull/1291

Comment 10 Shay Rozen 2021-09-13 13:20:43 UTC
The OCS operator stays in the ready state when the StorageSystem is installed (with the StorageCluster installation running in the background).
Checked in ODF 4.9.132ci. Moving to verified.

Comment 17 errata-xmlrpc 2021-12-13 17:44:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.9.0 enhancement, security, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:5086

