Bug 1851328 - [AWS/VSPHERE]: ocs-operator.v4.5.0-463 is in Pending state in Latest OCP nightly builds
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: OCS 4.5.0
Assignee: umanga
QA Contact: Vijay Avuthu
URL:
Whiteboard:
Duplicates: 1852607
Depends On: 1852865 1853022
Blocks:
 
Reported: 2020-06-26 07:29 UTC by Vijay Avuthu
Modified: 2020-09-15 10:18 UTC
CC: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1852865
Environment:
Last Closed: 2020-09-15 10:17:53 UTC
Embargoed:


Attachments: None


Links
  GitHub openshift/ocs-operator pull 613 (closed): Fix problematic permissions (last updated 2020-10-08 13:10:02 UTC)
  GitHub openshift/ocs-operator pull 617 (closed): Bug 1851328: [release-4.5] Fix problematic permissions (last updated 2020-10-08 13:10:02 UTC)
  Red Hat Product Errata RHBA-2020:3754 (last updated 2020-09-15 10:18:21 UTC)

Description Vijay Avuthu 2020-06-26 07:29:54 UTC
Description of problem:

ocs-operator.v4.5.0-463 is in Pending state

Version-Release number of selected component (if applicable):

openshift installer (4.5.0-0.nightly-2020-06-26-023641)
ocs-operator.v4.5.0-463

How reproducible:
1/1

Steps to Reproduce:
1. Install OCS 4.5 using ocs-ci
2. Verify whether the operator CSV reaches the Succeeded phase (see the polling sketch after this list)
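
For step 2, a minimal polling sketch of how the check could be done by hand (this is an assumption for illustration, not the actual ocs-ci implementation; the CSV name and the 720-second timeout come from the logs in this bug):

# Poll the CSV phase for up to 720 seconds (72 x 10 s), mirroring step 2 above.
for i in $(seq 1 72); do
  phase=$(oc -n openshift-storage get csv ocs-operator.v4.5.0-463.ci \
            -o jsonpath='{.status.phase}')
  [ "$phase" = "Succeeded" ] && break
  sleep 10
done
echo "final phase: ${phase}"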


Actual results:

$ oc -n openshift-storage get csv
NAME                         DISPLAY                       VERSION        REPLACES              PHASE
awss3operator.1.0.1          AWS S3 Operator               1.0.1          awss3operator.1.0.0   Succeeded
ocs-operator.v4.5.0-463.ci   OpenShift Container Storage   4.5.0-463.ci                         Pending
[vavuthu@localhost rem]$ 


Expected results:

The operator CSV should be in the Succeeded phase.

Additional info:

Jenkins Job: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/9166/consoleFull

Must gather: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-ai3c33-t1/jnk-ai3c33-t1_20200626T052528/logs/failed_testcase_ocs_logs_1593149791/deployment_ocs_logs/

Comment 3 Vijay Avuthu 2020-06-26 08:00:46 UTC
Status conditions from the describe output of the operator CSV:

> $ oc describe csv ocs-operator.v4.5.0-463.ci

status:
  Conditions:
    Last Transition Time:  2020-06-26T06:06:25Z
    Last Update Time:      2020-06-26T06:06:25Z
    Message:               requirements not yet checked
    Phase:                 Pending
    Reason:                RequirementsUnknown
    Last Transition Time:  2020-06-26T06:06:25Z
    Last Update Time:      2020-06-26T06:06:26Z
    Message:               one or more requirements couldn't be found
    Phase:                 Pending
    Reason:                RequirementsNotMet
  Last Transition Time:    2020-06-26T06:06:25Z
  Last Update Time:        2020-06-26T06:06:26Z
  Message:                 one or more requirements couldn't be found
  Phase:                   Pending
  Reason:                  RequirementsNotMet

Requirement Status:
    Group:    apiextensions.k8s.io
    Kind:     CustomResourceDefinition
    Message:  CRD is present and Established condition is true
    Name:     backingstores.noobaa.io
    Status:   Present
    Uuid:     e89cbf42-75c4-419c-ad8a-505e384b91e9
    Version:  v1


Dependents:
      Group:    rbac.authorization.k8s.io
      Kind:     PolicyRule
      Message:  namespaced rule:{"verbs":["get","list","watch"],"apiGroups":[""],"resources":["services","endpoints","pods"]}
      Status:   NotSatisfied
      Version:  v1

Group:      
    Kind:       ServiceAccount
    Message:    Policy rule not satisfied for service account
    Name:       noobaa-metrics
    Status:     PresentNotSatisfied
    Version:    v1
    Dependents:
      Group:    rbac.authorization.k8s.io
      Kind:     PolicyRule
      Message:  namespaced rule:{"verbs":["get","watch","list","delete","update","create"],"apiGroups":[""],"resources":["endpoints"]}
      Status:   Satisfied
      Version:  v1
      Group:    rbac.authorization.k8s.io
      Kind:     PolicyRule
      Message:  namespaced rule:{"verbs":["get","list","create","delete"],"apiGroups":[""],"resources":["configmaps"]}
      Status:   Satisfied
      Version:  v1
      Group:    rbac.authorization.k8s.io
      Kind:     PolicyRule
      Message:  namespaced rule:{"verbs":["get","watch","list","delete","update","create"],"apiGroups":["coordination.k8s.io"],"resources":["leases"]}
      Status:   Satisfied
      Version:  v1
      Group:    rbac.authorization.k8s.io
      Kind:     PolicyRule
      Message:  cluster rule:{"verbs":["get","list"],"apiGroups":[""],"resources":["secrets"]}
      Status:   Satisfied
      Version:  v1
      Group:    rbac.authorization.k8s.io
      Kind:     PolicyRule
      Message:  cluster rule:{"verbs":["get","list","watch","create","delete","update","patch"],"apiGroups":[""],"resources":["persistentvolumes"]}
      Status:   Satisfied
      Version:  v1
      Group:    rbac.authorization.k8s.io
      Kind:     PolicyRule
      Message:  cluster rule:{"verbs":["get","list","watch","update"],"apiGroups":[""],"resources":["persistentvolumeclaims"]}
      Status:   Satisfied
      Version:  v1
      Group:    rbac.authorization.k8s.io
      Kind:     PolicyRule
      Message:  cluster rule:{"verbs":["get","list","watch"],"apiGroups":["storage.k8s.io"],"resources":["storageclasses"]}
      Status:   Satisfied
      Version:  v1
      Group:    rbac.authorization.k8s.io
      Kind:     PolicyRule
      Message:  cluster rule:{"verbs":["list","watch","create","update","patch"],"apiGroups":[""],"resources":["events"]}
      Status:   Satisfied
      Version:  v1
      Group:    rbac.authorization.k8s.io
      Kind:     PolicyRule
      Message:  cluster rule:{"verbs":["get","list","watch","update","patch"],"apiGroups":["storage.k8s.io"],"resources":["volumeattachments"]}
      Status:   Satisfied
      Version:  v1
      Group:    rbac.authorization.k8s.io
      Kind:     PolicyRule
      Message:  cluster rule:{"verbs":["get","list","watch"],"apiGroups":[""],"resources":["nodes"]}
      Status:   Satisfied
      Version:  v1
      Group:    rbac.authorization.k8s.io
      Kind:     PolicyRule
      Message:  cluster rule:{"verbs":["update","patch"],"apiGroups":[""],"resources":["persistentvolumeclaims/status"]}
      Status:   Satisfied
      Version:  v1



Events:
  Type    Reason               Age                From                        Message
  ----    ------               ----               ----                        -------
  Normal  RequirementsUnknown  98m (x2 over 98m)  operator-lifecycle-manager  requirements not yet checked
  Normal  RequirementsNotMet   98m                operator-lifecycle-manager  one or more requirements couldn't be found
$
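
One way to pull out only the entries OLM reports as unsatisfied, instead of scanning the full describe output above (a sketch that assumes jq is available on the client; status.requirementStatus is the field behind the "Requirement Status" section):

$ oc -n openshift-storage get csv ocs-operator.v4.5.0-463.ci -o json \
    | jq '.status.requirementStatus[]
          | select(.status != "Present" and .status != "Satisfied")
          | {kind, name, status, message}'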

Comment 6 Petr Balogh 2020-06-29 14:00:14 UTC
Build 466 still has the same issue: https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/ocs-ci/480/console

Comment 8 Jose A. Rivera 2020-06-30 14:27:08 UTC
Looking through this, I can't find anything meaningful about what's going on. The only logs I'm finding are these lines in the olm-operator logs:

2020-06-30T02:34:03.193147487Z time="2020-06-30T02:34:03Z" level=info msg="csv in operatorgroup" csv=ocs-operator.v4.5.0-467.ci id=brBbF namespace=openshift-storage opgroup=openshift-storage-operatorgroup phase=Pending
2020-06-30T02:34:04.624762784Z time="2020-06-30T02:34:04Z" level=info msg="requirements were not met" csv=ocs-operator.v4.5.0-467.ci id=brBbF namespace=openshift-storage phase=Pending
2020-06-30T02:34:04.69350063Z E0630 02:34:04.693431       1 queueinformer_operator.go:290] sync {"update" "openshift-storage/ocs-operator.v4.5.0-467.ci"} failed: requirements were not met

Can we try running the last-known-good build again, just to make sure it still works? If not, then this may be a problem with OCP. Can we try deploying OCS 4.5 on OCP 4.4 as well?

Nonetheless, this looks like a genuine problem and should be taken care of, so giving devel_ack+.
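
For reference, the olm-operator lines quoted above can be pulled straight from the cluster with something like the following (a sketch assuming the default OLM namespace and deployment name):

$ oc -n openshift-operator-lifecycle-manager logs deployment/olm-operator \
    | grep 'ocs-operator.v4.5.0'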

Comment 11 Ben Eli 2020-06-30 14:31:31 UTC
As noted in comment #5, even older versions that are known to work don't work anymore.
I didn't test any other version than 462.

However, I agree that this might point to the problem being in OCP rather than OCS (or somewhere in the middle).
If the issue was entirely in OCS, 462 should have still been deployable.

Comment 12 Coady LaCroix 2020-06-30 21:01:03 UTC
*** Bug 1852607 has been marked as a duplicate of this bug. ***

Comment 13 umanga 2020-07-01 10:04:34 UTC
Looking at the CSV and all other resources, it seems like the requirements were met, but for some reason the CSV does not reflect that and stays Pending.
This seems to be an issue in OLM dependency resolution.

I also noticed that the OCS CSV is constantly getting refreshed, which could be the reason the CSV is not getting updated correctly.
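
The constant refresh can be observed by watching the CSV and printing its resourceVersion alongside the phase; a sketch (assuming the custom-columns output format is supported by the oc build in use):

$ oc -n openshift-storage get csv ocs-operator.v4.5.0-463.ci -w \
    -o custom-columns=NAME:.metadata.name,RV:.metadata.resourceVersion,PHASE:.status.phase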

Comment 14 Petr Balogh 2020-07-01 10:56:41 UTC
Trying to run verification of OCS 4.5 with OCP 4.4 here: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/9353/

Comment 16 Jose A. Rivera 2020-07-01 13:38:45 UTC
So it sounds like this is an OCP-level bug. Cloning this into OCP.

Comment 18 Petr Balogh 2020-07-01 15:04:59 UTC
Please let us know if someone is looking at the cluster and how long you need it; otherwise we will destroy it in a few hours.

Comment 21 Petr Balogh 2020-07-07 09:34:43 UTC
From what I remember and tested:
        | OCP 4.4 | OCP 4.5
OCS 4.4 |  works  | works
OCS 4.5 |  broken | broken

Comment 22 Michael Adam 2020-07-07 10:12:58 UTC
(In reply to Petr Balogh from comment #21)
> From what I remember and tested:
>         | OCP 4.4 | OCP 4.5
> OCS 4.4 |  works  | works
> OCS 4.5 |  broken | broken


Ugh, seriously?!

From all our discussions, I *thought* it was:

        | OCP 4.4 |  OCP 4.5 old | OCP 4.5 new
--------+---------+--------------+--------------
OCS 4.4 | works   |  works       | broken(?)
OCS 4.5 | works   |  works       | broken


@Petr, please double check

Comment 23 Petr Balogh 2020-07-07 10:27:55 UTC
@Michael:
OCS 4.4.1 on OCP 4.4 nightly: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/9443/ 3days back - Deployment OK.

OCS 4.4.1 on OCP 4.5 nightly: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/9453/ 3days back - Deployment OK.

OCS 4.5 (4.5.0-470.ci) on OCP 4.4 nightly: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/9357/ 5days back - Deployment is failing:
14:01:16 - MainThread - ocs_ci.ocs.ocp - INFO - Resource ocs-operator.v4.5.0-470.ci is in phase: Pending!
14:01:16 - MainThread - ocs_ci.utility.utils - ERROR - (check_phase) return incorrect status after 720 second timeout

OCS 4.5 (4.5.0-470.ci) on OCP 4.5 nightly: https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/ocs-ci/482/console 6days back (engineering job) - Deployment is failing:
22:36:48 - MainThread - ocs_ci.ocs.ocp - INFO - Resource ocs-operator.v4.5.0-470.ci is in phase: Pending!
22:36:48 - MainThread - ocs_ci.utility.utils - ERROR - (check_phase) return incorrect status after 720 second timeout


I think this matches what I said in https://bugzilla.redhat.com/show_bug.cgi?id=1851328#c21. Do you want me to trigger the jobs again?

Petr

Comment 24 Neha Berry 2020-07-07 10:49:42 UTC
+1 to Petr. (In reply to Michael Adam from comment #22)
> (In reply to Petr Balogh from comment #21)
> > From what I remember and tested:
> >         | OCP 4.4 | OCP 4.5
> > OCS 4.4 |  works  | works
> > OCS 4.5 |  broken | broken
> 
> 
> Ugh, seriously?!
> 
> From all our discussions, I *thought* it was:
> 
>         | OCP 4.4 |  OCP 4.5 old | OCP 4.5 new
> --------+---------+--------------+--------------
> OCS 4.4 | works   |  works       | broken(?)
> OCS 4.5 | works   |  works       | broken
> 
> 
> @Petr, please double check

@Michael, if we look at Comment#17 and Comment#19, OCS 4.5 is broken on the latest nightlies of OCP 4.4, OCP 4.5, and OCP 4.6 too.

>> What works for OCS 4.5: OCS 4.5 on OCP 4.5 builds older than or equal to the Jun 17th nightlies

>> What works for OCS 4.4: OCP 4.4 and OCP 4.5 (even the latest nightlies)

So to summarize from Petr's comment#23
>             | OCP 4.4 new | OCP 4.5 old | OCP 4.5 new | OCP 4.6 new
> ------------+-------------+-------------+-------------+-----------------
> OCS 4.4     | works       | works       | works       | not tested (n+2)
> OCS 4.5     | broken      | works       | broken      | broken

Comment 25 umanga 2020-07-07 11:40:26 UTC
Look at https://bugzilla.redhat.com/show_bug.cgi?id=1852865#c13 for more details.

Comment 26 Michael Adam 2020-07-07 15:05:19 UTC
https://github.com/openshift/ocs-operator/pull/613
master patch merged

Comment 27 Michael Adam 2020-07-07 17:11:21 UTC
https://github.com/openshift/ocs-operator/pull/617

backport PR. Merged.

Comment 28 Michael Adam 2020-07-08 11:35:17 UTC
Contained in 4.5.0-479.ci / https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/OCS%20Build%20Pipeline%204.5/58/
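
If someone wants to double-check the rules OLM will evaluate in the rebuilt CSV, the namespaced permission set can be dumped straight from the CSV spec. A sketch, assuming jq on the client and that the 4.5.0-479.ci build mentioned above follows the usual CSV naming; the exact rules changed by PRs 613/617 are not reproduced here:

$ oc -n openshift-storage get csv ocs-operator.v4.5.0-479.ci -o json \
    | jq '.spec.install.spec.permissions[] | {serviceAccountName, rules}'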

Comment 30 Vijay Avuthu 2020-07-10 08:00:16 UTC
Verified below combinations:

1) OCP 4.4 + OCS 4.5 - vSphere

ocs-operator.v4.5.0-482.ci
openshift installer (4.4.0-0.nightly-2020-07-09-063156)

Job: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/9641/console 

2) OCP 4.5 + OCS 4.5 - vSphere

ocs-operator.v4.5.0-484.ci
openshift installer (4.5.0-0.nightly-2020-07-07-210042)


https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/9696/console


Marking as Verified.

Comment 33 errata-xmlrpc 2020-09-15 10:17:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3754

