Bug 1851328
| Summary: | [AWS/VSPHERE]: ocs-operator.v4.5.0-463 is in Pending state in Latest OCP nightly builds | |||
|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Container Storage | Reporter: | Vijay Avuthu <vavuthu> | |
| Component: | ocs-operator | Assignee: | umanga <uchapaga> | |
| Status: | CLOSED ERRATA | QA Contact: | Vijay Avuthu <vavuthu> | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 4.5 | CC: | assingh, belimele, bparees, clacroix, dmoessne, ebenahar, eparis, kramdoss, madam, nberry, ocs-bugs, pbalogh, rgeorge, sostapov, uchapaga | |
| Target Milestone: | --- | Keywords: | Automation, Regression, TestBlocker | |
| Target Release: | OCS 4.5.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1852865 (view as bug list) | Environment: | ||
| Last Closed: | 2020-09-15 10:17:53 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1852865, 1853022 | |||
| Bug Blocks: | ||||
|
Description
Vijay Avuthu
2020-06-26 07:29:54 UTC
Status conditions reported by oc describe for the operator CSV:
> $ oc describe csv ocs-operator.v4.5.0-463.ci
status:
Conditions:
Last Transition Time: 2020-06-26T06:06:25Z
Last Update Time: 2020-06-26T06:06:25Z
Message: requirements not yet checked
Phase: Pending
Reason: RequirementsUnknown
Last Transition Time: 2020-06-26T06:06:25Z
Last Update Time: 2020-06-26T06:06:26Z
Message: one or more requirements couldn't be found
Phase: Pending
Reason: RequirementsNotMet
Last Transition Time: 2020-06-26T06:06:25Z
Last Update Time: 2020-06-26T06:06:26Z
Message: one or more requirements couldn't be found
Phase: Pending
Reason: RequirementsNotMet
Requirement Status:
Group: apiextensions.k8s.io
Kind: CustomResourceDefinition
Message: CRD is present and Established condition is true
Name: backingstores.noobaa.io
Status: Present
Uuid: e89cbf42-75c4-419c-ad8a-505e384b91e9
Version: v1
Dependents:
Group: rbac.authorization.k8s.io
Kind: PolicyRule
Message: namespaced rule:{"verbs":["get","list","watch"],"apiGroups":[""],"resources":["services","endpoints","pods"]}
Status: NotSatisfied
Version: v1
Group:
Kind: ServiceAccount
Message: Policy rule not satisfied for service account
Name: noobaa-metrics
Status: PresentNotSatisfied
Version: v1
Dependents:
Group: rbac.authorization.k8s.io
Kind: PolicyRule
Message: namespaced rule:{"verbs":["get","watch","list","delete","update","create"],"apiGroups":[""],"resources":["endpoints"]}
Status: Satisfied
Version: v1
Group: rbac.authorization.k8s.io
Kind: PolicyRule
Message: namespaced rule:{"verbs":["get","list","create","delete"],"apiGroups":[""],"resources":["configmaps"]}
Status: Satisfied
Version: v1
Group: rbac.authorization.k8s.io
Kind: PolicyRule
Message: namespaced rule:{"verbs":["get","watch","list","delete","update","create"],"apiGroups":["coordination.k8s.io"],"resources":["leases"]}
Status: Satisfied
Version: v1
Group: rbac.authorization.k8s.io
Kind: PolicyRule
Message: cluster rule:{"verbs":["get","list"],"apiGroups":[""],"resources":["secrets"]}
Status: Satisfied
Version: v1
Group: rbac.authorization.k8s.io
Kind: PolicyRule
Message: cluster rule:{"verbs":["get","list","watch","create","delete","update","patch"],"apiGroups":[""],"resources":["persistentvolumes"]}
Status: Satisfied
Version: v1
Group: rbac.authorization.k8s.io
Kind: PolicyRule
Message: cluster rule:{"verbs":["get","list","watch","update"],"apiGroups":[""],"resources":["persistentvolumeclaims"]}
Status: Satisfied
Version: v1
Group: rbac.authorization.k8s.io
Kind: PolicyRule
Message: cluster rule:{"verbs":["get","list","watch"],"apiGroups":["storage.k8s.io"],"resources":["storageclasses"]}
Status: Satisfied
Version: v1
Group: rbac.authorization.k8s.io
Kind: PolicyRule
Message: cluster rule:{"verbs":["list","watch","create","update","patch"],"apiGroups":[""],"resources":["events"]}
Status: Satisfied
Version: v1
Group: rbac.authorization.k8s.io
Kind: PolicyRule
Message: cluster rule:{"verbs":["get","list","watch","update","patch"],"apiGroups":["storage.k8s.io"],"resources":["volumeattachments"]}
Status: Satisfied
Version: v1
Group: rbac.authorization.k8s.io
Kind: PolicyRule
Message: cluster rule:{"verbs":["get","list","watch"],"apiGroups":[""],"resources":["nodes"]}
Status: Satisfied
Version: v1
Group: rbac.authorization.k8s.io
Kind: PolicyRule
Message: cluster rule:{"verbs":["update","patch"],"apiGroups":[""],"resources":["persistentvolumeclaims/status"]}
Status: Satisfied
Version: v1
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal RequirementsUnknown 98m (x2 over 98m) operator-lifecycle-manager requirements not yet checked
Normal RequirementsNotMet 98m operator-lifecycle-manager one or more requirements couldn't be found
$
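To pick the blocking entries out of a long requirement list like the one above, the CSV status can be filtered programmatically. Below is a minimal Python sketch, not part of the original report. It assumes the CSV object was exported as JSON (for example with oc get csv ... -o json) and that its status carries a requirementStatus list whose entries have kind, name, status, message and nested dependents fields. The helper name unmet_requirements and the SAMPLE data (trimmed from the output above) are hypothetical.

```python
# Assumption: these two statuses mean a requirement is fine; anything else
# (e.g. NotSatisfied, PresentNotSatisfied) is treated as blocking the CSV.
OK_STATUSES = {"Present", "Satisfied"}


def unmet_requirements(csv_obj):
    """Yield (kind, owner_name, status, message) for each requirement or
    dependent whose status is not in OK_STATUSES."""
    for req in csv_obj.get("status", {}).get("requirementStatus", []):
        owner = req.get("name", "")
        if req.get("status") not in OK_STATUSES:
            yield (req.get("kind"), owner, req.get("status"), req.get("message", ""))
        # Dependents (e.g. PolicyRule entries) are reported under their owner.
        for dep in req.get("dependents") or []:
            if dep.get("status") not in OK_STATUSES:
                yield (dep.get("kind"), owner, dep.get("status"), dep.get("message", ""))


# Hypothetical sample, trimmed from the describe output in this report.
SAMPLE = {
    "status": {
        "requirementStatus": [
            {
                "group": "apiextensions.k8s.io",
                "kind": "CustomResourceDefinition",
                "name": "backingstores.noobaa.io",
                "status": "Present",
                "version": "v1",
                "message": "CRD is present and Established condition is true",
            },
            {
                "group": "",
                "kind": "ServiceAccount",
                "name": "noobaa-metrics",
                "status": "PresentNotSatisfied",
                "version": "v1",
                "message": "Policy rule not satisfied for service account",
                "dependents": [
                    {
                        "group": "rbac.authorization.k8s.io",
                        "kind": "PolicyRule",
                        "status": "NotSatisfied",
                        "version": "v1",
                        "message": 'namespaced rule:{"verbs":["get","list","watch"],'
                                   '"apiGroups":[""],"resources":["services","endpoints","pods"]}',
                    }
                ],
            },
        ]
    }
}

for kind, owner, status, message in unmet_requirements(SAMPLE):
    print(f"{kind} ({owner}): {status} - {message}")
```

On the sample this reports the noobaa-metrics service account and its unsatisfied namespaced policy rule on services, endpoints and pods, which is the only entry in the output above that is not Present or Satisfied.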
Build 466 still has the same issue: https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/ocs-ci/480/console
Looking through this, I can't find anything meaningful about what's going on. The only logs I'm finding are these lines in the olm-operator logs:
2020-06-30T02:34:03.193147487Z time="2020-06-30T02:34:03Z" level=info msg="csv in operatorgroup" csv=ocs-operator.v4.5.0-467.ci id=brBbF namespace=openshift-storage opgroup=openshift-storage-operatorgroup phase=Pending
2020-06-30T02:34:04.624762784Z time="2020-06-30T02:34:04Z" level=info msg="requirements were not met" csv=ocs-operator.v4.5.0-467.ci id=brBbF namespace=openshift-storage phase=Pending
2020-06-30T02:34:04.69350063Z E0630 02:34:04.693431 1 queueinformer_operator.go:290] sync {"update" "openshift-storage/ocs-operator.v4.5.0-467.ci"} failed: requirements were not met
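The first two olm-operator lines above are key=value records, so the CSV name and phase can be extracted mechanically when scanning a larger log. Below is a minimal Python sketch, not part of the original report. It assumes each record starts with a container timestamp and that values containing spaces are double-quoted. The function name parse_olm_line is hypothetical, and the third line above (the E0630 error line) uses a different format that this sketch does not handle.

```python
import shlex


def parse_olm_line(line):
    """Parse one key=value log line into a dict.

    Tokens without '=' (such as the leading container timestamp) are ignored.
    """
    fields = {}
    # shlex.split keeps msg="two words" together and strips the quotes.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


# The two key=value records quoted in this report.
LINES = [
    '2020-06-30T02:34:03.193147487Z time="2020-06-30T02:34:03Z" level=info '
    'msg="csv in operatorgroup" csv=ocs-operator.v4.5.0-467.ci id=brBbF '
    'namespace=openshift-storage opgroup=openshift-storage-operatorgroup phase=Pending',
    '2020-06-30T02:34:04.624762784Z time="2020-06-30T02:34:04Z" level=info '
    'msg="requirements were not met" csv=ocs-operator.v4.5.0-467.ci id=brBbF '
    'namespace=openshift-storage phase=Pending',
]

for line in LINES:
    rec = parse_olm_line(line)
    print(rec["csv"], rec["phase"], "-", rec["msg"])
```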
Can we try running the last-known good build again, just to make sure it still works? If not then this may be a problem with OCP. Can we try deploying OCS 4.5 on OCP 4.4 as well?
Nonetheless, this looks like a genuine problem and should be taken care of, giving devel_ack+.
As noted in comment #5, even older versions that are known to work don't work anymore. I didn't test any version other than 462. However, I agree that this might point to the problem being in OCP rather than OCS (or somewhere in the middle). If the issue were entirely in OCS, 462 should still have been deployable.
*** Bug 1852607 has been marked as a duplicate of this bug. ***
Looking at the CSV and all other resources, it seems that the requirements were met, but for some reason the CSV is not accepting them. This seems to be an issue in OLM dependency resolution. I also noticed that the OCS CSV is constantly getting refreshed, which could be the reason the CSV is not getting updated correctly.
Trying to run verification of OCS 4.5 with OCP 4.4 here: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/9353/
So it sounds like this is an OCP-level bug. Cloning this into OCP.
Please let us know if someone is looking at the cluster and for how long you need it; otherwise we will destroy it in a few hours.
From what I remember and tested:
        | OCP 4.4 | OCP 4.5
OCS 4.4 | works   | works
OCS 4.5 | broken  | broken
(In reply to Petr Balogh from comment #21)
> From what I remember and tested:
>         | OCP 4.4 | OCP 4.5
> OCS 4.4 | works   | works
> OCS 4.5 | broken  | broken
Ugh, seriously?!
From all our discussions, I *thought* it was:
        | OCP 4.4 | OCP 4.5 old | OCP 4.5 new
--------+---------+-------------+-------------
OCS 4.4 | works   | works       | broken(?)
OCS 4.5 | works   | works       | broken
@Petr, please double check
@Michael:
OCS 4.4.1 on OCP 4.4 nightly: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/9443/ 3 days back - Deployment OK.
OCS 4.4.1 on OCP 4.5 nightly: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/9453/ 3 days back - Deployment OK.
OCS 4.5 (4.5.0-470.ci) on OCP 4.4 nightly: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/9357/ 5 days back - Deployment is failing:
14:01:16 - MainThread - ocs_ci.ocs.ocp - INFO - Resource ocs-operator.v4.5.0-470.ci is in phase: Pending!
14:01:16 - MainThread - ocs_ci.utility.utils - ERROR - (check_phase) return incorrect status after 720 second timeout
OCS 4.5 (4.5.0-470.ci) on OCP 4.5 nightly: https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/ocs-ci/482/console 6 days back (engineering job) - Deployment is failing:
22:36:48 - MainThread - ocs_ci.ocs.ocp - INFO - Resource ocs-operator.v4.5.0-470.ci is in phase: Pending!
22:36:48 - MainThread - ocs_ci.utility.utils - ERROR - (check_phase) return incorrect status after 720 second timeout
I think I am seeing what I said here: https://bugzilla.redhat.com/show_bug.cgi?id=1851328#c21. Do you want me to trigger the jobs again?
Petr
+1 to Petr.
(In reply to Michael Adam from comment #22)
> (In reply to Petr Balogh from comment #21)
> > From what I remember and tested:
> >         | OCP 4.4 | OCP 4.5
> > OCS 4.4 | works   | works
> > OCS 4.5 | broken  | broken
>
> Ugh, seriously?!
>
> From all our discussions, I *thought* it was:
>
>         | OCP 4.4 | OCP 4.5 old | OCP 4.5 new
> --------+---------+-------------+-------------
> OCS 4.4 | works   | works       | broken(?)
> OCS 4.5 | works   | works       | broken
>
> @Petr, please double check
@Michael, if we look at Comment#17 and Comment#19, OCS 4.5 is broken on the latest nightlies of OCP 4.4, OCP 4.5 and OCP 4.6 too.
>> What works for OCS 4.5 -> OCS 4.5 on OCP 4.5 builds older than or equal to the Jun 17th nightlies
>> What works for OCS 4.4 -> OCP 4.4, OCP 4.5 (even latest nightlies)
So to summarize from Petr's comment#23:
>         | OCP 4.4 new | OCP 4.5 old | OCP 4.5 new | OCP 4.6 new
> --------+-------------+-------------+-------------+------------------
> OCS 4.4 | works       | works       | works       | not tested (n+2)
> OCS 4.5 | broken      | works       | broken      | broken
Look at https://bugzilla.redhat.com/show_bug.cgi?id=1852865#c13 for more details.
https://github.com/openshift/ocs-operator/pull/613 master patch merged
https://github.com/openshift/ocs-operator/pull/617 backport PR. Merged.
Contained in 4.5.0-479.ci / https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/OCS%20Build%20Pipeline%204.5/58/
Verified the combinations below:
1) OCP 4.4 + OCS 4.5 - vSphere
ocs-operator.v4.5.0-482.ci
openshift installer (4.4.0-0.nightly-2020-07-09-063156)
Job: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/9641/console
2) OCP 4.5 + OCS 4.5 - vSphere
ocs-operator.v4.5.0-484.ci
openshift installer (4.5.0-0.nightly-2020-07-07-210042)
https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/9696/console
Marking as Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:3754 |