Bug 1867024
| Summary: | [ocs-operator] operator v4.6.0-519.ci is in Installing state | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Container Storage | Reporter: | Persona non grata <nobody+410372> |
| Component: | rook | Assignee: | Travis Nielsen <tnielsen> |
| Status: | CLOSED ERRATA | QA Contact: | Elad <ebenahar> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.6 | CC: | ebenahar, madam, muagarwa, ocs-bugs, owasserm, ratamir, shan, sostapov, tnielsen, vavuthu |
| Target Milestone: | --- | Keywords: | Automation, AutomationBlocker, Regression |
| Target Release: | OCS 4.6.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-12-17 06:23:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Persona non grata
2020-08-07 07:43:35 UTC
Events for the operator:

```
$ oc describe csv ocs-operator.v4.6.0-519.ci
Events:
  Type     Reason               Age                   From                        Message
  ----     ------               ----                  ----                        -------
  Normal   RequirementsUnknown  50m (x3 over 50m)     operator-lifecycle-manager  requirements not yet checked
  Normal   RequirementsNotMet   50m (x2 over 50m)     operator-lifecycle-manager  one or more requirements couldn't be found
  Normal   InstallWaiting       50m                   operator-lifecycle-manager  installing: waiting for deployment rook-ceph-operator to become ready: Waiting for rollout to finish: 0 of 1 updated replicas are available...
  Normal   InstallSucceeded     49m (x2 over 49m)     operator-lifecycle-manager  install strategy completed with no errors
  Warning  ComponentUnhealthy   49m (x2 over 49m)     operator-lifecycle-manager  installing: waiting for deployment ocs-operator to become ready: Waiting for rollout to finish: 0 of 1 updated replicas are available...
  Normal   AllRequirementsMet   49m (x4 over 50m)     operator-lifecycle-manager  all requirements found, attempting install
  Normal   InstallSucceeded     49m (x4 over 50m)     operator-lifecycle-manager  waiting for install components to report healthy
  Normal   InstallWaiting       49m (x3 over 50m)     operator-lifecycle-manager  installing: waiting for deployment ocs-operator to become ready: Waiting for rollout to finish: 0 of 1 updated replicas are available...
  Normal   NeedsReinstall       44m (x3 over 49m)     operator-lifecycle-manager  installing: waiting for deployment ocs-operator to become ready: Waiting for rollout to finish: 0 of 1 updated replicas are available...
  Warning  InstallCheckFailed   4m18s (x17 over 44m)  operator-lifecycle-manager  install timeout
```

Pods:

```
$ oc get pods
NAME                                 READY   STATUS    RESTARTS   AGE
noobaa-operator-8bbdb49b9-jfglj      1/1     Running   0          47m
ocs-operator-577696f445-s7tl6        0/1     Running   0          47m
rook-ceph-operator-8ff886855-htz6t   1/1     Running   0          47m
```

Comment (Mudit Agarwal)

Should this be a proposed blocker for 4.5? We have hit this in 4.5

Comment

(In reply to Mudit Agarwal from comment #4)
> Should this be a proposed blocker for 4.5?
> We have hit this in 4.5

Deployment passed with OCP 4.5 + OCS 4.5 (https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/10648/consoleFull)
Deployment failed with OCP 4.5 + OCS 4.6 (eng job: https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/ocs-ci/545/consoleFull)
Deployment failed with OCP 4.6 + OCS 4.6 (https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/10647/)

From the above, we can see the issue is with OCS 4.6.

Comment

Thanks, moved it to 4.6

Comment

The StorageCluster is reporting "CephCluster not reporting status". Looking at the Rook-Ceph logs, we seem to have a problem with ServiceAccount permissions:

http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sshreeka-aws02/sshreeka-aws02_20200807T054732/logs/failed_testcase_ocs_logs_1596779805/deployment_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-052faa341918e6f7f6543c26c9dd820aee27b1139f51e772302e978652e2b2a3/ceph/namespaces/openshift-storage/pods/rook-ceph-operator-8ff886855-htz6t/rook-ceph-operator/rook-ceph-operator/logs/current.log

Travis, can you provide more insight? I can't seem to find information about ServiceAccounts in the must-gather...

Comment (Travis Nielsen)

@Jose It seems none of the RBAC was applied for the operator to have access to the CRDs. Nothing seems to be working in the operator. The OCS Operator is picking up the latest Rook v1.4.0 now, right? It smells like it could be related to the change of RBAC to remove the aggregate rules: https://github.com/rook/rook/pull/5970/commits/5b4d2c8cbc8832d40db7802bf3043fe798166131

Is there something in the CSV generation since that change?

Comment

OCS-op just merged the rebase on Rook-Ceph 1.4 today, so this issue might go away with today's build. Let's try with another build soon. Thanks.

Comment

@Shreekar Do you have the full must-gather for the failed cluster? Or a cluster that is still running with this issue? I'd like to look at the ClusterRoles and other RBAC that were generated in the 4.6 cluster.

Comment

Found the issue: the service account names in the CSV were not being properly generated since the aggregated rules were removed. https://github.com/rook/rook/pull/6046

Comment

The fix has been merged to the downstream branch; it will be picked up in the next 4.6 build. https://github.com/openshift/rook/pull/103

Comment

OCS 4.6 deployment works well (v4.6.0-97.ci)

Comment

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605
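The diagnosis above is that the CSV generator stopped emitting service account names for the permission entries once the aggregated ClusterRole rules were removed, leaving the operator with no effective RBAC. The following is a minimal, hypothetical Python sketch of the kind of sanity check that would catch such a CSV before install; the field layout (`spec.install.spec.permissions` / `clusterPermissions`) follows the OLM ClusterServiceVersion schema, but the sample entries and names below are invented for illustration and are not taken from the actual ocs-operator or Rook code:

```python
# Hypothetical sketch: flag ClusterServiceVersion permission entries
# that carry RBAC rules but no serviceAccountName, which is the
# failure mode described in this bug.

def missing_service_accounts(csv_doc):
    """Return (section, rules) pairs whose permission entry lacks a serviceAccountName."""
    install = csv_doc.get("spec", {}).get("install", {}).get("spec", {})
    bad = []
    for section in ("permissions", "clusterPermissions"):
        for perm in install.get(section, []):
            if not perm.get("serviceAccountName"):
                bad.append((section, perm.get("rules", [])))
    return bad

# An illustrative broken CSV fragment: rules present, account name empty.
broken_csv = {
    "spec": {"install": {"spec": {"clusterPermissions": [
        {"serviceAccountName": "",  # should name a real SA, e.g. "rook-ceph-system" (assumed name)
         "rules": [{"apiGroups": ["ceph.rook.io"],
                    "resources": ["cephclusters"],
                    "verbs": ["get", "list", "watch"]}]},
    ]}}},
}

print(len(missing_service_accounts(broken_csv)))  # reports one bad entry
```

A check along these lines would have turned the silent "operator Running but doing nothing" symptom into an explicit generation-time failure.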