Bug 2002852
| Summary: | ocs-operator update from v4.7.2 to v4.7.3 is in Installing state | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Alvaro Soto <asoto> |
| Component: | rook | Assignee: | Sébastien Han <shan> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Raz Tamir <ratamir> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.7 | CC: | blaine, ccharron, dansmall, hchiramm, hnallurv, jpinto, kelwhite, madam, mrajanna, ocs-bugs, odf-bz-bot, rar, shan, sostapov, tdesala, tnielsen |
| Target Milestone: | --- | Flags: | asoto: needinfo-, asoto: needinfo-, jrivera: needinfo? |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-10-15 06:36:15 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
This seems similar to bz #1867024 (different version), but I'd like confirmation.

Hello there! Any update on this issue?

Please attach a must-gather from the cluster. This issue does not have enough details to troubleshoot.

The issue looks related to the CSI driver, moving to csi to take a look.

None of the must-gather links work, how can we access the logs?

Got the logs from Gabriel. Thanks.

OK, after looking, I couldn't find anything wrong with our configuration:

* the Service Accounts are declared in the CSV; however, I couldn't verify they were actually created (it would be good to validate that), but since the cluster was working in the previous version, we can assume they are still present
* the SCC for ceph-csi points to the correct service accounts
* the ceph-csi resources are configured with the correct service account

What's really strange is that both noobaa and rook-ceph have the right annotation for their respective SCC. The only thing I can think of is that if the SA does not exist, the admission controller would default the SCC to the default "restricted". Honestly, at this point, it would be good to get input from the OCP team to see if anything changed. Also, do you know why this was not caught by QE, and do we have other customers' cases, or is it only this customer?

Can someone from support try the following (see the command sketch after this thread):

* edit the rook-ceph-csi SCC and remove the rook-csi-rbd-attacher-sa service account from the users list
* remove all the ceph-csi resources (RBD/CephFS plugin and provisioner)
* restart the rook-ceph operator

At this point, observe the newly created ceph-csi resources and look up the SCC they use in the annotation. Thanks.

Hey Sébastien, we followed the instructions, but the pods continue using the privileged SCC. Talking to the customer, we found this cluster has the auto-update flag enabled, so at the moment ocs-operator is on version 4.7.4. New must-gathers are in the case. Cheers!

Hi Alvaro, waiting for the OCS must-gather now. Thanks.

Hello Sébastien, the customer decided to start all over again by deleting the cluster; what will be the next steps here? Cheers!

Removing needinfo since we are waiting to see if more details or a reproducer become available.

The customer agreed to close the BZ. A new installation was done; no more logs or environment are available to troubleshoot further.
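For reference, a minimal sketch of collecting the must-gather requested above; the image tag is an assumption based on the 4.7 release named in this bug:

```
# Collect the OCS must-gather (image tag assumed to match the 4.7 release).
oc adm must-gather --image=registry.redhat.io/ocs4/ocs-must-gather-rhel8:v4.7

# A plain OpenShift must-gather can be collected alongside it.
oc adm must-gather
```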
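A minimal sketch of the verification and remediation steps described in the thread, assuming the openshift-storage namespace (the OCS default) and Rook's usual labels for its CSI pods; the labels are assumptions, only the SCC and service account names come from the comments:

```
# Show which SCC each pod was admitted under (openshift.io/scc annotation).
oc get pods -n openshift-storage \
  -o custom-columns='NAME:.metadata.name,SCC:.metadata.annotations.openshift\.io/scc'

# Edit the SCC and remove rook-csi-rbd-attacher-sa from the .users list.
oc edit scc rook-ceph-csi

# Delete the ceph-csi resources so the operator recreates them
# (labels assume Rook's defaults for the plugin and provisioner pods).
oc delete deployment,daemonset -n openshift-storage \
  -l 'app in (csi-rbdplugin,csi-rbdplugin-provisioner,csi-cephfsplugin,csi-cephfsplugin-provisioner)'

# Restart the rook-ceph operator to trigger reconciliation.
oc rollout restart deployment/rook-ceph-operator -n openshift-storage
```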
Description of problem (please be detailed as possible and provide log snippets):

```
$ omg get csv
NAME                 DISPLAY                      VERSION  REPLACES             PHASE
ocs-operator.v4.7.3  OpenShift Container Storage  4.7.3    ocs-operator.v4.7.2  Installing
```

```yaml
lastTransitionTime: '2021-09-03T17:10:07Z'
lastUpdateTime: '2021-09-03T17:10:07Z'
message: 'installing: waiting for deployment ocs-operator to become ready:
  Waiting for rollout to finish: 0 of 1 updated replicas are available...'
phase: Installing
reason: InstallWaiting
```

Version of all relevant components (if applicable):

OCP 4.7.5

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Ceph storage unusable.

Is there any workaround available to the best of your knowledge?

No.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
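For a CSV stuck in Installing as shown above, a short triage sketch; the namespace is assumed to be openshift-storage, and the name=ocs-operator label is an assumption about how the operator pod is labeled:

```
# Inspect the CSV's install conditions for the underlying error.
oc get csv ocs-operator.v4.7.3 -n openshift-storage -o yaml

# See why the ocs-operator deployment rollout never completes.
oc rollout status deployment/ocs-operator -n openshift-storage
oc describe deployment/ocs-operator -n openshift-storage

# Check the pending/crashing operator pod and its events (label assumed).
oc get pods -n openshift-storage -l name=ocs-operator
oc describe pods -n openshift-storage -l name=ocs-operator
```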