Bug 2188427
Summary: | [External mode upgrade]: Upgrade from 4.12 -> 4.13 external mode is failing because rook-ceph-operator is not reaching clean state | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | shylesh <shmohan>
Component: | ocs-operator | Assignee: | Malay Kumar parida <mparida>
Status: | CLOSED ERRATA | QA Contact: | Petr Balogh <pbalogh>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 4.13 | CC: | mparida, muagarwa, nberry, nigoyal, ocs-bugs, odf-bz-bot, pbalogh, vavuthu
Target Milestone: | --- | |
Target Release: | ODF 4.13.0 | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | 4.13.0-181 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2023-06-21 15:25:08 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
shylesh
2023-04-20 17:39:55 UTC
The ocs-operator-config ConfigMap is only updated when the desired value of a key differs from its current value. During an external mode upgrade from 4.12 to 4.13, the operator computes the desired value by calling the function getCephFSKernelMountOptions, which returns a blank value, and compares it with the current value of the key. Because the key does not yet exist in the ConfigMap, the lookup also returns a blank value, so the operator concludes the ConfigMap is already in the correct state and skips adding the key. https://github.com/red-hat-storage/ocs-operator/blob/1c6879a5455aa669d8b984973cdcffd626c6eb32/controllers/storagecluster/ocs_operator_config.go#L58

A solution is to have getCephFSKernelMountOptions return ms_mode=legacy instead of a blank value, which is functionally equivalent but avoids this issue. https://github.com/red-hat-storage/ocs-operator/blob/1c6879a5455aa669d8b984973cdcffd626c6eb32/controllers/storagecluster/ocs_operator_config.go#L125 Will raise a fix accordingly. (A minimal sketch of this compare-and-update behaviour follows the comment history below.)

Hi @pbalogh, initially the bug was reported because the rook-ceph-operator pod was not reaching the Running state after the upgrade. I checked the must-gather, and the earlier issue of the rook-ceph-operator pod not starting is no longer present; in fact, all the pods are clean and running. I checked the CSVs, and all of them have succeeded. I also checked the StorageCluster, which is Ready, and the CephCluster, which is Connected. Looking at the job details, I see a few other problems, such as "AssertionError: Job mcg-workload doesn't have any running pod" and a few other errors. Can you try rerunning the job? This failure looks like something other than the original issue. Can you rerun the job with a pause so that I can take a look inside the cluster to see what is wrong?

Any updates?

Moving the discussion to Google Chat for faster resolution. Thread link: https://chat.google.com/room/AAAAREGEba8/vgRySkM1nXE

Upgrade has passed OK again.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3742
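As a reference for the discussion above, here is a minimal, self-contained Go sketch of the compare-and-update pattern that masks the missing key. Only the function name getCephFSKernelMountOptions and the value ms_mode=legacy come from the report; the ConfigMap key name and the rest of the scaffolding are illustrative assumptions, not the actual ocs-operator code.

```go
package main

import "fmt"

// mountOptionsKey is an assumed key name for illustration only; it is not
// necessarily the key used by the real ocs-operator-config ConfigMap.
const mountOptionsKey = "CSI_CEPHFS_KERNEL_MOUNT_OPTIONS"

// Before the fix: an external mode cluster yields a blank desired value.
func getCephFSKernelMountOptionsBefore() string { return "" }

// After the fix: return an explicit, functionally equivalent default so the
// desired value can never match the blank value read from a missing key.
func getCephFSKernelMountOptionsAfter() string { return "ms_mode=legacy" }

// updateIfChanged mimics the "only update when desired != current" logic;
// a missing map key reads back as "", which is what hides the bug.
func updateIfChanged(data map[string]string, desired string) bool {
	if data[mountOptionsKey] == desired {
		return false // looks already correct, so the key is never added
	}
	data[mountOptionsKey] = desired
	return true
}

func main() {
	cm := map[string]string{} // key absent, as right after a 4.12 -> 4.13 upgrade

	fmt.Println("before fix, updated:", updateIfChanged(cm, getCephFSKernelMountOptionsBefore())) // false
	fmt.Println("after fix, updated:", updateIfChanged(cm, getCephFSKernelMountOptionsAfter()))   // true
}
```

With the blank return value the reconcile reports nothing to do and the key is never written; returning an explicit default makes the comparison fail on the first pass, so the key gets added to the ConfigMap.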