Description of problem (please be as detailed as possible and provide log snippets):

From the events you can see:

ook-ceph@sha256:e4e20a1e8756a8b9847def42a60aa117d8ab5633c6eaec3f8013132c2800c72c" already present on machine
openshift-storage 27s Normal Started pod/rook-ceph-detect-version-rn4x6 Started container init-copy-binaries
openshift-storage 27s Normal Created pod/rook-ceph-detect-version-rn4x6 Created container init-copy-binaries
openshift-storage 26s Normal Started pod/rook-ceph-detect-version-rn4x6 Started container cmd-reporter
openshift-storage 26s Normal Created pod/rook-ceph-detect-version-rn4x6 Created container cmd-reporter
openshift-storage 26s Normal Pulled pod/rook-ceph-detect-version-rn4x6 Container image "quay.io/rhceph-dev/rhceph@sha256:2aca817ad21c8b204d8fdee03a0cfee6e2cc7a177b0b25b46d4fabb9c3f099b3" already present on machine
openshift-storage <unknown> Normal Scheduled pod/rook-ceph-detect-version-wctf6 Successfully assigned openshift-storage/rook-ceph-detect-version-wctf6 to ip-10-0-129-52.us-east-2.compute.internal
openshift-storage 21s Normal SuccessfulCreate job/rook-ceph-detect-version Created pod: rook-ceph-detect-version-wctf6
openshift-storage 19s Normal Created pod/rook-ceph-detect-version-wctf6 Created container init-copy-binaries
openshift-storage 19s Normal Pulled pod/rook-ceph-detect-version-wctf6 Container image "quay.io/rhceph-dev/rook-ceph@sha256:e4e20a1e8756a8b9847def42a60aa117d8ab5633c6eaec3f8013132c2800c72c" already present on machine
openshift-storage 19s Normal Started pod/rook-ceph-detect-version-wctf6 Started container init-copy-binaries
openshift-storage 18s Normal Created pod/rook-ceph-detect-version-wctf6 Created container cmd-reporter
openshift-storage 18s Normal Pulled pod/rook-ceph-detect-version-wctf6 Container image "quay.io/rhceph-dev/rhceph@sha256:2aca817ad21c8b204d8fdee03a0cfee6e2cc7a177b0b25b46d4fabb9c3f099b3" already present on machine
openshift-storage 18s Normal Started pod/rook-ceph-detect-version-wctf6 Started container cmd-reporter
openshift-storage 15s Normal SuccessfulCreate job/rook-ceph-detect-version Created pod: rook-ceph-detect-version-jgzsc
openshift-storage <unknown> Normal Scheduled pod/rook-ceph-detect-version-jgzsc Successfully assigned openshift-storage/rook-ceph-detect-version-jgzsc to ip-10-0-129-52.us-east-2.compute.internal
openshift-storage 13s Normal Pulled pod/rook-ceph-detect-version-jgzsc Container image "quay.io/rhceph-dev/rook-ceph@sha256:e4e20a1e8756a8b9847def42a60aa117d8ab5633c6eaec3f8013132c2800c72c" already present on machine
openshift-storage 13s Normal Created pod/rook-ceph-detect-version-jgzsc Created container init-copy-binaries
openshift-storage 13s Normal Started pod/rook-ceph-detect-version-jgzsc Started container init-copy-binaries
openshift-storage 12s Normal Pulled pod/rook-ceph-detect-version-jgzsc Container image "quay.io/rhceph-dev/rhceph@sha256:2aca817ad21c8b204d8fdee03a0cfee6e2cc7a177b0b25b46d4fabb9c3f099b3" already present on machine
openshift-storage 12s Normal Created pod/rook-ceph-detect-version-jgzsc Created container cmd-reporter
openshift-storage 12s Normal Started pod/rook-ceph-detect-version-jgzsc Started container cmd-reporter
openshift-storage <unknown> Normal Scheduled pod/rook-ceph-detect-version-qxhls Successfully assigned openshift-storage/rook-ceph-detect-version-qxhls to ip-10-0-129-52.us-east-2.compute.internal
openshift-storage 9s Normal SuccessfulCreate job/rook-ceph-detect-version Created pod: rook-ceph-detect-version-qxhls
openshift-storage 7s Normal Pulled pod/rook-ceph-detect-version-qxhls Container image "quay.io/rhceph-dev/rook-ceph@sha256:e4e20a1e8756a8b9847def42a60aa117d8ab5633c6eaec3f8013132c2800c72c" already present on machine
openshift-storage 7s Normal Started pod/rook-ceph-detect-version-qxhls Started container init-copy-binaries
openshift-storage 7s Normal Created pod/rook-ceph-detect-version-qxhls Created container init-copy-binaries
openshift-storage 6s Normal Created pod/rook-ceph-detect-version-qxhls Created container cmd-reporter
openshift-storage 6s Normal Pulled pod/rook-ceph-detect-version-qxhls Container image "quay.io/rhceph-dev/rhceph@sha256:2aca817ad21c8b204d8fdee03a0cfee6e2cc7a177b0b25b46d4fabb9c3f099b3" already present on machine
openshift-storage 6s Normal Started pod/rook-ceph-detect-version-qxhls Started container cmd-reporter
openshift-storage <unknown> Normal Scheduled pod/rook-ceph-detect-version-p65nt Successfully assigned openshift-storage/rook-ceph-detect-version-p65nt to ip-10-0-129-52.us-east-2.compute.internal
openshift-storage 3s Normal SuccessfulCreate job/rook-ceph-detect-version Created pod: rook-ceph-detect-version-p65nt
openshift-storage 1s Normal Started pod/rook-ceph-detect-version-p65nt Started container init-copy-binaries
openshift-storage 1s Normal Pulled pod/rook-ceph-detect-version-p65nt Container image "quay.io/rhceph-dev/rook-ceph@sha256:e4e20a1e8756a8b9847def42a60aa117d8ab5633c6eaec3f8013132c2800c72c" already present on machine
openshift-storage 1s Normal Created pod/rook-ceph-detect-version-p65nt Created container init-copy-binaries

$ oc get pod -n openshift-storage
NAME                                  READY   STATUS     RESTARTS   AGE
noobaa-operator-b4ff6749d-fvphd       1/1     Running    0          33m
ocs-operator-6b9cbfb878-w7c5x         0/1     Running    0          33m
rook-ceph-detect-version-fprzc        0/1     Init:0/1   0          0s
rook-ceph-operator-75b8479457-cm72h   1/1     Running    0          33m

$ oc get pod -n openshift-storage
NAME                                  READY   STATUS        RESTARTS   AGE
noobaa-operator-b4ff6749d-fvphd       1/1     Running       0          32m
ocs-operator-6b9cbfb878-w7c5x         0/1     Running       0          32m
rook-ceph-detect-version-hkllm        0/1     Terminating   0          8s
rook-ceph-operator-75b8479457-cm72h   1/1     Running       0          32m

Operator logs:

{"level":"info","ts":"2020-05-11T12:52:02.179Z","logger":"controller_storagecluster","msg":"Waiting on ceph cluster to initialize before starting noobaa","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"error","ts":"2020-05-11T12:52:02.184Z","logger":"controller_storagecluster","msg":"Failed to set PhaseProgressing","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","error":"Operation cannot be fulfilled on storageclusters.ocs.openshift.io \"ocs-storagecluster\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/go-logr/zapr/zapr.go:128\ngithub.com/openshift/ocs-operator/pkg/controller/storagecluster.(*ReconcileStorageCluster).Reconcile\n\t/go/src/github.com/openshift/ocs-operator/pkg/controller/storagecluster/reconcile.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:216\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:192\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:171\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker-fm\n\t/go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:157\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
{"level":"error","ts":"2020-05-11T12:52:02.197Z","logger":"controller_storagecluster","msg":"Failed to set PhaseProgressing","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","error":"Operation cannot be fulfilled on storageclusters.ocs.openshift.io \"ocs-storagecluster\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/go-logr/zapr/zapr.go:128\ngithub.com/openshift/ocs-operator/pkg/controller/storagecluster.(*ReconcileStorageCluster).Reconcile\n\t/go/src/github.com/openshift/ocs-operator/pkg/controller/storagecluster/reconcile.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:216\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:192\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:171\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker-fm\n\t/go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:157\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
{"level":"error","ts":"2020-05-11T12:52:02.203Z","logger":"controller_storagecluster","msg":"Failed to update status","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","error":"Operation cannot be fulfilled on storageclusters.ocs.openshift.io \"ocs-storagecluster\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/go-logr/zapr/zapr.go:128\ngithub.com/openshift/ocs-operator/pkg/controller/storagecluster.(*ReconcileStorageCluster).Reconcile\n\t/go/src/github.com/openshift/ocs-operator/pkg/controller/storagecluster/reconcile.go:332\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:216\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:192\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:171\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker-fm\n\t/go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:157\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
{"level":"error","ts":"2020-05-11T12:52:02.203Z","logger":"controller-runtime.controller","msg":"Reconciler error","controller":"storagecluster-controller","request":"openshift-storage/ocs-storagecluster","error":"Operation cannot be fulfilled on storageclusters.ocs.openshift.io \"ocs-storagecluster\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:192\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:171\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker-fm\n\t/go/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:157\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
{"level":"info","ts":"2020-05-11T12:52:03.203Z","logger":"controller_storagecluster","msg":"Reconciling StorageCluster","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-05-11T12:52:03.218Z","logger":"controller_storagecluster","msg":"not creating a CephObjectStore because the platform is AWS"}
{"level":"info","ts":"2020-05-11T12:52:03.271Z","logger":"controller_storagecluster","msg":"Waiting on ceph cluster to initialize before starting noobaa","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-05-11T12:52:03.303Z","logger":"controller_storagecluster","msg":"Reconciling StorageCluster","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":"2020-05-11T12:52:03.322Z","logger":"controller_storagecluster","msg":"not creating a CephObjectStore because the platform is AWS"} {"level":"info","ts":"2020-05-11T12:52:03.366Z","logger":"controller_storagecluster","msg":"Waiting on ceph cluster to initialize before starting noobaa","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"} W0511 13:04:10.857621 1 reflector.go:289] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:204: watch of *v1.Template ended with: The resourceVersion for the provided watch is too old. W0511 13:12:40.910782 1 reflector.go:289] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:204: watch of *v1.Template ended with: The resourceVersion for the provided watch is too old. W0511 13:25:17.991870 1 reflector.go:289] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:204: watch of *v1.Template ended with: The resourceVersion for the provided watch is too old. Version of all relevant components (if applicable): OCS 4.5 internal build over OCP 4.4 nightly build Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)? Yep, as we cannot even deploy OCS Is there any workaround available to the best of your knowledge? NO Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)? 1 Can this issue reproducible? Haven't tried yet but mostly yes. Can this issue reproduce from the UI? Haven't tried If this is a regression, please provide more details to justify this: Yes Steps to Reproduce: 1. Install OCS 4.5 from internal build Actual results: Not able to deploy Expected results: Have deployment of OCS passed Additional info: Must gather: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-pr2034-b1671/jnk-pr2034-b1671_20200511T121001/logs/failed_testcase_ocs_logs_1589199655/deployment_ocs_logs/ Jenkins run: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/7425/
Forgot to add the OCS build: 4.5.0-419.ci
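To double-check which OCS build is actually installed on a cluster, something like this should work (assumes an OLM-based install in the openshift-storage namespace):

$ oc get csv -n openshift-storage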
From the rook operator log, the CSI version is not being detected, so the CSI driver fails to load:

2020-05-11 12:51:47.996190 E | op-cluster: invalid csi version: failed to extract ceph CSI version: failed to parse version from: "quay.io/rhceph-dev/cephcsi@sha256:86087a7123945ce4f7f720539693395e5a6fc8175318d050d0d983af8ea0e216"

Later, rook fails when attempting to start the cluster because the CSI driver is not initialized:

2020-05-11 14:02:46.148402 E | op-cluster: failed to create cluster in namespace "openshift-storage". failed to start the mons: failed to initialize ceph cluster info: failed to save mons: failed to update csi cluster config: failed to fetch current csi config map: configmaps "rook-ceph-csi-config" not found

OCS needs to set the operator flag ROOK_CSI_ALLOW_UNSUPPORTED_VERSION: "true", since the CSI version cannot be detected from the downstream image. There is a fix for this in OCS: https://github.com/openshift/ocs-operator/pull/501/files, and there is a related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1832889
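For anyone who wants to confirm this diagnosis manually before the ocs-operator fix lands, a rough sketch of enabling the flag directly (assumes the rook operator picks the setting up from its deployment environment; OLM/ocs-operator may revert a manual change on the next reconcile, so this is only for verification, not a fix):

# Allow rook to load the CSI driver even though the version string cannot be parsed
$ oc set env deployment/rook-ceph-operator -n openshift-storage ROOK_CSI_ALLOW_UNSUPPORTED_VERSION=true

# Confirm the setting and check whether the CSI config map from the error above gets created
$ oc set env deployment/rook-ceph-operator -n openshift-storage --list | grep ROOK_CSI_ALLOW_UNSUPPORTED_VERSION
$ oc get configmap rook-ceph-csi-config -n openshift-storage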
I see that even 4.5.0-423.ci fails to deploy with the same issue. See the engineering job here:
https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/ocs-ci/440/console

I triggered a run on VMware here:
https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/Tier1/job/qe-trigger-vsphere-upi-1az-rhcos-vsan-3m-3w-tier1/4/console
but I expect it to fail as well.
We need to get this merged for the CSI versioning issue: https://github.com/openshift/ocs-operator/pull/501
Thanks Travis for the info. I see the latest build was made 13 hours ago: 4.5.0-425.ci, and deployment in the engineering pipeline is failing with the same error:
https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/ocs-ci/441/console

I see https://github.com/openshift/ocs-operator/pull/501 got merged 18 hours ago. Not sure whether this build was supposed to have the fix, but it looks like it's still not deployable.
@Petr The backport PR was missed. Now we need to get this one merged and run a new build. https://github.com/openshift/ocs-operator/pull/512
Still doesn't work with the latest 4.5 build, quay.io/rhceph-dev/ocs-olm-operator:4.5.0-431.ci. Our job failed here:
https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/7855/consoleFull

Logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-pr2133-b1784/jnk-pr2133-b1784_20200521T094017/logs/failed_testcase_ocs_logs_1590054189/deployment_ocs_logs/
@umanga thanks for the analysis. There are two possible changes for this:
1. The OCS operator generates the storageClassDeviceSet in the CephCluster CR with the name "data" instead of leaving it blank.
2. Rook defaults to "data" if the name is blank.

We need #2 anyway for the Rook default and backward compatibility, so I will make the change in Rook.
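A quick way to see which side needs the change on a live cluster is to inspect the device set names in the generated CephCluster CR; a sketch (spec.storage.storageClassDeviceSets is the standard CephCluster field, jsonpath quoting may need adjusting for your shell):

# Print the storageClassDeviceSet names the OCS operator generated; empty output means the name is blank and the Rook default (option 2) is needed
$ oc get cephcluster -n openshift-storage -o jsonpath='{.items[0].spec.storage.storageClassDeviceSets[*].name}'

# Or dump the surrounding spec for context
$ oc get cephcluster -n openshift-storage -o yaml | grep -A 5 storageClassDeviceSets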
Local testing looks good, now working on getting the PR merged and backported... https://github.com/rook/rook/pull/5524
The fix is merged to the downstream release-4.5 branch now and will be picked up in the next 4.5 build. https://github.com/openshift/rook/pull/60
In the latest builds we don't see this issue, so marking as verified.
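For the record, a rough sketch of the kind of checks behind "verified" (assuming a standard internal-mode deployment in openshift-storage):

# StorageCluster/CephCluster should report a healthy phase and the CSI config map should exist
$ oc get storagecluster -n openshift-storage
$ oc get cephcluster -n openshift-storage
$ oc get configmap rook-ceph-csi-config -n openshift-storage

# mon/mgr/osd and csi pods should all be Running
$ oc get pods -n openshift-storage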
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3754