Description of problem (please be as detailed as possible and provide log snippets):

StorageCluster moves to Error state after ODF + LSO installation via the UI.

Version of all relevant components (if applicable):
OpenShift version: 4.9.0-0.nightly-2021-09-10-170926
LSO Version: 4.8.0-202107291502
ODF Version: 4.9.0-132.ci

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Can this issue be reproduced?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Install OCP cluster on VMware platform:
   OpenShift version: 4.9.0-0.nightly-2021-09-10-170926
2. Install Local Storage Operator:
   LSO Version: 4.8.0-202107291502
3. Install ODF Operator:
   ODF Version: 4.9.0-132.ci
4. Add disks to worker nodes via vCenter.
5. Create StorageSystem.
6. Check StorageCluster status (a jsonpath check for the suspect field is sketched at the end of this report):

$ oc get storagecluster
NAME                 AGE     PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   3m23s   Error              2021-09-13T11:26:51Z

$ oc describe storagecluster
Status:
  Images:
  Phase:  Error
Events:
  Type     Reason            Age    From                       Message
  ----     ------            ----   ----                       -------
  Warning  FailedValidation  3m51s  controller_storagecluster  failed to validate StorageDeviceSet 0: no StorageClass specified

7. Check pod status (in the openshift-storage project):

$ oc get pods
NAME                                               READY   STATUS    RESTARTS   AGE
noobaa-operator-7859f67cbc-2crmr                   1/1     Running   0          17m
ocs-metrics-exporter-787686dbfd-mlqzw              1/1     Running   0          17m
ocs-operator-fd5fd568f-lxzft                       1/1     Running   0          17m
odf-console-75f8bb874d-k7jsp                       2/2     Running   0          17m
odf-operator-controller-manager-5c6f854875-6v72t   2/2     Running   0          17m
rook-ceph-operator-bd8ffff7c-56qpb                 1/1     Running   0          17m

8. Check PV status:

$ oc get pv
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
local-pv-33c8520a   100Gi      RWO            Delete           Available           localblock              8m47s
local-pv-bcd51894   100Gi      RWO            Delete           Available           localblock              8m47s
local-pv-f1262717   100Gi      RWO            Delete           Available           localblock              8m46s

9. Check PVC status (in the openshift-storage project):

$ oc get pvc
No resources found in openshift-storage namespace.

Actual results:
StorageCluster moves to Error state.

Expected results:
StorageCluster moves to Ready state.

Additional info:
https://docs.google.com/document/d/1Fo2qtBbYNaLzYSkw1qYbjGEBWfN44FZzz_EZ4sceWlg/edit
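One way to confirm that the FailedValidation event in step 6 comes from an empty storageClassName in the CR spec (a sketch; the field path is assumed from the StorageCluster API) is a jsonpath query such as:

$ oc -n openshift-storage get storagecluster ocs-storagecluster \
    -o jsonpath='{.spec.storageDeviceSets[0].dataPVCTemplate.spec.storageClassName}{"\n"}'

An empty result would point at the device set having been created without a backing StorageClass.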
Moving it to the OCS component.
Must-gather: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-2003651/
Also happens with internal mode. Maybe not related to LSO.

Version installed: ODF Version: 4.9.0-132.ci

$ oc get storageclusters.ocs.openshift.io
NAME                 AGE     PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   4m51s   Error              2021-09-13T12:58:52Z   4.9.0

Status:
  Conditions:
    Last Heartbeat Time:   2021-09-13T12:59:34Z
    Last Transition Time:  2021-09-13T12:58:52Z
    Message:               Error while reconciling: some StorageClasses [ocs-storagecluster-cephfs,ocs-storagecluster-ceph-rbd,ocs-storagecluster-ceph-rbd-thick] were skipped while waiting for pre-requisites to be met
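Since the reconcile message here is about StorageClasses being skipped rather than an empty storageClassName, one quick check is whether the expected classes (names taken from the message above) have been created yet:

$ oc get storageclass ocs-storagecluster-cephfs ocs-storagecluster-ceph-rbd ocs-storagecluster-ceph-rbd-thick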
Internal mode recovered after a few minutes. Removing feature_blocker and urgent. I think I'll open a separate BZ.
@pjiandan Priyanka, do you know if anyone is looking at this?
This issue was reproduced with LSO 4.9:
OCP Version: 4.9.0-0.nightly-2021-09-10-170926
ODF Version: 4.9.0-132.ci
LSO Version: 4.9.0-202109101110

For more information: https://docs.google.com/document/d/156nnw0XDoZnIHkalo5mycLEAaNH9RP6NZbiUhBlU9es/edit
This is a UI issue. The storageClassName is not populated correctly:

  storageDeviceSets:
    - config: {}
      count: 3
      dataPVCTemplate:
        metadata: {}
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: '1'
          storageClassName: ''
          volumeMode: Block

@Afreen will be looking into it.
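For reference, assuming the LSO-created StorageClass is named localblock (as in the PV listing in the description), the populated device set would be expected to look something like this; only the storageClassName differs from the broken spec above:

  storageDeviceSets:
    - config: {}
      count: 3
      dataPVCTemplate:
        metadata: {}
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: '1'
          storageClassName: 'localblock'   # should carry the LSO StorageClass name instead of ''
          volumeMode: Block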
(In reply to afrahman from comment #10)
> I looked into the issue. The issue is with the "create new sc" step.
> Until the fix is merged, the workarounds are:
>
> 1) Type out the storage class name along with the volume set in the input field.

Isn't this the default behaviour in the UI? Or are you saying that the workaround is not to use the UI at all, but to write the StorageCluster YAML file yourself and deploy it into the openshift-storage namespace?

> 2) An existing storage class option can be used if you have an lvset created already.

There is a bug which makes this no longer possible: bz 2004185
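To illustrate the "write the YAML yourself" route, a rough sketch of a minimal StorageCluster manifest that could be applied to the openshift-storage namespace follows; the localblock name, device-set name, and count are assumptions based on the environment in the description, not something exported from the UI:

apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  monDataDirHostPath: /var/lib/rook
  storageDeviceSets:
    - name: ocs-deviceset-localblock      # assumed device-set name
      count: 3                            # one OSD per 100Gi local PV
      replica: 1
      portable: false
      dataPVCTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: '1'
          storageClassName: localblock    # LSO StorageClass from the PV listing
          volumeMode: Block

Applying something along these lines with "oc apply -f storagecluster.yaml" should let ocs-operator reconcile past the FailedValidation event, since storageClassName is no longer empty.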
LSO deployment [full deployment] via UI passes on OCP 4.10.

Setup:
Provider: VMware
OCP Version: 4.10.0-0.nightly-2021-09-30-041351
ODF Version: 4.9.0-164.ci
LSO Version: 4.9.0-202109210853

Test Procedure:
1. Deploy OCP 4.10 cluster on VMware platform:
   OCP Version: 4.10.0-0.nightly-2021-09-30-041351
2. Install LSO operator:
   LSO Version: 4.9.0-202109210853

$ oc create -f https://raw.githubusercontent.com/red-hat-storage/ocs-ci/master/ocs_ci/templates/ocs-deployment/local-storage-optional-operators.yaml
imagecontentsourcepolicy.operator.openshift.io/olmcontentsourcepolicy created
catalogsource.operators.coreos.com/optional-operators created

3. Install ODF operator:
   ODF Version: 4.9.0-164.ci
4. Add disks [100G] to worker nodes via vCenter.
5. Create StorageSystem (a sketch of the LocalVolumeSet this step relies on follows below).
6. Get Ceph status:

sh-4.4$ ceph status
  cluster:
    id:     574cedec-3e55-4985-9f0b-5bc1e3eec9ec
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 8m)
    mgr: a(active, since 8m)
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 3 up (since 8m), 3 in (since 8m)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 177 pgs
    objects: 331 objects, 128 MiB
    usage:   322 MiB used, 300 GiB / 300 GiB avail
    pgs:     177 active+clean

  io:
    client: 852 B/s rd, 10 KiB/s wr, 1 op/s rd, 0 op/s wr

For more details: https://docs.google.com/document/d/19xeFCYcERckWasC2fo_cIhgBcgeq4ElGXgZTHS-onFg/edit
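For reference, a minimal sketch of the kind of LocalVolumeSet the wizard creates for step 5; the names, size filter, and node selector here are assumptions for illustration, not values taken from this run:

apiVersion: local.storage.openshift.io/v1alpha1
kind: LocalVolumeSet
metadata:
  name: localblock
  namespace: openshift-local-storage
spec:
  storageClassName: localblock          # StorageClass backing the local block PVs
  volumeMode: Block
  deviceInclusionSpec:
    deviceTypes:
      - disk
    minSize: 100Gi                      # matches the 100G disks added in step 4
  nodeSelector:
    nodeSelectorTerms:
      - matchExpressions:
          - key: cluster.ocs.openshift.io/openshift-storage
            operator: Exists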
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056