Description of problem (please be as detailed as possible and provide log snippets):

ODF 4.12 installation failed: ocs-operator.v4.12.0 and mcg-operator.v4.12.0 CSVs are in Failed phase.

Version of all relevant components (if applicable):
OCP version: 4.12.0-0.nightly-2022-09-05-090751
ODF version: 4.12.0-29
Provider: [Tested on AWS and VMware]

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Can this issue be reproduced?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:

1. Install ODF 4.12 via the UI:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-pr6408b3472/jnk-pr6408b3472_20220905T195700/logs/ui_logs_dir_1662411046/screenshots_ui/test_deployment/

2. Disable the default source redhat-operators:
$ oc patch operatorhub.config.openshift.io/cluster -p='{"spec":{"sources":[{"disabled":true,"name":"redhat-operators"}]}}' --type=merge

3. Add a CatalogSource:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  labels:
    ocs-operator-internal: 'true'
  name: redhat-operators
  namespace: openshift-marketplace
spec:
  displayName: Openshift Container Storage
  icon:
    base64data: ''
    mediatype: ''
  image: quay.io/rhceph-dev/ocs-registry:4.12.0-29
  priority: 100
  publisher: Red Hat
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 15m

$ oc apply -f /tmp/catalog_source_manifestjjp4iruz

4. Verify the CatalogSource redhat-operators is in READY state:
$ oc get CatalogSource redhat-operators -n openshift-marketplace -o yaml

5. Check the CSVs:

$ oc get csv -A
NAMESPACE                              NAME                              DISPLAY                       VERSION   REPLACES   PHASE
openshift-operator-lifecycle-manager   packageserver                     Package Server                0.19.0               Succeeded
openshift-storage                      mcg-operator.v4.12.0              NooBaa Operator               4.12.0               Failed
openshift-storage                      ocs-operator.v4.12.0              OpenShift Container Storage   4.12.0               Failed
openshift-storage                      odf-csi-addons-operator.v4.12.0   CSI Addons                    4.12.0               Succeeded
openshift-storage                      odf-operator.v4.12.0              OpenShift Data Foundation     4.12.0               Succeeded

$ oc get csv ocs-operator.v4.12.0
NAME                   DISPLAY                       VERSION   REPLACES   PHASE
ocs-operator.v4.12.0   OpenShift Container Storage   4.12.0               Failed

Events:
  Type     Reason               Age                From                        Message
  ----     ------               ----               ----                        -------
  Normal   RequirementsUnknown  46m                operator-lifecycle-manager  requirements not yet checked
  Normal   RequirementsNotMet   46m                operator-lifecycle-manager  one or more requirements couldn't be found
  Normal   InstallWaiting       46m                operator-lifecycle-manager  installing: waiting for deployment ocs-operator to become ready: deployment "ocs-operator" not available: Deployment does not have minimum availability.
  Warning  InstallCheckFailed   41m                operator-lifecycle-manager  install timeout
  Normal   NeedsReinstall       41m (x2 over 41m)  operator-lifecycle-manager  installing: waiting for deployment rook-ceph-operator to become ready: deployment "rook-ceph-operator" not available: Deployment does not have minimum availability.
  Normal   AllRequirementsMet   41m (x3 over 46m)  operator-lifecycle-manager  all requirements found, attempting install
  Normal   InstallSucceeded     41m (x3 over 46m)  operator-lifecycle-manager  waiting for install components to report healthy
  Normal   InstallWaiting       41m (x3 over 45m)  operator-lifecycle-manager  installing: waiting for deployment rook-ceph-operator to become ready: deployment "rook-ceph-operator" not available: Deployment does not have minimum availability.
  Warning  InstallCheckFailed   36m                operator-lifecycle-manager  install failed: deployment rook-ceph-operator not ready before timeout: deployment "rook-ceph-operator" exceeded its progress deadline

6. Check the StorageCluster:

$ oc get storageclusters.ocs.openshift.io
NAME                 AGE   PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   32m   Error              2022-09-05T19:32:18Z   4.11.0

Status:
  Conditions:
    Last Heartbeat Time:   2022-09-05T19:54:12Z
    Last Transition Time:  2022-09-05T19:32:19Z
    Message:               Error while reconciling: some StorageClasses [ocs-storagecluster-cephfs,ocs-storagecluster-ceph-rbd] were skipped while waiting for pre-requisites to be met
    Reason:                ReconcileFailed
    Status:                False
    Type:                  ReconcileComplete
    Last Heartbeat Time:   2022-09-05T19:32:19Z
    Last Transition Time:  2022-09-05T19:32:19Z
    Message:               Initializing StorageCluster
    Reason:                Init
    Status:                False
    Type:                  Available
    Last Heartbeat Time:   2022-09-05T19:32:19Z
    Last Transition Time:  2022-09-05T19:32:19Z
    Message:               Initializing StorageCluster
    Reason:                Init
    Status:                True
    Type:                  Progressing
    Last Heartbeat Time:   2022-09-05T19:32:19Z
    Last Transition Time:  2022-09-05T19:32:19Z
    Message:               Initializing StorageCluster
    Reason:                Init
    Status:                False
    Type:                  Degraded
    Last Heartbeat Time:   2022-09-05T19:32:19Z
    Last Transition Time:  2022-09-05T19:32:19Z
    Message:               Initializing StorageCluster
    Reason:                Init
    Status:                Unknown
    Type:                  Upgradeable

Actual results:

Expected results:

Additional info:
OCP+ODF must-gather: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-pr6408b3472/jnk-pr6408b3472_20220905T195700/logs/failed_testcase_ocs_logs_1662411046/deployment_ocs_logs/
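Triage note (a generic suggestion, not part of the original reproduction steps): when OLM only reports "Deployment does not have minimum availability", the underlying pod-creation failures (for example SCC or PodSecurity denials) can usually be surfaced from the namespace events and the deployment description, e.g.:

$ oc -n openshift-storage get events --field-selector reason=FailedCreate
$ oc -n openshift-storage describe deployment rook-ceph-operator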
Keeping this BZ for ocs-metrics-exporter; I have cloned two BZs, one for rook and another for noobaa.
https://github.com/red-hat-storage/ocs-operator/pull/1813 removes privileged access from ocs-metrics-exporter and should fix these SCC errors. Any of the latest 4.12 builds can be used to test.
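A generic way to verify the fix landed in a given build (my suggestion, not from the PR itself, and assuming the deployment is named ocs-metrics-exporter) is to check whether the exporter still requests privileged mode:

$ oc -n openshift-storage get deployment ocs-metrics-exporter -o yaml | grep -B2 -A4 securityContext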
Bug fixed. PR validation job passed: https://github.com/red-hat-storage/ocs-ci/pull/6573/files

OCP Version: 4.12.0-0.nightly-2022-10-18-192348
ODF Version: 4.12.0-77
Provider: VMware
ODF 4.12 installation failed on AWS_UPI_RHEL without the workaround (WA):

$ kubectl label --overwrite ns openshift-storage \
    pod-security.kubernetes.io/enforce=privileged \
    pod-security.kubernetes.io/warn=baseline \
    pod-security.kubernetes.io/audit=baseline

Setup:
OCP Version: 4.12
ODF Version: 4.12
Provider: AWS_RHEL_UPI

OCP MG: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-pr6573b3719/jnk-pr6573b3719_20221024T112000/logs/failed_testcase_ocs_logs_1666610685/deployment_ocs_logs/
Jenkins Job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-trigger-test-pr/3719/testReport/tests.ecosystem.deployment/test_deployment/test_deployment/
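To confirm the workaround took effect (a generic check, not from the original run), the pod-security labels on the namespace can be listed with:

$ oc get ns openshift-storage --show-labels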
This is now fixed in OLM; please try with the latest build.
Bug reproduced on the latest version.

OCP Version: 4.12
ODF Version: 4.12
Provider: AWS_UPI
https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-trigger-test-pr/3733/testReport/tests.ecosystem.deployment/test_deployment/test_deployment/

Deployment failed during setup with:

ocs_ci.ocs.exceptions.CommandFailed: Error during execution of command: oc -n default create -f /tmp/POD_43ysp0qc -o yaml. Error is Error from server (Forbidden): error when creating "/tmp/POD_43ysp0qc": pods "rhel-ansible" is forbidden: violates PodSecurity "restricted:latest": host namespaces (hostNetwork=true, hostPID=true), privileged (container "rhel" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "rhel" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "rhel" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "rhel" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "rhel" must not set runAsUser=0), seccompProfile (pod or container "rhel" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
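For reference, the PodSecurity violation above enumerates exactly what the restricted:latest profile expects. A minimal, hypothetical pod fragment satisfying those checks would look roughly like the following; this is illustrative only (the pod name and image are made up), and it is not a fix here, since rhel-ansible genuinely needs hostNetwork/hostPID/privileged and therefore has to run in a privileged-labelled namespace instead:

apiVersion: v1
kind: Pod
metadata:
  name: restricted-example   # hypothetical, not the rhel-ansible pod
spec:
  securityContext:
    runAsNonRoot: true              # required by restricted:latest
    seccompProfile:
      type: RuntimeDefault          # "RuntimeDefault" or "Localhost"
  containers:
  - name: example
    image: registry.access.redhat.com/ubi8/ubi-minimal
    command: ["sleep", "infinity"]
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]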
Hi Oded, I see you are trying to create a pod in the default namespace, and that is the cause of the error. With the latest OLM changes, only namespaces prefixed with openshift- are labelled automatically; OLM does not touch any other namespace, and the namespace in question here is not an openshift-* namespace, hence the error. I am not sure how the installation happens with the different methods, but it seems that if we want to use the default namespace, we have to label it beforehand (see the sketch below).
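An untested sketch of labelling the default namespace beforehand, mirroring the openshift-storage workaround from the earlier comment:

$ oc label --overwrite ns default \
    pod-security.kubernetes.io/enforce=privileged \
    pod-security.kubernetes.io/warn=baseline \
    pod-security.kubernetes.io/audit=baseline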
Bug fixed. The rhel-ansible pod is part of the OCS-CI infrastructure. PR validation job passed on AWS_UPI: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-trigger-test-pr/3744/