Description of problem (please be as detailed as possible and provide log snippets):
----------------------------------------------------------------------
In all ocs-ci deployments of OCS 4.6, the toolbox pod is stuck in
CreateContainerConfigError, which causes the deployment to fail (ocs-ci
mandates creation of the toolbox for testing).

Command used to create the toolbox (from [1]):

08:18:44 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc patch ocsinitialization ocsinit -n openshift-storage --type json --patch '[{ "op": "replace", "path": "/spec/enableCephTools", "value": true }]'

>> Pod status
rook-ceph-tools-5778b9cdcf-2n4jd   0/1   CreateContainerConfigError   0   14m   10.0.143.161   ip-10-0-143-161.us-west-1.compute.internal   <none>   <none>

>> oc describe
Events:
  Type     Reason     Age                   From                                                 Message
  ----     ------     ----                  ----                                                 -------
  Normal   Scheduled  10m                   default-scheduler                                    Successfully assigned openshift-storage/rook-ceph-tools-5778b9cdcf-2n4jd to ip-10-0-143-161.us-west-1.compute.internal
  Warning  Failed     7m57s (x12 over 10m)  kubelet, ip-10-0-143-161.us-west-1.compute.internal  Error: couldn't find key admin-secret in Secret openshift-storage/rook-ceph-mon
  Normal   Pulled     5m1s (x25 over 10m)   kubelet, ip-10-0-143-161.us-west-1.compute.internal  Container image "quay.io/rhceph-dev/rook-ceph@sha256:7c75b8485dc2f922d6bade0e489ff05318d021e9ec634efd119132ff14949386" already present on machine

Observations:
------------------
Not sure if this is the issue, but up to OCS 4.5 the key in the rook-ceph-mon
secret was named "admin-secret". In OCS 4.6 it is named "ceph-secret".
But the toolbox pod still looks for the key "admin-secret".

>> Logs
toolbox.yaml = https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/ocs-ci/521/artifact/logs/failed_testcase_ocs_logs_1595506351/deployment_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-260c9c0e2cee6fdb87bf20fd11417483e9da9fa1fb6de1efd6ff2b0b9761d850/ceph/namespaces/openshift-storage/pods/rook-ceph-tools-5778b9cdcf-2n4jd/
Console: https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/ocs-ci/521/consoleFull
must-gather: https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/ocs-ci/521/artifact/logs/failed_testcase_ocs_logs_1595506351/deployment_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-260c9c0e2cee6fdb87bf20fd11417483e9da9fa1fb6de1efd6ff2b0b9761d850/

Version of all relevant components (if applicable):
----------------------------------------------------------------------
OCS = 4.6.0-26.ci / ocs-olm-operator:4.6.0-504.ci
OCP = 4.6.0-0.nightly-2020-07-23-080857

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
----------------------------------------------------------------------
The automation deployment in ocs-ci fails.

Is there any workaround available to the best of your knowledge?
----------------------------------------------------------------------
Not sure.

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
----------------------------------------------------------------------
3

Is this issue reproducible?
----------------------------------------------------------------------
Yes

Can this issue be reproduced from the UI?
----------------------------------------------------------------------
Not tested

If this is a regression, please provide more details to justify this:
----------------------------------------------------------------------
Yes

Steps to Reproduce:
----------------------------------------------------------------------
1. Install OCP 4.6
2. Install OCS 4.6 via ocs-ci
3. Check the toolbox pod. The ocs-ci run fails because the toolbox pod is in
   the CreateContainerConfigError state.

Actual results:
----------------------------------------------------------------------
The toolbox pod is not able to find the admin-secret key.

Expected results:
----------------------------------------------------------------------
The toolbox pod should be running and the ocs-ci install should succeed.

Additional info:
----------------------------------------------------------------------
rook-ceph-mon secret from the logs:

- apiVersion: v1
  data:
    ceph-secret: ""
    ceph-username: ""
    fsid: ""
    mon-secret: ""
  kind: Secret
  metadata:
    creationTimestamp: "2020-07-23T12:15:20Z"
    managedFields:
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        f:data:
          .: {}
          f:ceph-secret: {}
          f:ceph-username: {}
          f:fsid: {}
          f:mon-secret: {}
        f:metadata:
          f:ownerReferences:
            .: {}
            k:{"uid":"484eab68-643f-47de-ae16-d8a65394ad06"}:
              .: {}
              f:apiVersion: {}
              f:blockOwnerDeletion: {}
              f:controller: {}
              f:kind: {}
              f:name: {}
              f:uid: {}
        f:type: {}
      manager: rook
      operation: Update
      time: "2020-07-23T12:15:20Z"
    name: rook-ceph-mon
    namespace: openshift-storage
    ownerReferences:
    - apiVersion: ceph.rook.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: CephCluster
      name: ocs-storagecluster-cephcluster
      uid: 484eab68-643f-47de-ae16-d8a65394ad06
    resourceVersion: "34081"
    selfLink: /api/v1/namespaces/openshift-storage/secrets/rook-ceph-mon
    uid: ad953b71-551f-4adb-a430-05c8712bdcb5
  type: kubernetes.io/rook
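For quick triage, the key names in the secret dump above can be checked directly. A minimal sketch (the data map below just mirrors the must-gather dump; the values are redacted there, so they are empty strings here):

```python
import base64

# Key layout of the rook-ceph-mon secret as captured in the must-gather
# above (values redacted in the dump, so left empty here).
rook_ceph_mon_data = {
    "ceph-secret": "",
    "ceph-username": "",
    "fsid": "",
    "mon-secret": "",
}

# Kubernetes stores Secret values base64-encoded; a small decode helper:
def decode_secret_value(data, key):
    return base64.b64decode(data[key]).decode()

# The pre-4.6 toolbox expects "admin-secret", which is absent in 4.6:
print("admin-secret" in rook_ceph_mon_data)  # False
print("ceph-secret" in rook_ceph_mon_data)   # True
```

This matches the kubelet event above: the container config references a key that no longer exists in the secret.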
This is a 4.6 blocker rather than 4.5. There was a recent change in rook
master (4.6) to the name of the secret used by converged- and
independent-mode clusters. See this commit for the changes to the toolbox
upstream:

https://github.com/rook/rook/commit/631b13b90643176e0a1ecace8a1560c9e096872b#diff-a3be284eba3dc857b402260db93eb100

We would need a similar change to the toolbox created by OCS.
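In effect, the consumer-side fix amounts to reading the renamed key. A sketch of a backward-compatible lookup, assuming only the two key names quoted in this bug (find_admin_key is a hypothetical helper, not the actual rook/OCS code):

```python
import base64

def find_admin_key(secret_data):
    """Return the decoded Ceph admin key from a rook-ceph-mon Secret's
    .data map, preferring the OCS 4.6 key name and falling back to the
    pre-4.6 one. Hypothetical helper, not the actual rook/OCS code."""
    for key in ("ceph-secret", "admin-secret"):  # new name first
        if key in secret_data:
            return base64.b64decode(secret_data[key]).decode()
    raise KeyError("no admin key found in rook-ceph-mon secret")

# OCS 4.6 layout (the key value here is a made-up placeholder):
new_style = {"ceph-secret": base64.b64encode(b"placeholder").decode()}
print(find_admin_key(new_style))  # placeholder

# OCS <= 4.5 layout still resolves through the fallback:
old_style = {"admin-secret": base64.b64encode(b"placeholder").decode()}
print(find_admin_key(old_style))  # placeholder
```

The upstream commit instead updates the toolbox to consume the new key names directly; the fallback above is just one way a consumer could tolerate both layouts during the transition.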
Moving to ON_QA since this is in the latest 4.6 builds.
Verified in ocs-operator.v4.6.0-36.ci. The toolbox pod is created successfully
in the latest OCS 4.6 build; hence, moving the BZ to the verified state.

Logs folder - https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/ocs-ci/539/artifact/logs/failed_testcase_ocs_logs_1596541019/test_deployment_ocs_logs/ocs_must_gather/

07:42:52 - MainThread - ocs_ci.ocs.utils - INFO - starting ceph toolbox pod
07:42:52 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc patch ocsinitialization ocsinit -n openshift-storage --type json --patch '[{ "op": "replace", "path": "/spec/enableCephTools", "value": true }]'
...
07:42:57 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage --kubeconfig cluster/auth/kubeconfig get Pod rook-ceph-tools-6f67984956-w9m62 -n openshift-storage
07:42:58 - MainThread - ocs_ci.ocs.ocp - INFO - 1 resources already reached condition!

>> oc get rook-ceph-tools-6f67984956-w9m62 -o yaml

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-08-04T11:42:53Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-08-04T11:42:54Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-08-04T11:42:54Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-08-04T11:42:53Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://e364d7e46e764a0046ca62a2162741c3a500764194764c8b053119aeb47c29d3
    image: quay.io/rhceph-dev/rook-ceph@sha256:7fb53399b67dd59e5c810b1edea6cac0b4774dab1f850329244f68b2c03f37fc
    imageID: quay.io/rhceph-dev/rook-ceph@sha256:7fb53399b67dd59e5c810b1edea6cac0b4774dab1f850329244f68b2c03f37fc
    lastState: {}
    name: rook-ceph-tools
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2020-08-04T11:42:54Z"
  hostIP: 10.0.172.27
  phase: Running
  podIP: 10.0.172.27
  podIPs:
  - ip: 10.0.172.27
  qosClass: BestEffort
  startTime: "2020-08-04T11:42:53Z"

>> oc describe
rook-ceph-tools-6f67984956-w9m62 from [2]

Events:
  Type    Reason     Age    From                                                Message
  ----    ------     ----   ----                                                -------
  Normal  Scheduled  4m46s                                                      Successfully assigned openshift-storage/rook-ceph-tools-6f67984956-w9m62 to ip-10-0-172-27.us-west-1.compute.internal
  Normal  Pulled     4m46s  kubelet, ip-10-0-172-27.us-west-1.compute.internal  Container image "quay.io/rhceph-dev/rook-ceph@sha256:7fb53399b67dd59e5c810b1edea6cac0b4774dab1f850329244f68b2c03f37fc" already present on machine
  Normal  Created    4m46s  kubelet, ip-10-0-172-27.us-west-1.compute.internal  Created container rook-ceph-tools
  Normal  Started    4m46s  kubelet, ip-10-0-172-27.us-west-1.compute.internal  Started container rook-ceph-tools

[2] - https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/ocs-ci/539/artifact/logs/failed_testcase_ocs_logs_1596541019/test_deployment_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-6bc402f2e92f1b5c72c4360b8da5aa6bfe91ee3634a608a937aa5eddab45598e/oc_output/describe_pods_-n_openshift-storage/*view*/

>> oc get csv
NAME                        DISPLAY                       VERSION      REPLACES   PHASE
ocs-operator.v4.6.0-36.ci   OpenShift Container Storage   4.6.0-36.ci              Succeeded

>> oc get pods -o wide
csi-cephfsplugin-2jflv  3/3  Running  0  8m26s  10.0.235.238  ip-10-0-235-238.us-west-1.compute.internal  <none>  <none>
csi-cephfsplugin-7x46p  3/3  Running  0  8m26s  10.0.168.135  ip-10-0-168-135.us-west-1.compute.internal  <none>  <none>
csi-cephfsplugin-bbc5r  3/3  Running  0  8m26s  10.0.172.27  ip-10-0-172-27.us-west-1.compute.internal  <none>  <none>
csi-cephfsplugin-provisioner-5c8f64c977-47m7c  5/5  Running  0  8m26s  10.129.2.15  ip-10-0-172-27.us-west-1.compute.internal  <none>  <none>
csi-cephfsplugin-provisioner-5c8f64c977-ktdl2  5/5  Running  0  8m26s  10.128.2.8  ip-10-0-168-135.us-west-1.compute.internal  <none>  <none>
csi-rbdplugin-gvwtn  3/3  Running  0  8m27s  10.0.168.135  ip-10-0-168-135.us-west-1.compute.internal  <none>  <none>
csi-rbdplugin-j9c4s  3/3  Running  0  8m27s  10.0.235.238  ip-10-0-235-238.us-west-1.compute.internal  <none>  <none>
csi-rbdplugin-l8rpf  3/3  Running  0  8m27s  10.0.172.27  ip-10-0-172-27.us-west-1.compute.internal  <none>  <none>
csi-rbdplugin-provisioner-78bf66999-45n7r  6/6  Running  0  8m26s  10.128.2.7  ip-10-0-168-135.us-west-1.compute.internal  <none>  <none>
csi-rbdplugin-provisioner-78bf66999-nv6fr  6/6  Running  0  8m26s  10.131.0.23  ip-10-0-235-238.us-west-1.compute.internal  <none>  <none>
noobaa-core-0  1/1  Running  0  5m16s  10.131.0.32  ip-10-0-235-238.us-west-1.compute.internal  <none>  <none>
noobaa-db-0  1/1  Running  0  5m16s  10.129.2.23  ip-10-0-172-27.us-west-1.compute.internal  <none>  <none>
noobaa-endpoint-6d84cf4645-49nqw  1/1  Running  0  3m29s  10.129.2.24  ip-10-0-172-27.us-west-1.compute.internal  <none>  <none>
noobaa-operator-6c8489d556-8nm2w  1/1  Running  0  9m6s  10.129.2.12  ip-10-0-172-27.us-west-1.compute.internal  <none>  <none>
ocs-operator-6cb5977cb7-52ng5  1/1  Running  0  9m7s  10.129.2.13  ip-10-0-172-27.us-west-1.compute.internal  <none>  <none>
rook-ceph-crashcollector-ip-10-0-168-135-c5f67b4c5-hlbwg  1/1  Running  0  7m9s  10.128.2.15  ip-10-0-168-135.us-west-1.compute.internal  <none>  <none>
rook-ceph-crashcollector-ip-10-0-172-27-6b4fc8c646-dqf85  1/1  Running  0  6m55s  10.129.2.19  ip-10-0-172-27.us-west-1.compute.internal  <none>  <none>
rook-ceph-crashcollector-ip-10-0-235-238-7689557766-n6z67  1/1  Running  0  6m21s  10.131.0.30  ip-10-0-235-238.us-west-1.compute.internal  <none>  <none>
rook-ceph-drain-canary-800411b2bffc077f1e724b2666dc76a0-68t9bdr  1/1  Running  0  5m13s  10.129.2.21  ip-10-0-172-27.us-west-1.compute.internal  <none>  <none>
rook-ceph-drain-canary-85a40cc5f42ba13f517a273be730f279-869xpds  1/1  Running  0  5m14s  10.128.2.12  ip-10-0-168-135.us-west-1.compute.internal  <none>  <none>
rook-ceph-drain-canary-a42cc55b9ca869a4e6fa95bebb7822ee-5dgqh6j  1/1  Running  0  5m13s  10.131.0.33  ip-10-0-235-238.us-west-1.compute.internal  <none>  <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-bd5cb6d9hqklt  1/1  Running  0  4m59s  10.128.2.14  ip-10-0-168-135.us-west-1.compute.internal  <none>  <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-59f98496d7gzb  1/1  Running  0  4m59s  10.131.0.35  ip-10-0-235-238.us-west-1.compute.internal  <none>  <none>
rook-ceph-mgr-a-59f45c7599-cpjkt  1/1  Running  0  5m56s  10.129.2.18  ip-10-0-172-27.us-west-1.compute.internal  <none>  <none>
rook-ceph-mon-a-b7675d879-w2h25  1/1  Running  0  7m9s  10.128.2.10  ip-10-0-168-135.us-west-1.compute.internal  <none>  <none>
rook-ceph-mon-b-d4cb97979-shx2g  1/1  Running  0  6m56s  10.129.2.17  ip-10-0-172-27.us-west-1.compute.internal  <none>  <none>
rook-ceph-mon-c-6b9b694b9c-2whqz  1/1  Running  0  6m21s  10.131.0.29  ip-10-0-235-238.us-west-1.compute.internal  <none>  <none>
rook-ceph-operator-584998d899-5d4vg  1/1  Running  0  9m6s  10.129.2.14  ip-10-0-172-27.us-west-1.compute.internal  <none>  <none>
rook-ceph-osd-0-9648c4785-b65gp  1/1  Running  0  5m24s  10.128.2.13  ip-10-0-168-135.us-west-1.compute.internal  <none>  <none>
rook-ceph-osd-1-5557674b5d-42ccd  1/1  Running  0  5m25s  10.131.0.34  ip-10-0-235-238.us-west-1.compute.internal  <none>  <none>
rook-ceph-osd-2-84b59db885-frd28  1/1  Running  0  5m18s  10.129.2.22  ip-10-0-172-27.us-west-1.compute.internal  <none>  <none>
rook-ceph-osd-prepare-ocs-deviceset-0-data-0-frp99-99bxl  0/1  Completed  0  5m54s  10.131.0.31  ip-10-0-235-238.us-west-1.compute.internal  <none>  <none>
rook-ceph-osd-prepare-ocs-deviceset-1-data-0-5cxpf-f9nh9  0/1  Completed  0  5m54s  10.128.2.11  ip-10-0-168-135.us-west-1.compute.internal  <none>  <none>
rook-ceph-osd-prepare-ocs-deviceset-2-data-0-gm2gk-xz5sb  0/1  Completed  0  5m53s  10.129.2.20  ip-10-0-172-27.us-west-1.compute.internal  <none>  <none>
rook-ceph-tools-6f67984956-w9m62  1/1  Running  0  4m44s  10.0.172.27  ip-10-0-172-27.us-west-1.compute.internal  <none>  <none>
Since the problem described in this bug report should be resolved in a recent
advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage
4.6.0 security, bug fix, enhancement update), and where to find the updated
files, follow the link below. If the solution does not work for you, open a
new bug report.

https://access.redhat.com/errata/RHSA-2020:5605