Bug 1860034 - OCS 4.6 Deployment in ocs-ci : Toolbox pod in ContainerCreationError due to key admin-secret not found
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 4.6.0
Assignee: Travis Nielsen
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-07-23 15:17 UTC by Neha Berry
Modified: 2020-12-17 06:23 UTC
CC List: 6 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-17 06:23:00 UTC
Embargoed:




Links:
- GitHub openshift/ocs-operator pull 647 (closed): Bug 1860034: Toolbox updated to point to the renamed secret keys (last updated 2021-02-08 19:42:29 UTC)
- Red Hat Product Errata RHSA-2020:5605 (last updated 2020-12-17 06:23:39 UTC)

Description Neha Berry 2020-07-23 15:17:26 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
----------------------------------------------------------------------
In all ocs-ci deployments of OCS 4.6, the toolbox pod is stuck in CreateContainerConfigError, which fails the deployment (ocs-ci mandates creation of the toolbox for testing).

Command used to create the toolbox (from [1]):

08:18:44 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc patch ocsinitialization ocsinit -n openshift-storage --type json --patch  '[{ "op": "replace", "path": "/spec/enableCephTools", "value": true }]'
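
The resulting pod can then be checked with a command along these lines (the app=rook-ceph-tools label is the upstream toolbox default and is assumed to apply here):

oc -n openshift-storage get pods -l app=rook-ceph-tools -o wide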


>> Pod

rook-ceph-tools-5778b9cdcf-2n4jd                                  0/1     CreateContainerConfigError   0          14m   10.0.143.161   ip-10-0-143-161.us-west-1.compute.internal   <none>           <none>

>> oc describe
Events:
  Type     Reason     Age                   From                                                 Message
  ----     ------     ----                  ----                                                 -------
  Normal   Scheduled  10m                   default-scheduler                                    Successfully assigned openshift-storage/rook-ceph-tools-5778b9cdcf-2n4jd to ip-10-0-143-161.us-west-1.compute.internal
  Warning  Failed     7m57s (x12 over 10m)  kubelet, ip-10-0-143-161.us-west-1.compute.internal  Error: couldn't find key admin-secret in Secret openshift-storage/rook-ceph-mon
  Normal   Pulled     5m1s (x25 over 10m)   kubelet, ip-10-0-143-161.us-west-1.compute.internal  Container image "quay.io/rhceph-dev/rook-ceph@sha256:7c75b8485dc2f922d6bade0e489ff05318d021e9ec634efd119132ff14949386" already present on machine

  Observations:
  ------------------

  Not sure if this is the issue, but until OCS 4.5 the key was named "admin-secret" in the rook-ceph-mon secret. In OCS 4.6 it is named "ceph-secret", yet the toolbox pod still looks for the key "admin-secret".
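
  The rename is easy to confirm by listing the secret's data keys, e.g.:

  oc -n openshift-storage get secret rook-ceph-mon -o jsonpath='{.data}'

  On this build the keys are ceph-secret, ceph-username, fsid and mon-secret (see the secret dump under Additional info below); there is no admin-secret key.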


>> Logs

toolbox.yaml = https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/ocs-ci/521/artifact/logs/failed_testcase_ocs_logs_1595506351/deployment_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-260c9c0e2cee6fdb87bf20fd11417483e9da9fa1fb6de1efd6ff2b0b9761d850/ceph/namespaces/openshift-storage/pods/rook-ceph-tools-5778b9cdcf-2n4jd/

Console: https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/ocs-ci/521/consoleFull

must-gather: https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/ocs-ci/521/artifact/logs/failed_testcase_ocs_logs_1595506351/deployment_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-260c9c0e2cee6fdb87bf20fd11417483e9da9fa1fb6de1efd6ff2b0b9761d850/



Version of all relevant components (if applicable):
----------------------------------------------------------------------
OCS = 4.6.0-26.ci / ocs-olm-operator:4.6.0-504.ci
OCP = 4.6.0-0.nightly-2020-07-23-080857

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
----------------------------------------------------------------------
The automation deployment in ocs-ci fails.

Is there any workaround available to the best of your knowledge?
----------------------------------------------------------------------
Not sure

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
----------------------------------------------------------------------
3

Is this issue reproducible?
----------------------------------------------------------------------
Yes

Can this issue be reproduced from the UI?
----------------------------------------------------------------------
Not tested

If this is a regression, please provide more details to justify this:
----------------------------------------------------------------------
Yes


Steps to Reproduce:
----------------------------------------------------------------------
1. Install OCP 4.6
2. Install OCS 4.6 via ocs-ci 
3. Check the toolbox pod. The ocs-ci run fails because the toolbox pod is stuck in CreateContainerConfigError (see the example check after this list).
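
For example (again assuming the upstream app=rook-ceph-tools label on the toolbox pod):

oc -n openshift-storage describe pod -l app=rook-ceph-tools

The events section shows: Error: couldn't find key admin-secret in Secret openshift-storage/rook-ceph-mon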

Actual results:
----------------------------------------------------------------------
The toolbox pod fails to start because it cannot find the admin-secret key in the rook-ceph-mon secret.

Expected results:
----------------------------------------------------------------------
The toolbox pod should be running and the ocs-ci install should succeed.

Additional info:
----------------------------------------------------------------------

rook-ceph-mon secret from logs
---------------------------------
- apiVersion: v1
  data:
    ceph-secret: ""
    ceph-username: ""
    fsid: ""
    mon-secret: ""
  kind: Secret
  metadata:
    creationTimestamp: "2020-07-23T12:15:20Z"
    managedFields:
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        f:data:
          .: {}
          f:ceph-secret: {}
          f:ceph-username: {}
          f:fsid: {}
          f:mon-secret: {}
        f:metadata:
          f:ownerReferences:
            .: {}
            k:{"uid":"484eab68-643f-47de-ae16-d8a65394ad06"}:
              .: {}
              f:apiVersion: {}
              f:blockOwnerDeletion: {}
              f:controller: {}
              f:kind: {}
              f:name: {}
              f:uid: {}
        f:type: {}
      manager: rook
      operation: Update
      time: "2020-07-23T12:15:20Z"
    name: rook-ceph-mon
    namespace: openshift-storage
    ownerReferences:
    - apiVersion: ceph.rook.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: CephCluster
      name: ocs-storagecluster-cephcluster
      uid: 484eab68-643f-47de-ae16-d8a65394ad06
    resourceVersion: "34081"
    selfLink: /api/v1/namespaces/openshift-storage/secrets/rook-ceph-mon
    uid: ad953b71-551f-4adb-a430-05c8712bdcb5
  type: kubernetes.io/rook

Comment 3 Travis Nielsen 2020-07-23 17:29:08 UTC
This is a 4.6 blocker, rather than 4.5...

There was a recent change in rook master (4.6) to the names of the keys in the secret used by converged and independent mode clusters.
See this commit for the changes to the toolbox upstream: https://github.com/rook/rook/commit/631b13b90643176e0a1ecace8a1560c9e096872b#diff-a3be284eba3dc857b402260db93eb100
We would need a similar change to the toolbox created by OCS.
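
Illustratively, the toolbox container env would move from the single old key to the two renamed keys, along the lines of the upstream toolbox.yaml (a sketch based on that commit, not the exact OCS patch):

  env:
  - name: ROOK_CEPH_USERNAME
    valueFrom:
      secretKeyRef:
        name: rook-ceph-mon
        key: ceph-username  # renamed; the old spec read key: admin-secret into ROOK_ADMIN_SECRET
  - name: ROOK_CEPH_SECRET
    valueFrom:
      secretKeyRef:
        name: rook-ceph-mon
        key: ceph-secret    # renamed key carrying the admin secret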

Comment 7 Travis Nielsen 2020-08-03 13:32:54 UTC
Moving to ON_QA since this is in the latest 4.6 builds.

Comment 8 Neha Berry 2020-08-04 19:49:41 UTC
Verified in ocs-operator.v4.6.0-36.ci. The toolbox pod is created successfully in the latest OCS 4.6 build. Hence, moving the BZ to Verified.

Logs folder - https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/ocs-ci/539/artifact/logs/failed_testcase_ocs_logs_1596541019/test_deployment_ocs_logs/ocs_must_gather/



07:42:52 - MainThread - ocs_ci.ocs.utils - INFO - starting ceph toolbox pod
07:42:52 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc patch ocsinitialization ocsinit -n openshift-storage --type json --patch  '[{ "op": "replace", "path": "/spec/enableCephTools", "value": true }]'


...

07:42:57 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage --kubeconfig cluster/auth/kubeconfig get Pod rook-ceph-tools-6f67984956-w9m62 -n openshift-storage
07:42:58 - MainThread - ocs_ci.ocs.ocp - INFO - 1 resources already reached condition!


>> oc get pod rook-ceph-tools-6f67984956-w9m62 -o yaml
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-08-04T11:42:53Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-08-04T11:42:54Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-08-04T11:42:54Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-08-04T11:42:53Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://e364d7e46e764a0046ca62a2162741c3a500764194764c8b053119aeb47c29d3
    image: quay.io/rhceph-dev/rook-ceph@sha256:7fb53399b67dd59e5c810b1edea6cac0b4774dab1f850329244f68b2c03f37fc
    imageID: quay.io/rhceph-dev/rook-ceph@sha256:7fb53399b67dd59e5c810b1edea6cac0b4774dab1f850329244f68b2c03f37fc
    lastState: {}
    name: rook-ceph-tools
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2020-08-04T11:42:54Z"
  hostIP: 10.0.172.27
  phase: Running
  podIP: 10.0.172.27
  podIPs:
  - ip: 10.0.172.27
  qosClass: BestEffort
  startTime: "2020-08-04T11:42:53Z"


>> oc describe pod rook-ceph-tools-6f67984956-w9m62 (from [2])

Events:
  Type    Reason     Age    From                                                Message
  ----    ------     ----   ----                                                -------
  Normal  Scheduled  4m46s                                                      Successfully assigned openshift-storage/rook-ceph-tools-6f67984956-w9m62 to ip-10-0-172-27.us-west-1.compute.internal
  Normal  Pulled     4m46s  kubelet, ip-10-0-172-27.us-west-1.compute.internal  Container image "quay.io/rhceph-dev/rook-ceph@sha256:7fb53399b67dd59e5c810b1edea6cac0b4774dab1f850329244f68b2c03f37fc" already present on machine
  Normal  Created    4m46s  kubelet, ip-10-0-172-27.us-west-1.compute.internal  Created container rook-ceph-tools
  Normal  Started    4m46s  kubelet, ip-10-0-172-27.us-west-1.compute.internal  Started container rook-ceph-tools


[2] - https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/ocs-ci/539/artifact/logs/failed_testcase_ocs_logs_1596541019/test_deployment_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-6bc402f2e92f1b5c72c4360b8da5aa6bfe91ee3634a608a937aa5eddab45598e/oc_output/describe_pods_-n_openshift-storage/*view*/


>> oc get csv
NAME                        DISPLAY                       VERSION       REPLACES   PHASE
ocs-operator.v4.6.0-36.ci   OpenShift Container Storage   4.6.0-36.ci              Succeeded

>> oc get pods -o wide

csi-cephfsplugin-2jflv                                            3/3     Running     0          8m26s   10.0.235.238   ip-10-0-235-238.us-west-1.compute.internal   <none>           <none>
csi-cephfsplugin-7x46p                                            3/3     Running     0          8m26s   10.0.168.135   ip-10-0-168-135.us-west-1.compute.internal   <none>           <none>
csi-cephfsplugin-bbc5r                                            3/3     Running     0          8m26s   10.0.172.27    ip-10-0-172-27.us-west-1.compute.internal    <none>           <none>
csi-cephfsplugin-provisioner-5c8f64c977-47m7c                     5/5     Running     0          8m26s   10.129.2.15    ip-10-0-172-27.us-west-1.compute.internal    <none>           <none>
csi-cephfsplugin-provisioner-5c8f64c977-ktdl2                     5/5     Running     0          8m26s   10.128.2.8     ip-10-0-168-135.us-west-1.compute.internal   <none>           <none>
csi-rbdplugin-gvwtn                                               3/3     Running     0          8m27s   10.0.168.135   ip-10-0-168-135.us-west-1.compute.internal   <none>           <none>
csi-rbdplugin-j9c4s                                               3/3     Running     0          8m27s   10.0.235.238   ip-10-0-235-238.us-west-1.compute.internal   <none>           <none>
csi-rbdplugin-l8rpf                                               3/3     Running     0          8m27s   10.0.172.27    ip-10-0-172-27.us-west-1.compute.internal    <none>           <none>
csi-rbdplugin-provisioner-78bf66999-45n7r                         6/6     Running     0          8m26s   10.128.2.7     ip-10-0-168-135.us-west-1.compute.internal   <none>           <none>
csi-rbdplugin-provisioner-78bf66999-nv6fr                         6/6     Running     0          8m26s   10.131.0.23    ip-10-0-235-238.us-west-1.compute.internal   <none>           <none>
noobaa-core-0                                                     1/1     Running     0          5m16s   10.131.0.32    ip-10-0-235-238.us-west-1.compute.internal   <none>           <none>
noobaa-db-0                                                       1/1     Running     0          5m16s   10.129.2.23    ip-10-0-172-27.us-west-1.compute.internal    <none>           <none>
noobaa-endpoint-6d84cf4645-49nqw                                  1/1     Running     0          3m29s   10.129.2.24    ip-10-0-172-27.us-west-1.compute.internal    <none>           <none>
noobaa-operator-6c8489d556-8nm2w                                  1/1     Running     0          9m6s    10.129.2.12    ip-10-0-172-27.us-west-1.compute.internal    <none>           <none>
ocs-operator-6cb5977cb7-52ng5                                     1/1     Running     0          9m7s    10.129.2.13    ip-10-0-172-27.us-west-1.compute.internal    <none>           <none>
rook-ceph-crashcollector-ip-10-0-168-135-c5f67b4c5-hlbwg          1/1     Running     0          7m9s    10.128.2.15    ip-10-0-168-135.us-west-1.compute.internal   <none>           <none>
rook-ceph-crashcollector-ip-10-0-172-27-6b4fc8c646-dqf85          1/1     Running     0          6m55s   10.129.2.19    ip-10-0-172-27.us-west-1.compute.internal    <none>           <none>
rook-ceph-crashcollector-ip-10-0-235-238-7689557766-n6z67         1/1     Running     0          6m21s   10.131.0.30    ip-10-0-235-238.us-west-1.compute.internal   <none>           <none>
rook-ceph-drain-canary-800411b2bffc077f1e724b2666dc76a0-68t9bdr   1/1     Running     0          5m13s   10.129.2.21    ip-10-0-172-27.us-west-1.compute.internal    <none>           <none>
rook-ceph-drain-canary-85a40cc5f42ba13f517a273be730f279-869xpds   1/1     Running     0          5m14s   10.128.2.12    ip-10-0-168-135.us-west-1.compute.internal   <none>           <none>
rook-ceph-drain-canary-a42cc55b9ca869a4e6fa95bebb7822ee-5dgqh6j   1/1     Running     0          5m13s   10.131.0.33    ip-10-0-235-238.us-west-1.compute.internal   <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-bd5cb6d9hqklt   1/1     Running     0          4m59s   10.128.2.14    ip-10-0-168-135.us-west-1.compute.internal   <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-59f98496d7gzb   1/1     Running     0          4m59s   10.131.0.35    ip-10-0-235-238.us-west-1.compute.internal   <none>           <none>
rook-ceph-mgr-a-59f45c7599-cpjkt                                  1/1     Running     0          5m56s   10.129.2.18    ip-10-0-172-27.us-west-1.compute.internal    <none>           <none>
rook-ceph-mon-a-b7675d879-w2h25                                   1/1     Running     0          7m9s    10.128.2.10    ip-10-0-168-135.us-west-1.compute.internal   <none>           <none>
rook-ceph-mon-b-d4cb97979-shx2g                                   1/1     Running     0          6m56s   10.129.2.17    ip-10-0-172-27.us-west-1.compute.internal    <none>           <none>
rook-ceph-mon-c-6b9b694b9c-2whqz                                  1/1     Running     0          6m21s   10.131.0.29    ip-10-0-235-238.us-west-1.compute.internal   <none>           <none>
rook-ceph-operator-584998d899-5d4vg                               1/1     Running     0          9m6s    10.129.2.14    ip-10-0-172-27.us-west-1.compute.internal    <none>           <none>
rook-ceph-osd-0-9648c4785-b65gp                                   1/1     Running     0          5m24s   10.128.2.13    ip-10-0-168-135.us-west-1.compute.internal   <none>           <none>
rook-ceph-osd-1-5557674b5d-42ccd                                  1/1     Running     0          5m25s   10.131.0.34    ip-10-0-235-238.us-west-1.compute.internal   <none>           <none>
rook-ceph-osd-2-84b59db885-frd28                                  1/1     Running     0          5m18s   10.129.2.22    ip-10-0-172-27.us-west-1.compute.internal    <none>           <none>
rook-ceph-osd-prepare-ocs-deviceset-0-data-0-frp99-99bxl          0/1     Completed   0          5m54s   10.131.0.31    ip-10-0-235-238.us-west-1.compute.internal   <none>           <none>
rook-ceph-osd-prepare-ocs-deviceset-1-data-0-5cxpf-f9nh9          0/1     Completed   0          5m54s   10.128.2.11    ip-10-0-168-135.us-west-1.compute.internal   <none>           <none>
rook-ceph-osd-prepare-ocs-deviceset-2-data-0-gm2gk-xz5sb          0/1     Completed   0          5m53s   10.129.2.20    ip-10-0-172-27.us-west-1.compute.internal    <none>           <none>
rook-ceph-tools-6f67984956-w9m62                                  1/1     Running     0          4m44s   10.0.172.27    ip-10-0-172-27.us-west-1.compute.internal    <none>           <none>

Comment 11 errata-xmlrpc 2020-12-17 06:23:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605

