Bug 1870338

Summary: OCS 4.6 must-gather : ocs-must-gather-xxx-helper pod in ContainerCreationError (couldn't find key admin-secret)
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Reporter: Neha Berry <nberry>
Component: must-gather
Assignee: Pulkit Kundra <pkundra>
Status: CLOSED ERRATA
QA Contact: Shay Rozen <srozen>
Severity: high
Priority: unspecified
Version: 4.6
CC: ebenahar, muagarwa, ocs-bugs, sabose
Keywords: AutomationBackLog
Target Release: OCS 4.6.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: 4.6.0-98.ci
Doc Type: No Doc Update
Last Closed: 2020-12-17 06:23:47 UTC
Type: Bug

Description Neha Berry 2020-08-19 19:51:55 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
----------------------------------------------------------------------
The OCS 4.6 must-gather-xxx-helper pod fails to reach the Running state because of the following error:

Events:
  Type     Reason          Age               From                                                 Message
  ----     ------          ----              ----                                                 -------
  Normal   Scheduled       <unknown>                                                              Successfully assigned openshift-storage/must-gather-l69j4-helper to ip-10-0-218-163.us-east-2.compute.internal
  Normal   AddedInterface  21s               multus                                               Add eth0 [10.129.2.94/23]
  Normal   Pulled          7s (x4 over 21s)  kubelet, ip-10-0-218-163.us-east-2.compute.internal  Container image "quay.io/rhceph-dev/rook-ceph@sha256:12f0c3a9b0bf8d9b48b1b7f8a73e5826e43d87fb1d67b16d58a01efb47ea0b8b" already present on machine
  Warning  Failed          7s (x4 over 21s)  kubelet, ip-10-0-218-163.us-east-2.compute.internal  Error: couldn't find key admin-secret in Secret openshift-storage/rook-ceph-mon

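The kubelet message means the helper pod spec refers to a key, `admin-secret`, that the `rook-ceph-mon` Secret does not contain (typically through an environment variable's `secretKeyRef`). A minimal sketch for confirming this outside must-gather is shown below. It assumes the Secret has been dumped with `oc get secret rook-ceph-mon -n openshift-storage -o json`; the function name `missing_secret_keys` is hypothetical and is not part of must-gather or Rook.

```python
import json
from typing import List


def missing_secret_keys(secret_json: str, required: List[str]) -> List[str]:
    """Return the keys from `required` that are absent from a Secret's `data` map.

    `secret_json` is assumed to be the JSON printed by
    `oc get secret rook-ceph-mon -n openshift-storage -o json`.
    """
    secret = json.loads(secret_json)
    # `data` can be missing or null on a Secret that has no entries.
    data = secret.get("data") or {}
    return [key for key in required if key not in data]
```

Calling `missing_secret_keys(dumped_json, ["admin-secret"])` returns `["admin-secret"]` when the key is absent, which matches the condition the kubelet reports; an empty list means the Secret carries every key that was checked.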

$ oc logs pod must-gather-l69j4-helper -n openshift-storage
Error from server (NotFound): pods "pod" not found


$ oc logs  must-gather-l69j4-helper -n openshift-storage
Error from server (BadRequest): container "must-gather-helper" in pod "must-gather-l69j4-helper" is waiting to start: CreateContainerConfigError

Moreover, must-gather keeps waiting for the helper pod through all 50 retries, which wastes time:

waiting for helper pod to come up in openshift-storage namespace. Retrying 1
waiting for helper pod to come up in openshift-storage namespace. Retrying 2
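A CreateContainerConfigError generally does not clear until the referenced Secret key exists, so polling all 50 times only delays the rest of the collection. The sketch below shows one way a wait loop could stop early. It is not the actual must-gather script: the names `helper_pod_state` and `wait_for_helper`, the set of reasons treated as fatal, and the 2-second delay are all assumptions. It classifies the pod from `oc get pod <name> -o json` output using the standard Kubernetes `status.containerStatuses[].state.waiting.reason` field.

```python
import json
import time
from typing import Callable

# Assumption: waiting reasons treated as non-retryable by this sketch.
# The kubelet keeps retrying CreateContainerConfigError, but it only
# clears if the Secret is fixed, so waiting longer rarely helps.
FATAL_WAITING_REASONS = {"CreateContainerConfigError"}


def helper_pod_state(pod_json: str) -> str:
    """Classify a pod as "ready", "fatal" or "pending" from `oc get pod -o json` output."""
    status = json.loads(pod_json).get("status", {})
    container_statuses = status.get("containerStatuses") or []
    for cs in container_statuses:
        waiting = (cs.get("state") or {}).get("waiting") or {}
        if waiting.get("reason") in FATAL_WAITING_REASONS:
            return "fatal"
    if status.get("phase") == "Running" and container_statuses and all(
        cs.get("ready") for cs in container_statuses
    ):
        return "ready"
    return "pending"


def wait_for_helper(fetch_pod_json: Callable[[], str], retries: int = 50, delay: float = 2.0) -> bool:
    """Poll the helper pod; return True once ready, False on a fatal state or after `retries` attempts."""
    for attempt in range(1, retries + 1):
        state = helper_pod_state(fetch_pod_json())
        if state == "ready":
            return True
        if state == "fatal":
            # Give up immediately instead of burning the remaining retries.
            return False
        print(f"waiting for helper pod to come up in openshift-storage namespace. Retrying {attempt}")
        time.sleep(delay)
    return False
```

The pod fetch is passed in as a callable so the loop can be exercised without a cluster; against a live cluster it could wrap `subprocess.run(["oc", "get", "pod", name, "-n", "openshift-storage", "-o", "json"], capture_output=True, text=True).stdout`.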



Version of all relevant components (if applicable):
----------------------------------------------------------------------

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-18-165040   True        False         4h5m    Cluster version is 4.6.0-0.nightly-2020-08-18-165040

$ oc get csv -n openshift-storage
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.6.0-533.ci   OpenShift Container Storage   4.6.0-533.ci              Succeeded




Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
----------------------------------------------------------------------
Yes. The Clone PVC is in Pending state and endless retries are happening.

Is there any workaround available to the best of your knowledge?
----------------------------------------------------------------------
Not sure

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
----------------------------------------------------------------------

2

Is this issue reproducible?
----------------------------------------------------------------------

Yes

Can this issue be reproduced from the UI?
----------------------------------------------------------------------
Yes

If this is a regression, please provide more details to justify this:
----------------------------------------------------------------------
No

Steps to Reproduce:
----------------------------------------------------------------------
1. Create an OCS + OCP 4.6 cluster on AWS or VMware.
2. Start OCS must-gather and confirm that the must-gather-helper pod stays in CreateContainerConfigError because of the error
"Error: couldn't find key admin-secret in Secret openshift-storage/rook-ceph-mon".


Actual results:
----------------------------------------------------------------------
Must-gather collects logs, but the must-gather-helper pod stays in the CreateContainerConfigError state:

Error: couldn't find key admin-secret in Secret openshift-storage/rook-ceph-mon


Expected results:
----------------------------------------------------------------------
The helper pod should reach the Running state and no error should be seen.


Additional info:
----------------------------------------------------------------------

$ oc get pods -A -w|grep must-gather
openshift-must-gather-5fqhk                        must-gather-vh4jw                                                     1/1     Running                      0          79m
openshift-must-gather-dch9x                        must-gather-l69j4                                                     0/1     Init:0/1                     0          39s
openshift-must-gather-jcm49                        must-gather-pk42m                                                     1/1     Running                      0          21m
openshift-storage                                  must-gather-l69j4-helper                                              0/1     CreateContainerConfigError   0          30s
openshift-must-gather-dch9x                        ip-10-0-151-182us-east-2computeinternal-debug                         0/1     Pending                      0          0s
openshift-must-gather-dch9x                        ip-10-0-151-182us-east-2computeinternal-debug                         0/1     ContainerCreating            0          1s
openshift-must-gather-dch9x                        ip-10-0-151-182us-east-2computeinternal-debug                         0/1     Error                        0          2s
openshift-must-gather-dch9x                        ip-10-0-151-182us-east-2computeinternal-debug                         0/1     Terminating                  0          2s
openshift-must-gather-dch9x                        ip-10-0-151-182us-east-2computeinternal-debug                         0/1     Terminating                  0          2s
openshift-must-gather-dch9x                        ip-10-0-165-190us-east-2computeinternal-debug                         0/1     Pending                      0          0s
openshift-must-gather-dch9x                        ip-10-0-165-190us-east-2computeinternal-debug                         0/1     ContainerCreating            0          0s
openshift-must-gather-dch9x                        ip-10-0-165-190us-east-2computeinternal-debug                         0/1     Error                        0          1s
openshift-must-gather-dch9x                        ip-10-0-165-190us-east-2computeinternal-debug                         0/1     Terminating                  0          1s
openshift-must-gather-dch9x                        ip-10-0-165-190us-east-2computeinternal-debug                         0/1     Terminating                  0          1s
openshift-must-gather-dch9x                        ip-10-0-218-163us-east-2computeinternal-debug                         0/1     Pending                      0          0s
openshift-must-gather-dch9x                        ip-10-0-218-163us-east-2computeinternal-debug                         0/1     ContainerCreating            0          0s
openshift-must-gather-dch9x                        ip-10-0-218-163us-east-2computeinternal-debug                         0/1     Error                        0          1s
openshift-must-gather-dch9x                        ip-10-0-218-163us-east-2computeinternal-debug                         0/1     Terminating                  0          2s
openshift-must-gather-dch9x                        ip-10-0-218-163us-east-2computeinternal-debug                         0/1     Terminating                  0          2s
openshift-storage                                  must-gather-l69j4-helper                                              0/1     Terminating                  0          4m46s
openshift-storage                                  must-gather-l69j4-helper                                              0/1     Terminating                  0          4m53s
openshift-storage                                  must-gather-l69j4-helper                                              0/1     Terminating                  0          4m53s
openshift-must-gather-dch9x                        must-gather-l69j4                                                     0/1     PodInitializing              0          5m2s
openshift-must-gather-dch9x                        must-gather-l69j4                                                     1/1     Running                      0          5m3s
openshift-must-gather-dch9x                        must-gather-l69j4                                                     1/1     Terminating                  0          6m31s
openshift-must-gather-dch9x                        must-gather-l69j4                                                     1/1     Terminating                  0          6m31s


$ oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.6 
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/rhceph-dev/ocs-must-gather:latest-4.6
[must-gather      ] OUT namespace/openshift-must-gather-dch9x created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-fp728 created
[must-gather      ] OUT pod for plug-in image quay.io/rhceph-dev/ocs-must-gather:latest-4.6 created

...

[must-gather-l69j4] POD collecting dump of clusterresourceversion
[must-gather-l69j4] POD waiting for helper pod to come up in openshift-storage namespace. Retrying 1
[must-gather-l69j4] POD waiting for helper pod to come up in openshift-storage namespace. Retrying 2

...

[must-gather-l69j4] POD waiting for helper pod to come up in openshift-storage namespace. Retrying 50
[must-gather-l69j4] POD collecting command output for: ceph auth list

...

[must-gather-l69j4] POD error: unable to upgrade connection: container not found ("must-gather-helper")
[must-gather-l69j4] POD collecting command output for: ceph-volume lvm list

Comment 8 Shay Rozen 2020-11-09 11:23:01 UTC
Verified on version 4.6.0-149.ci

openshift-storage                                  must-gather-6h9l7-helper                                          1/1     Running     0          31s

Comment 10 errata-xmlrpc 2020-12-17 06:23:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605