1879008 – ocs-osd-removal job fails because it can't find admin-secret in rook-ceph-mon secret

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenShift Container Storage
Classification:	Red Hat Storage
Component:	ocs-operator
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	OCS 4.6.0
Assignee:	Servesha
QA Contact:	Rachael
Docs Contact:
URL:
Whiteboard:
Depends On:	1886348
Blocks:

Reported:	2020-09-15 08:11 UTC by Rachael
Modified:	2020-12-17 06:24 UTC
CC List:	8 users
Fixed In Version:	4.6.0-144.ci
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-12-17 06:24:14 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift ocs-operator pull 648	0	None	closed	Avoid bash script to purge osds in job template	2021-02-19 00:41:36 UTC
Red Hat Product Errata	RHSA-2020:5605	0	None	None	None	2020-12-17 06:24:35 UTC

Description Rachael 2020-09-15 08:11:17 UTC

Description of problem (please be as detailed as possible and provide log
snippets):

The ocs-osd-removal job fails on OCS 4.6 with the following error:

ocs-osd-removal-0-t67lq                                           0/1     Init:CreateContainerConfigError   0          2s


  Warning  Failed          8m50s (x12 over 10m)  kubelet, compute-0  Error: couldn't find key admin-secret in Secret openshift-storage/rook-ceph-mon
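
For context: the kubelet reports this error when a container's environment variable takes its value from a Secret via `secretKeyRef`, the named key does not exist in that Secret, and the reference is not marked `optional`; the pod then stays in CreateContainerConfigError. The Python sketch below is a simplified model of that lookup, not kubelet code. `resolve_secret_key_ref` and `CreateContainerConfigError` are hypothetical names, and the secret values are elided.

```python
class CreateContainerConfigError(Exception):
    """Hypothetical stand-in for the kubelet's container-config failure."""


def resolve_secret_key_ref(namespace, secret_name, secret_data, key, optional=False):
    """Simplified model of resolving an env var's secretKeyRef.

    secret_data maps data keys to base64 strings, as in `oc get secret -o yaml`.
    """
    if key in secret_data:
        return secret_data[key]
    if optional:
        # An optional reference to a missing key just leaves the env var unset.
        return None
    # A required reference to a missing key blocks container creation.
    raise CreateContainerConfigError(
        f"couldn't find key {key} in Secret {namespace}/{secret_name}"
    )


# Data keys of rook-ceph-mon as reported on OCS 4.6 (values elided).
ocs46_data = {"ceph-secret": "...", "ceph-username": "...", "fsid": "...", "mon-secret": "..."}

try:
    resolve_secret_key_ref("openshift-storage", "rook-ceph-mon", ocs46_data, "admin-secret")
except CreateContainerConfigError as err:
    print(f"Error: {err}")
```

Run against the 4.6 key set, the model produces the same message as the event above.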

It looks like the admin-secret key in the rook-ceph-mon secret is not present in OCS 4.6:

$ oc get secret rook-ceph-mon -o yaml
apiVersion: v1
data:
  ceph-secret: QVFDYWIyQmZreGVnTkJBQWJXRTBJbmR5dVRDWWZLYmtTTXpjRFE9PQ==
  ceph-username: Y2xpZW50LmFkbWlu
  fsid: ZDQ3ZjNhZGYtODJjNy00YTViLTk4ZDEtZmU1YTA1NDk0MjZh
  mon-secret: QVFDYWIyQmZlT3VwTVJBQUdPUllPUHlhdDQyaWJ2cnRtdzZEMmc9PQ==
kind: Secret

rook-ceph-mon secret from OCS 4.5:
==================================
$ oc get secret rook-ceph-mon -o yaml
apiVersion: v1
data:
  admin-secret: QVFBd2IxOWZzbE1yTEJBQTFwZnVFSlZ1SVJkM3RBNTUrRTgzdXc9PQ==
  cluster-name: b3BlbnNoaWZ0LXN0b3JhZ2U=
  fsid: ZjI1YWNkMDQtMzhkYi00NDk2LTk1NGEtODlmNDY3MmUyNjIx
  mon-secret: QVFBd2IxOWYxOTBGS1JBQW1XQ3c4ZEV4bWxubzBxOTAvWlJzanc9PQ==
kind: Secret
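
The key sets of the two secrets can be compared mechanically. This Python sketch hard-codes the data keys and two base64 values copied from the outputs above (in practice they could be read from `oc get secret rook-ceph-mon -o json`). It lists which keys differ and decodes the two non-sensitive values.

```python
import base64

# Data keys of the rook-ceph-mon secret, copied from the two outputs above.
ocs45_keys = {"admin-secret", "cluster-name", "fsid", "mon-secret"}
ocs46_keys = {"ceph-secret", "ceph-username", "fsid", "mon-secret"}

removed = sorted(ocs45_keys - ocs46_keys)  # present in 4.5 only
added = sorted(ocs46_keys - ocs45_keys)    # present in 4.6 only
print("only in 4.5:", removed)
print("only in 4.6:", added)

# Secret data values are base64-encoded; decode the non-sensitive ones.
print(base64.b64decode("Y2xpZW50LmFkbWlu").decode())          # 4.6 ceph-username
print(base64.b64decode("b3BlbnNoaWZ0LXN0b3JhZ2U=").decode())  # 4.5 cluster-name
```

This prints `['admin-secret', 'cluster-name']` and `['ceph-secret', 'ceph-username']`, followed by `client.admin` and `openshift-storage`. The decoded `ceph-username` of `client.admin` suggests, though this report does not confirm it, that the admin key formerly exposed as `admin-secret` is now stored under `ceph-secret`.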


Version of all relevant components (if applicable):
OCP: 4.6.0-0.nightly-2020-09-12-080441
ocs-operator.v4.6.0-553.ci

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes. Disk replacement can't be performed because the OSD removal job never completes.


Is there any workaround available to the best of your knowledge?

Not that I am aware of


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2


Is this issue reproducible?
Yes


Can this issue be reproduced from the UI?
No


If this is a regression, please provide more details to justify this:
Yes. The ocs-osd-removal job succeeded in OCS 4.5.


Steps to Reproduce:
1. Scale down the OSD to be replaced/removed.
2. Run the OSD removal job:
$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_ID=${osd_id_to_remove} | oc create -f -

3. Check the ocs-osd-removal pod status.
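
Step 2 can also be scripted. The Python sketch below builds the same `oc process ... | oc create -f -` pipeline for one OSD ID and pipes the rendered template through `subprocess`; it assumes `oc` is on PATH and logged in to the cluster, and the helper names `removal_job_commands` and `run_removal_job` are hypothetical.

```python
import shlex
import subprocess

NAMESPACE = "openshift-storage"
TEMPLATE = "ocs-osd-removal"


def removal_job_commands(osd_id):
    """Build the two halves of `oc process ... | oc create -f -` (hypothetical helper)."""
    if not str(osd_id).isdigit():
        raise ValueError(f"OSD ID must be a non-negative integer, got {osd_id!r}")
    process = ["oc", "process", "-n", NAMESPACE, TEMPLATE, "-p", f"FAILED_OSD_ID={osd_id}"]
    create = ["oc", "create", "-f", "-"]
    return process, create


def run_removal_job(osd_id):
    """Render the template and pipe it into `oc create` (needs a logged-in `oc`)."""
    process, create = removal_job_commands(osd_id)
    rendered = subprocess.run(process, check=True, capture_output=True, text=True).stdout
    subprocess.run(create, input=rendered, check=True, text=True)


process, create = removal_job_commands(0)
print(shlex.join(process), "|", shlex.join(create))
```

Building argument lists instead of a shell string avoids quoting problems with `${osd_id_to_remove}` and rejects a malformed ID before anything is submitted to the cluster.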


Actual results:
The pod is stuck in Init:CreateContainerConfigError.


Expected results:
The job should succeed and the pod status should be Completed.

Comment 4 Travis Nielsen 2020-09-15 17:29:03 UTC

This will be resolved by the PR already in progress. 
https://github.com/openshift/ocs-operator/pull/648

Comment 10 Mudit Agarwal 2020-10-11 13:21:42 UTC

We have two open BZs for the same issue. If the issue reported in https://bugzilla.redhat.com/show_bug.cgi?id=1886348 is different from the one originally reported in this BZ, then the current BZ should be moved back to ON_QA and can be verified once https://bugzilla.redhat.com/show_bug.cgi?id=1886348 is fixed.

If both issues are the same, then https://bugzilla.redhat.com/show_bug.cgi?id=1886348 should be closed as a duplicate of this one.

Comment 11 Servesha 2020-10-12 08:03:20 UTC

@Mudit I don't think the two issues are the same. For now we can keep both BZs: the current one and https://bugzilla.redhat.com/show_bug.cgi?id=1886348

Comment 12 Mudit Agarwal 2020-10-12 09:13:32 UTC

Discussed offline with Servesha: this doesn't need any further code change. However, testing is blocked until https://bugzilla.redhat.com/show_bug.cgi?id=1886348 is fixed.

Hence, moving it to MODIFIED. Will move to ON_QA once https://bugzilla.redhat.com/show_bug.cgi?id=1886348 is fixed.

Comment 15 errata-xmlrpc 2020-12-17 06:24:14 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605