1867130 – [External] Provisioning fails for OCS PVCs when the MON leader is down on the RHCS cluster

Bug 1867130 - [External] Provisioning fails for OCS PVCs when the MON leader is down on the RHCS cluster

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenShift Container Storage
Classification:	Red Hat Storage
Component:	rook
Sub Component:
Version:	4.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	OCS 4.5.0
Assignee:	Sébastien Han
QA Contact:	Rachael
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-08-07 12:31 UTC by Rachael
Modified:	2020-09-23 09:06 UTC
CC List:	7 users
Fixed In Version:	4.5.0-526.ci
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-09-15 10:18:38 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	rook rook pull 6018	0	None	closed	ceph: fix health check for external cluster	2020-12-16 06:16:23 UTC
Red Hat Product Errata	RHBA-2020:3754	0	None	None	None	2020-09-15 10:19:02 UTC

Comment 4 Travis Nielsen 2020-08-07 14:09:47 UTC

Rook only requires a single mon to seed the external cluster, but then it is expected that Rook will query for the full mon quorum periodically and update the configmap with the full set of mons.
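
To illustrate the refresh described above, here is a minimal sketch, not Rook's actual code. It assumes the quorum status is available as a dict with "quorum_names" and "monmap" -> "mons" entries (a simplified, assumed layout modeled on Ceph's quorum status output) and that the configmap stores the mon list as a comma-separated "name=address" string under a "data" key. The function names, sample addresses, and the choice to keep only mons currently in quorum are hypothetical.

```python
# Hypothetical sketch of "seed with one mon, then refresh from quorum".
# The dict layouts below are assumptions, not a verified Rook or Ceph schema.

def mons_in_quorum(quorum_status):
    """Return {mon_name: address} for every mon currently in quorum."""
    in_quorum = set(quorum_status["quorum_names"])
    endpoints = {}
    for mon in quorum_status["monmap"]["mons"]:
        if mon["name"] in in_quorum:
            # Drop the trailing "/nonce" suffix from the address.
            endpoints[mon["name"]] = mon["public_addr"].split("/")[0]
    return endpoints


def refresh_endpoints(configmap, quorum_status):
    """Rewrite the configmap's mon list from the current quorum.

    Returns True if the stored value changed.
    """
    endpoints = mons_in_quorum(quorum_status)
    if not endpoints:
        # No quorum information: keep the last known mons untouched.
        return False
    new_value = ",".join(f"{name}={addr}" for name, addr in sorted(endpoints.items()))
    changed = configmap.get("data") != new_value
    configmap["data"] = new_value
    return changed


# The configmap starts with only the single seed mon "a".
configmap = {"data": "a=10.0.0.1:6789"}

# Later, mon "a" (the former leader) is down; "b" and "c" form the quorum.
status = {
    "quorum_names": ["b", "c"],
    "quorum_leader_name": "b",
    "monmap": {
        "mons": [
            {"name": "a", "public_addr": "10.0.0.1:6789/0"},
            {"name": "b", "public_addr": "10.0.0.2:6789/0"},
            {"name": "c", "public_addr": "10.0.0.3:6789/0"},
        ]
    },
}

changed = refresh_endpoints(configmap, status)
print(changed, configmap["data"])
```

If this periodic refresh never runs, the configmap keeps pointing only at the seed mon, so clients have no other endpoint to fall back on once that mon goes down.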

@Rachael Was the mon down since OCS first connected, or did the mon go down after that initial connection? 

The rook operator log [1] is full of these messages. Since it reports a successful connection, it is unclear why it still skipped the mon health check.

2020-08-03T08:35:13.798015964Z 2020-08-03 08:35:13.797974 W | op-mon: skipping mon health check since cluster details are not initialized
2020-08-03T08:35:24.325634174Z 2020-08-03 08:35:24.325596 I | op-config: CephCluster "openshift-storage" status: "Connected". "Cluster connected successfully"


[1] http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/BZ-1867130/must-gather.local.9029136130735763706/quay-io-rhceph-dev-ocs-must-gather-sha256-6639c93bf3de91c2a71d069d2e2516d6b4aa894b8101852abb8a0aafccfa976c/ceph/namespaces/openshift-storage/pods/rook-ceph-operator-66c694f9b9-c5jjw/rook-ceph-operator/rook-ceph-operator/logs/current.log

Comment 6 Sébastien Han 2020-08-10 09:48:54 UTC

Neha, the cm "regression" is due to the bug and will be solved by my current PR.
Expect this to land today.

Thanks

Comment 12 errata-xmlrpc 2020-09-15 10:18:38 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3754
