Bug 2102397 - OpenShift Regional Disaster Recovery with Advanced Cluster Management
Summary: OpenShift Regional Disaster Recovery with Advanced Cluster Management
Keywords:
Status: CLOSED DUPLICATE of bug 2104971
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-dr
Version: 4.10
Hardware: All
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: umanga
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Duplicates: 2072996 2100751
Depends On:
Blocks: 2094357
 
Reported: 2022-06-29 20:50 UTC by rskruhak
Modified: 2023-08-09 17:00 UTC
CC: 13 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
.Ceph does not recognize the global IPs assigned by Globalnet
Because Ceph does not recognize the global IPs assigned by Globalnet, the disaster recovery solution cannot be configured between clusters with overlapping service CIDRs that rely on Globalnet. As a result, disaster recovery does not work when the service `CIDR` ranges overlap.
Clone Of:
Environment:
Last Closed: 2022-10-29 03:39:11 UTC
Embargoed:



Description rskruhak 2022-06-29 20:50:00 UTC
Description of problem:
We are setting up DR and troubleshooting an issue with establishing a peer mirror between two clusters. DR is being configured from ACM (the hub cluster) across two of our other clusters, running OCP 4.10.18 and ODF 4.10.4. So far we have successfully set up the Submariner add-on from ACM and are configuring the peer mirror via the ODF Multicluster Orchestrator operator, but the Ceph block pool daemon and health still show a warning state:
mirroringStatus:
    lastChecked: '2022-06-28T14:37:31Z'
    summary:
      daemon_health: WARNING
      health: WARNING
      image_health: OK
      states: {}
  phase: Ready
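
For reference, this status can be read straight off the CephBlockPool CR; a minimal sketch, assuming the default ODF pool name ocs-storagecluster-cephblockpool in the openshift-storage namespace:

# List the pools, then pull the mirroring summary Rook publishes on the CR status
oc get cephblockpool -n openshift-storage
oc get cephblockpool ocs-storagecluster-cephblockpool -n openshift-storage \
  -o jsonpath='{.status.mirroringStatus.summary}{"\n"}'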

We see some errors in the token-exchange pods: 

base_controller.go:264] "managedcluster-secret-tokenexchange-controller" controller failed to sync "openshift-storage/odrbucket-6xxxxxx", err: mirrorpeers.multicluster.odf.openshift.io is forbidden: User "system:open-cluster-management:cluster:test:addon:tokenexchange:agent:tokenexchange" cannot list resource "mirrorpeers" in API group "multicluster.odf.openshift.io" at the cluster scope
E0629 14:49:20.995205 1 reflector.go:138] pkg/mod/k8s.io/client-go.3/tools/cache/reflector.go:167: Failed to watch *v1alpha1.MirrorPeer: failed to list *v1alpha1.MirrorPeer: mirrorpeers.multicluster.odf.openshift.io is forbidden: User "system:open-cluster-management:cluster:test:addon:tokenexchange:agent:tokenexchange" cannot list resource "mirrorpeers" in API group "multicluster.odf.openshift.io" at the cluster scope
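
The error indicates that the tokenexchange addon agent's user lacks list/watch permission on mirrorpeers at the cluster scope. As a rough illustration only (the ClusterRole/ClusterRoleBinding names are hypothetical, and this is not necessarily how the 4.11 fix mentioned later is implemented), RBAC of this shape on the hub cluster would satisfy the request shown in the error:

cat <<'EOF' | oc apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: tokenexchange-mirrorpeer-reader    # hypothetical name
rules:
- apiGroups: ["multicluster.odf.openshift.io"]
  resources: ["mirrorpeers"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tokenexchange-mirrorpeer-reader    # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tokenexchange-mirrorpeer-reader
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: system:open-cluster-management:cluster:test:addon:tokenexchange:agent:tokenexchange    # user from the error above
EOF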


Version-Release number of selected component (if applicable):
ODF 4.10
OCP 4.10.18
RHACM 2.5
Submariner 0.12
How reproducible:
n/a

Steps to Reproduce:
1.
2.
3.

Actual results:
The token-exchange pods on the two managed clusters do not appear to be able to access the MirrorPeer resource on the ACM hub cluster. The Ceph block pools cannot be mirrored and cannot see the peer cluster.

Expected results:
The OCP/Ceph clusters should be able to mirror each other so that DR can be set up.
Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

Comment 1 rskruhak 2022-06-29 21:01:42 UTC
See this is similar to the following BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=2100751

Comment 2 Jonathan Dobson 2022-06-29 22:49:19 UTC
Re-assigning to the ODF team to investigate

Comment 4 ncho 2022-07-06 12:23:32 UTC
Also seeing the following on the rook-ceph-rbd-mirror pod on both clusters: stderr Failed to find physical volume "/dev/sdb"

The relevant log is below:

[2022-06-28 17:20:25,282][ceph_volume.main][INFO  ] Running command: ceph-volume --log-path /var/log/ceph/ocs-deviceset-drstorage-0-data-XXXX raw prepare --bluestore --data /mnt/ocs-deviceset-drstorage-0-data-XXXX
[2022-06-28 17:20:25,283][ceph_volume.process][INFO  ] Running command: /usr/bin/lsblk -plno KNAME,NAME,TYPE
[2022-06-28 17:20:25,292][ceph_volume.process][INFO  ] stdout /dev/loop0 /dev/loop0 loop
[2022-06-28 17:20:25,292][ceph_volume.process][INFO  ] stdout /dev/sda   /dev/sda   disk
[2022-06-28 17:20:25,292][ceph_volume.process][INFO  ] stdout /dev/sda1  /dev/sda1  part
[2022-06-28 17:20:25,292][ceph_volume.process][INFO  ] stdout /dev/sda2  /dev/sda2  part
[2022-06-28 17:20:25,293][ceph_volume.process][INFO  ] stdout /dev/sda3  /dev/sda3  part
[2022-06-28 17:20:25,293][ceph_volume.process][INFO  ] stdout /dev/sda4  /dev/sda4  part
[2022-06-28 17:20:25,293][ceph_volume.process][INFO  ] stdout /dev/sdb   /dev/sdb   disk
[2022-06-28 17:20:25,301][ceph_volume.process][INFO  ] Running command: /usr/sbin/lvs --noheadings --readonly --separator=";" -a --units=b --nosuffix -S lv_path=/mnt/ocs-deviceset-drstorage-0-data-XXXX -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2022-06-28 17:20:25,443][ceph_volume.process][INFO  ] Running command: /usr/bin/lsblk --nodeps -P -o NAME,KNAME,MAJ:MIN,FSTYPE,MOUNTPOINT,LABEL,UUID,RO,RM,MODEL,SIZE,STATE,OWNER,GROUP,MODE,ALIGNMENT,PHY-SEC,LOG-SEC,ROTA,SCHED,TYPE,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,PKNAME,PARTLABEL /mnt/ocs-deviceset-drstorage-0-data-XXXX
[2022-06-28 17:20:25,452][ceph_volume.process][INFO  ] stdout NAME="sdb" KNAME="sdb" MAJ:MIN="8:16" FSTYPE="" MOUNTPOINT="" LABEL="" UUID="" RO="0" RM="0" MODEL="Virtual disk    " SIZE="500G" STATE="running" OWNER="root" GROUP="disk" MODE="brw-rw----" ALIGNMENT="0" PHY-SEC="512" LOG-SEC="512" ROTA="1" SCHED="mq-deadline" TYPE="disk" DISC-ALN="0" DISC-GRAN="0B" DISC-MAX="0B" DISC-ZERO="0" PKNAME="" PARTLABEL=""
[2022-06-28 17:20:25,453][ceph_volume.process][INFO  ] Running command: /usr/sbin/blkid -c /dev/null -p /mnt/ocs-deviceset-drstorage-0-data-XXXX
[2022-06-28 17:20:25,460][ceph_volume.process][INFO  ] Running command: /usr/sbin/pvs --noheadings --readonly --units=b --nosuffix --separator=";" -o vg_name,pv_count,lv_count,vg_attr,vg_extent_count,vg_free_count,vg_extent_size /mnt/ocs-deviceset-drstorage-0-data-XXXX
[2022-06-28 17:20:25,598][ceph_volume.process][INFO  ] stderr Failed to find physical volume "/dev/sdb".
[2022-06-28 17:20:25,599][ceph_volume.util.disk][INFO  ] opening device /mnt/ocs-deviceset-drstorage-0-data-XXXX to check for BlueStore label
[2022-06-28 17:20:25,599][ceph_volume.util.disk][INFO  ] opening device /mnt/ocs-deviceset-drstorage-0-data-XXXX to check for BlueStore label
[2022-06-28 17:20:25,600][ceph_volume.process][INFO  ] Running command: /usr/sbin/udevadm info --query=property /mnt/ocs-deviceset-drstorage-0-data-XXXX
[2022-06-28 17:20:25,716][ceph_volume.process][INFO  ] stderr Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
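
The pvs error above suggests ceph-volume could not resolve the physical volume backing the deviceset. As a quick sanity check (the node name is a placeholder), the device can be inspected directly from the node:

# Open a debug shell on the node that ran the OSD prepare job and inspect /dev/sdb
oc debug node/<node-name> -- chroot /host lsblk /dev/sdb
oc debug node/<node-name> -- chroot /host pvs
oc debug node/<node-name> -- chroot /host blkid -p /dev/sdb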

Comment 8 rskruhak 2022-07-07 17:27:07 UTC
One thing to add: we are using Globalnet with the Submariner add-on in ACM because our clusters have overlapping CIDRs.
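
For context, the overlap can be confirmed by comparing the service networks of the two managed clusters; a minimal sketch, with the kubeconfig contexts as placeholders:

# Run once against each managed cluster
oc --context <cluster-1> get network.config.openshift.io cluster \
  -o jsonpath='{.spec.serviceNetwork}{"\n"}'
oc --context <cluster-2> get network.config.openshift.io cluster \
  -o jsonpath='{.spec.serviceNetwork}{"\n"}'
# Identical or overlapping ranges (for example 172.30.0.0/16 on both) are what
# Globalnet is working around here.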

Comment 10 umanga 2022-07-12 14:23:18 UTC
There seem to be multiple issues in this setup.

The issue with using Globalnet is tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=2104971. We'll document a workaround in that BZ.

The issue with RBACs in the token-exchange pods was fixed in version 4.11, so it should no longer be an issue.
Even in 4.10, without the RBACs, there is no impact on functionality; only status updates are affected.

Comment 11 umanga 2022-07-12 14:28:23 UTC
*** Bug 2100751 has been marked as a duplicate of this bug. ***

Comment 12 umanga 2022-07-12 14:31:39 UTC
*** Bug 2072996 has been marked as a duplicate of this bug. ***

Comment 13 rskruhak 2022-07-12 20:08:55 UTC
Is there a way I can get added to the Bugzilla for Globalnet?
https://bugzilla.redhat.com/show_bug.cgi?id=2104971

Comment 15 Mudit Agarwal 2022-08-11 05:09:06 UTC
Please fill in the doc text.

Comment 17 Mudit Agarwal 2022-10-29 03:39:11 UTC

*** This bug has been marked as a duplicate of bug 2104971 ***

