Description of problem:

We are setting up DR and troubleshooting an issue with establishing a mirror peer between two clusters. DR is being configured from ACM (hub cluster) across two of our other clusters, running OCP 4.10.18 and ODF 4.10.4. So far we have successfully set up the Submariner add-on from ACM and are now configuring the mirror peer via the ODF Multicluster Orchestrator operator, but the Ceph block pool still reports its mirroring daemon and overall health in a warning state:

mirroringStatus:
  lastChecked: '2022-06-28T14:37:31Z'
  summary:
    daemon_health: WARNING
    health: WARNING
    image_health: OK
    states: {}
phase: Ready

We see errors like the following in the token-exchange pods:

base_controller.go:264] "managedcluster-secret-tokenexchange-controller" controller failed to sync "openshift-storage/odrbucket-6xxxxxx", err: mirrorpeers.multicluster.odf.openshift.io is forbidden: User "system:open-cluster-management:cluster:test:addon:tokenexchange:agent:tokenexchange" cannot list resource "mirrorpeers" in API group "multicluster.odf.openshift.io" at the cluster scope
64 E0629 14:49:20.995205 1 reflector.go:138] pkg/mod/k8s.io/client-go.3/tools/cache/reflector.go:167: Failed to watch *v1alpha1.MirrorPeer: failed to list *v1alpha1.MirrorPeer: mirrorpeers.multicluster.odf.openshift.io is forbidden: User "system:open-cluster-management:cluster:test:addon:tokenexchange:agent:tokenexchange" cannot list resource "mirrorpeers" in API group "multicluster.odf.openshift.io" at the cluster scope

Version-Release number of selected component (if applicable):
ODF 4.10
OCP 4.10.18
RHACM 2.5
Submariner 12

How reproducible:
n/a

Steps to Reproduce:
1.
2.
3.

Actual results:
The token-exchange pods on the two managed clusters do not appear to be able to access the MirrorPeer resource on the ACM hub cluster. The Ceph block pools cannot be mirrored and cannot see the peer cluster.

Expected results:
The OCP/Ceph clusters should mirror each other so that DR can be set up.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
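For reference, a quick way to pull the mirroring status and the MirrorPeer shown above (a sketch; the pool name ocs-storagecluster-cephblockpool is the usual ODF default and may differ in this environment):

# On either managed cluster: mirroring summary reported by the CephBlockPool
oc -n openshift-storage get cephblockpool ocs-storagecluster-cephblockpool \
  -o jsonpath='{.status.mirroringStatus.summary}'

# On the ACM hub: the MirrorPeer created via the Multicluster Orchestrator
oc get mirrorpeer -o yaml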
See this is similar to the following BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2100751
Re-assigning to the ODF team to investigate
Also seeing on the rook-ceph-rbd-mirror pod on both clusters: stderr Failed to find physical volume "/dev/sdb". Log below:

[2022-06-28 17:20:25,282][ceph_volume.main][INFO ] Running command: ceph-volume --log-path /var/log/ceph/ocs-deviceset-drstorage-0-data-XXXX raw prepare --bluestore --data /mnt/ocs-deviceset-drstorage-0-data-XXXX
[2022-06-28 17:20:25,283][ceph_volume.process][INFO ] Running command: /usr/bin/lsblk -plno KNAME,NAME,TYPE
[2022-06-28 17:20:25,292][ceph_volume.process][INFO ] stdout /dev/loop0 /dev/loop0 loop
[2022-06-28 17:20:25,292][ceph_volume.process][INFO ] stdout /dev/sda /dev/sda disk
[2022-06-28 17:20:25,292][ceph_volume.process][INFO ] stdout /dev/sda1 /dev/sda1 part
[2022-06-28 17:20:25,292][ceph_volume.process][INFO ] stdout /dev/sda2 /dev/sda2 part
[2022-06-28 17:20:25,293][ceph_volume.process][INFO ] stdout /dev/sda3 /dev/sda3 part
[2022-06-28 17:20:25,293][ceph_volume.process][INFO ] stdout /dev/sda4 /dev/sda4 part
[2022-06-28 17:20:25,293][ceph_volume.process][INFO ] stdout /dev/sdb /dev/sdb disk
[2022-06-28 17:20:25,301][ceph_volume.process][INFO ] Running command: /usr/sbin/lvs --noheadings --readonly --separator=";" -a --units=b --nosuffix -S lv_path=/mnt/ocs-deviceset-drstorage-0-data-XXXX -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2022-06-28 17:20:25,443][ceph_volume.process][INFO ] Running command: /usr/bin/lsblk --nodeps -P -o NAME,KNAME,MAJ:MIN,FSTYPE,MOUNTPOINT,LABEL,UUID,RO,RM,MODEL,SIZE,STATE,OWNER,GROUP,MODE,ALIGNMENT,PHY-SEC,LOG-SEC,ROTA,SCHED,TYPE,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,PKNAME,PARTLABEL /mnt/ocs-deviceset-drstorage-0-data-XXXX
[2022-06-28 17:20:25,452][ceph_volume.process][INFO ] stdout NAME="sdb" KNAME="sdb" MAJ:MIN="8:16" FSTYPE="" MOUNTPOINT="" LABEL="" UUID="" RO="0" RM="0" MODEL="Virtual disk " SIZE="500G" STATE="running" OWNER="root" GROUP="disk" MODE="brw-rw----" ALIGNMENT="0" PHY-SEC="512" LOG-SEC="512" ROTA="1" SCHED="mq-deadline" TYPE="disk" DISC-ALN="0" DISC-GRAN="0B" DISC-MAX="0B" DISC-ZERO="0" PKNAME="" PARTLABEL=""
[2022-06-28 17:20:25,453][ceph_volume.process][INFO ] Running command: /usr/sbin/blkid -c /dev/null -p /mnt/ocs-deviceset-drstorage-0-data-XXXX
[2022-06-28 17:20:25,460][ceph_volume.process][INFO ] Running command: /usr/sbin/pvs --noheadings --readonly --units=b --nosuffix --separator=";" -o vg_name,pv_count,lv_count,vg_attr,vg_extent_count,vg_free_count,vg_extent_size /mnt/ocs-deviceset-drstorage-0-data-XXXX
[2022-06-28 17:20:25,598][ceph_volume.process][INFO ] stderr Failed to find physical volume "/dev/sdb".
[2022-06-28 17:20:25,599][ceph_volume.util.disk][INFO ] opening device /mnt/ocs-deviceset-drstorage-0-data-XXXX to check for BlueStore label
[2022-06-28 17:20:25,599][ceph_volume.util.disk][INFO ] opening device /mnt/ocs-deviceset-drstorage-0-data-XXXX to check for BlueStore label
[2022-06-28 17:20:25,600][ceph_volume.process][INFO ] Running command: /usr/sbin/udevadm info --query=property /mnt/ocs-deviceset-drstorage-0-data-XXXX
[2022-06-28 17:20:25,716][ceph_volume.process][INFO ] stderr Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
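If it helps with triage, the backing device referenced by that ceph-volume run can be inspected directly from the node; a sketch, with <node-name> as a placeholder for the worker hosting that OSD:

# Confirm /dev/sdb is visible on the host and whether any LVM metadata exists on it
oc debug node/<node-name> -- chroot /host lsblk -f /dev/sdb
oc debug node/<node-name> -- chroot /host pvs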
One thing to add: we are using Globalnet with the Submariner add-on in ACM because our clusters have overlapping CIDRs.
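For what it's worth, a quick way to confirm Globalnet is actually active on each managed cluster (a sketch; the submariner-operator namespace and the Submariner CR name "submariner" are the usual defaults and may differ here):

# Global CIDR assigned to this cluster when Globalnet is enabled
oc -n submariner-operator get submariner submariner -o jsonpath='{.spec.globalCIDR}'

# Or via the subctl CLI, which lists the cluster networks including the Globalnet CIDR
subctl show networks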
There seem to be multiple issues in this setup. The issue with using Globalnet is tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=2104971. We'll document a workaround in that BZ. The issue with RBACs in the token-exchange pods was fixed in version 4.11, so it should no longer be an issue there. Even in 4.10, the missing RBACs have no impact on functionality; only status updates are affected.
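If you want to confirm the denial or quiet the log noise on 4.10 in the meantime, something like the following on the hub should work (a sketch, not the shipped 4.11 fix; the user string is taken from the log above, the "test" cluster name will differ per managed cluster, and the ClusterRole/ClusterRoleBinding names are only illustrative):

# Verify the permission denial reported by the agent (run against the hub)
oc auth can-i list mirrorpeers.multicluster.odf.openshift.io \
  --as=system:open-cluster-management:cluster:test:addon:tokenexchange:agent:tokenexchange

# Optional read-only grant on the hub
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: tokenexchange-mirrorpeer-reader
rules:
- apiGroups: ["multicluster.odf.openshift.io"]
  resources: ["mirrorpeers"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tokenexchange-mirrorpeer-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tokenexchange-mirrorpeer-reader
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: system:open-cluster-management:cluster:test:addon:tokenexchange:agent:tokenexchange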
*** Bug 2100751 has been marked as a duplicate of this bug. ***
*** Bug 2072996 has been marked as a duplicate of this bug. ***
Is there a way I can get added to the bugzilla for globalnet? https://bugzilla.redhat.com/show_bug.cgi?id=2104971
Please fill in the doc text.
*** This bug has been marked as a duplicate of bug 2104971 ***