Bug 2072996 - [GSS] ODF 4.9 RBD failed to connect to peer cluster mirror connection timeout
Summary: [GSS] ODF 4.9 RBD failed to connect to peer cluster mirror connection timeout
Keywords:
Status: CLOSED DUPLICATE of bug 2102397
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-dr
Version: 4.9
Hardware: All
OS: All
unspecified
low
Target Milestone: ---
: ---
Assignee: umanga
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-04-07 12:41 UTC by Levy Sant'Anna
Modified: 2023-08-09 17:00 UTC
12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-07-12 14:31:39 UTC
Embargoed:



Comment 2 Madhu Rajanna 2022-04-11 05:45:41 UTC
>: failed to reconcile. failed to add ceph rbd mirror peer: failed to import bootstrap peer token: failed to add rbd-mirror peer token for pool "ocs-storagecluster-cephblockpool". . 2022-03-25T19:24:13.048+0000 7f82936dc2c0 -1 librbd::api::Mirror: peer_bootstrap_import: failed to connect to peer cluster: (110) Connection timed out


This is most likely due to a network issue: rbd is unable to import the bootstrap token from the remote cluster.

Mons on the ocpa cluster:

a=172.40.126.252:6789,b=172.40.195.16:6789,c=172.40.111.218:6789

Mons on the ocpb cluster:

a=172.30.186.194:6789,b=172.30.244.89:6789,c=172.30.161.165:6789
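The mon lists above use a comma-separated `name=ip:port` form. As a small aid for scripting the connectivity checks discussed in this bug, the sketch below parses such a string into endpoints. It assumes only the format shown here; `parse_mon_endpoints` and `MonEndpoint` are hypothetical names, not part of Rook or Ceph.

```python
from typing import List, NamedTuple


class MonEndpoint(NamedTuple):
    """One Ceph monitor: its letter name, IP address and port."""
    name: str
    host: str
    port: int


def parse_mon_endpoints(spec: str) -> List[MonEndpoint]:
    """Parse a string such as 'a=10.0.0.1:6789,b=10.0.0.2:6789'.

    Whitespace around each entry is ignored, so pasted output with
    irregular spacing is still accepted.
    """
    endpoints = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, _, addr = entry.partition("=")
        host, _, port = addr.rpartition(":")
        if not name or not host or not port.isdigit():
            raise ValueError(f"malformed mon entry: {entry!r}")
        endpoints.append(MonEndpoint(name.strip(), host.strip(), int(port)))
    return endpoints


if __name__ == "__main__":
    ocpa = "a=172.40.126.252:6789,b=172.40.195.16:6789,c=172.40.111.218:6789"
    for mon in parse_mon_endpoints(ocpa):
        print(f"{mon.name}: {mon.host}:{mon.port}")
```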


1) We need to check whether networking is working properly and whether the remote mon endpoints are reachable from the other cluster.
2) Can we get the content of the secret 5f4bdf98d073ced07cbbddf249d9f434aea2df0 (this is the secret name; the secret contains the remote cluster details) from the ocpb cluster?
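Check 1 can be sketched as follows, assuming Python 3 is available on a host or pod in the source cluster. It attempts a TCP connection to each remote mon and separates a timeout (the same symptom as the `(110) Connection timed out` error above; errno 110 is ETIMEDOUT on Linux) from a refused connection, which would mean the host is reachable but nothing listens on the port. The helper names `probe` and `check_all` are hypothetical.

```python
import socket
from typing import Dict, Iterable, Tuple


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a TCP connection and classify the outcome.

    'ok'      - the TCP handshake completed.
    'timeout' - no answer within `timeout` seconds, which often points to a
                firewall dropping packets or a missing route; this is the same
                symptom as librbd's "(110) Connection timed out".
    'refused' - the host replied with a reset: it is reachable, but nothing
                is listening on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except (socket.timeout, TimeoutError):
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:  # e.g. "No route to host"
        return f"error: {exc}"


def check_all(endpoints: Iterable[Tuple[str, int]], timeout: float = 5.0) -> Dict[str, str]:
    """Probe every (host, port) pair and map "host:port" to its status."""
    return {f"{host}:{port}": probe(host, port, timeout) for host, port in endpoints}


if __name__ == "__main__":
    # Mon endpoints of the ocpb cluster, probed from the ocpa side.
    ocpb_mons = [("172.30.186.194", 6789), ("172.30.244.89", 6789), ("172.30.161.165", 6789)]
    for endpoint, status in check_all(ocpb_mons).items():
        print(endpoint, status)
```

Running it once from each cluster against the other cluster's mons covers both directions, since mirroring peers need to reach each other.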


Moving it to Rook, as Rook is responsible for setting up the mirroring.

Comment 3 Sébastien Han 2022-04-14 15:29:52 UTC
There are some questions from the support team that have not been answered yet.
Depending on the answers, an engineering investigation may not be needed.

What kind of help are you expecting from eng that we did not get from support yet?
This BZ seems a bit premature to me.

Has anyone done some simple connectivity tests between both clusters?
Thanks!

Comment 5 Sébastien Han 2022-04-19 07:25:26 UTC
Unfortunately, ping does not help here. We need to test connectivity to the mons directly, using either curl or telnet, for example:

curl <mon-ip>:<port>

and see whether we get a reply.
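A bare TCP connect only proves that the port is open; looking at what the server sends back, as curl or telnet would show, says more. The sketch below connects and reads the first bytes from the server. It assumes that a Ceph mon greets clients with a messenger banner starting with `ceph v` (`ceph v027` for msgr v1 on port 6789); treat that as an assumption to verify on your cluster. `recv_up_to`, `read_banner` and `looks_like_ceph_mon` are hypothetical names.

```python
import socket

# Assumption: Ceph messenger banners begin with these bytes
# (b"ceph v027" for msgr v1 on port 6789, b"ceph v2\n" for msgr2 on 3300).
CEPH_BANNER_PREFIX = b"ceph v"


def recv_up_to(sock: socket.socket, nbytes: int) -> bytes:
    """Read until `nbytes` bytes have arrived or the peer closes the socket."""
    data = b""
    while len(data) < nbytes:
        chunk = sock.recv(nbytes - len(data))
        if not chunk:  # peer closed the connection
            break
        data += chunk
    return data


def read_banner(host: str, port: int, timeout: float = 5.0, nbytes: int = 9) -> bytes:
    """Connect and return the first bytes the server sends.

    Raises socket.timeout if connecting or reading takes longer than `timeout`,
    and ConnectionRefusedError if nothing listens on the port.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return recv_up_to(sock, nbytes)


def looks_like_ceph_mon(banner: bytes) -> bool:
    return banner.startswith(CEPH_BANNER_PREFIX)


if __name__ == "__main__":
    banner = read_banner("172.30.186.194", 6789)
    print(banner, "ceph mon" if looks_like_ceph_mon(banner) else "unexpected reply")
```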
Thanks!

Comment 9 Sébastien Han 2022-04-25 12:47:26 UTC
Were the telnet commands executed from inside the rook-ceph operator pod? That pod is the one adding the peers, so it matters.
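Because the operator pod is the one adding the peer, the connectivity test should run from there. The sketch below only builds an `oc exec` command per remote mon so it can be reviewed before running. It assumes the operator runs as the deployment `rook-ceph-operator` in the `openshift-storage` namespace and that `curl` is present in the operator image; neither is confirmed in this bug. `operator_probe_cmd` is a hypothetical name.

```python
import shlex
from typing import List


def operator_probe_cmd(mon_ip: str, port: int = 6789,
                       namespace: str = "openshift-storage",
                       deployment: str = "rook-ceph-operator",
                       max_time: int = 5) -> List[str]:
    """Build an `oc exec` argv that runs curl from inside the operator pod.

    --max-time bounds the attempt so a dropped connection fails quickly.
    """
    return [
        "oc", "exec", "-n", namespace, f"deploy/{deployment}", "--",
        "curl", "--max-time", str(max_time), f"{mon_ip}:{port}",
    ]


if __name__ == "__main__":
    # Mon endpoints of the ocpb cluster; run against the ocpa cluster's API.
    for ip in ("172.30.186.194", "172.30.244.89", "172.30.161.165"):
        print(shlex.join(operator_probe_cmd(ip)))
```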
Thanks.

Comment 19 Sébastien Han 2022-05-04 09:54:40 UTC
Thanks Kelson and Steve. Great to see that a connection could be established, but it seems the peer was not added.

Can we get:

* the complete rook-ceph-operator log in DEBUG mode (edit the rook-ceph-operator-config ConfigMap and set ROOK_LOG_LEVEL: DEBUG)
* the output of "oc get cephblockpool -n openshift-storage -o yaml"
* logs from the rbd-mirror daemon pod
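The data requested above could be collected with commands like the ones this sketch builds; it only returns argument lists, so they can be reviewed and then run, for example with `subprocess.run`. The merge-patch form used to set `ROOK_LOG_LEVEL`, the `deploy/rook-ceph-operator` name and the `app=rook-ceph-rbd-mirror` label selector are assumptions about this deployment, and `debug_collection_cmds` is a hypothetical name.

```python
import json
import shlex
from typing import List


def debug_collection_cmds(namespace: str = "openshift-storage") -> List[List[str]]:
    """Return `oc` argument lists for the requested debugging data."""
    log_level_patch = json.dumps({"data": {"ROOK_LOG_LEVEL": "DEBUG"}})
    return [
        # Set ROOK_LOG_LEVEL=DEBUG in the operator ConfigMap (merge patch).
        ["oc", "patch", "configmap", "rook-ceph-operator-config",
         "-n", namespace, "--type", "merge", "-p", log_level_patch],
        # Complete operator log (collect after the log level change is active).
        ["oc", "logs", "deploy/rook-ceph-operator", "-n", namespace],
        # CephBlockPool resources as YAML.
        ["oc", "get", "cephblockpool", "-n", namespace, "-o", "yaml"],
        # rbd-mirror daemon logs; the label selector is an assumption.
        # --tail=-1 because `oc logs -l` otherwise shows only the last lines.
        ["oc", "logs", "-l", "app=rook-ceph-rbd-mirror", "-n", namespace,
         "--all-containers", "--tail=-1"],
    ]


if __name__ == "__main__":
    for cmd in debug_collection_cmds():
        print(shlex.join(cmd))
```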

Thanks.

Comment 25 Sébastien Han 2022-05-06 08:19:17 UTC
Thanks Steve. One more thing: can you attach the content of the secret "0c0e4c098c43c0ddd72f93031f92addff6cfc3d"? Thanks.
I wonder whether the information in the secret is correct.
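One way to check whether the secret's information is correct is to decode the bootstrap token it carries. This sketch assumes the token is base64-encoded JSON with `fsid`, `client_id`, `key` and `mon_host` fields, as produced by `rbd mirror pool peer bootstrap create`, and that the token has already been extracted from the Secret (Secret data shown by `oc get secret -o yaml` is base64-encoded one more time). `decode_peer_token` is a hypothetical name.

```python
import base64
import json
from typing import Dict

# Assumption: fields carried by an RBD bootstrap peer token.
REQUIRED_KEYS = ("fsid", "client_id", "key", "mon_host")


def decode_peer_token(token: str) -> Dict[str, str]:
    """Decode a bootstrap peer token and check that the expected fields are set.

    Raises ValueError if the token is not base64-encoded JSON or if a
    required field is missing or empty.
    """
    try:
        payload = json.loads(base64.b64decode(token.strip(), validate=True))
    except ValueError as exc:  # covers binascii.Error and JSONDecodeError
        raise ValueError(f"token is not base64-encoded JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("token JSON is not an object")
    missing = [k for k in REQUIRED_KEYS if not payload.get(k)]
    if missing:
        raise ValueError(f"token is missing fields: {', '.join(missing)}")
    return payload


if __name__ == "__main__":
    import sys
    info = decode_peer_token(sys.stdin.read())
    print("fsid:", info["fsid"])
    print("mon_host:", info["mon_host"])
```

If decoding succeeds, `mon_host` should list mon addresses of the peer cluster that are actually reachable from this cluster, such as the ocpa/ocpb lists in comment 2.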

Also, have you tried adding the peer "manually" while logged in to the rook operator pod?
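For a manual attempt, `rbd mirror pool peer bootstrap import` takes the pool name and the path of a file holding the token. The sketch writes the token to a private temporary file and builds the command. The helper names are hypothetical, the default `--direction rx-tx` is an assumption, and any `--conf`/`--keyring` arguments needed to reach the local cluster from inside the operator pod are deliberately left out.

```python
import os
import tempfile
from typing import List, Optional


def write_token(token: str) -> str:
    """Write the token to a new temp file (created with mode 0600) and return its path."""
    fd, path = tempfile.mkstemp(prefix="rbd-peer-", suffix=".token")
    with os.fdopen(fd, "w") as fh:
        fh.write(token.strip())
    return path


def bootstrap_import_cmd(pool: str, token_path: str,
                         site_name: Optional[str] = None,
                         direction: str = "rx-tx") -> List[str]:
    """Build an `rbd mirror pool peer bootstrap import` argument list."""
    if direction not in ("rx-only", "rx-tx"):
        raise ValueError(f"unsupported direction: {direction!r}")
    cmd = ["rbd", "mirror", "pool", "peer", "bootstrap", "import"]
    if site_name:
        cmd += ["--site-name", site_name]
    cmd += ["--direction", direction, pool, token_path]
    return cmd


if __name__ == "__main__":
    import shlex
    import sys
    path = write_token(sys.stdin.read())
    print(shlex.join(bootstrap_import_cmd("ocs-storagecluster-cephblockpool", path)))
```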

Thanks

Comment 29 Sébastien Han 2022-05-09 10:16:36 UTC
Kelson,

Can you update the secret from the UI? The operator should reconcile it afterward.
Thanks!

Comment 31 Sébastien Han 2022-05-09 15:11:14 UTC
Not sure whether Ramen is the right sub-component, but this is DR. It relates to the component that exchanges connection details between clusters (for peer addition).
In this unsupported case, the secret content was invalid and we are trying to understand why.

Thanks!

