Bug 2166354
| Summary: | [RDR][CEPHFS][Tracker] sync/replication is getting stopped for some pvc rsync: connection unexpectedly closed | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Pratik Surve <prsurve> |
| Component: | odf-dr | Assignee: | Vishal Thapar <vthapar> |
| odf-dr sub component: | ramen | QA Contact: | krishnaram Karthick <kramdoss> |
| Status: | ASSIGNED --- | Docs Contact: | |
| Severity: | unspecified | ||
| Priority: | unspecified | CC: | bmekhiss, kseeger, muagarwa, nyechiel, odf-bz-bot, rtalur, vthapar |
| Version: | 4.12 | Flags: | vthapar: needinfo-, vthapar: needinfo- |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Known Issue |
| Doc Text: |
Cause:
EndpointSlices synced to the broker do not include the source namespace and are all stored in the broker namespace. When two different clusters export the same service name from different namespaces, only one of the resulting EndpointSlices can exist in the broker namespace.
Consequence:
Only one EndpointSlice is synced to the remote cluster. As the different clusters keep syncing their EndpointSlices, the one in the broker keeps flipping between the services from the different clusters. DNS queries for whichever service's EndpointSlice is not currently synced fail.
Workaround (if any):
Do not export services that have the same name but live in different namespaces in different clusters. If it is essentially the same service, use the same namespace; if not, use a different service name.
Result:
Queries to one of the services can fail intermittently.
|
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
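The workaround in the Doc Text can be checked before services are exported. The following is a minimal Python sketch, not part of Submariner or ODF. It assumes you already have the exported services as (cluster, namespace, service) entries; the `Export` type, the function name and the sample data are hypothetical. It flags every service name that is exported from more than one namespace, which is the pattern the Doc Text says to avoid.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set


class Export(NamedTuple):
    """One exported service (hypothetical representation)."""
    cluster: str
    namespace: str
    service: str


def conflicting_service_names(exports: Iterable[Export]) -> Dict[str, Set[str]]:
    """Return {service name: namespaces} for names exported from 2+ namespaces."""
    namespaces_by_name: Dict[str, Set[str]] = defaultdict(set)
    for e in exports:
        namespaces_by_name[e.service].add(e.namespace)
    # Only names that appear under more than one namespace are at risk.
    return {name: ns for name, ns in namespaces_by_name.items() if len(ns) > 1}


# Hypothetical sample data.
exports = [
    Export("cluster-a", "app-1", "db"),
    Export("cluster-b", "app-2", "db"),   # same name, different namespace: flagged
    Export("cluster-a", "app-1", "web"),
    Export("cluster-b", "app-1", "web"),  # same name, same namespace: fine
]
print(conflicting_service_names(exports))
```

The check is deliberately conservative: it looks at names across all clusters, matching the Doc Text advice to either reuse the same namespace for what is essentially the same service or pick a different service name.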
Description
Pratik Surve
2023-02-01 14:50:04 UTC
Issue root-caused to be the same as https://github.com/submariner-io/lighthouse/pull/964.

This is the list of ServiceImports on the broker (`kubectl get serviceimports -n submariner-broker | grep vmware-dccp-one`):

| Name | Type | IP | Age |
|---|---|---|---|
| volsync-rsync-dst-dd-io-pvc-1-busybox-workloads-2-vmware-dccp-one | ClusterSetIP | ["172.30.223.46"] | 15d |
| volsync-rsync-dst-dd-io-pvc-1-busybox-workloads-3-vmware-dccp-one | ClusterSetIP | ["172.30.159.0"] | 6d16h |
| volsync-rsync-dst-dd-io-pvc-2-busybox-workloads-2-vmware-dccp-one | ClusterSetIP | ["172.30.193.46"] | 15d |
| volsync-rsync-dst-dd-io-pvc-2-busybox-workloads-3-vmware-dccp-one | ClusterSetIP | ["172.30.221.179"] | 6d16h |
| volsync-rsync-dst-dd-io-pvc-3-busybox-workloads-2-vmware-dccp-one | ClusterSetIP | ["172.30.234.101"] | 15d |
| volsync-rsync-dst-dd-io-pvc-3-busybox-workloads-3-vmware-dccp-one | ClusterSetIP | ["172.30.214.203"] | 6d16h |
| volsync-rsync-dst-dd-io-pvc-4-busybox-workloads-2-vmware-dccp-one | ClusterSetIP | ["172.30.129.69"] | 15d |
| volsync-rsync-dst-dd-io-pvc-4-busybox-workloads-3-vmware-dccp-one | ClusterSetIP | ["172.30.117.108"] | 6d16h |
| volsync-rsync-dst-dd-io-pvc-5-busybox-workloads-2-vmware-dccp-one | ClusterSetIP | ["172.30.169.148"] | 15d |
| volsync-rsync-dst-dd-io-pvc-5-busybox-workloads-3-vmware-dccp-one | ClusterSetIP | ["172.30.18.31"] | 6d16h |
| volsync-rsync-dst-dd-io-pvc-6-busybox-workloads-2-vmware-dccp-one | ClusterSetIP | ["172.30.199.117"] | 15d |
| volsync-rsync-dst-dd-io-pvc-6-busybox-workloads-3-vmware-dccp-one | ClusterSetIP | ["172.30.81.47"] | 6d16h |
| volsync-rsync-dst-dd-io-pvc-7-busybox-workloads-2-vmware-dccp-one | ClusterSetIP | ["172.30.58.224"] | 15d |
| volsync-rsync-dst-dd-io-pvc-7-busybox-workloads-3-vmware-dccp-one | ClusterSetIP | ["172.30.244.154"] | 6d16h |

This is the list of EndpointSlices (`kubectl get endpointslices -n submariner-broker | grep vmware-dccp-one`):

| Name | Address type | Ports | Endpoints | Age |
|---|---|---|---|---|
| volsync-rsync-dst-dd-io-pvc-1-vmware-dccp-one | IPv4 | 8022 | 10.131.1.64 | 15d |
| volsync-rsync-dst-dd-io-pvc-2-vmware-dccp-one | IPv4 | 8022 | 10.131.1.45 | 15d |
| volsync-rsync-dst-dd-io-pvc-3-vmware-dccp-one | IPv4 | 8022 | 10.131.1.70 | 15d |
| volsync-rsync-dst-dd-io-pvc-4-vmware-dccp-one | IPv4 | 8022 | 10.128.3.194 | 15d |
| volsync-rsync-dst-dd-io-pvc-5-vmware-dccp-one | IPv4 | 8022 | 10.131.1.48 | 15d |
| volsync-rsync-dst-dd-io-pvc-6-vmware-dccp-one | IPv4 | 8022 | 10.129.2.138 | 15d |
| volsync-rsync-dst-dd-io-pvc-7-vmware-dccp-one | IPv4 | 8022 | 10.131.1.49 | 15d |

There are 14 ServiceImports (one per PVC in each of busybox-workloads-2 and busybox-workloads-3) but only 7 EndpointSlices, because the EndpointSlice names do not include the namespace, so each name is shared by two services. This causes the EndpointSlice information on the destination cluster to flip. In Lighthouse CoreDNS we also use the namespace information when replying to queries, so depending on which EndpointSlice is currently synced from the broker, queries can fail. I'm not familiar enough with the sync/replication solution to hypothesize why the failure is not more frequent.

A workaround to try for now would be to avoid using the same service name across namespaces. If this workaround works, it will also confirm the issue, and we can work on getting the fix into ACM 2.7. Currently the fix is only in 0.15.0 and won't land until ACM 2.8. Requesting "requires_doc_text", as the fix won't land in the 4.13 timeframe.
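The collapse from 14 ServiceImports to 7 EndpointSlices can be reproduced offline from the names in the two listings. This is a minimal Python sketch, not Lighthouse code. Its naming rules are assumptions inferred only from the listings: ServiceImport names look like `<service>-<namespace>-<cluster>` and broker EndpointSlice names look like `<service>-<cluster>`, with the namespace dropped. The helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

CLUSTER = "vmware-dccp-one"
NAMESPACES = ("busybox-workloads-2", "busybox-workloads-3")
SERVICES = [f"volsync-rsync-dst-dd-io-pvc-{i}" for i in range(1, 8)]

# The 14 ServiceImport names from the listing: 7 services x 2 namespaces.
SERVICE_IMPORTS = [f"{svc}-{ns}-{CLUSTER}" for svc in SERVICES for ns in NAMESPACES]


def endpointslice_name(service_import: str) -> str:
    """Drop the namespace segment, mimicking the observed broker EndpointSlice names."""
    for ns in NAMESPACES:
        marker = f"-{ns}-"
        if marker in service_import:
            service, cluster = service_import.split(marker, 1)
            return f"{service}-{cluster}"
    raise ValueError(f"no known namespace in {service_import!r}")


def find_collisions(service_imports: Iterable[str]) -> Dict[str, List[str]]:
    """Map each broker EndpointSlice name to the ServiceImports competing for it."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for si in service_imports:
        owners[endpointslice_name(si)].append(si)
    return {name: sis for name, sis in owners.items() if len(sis) > 1}


collisions = find_collisions(SERVICE_IMPORTS)
# prints "14 ServiceImports -> 7 contested EndpointSlice names"
print(len(SERVICE_IMPORTS), "ServiceImports ->", len(collisions), "contested EndpointSlice names")
for name, sis in sorted(collisions.items()):
    print(f"{name}: {len(sis)} owners")
```

Grouping on the namespace-stripped name is what exposes the problem: every EndpointSlice name in the broker has two owners, so whichever namespace synced last wins. Keying on a name that keeps the namespace would give 14 distinct names and no collisions.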