Back to bug 2136864

Who When What Removed Added
Shyamsundar 2022-10-24 17:23:44 UTC CC srangana
Benamar Mekhissi 2022-10-25 23:02:08 UTC Flags needinfo?(prsurve)
Benamar Mekhissi 2022-10-27 17:00:01 UTC Flags needinfo?(prsurve)
Nir Soffer 2022-10-27 17:54:57 UTC CC nsoffer
Mudit Agarwal 2022-10-29 03:56:35 UTC CC bmekhiss
Status NEW ASSIGNED
Flags needinfo?(bmekhiss)
Sunil Kumar Acharya 2022-11-02 03:27:35 UTC Flags needinfo?(bmekhiss)
Sridhar Gaddam 2022-11-04 06:21:38 UTC Flags needinfo?(bmekhiss)
CC sgaddam
Sridhar Gaddam 2022-11-04 06:22:38 UTC CC nyechiel, yboaron
Nir Yechiel 2022-11-07 08:48:10 UTC Flags needinfo?(prsurve) needinfo?(prsurve) needinfo?(bmekhiss) needinfo?(bmekhiss) needinfo?(bmekhiss) needinfo?(prsurve)
Mudit Agarwal 2022-11-08 00:24:51 UTC Status ASSIGNED MODIFIED
Summary [DR][CEPHFS] volsync-rsync-src pods are in Error state as they are unable to connect to volsync-rsync-dst [ACM Tracker] [DR][CEPHFS] volsync-rsync-src pods are in Error state as they are unable to connect to volsync-rsync-dst
krishnaram Karthick 2022-11-08 09:40:32 UTC CC kramdoss
RHEL Program Management 2022-11-08 10:34:43 UTC Target Release --- ODF 4.12.0
Pratik Surve 2022-12-07 13:16:28 UTC Flags needinfo?(prsurve)
Sunil Kumar Acharya 2022-12-08 12:52:38 UTC Flags needinfo?(bmekhiss)
Karolin Seeger 2022-12-08 12:57:38 UTC CC kseeger
errata-xmlrpc 2022-12-13 11:18:48 UTC Status MODIFIED ON_QA
Red Hat Bugzilla 2022-12-31 19:22:33 UTC QA Contact prsurve kramdoss
Red Hat Bugzilla 2022-12-31 22:36:52 UTC CC nsoffer
Red Hat Bugzilla 2022-12-31 23:45:34 UTC CC kseeger
Red Hat Bugzilla 2023-01-01 05:47:47 UTC CC srangana
Red Hat Bugzilla 2023-01-01 08:30:05 UTC CC bmekhiss
Assignee bmekhiss nobody
Red Hat Bugzilla 2023-01-01 08:32:33 UTC QA Contact kramdoss
CC kramdoss
Pratik Surve 2023-01-02 12:05:13 UTC Assignee nobody prsurve
Pratik Surve 2023-01-02 12:13:40 UTC QA Contact prsurve
Assignee prsurve nobody
Alasdair Kergon 2023-01-04 04:38:31 UTC Assignee nobody bmekhiss
Alasdair Kergon 2023-01-04 04:48:40 UTC CC bmekhiss
Alasdair Kergon 2023-01-04 05:07:00 UTC CC kramdoss
Alasdair Kergon 2023-01-04 05:08:18 UTC CC kseeger
Alasdair Kergon 2023-01-04 05:23:38 UTC CC nsoffer
Alasdair Kergon 2023-01-04 05:46:39 UTC CC srangana
Pratik Surve 2023-01-06 05:20:52 UTC Status ON_QA VERIFIED
Benamar Mekhissi 2023-01-25 14:33:10 UTC Doc Type If docs needed, set a value Known Issue
Doc Text MTU size issues may occur between the source and destination pods, and can be identified by analyzing the VolSync mover pod log output. A VolSync job is created at every scheduled interval to sync the delta change between the source and the destination. The following log output pattern may indicate an MTU size issue.
```
Syncing data to volsync-rsync-dst-busybox-pvc-1.busybox-workloads-1.svc.clusterset.local:22 ...
Connection closed by 172.31.211.240 port 22
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.3]
Syncronization failed. Retrying in 2 seconds. Retry 1/5.
```
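As a rough aid, the failure signature above can be matched mechanically. A minimal sketch, assuming the mover pod log has first been saved to a local file (the file name and the inlined sample lines are assumptions for illustration, standing in for a real `oc logs` capture):

```shell
# Hypothetical sketch: detect the MTU failure signature in a saved copy
# of the VolSync mover pod log. The file name is an assumption; capture
# the real log first, e.g. with `oc logs <mover-pod> > volsync-mover.log`.
log=volsync-mover.log
printf '%s\n' \
  'Connection closed by 172.31.211.240 port 22' \
  'rsync: connection unexpectedly closed (0 bytes received so far) [sender]' \
  'rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.3]' \
  > "$log"   # sample lines standing in for a real captured log
mtu_issue=no
if grep -q 'Connection closed by .* port 22' "$log" &&
   grep -q 'rsync error: unexplained error (code 255)' "$log"; then
  mtu_issue=yes
  echo "log matches the MTU failure pattern"
fi
```

A match is only a hint: code 255 with zero bytes received can have other causes, so treat it as a prompt to try the MTU workaround below, not as proof.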

To work around the issue, add the following annotation:
```
oc annotate node <gateway_node_name> submariner.io/tcp-clamp-mss=<mtuvalue>
```
Where `mtuvalue` is set to a value that is less than the Cluster Network MTU. For example, if the Cluster Network MTU is 1400 bytes, you can set the `tcp-clamp-mss` value to 1300 bytes and adjust it until you find an optimal value that works.
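For choosing a starting point, note that each TCP segment must leave room inside the MTU for at least 40 bytes of IPv4 and TCP headers, plus the overlay's own encapsulation overhead. A minimal sketch of that arithmetic, where the overhead figure is an assumption to tune rather than an authoritative value:

```shell
# Hypothetical sketch: derive a conservative starting tcp-clamp-mss from
# the cluster network MTU. 40 bytes covers minimal IPv4 + TCP headers;
# the overlay overhead figure is an assumption -- reduce the result
# further if syncs still fail.
cluster_mtu=1400        # cluster network MTU from the example above, in bytes
overlay_overhead=60     # assumed tunnel encapsulation overhead, in bytes
clamp=$((cluster_mtu - 40 - overlay_overhead))
echo "starting tcp-clamp-mss candidate: ${clamp} bytes"
```

With these assumed numbers this reproduces the 1300-byte example above; the right value depends on the actual encapsulation in use.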

After adding the annotation, restart the submariner-routeagent pods so that the change takes effect.
You can do that by running the command

```
oc delete pods -n submariner-operator -l app=submariner-routeagent
```
Benamar Mekhissi 2023-01-25 14:34:27 UTC Doc Text (formatting-only edit of the text above: `mtuvalue` and `tcp-clamp-mss` set in code font)
Shyamsundar 2023-01-30 17:33:38 UTC Doc Type Known Issue Bug Fix
Doc Text Cause: Incorrect MTU size configuration for intra-cluster Submariner-based setups

Consequence: CephFS volumes that are DR protected fail to sync their data across clusters

Fix: Identify MTU size issues from the VolSync mover pod log output, apply the `submariner.io/tcp-clamp-mss=<mtuvalue>` node annotation workaround described above, and restart the submariner-routeagent pods

Result: CephFS volumes that are DR protected regularly sync their data across clusters as per the defined schedules
Kusuma 2023-01-30 18:00:44 UTC CC kbg
Doc Text Previously, DR-protected CephFS volumes failed to sync their data across clusters because of an incorrect MTU size configuration in intra-cluster Submariner-based setups. With this update, DR-protected CephFS volumes regularly sync their data across clusters per the defined schedules.
errata-xmlrpc 2023-01-30 22:27:31 UTC Status VERIFIED RELEASE_PENDING
errata-xmlrpc 2023-01-31 00:19:51 UTC Resolution --- ERRATA
Status RELEASE_PENDING CLOSED
Last Closed 2023-01-31 00:19:51 UTC
errata-xmlrpc 2023-01-31 00:20:13 UTC Link ID Red Hat Product Errata RHBA-2023:0551
Elad 2023-08-09 17:00:43 UTC CC odf-bz-bot
