Description of problem (please be as detailed as possible and provide log snippets):

The cephblockpool status remains in a WARNING state because replication is not happening between the clusters.

[root@m1301015 ~]# oc get cephblockpool ocs-storagecluster-cephblockpool -n openshift-storage -o jsonpath='{.status.mirroringStatus.summary}{"\n"}'
{"daemon_health":"WARNING","health":"WARNING","image_health":"OK","states":{}}
[root@m1301015 ~]#

Because of this, application failover is not possible, as the data is not being replicated.

Version of all relevant components (if applicable):
ocp: 4.13
odf: 4.13.0-157

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
yes

Is there any workaround available to the best of your knowledge?
no

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?
yes

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Set up a Regional-DR environment with 2 managed clusters and a hub cluster
2. Check the cephblockpool mirroring health
3.

Actual results:
[root@m1301015 ~]# oc get cephblockpool ocs-storagecluster-cephblockpool -n openshift-storage -o jsonpath='{.status.mirroringStatus.summary}{"\n"}'
{"daemon_health":"WARNING","health":"WARNING","image_health":"OK","states":{}}
[root@m1301015 ~]#

Expected results:
All daemons in a healthy state.

Additional info:
must-gather logs are available in Google Drive: https://drive.google.com/file/d/1vGsZWZQ7yA5PGiShZzaDvbWpowCodaQ6/view?usp=sharing
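For further triage on the affected managed cluster, the rbd-mirror daemon and the pool peering can be inspected directly. This is only a sketch: it assumes the default Rook/ODF labels and names (app=rook-ceph-rbd-mirror, the rook-ceph-tools toolbox deployment) and that the toolbox is enabled.

# Confirm the rbd-mirror daemon pod is running
$ oc -n openshift-storage get pods -l app=rook-ceph-rbd-mirror

# From the toolbox: pool info shows the registered peers and mirroring mode,
# pool status --verbose adds daemon health and per-image mirroring states
$ oc -n openshift-storage rsh deploy/rook-ceph-tools
$ rbd mirror pool info ocs-storagecluster-cephblockpool
$ rbd mirror pool status ocs-storagecluster-cephblockpool --verbose

The empty "states" map in the summary above is consistent with no per-image mirroring statuses being reported, so the pool info/status output is a quick way to see whether a peer is registered and whether any images are enabled for mirroring.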
Hi @rtalur, may I know what info is needed? It looks like I don't have permission to view the comment.
Output of dr health from the rook krew plugin:

➜ kubectl rook-ceph -n openshift-storage dr health
Info: fetching the cephblockpools with mirroring enabled
Info: found 'ocs-storagecluster-cephblockpool' cephblockpool with mirroring enabled
Info: running ceph status from peer cluster
timed out
command terminated with exit code 1
Error: failed to get ceph status from peer cluster, please check for network issues between the clusters
Info: running mirroring daemon health
health: WARNING
daemon health: WARNING
image health: OK
images: 0 total

Submariner connection status:

➜ ~ subctl show connections
Cluster "ocsm4204001"
 ✓ Showing Connections
GATEWAY                          CLUSTER       REMOTE IP       NAT   CABLE DRIVER   SUBNETS                        STATUS      RTT avg.
worker-1.ocsm1301015.lnxero1.b   ocsm1301015   172.23.232.81   no    libreswan      172.30.0.0/16, 10.128.0.0/14   connected   4.465786ms

Cluster "ocsm1301015"
 ✓ Showing Connections
GATEWAY                          CLUSTER       REMOTE IP       NAT   CABLE DRIVER   SUBNETS                        STATUS      RTT avg.
worker-0.ocsm4204001.lnxero1.b   ocsm4204001   172.23.232.89   no    libreswan      172.31.0.0/16, 10.132.0.0/14   connected   4.047405ms
➜ ~
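Since the local mirroring daemon reports WARNING and the plugin times out fetching the peer ceph status while the Submariner tunnel itself shows connected, it may be worth confirming that the peer's Ceph services are actually exported and imported across the clusterset. A sketch only, assuming the ODF multicluster setup exports the peer services through Submariner service discovery (the mcs-api ServiceExport/ServiceImport CRDs); run it on each managed cluster:

# Services exported from the ODF namespace
$ oc -n openshift-storage get serviceexports.multicluster.x-k8s.io

# Imports that Lighthouse has synced from the peer cluster
$ oc get serviceimports.multicluster.x-k8s.io -A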
Created attachment 1957521 [details]
submariner operator and gateway logs

Submariner operator and submariner gateway logs attached, as discussed with Raghavendra.
Investigation still ongoing, moving out to 4.14
Created attachment 1959508 [details]
subctl gather logs

Please find below the requested command outputs.

[abdul@m1301015 kubeconfigs]$ subctl diagnose all

Cluster "ocsm1301015"
 ✓ Checking Submariner support for the Kubernetes version
 ✓ Kubernetes version "v1.26.2+22308ca" is supported
 ✓ Checking Submariner support for the CNI network plugin
 ✓ The detected CNI network plugin ("OpenShiftSDN") is supported
 ✓ Trying to detect the Calico ConfigMap
 ✓ Checking gateway connections
 ✓ All connections are established
 ✓ Non-Globalnet deployment detected - checking if cluster CIDRs overlap
 ✓ Clusters do not have overlapping CIDRs
 ⚠ Checking Submariner pods
 ⚠ Pod "submariner-gateway-tm58c" has restarted 11 times
 ⚠ Pod "submariner-metrics-proxy-fvpzs" has restarted 12 times
 ✓ All Submariner pods are up and running
 ✓ Checking if gateway metrics are accessible from non-gateway nodes
 ✓ The gateway metrics are accessible
 ✓ Checking Submariner support for the kube-proxy mode
 ✓ The kube-proxy mode is supported
 ✓ Checking the firewall configuration to determine if intra-cluster VXLAN traffic is allowed
 ✓ The firewall configuration allows intra-cluster VXLAN traffic
 ✓ Globalnet is not installed - skipping
 ✓ Checking if services have been exported properly
 ✓ All services have been exported properly

Cluster "ocsm4204001"
 ✓ Checking Submariner support for the Kubernetes version
 ✓ Kubernetes version "v1.26.2+22308ca" is supported
 ✓ Checking Submariner support for the CNI network plugin
 ✓ The detected CNI network plugin ("OpenShiftSDN") is supported
 ✓ Trying to detect the Calico ConfigMap
 ✓ Checking gateway connections
 ✓ All connections are established
 ✓ Non-Globalnet deployment detected - checking if cluster CIDRs overlap
 ✓ Clusters do not have overlapping CIDRs
 ⚠ Checking Submariner pods
 ⚠ Pod "submariner-gateway-45qnq" has restarted 12 times
 ⚠ Pod "submariner-metrics-proxy-62hr8" has restarted 12 times
 ✓ All Submariner pods are up and running
 ✓ Checking if gateway metrics are accessible from non-gateway nodes
 ✓ The gateway metrics are accessible
 ✓ Checking Submariner support for the kube-proxy mode
 ✓ The kube-proxy mode is supported
 ✓ Checking the firewall configuration to determine if intra-cluster VXLAN traffic is allowed
 ✓ The firewall configuration allows intra-cluster VXLAN traffic
 ✓ Globalnet is not installed - skipping
 ✓ Checking if services have been exported properly
 ✓ All services have been exported properly

Skipping inter-cluster firewall check as it requires two kubeconfigs.
Please run "subctl diagnose firewall inter-cluster" command manually.
[abdul@m1301015 kubeconfigs]$

[abdul@m1301015 ~]$ subctl verify --context ocsm1301015 --tocontext ocsm4204001
? You have specified disruptive verifications (gateway-failover). Are you sure you want to run them?
Yes Performing the following verifications: connectivity, service-discovery, gateway-failover Running Suite: Submariner E2E suite =================================== Random Seed: 1682323277 Will run 44 of 44 specs •• ------------------------------ • [SLOW TEST:72.699 seconds] [discovery] Test Stateful Sets Discovery Across Clusters /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/statefulsets.go:33 when a pod tries to resolve a podname from stateful set in a local cluster /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/statefulsets.go:42 should resolve the pod IP from the local cluster /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/statefulsets.go:43 ------------------------------ • [SLOW TEST:117.772 seconds] [discovery] Test Stateful Sets Discovery Across Clusters /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/statefulsets.go:33 when the number of active pods backing a stateful set changes /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/statefulsets.go:48 should only resolve the IPs from the active pods /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/statefulsets.go:49 ------------------------------ STEP: Creating namespace objects with basename "dataplane-conn-nd" STEP: Generated namespace "e2e-tests-dataplane-conn-nd-wpr2p" in cluster "ocsm1301015" to execute the tests in STEP: Creating namespace "e2e-tests-dataplane-conn-nd-wpr2p" in cluster "ocsm4204001" STEP: Creating a listener pod in cluster "ocsm4204001", which will wait for a handshake over TCP Apr 24 10:05:23.036: INFO: Will send traffic to IP: 10.132.2.104 STEP: Creating a connector pod in cluster "ocsm1301015", which will attempt the specific UUID handshake over TCP STEP: Waiting for the connector pod "tcp-check-podqq7t8" to exit, returning what connector sent Apr 24 10:08:33.127: INFO: Pod "tcp-check-podqq7t8" on node "master-1.ocsm1301015.lnxero1.boe" output: Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connection timed out. Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connection timed out. STEP: Waiting for the listener pod "tcp-check-listenernzddw" to exit, returning what listener sent Apr 24 10:12:33.170: INFO: Connector pod has IP: 10.130.1.48 STEP: Deleting namespace "e2e-tests-dataplane-conn-nd-wpr2p" on cluster "ocsm1301015" STEP: Deleting namespace "e2e-tests-dataplane-conn-nd-wpr2p" on cluster "ocsm4204001" • Failure [435.628 seconds] [dataplane] Basic TCP connectivity tests across clusters without discovery /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:28 when a pod connects via TCP to a remote pod /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:59 when the pod is not on a gateway and the remote pod is not on a gateway /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:67 should have sent the expected data from the pod to the other pod [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:36 Failed to await pod "tcp-check-listenernzddw" finished. 
Pod status is Running Unexpected error: <*errors.errorString | 0xc000244bb0>: { s: "timed out waiting for the condition", } timed out waiting for the condition occurred /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:187 ------------------------------ STEP: Creating namespace objects with basename "dataplane-conn-nd" STEP: Generated namespace "e2e-tests-dataplane-conn-nd-pdvjb" in cluster "ocsm1301015" to execute the tests in STEP: Creating namespace "e2e-tests-dataplane-conn-nd-pdvjb" in cluster "ocsm4204001" STEP: Creating a listener pod in cluster "ocsm4204001", which will wait for a handshake over TCP Apr 24 10:12:38.374: INFO: Will send traffic to IP: 10.133.2.224 STEP: Creating a connector pod in cluster "ocsm1301015", which will attempt the specific UUID handshake over TCP STEP: Waiting for the connector pod "tcp-check-pod2fq4z" to exit, returning what connector sent Apr 24 10:15:48.429: INFO: Pod "tcp-check-pod2fq4z" on node "master-1.ocsm1301015.lnxero1.boe" output: Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connection timed out. Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connection timed out. STEP: Waiting for the listener pod "tcp-check-listener7z778" to exit, returning what listener sent Apr 24 10:19:48.475: INFO: Connector pod has IP: 10.130.1.49 STEP: Deleting namespace "e2e-tests-dataplane-conn-nd-pdvjb" on cluster "ocsm1301015" STEP: Deleting namespace "e2e-tests-dataplane-conn-nd-pdvjb" on cluster "ocsm4204001" • Failure [435.266 seconds] [dataplane] Basic TCP connectivity tests across clusters without discovery /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:28 when a pod connects via TCP to a remote pod /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:59 when the pod is not on a gateway and the remote pod is on a gateway /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:71 should have sent the expected data from the pod to the other pod [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:36 Failed to await pod "tcp-check-listener7z778" finished. Pod status is Running Unexpected error: <*errors.errorString | 0xc000244bb0>: { s: "timed out waiting for the condition", } timed out waiting for the condition occurred /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:187 ------------------------------ STEP: Creating namespace objects with basename "dataplane-conn-nd" STEP: Generated namespace "e2e-tests-dataplane-conn-nd-jn24h" in cluster "ocsm1301015" to execute the tests in STEP: Creating namespace "e2e-tests-dataplane-conn-nd-jn24h" in cluster "ocsm4204001" STEP: Creating a listener pod in cluster "ocsm4204001", which will wait for a handshake over TCP Apr 24 10:19:54.634: INFO: Will send traffic to IP: 10.132.2.106 STEP: Creating a connector pod in cluster "ocsm1301015", which will attempt the specific UUID handshake over TCP STEP: Waiting for the connector pod "tcp-check-pod7xgrg" to exit, returning what connector sent Apr 24 10:23:04.670: INFO: Pod "tcp-check-pod7xgrg" on node "worker-1.ocsm1301015.lnxero1.boe" output: Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connection timed out. Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connection timed out. 
STEP: Waiting for the listener pod "tcp-check-listenerjx29v" to exit, returning what listener sent Apr 24 10:27:04.793: INFO: Connector pod has IP: 10.131.0.50 STEP: Deleting namespace "e2e-tests-dataplane-conn-nd-jn24h" on cluster "ocsm1301015" STEP: Deleting namespace "e2e-tests-dataplane-conn-nd-jn24h" on cluster "ocsm4204001" • Failure [436.330 seconds] [dataplane] Basic TCP connectivity tests across clusters without discovery /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:28 when a pod connects via TCP to a remote pod /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:59 when the pod is on a gateway and the remote pod is not on a gateway /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:75 should have sent the expected data from the pod to the other pod [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:36 Failed to await pod "tcp-check-listenerjx29v" finished. Pod status is Running Unexpected error: <*errors.errorString | 0xc000244bb0>: { s: "timed out waiting for the condition", } timed out waiting for the condition occurred /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:187 ------------------------------ •STEP: Creating namespace objects with basename "dataplane-conn-nd" STEP: Generated namespace "e2e-tests-dataplane-conn-nd-d4sjj" in cluster "ocsm1301015" to execute the tests in STEP: Creating namespace "e2e-tests-dataplane-conn-nd-d4sjj" in cluster "ocsm4204001" STEP: Creating a listener pod in cluster "ocsm4204001", which will wait for a handshake over TCP STEP: Pointing a service ClusterIP to the listener pod in cluster "ocsm4204001" Apr 24 10:27:26.232: INFO: Will send traffic to IP: 172.31.234.61 STEP: Creating a connector pod in cluster "ocsm1301015", which will attempt the specific UUID handshake over TCP STEP: Waiting for the connector pod "tcp-check-podnj8mj" to exit, returning what connector sent Apr 24 10:30:36.342: INFO: Pod "tcp-check-podnj8mj" on node "master-1.ocsm1301015.lnxero1.boe" output: Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connection timed out. Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connection timed out. STEP: Waiting for the listener pod "tcp-check-listenerbvrk8" to exit, returning what listener sent Apr 24 10:34:36.507: INFO: Connector pod has IP: 10.130.1.52 STEP: Deleting namespace "e2e-tests-dataplane-conn-nd-d4sjj" on cluster "ocsm1301015" STEP: Deleting namespace "e2e-tests-dataplane-conn-nd-d4sjj" on cluster "ocsm4204001" ------------------------------ • Failure [436.603 seconds] [dataplane] Basic TCP connectivity tests across clusters without discovery /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:28 when a pod connects via TCP to a remote service /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:84 when the pod is not on a gateway and the remote service is not on a gateway /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:92 should have sent the expected data from the pod to the other pod [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:36 Failed to await pod "tcp-check-listenerbvrk8" finished. 
Pod status is Running Unexpected error: <*errors.errorString | 0xc000244bb0>: { s: "timed out waiting for the condition", } timed out waiting for the condition occurred /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:187 ------------------------------ STEP: Creating namespace objects with basename "dataplane-conn-nd" STEP: Generated namespace "e2e-tests-dataplane-conn-nd-9d8h9" in cluster "ocsm1301015" to execute the tests in STEP: Creating namespace "e2e-tests-dataplane-conn-nd-9d8h9" in cluster "ocsm4204001" STEP: Creating a listener pod in cluster "ocsm4204001", which will wait for a handshake over TCP STEP: Pointing a service ClusterIP to the listener pod in cluster "ocsm4204001" Apr 24 10:34:42.409: INFO: Will send traffic to IP: 172.31.169.251 STEP: Creating a connector pod in cluster "ocsm1301015", which will attempt the specific UUID handshake over TCP STEP: Waiting for the connector pod "tcp-check-podrdxjr" to exit, returning what connector sent Apr 24 10:37:52.456: INFO: Pod "tcp-check-podrdxjr" on node "master-1.ocsm1301015.lnxero1.boe" output: Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connection timed out. Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connection timed out. STEP: Waiting for the listener pod "tcp-check-listenerh49p4" to exit, returning what listener sent Apr 24 10:41:52.484: INFO: Connector pod has IP: 10.130.1.54 STEP: Deleting namespace "e2e-tests-dataplane-conn-nd-9d8h9" on cluster "ocsm1301015" STEP: Deleting namespace "e2e-tests-dataplane-conn-nd-9d8h9" on cluster "ocsm4204001" • Failure [435.849 seconds] [dataplane] Basic TCP connectivity tests across clusters without discovery /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:28 when a pod connects via TCP to a remote service /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:84 when the pod is not on a gateway and the remote service is on a gateway /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:96 should have sent the expected data from the pod to the other pod [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:36 Failed to await pod "tcp-check-listenerh49p4" finished. 
Pod status is Running Unexpected error: <*errors.errorString | 0xc000244bb0>: { s: "timed out waiting for the condition", } timed out waiting for the condition occurred /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:187 ------------------------------ STEP: Creating namespace objects with basename "dataplane-conn-nd" STEP: Generated namespace "e2e-tests-dataplane-conn-nd-h55p4" in cluster "ocsm1301015" to execute the tests in STEP: Creating namespace "e2e-tests-dataplane-conn-nd-h55p4" in cluster "ocsm4204001" STEP: Creating a listener pod in cluster "ocsm4204001", which will wait for a handshake over TCP STEP: Pointing a service ClusterIP to the listener pod in cluster "ocsm4204001" Apr 24 10:41:58.655: INFO: Will send traffic to IP: 172.31.128.208 STEP: Creating a connector pod in cluster "ocsm1301015", which will attempt the specific UUID handshake over TCP STEP: Waiting for the connector pod "tcp-check-podf4bw7" to exit, returning what connector sent Apr 24 10:45:03.684: INFO: Pod "tcp-check-podf4bw7" on node "worker-1.ocsm1301015.lnxero1.boe" output: Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connection timed out. Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connection timed out. STEP: Waiting for the listener pod "tcp-check-listenerzqlpb" to exit, returning what listener sent Apr 24 10:49:03.747: INFO: Connector pod has IP: 10.131.0.52 STEP: Deleting namespace "e2e-tests-dataplane-conn-nd-h55p4" on cluster "ocsm1301015" STEP: Deleting namespace "e2e-tests-dataplane-conn-nd-h55p4" on cluster "ocsm4204001" • Failure [431.259 seconds] [dataplane] Basic TCP connectivity tests across clusters without discovery /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:28 when a pod connects via TCP to a remote service /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:84 when the pod is on a gateway and the remote service is not on a gateway /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:100 should have sent the expected data from the pod to the other pod [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:36 Failed to await pod "tcp-check-listenerzqlpb" finished. Pod status is Running Unexpected error: <*errors.errorString | 0xc000244bb0>: { s: "timed out waiting for the condition", } timed out waiting for the condition occurred /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:187 ------------------------------ • STEP: Creating namespace objects with basename "dataplane-conn-nd" STEP: Generated namespace "e2e-tests-dataplane-conn-nd-52bbj" in cluster "ocsm1301015" to execute the tests in STEP: Creating namespace "e2e-tests-dataplane-conn-nd-52bbj" in cluster "ocsm4204001" STEP: Creating a listener pod in cluster "ocsm4204001", which will wait for a handshake over TCP Apr 24 10:49:24.262: INFO: Will send traffic to IP: 10.132.2.112 STEP: Creating a connector pod in cluster "ocsm1301015", which will attempt the specific UUID handshake over TCP STEP: Waiting for the connector pod "tcp-check-pod5tqfk" to exit, returning what connector sent Apr 24 10:52:29.286: INFO: Pod "tcp-check-pod5tqfk" on node "master-1.ocsm1301015.lnxero1.boe" output: Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connection timed out. 
Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connection timed out. STEP: Waiting for the listener pod "tcp-check-listenercqdnf" to exit, returning what listener sent Apr 24 10:56:29.375: INFO: Connector pod has IP: 172.23.232.78 STEP: Deleting namespace "e2e-tests-dataplane-conn-nd-52bbj" on cluster "ocsm1301015" STEP: Deleting namespace "e2e-tests-dataplane-conn-nd-52bbj" on cluster "ocsm4204001" ------------------------------ • Failure [430.365 seconds] [dataplane] Basic TCP connectivity tests across clusters without discovery /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:28 when a pod with HostNetworking connects via TCP to a remote pod /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:109 when the pod is not on a gateway and the remote pod is not on a gateway /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:117 should have sent the expected data from the pod to the other pod [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:36 Failed to await pod "tcp-check-listenercqdnf" finished. Pod status is Running Unexpected error: <*errors.errorString | 0xc000244bb0>: { s: "timed out waiting for the condition", } timed out waiting for the condition occurred /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:187 ------------------------------ STEP: Creating namespace objects with basename "dataplane-conn-nd" STEP: Generated namespace "e2e-tests-dataplane-conn-nd-hgnqw" in cluster "ocsm1301015" to execute the tests in STEP: Creating namespace "e2e-tests-dataplane-conn-nd-hgnqw" in cluster "ocsm4204001" STEP: Creating a listener pod in cluster "ocsm4204001", which will wait for a handshake over TCP Apr 24 10:56:34.791: INFO: Will send traffic to IP: 10.132.2.114 STEP: Creating a connector pod in cluster "ocsm1301015", which will attempt the specific UUID handshake over TCP STEP: Waiting for the connector pod "tcp-check-pod8lgm2" to exit, returning what connector sent Apr 24 10:59:39.847: INFO: Pod "tcp-check-pod8lgm2" on node "worker-1.ocsm1301015.lnxero1.boe" output: Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connection timed out. Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connection timed out. STEP: Waiting for the listener pod "tcp-check-listener8fhgl" to exit, returning what listener sent Apr 24 11:03:39.920: INFO: Connector pod has IP: 172.23.232.81 STEP: Deleting namespace "e2e-tests-dataplane-conn-nd-hgnqw" on cluster "ocsm1301015" STEP: Deleting namespace "e2e-tests-dataplane-conn-nd-hgnqw" on cluster "ocsm4204001" • Failure [430.546 seconds] [dataplane] Basic TCP connectivity tests across clusters without discovery /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:28 when a pod with HostNetworking connects via TCP to a remote pod /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:109 when the pod is on a gateway and the remote pod is not on a gateway /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:121 should have sent the expected data from the pod to the other pod [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:36 Failed to await pod "tcp-check-listener8fhgl" finished. 
Pod status is Running Unexpected error: <*errors.errorString | 0xc000244bb0>: { s: "timed out waiting for the condition", } timed out waiting for the condition occurred /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:187 ------------------------------ STEP: Creating namespace objects with basename "dataplane-conn-nd" STEP: Generated namespace "e2e-tests-dataplane-conn-nd-h9jr8" in cluster "ocsm1301015" to execute the tests in STEP: Creating namespace "e2e-tests-dataplane-conn-nd-h9jr8" in cluster "ocsm4204001" STEP: Creating a listener pod in cluster "ocsm1301015", which will wait for a handshake over TCP Apr 24 11:03:45.083: INFO: Will send traffic to IP: 10.130.1.57 STEP: Creating a connector pod in cluster "ocsm4204001", which will attempt the specific UUID handshake over TCP STEP: Waiting for the connector pod "tcp-check-podlkjhd" to exit, returning what connector sent Apr 24 11:06:55.227: INFO: Pod "tcp-check-podlkjhd" on node "master-2.ocsm4204001.lnxero1.boe" output: Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connection timed out. Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connection timed out. STEP: Waiting for the listener pod "tcp-check-listener5t6hr" to exit, returning what listener sent Apr 24 11:10:55.268: INFO: Connector pod has IP: 10.132.2.115 STEP: Deleting namespace "e2e-tests-dataplane-conn-nd-h9jr8" on cluster "ocsm1301015" STEP: Deleting namespace "e2e-tests-dataplane-conn-nd-h9jr8" on cluster "ocsm4204001" • Failure [435.326 seconds] [dataplane] Basic TCP connectivity tests across clusters without discovery /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:28 when a pod connects via TCP to a remote pod in reverse direction /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:126 when the pod is not on a gateway and the remote pod is not on a gateway /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:134 should have sent the expected data from the pod to the other pod [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:36 Failed to await pod "tcp-check-listener5t6hr" finished. 
Pod status is Running Unexpected error: <*errors.errorString | 0xc000244bb0>: { s: "timed out waiting for the condition", } timed out waiting for the condition occurred /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:187 ------------------------------ STEP: Creating namespace objects with basename "gateway-redundancy" STEP: Generated namespace "e2e-tests-gateway-redundancy-n7kj7" in cluster "ocsm1301015" to execute the tests in STEP: Creating namespace "e2e-tests-gateway-redundancy-n7kj7" in cluster "ocsm4204001" STEP: Sanity check - find a cluster with only one gateway node STEP: Detected primary cluster "ocsm1301015" with single gateway node STEP: Detected secondary cluster "ocsm4204001" STEP: Found gateway on node "worker-1.ocsm1301015.lnxero1.boe" on "ocsm1301015" STEP: Found submariner gateway pod "submariner-gateway-tm58c" on "ocsm1301015", checking node and HA status labels STEP: Ensuring that the gateway reports as active on "ocsm1301015" STEP: Deleting namespace "e2e-tests-gateway-redundancy-n7kj7" on cluster "ocsm1301015" STEP: Deleting namespace "e2e-tests-gateway-redundancy-n7kj7" on cluster "ocsm4204001" • Failure [240.130 seconds] [redundancy] Gateway fail-over tests /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/redundancy/gateway_failover.go:41 when one gateway node is configured and the submariner gateway pod fails /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/redundancy/gateway_failover.go:47 should start a new submariner gateway pod and be able to connect from another cluster [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/redundancy/gateway_failover.go:48 Failed to await Gateway on "worker-1" with status active and connections UP. gateway not found yet Unexpected error: <*errors.errorString | 0xc000244bb0>: { s: "timed out waiting for the condition", } timed out waiting for the condition occurred /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/framework.go:563 ------------------------------ S [SKIPPING] [0.213 seconds] [redundancy] Gateway fail-over tests /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/redundancy/gateway_failover.go:41 when multiple gateway nodes are configured and fail-over is initiated /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/redundancy/gateway_failover.go:53 should activate the passive gateway and be able to connect from another cluster [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/redundancy/gateway_failover.go:54 Apr 24 11:14:55.485: No cluster found with multiple gateways, skipping the fail-over test... 
/remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/logging.go:60 ------------------------------ S [SKIPPING] [0.146 seconds] [dataplane-globalnet] Basic TCP connectivity tests across overlapping clusters without discovery /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:28 when a pod connects via TCP to the globalIP of a remote service /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:60 when the pod is not on a gateway and the remote service is not on a gateway /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:69 should have sent the expected data from the pod to the other pod [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:37 Apr 24 11:14:55.693: Globalnet is not enabled, skipping the test... /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/logging.go:60 ------------------------------ S [SKIPPING] [0.083 seconds] [dataplane-globalnet] Basic TCP connectivity tests across overlapping clusters without discovery /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:28 when a pod connects via TCP to the globalIP of a remote service /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:60 when the pod is on a gateway and the remote service is not on a gateway /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:73 should have sent the expected data from the pod to the other pod [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:37 Apr 24 11:14:55.831: Globalnet is not enabled, skipping the test... /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/logging.go:60 ------------------------------ S [SKIPPING] [0.147 seconds] [dataplane-globalnet] Basic TCP connectivity tests across overlapping clusters without discovery /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:28 when a pod connects via TCP to the globalIP of a remote service /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:60 when the pod is on a gateway and the remote service is on a gateway /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:77 should have sent the expected data from the pod to the other pod [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:37 Apr 24 11:14:55.899: Globalnet is not enabled, skipping the test... 
/remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/logging.go:60 ------------------------------ S [SKIPPING] [0.190 seconds] [dataplane-globalnet] Basic TCP connectivity tests across overlapping clusters without discovery /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:28 when a pod matching an egress IP namespace selector connects via TCP to the globalIP of a remote service /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:82 when the pod is not on a gateway and the remote service is not on a gateway /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:91 should have sent the expected data from the pod to the other pod [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:37 Apr 24 11:14:56.139: Globalnet is not enabled, skipping the test... /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/logging.go:60 ------------------------------ S [SKIPPING] [0.105 seconds] [dataplane-globalnet] Basic TCP connectivity tests across overlapping clusters without discovery /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:28 when a pod matching an egress IP namespace selector connects via TCP to the globalIP of a remote service /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:82 when the pod is on a gateway and the remote service is on a gateway /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:95 should have sent the expected data from the pod to the other pod [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:37 Apr 24 11:14:56.277: Globalnet is not enabled, skipping the test... /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/logging.go:60 ------------------------------ S [SKIPPING] [0.361 seconds] [dataplane-globalnet] Basic TCP connectivity tests across overlapping clusters without discovery /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:28 when a pod matching an egress IP pod selector connects via TCP to the globalIP of a remote service /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:100 when the pod is not on a gateway and the remote service is not on a gateway /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:109 should have sent the expected data from the pod to the other pod [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:37 Apr 24 11:14:56.553: Globalnet is not enabled, skipping the test... 
/remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/logging.go:60 ------------------------------ S [SKIPPING] [0.445 seconds] [dataplane-globalnet] Basic TCP connectivity tests across overlapping clusters without discovery /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:28 when a pod matching an egress IP pod selector connects via TCP to the globalIP of a remote service /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:100 when the pod is on a gateway and the remote service is on a gateway /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:113 should have sent the expected data from the pod to the other pod [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:37 Apr 24 11:14:56.925: Globalnet is not enabled, skipping the test... /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/logging.go:60 ------------------------------ S [SKIPPING] [0.322 seconds] [dataplane-globalnet] Basic TCP connectivity tests across overlapping clusters without discovery /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:28 when a pod with HostNetworking connects via TCP to the globalIP of a remote service /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:118 when the pod is not on a gateway and the remote service is not on a gateway /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:127 should have sent the expected data from the pod to the other pod [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:37 Apr 24 11:14:57.280: Globalnet is not enabled, skipping the test... /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/logging.go:60 ------------------------------ S [SKIPPING] [0.474 seconds] [dataplane-globalnet] Basic TCP connectivity tests across overlapping clusters without discovery /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:28 when a pod with HostNetworking connects via TCP to the globalIP of a remote service /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:118 when the pod is on a gateway and the remote service is not on a gateway /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:131 should have sent the expected data from the pod to the other pod [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:37 Apr 24 11:14:57.626: Globalnet is not enabled, skipping the test... 
/remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/logging.go:60 ------------------------------ S [SKIPPING] [0.575 seconds] [dataplane-globalnet] Basic TCP connectivity tests across overlapping clusters without discovery /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:28 when a pod connects via TCP to the globalIP of a remote headless service /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:136 when the pod is not on a gateway and the remote service is not on a gateway /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:145 should have sent the expected data from the pod to the other pod [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:37 Apr 24 11:14:58.265: Globalnet is not enabled, skipping the test... /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/logging.go:60 ------------------------------ S [SKIPPING] [0.729 seconds] [dataplane-globalnet] Basic TCP connectivity tests across overlapping clusters without discovery /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:28 when a pod connects via TCP to the globalIP of a remote headless service /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:136 when the pod is on a gateway and the remote service is on a gateway /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:149 should have sent the expected data from the pod to the other pod [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:37 Apr 24 11:14:58.738: Globalnet is not enabled, skipping the test... /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/logging.go:60 ------------------------------ S [SKIPPING] [0.339 seconds] [dataplane-globalnet] Basic TCP connectivity tests across overlapping clusters without discovery /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:28 when a pod connects via TCP to the globalIP of a remote service in reverse direction /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:154 when the pod is not on a gateway and the remote service is not on a gateway /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:163 should have sent the expected data from the pod to the other pod [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_gn_pod_connectivity.go:37 Apr 24 11:14:59.438: Globalnet is not enabled, skipping the test... 
/remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/logging.go:60 ------------------------------ • ------------------------------ • [SLOW TEST:72.808 seconds] [discovery] Test Headless Service Discovery Across Clusters /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/headless_services.go:37 when a pod tries to resolve a headless service which is exported locally and in a remote cluster /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/headless_services.go:46 should resolve the backing pod IPs from both clusters /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/headless_services.go:47 ------------------------------ • [SLOW TEST:97.247 seconds] [discovery] Test Headless Service Discovery Across Clusters /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/headless_services.go:37 when the number of active pods backing a service changes /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/headless_services.go:52 should only resolve the IPs from the active pods /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/headless_services.go:53 ------------------------------ • [SLOW TEST:96.364 seconds] [discovery] Test Headless Service Discovery Across Clusters /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/headless_services.go:37 when the number of active pods backing a service changes /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/headless_services.go:52 should resolve the local pod IPs /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/headless_services.go:57 ------------------------------ • [SLOW TEST:117.916 seconds] [discovery] Test Headless Service Discovery Across Clusters /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/headless_services.go:37 when a pod tries to resolve a headless service in a specific remote cluster by its cluster name /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/headless_services.go:62 should resolve the backing pod IPs from the specified remote cluster /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/headless_services.go:63 ------------------------------ • [SLOW TEST:97.608 seconds] [discovery] Test Service Discovery Across Clusters /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:42 when a pod tries to resolve a service in a remote cluster /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:45 should be able to discover the remote service successfully /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:46 ------------------------------ •• ------------------------------ • [SLOW TEST:73.944 seconds] [discovery] Test Service Discovery Across Clusters /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:42 when there are no active pods for a service /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:62 should not resolve the service /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:63 ------------------------------ S [SKIPPING] [0.372 seconds] [discovery] Test Service Discovery 
Across Clusters /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:42 when there are active pods for a service in only one cluster /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:68 should not resolve the service on the cluster without active pods [It] /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:69 Only two clusters are deployed and hence skipping the test /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:293 ------------------------------ • [SLOW TEST:76.881 seconds] [discovery] Test Service Discovery Across Clusters /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:42 when a pod tries to resolve a service in a specific remote cluster by its cluster name /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:74 should resolve the service on the specified cluster /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:75 ------------------------------ S [SKIPPING] [0.093 seconds] [discovery] Test Service Discovery Across Clusters /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:42 when a pod tries to resolve a service multiple times /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:80 should resolve the service from both the clusters in a round robin fashion [It] /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:81 Only two clusters are deployed and hence skipping the test /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:415 ------------------------------ S [SKIPPING] in Spec Setup (BeforeEach) [0.123 seconds] [discovery] Test Service Discovery Across Clusters /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:42 when one of the clusters with a service is not healthy /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:86 should not resolve that cluster's service IP [BeforeEach] /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:106 Only two clusters are deployed and hence skipping the test /remote-source/app/vendor/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:91 ------------------------------ STEP: Creating namespace objects with basename "route-agent-restart" STEP: Generated namespace "e2e-tests-route-agent-restart-b8s5t" in cluster "ocsm1301015" to execute the tests in STEP: Creating namespace "e2e-tests-route-agent-restart-b8s5t" in cluster "ocsm4204001" STEP: Found node "worker-1.ocsm1301015.lnxero1.boe" on "ocsm1301015" STEP: Found route agent pod "submariner-routeagent-qhkhb" on node "worker-1.ocsm1301015.lnxero1.boe" STEP: Deleting route agent pod "submariner-routeagent-qhkhb" STEP: Found new route agent pod "submariner-routeagent-sbb4k" on node "worker-1.ocsm1301015.lnxero1.boe" STEP: Verifying TCP connectivity from gateway node on "ocsm4204001" to gateway node on "ocsm1301015" STEP: Creating a listener pod in cluster "ocsm1301015", which will wait for a handshake over TCP Apr 24 11:28:05.401: INFO: Will send traffic to IP: 10.131.0.55 STEP: 
Creating a connector pod in cluster "ocsm4204001", which will attempt the specific UUID handshake over TCP STEP: Waiting for the connector pod "tcp-check-pod68w4m" to exit, returning what connector sent Apr 24 11:28:15.601: INFO: Pod "tcp-check-pod68w4m" on node "worker-0.ocsm4204001.lnxero1.boe" output: Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connected to 10.131.0.55:1234. [dataplane] listener says a049cbdb-5e94-4ce9-81c8-2c898928269c Ncat: 3200 bytes sent, 3150 bytes received in 0.03 seconds. STEP: Waiting for the listener pod "tcp-check-listenersmg5q" to exit, returning what listener sent Apr 24 11:28:15.620: INFO: Pod "tcp-check-listenersmg5q" on node "worker-1.ocsm1301015.lnxero1.boe" output: Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Listening on 0.0.0.0:1234 Ncat: Connection from 10.133.2.249. Ncat: Connection from 10.133.2.249:51444. [dataplane] connector says d8ac81f0-2565-4714-8106-3f0a93fb4dd4 Apr 24 11:28:15.620: INFO: Connector pod has IP: 10.133.2.249 STEP: Verifying that the listener got the connector's data and the connector got the listener's data STEP: Verifying the output of listener pod which must contain the source IP STEP: Verifying TCP connectivity from non-gateway node on "ocsm4204001" to non-gateway node on "ocsm1301015" STEP: Creating a listener pod in cluster "ocsm1301015", which will wait for a handshake over TCP Apr 24 11:28:20.662: INFO: Will send traffic to IP: 10.130.1.60 STEP: Creating a connector pod in cluster "ocsm4204001", which will attempt the specific UUID handshake over TCP STEP: Waiting for the connector pod "tcp-check-podzrf86" to exit, returning what connector sent Apr 24 11:31:30.705: INFO: Pod "tcp-check-podzrf86" on node "master-2.ocsm4204001.lnxero1.boe" output: Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connection timed out. Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connection timed out. STEP: Waiting for the listener pod "tcp-check-listenerkbrkx" to exit, returning what listener sent Apr 24 11:35:30.787: INFO: Connector pod has IP: 10.132.2.118 STEP: Deleting namespace "e2e-tests-route-agent-restart-b8s5t" on cluster "ocsm1301015" STEP: Deleting namespace "e2e-tests-route-agent-restart-b8s5t" on cluster "ocsm4204001" • Failure [455.703 seconds] [redundancy] Route Agent restart tests /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/redundancy/route_agent_restart.go:31 when a route agent pod running on a gateway node is restarted /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/redundancy/route_agent_restart.go:34 should start a new route agent pod and be able to connect from another cluster [It] /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/redundancy/route_agent_restart.go:35 Failed to await pod "tcp-check-listenerkbrkx" finished. 
Pod status is Running Unexpected error: <*errors.errorString | 0xc000244bb0>: { s: "timed out waiting for the condition", } timed out waiting for the condition occurred /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:187 ------------------------------ STEP: Creating namespace objects with basename "route-agent-restart" STEP: Generated namespace "e2e-tests-route-agent-restart-dnm49" in cluster "ocsm1301015" to execute the tests in STEP: Creating namespace "e2e-tests-route-agent-restart-dnm49" in cluster "ocsm4204001" STEP: Found node "master-0.ocsm1301015.lnxero1.boe" on "ocsm1301015" STEP: Found route agent pod "submariner-routeagent-s5hbw" on node "master-0.ocsm1301015.lnxero1.boe" STEP: Deleting route agent pod "submariner-routeagent-s5hbw" STEP: Found new route agent pod "submariner-routeagent-gktg6" on node "master-0.ocsm1301015.lnxero1.boe" STEP: Verifying TCP connectivity from gateway node on "ocsm4204001" to gateway node on "ocsm1301015" STEP: Creating a listener pod in cluster "ocsm1301015", which will wait for a handshake over TCP Apr 24 11:35:41.057: INFO: Will send traffic to IP: 10.131.0.56 STEP: Creating a connector pod in cluster "ocsm4204001", which will attempt the specific UUID handshake over TCP STEP: Waiting for the connector pod "tcp-check-podtrpqq" to exit, returning what connector sent Apr 24 11:35:51.109: INFO: Pod "tcp-check-podtrpqq" on node "worker-0.ocsm4204001.lnxero1.boe" output: Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connected to 10.131.0.56:1234. [dataplane] listener says 43221715-80e7-4066-8126-395de28db8b6 Ncat: 3200 bytes sent, 3150 bytes received in 0.02 seconds. STEP: Waiting for the listener pod "tcp-check-listenern6k6z" to exit, returning what listener sent Apr 24 11:35:51.113: INFO: Pod "tcp-check-listenern6k6z" on node "worker-1.ocsm1301015.lnxero1.boe" output: Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Listening on 0.0.0.0:1234 Ncat: Connection from 10.133.2.251. Ncat: Connection from 10.133.2.251:56952. [dataplane] connector says 34a22e25-a85f-45ae-afc0-e8bc1de3ad79 Apr 24 11:35:51.113: INFO: Connector pod has IP: 10.133.2.251 STEP: Verifying that the listener got the connector's data and the connector got the listener's data STEP: Verifying the output of listener pod which must contain the source IP STEP: Verifying TCP connectivity from non-gateway node on "ocsm4204001" to non-gateway node on "ocsm1301015" STEP: Creating a listener pod in cluster "ocsm1301015", which will wait for a handshake over TCP Apr 24 11:35:56.172: INFO: Will send traffic to IP: 10.130.1.62 STEP: Creating a connector pod in cluster "ocsm4204001", which will attempt the specific UUID handshake over TCP STEP: Waiting for the connector pod "tcp-check-podshlfr" to exit, returning what connector sent Apr 24 11:39:06.252: INFO: Pod "tcp-check-podshlfr" on node "master-2.ocsm4204001.lnxero1.boe" output: Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connection timed out. Ncat: Version 7.70 ( https://nmap.org/ncat ) Ncat: Connection timed out. 
STEP: Waiting for the listener pod "tcp-check-listenercx2dw" to exit, returning what listener sent
Apr 24 11:43:06.278: INFO: Connector pod has IP: 10.132.2.120
STEP: Deleting namespace "e2e-tests-route-agent-restart-dnm49" on cluster "ocsm1301015"
STEP: Deleting namespace "e2e-tests-route-agent-restart-dnm49" on cluster "ocsm4204001"

• Failure [455.486 seconds]
[redundancy] Route Agent restart tests
/remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/redundancy/route_agent_restart.go:31
  when a route agent pod running on a non-gateway node is restarted
  /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/redundancy/route_agent_restart.go:40
    should start a new route agent pod and be able to connect from another cluster [It]
    /remote-source/app/vendor/github.com/submariner-io/submariner/test/e2e/redundancy/route_agent_restart.go:41

    Failed to await pod "tcp-check-listenercx2dw" finished. Pod status is Running
    Unexpected error:
        <*errors.errorString | 0xc000244bb0>: {
            s: "timed out waiting for the condition",
        }
        timed out waiting for the condition
    occurred

    /remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:187
------------------------------

Summarizing 12 Failures:

[Fail] [dataplane] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote pod when the pod is not on a gateway and the remote pod is not on a gateway [It] should have sent the expected data from the pod to the other pod
/remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:187

[Fail] [dataplane] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote pod when the pod is not on a gateway and the remote pod is on a gateway [It] should have sent the expected data from the pod to the other pod
/remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:187

[Fail] [dataplane] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote pod when the pod is on a gateway and the remote pod is not on a gateway [It] should have sent the expected data from the pod to the other pod
/remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:187

[Fail] [dataplane] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote service when the pod is not on a gateway and the remote service is not on a gateway [It] should have sent the expected data from the pod to the other pod
/remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:187

[Fail] [dataplane] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote service when the pod is not on a gateway and the remote service is on a gateway [It] should have sent the expected data from the pod to the other pod
/remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:187

[Fail] [dataplane] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote service when the pod is on a gateway and the remote service is not on a gateway [It] should have sent the expected data from the pod to the other pod
/remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:187

[Fail] [dataplane] Basic TCP connectivity tests across clusters without discovery when a pod with HostNetworking connects via TCP to a remote pod when the pod is not on a gateway and the remote pod is not on a gateway [It] should have sent the expected data from the pod to the other pod
/remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:187

[Fail] [dataplane] Basic TCP connectivity tests across clusters without discovery when a pod with HostNetworking connects via TCP to a remote pod when the pod is on a gateway and the remote pod is not on a gateway [It] should have sent the expected data from the pod to the other pod
/remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:187

[Fail] [dataplane] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote pod in reverse direction when the pod is not on a gateway and the remote pod is not on a gateway [It] should have sent the expected data from the pod to the other pod
/remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:187

[Fail] [redundancy] Gateway fail-over tests when one gateway node is configured and the submariner gateway pod fails [It] should start a new submariner gateway pod and be able to connect from another cluster
/remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/framework.go:563

[Fail] [redundancy] Route Agent restart tests when a route agent pod running on a gateway node is restarted [It] should start a new route agent pod and be able to connect from another cluster
/remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:187

[Fail] [redundancy] Route Agent restart tests when a route agent pod running on a non-gateway node is restarted [It] should start a new route agent pod and be able to connect from another cluster
/remote-source/app/vendor/github.com/submariner-io/shipyard/test/e2e/framework/network_pods.go:187

Ran 28 of 44 Specs in 6106.810 seconds
FAIL!
-- 16 Passed | 12 Failed | 0 Pending | 16 Skipped subctl version: v0.14.3 [abdul@m1301015 ~]$ [abdul@m1301015 ~]$ subctl gather --context ocsm1301015 Gathering information from cluster "ocsm1301015" ✓ Gathering operator logs ✓ Found 0 pods matching label selector "name=submariner-operator" ✓ Gathering operator resources ✓ Found 1 submariners in namespace "submariner-operator" ✓ Found 1 servicediscoveries in namespace "submariner-operator" ✓ Found 1 deployments by field selector "metadata.name=submariner-operator" in namespace "submariner-operator" ✓ Found 1 daemonsets by label selector "app=submariner-gateway" in namespace "submariner-operator" ✓ Found 1 daemonsets by label selector "app=submariner-routeagent" in namespace "submariner-operator" ✓ Found 0 daemonsets by label selector "app=submariner-globalnet" in namespace "submariner-operator" ✓ Found 0 deployments by label selector "app=submariner-networkplugin-syncer" in namespace "submariner-operator" ✓ Found 1 deployments by label selector "app=submariner-lighthouse-agent" in namespace "submariner-operator" ✓ Found 1 deployments by label selector "app=submariner-lighthouse-coredns" in namespace "submariner-operator" ⚠ Gathering connectivity logs ✓ Found 1 pods matching label selector "app=submariner-gateway" ⚠ Found logs for previous instances of pod submariner-gateway-tm58c ✓ Found 6 pods matching label selector "app=submariner-routeagent" ✓ Found 0 pods matching label selector "app=submariner-globalnet" ✓ Found 0 pods matching label selector "app=submariner-networkplugin-syncer" ✓ Found 1 pods matching label selector "app=submariner-addon" ✓ Gathering connectivity resources ✓ Gathering CNI data from 6 pods matching label selector "app=submariner-routeagent" ✓ Gathering CNI data from 1 pods matching label selector "app=submariner-gateway" ✓ Gathering globalnet data from 0 pods matching label selector "app=submariner-globalnet" ✓ Gathering cable driver data from 1 pods matching label selector "app=submariner-gateway" ✓ Found 2 endpoints in namespace "submariner-operator" ✓ Found 2 clusters in namespace "submariner-operator" ✓ Found 1 gateways in namespace "submariner-operator" ✓ Found 0 clusterglobalegressips in namespace "" ✓ Found 0 globalegressips in namespace "" ✓ Found 0 globalingressips in namespace "" ✓ Gathering service-discovery logs ✓ Found 3 pods matching label selector "component=submariner-lighthouse" ✓ Found 6 pods matching label selector "dns.operator.openshift.io/daemonset-dns=default" ✓ Gathering service-discovery resources ✓ Found 0 serviceexports in namespace "" ✓ Found 0 serviceimports in namespace "" ✓ Found 0 endpointslices by label selector "endpointslice.kubernetes.io/managed-by=lighthouse-agent.submariner.io" in namespace "" ✓ Found 1 configmaps by label selector "component=submariner-lighthouse" in namespace "submariner-operator" ✓ Found 1 configmaps by field selector "metadata.name=dns-default" in namespace "openshift-dns" ✓ Found 0 services by label selector "submariner.io/exportedServiceRef" in namespace "" ✓ Gathering broker logs ✓ Gathering broker resources ✓ Found 2 endpoints in namespace "odf-dr-broker" ✓ Found 2 clusters in namespace "odf-dr-broker" ✓ Found 0 endpointslices by label selector "endpointslice.kubernetes.io/managed-by=lighthouse-agent.submariner.io" in namespace "odf-dr-broker" ✓ Found 0 serviceimports in namespace "odf-dr-broker" Files are stored under directory "submariner-20230424095045/ocsm1301015" [abdul@m1301015 ~]$ [abdul@m1301015 ~]$ subctl gather --context ocsm4204001 Gathering 
information from cluster "ocsm4204001" ✓ Gathering service-discovery logs ✓ Found 3 pods matching label selector "component=submariner-lighthouse" ✓ Found 6 pods matching label selector "dns.operator.openshift.io/daemonset-dns=default" ✓ Gathering service-discovery resources ✓ Found 0 serviceexports in namespace "" ✓ Found 0 serviceimports in namespace "" ✓ Found 0 endpointslices by label selector "endpointslice.kubernetes.io/managed-by=lighthouse-agent.submariner.io" in namespace "" ✓ Found 1 configmaps by label selector "component=submariner-lighthouse" in namespace "submariner-operator" ✓ Found 1 configmaps by field selector "metadata.name=dns-default" in namespace "openshift-dns" ✓ Found 0 services by label selector "submariner.io/exportedServiceRef" in namespace "" ✓ Gathering broker logs ✓ Gathering broker resources ✓ Found 2 endpoints in namespace "odf-dr-broker" ✓ Found 2 clusters in namespace "odf-dr-broker" ✓ Found 0 endpointslices by label selector "endpointslice.kubernetes.io/managed-by=lighthouse-agent.submariner.io" in namespace "odf-dr-broker" ✓ Found 0 serviceimports in namespace "odf-dr-broker" ✓ Gathering operator logs ✓ Found 0 pods matching label selector "name=submariner-operator" ✓ Gathering operator resources ✓ Found 1 submariners in namespace "submariner-operator" ✓ Found 1 servicediscoveries in namespace "submariner-operator" ✓ Found 1 deployments by field selector "metadata.name=submariner-operator" in namespace "submariner-operator" ✓ Found 1 daemonsets by label selector "app=submariner-gateway" in namespace "submariner-operator" ✓ Found 1 daemonsets by label selector "app=submariner-routeagent" in namespace "submariner-operator" ✓ Found 0 daemonsets by label selector "app=submariner-globalnet" in namespace "submariner-operator" ✓ Found 0 deployments by label selector "app=submariner-networkplugin-syncer" in namespace "submariner-operator" ✓ Found 1 deployments by label selector "app=submariner-lighthouse-agent" in namespace "submariner-operator" ✓ Found 1 deployments by label selector "app=submariner-lighthouse-coredns" in namespace "submariner-operator" ⚠ Gathering connectivity logs ✓ Found 1 p
Hello Abdul,

Thanks for sharing the logs. I was able to identify the reason for the connection failure.

If you look at the e2e tests that failed, you will notice that Gateway-to-Gateway connectivity between the clusters is working fine, but any test case where one of the client/server pods is scheduled on a non-Gateway node fails. In other words, the connections between the Gateway nodes are all good, but connectivity from non-Gateway nodes to non-Gateway (or Gateway) nodes is broken.

We noticed this error recently and fixed it upstream. The issue is seen on RHEL 9-based platforms, where there is a change of behaviour.

Upstream issue with more details: https://github.com/submariner-io/submariner/issues/2422
Fixes (merged upstream): https://github.com/submariner-io/submariner/pull/2423 and https://github.com/submariner-io/submariner/pull/2433

Also, please note that this issue is seen only when using the OpenShiftSDN CNI and NOT with OVN-Kubernetes. The clusters you are using are running OCP 4.13 with a non-default CNI (i.e., they are deployed with OpenShiftSDN instead of OVN-Kubernetes).

So, how to fix this issue in the current setup?
-----------------------------------------------
There is a simple workaround for the issue:

export KUBECONFIG=<path-to-kubeconfig-of-cluster>
kubectl delete pod -n submariner-operator -l app=submariner-routeagent

Any alternate fix?
------------------
Yes, if you use OCP 4.13 clusters with the OVN-Kubernetes CNI, this problem will not be seen. Also, once the upstream fix is made available in a downstream ACM image, this issue will not be seen.

Please give it a try and let me know if you need any more info.
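For reference, a minimal sketch of applying the workaround and then re-checking mirroring health. The kubeconfig path is a placeholder, and running it against each managed cluster (rather than just one) is my assumption, not something stated in the workaround itself; the health check reuses the rook krew plugin command from this bug.

```
# Workaround from the comment above: restart the Submariner route agent pods.
# Assumption: repeat this against each managed cluster's kubeconfig (placeholder path).
export KUBECONFIG=<path-to-kubeconfig-of-cluster>
kubectl delete pod -n submariner-operator -l app=submariner-routeagent

# Wait for the replacement route agent pods to reach Running.
kubectl get pods -n submariner-operator -l app=submariner-routeagent

# Then re-check mirroring health, e.g. with the rook krew plugin used earlier in this bug.
kubectl rook-ceph -n openshift-storage dr health
```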
Abdul, another observation: you seem to be running the "subctl verify ..." command along with the disruptive verifications.

```
[abdul@m1301015 ~]$ subctl verify --context ocsm1301015 --tocontext ocsm4204001
? You have specified disruptive verifications (gateway-failover). Are you sure you want to run them? Yes
Performing the following verifications: connectivity, service-discovery, gateway-failover
Running Suite: Submariner E2E suite
```

I'd suggest not running the disruptive verifications (aka gateway-failover), as they have expectations about the number of Gateway nodes in the cluster and will not work on all cloud platforms unless properly configured. You can either answer "No" when prompted or explicitly pass "--only connectivity,service-discovery" to the "subctl verify ..." command, as in the sketch below.
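A minimal sketch of the non-disruptive invocation, assuming the same contexts used above:

```
# Run only the non-disruptive verifications (skips gateway-failover).
subctl verify --context ocsm1301015 --tocontext ocsm4204001 --only connectivity,service-discovery
```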
Abdul, I have made two comments public; they give more info on the bug. I see that the backport https://github.com/submariner-io/submariner/pull/2475 has been merged into the submariner 0.14 branch. I will add the fixed-in version once it is available.
@rtalur @sgaddam I can confirm the workaround suggested in comment 11 is working. The daemon health returned to a healthy state after restarting the route agent pods as suggested.