**What happened**:

When running the `subctl diagnose firewall inter-cluster` test between a cluster on the Azure platform and a cluster on another platform, an incorrect state is returned.

ACM 2.5.0, Submariner 0.12.0 (Globalnet enabled), OCP version 4.9.25.

    ✗ Checking if tunnels can be setup on the gateway node of cluster "mbabushk-sub3"
    ✗ The tcpdump output from the sniffer pod does not include the message sent from client pod. Please check that your firewall configuration allows UDP/4505 traffic on the "mbabushk-sub3-7ptvl-worker-centralus1-vsrvt" node.

However, the UDP/4505 port is open on both clusters.

**What you expected to happen**:

The firewall test between the clusters should pass.

**How to reproduce it (as minimally and precisely as possible)**:

Deploy two clusters, one on the Azure platform and one on another platform, then run the firewall test. It returns an error even though the configuration is correct.

**Anything else we need to know?**:

Debugging output by Sridhar (Slack thread):

==========================================================

**Maxim Babushkin**
Hi @multiclusternetwork-team, I have a question regarding the Azure platform. I deployed Submariner on the AWS, GCP and Azure platforms and performed all the cloud-prepare steps manually. When I run the `subctl diagnose firewall inter-cluster` command and specify the Azure cluster, I get the following error:

    ✗ Checking if tunnels can be setup on the gateway node of cluster "mbabushk-sub3"
    ✗ The tcpdump output from the sniffer pod does not include the message sent from client pod. Please check that your firewall configuration allows UDP/4505 traffic on the "mbabushk-sub3-7ptvl-worker-centralus1-vsrvt" node.

The port is open and all e2e tests are passing; only this check fails. When I test the firewall between AWS and GCP there is no issue; it happens only when Azure is involved. Does this happen because cloud prepare is not yet implemented and some firewall check relies on it, or is there another reason? Thanks.

**Sridhar Gaddam**
Hello Maxim, well, if the tunnels are successfully created and since e2e is passing, I don't think there was an issue with cloud-prepare.

**Sridhar Gaddam**
Please share the kubeconfigs of the three clusters (in private); I can take a look and provide an update.

**Maxim Babushkin**
@sridharg Thanks. Will share in a sec.

**Sridhar Gaddam**
@mbabushk there is a problem while deploying the diagnose pods on the Azure cluster.
**Sridhar Gaddam**
For some reason, K8s running on Azure is automatically adding some volumes to the pod, and this seems to be causing issues.

**Sridhar Gaddam**

    Events:
      Type     Reason          Age                From                                   Message
      ----     ------          ----               ----                                   -------
      Normal   Scheduled       <unknown>                                                 Successfully assigned default/validate-clientnvkbb to mbabushk-sub3-7ptvl-master-2
      Normal   AddedInterface  24s                multus                                 Add eth0 [10.130.0.33/23] from openshift-sdn
      Normal   Pulled          24s                kubelet, mbabushk-sub3-7ptvl-master-2  Container image "quay.io/submariner/nettest:devel" already present on machine
      Normal   Created         24s                kubelet, mbabushk-sub3-7ptvl-master-2  Created container validate-client
      Normal   Started         24s                kubelet, mbabushk-sub3-7ptvl-master-2  Started container validate-client
      Warning  FailedMount     13s (x2 over 14s)  kubelet, mbabushk-sub3-7ptvl-master-2  MountVolume.SetUp failed for volume "kube-api-access-5dhqq" : [object "default"/"kube-root-ca.crt" not registered, object "default"/"openshift-service-ca.crt" not registered]

**Sridhar Gaddam**
This is the corresponding Pod.Spec:

    volumes:
    - name: kube-api-access-b7jk6
      projected:
        defaultMode: 420
        sources:
        - serviceAccountToken:
            expirationSeconds: 3607
            path: token
        - configMap:
            items:
            - key: ca.crt
              path: ca.crt
            name: kube-root-ca.crt
        - downwardAPI:
            items:
            - fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
              path: namespace
        - configMap:
            items:
            - key: service-ca.crt
              path: service-ca.crt
            name: openshift-service-ca.crt

**Sridhar Gaddam**
We do not add this to the pod; it's getting auto-added by some component, and because of this the pod is not starting (hence diagnose is failing).

**Sridhar Gaddam**
These volumes seem to be added even on the other OCP clusters, but on Azure it fails with a FailedMount error. We need to debug this further to understand why these volumes are added and whether we can figure out a way to get past this issue.

==========================================================
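For reference, a minimal sketch of how the failing check and the FailedMount symptom can be inspected by hand. The kubeconfig paths, namespace, and pod name suffix are placeholders, and the two-kubeconfig invocation reflects the subctl 0.12-era syntax, so adjust to your environment:

```sh
# Reproduce the failing check (placeholder kubeconfig paths; subctl 0.12-era syntax).
subctl diagnose firewall inter-cluster ./other-cluster-kubeconfig ./azure-cluster-kubeconfig

# On the Azure cluster, check whether the ConfigMaps referenced by the
# auto-injected "kube-api-access-*" projected volume exist in the namespace
# where the diagnose pods run ("default" in the output above).
kubectl --kubeconfig ./azure-cluster-kubeconfig -n default \
    get configmap kube-root-ca.crt openshift-service-ca.crt

# Inspect the client pod's events for the FailedMount warning
# (pod name is a placeholder; it is generated per run).
kubectl --kubeconfig ./azure-cluster-kubeconfig -n default \
    describe pod validate-client<suffix>
```

If the ConfigMaps exist but the mount still fails with "not registered" errors, that points at the OCP issue referenced below rather than at the firewall configuration.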
The root cause seems to be related to this bug https://bugzilla.redhat.com/show_bug.cgi?id=1999325
Hi @maafried, is there any fix required in Submariner now that bug https://bugzilla.redhat.com/show_bug.cgi?id=1999325 has been verified? If not, and an existing Submariner build can be used for verification of this bug, please move this to ON_QA.
The issue reported in this bug is related to the following bug: https://bugzilla.redhat.com/show_bug.cgi?id=1999325. That bug was fixed in OCP version 4.11, and we are waiting for it to be backported to the earlier OCP versions. Until it is backported, I have no way to verify this one.
nelsonjean (Fri, 27 May 2022 03:46:40 UTC): Do you know when the backport is estimated to be available?
It looks like the fix was backported to OCP versions 4.10 and 4.9:
https://bugzilla.redhat.com/show_bug.cgi?id=2067464
https://bugzilla.redhat.com/show_bug.cgi?id=2075704
I'll wait for the hub to get the OCP versions with the fix, verify, and then close the BZ. Eveline, do you know when we expect the new releases of OCP 4.9 and 4.10 to be available in the ACM hub?
Although it seems that the fix was backported to OCP versions 4.10 and 4.9, I am still facing the issue when testing on the latest 4.10.
Looks like we need to re-verify this fix.
@Aswin, can you please take a look at this issue again? It looks like it failed QE even after the OCP fix went in.
The diagnose check tries port 9898 from the source cluster to see if 4500 is reachable, and in Azure egress traffic is not allowed by default, so the packets are getting dropped.
(In reply to Aswin Suryanarayanan from comment #11)
> The diagnose check tries port 9898 from the source cluster to see if 4500
> is reachable, and in Azure egress traffic is not allowed by default, so the
> packets are getting dropped.

This needs further investigation; port 9898 does not seem to be the reason for the failure. I will update after investigating further.
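To separate a genuine firewall problem from a diagnose-tooling problem, the check can be approximated manually: capture traffic on the Azure gateway node and send a UDP probe from the other cluster, which is roughly what the sniffer and client pods do. This is only a sketch; the node name, gateway IP, and port (4500 here, versus the 4505 reported in the diagnose output) are placeholders:

```sh
# On the Azure cluster: open a debug shell on the gateway node and
# capture UDP traffic on the tunnel port (placeholder: 4500).
oc debug node/<azure-gateway-node> -- chroot /host \
    tcpdump -ln -i any udp and port 4500

# From a node (or hostNetwork pod) in the other cluster, send a UDP
# probe to the Azure gateway's IP (placeholder: <gateway-ip>).
echo probe | nc -u -w1 <gateway-ip> 4500
```

If the probe shows up in the capture, the firewall path is fine and the failure more likely lies with the diagnose pods themselves (e.g. the FailedMount issue above).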
*** Bug 2119719 has been marked as a duplicate of this bug. ***
@Aswin, can you remind me if this one is still relevant? If so, we should move it over to Jira. Thanks!
@Nir, yes, this is still relevant.
Migrated to Jira: https://issues.redhat.com/browse/ACM-3316
Removing needinfo as the discussion can continue in Jira.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days