Bug 2119362 - submariner GW pods are unable to resolve the DNS of the Broker K8s API URL
Summary: submariner GW pods are unable to resolve the DNS of the Broker K8s API URL
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: Submariner
Version: rhacm-2.6.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: rhacm-2.7
Assignee: Vishal Thapar
QA Contact: Noam Manos
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-08-18 13:01 UTC by daniel parkes
Modified: 2023-01-31 21:49 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-31 21:49:34 UTC
Target Upstream Version:
Embargoed:
nyechiel: rhacm-2.7+
nyechiel: rhacm-2.7.z+


Links
| System | ID | Private | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|---|
| Github stolostron backlog issues | 25169 | 0 | None | None | None | 2022-08-18 16:28:57 UTC |
| Github submariner-io submariner-operator pull | 2192 | 0 | None | open | Set DNSPolicy to ClusterFirstWithHostNet | 2022-08-22 11:07:20 UTC |
| Github submariner-io submariner-operator pull | 2197 | 0 | None | Merged | Automated backport of #2192: Set DNSPolicy to ClusterFirstWithHostNet | 2022-09-15 13:35:28 UTC |

Description daniel parkes 2022-08-18 13:01:06 UTC
**What happened**:

Submariner GW log has the following ERROR:

local -> broker for *v1.Endpoint: Failed to process object with key "submariner-operator/sl716-submariner-cable-sl716-10-92-118-238": error distributing resource "submariner-operator/sl716-submariner-cable-sl716-10-92-118-238": error creating or updating resource: error retrieving "sl716-submariner-cable-sl716-10-92-118-238": Get "https://api.neo3598.netact.nsn-rdnet.net:6443/apis/submariner.io/v1/namespaces/testgr-broker/endpoints/sl716-submariner-cable-sl716-10-92-118-238": dial tcp: lookup api.neo3598.netact.nsn-rdnet.net on 10.92.118.240:53: server misbehaving
I0812 13:31:32.008953       1 datastoresyncer.go:100] Datastore syncer started

The GW pod cannot resolve api.neo3598.netact.nsn-rdnet.net, the FQDN of the broker's K8s API.

The DNS operator has a forwarding rule for CoreDNS to use a specific DNS server to resolve the netact.nsn-rdnet.net domain.

    - forwardPlugin:
        policy: Random
        upstreams:
        - 10.158.52.11
      name: inbhdc005.apac.nsn-net.net
      zones:
      - netact.nsn-rdnet.net

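Because the Submariner GW pods run with hostNetwork: true and dnsPolicy: ClusterFirst, they use the DNS servers from /etc/resolv.conf on the host where they run and bypass CoreDNS, so the CoreDNS forwarding configuration is never consulted. For host-network pods, Kubernetes falls back from ClusterFirst to the behavior of the Default policy, which inherits the node's resolver configuration.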

Would it be possible to set 'dnsPolicy: ClusterFirstWithHostNet' on the GW pods so that they can query the CoreDNS server even though they use host networking?

https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy
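
For illustration, the requested setting would look roughly like this in a pod spec. This is a minimal sketch, not the actual Submariner gateway manifest: the pod name, container name, and image are hypothetical placeholders, and only the hostNetwork and dnsPolicy fields reflect what is being asked for here.

```yaml
# Sketch only: metadata and container details are hypothetical placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: gateway-example                     # hypothetical name
spec:
  hostNetwork: true                         # pod shares the node's network namespace
  dnsPolicy: ClusterFirstWithHostNet        # still resolve through the cluster DNS (CoreDNS)
  containers:
  - name: gateway                           # hypothetical container name
    image: example.invalid/gateway:latest   # placeholder image
```

With this policy the pod's /etc/resolv.conf should point at the cluster DNS service rather than at the node's resolvers, so zone forwarding configured in CoreDNS applies to lookups made from the pod.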
 

**What you expected to happen**:

Submariner pods are able to use CoreDNS forward rules configured in the OCP cluster.

**How to reproduce it (as minimally and precisely as possible)**:

**Anything else we need to know?**:

**Environment**:
- Submariner version (use `subctl version`):
- Kubernetes version (use `kubectl version`):
- Diagnose information (use `subctl diagnose all`):
- Gather information (use `subctl gather`)
- Cloud provider or hardware configuration:
- OS (e.g `cat /etc/os-release`):
- Kernel (e.g `uname -a`):
- Install tools:
- Others:

Comment 2 Daniel Farrell 2022-08-25 12:44:03 UTC
This has been fixed upstream and backported to all the relevant release branches. Waiting on a release to get this fix downstream.

Comment 3 Noam Manos 2022-11-17 22:54:31 UTC
Hi @daniel parkes, can you specify the exact steps (commands if possible) to reproduce this scenario?
For example, where (on the Hub or on the managed cluster) and how should we configure the DNS operator with a forwarding rule for CoreDNS?

Comment 4 daniel parkes 2022-11-23 11:58:19 UTC
Hi,

Sorry for the delay, I was away on PTO. The idea here is that the Submariner GW pod has changed its DNS policy from:

'dnsPolicy: ClusterFirst'

to:

'dnsPolicy: ClusterFirstWithHostNet'

So the first check would be to confirm on the Submariner GW pods that the option 'dnsPolicy: ClusterFirstWithHostNet' is now in place.

A second check could be to configure a custom DNS server on the OCP cluster and check that you can resolve names from that custom DNS server from inside the Submariner pods.
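
As a rough sketch of that second check, a forwarding rule could be added to the cluster's DNS operator config (the `default` DNS resource, per the OpenShift documentation linked in this comment). The server name, zone, and upstream address here are hypothetical test values, not the ones from the affected cluster:

```yaml
# Sketch only: server name, zone, and upstream IP are hypothetical test values.
apiVersion: operator.openshift.io/v1
kind: DNS
metadata:
  name: default                  # the cluster-wide DNS operator config
spec:
  servers:
  - name: test-forwarder         # hypothetical server name
    zones:
    - example.test               # hypothetical zone served by the custom DNS server
    forwardPlugin:
      policy: Random
      upstreams:
      - 192.0.2.10               # hypothetical address of the custom DNS server
```

A name that exists only on the custom DNS server should then resolve from inside a Submariner GW pod once the new DNS policy is in place, and fail to resolve on a pod still using 'dnsPolicy: ClusterFirst' with host networking.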


Info on Pod DNS policy: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy

Info on setting a custom DNS server: https://docs.openshift.com/container-platform/4.11/networking/dns-operator.html#nw-dns-forward_dns-operator

Regards.

