Bug 2036034 - DNS sub-domain - network dns_domain not used to overwrite the DHCP domain announced to the Tenant VMs
Keywords:
Status: CLOSED COMPLETED
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 16.2 (Train)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: z2
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: Miguel Lavalle
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-12-29 11:50 UTC by Riccardo Bruzzone
Modified: 2023-09-08 22:54 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-09-08 22:54:29 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Issue Tracker OSP-11971 (last updated 2021-12-29 11:51:51 UTC)

Internal Links: 1903861

Description Riccardo Bruzzone 2021-12-29 11:50:58 UTC
Description of problem:

DNS sub-domains are implemented and used in the customer network.
With the Neutron DNS extension enabled and the "dns_domain" attribute also configured at the Neutron network level (with a value different from the one defined in neutron.conf), when DHCP is enabled on the subnet the network "dns_domain" is not used to override the DHCP domain (the domain announced and assigned to the client VMs); the value pushed via neutron.conf is used instead.

Version-Release number of selected component (if applicable):
Red Hat OpenStack 16.2 (RHOSP16.2)

How reproducible:


Steps to Reproduce:
1. Neutron DNS extension enabled.
2. DNS domain defined in neutron.conf.
3. dns_domain set on all planned networks (with different values, as in the example below).
4. VMs deployed as in the example below.

e.g.:

- neutron.conf DNS domain: .example.com

- First VM environment planned:  Network1 (network dns_domain: .dns_domain1.example.com) -> Subnet 1 DHCP enabled -> Port 1 -> VM 1

- Second VM environment planned: Network2 (network dns_domain: .dns_domain2.example.com) -> Subnet 2 DHCP enabled -> Port 1 -> VM 2

- Third VM environment planned:  Network3 (network dns_domain: .dns_domain3.example.com) -> Subnet 3 DHCP enabled -> Port 1 -> VM 3
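
A rough CLI sketch of the setup above (network, subnet, port and image/flavor names are illustrative; --dns-domain assumes the dns-integration extension is enabled):

# neutron.conf on the controllers:
# [DEFAULT]
# dns_domain = example.com.

openstack network create net1 --dns-domain dns_domain1.example.com.
openstack subnet create sub1 --network net1 --subnet-range 192.168.10.0/24 --dhcp
openstack port create port1 --network net1
openstack server create vm1 --image rhel8 --flavor m1.small --port port1
# repeat with net2/dns_domain2.example.com. and net3/dns_domain3.example.com.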


Actual results:

The FQDN assigned to all three VMs is based on the same domain defined in neutron.conf: ".example.com".


Expected results:

To support DNS sub-domains, the network "dns_domain" should be used to override the DHCP domain (the domain announced and assigned to the client VMs) pushed via neutron.conf.


Additional info:
OpenStack deployment based on OVN

Comment 1 Daniel Alvarez Sanchez 2021-12-29 15:52:18 UTC
Hey Riccardo, thanks for reporting this and also for the discussions over GMeet and IRC.

Some recap for those:

* The dns_domain attribute at the network level is intended for integration with external services such as Designate.
* The behavior that you describe seems to be the expected one (please see the discussions here [0] and here [1]); i.e., the neutron.conf setting prevails.
* I tested this with OVN and this is what the NB database looks like:

[root@controller-0 /]# ovn-nbctl list dns
_uuid               : 8517f81a-6645-40c9-86a7-100670c627b9
external_ids        : {ls_name=neutron-74231acc-0198-4746-8ba0-06602210dd2b}
records             : {vm1="192.168.10.14", vm1.example.com="192.168.10.14", vm1.dns_domain1.example.com="192.168.10.14"}


What happens is that ML2/OVN creates these entries for the VM's IP address, and they are answered by the local OVN DNS responder:


root@vm1:~# dig vm1.example.com @169.254.169.254

;; ADDITIONAL SECTION:
vm1.example.com. 3600 IN   A       192.168.10.14


root@vm1:~# dig vm1.dns_domain1.example.com @169.254.169.254

;; ADDITIONAL SECTION:
vm1.dns_domain1.example.com. 3600 IN   A       192.168.10.14


Now, I create vm1 (the same name again) in a different network (network dns_domain: .dns_domain2.example.com); the NB database then contains these entries:


_uuid               : 4c2580d2-5396-474a-a9dd-8d477ba051bb
datapaths           : [9a4d45ae-9a9f-4a43-b3e5-b6aa85d5c0f3]
external_ids        : {dns_id="df68e00f-8399-4192-959d-7e5fd801e18d"}
records             : {vm1="192.168.1.46", vm1.example.com="192.168.1.46", vm1.dns_domain2.example.com="192.168.1.46"}


At first glance, the problem could be that we have two entries with the same DNS name:
vm1.example.com="192.168.10.14"
vm1.example.com="192.168.1.46"

However, what happens here is that ovn-northd is translating this into DNS records that will depend on the network from which the request is coming:


_uuid               : 4c2580d2-5396-474a-a9dd-8d477ba051bb
datapaths           : [9a4d45ae-9a9f-4a43-b3e5-b6aa85d5c0f3]
external_ids        : {dns_id="df68e00f-8399-4192-959d-7e5fd801e18d"}
records             : {vm1="192.168.10.46", vm1.example.com="192.168.1.46", vm1.dns_domain2.example.com="192.168.1.46"}

So if, from the first VM, I query vm1.example.com, I still get the one from the correct network:


root@vm1:~# dig vm1.example.com @169.254.169.254

;; ADDITIONAL SECTION:
vm1.example.com. 3600 IN   A       192.168.10.14

And if I try to query the vm1 that lives in the other network, it won't work.

The reason is that ovn-controller only searches for DNS entries belonging to the network from which the query originates [2].
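
For reference, that scoping is visible directly in the NB database; a quick sketch (the logical switch name is the ls_name from the first listing above):

ovn-nbctl --columns=_uuid,external_ids,records list DNS
ovn-nbctl --columns=name,dns_records list Logical_Switch neutron-74231acc-0198-4746-8ba0-06602210dd2b
# each Logical_Switch only references the DNS rows of its own network, which is
# why a lookup from one network never sees the vm1 records of the other network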


I hope that describes how this works today, at least with ML2/OVN.
It would be valuable if you could test the same with ML2/OVS to see whether there's a gap between the two drivers.

Thanks,
daniel



[0]  https://bugs.launchpad.net/neutron/+bug/1826419
[1]  https://meetings.opendev.org/meetings/neutron_l3/2019/neutron_l3.2019-05-29-14.00.log.html
[2] https://github.com/ovn-org/ovn/blob/a2d9ff3ccd4e12735436b0578ce0020cb62f2c27/controller/pinctrl.c#L2954

Comment 2 Daniel Alvarez Sanchez 2021-12-29 15:55:27 UTC
Please note that I used 169.254.169.254 because my tenant network is not connected anywhere; what I wanted was for the DNS traffic to leave the guest VM so that it reaches br-int and gets processed by ovn-controller via a controller action:

 cookie=0x4444769f, duration=805.913s, table=27, n_packets=22, n_bytes=2358, idle_age=183, priority=100,udp,metadata=0x4,tp_dst=53 actions=controller(userdata=00.00.00.06.00.00.00.00.00.01.de.10.00.00.00.64,pause),resubmit(,28)


_uuid               : 4444769f-cb72-4e91-9416-38b497767d01
actions             : "reg0[4] = dns_lookup(); next;"
controller_meter    : []
external_ids        : {source="northd.c:7661", stage-name=ls_in_dns_lookup}
logical_datapath    : []
logical_dp_group    : afe0e3fe-b1cd-4d3d-8269-c8ae61fdc6b7
match               : "udp.dst == 53"
pipeline            : ingress
priority            : 100
table_id            : 19


In a real-life scenario, the IP address of the nameservers typically configured via DHCP would be used, and the query would still be intercepted by OVN.
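
To locate that logical flow in the southbound DB instead of grepping the OpenFlow dump, something along these lines should work (a sketch; the stage name is the one shown in external_ids above):

ovn-sbctl --columns=match,actions find Logical_Flow external_ids:stage-name=ls_in_dns_lookup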

Comment 3 Riccardo Bruzzone 2021-12-30 08:42:17 UTC
Hi Daniel
from the E2E point of view, if the customer network is based/structured on DNS sub-domains, IP address assignment via DHCP isn't feasible.
The DHCP service can be used, across all the planned projects, only in a flat domain where all VMs are deployed in the same DNS domain.
This is the current picture.

Comment 4 Luigi Tamagnone 2021-12-30 12:33:00 UTC
From the ML2/OVS perspective:

[heat-admin@controller-0 ~]$ cat /var/lib/neutron/dhcp/d581a346-9bdc-4e93-b2a3-c1f2098fbe89/host
fa:16:3e:93:01:47,host-172-16-1-1.example.org.,172.16.1.1
fa:16:3e:ba:73:ae,vm1.example.org.,172.16.1.85


[root@vm1 ~]# cat /etc/resolv.conf 
# Generated by NetworkManager
search example.org
nameserver 172.16.1.2

[root@vm1 ~]# cat /etc/hostname 
vm1

[root@vm1 ~]# dig vm1.example.org

; <<>> DiG 9.11.26-RedHat-9.11.26-6.el8 <<>> vm1.example.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28587
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;vm1.example.org.		IN	A

;; ANSWER SECTION:
vm1.example.org.	0	IN	A	172.16.1.85

;; Query time: 4 msec
;; SERVER: 172.16.1.2#53(172.16.1.2)
;; WHEN: Thu Dec 30 07:24:22 EST 2021
;; MSG SIZE  rcvd: 60


[root@vm1 ~]# dig vm1.sub1.example.org.

; <<>> DiG 9.11.26-RedHat-9.11.26-6.el8 <<>> vm1.sub1.example.org.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 57801
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;vm1.sub1.example.org.		IN	A

;; Query time: 2 msec
;; SERVER: 172.16.1.2#53(172.16.1.2)
;; WHEN: Thu Dec 30 07:28:52 EST 2021
;; MSG SIZE  rcvd: 38


[root@vm1 ~]# ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128 
eth0             UP             172.16.1.85/24 fe80::f816:3eff:feba:73ae/64 


(overcloud) [stack@undercloud-0 ~]$ openstack network show net2
+---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field                     | Value                                                                                                                                                            |
+---------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| dns_domain                | sub1.example.org.
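
One way to double-check which domain the DHCP agent's dnsmasq instance is announcing for this network is to look at its command line, using the network UUID from the host-file path above (a sketch):

[heat-admin@controller-0 ~]$ ps -ef | grep dnsmasq | grep d581a346-9bdc-4e93-b2a3-c1f2098fbe89
# the --domain= argument is the announced domain; here it is expected to show the
# neutron.conf value (example.org) rather than sub1.example.org, matching the
# resolv.conf output above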

Comment 5 Riccardo Bruzzone 2021-12-30 13:10:03 UTC
In the OVS test, dnsmasq doesn't know the host vm1.sub1.example.org derived from the network dns_domain.

 
[cloud-user@vm1 ~]$ dig vm1.sub1.example.org @172.16.1.2

; <<>> DiG 9.11.26-RedHat-9.11.26-6.el8 <<>> vm1.sub1.example.org @172.16.1.2
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 27182
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;vm1.sub1.example.org.		IN	A

;; Query time: 4 msec
;; SERVER: 172.16.1.2#53(172.16.1.2)
;; WHEN: Thu Dec 30 08:00:32 EST 2021
;; MSG SIZE  rcvd: 38

[cloud-user@vm1 ~]$ 


Comment 6 Daniel Alvarez Sanchez 2021-12-30 14:10:19 UTC
(In reply to Riccardo Bruzzone from comment #5)
> In the OVS test made, dnsmasq doesn't know the host vm1.sub1.example.org
> based on the Network dns_domain.
> 
>  
> [cloud-user@vm1 ~]$ dig vm1.sub1.example.org @172.16.1.2
> 
> ; <<>> DiG 9.11.26-RedHat-9.11.26-6.el8 <<>> vm1.sub1.example.org @172.16.1.2
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 27182

I'm not a DNS expert, but it seems that REFUSED means "the DNS query failed because the server refused to answer due to policy".

In any case, OVN seems to work as expected, right?
As per my previous comment, ML2/OVN honors the dns_domain attribute of the network by adding an entry to the DNS table in the OVN database, and I verified that it replies to it properly.

Can you please clarify why this is not enough or what I'm missing?

Thanks!
daniel

Comment 7 Daniel Alvarez Sanchez 2021-12-30 14:22:21 UTC
Thanks for clarifying over IRC. 

As I mentioned in the first comment, this is the expected behavior (i.e. the neutron.conf setting prevails over the dns_domain attribute, which is mainly intended for external DNS integration).

What you're requesting is that the DHCP server offer the network's 'dns_domain' attribute, if set, instead of the neutron.conf value.


cc @njohnston 
cc @skaplons
cc @mlavalle 
to see if they have some ideas on this.

I'm not a big fan of adding more knobs, but would it make sense to make this configurable for the use case described here, where the user wants a different FQDN per tenant?

Comment 8 Slawek Kaplonski 2022-01-03 10:12:55 UTC
IIRC from all the discussions we had in the past around bug https://bugs.launchpad.net/neutron/+bug/1826419, it is indeed expected behaviour that the "dns_domain" parameter of the network/port is dedicated to use with external DNS services like Designate, and that internally Neutron should always use only the value defined in the config file.
But I'm really not an expert in that area, and I would like Miguel to have the final vote on this one.

Comment 11 Miguel Lavalle 2022-01-06 00:47:22 UTC
Hi,

The short answer is that the behavior described in this BZ is the documented one and conforms to the official upstream Neutron documentation here https://docs.openstack.org/neutron/latest/admin/config-dns-int.html and here https://docs.openstack.org/neutron/latest/admin/config-dns-int-ext-serv.html. The dns_domain attributes for ports and networks are intended to be used for integration with external DNS services like OpenStack Designate.
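
For context, the external DNS integration that dns_domain targets is enabled in neutron.conf along these lines (a minimal sketch assuming Designate; the URL is a placeholder and the remaining [designate] auth options are omitted, see the second document above for the full list):

[DEFAULT]
external_dns_driver = designate

[designate]
url = https://designate.example.com:9001/v2
auth_type = password
# ... credentials and the other [designate] options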

Now, in [10] it is correctly pointed out that during the upstream Victoria cycle an improvement was implemented (https://bugs.launchpad.net/neutron/+bug/1873091 and https://specs.openstack.org/openstack//neutron-specs/specs/victoria/port_dns_assignment.html) whereby:

"The port level dns-domain should take precedence over network level dns-domain which take precedence over the neutron config dns-domain which will be the default if the above two levels dont have dns-domain assigned"

However, this change only takes effect when an external DNS driver is configured, in line with the overall philosophy that dns_domain is intended for integration with external DNS. Let's look at the code:

1) This is the change made to the code to implement the aforementioned improvement: https://review.opendev.org/c/openstack/neutron/+/731624/5/neutron/plugins/ml2/extensions/dns_integration.py#292. Please note that dns_domain is retrieved from dns_data_db.current_dns_domain

2) Please read the comments in https://github.com/openstack/neutron/blob/7aba1bddabc9fb92b1603ad0ff86c4fe75f36b1f/neutron/plugins/ml2/extensions/dns_integration.py#L125-L134. current_dns_domain is only set if an external DNS service is configured. Otherwise, an empty string is returned: https://github.com/openstack/neutron/blob/7aba1bddabc9fb92b1603ad0ff86c4fe75f36b1f/neutron/plugins/ml2/extensions/dns_integration.py#L137

I tested this in my development environment:

1) I didn't configure an external DNS service.
2) I set dns_domain on my private network to dns-domain-1.com.
3) I created a port in the private network, setting its dns_name to my-port.

This is what the corresponding tables in the DB look like afterwards: https://paste.openstack.org/show/811948/
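
A quick way to see the FQDN Neutron actually computed is the port's dns_assignment attribute; with no external DNS driver configured it keeps using the neutron.conf domain. A sketch, assuming the port created above is also named my-port:

openstack port show my-port -c dns_name -c dns_assignment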

Comment 12 Miguel Lavalle 2022-01-07 18:41:32 UTC
Upstream documentation and the neutron-tempest-plugin scenario tests are not up to date with the RFE implemented in Victoria, described in [11]. I filed, and assigned to myself, this upstream bug to fix that: https://bugs.launchpad.net/neutron/+bug/1956632

Comment 17 Miguel Lavalle 2023-09-01 21:28:19 UTC
Waiting for upstream patches to merge: https://review.opendev.org/c/openstack/neutron/+/893544 and https://review.opendev.org/c/openstack/neutron/+/893545

