Bug 2121601

Summary: bind9.16 named does not restart when binding to secondary ipv4 or ipv6 address
Product: Red Hat Enterprise Linux 8
Reporter: Henning Schmiedehausen <hps>
Component: bind9.16
Assignee: Petr Menšík <pemensik>
Status: ASSIGNED
QA Contact: rhel-cs-infra-services-qe <rhel-cs-infra-services-qe>
Severity: medium
Priority: unspecified
Version: 8.6
Keywords: Triaged
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Type: Bug

Description Henning Schmiedehausen 2022-08-25 22:16:10 UTC
Description of problem:

named (bind9.16) does not start at boot when it is configured to bind to secondary IP addresses.


Version-Release number of selected component (if applicable):

9.16.23-0.7.el8


How reproducible:

always


Steps to Reproduce:
1. create secondary IPv4 and IPv6 addresses in /etc/sysconfig/network-scripts/ifcfg-eth0

IPADDR=192.168.2.29
IPADDR1=192.168.2.19
[...]
IPV6ADDR=xxxx::29/64
IPV6ADDR_SECONDARIES=xxxx::19/64

2. verify that those are created when the system is fully online:

ip addr show 
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether e4:5f:01:98:ba:57 brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.29/24 brd 192.168.2.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet 192.168.2.19/24 brd 192.168.2.255 scope global secondary noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 xxxx:6b1f:325b:899f:3772/64 scope global dynamic noprefixroute
       valid_lft 2591901sec preferred_lft 14301sec
    inet6 xxxx::19/64 scope global noprefixroute
       valid_lft forever preferred_lft forever
    inet6 xxxx::29/64 scope global noprefixroute
       valid_lft forever preferred_lft forever
    inet6 fe80::c3a:706d:ea57:181e/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

3. bind named to the secondary IP addresses (named.conf):

options {
[...]
    listen-on port 53 {
        127.0.0.1;
        192.168.2.19;
    };

    listen-on-v6 port 53 {
        xxxx::19;
        ::1;
    };
[...]
    query-source address 192.168.2.19;
    notify-source 192.168.2.19;
    transfer-source 192.168.2.19;
    query-source-v6 address xxxx::19;
    notify-source-v6 xxxx::19;
    transfer-source-v6 xxxx::19;
[...]
};
[...]


4. restart the system

5. check journalctl -e -u named

[...]
Aug 25 22:06:32 thing2.intermeta.com named[367]: listening on IPv4 interface lo, 127.0.0.1#53
Aug 25 22:06:32 thing2.intermeta.com named[367]: listening on IPv6 interface lo, ::1#53
Aug 25 22:06:32 thing2.intermeta.com named[367]: generating session key for dynamic DNS
Aug 25 22:06:32 thing2.intermeta.com named[367]: sizing zone task pool based on 11 zones
Aug 25 22:06:32 thing2.intermeta.com named[367]: none:89: 'max-cache-size 90%' - setting to 7031MB (out of 7812MB)
Aug 25 22:06:32 thing2.intermeta.com named[367]: could not get query source dispatcher (192.168.2.19#0)
Aug 25 22:06:32 thing2.intermeta.com named[367]: loading configuration: address not available
Aug 25 22:06:32 thing2.intermeta.com named[367]: exiting (due to fatal error)
Aug 25 22:06:32 thing2.intermeta.com systemd[1]: named.service: Control process exited, code=exited status=1
Aug 25 22:06:32 thing2.intermeta.com systemd[1]: named.service: Failed with result 'exit-code'.
Aug 25 22:06:32 thing2.intermeta.com systemd[1]: Failed to start Berkeley Internet Name Domain (DNS).

6. tweak the named.service file:

- remove After=network.target
- add After=network-online.target
- add Wants=network-online.target

7. check journalctl -e -u named

Aug 25 22:11:42 thing2.intermeta.com named[581]: listening on IPv4 interface lo, 127.0.0.1#53
Aug 25 22:11:42 thing2.intermeta.com named[581]: listening on IPv4 interface eth0, 192.168.2.19#53
Aug 25 22:11:42 thing2.intermeta.com named[581]: listening on IPv6 interface lo, ::1#53
Aug 25 22:11:42 thing2.intermeta.com named[581]: listening on IPv6 interface eth0, xxxx::19#53
Aug 25 22:11:42 thing2.intermeta.com named[581]: generating session key for dynamic DNS
Aug 25 22:11:42 thing2.intermeta.com named[581]: sizing zone task pool based on 11 zones
Aug 25 22:11:42 thing2.intermeta.com named[581]: none:89: 'max-cache-size 90%' - setting to 7031MB (out of 7812MB)
Aug 25 22:11:42 thing2.intermeta.com named[581]: could not get query source dispatcher (xxxx::19#0)
Aug 25 22:11:42 thing2.intermeta.com named[581]: loading configuration: address not available
Aug 25 22:11:42 thing2.intermeta.com named[581]: exiting (due to fatal error)
Aug 25 22:11:42 thing2.intermeta.com systemd[1]: named.service: Control process exited, code=exited status=1
Aug 25 22:11:42 thing2.intermeta.com systemd[1]: named.service: Failed with result 'exit-code'.
Aug 25 22:11:42 thing2.intermeta.com systemd[1]: Failed to start Berkeley Internet Name Domain (DNS).
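
The two journal excerpts fail on different addresses: the IPv4 source address in step 5 and the IPv6 source address in step 7. A small Python sketch for pulling the failing address out of saved journal output is shown here. The function name and the regular expression are illustrative assumptions based only on the log lines quoted in this report, not on any documented log format.

```python
import re

# Pattern derived from the log lines quoted in this report (an assumption,
# not a documented format): "... dispatcher (ADDRESS#PORT)".
DISPATCH_RE = re.compile(
    r"could not get query source dispatcher \((?P<addr>[^#)]+)#(?P<port>\d+)\)"
)


def failed_source_addresses(journal_text):
    """Return the source addresses named could not bind, in log order."""
    return [m.group("addr") for m in DISPATCH_RE.finditer(journal_text)]


# Two lines copied from the journal output in steps 5 and 7.
SAMPLE = """\
Aug 25 22:06:32 thing2.intermeta.com named[367]: could not get query source dispatcher (192.168.2.19#0)
Aug 25 22:11:42 thing2.intermeta.com named[581]: could not get query source dispatcher (xxxx::19#0)
"""

if __name__ == "__main__":
    print(failed_source_addresses(SAMPLE))
```

Run against the two sample lines, this lists the IPv4 address first and the IPv6 address second, matching the order of the two boots.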


Actual results:

After a reboot, named fails to start on a system where it listens on secondary addresses. With "network.target", the service is started before the secondary IPv4 address is online. With "network-online.target", the service is started before the secondary IPv6 address is online. In either case, named does not start. Logging into the system and running `systemctl restart named` starts named reliably.
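
The "address not available" message most likely corresponds to the EADDRNOTAVAIL error that bind(2) returns when the requested local address is not assigned to any interface. The following is a minimal Python sketch for checking this by hand, assuming a Linux host with default settings. probe_bind is a hypothetical helper and not part of BIND. 192.0.2.1 is a documentation address (TEST-NET-1) standing in for an address that is not configured yet.

```python
import errno
import socket


def probe_bind(address, port=0):
    """Try to bind a UDP socket to address; return 'ok' or the errno name."""
    family = socket.AF_INET6 if ":" in address else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_DGRAM) as sock:
            # Port 0 asks for an ephemeral port, like the "#0" in the log.
            sock.bind((address, port))
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"


if __name__ == "__main__":
    # The wildcard address can always be bound; the documentation address
    # is not assigned locally and stands in for the missing secondary IP.
    for addr in ("0.0.0.0", "192.0.2.1"):
        print(addr, probe_bind(addr))
```

Running the same probe against the real secondary addresses early in boot would show whether they are bindable at the moment named is started.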


Expected results:

named should start reliably after reboot when binding to secondary IP addresses.

Additional info:

Comment 1 Henning Schmiedehausen 2022-08-25 22:27:34 UTC
Added a workaround to /usr/lib/systemd/system/named.service:

Restart=on-failure
RestartSec=5s

This reliably restarts the service on the second try, so this is a timing/ordering issue.
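
An alternative to retrying the whole service would be to wait until the configured addresses can actually be bound before named is started. The Python sketch below illustrates that idea only. It is not something tested in this report: the function names, the 30 second timeout, and the polling interval are all assumptions.

```python
import socket
import sys
import time


def can_bind(address):
    """Return True if a UDP socket can be bound to address right now."""
    family = socket.AF_INET6 if ":" in address else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_DGRAM) as sock:
            sock.bind((address, 0))
    except OSError:
        return False
    return True


def wait_for_addresses(addresses, timeout=30.0, interval=0.5):
    """Poll until every address is bindable or the timeout expires.

    Returns the addresses that are still missing (empty list on success).
    """
    deadline = time.monotonic() + timeout
    missing = list(addresses)
    while True:
        missing = [a for a in missing if not can_bind(a)]
        if not missing or time.monotonic() >= deadline:
            return missing
        time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Exit status 0 only if every address on the command line became usable.
    sys.exit(1 if wait_for_addresses(sys.argv[1:]) else 0)
```

One hypothetical way to use such a script would be from an additional ExecStartPre= line in a drop-in for named.service, with the secondary IPv4 and IPv6 addresses as arguments.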

Comment 2 Petr Menšík 2022-11-12 11:47:14 UTC
Interesting issue. I don't think the problem is with the listen-on directives. They wait for the address to appear and bind to it once it is detected.

But I think the query-source-v6 and notify-source directives do not use such a mechanism and bind unconditionally. Still, I would have expected all manually specified interfaces to be configured when After=network-online.target is used.

Comment 3 Petr Menšík 2022-11-12 12:01:47 UTC
I think that if each address family has only a single address, it would be safe to configure query-source and similar from the same event as the listen-on sockets, i.e. wait for the requested address to appear on any interface and configure the query source address when it appears.

But it seems to be a problem that it also listens on localhost.

Anyway, I would suggest ignoring which outgoing address it uses and relying on shared keys generated by the ddns-confgen command. Do not configure ACLs for specific IP addresses; use a server { keys example; }; clause instead.

Comment 5 Henning Schmiedehausen 2023-08-05 21:42:51 UTC
It is not just the query source. I restrict it so that I can firewall just the DNS server IPs to send out DNS queries. If I change that, bind still will not bind to any address but localhost, because the other addresses are not present by the time bind is started.

The bug still exists in RHEL 9.x (tested with 9.2).