Bug 1004569

Summary: ipa dns delegation and ipa not accepting a new ns record.
Product: Red Hat Enterprise Linux 7
Reporter: Michael Gregg <mgregg>
Component: ipa
Assignee: Martin Kosek <mkosek>
Status: CLOSED NOTABUG
QA Contact: Namita Soman <nsoman>
Severity: unspecified
Priority: unspecified
Version: 7.0
CC: mgregg, msauton, pspacek, rcritten
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2013-09-09 10:56:48 UTC
Type: Bug

Description Michael Gregg 2013-09-05 00:44:25 UTC
Description of problem:
Possible regression of this bug: https://bugzilla.redhat.com/show_bug.cgi?id=811074

The install includes two servers.

The first server is an IPA master.

The second server runs only a DNS server.

Version-Release number of selected component (if applicable):
ipa-server-3.3.1-1.el7.x86_64

How reproducible:
always

Steps to Reproduce:
# On server 2. 
1. Create a zone file for the subdomain to forward to.
My zone file:
"$ORIGIN .
$TTL 400	; 6 minutes 40 seconds
suba.testrelm.com	IN SOA	ns.suba.testrelm.com. admin\@testrelm.com. (
				2013032001 ; serial
				200        ; refresh (3 minutes 20 seconds)
				600        ; retry (10 minutes)
				800        ; expire (13 minutes 20 seconds)
				400        ; minimum (6 minutes 40 seconds)
				)
			NS	ns.suba.testrelm.com.
			A	10.16.96.54
			MX	10 ns.suba.testrelm.com.
$ORIGIN suba.testrelm.com.
$TTL 120	; 2 minutes
aaron			A	67.125.66.50
ns			A	10.16.96.54
thost			A	1.2.3.4
thostb			A	4.4.4.4"

2. Add a zone entry to named.conf.
Mine: "zone "suba.testrelm.com" {
        allow-query { any; };
        type master;
        file "/etc/suba.testrelm.com";
};"

3. Restart named
4. dig thost.suba.testrelm.com @127.0.0.1 | grep A | grep 1.2.3.4
5. dig thostb.suba.testrelm.com @127.0.0.1 | grep A | grep 4.4.4.4
# on master
6. ipa dnsrecord-add testrelm.com forwarder.suba --a-rec=10.16.96.54
7. ipa dnsrecord-add testrelm.com suba --ns-rec=forwarder.suba.testrelm.com.
8. dig thost.suba.testrelm.com @127.0.0.1
9. dig suba.testrelm.com

Actual results:
After step 7, I get the error message:
ipa: ERROR: Nameserver 'forwarder.suba.testrelm.com.' does not have a corresponding A/AAAA record

Yet, the record does exist:
[root@ipaqa64vmg ipa-dns-multi]# ipa dnsrecord-find testrelm.com forwarder.suba
  Record name: forwarder.suba
  A record: 10.16.96.54

  Record name: suba
  NS record: forwarder.suba.testrelm.com.

Steps 8 and 9 fail because the NS record wasn't created.


Additional info:

Comment 1 Michael Gregg 2013-09-05 00:54:55 UTC
To clarify, the record is clearly getting created, but I am getting this error message. 

When I dig against the server, the ns record is not getting picked up for suba.testrelm.com. 

Perhaps this bug should be named something better like "ns records not getting picked up by named/bind-dyndb-ldap for subdomains"?

[root@ipaqa64vmg ipa-dns-multi]# dig suba.testrelm.com @127.0.0.1

; <<>> DiG 9.9.3-rl.13207.22-P2-RedHat-9.9.3-7.P2.el7 <<>> suba.testrelm.com @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 28755
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;suba.testrelm.com.		IN	A

Comment 3 Martin Kosek 2013-09-05 15:32:43 UTC
Petr, can you please advise?

Comment 4 Petr Spacek 2013-09-06 08:46:59 UTC
I'm not sure that I understand your description; there is not enough information for debugging.

- What do you see in the /var/named/data/named.run logs on the IPA master and on the plain DNS server? I would recommend enabling querylog and raising the debug level before you start again. This is a snippet from /etc/named.conf:
options {
        querylog yes;
};
logging {
        channel default_debug {
                file "data/named.run";
                severity debug 10;
                print-time yes;
        };
};


- Steps 4 and 5 confuse me:
4. dig thost.suba.testrelm.com @127.0.0.1 | grep A | grep 1.2.3.4
5. dig thostb.suba.testrelm.com @127.0.0.1 | grep A | grep 4.4.4.4
What is the output? Were they run on IPA master or on the plain DNS server?

- Step 6. ipa dnsrecord-add testrelm.com forwarder.suba --a-rec=10.16.96.54
What is the output of the command "dig @127.0.0.1 -t A forwarder.suba.testrelm.com."?
(Please run it after step 6 and before step 7.)

- Step 7. ipa dnsrecord-add testrelm.com suba --ns-rec=forwarder.suba.testrelm.com.
The NS record name is incorrect: all NS records and associated glue records in the parent zone should exactly match the values in the sub-zone. (It should work, but the mismatch violates the DNS specification.)

What values are configured in /etc/resolv.conf on the IPA server? Did you restart httpd after changing /etc/resolv.conf?

- Step 9. dig suba.testrelm.com
The @ parameter is missing.

Comment 5 Petr Spacek 2013-09-06 12:41:14 UTC
I'm not able to reproduce the problem on my test system with the following packages:
bind-dyndb-ldap-3.5-1.fc19.x86_64
freeipa-server-3.3.1-1.fc19.x86_64

I think that it is a misconfiguration.

Wild guesses:
1) /etc/resolv.conf on the FreeIPA server points to some 'upstream' servers rather than to the FreeIPA servers.
(This causes a problem because you hijacked the testrelm.com domain and there is no real delegation from the 'com.' domain.)

2) FreeIPA DNS is configured with a global forwarder (in LDAP or /etc/named.conf). As usual, global forwarding forces BIND to ignore normal zone delegation and to send queries to the configured 'upstream' server.

In that case you have two options:
- Replace the global forwarder with something more specific (i.e. remove the global forwarder and replace it with a forwarder for the zone redhat.com.)
OR
- Configure forwarding policy 'none' for zones served directly from the FreeIPA server (i.e. testrelm.com).
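
A minimal sketch of the second option, assuming the zone is testrelm.com and that the command is run on the IPA master with admin credentials. The --forward-policy=none option is the one named in comment 7 of this report; the maybe_run helper and the APPLY variable are hypothetical additions that only print the command unless told to run it.

```shell
# Zone served directly from the FreeIPA server (taken from this report).
ZONE="testrelm.com"

# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
maybe_run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Disable forwarding for this zone so that delegations inside it are followed.
maybe_run ipa dnszone-mod "$ZONE" --forward-policy=none
```

Running the script as-is only prints the command for review; setting APPLY=1 in the environment would execute it.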

Comment 6 Michael Gregg 2013-09-06 20:02:15 UTC
If you scroll down to the bottom of this reply, you'll see I am now able to get the ipa server to respond with the correct records, but there may still be a problem. 

(In reply to Petr Spacek from comment #4)
> I'm not sure that I understand you description, there is not enough
> information for debugging.
> 
> - What you see in /var/named/data/named.run logs on IPA master and the plain
> DNS server? I would recommend you to enable querylog and raise debug level
> before you start again. This is snippet from /etc/named.conf:
> options {
>         querylog yes;
>         channel default_debug {
>                 file "data/named.run";
>                 severity debug 10;
>                 print-time yes;
>         };
> };
> 

After enabling logging, I get the following:
06-Sep-2013 15:31:10.674 client 127.0.0.1#40159 (thostb.suba.testrelm.com): ns_client_detach: ref = 0
06-Sep-2013 15:31:10.674 client 127.0.0.1#40159 (thostb.suba.testrelm.com): endrequest
06-Sep-2013 15:31:10.674 fetch completed at resolver.c:7521 for thostb.suba.testrelm.com/A in 0.020469: success/success [domain:suba.testrelm.com,referral:0,restart:1,qrysent:1,timeout:0,lame:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]
06-Sep-2013 15:31:10.674 fetch 0x7faedb1a4430 (fctx 0x7faec8bdf010(thostb.suba.testrelm.com/A)): destroyfetch
06-Sep-2013 15:31:10.674 fctx 0x7faec8bdf010(thostb.suba.testrelm.com/A): shutdown
06-Sep-2013 15:31:10.674 fctx 0x7faec8bdf010(thostb.suba.testrelm.com/A): doshutdown
06-Sep-2013 15:31:10.674 fctx 0x7faec8bdf010(thostb.suba.testrelm.com/A): stopeverything
06-Sep-2013 15:31:10.674 fctx 0x7faec8bdf010(thostb.suba.testrelm.com/A): cancelqueries
06-Sep-2013 15:31:10.674 fctx 0x7faec8bdf010(thostb.suba.testrelm.com/A): unlink
06-Sep-2013 15:31:10.674 fctx 0x7faec8bdf010(thostb.suba.testrelm.com/A): destroy



> 
> - Steps 4, 5 are confusing for me:
> 4. dig thost.suba.testrelm.com @127.0.0.1 | grep A | grep 1.2.3.4
> 5. dig thostb.suba.testrelm.com @127.0.0.1 | grep A | grep 4.4.4.4
> What is the output? Were they run on IPA master or on the plain DNS server?

Steps 4 and 5 are run on the slave to ensure that the domain was set up properly. Their output is as follows:

thost.suba.testrelm.com. 120	IN	A	1.2.3.4
thostb.suba.testrelm.com. 120	IN	A	4.4.4.4

If the greps are dropped from these commands, the output on the slave shows this:
;; AUTHORITY SECTION:
suba.testrelm.com.	400	IN	NS	ns.suba.testrelm.com.
;; ADDITIONAL SECTION:
ns.suba.testrelm.com.	120	IN	A	10.16.76.36


> 
> - Step 6. ipa dnsrecord-add testrelm.com forwarder.suba --a-rec=10.16.96.54
> What is output from command "dig @127.0.0.1 -t A
> forwarder.suba.testrelm.com."
> (please run it after step 6 and before step 7)

Running this would require me to yet again set up another ipa server to diagnose this problem. You just want to verify that the A record is returned before the NS record is added? 

> 
> - Step 7. ipa dnsrecord-add testrelm.com suba
> --ns-rec=forwarder.suba.testrelm.com.
> NS record name is incorrect, all NS records and associated glue records in
> parent zone should exactly match values in sub-zone. (It should work, but
> this mess violates DNS specification.)

The only shared record here is "forwarder.suba.testrelm.com", correct? I added it to the sub zone to no effect. 

Also, I agree that a mismatch will violate the DNS spec, but I am reproducing this bug from the original description in BZ811074

> 
> What values are configured in /etc/resolv.conf on IPA server? Did you
> restarted httpd after a change to /etc/resolv.conf?

the master's /etc/resolv.conf is as follows:

search testrelm.com
nameserver 10.16.76.35 

10.16.76.35 is the master ipa server. 

I did not change resolv.conf, so I did not restart apache.

I just restarted apache and experienced no behavior change.



> 
> - Step 9. dig suba.testrelm.com
> @ parameter is missing

There was no @ parameter because the query would have just run against the server specified in resolv.conf.

Specifying dig suba.testrelm.com @127.0.0.1 on the master produces the same output. 


Replying to comment 5: 

1. First, resolv.conf points to the local DNS server on the master, so there shouldn't be any hijacking. Second, most of the queries are specified to go directly to the master or slave, bypassing the resolv.conf configuration.

2. named.conf on the master does have a forwarder configured here:

forward first;
      forwarders {
             10.11.5.19;
     };

Commenting it out, or making the forwarder the slave server fixes this problem. 

This addition to named.conf was done by ipa-server-install when "--forwarder=10.11.5.19" was specified. 

The problem is that a forwarder definition is required for most normal DNS behavior. Most client installs will have a forwarder configured.

What I do not understand is why the NS record for this domain gets picked up from the forwarder before it is picked up from the local server. If it exists on the local server, shouldn't it get picked up before hitting the forwarder?

Note: I commented out the "forward first" option, and the behavior doesn't seem to change.

Comment 7 Petr Spacek 2013-09-09 10:56:48 UTC
(In reply to Michael Gregg from comment #6)
> (In reply to Petr Spacek from comment #4)
> > - Step 6. ipa dnsrecord-add testrelm.com forwarder.suba --a-rec=10.16.96.54
> > What is output from command "dig @127.0.0.1 -t A
> > forwarder.suba.testrelm.com."
> > (please run it after step 6 and before step 7)
> 
> Running this would require me to yet again set up another ipa server to
> diagnose this problem. You just want to verify that the A record is returned
> before the NS record is added?
Yes, that is what I wanted to see. You don't need to set up another server if you are okay with my answers below.


> > - Step 7. ipa dnsrecord-add testrelm.com suba
> > --ns-rec=forwarder.suba.testrelm.com.
> > NS record name is incorrect, all NS records and associated glue records in
> > parent zone should exactly match values in sub-zone. (It should work, but
> > this mess violates DNS specification.)
> 
> The only shared record here is "forwarder.suba.testrelm.com", correct? I
> added it to the sub zone to no effect. 
In our case the zones share NS and A records.

Generally, NS records have to be shared between 'parent' and 'child' in all cases. Glue records are shared only if the glue belongs to the 'child' zone (which is our case).


> > - Step 9. dig suba.testrelm.com
> > @ parameter is missing
> 
> There was no @ pramater because the querry would have just run against the
> server specified in resolv.conf. 
Okay. I mentioned the @ parameter because I didn't know the content of /etc/resolv.conf.
(I personally prefer to use the @ parameter explicitly because then I don't need to rely on /etc/resolv.conf.)


> Replying to comment 5: 
[...]
> 2. named.conf on the master does have a forwarder configured here:
> 
> forward first;
>       forwarders {
>              10.11.5.19;
>      };
> 
> Commenting it out, or making the forwarder the slave server fixes this
> problem. 
> 
> This addition to named.conf was done by ipa-server-install when
> "--forwarder=10.11.5.19" was specified.
Yes, you are right.

> The problem being that forwarder definition is required for most normal DNS
> behavior. Most client installs will have a forwarder configured. 
I agree with that.

> I do not understand is why does the NS record for this domain get picked up
> from the forwarder before it is picked up from the local server? If it
> exists on the local server, shouldn't it get picked up before hitting the
> forwarded. 
We do the same thing as the original BIND 9 does. (AFAIK the behavior in FreeIPA has not changed since support for 'forwarders' was introduced.)

I agree that it is a bit weird, but on the other hand I think that we should not diverge from BIND 9 behavior. Minimal behavioral changes => easier transition from plain BIND to FreeIPA.

You can read a discussion about this behavior here: https://lists.isc.org/pipermail/bind-users/2006-January/060810.html and http://www.brandonhutchinson.com/Overriding_global_forwarding_with_BIND.html .

FreeIPA's equivalent of 'forwarders {}' is $ ipa dnszone-mod example.com '--forward-policy=none'. (There was strong opposition against '--forward-policy= '.)


> Note, I commented out the "forward first" option, and the behavior doesn't
> seem to change.
AFAIK "first" is the default value. It would help to specify the per-zone forward policy 'none' (via the FreeIPA CLI).

I'm closing this bug because the current behavior matches native BIND 9 behavior, i.e. it works as designed by ISC. Feel free to re-open the bug if you disagree.

Comment 8 Petr Spacek 2013-09-11 10:17:10 UTC
I found that this configuration will not work if you have mismatching records in the parent and child zones.

I.e. you have to follow the recommendation for Step 7 quoted in comment #7:
> > - Step 7. ipa dnsrecord-add testrelm.com suba
> > --ns-rec=forwarder.suba.testrelm.com.
> > NS record name is incorrect, all NS records and associated glue records in
> > parent zone should exactly match values in sub-zone. (It should work, but
> > this mess violates DNS specification.)

In reality, the parent zone has to contain NS and A records for *ns.suba*. A mismatch causes the *second* resolution of any name from the child zone to fail with SERVFAIL.

I think that this is caused by cache implementation details in BIND:
BIND 9 caches the NS records from the child zone (obtained during the first query). These cached NS records are used for subsequent queries, but then BIND can't find matching A (glue) records and resolution fails.
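
A minimal sketch of a delegation that matches the child zone, assuming the child's NS name and address are the ones in the zone file from the description (ns.suba.testrelm.com. and 10.16.96.54). The ipa dnsrecord-add options are the ones used in steps 6 and 7; the maybe_run helper and the APPLY variable are hypothetical and only print the commands by default. Any earlier forwarder.suba records would have to be removed separately.

```shell
PARENT_ZONE="testrelm.com"
CHILD_NS="ns.suba.testrelm.com."   # name from the child zone's own NS record
CHILD_NS_IP="10.16.96.54"          # A record for "ns" in the child zone file

# Hypothetical dry-run helper: prints the command unless APPLY=1 is set.
maybe_run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Glue A record in the parent zone, named exactly like the child's nameserver.
maybe_run ipa dnsrecord-add "$PARENT_ZONE" ns.suba --a-rec="$CHILD_NS_IP"
# Delegation NS record pointing at that same name.
maybe_run ipa dnsrecord-add "$PARENT_ZONE" suba --ns-rec="$CHILD_NS"
```

The point of the sketch is only that the name after --ns-rec and the name carrying the A record are the same name the child zone publishes about itself.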

Comment 9 Petr Spacek 2013-09-12 08:37:32 UTC
How it works and why it fails in the case described above:

The 'parent' server starts with an empty cache, so the delegation record for the child zone 'suba.testrelm.com.' is read from the zone 'testrelm.com.', i.e. from LDAP.

The NS record for the child zone is received along with the first answer from the child server (it is in the AUTHORITY section of the answer). This NS record is cached on the parent server.

A second query for a record from the child zone will find the NS record for the child zone in the cache. Now the 'parent' server has to find the IP address for the name from the NS record ... and it fails, because there is no such record in LDAP.

Don't ask me why BIND caches only data from the AUTHORITY section and not from the ADDITIONAL section of the answer. The important piece is that this algorithm is internal to BIND and our plugin doesn't affect it in any way.
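
The mismatch described above can be checked mechanically. The following is a rough sketch, not part of FreeIPA or BIND: ns_names and ns_sets_match are hypothetical helper names, and the parent record's TTL of 86400 is a placeholder because the report does not show it. The record lines use dig's answer format (owner, TTL, class, type, data).

```shell
# Extract NS target names from dig-style record lines on stdin,
# lower-cased and sorted so that order and case do not matter.
ns_names() {
    awk '$4 == "NS" { print tolower($5) }' | sort -u
}

# Succeed only if both record listings name exactly the same nameservers.
ns_sets_match() {
    [ "$(printf '%s\n' "$1" | ns_names)" = "$(printf '%s\n' "$2" | ns_names)" ]
}

# Delegation stored in the parent zone (TTL is a placeholder) versus
# the NS record the child zone publishes about itself.
parent='suba.testrelm.com. 86400 IN NS forwarder.suba.testrelm.com.'
child='suba.testrelm.com. 400 IN NS ns.suba.testrelm.com.'

if ns_sets_match "$parent" "$child"; then
    echo "NS sets match"
else
    echo "NS sets differ"
fi
```

Against live servers, the two listings could presumably be taken from "dig +noall +authority +norecurse NS suba.testrelm.com @<parent>" and "dig +noall +answer NS suba.testrelm.com @<child>", with the server addresses filled in for the setup at hand.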