Description of problem:
A dynamic update request (over TCP or UDP, with or without TSIG, from localhost or a remote address) that deletes an SRV record, as shown in the sample code, causes the server to abort with the log message:
name.c:1695: INSIST(count <= 63) failed

Version-Release number of selected component (if applicable):
bind-9.4.1-8.P1.fc7

How reproducible:
I have attached a tgz file that contains the required files. Use the files in that tgz to reproduce as follows.

Steps to Reproduce:
1. Copy named.conf to /var/named/chroot/etc/named.conf
2. Copy dummy.com.db to /var/named/chroot/var/named/dummy.com.db
3. Start named
4. Verify from /var/log/messages that it has loaded the zone correctly
5. Install python-dns via yum
6. Run test.py
7. Check whether bind crashed

Actual results:
Bind aborts with the log message name.c:1695: INSIST(count <= 63) failed in /var/log/messages.

Expected results:
Expected the update to complete and bind to still be running.

Additional info:
Brief stack trace:
#0 0xbfffe402 in __kernel_vsyscall ()
#1 0xb7b221c0 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0xb7b23ba1 in *__GI_abort () at abort.c:88
#3 0x8001da0c in assertion_failed (file=0xb7f7f274 "name.c", line=1695, type=isc_assertiontype_insist, cond=0xb7f7f2ae "count <= 63") at ./main.c:159
#4 0xb7eb5e6c in set_offsets (name=0xb7a8bd60, offsets=0xb7a8bc68 "", set_name=0xb7a8bd60) at name.c:1695
#5 0xb7eb965c in dns_name_fromregion (name=0xb7a8bd60, r=0xb7a8bd94) at name.c:999
#6 0xb7f1141f in compare_in_srv (rdata1=0xb7a8c0d0, rdata2=0xb7a8bef0) at rdata/in_1/srv_33.c:230
#7 0xb7f12a5f in dns_rdata_compare (rdata1=0xb7a8c0d0, rdata2=0xb7a8bef0) at rdata.c:348
#8 0x800361d4 in rr_equal_p (update_rr=0xb7a8c0d0, db_rr=0xb7a8bef0) at update.c:1015
#9 0x80036401 in delete_if_action (data=0xb7a8bf44, rr=0xb7a8beec) at update.c:1059
#10 0x80035f5e in foreach_rr (db=0xb7a9b8f0, ver=0xb7a961e8, name=<value optimized out>, type=33, covers=0, rr_action=0x800363e0 <delete_if_action>, rr_action_data=0xb7a8bf44) at update.c:550
#11 0x80036047 in delete_if (predicate=<value optimized out>, db=0xb7a9b8f0, ver=0xb7a961e8, name=0xb6211038, type=33, covers=0, update_rr=0xb7a8c0d0, diff=0xb7a8c130) at update.c:1094
#12 0x8003b856 in update_action (task=0xb7aa41a8, event=0xb6215008) at update.c:2790
#13 0xb7ce2a92 in run (uap=0xb7a93008) at task.c:867
#14 0xb7c91472 in start_thread (arg=0xb7a8db90) at pthread_create.c:296
#15 0xb7bcb89e in clone () from /lib/i686/nosegneg/libc.so.6

The full stack trace and the core file are in the attached archive. To get your own core file, run ulimit -c unlimited and chmod g+w /var/named/chroot/var/named so that the core can be written.
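The assertion text and frames #4 to #6 suggest that a label length byte larger than 63 was seen while walking the SRV target name. The following is a minimal Python sketch of that kind of walk, not BIND's actual code; the function name and the sample byte strings are hypothetical. One input such a walk rejects is a name that ends in a compression pointer, whose first byte is 0xC0 or higher. Whether the failing update actually contained such a pointer is an assumption, not something established by the trace.

```python
def label_offsets(region: bytes) -> list:
    """Walk an uncompressed wire-format name and return each label's offset.

    Illustration only: every length byte must be <= 63, which is the
    condition named in the assertion message.
    """
    offsets = []
    pos = 0
    while pos < len(region):
        count = region[pos]
        if count > 63:
            raise ValueError(f"label length byte {count} at offset {pos} exceeds 63")
        offsets.append(pos)
        if count == 0:  # the root label terminates the name
            break
        pos += 1 + count
    return offsets


# "dbserver.dummy.com." written out in full (assumed sample data).
full = b"\x08dbserver\x05dummy\x03com\x00"
# "dbserver" followed by a compression pointer 0xC0 0x0C (assumed sample data).
compressed = b"\x08dbserver\xc0\x0c"

print(label_offsets(full))  # [0, 9, 15, 19]
try:
    label_offsets(compressed)
except ValueError as err:
    print(err)  # label length byte 192 at offset 9 exceeds 63
```

A walker like this is only safe on names that have already been decompressed, which is why the same bytes can be valid on the wire and still trip a check further down the call chain.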
Created attachment 160901: bind-abort-bugreport.tgz. It has the config files and the Python code that can reproduce the issue.
Let me know if you need any help or information in reproducing the bug. The difference between the python-dns update and the nsupdate command-line tool is that nsupdate sends the delete with class ANY and python-dns sends it with class NONE. A cursory examination of the bind code tells me that these take different code paths.
Submitted the same bug report to bind9-bugs. The ticket there has been assigned an ID of [ISC-Bugs #17074].
This bug also reproduces in BIND 9.5.0a6. The log message seen is:
name.c:1695: INSIST(count <= 63) failed
exiting (due to assertion failure)
Same as earlier.
I tried the same query with nsupdate by using the following line:
update delete _store._net.user.dummy.com. 300 SRV 0 0 8091 dbserver.dummy.com.
The server didn't assert and quit. The update worked. So I captured the transaction using ethereal, and here's the diff:

diff -u nsupdate.log dnspython.log
--- nsupdate.log 2007-08-10 16:57:22.000000000 +0530
+++ dnspython.log 2007-08-10 16:58:03.000000000 +0530
@@ -1,6 +1,6 @@
 Domain Name System (query)
-    Length: 82
-    Transaction ID: 0x9a27
+    Length: 73
+    Transaction ID: 0xeefd
     Flags: 0x2800 (Dynamic update)
         0... .... .... .... = Response: Message is a query
         .010 1... .... .... = Opcode: Dynamic update (5)
@@ -23,7 +23,7 @@
             Type: SRV (Service location)
             Class: NONE (0x00fe)
             Time to live: 0 time
-            Data length: 26
+            Data length: 17
             Priority: 0
             Weight: 0
             Port: 8091

As you can see, the Data length is different. I don't know which one is correct. In any case, the server shouldn't terminate if the request is bad. Hope this helps.
Adam, have you started working on this issue? If you can confirm the problem and give a hint as to whether it is in bind's nsupdate or in dnspython, that would be helpful. (There is a problem in the bind server for sure, since a crash can be caused by network input.) If the problem is in dnspython, I would like to raise it on the dnspython forum. But that would mean this problem with the bind server would become public. Would that be OK?
I've started working on this. Please don't post this to any forum yet. It looks like dnspython sends some corrupted data. This issue also has to be discussed upstream before we can mark it as public.
Thanks, Adam
The problem is that named can't handle non-absolute domain names in the "Target" field of a record in the update section. Let me discuss upstream how to solve this problem.
Btw, I don't think this will be marked as a security issue, because the affected query has to come from a trusted server or user. But please still don't mention this on any public forum.
I don't think dnspython is sending a non-absolute domain name in the Target field. See the ethereal capture here:

Domain Name System (query)
    Length: 73
    Transaction ID: 0xeefd
    Flags: 0x2800 (Dynamic update)
        0... .... .... .... = Response: Message is a query
        .010 1... .... .... = Opcode: Dynamic update (5)
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...0 .... .... = Recursion desired: Don't do query recursively
        .... .... .0.. .... = Z: reserved (0)
        .... .... ...0 .... = Non-authenticated data OK: Non-authenticated data is unacceptable
    Zones: 1
    Prerequisites: 0
    Updates: 1
    Additional RRs: 0
    Zone
        dummy.com: type SOA, class IN
            Name: dummy.com
            Type: SOA (Start of zone of authority)
            Class: IN (0x0001)
    Updates
        _store._net.user.dummy.com: type SRV, class NONE, priority 0, weight 0, port 8091, target dbserver.dummy.com
            Name: _store._net.user.dummy.com
            Type: SRV (Service location)
            Class: NONE (0x00fe)
            Time to live: 0 time
            Data length: 17
            Priority: 0
            Weight: 0
            Port: 8091
            Target: dbserver.dummy.com

The target is a fully qualified domain name. But here the "Data length" is 17, whereas for the same query from nsupdate the ethereal capture looks like this:

Domain Name System (query)
    Length: 82
    Transaction ID: 0x9a27
    Flags: 0x2800 (Dynamic update)
        0... .... .... .... = Response: Message is a query
        .010 1... .... .... = Opcode: Dynamic update (5)
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...0 .... .... = Recursion desired: Don't do query recursively
        .... .... .0.. .... = Z: reserved (0)
        .... .... ...0 .... = Non-authenticated data OK: Non-authenticated data is unacceptable
    Zones: 1
    Prerequisites: 0
    Updates: 1
    Additional RRs: 0
    Zone
        dummy.com: type SOA, class IN
            Name: dummy.com
            Type: SOA (Start of zone of authority)
            Class: IN (0x0001)
    Updates
        _store._net.user.dummy.com: type SRV, class NONE, priority 0, weight 0, port 8091, target dbserver.dummy.com
            Name: _store._net.user.dummy.com
            Type: SRV (Service location)
            Class: NONE (0x00fe)
            Time to live: 0 time
            Data length: 26
            Priority: 0
            Weight: 0
            Port: 8091
            Target: dbserver.dummy.com

Here the "Data length" is 26. Everything else is the same. The assert happens only in the first case and not in the second. So I expect one of the above lengths (most likely the one from dnspython) to be wrong.
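The two sets of lengths above can be checked by arithmetic. The sketch below rebuilds the update message twice using only the Python standard library: once with the SRV target compressed against the zone name, and once with it written out in full. It assumes the owner name is compressed in both cases and uses a transaction ID of 0. The function names are hypothetical, and the layout is inferred from the captured numbers rather than confirmed from a raw packet dump.

```python
import struct


def labels(name: str) -> bytes:
    """Encode dotted labels as length-prefixed bytes, without a terminator."""
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in name.split(".") if l)


def pointer(offset: int) -> bytes:
    """Two-byte compression pointer to an earlier offset in the message."""
    return struct.pack("!H", 0xC000 | offset)


def update_message(compress_target: bool) -> tuple:
    """Return (rdata length, total message length) for the SRV delete."""
    # ID, flags (opcode UPDATE), zone / prerequisite / update / additional counts
    header = struct.pack("!HHHHHH", 0, 0x2800, 1, 0, 1, 0)
    zone = labels("dummy.com") + b"\x00" + struct.pack("!HH", 6, 1)  # SOA, IN
    zone_name_at = len(header)  # "dummy.com" starts right after the header
    # Assumption: the owner name is compressed against the zone name.
    owner = labels("_store._net.user") + pointer(zone_name_at)
    if compress_target:
        target = labels("dbserver") + pointer(zone_name_at)
    else:
        target = labels("dbserver.dummy.com") + b"\x00"
    rdata = struct.pack("!HHH", 0, 0, 8091) + target  # priority, weight, port
    # type SRV (33), class NONE (254), TTL 0, RDLENGTH
    rr = owner + struct.pack("!HHIH", 33, 254, 0, len(rdata)) + rdata
    return len(rdata), len(header + zone + rr)


print(update_message(True))   # (17, 73)
print(update_message(False))  # (26, 82)
```

Under these assumptions both captures are self-consistent: 17 and 73 match a compressed target, 26 and 82 match an uncompressed one. Neither Data length would then be wrong as such. The two clients would simply encode the same absolute name differently.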
Exactly which utility did you use to capture the packets? I used dnscap (it shows fine-grained output, but it is only in rawhide). If you analyze the update query from python-dns, the target is not a fully qualified domain name.
Adam
I used tshark (tethereal):
tshark -i lo -V
Can I get dnscap on Fedora 7?
I tried another experiment. With nsupdate I ran the following:
server 127.0.0.1
zone dummy.com.
update delete _store._net.user.dummy.com. 300 SRV 0 0 8091 dbserver
send
As you can see, the target isn't an FQDN. But this works. The packets captured with ethereal confirm the same. The data lengths are different again, though; this time the difference is just 1. So I still feel that it has something to do with the data length. What is the definition of data length expected by bind?
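On the question of definition: in the DNS wire format (RFC 1035), the data length is the 16-bit RDLENGTH field, which counts the octets of RDATA as they appear in the message. The sketch below reads the fixed fields that follow an RR's owner name. The function name and the sample bytes are hypothetical, and the sample target is assumed to end in a compression pointer.

```python
import struct


def read_rr_fixed_fields(message: bytes, offset: int) -> dict:
    """Read TYPE, CLASS, TTL and RDLENGTH, then slice out RDATA.

    `offset` must point just past the RR's owner name.
    """
    rtype, rclass, ttl, rdlength = struct.unpack_from("!HHIH", message, offset)
    start = offset + 10  # 2 + 2 + 4 + 2 bytes of fixed fields
    rdata = message[start:start + rdlength]
    if len(rdata) != rdlength:
        raise ValueError("RDLENGTH runs past the end of the message")
    return {"type": rtype, "class": rclass, "ttl": ttl,
            "rdlength": rdlength, "rdata": rdata}


# Assumed sample: priority 0, weight 0, port 8091, then "dbserver" plus a pointer.
rdata = struct.pack("!HHH", 0, 0, 8091) + b"\x08dbserver\xc0\x0c"
fragment = struct.pack("!HHIH", 33, 254, 0, len(rdata)) + rdata

fields = read_rr_fixed_fields(fragment, 0)
print(fields["type"], fields["class"], fields["rdlength"])  # 33 254 17
```

Because RDLENGTH counts bytes on the wire, the same target name can legitimately yield different values depending on whether it is compressed.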
Hi Adam,
Bug #17074 at isc.org is marked as resolved by Mark Andrews. Do you agree? I didn't quite understand what the resolution is. If I apply the patch given there to message.c, will bind no longer assert on the same request? Also, what should we communicate to dnspython upstream? I assume there is a bug there too?
Thanks,
Hi, please see http://people.redhat.com/atkac/bind/ . There's a patched bind (9.4.1-9.P1.fc7) and also dnscap if you're interested. In the end, python-dns sends a correct query. This will be marked as public later today because upstream doesn't think it is security sensitive. The update will be available very soon.
Adam
Agreed, not a security issue. Opening the bug now that upstream has dealt with this.
bind-9.4.1-9.P1.fc7 has been pushed to the Fedora 7 stable repository. If problems still persist, please make note of it in this bug report.
Thanks a lot! Verified the package with the dnspython code. It works well.