Bug 251336 - dynamic update - deleting a srv record - bind asserts name.c:1695: INSIST(count <= 63) failed and aborts
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: bind
Version: 7
Hardware: All
OS: Linux
Priority: low
Severity: urgent
Target Milestone: ---
Assignee: Adam Tkac
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2007-08-08 13:53 UTC by Vinay Y S
Modified: 2013-04-30 23:36 UTC
CC List: 2 users

Fixed In Version: 9.4.1-9.P1.fc7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-08-15 19:46:34 UTC
Type: ---
Embargoed:


Attachments
bind-abort-bugreport.tgz - has config files, and python code that can repro the issue. (320.70 KB, application/octet-stream)
2007-08-08 13:53 UTC, Vinay Y S

Description Vinay Y S 2007-08-08 13:53:30 UTC
Description of problem:
A dynamic update request that deletes an SRV record, as in the sample code, causes
the server to abort with the log message below. This happens over TCP or UDP, with
or without TSIG, from localhost or a remote address.
name.c:1695: INSIST(count <= 63) failed 


Version-Release number of selected component (if applicable):
bind-9.4.1-8.P1.fc7

How reproducible:
I have attached a tgz file that contains the required files.
Use the files in that tgz to reproduce the issue as follows.

Steps to Reproduce:
1. Copy named.conf to /var/named/chroot/etc/named.conf
2. Copy dummy.com.db to /var/named/chroot/var/named/dummy.com.db
3. Start named
4. Verify that it has loaded the zone correctly from /var/log/messages
5. Install python-dns via yum.
6. Run test.py
7. Check whether bind crashed.
  
Actual results:
Bind aborts with the log message
name.c:1695: INSIST(count <= 63) failed 
in /var/log/messages.

Expected results:
Expected the update to complete and bind to keep running.

Additional info:
Brief stack trace:
#0  0xbfffe402 in __kernel_vsyscall ()
#1  0xb7b221c0 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2  0xb7b23ba1 in *__GI_abort () at abort.c:88
#3  0x8001da0c in assertion_failed (file=0xb7f7f274 "name.c", line=1695, type=isc_assertiontype_insist, cond=0xb7f7f2ae "count <= 63") at ./main.c:159
#4  0xb7eb5e6c in set_offsets (name=0xb7a8bd60, offsets=0xb7a8bc68 "", set_name=0xb7a8bd60) at name.c:1695
#5  0xb7eb965c in dns_name_fromregion (name=0xb7a8bd60, r=0xb7a8bd94) at name.c:999
#6  0xb7f1141f in compare_in_srv (rdata1=0xb7a8c0d0, rdata2=0xb7a8bef0) at rdata/in_1/srv_33.c:230
#7  0xb7f12a5f in dns_rdata_compare (rdata1=0xb7a8c0d0, rdata2=0xb7a8bef0) at rdata.c:348
#8  0x800361d4 in rr_equal_p (update_rr=0xb7a8c0d0, db_rr=0xb7a8bef0) at update.c:1015
#9  0x80036401 in delete_if_action (data=0xb7a8bf44, rr=0xb7a8beec) at update.c:1059
#10 0x80035f5e in foreach_rr (db=0xb7a9b8f0, ver=0xb7a961e8, name=<value optimized out>, type=33, covers=0, rr_action=0x800363e0 <delete_if_action>, rr_action_data=0xb7a8bf44) at update.c:550
#11 0x80036047 in delete_if (predicate=<value optimized out>, db=0xb7a9b8f0, ver=0xb7a961e8, name=0xb6211038, type=33, covers=0, update_rr=0xb7a8c0d0, diff=0xb7a8c130) at update.c:1094
#12 0x8003b856 in update_action (task=0xb7aa41a8, event=0xb6215008) at update.c:2790
#13 0xb7ce2a92 in run (uap=0xb7a93008) at task.c:867
#14 0xb7c91472 in start_thread (arg=0xb7a8db90) at pthread_create.c:296
#15 0xb7bcb89e in clone () from /lib/i686/nosegneg/libc.so.6

The full stack trace and the core file are in the attached archive.
To get your own core file, run ulimit -c unlimited and chmod g+w
/var/named/chroot/var/named so that the core file can be written.

Comment 1 Vinay Y S 2007-08-08 13:53:32 UTC
Created attachment 160901
bind-abort-bugreport.tgz - has config files, and python code that can repro the issue.

Comment 2 Vinay Y S 2007-08-08 13:59:25 UTC
Let me know if you need any help/info in reproducing the bug.

The difference between a python-dns update and the nsupdate command-line tool is
that nsupdate sends the delete with class ANY while python-dns uses class NONE.
A cursory look at the bind code suggests these take different code paths.
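The ANY/NONE distinction corresponds to the delete encodings in RFC 2136, section 2.5, where the class of an Update-section RR selects the operation. The Python sketch below is simplified: the function name is hypothetical, and it skips several FORMERR checks from section 3.4.1.3 (for example, meta-types such as AXFR).

```python
# Class and type codes from RFC 1035 / RFC 2136.
CLASS_IN = 1
CLASS_NONE = 254
CLASS_ANY = 255
TYPE_ANY = 255
TYPE_SRV = 33

def classify_update_rr(rr_class, rr_type, ttl, rdlength, zone_class=CLASS_IN):
    """Simplified mapping of an Update-section RR to its RFC 2136 2.5 operation."""
    if rr_class == zone_class and rr_type != TYPE_ANY:
        return "add to an RRset"
    if rr_class == CLASS_ANY and ttl == 0 and rdlength == 0:
        # Class ANY carries no RDATA; the TYPE decides how much is deleted.
        if rr_type == TYPE_ANY:
            return "delete all RRsets from a name"
        return "delete an RRset"
    if rr_class == CLASS_NONE and ttl == 0 and rr_type != TYPE_ANY:
        # Class NONE carries RDATA identifying the one record to remove.
        return "delete an RR from an RRset"
    return "malformed (FORMERR)"

print(classify_update_rr(CLASS_NONE, TYPE_SRV, 0, 26))
print(classify_update_rr(CLASS_ANY, TYPE_SRV, 0, 0))
```

The captures in the later comments show class NONE with TTL 0 for the single-record SRV delete from both tools, which is the "delete an RR from an RRset" case. Only that case carries RDATA, so only it requires the server to compare the submitted SRV data against the stored records.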

Comment 3 Vinay Y S 2007-08-08 14:45:34 UTC
Submitted the same bug report to bind9-bugs. The ticket there has been
assigned an ID of [ISC-Bugs #17074].

Comment 4 Vinay Y S 2007-08-08 15:12:07 UTC
This bug also reproduces in BIND 9.5.0a6.

The log message seen is:
name.c:1695: INSIST(count <= 63) failed
exiting (due to assertion failure)

Same as earlier.

Comment 5 Vinay Y S 2007-08-10 19:49:06 UTC
I tried the same query with nsupdate by using the following line:
update delete _store._net.user.dummy.com. 300 SRV 0 0 8091 dbserver.dummy.com.

Server didn't assert and quit. The update worked.

So I captured both transactions using ethereal; here's the diff:
diff -u nsupdate.log dnspython.log
--- nsupdate.log        2007-08-10 16:57:22.000000000 +0530
+++ dnspython.log       2007-08-10 16:58:03.000000000 +0530
@@ -1,6 +1,6 @@
 Domain Name System (query)
-    Length: 82
-    Transaction ID: 0x9a27
+    Length: 73
+    Transaction ID: 0xeefd
     Flags: 0x2800 (Dynamic update)
         0... .... .... .... = Response: Message is a query
         .010 1... .... .... = Opcode: Dynamic update (5)
@@ -23,7 +23,7 @@
             Type: SRV (Service location)
             Class: NONE (0x00fe)
             Time to live: 0 time
-            Data length: 26
+            Data length: 17
             Priority: 0
             Weight: 0
             Port: 8091

As you can see, the Data length differs. I don't know which value is correct, but
in any case the server shouldn't terminate if the request is bad.

Hope this helps.

Comment 6 Vinay Y S 2007-08-10 20:23:43 UTC
Adam, have you started working on this issue? If you can confirm the problem and
give a hint as to whether it is in bind's nsupdate or in dnspython, that would be
helpful. (There is definitely a problem in the bind server, since network input
can cause a crash.)

If the problem is in dnspython, I would like to raise it on the dnspython forum.
But that would mean this problem with bind server would become public. Would
that be ok?

Comment 7 Adam Tkac 2007-08-10 21:15:07 UTC
I've started working on this. Please don't post this to any forum yet. It looks
like dnspython sends some corrupted data. This issue also has to be discussed
upstream before we can mark this one as public.

Thanks, Adam

Comment 8 Adam Tkac 2007-08-13 12:08:50 UTC
The problem is that named can't handle non-absolute domain names in the "Target"
field of the update section. Let me discuss with upstream how to solve it.
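A short sketch can illustrate how a "non-absolute" target might trip INSIST(count <= 63), assuming (the report does not confirm this) that the SRV RDATA reached compare_in_srv() with a compression pointer still in it. In wire format, a length byte with the top two bits set (0xC0) starts a compression pointer (RFC 1035, section 4.1.4). A reader that walks labels without following pointers would see a label count of 192 and never reach the terminating root label. The helpers below are hypothetical; this is not BIND's set_offsets() or dns_name_fromregion().

```python
import struct

def srv_rdata(target_wire: bytes, priority: int = 0, weight: int = 0, port: int = 8091) -> bytes:
    """SRV RDATA: priority, weight, port (16 bits each), then the target name."""
    return struct.pack("!HHH", priority, weight, port) + target_wire

def naive_label_walk(region: bytes):
    """Walk length-prefixed labels WITHOUT following compression pointers.

    Returns (labels, verdict). The verdict says whether the name ended at the
    root label, hit a length byte above 63, or ran off the end of the region.
    """
    labels = []
    i = 0
    while i < len(region):
        count = region[i]
        if count == 0:
            return labels, "absolute"
        if count > 63:
            # 0xC0..0xFF here would be a compression pointer, not a label length.
            return labels, f"label length {count} > 63 at offset {i}"
        labels.append(region[i + 1:i + 1 + count].decode("ascii"))
        i += 1 + count
    return labels, "not absolute (no root label)"

# Uncompressed target vs. a hypothetical compressed one:
# "dbserver" plus a pointer to dummy.com. at message offset 12.
uncompressed = srv_rdata(b"\x08dbserver\x05dummy\x03com\x00")
compressed = srv_rdata(b"\x08dbserver\xc0\x0c")

# Skip the 6 bytes of priority/weight/port to reach the target name.
print(naive_label_walk(uncompressed[6:]))  # (['dbserver', 'dummy', 'com'], 'absolute')
print(naive_label_walk(compressed[6:]))    # (['dbserver'], 'label length 192 > 63 at offset 9')
```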

Comment 9 Adam Tkac 2007-08-13 12:16:17 UTC
Btw, I don't think this will be marked as a security issue, because the affected
query has to come from a trusted server/user. But please still don't mention this
on any public forum.

Comment 10 Vinay Y S 2007-08-13 12:25:05 UTC
I don't think dnspython is sending a non-absolute domain name in the Target field.
See the ethereal capture here:
Domain Name System (query)
    Length: 73
    Transaction ID: 0xeefd
    Flags: 0x2800 (Dynamic update)
        0... .... .... .... = Response: Message is a query
        .010 1... .... .... = Opcode: Dynamic update (5)
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...0 .... .... = Recursion desired: Don't do query recursively
        .... .... .0.. .... = Z: reserved (0)
        .... .... ...0 .... = Non-authenticated data OK: Non-authenticated data is unacceptable
    Zones: 1
    Prerequisites: 0
    Updates: 1
    Additional RRs: 0
    Zone
        dummy.com: type SOA, class IN
            Name: dummy.com
            Type: SOA (Start of zone of authority)
            Class: IN (0x0001)
    Updates
        _store._net.user.dummy.com: type SRV, class NONE, priority 0, weight 0, port 8091, target dbserver.dummy.com
            Name: _store._net.user.dummy.com
            Type: SRV (Service location)
            Class: NONE (0x00fe)
            Time to live: 0 time
            Data length: 17
            Priority: 0
            Weight: 0
            Port: 8091
            Target: dbserver.dummy.com

The target is a fully qualified domain name. But here the "Data length" is 17.
Whereas for the same query from nsupdate, the ethereal capture is like this:
Domain Name System (query)
    Length: 82
    Transaction ID: 0x9a27
    Flags: 0x2800 (Dynamic update)
        0... .... .... .... = Response: Message is a query
        .010 1... .... .... = Opcode: Dynamic update (5)
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...0 .... .... = Recursion desired: Don't do query recursively
        .... .... .0.. .... = Z: reserved (0)
        .... .... ...0 .... = Non-authenticated data OK: Non-authenticated data is unacceptable
    Zones: 1
    Prerequisites: 0
    Updates: 1
    Additional RRs: 0
    Zone
        dummy.com: type SOA, class IN
            Name: dummy.com
            Type: SOA (Start of zone of authority)
            Class: IN (0x0001)
    Updates
        _store._net.user.dummy.com: type SRV, class NONE, priority 0, weight 0, port 8091, target dbserver.dummy.com
            Name: _store._net.user.dummy.com
            Type: SRV (Service location)
            Class: NONE (0x00fe)
            Time to live: 0 time
            Data length: 26
            Priority: 0
            Weight: 0
            Port: 8091
            Target: dbserver.dummy.com

Here the "Data length" is 26; everything else is the same.
The assert happens only in the first case, not in the second.
So I expect one of the above lengths (most likely the one from dnspython) to be
wrong.
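The two message lengths (82 and 73) and data lengths (26 and 17) are consistent with one explanation. This assumes that the "Length" field is the DNS-over-TCP message length, that both tools compress the owner name against dummy.com. in the zone section, and that only dnspython also compresses the SRV target. The stdlib-only sketch below builds such an UPDATE message by hand so the byte counts can be checked. build_srv_delete() and its helpers are hypothetical, not dnspython or BIND APIs, and the transaction ID is taken from the nsupdate capture.

```python
import struct

def encode_name(name: str) -> bytes:
    """Uncompressed wire form of an absolute name such as 'dummy.com.'."""
    out = b""
    for label in (part for part in name.split(".") if part):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"

def pointer(offset: int) -> bytes:
    """Two-byte compression pointer (top two bits set) to a message offset."""
    return struct.pack("!H", 0xC000 | offset)

def build_srv_delete(compress_target: bool, txid: int = 0x9A27) -> bytes:
    # Header: ID, flags 0x2800 (opcode UPDATE), ZOCOUNT=1, PRCOUNT=0, UPCOUNT=1, ADCOUNT=0.
    header = struct.pack("!HHHHHH", txid, 0x2800, 1, 0, 1, 0)
    # Zone section: dummy.com. SOA IN; the name starts at offset 12, right after the header.
    zone = encode_name("dummy.com.") + struct.pack("!HH", 6, 1)
    # Owner: _store._net.user followed by a pointer to dummy.com. at offset 12.
    owner = b"\x06_store\x04_net\x04user" + pointer(12)
    if compress_target:
        target = b"\x08dbserver" + pointer(12)       # assumed dnspython encoding
    else:
        target = encode_name("dbserver.dummy.com.")  # written out in full
    rdata = struct.pack("!HHH", 0, 0, 8091) + target  # priority, weight, port
    # TYPE=SRV(33), CLASS=NONE(254), TTL=0, RDLENGTH, then RDATA.
    rr = owner + struct.pack("!HHIH", 33, 254, 0, len(rdata)) + rdata
    return header + zone + rr

RDLENGTH_OFFSET = 12 + 15 + 19 + 8  # header + zone section + owner + TYPE/CLASS/TTL

for compress in (False, True):
    msg = build_srv_delete(compress)
    rdlength = struct.unpack_from("!H", msg, RDLENGTH_OFFSET)[0]
    print(f"compress_target={compress}: length={len(msg)}, data length={rdlength}")
```

Under these assumptions it prints length=82 with data length=26 for the uncompressed target and length=73 with data length=17 for the compressed one, matching the two captures byte for byte.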




Comment 11 Adam Tkac 2007-08-13 14:45:57 UTC
Which utility exactly did you use to capture the packets? I've used dnscap (it
shows fine-grained output; only in rawhide). If you analyze the update query from
python-dns, the target is not a fully qualified domain name.

Adam

Comment 12 Vinay Y S 2007-08-13 16:47:09 UTC
I used tshark (tethereal).
tshark -i lo -V

Can I get dnscap on Fedora 7?

Comment 13 Vinay Y S 2007-08-13 20:30:40 UTC
I tried another experiment.
With nsupdate I ran the following:
server 127.0.0.1
zone dummy.com.
update delete _store._net.user.dummy.com. 300 SRV 0 0 8091 dbserver
send

As you can see, the target isn't an FQDN, but this works. The packets captured
with ethereal confirm the same.

But the data lengths differ again; this time they are off by just 1. So I still
feel it has something to do with the data length.

What is the definition of data length expected by Bind? 
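On the definition: RFC 1035 defines RDLENGTH as the number of octets in the RDATA field. For SRV, the RDATA is priority, weight and port (2 octets each) followed by the target name (RFC 2782). The sketch below computes it for three encodings of the target. The root-relative case assumes nsupdate turned the unqualified dbserver into dbserver., which would give the off-by-one reported here against dnspython's 17. The compressed case assumes dnspython wrote dbserver followed by a two-byte pointer to dummy.com.; srv_rdlength() is a hypothetical helper.

```python
import struct

def srv_rdlength(target_wire: bytes, priority: int = 0, weight: int = 0, port: int = 8091) -> int:
    """RDLENGTH is the octet count of RDATA: 6 fixed octets plus the target as sent."""
    rdata = struct.pack("!HHH", priority, weight, port) + target_wire
    return len(rdata)

targets = {
    "dbserver.dummy.com. (uncompressed)": b"\x08dbserver\x05dummy\x03com\x00",
    "dbserver + pointer to offset 12 (assumed dnspython)": b"\x08dbserver\xc0\x0c",
    "dbserver. (assumed nsupdate, root-relative)": b"\x08dbserver\x00",
}

for description, wire in targets.items():
    print(f"{srv_rdlength(wire):3d}  {description}")
```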

Comment 14 Vinay Y S 2007-08-14 06:47:25 UTC
Hi Adam,
Bug #17074 at isc.org is marked as resolved by Mark Andrews. Do you agree? I
didn't quite get what the resolution is. If I apply the patch to message.c,
will Bind stop asserting on the same request?

Also, what should we communicate to dnspython upstream? I assume there is a
bug there too?

Thanks,


Comment 15 Adam Tkac 2007-08-14 10:02:58 UTC
Hi,
please see http://people.redhat.com/atkac/bind/ . There's patched bind
(9.4.1-9.P1.fc7) and also dnscap if you're interested. In the end python-dns
sends correct query. This will be marked as public later today because upstream
don't think this is security sensitive. Update will be avaliable very soon

Adam

Comment 16 Mark J. Cox 2007-08-14 10:40:51 UTC
Agreed, not a security issue.  Opening the bug now that upstream has dealt with this.

Comment 17 Fedora Update System 2007-08-15 19:46:27 UTC
bind-9.4.1-9.P1.fc7 has been pushed to the Fedora 7 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 18 Vinay Y S 2007-08-15 19:51:24 UTC
Thanks a lot!
Verified the package with the dnspython code. It works well.

