Bug 251336 - dynamic update - deleting a srv record - bind asserts name.c:1695: INSIST(count <= 63) failed and aborts
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: bind
Version: 7
Hardware: All Linux
Priority: low
Severity: urgent
Assigned To: Adam Tkac
QA Contact: Fedora Extras Quality Assurance
Reported: 2007-08-08 09:53 EDT by Vinay Y S
Modified: 2013-04-30 19:36 EDT
Fixed In Version: 9.4.1-9.P1.fc7
Doc Type: Bug Fix
Last Closed: 2007-08-15 15:46:34 EDT


Attachments:
bind-abort-bugreport.tgz - has config files, and python code that can repro the issue. (320.70 KB, application/octet-stream)
2007-08-08 09:53 EDT, Vinay Y S

Description Vinay Y S 2007-08-08 09:53:30 EDT
Description of problem:
A dynamic update request to delete an SRV record, as shown in the sample code,
causes the server to abort. It happens over TCP or UDP, with or without TSIG,
and from localhost or a remote address. The log message is:
name.c:1695: INSIST(count <= 63) failed


Version-Release number of selected component (if applicable):
bind-9.4.1-8.P1.fc7

How reproducible:
I have attached a tgz file that contains the required files.
Use the files in that tgz to reproduce as follows.

Steps to Reproduce:
1. Copy named.conf to /var/named/chroot/etc/named.conf
2. Copy dummy.com.db to /var/named/chroot/var/named/dummy.com.db
3. Start named
4. Verify that it has loaded the zone correctly from /var/log/messages
5. Install python-dns via yum.
6. Run test.py
7. Verify if bind crashed.
  
Actual results:
Bind aborts with the log message
name.c:1695: INSIST(count <= 63) failed 
in /var/log/messages.

Expected results:
Expected the update to complete and BIND to keep running.

Additional info:
Brief stack trace:
#0  0xbfffe402 in __kernel_vsyscall ()
#1  0xb7b221c0 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2  0xb7b23ba1 in *__GI_abort () at abort.c:88
#3  0x8001da0c in assertion_failed (file=0xb7f7f274 "name.c", line=1695,
type=isc_assertiontype_insist, cond=0xb7f7f2ae "count <= 63") at ./main.c:159
#4  0xb7eb5e6c in set_offsets (name=0xb7a8bd60, offsets=0xb7a8bc68 "",
set_name=0xb7a8bd60) at name.c:1695
#5  0xb7eb965c in dns_name_fromregion (name=0xb7a8bd60, r=0xb7a8bd94) at name.c:999
#6  0xb7f1141f in compare_in_srv (rdata1=0xb7a8c0d0, rdata2=0xb7a8bef0) at
rdata/in_1/srv_33.c:230
#7  0xb7f12a5f in dns_rdata_compare (rdata1=0xb7a8c0d0, rdata2=0xb7a8bef0) at
rdata.c:348
#8  0x800361d4 in rr_equal_p (update_rr=0xb7a8c0d0, db_rr=0xb7a8bef0) at
update.c:1015
#9  0x80036401 in delete_if_action (data=0xb7a8bf44, rr=0xb7a8beec) at update.c:1059
#10 0x80035f5e in foreach_rr (db=0xb7a9b8f0, ver=0xb7a961e8, name=<value
optimized out>, type=33, covers=0, rr_action=0x800363e0 <delete_if_action>,
    rr_action_data=0xb7a8bf44) at update.c:550
#11 0x80036047 in delete_if (predicate=<value optimized out>, db=0xb7a9b8f0,
ver=0xb7a961e8, name=0xb6211038, type=33, covers=0, update_rr=0xb7a8c0d0,
    diff=0xb7a8c130) at update.c:1094
#12 0x8003b856 in update_action (task=0xb7aa41a8, event=0xb6215008) at update.c:2790
#13 0xb7ce2a92 in run (uap=0xb7a93008) at task.c:867
#14 0xb7c91472 in start_thread (arg=0xb7a8db90) at pthread_create.c:296
#15 0xb7bcb89e in clone () from /lib/i686/nosegneg/libc.so.6

The full stack trace and the core file are in the attached archive.
To get your own core file, run ulimit -c unlimited and chmod g+w
/var/named/chroot/var/named so that the core can be written.
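
For context on the failed check: a DNS label is at most 63 octets long (RFC 1035), and a length octet with its top two bits set marks a compression pointer instead of a label. The following is a minimal Python sketch of a walk over an uncompressed wire-format name that enforces the same limit. It is not BIND's set_offsets(); the function name label_offsets is hypothetical and used only for illustration.

```python
def label_offsets(wire):
    """Return the offset of each label in an uncompressed wire-format name.

    Hypothetical helper for illustration; not BIND's implementation.
    """
    offsets = []
    pos = 0
    while True:
        if pos >= len(wire):
            raise ValueError("name is truncated")
        count = wire[pos]          # length octet of the next label
        if count > 63:             # same bound as the failed INSIST
            raise ValueError("label length %d exceeds 63" % count)
        offsets.append(pos)
        if count == 0:             # root label terminates the name
            return offsets
        pos += 1 + count           # skip length octet and label bytes
```

A name that ends in a pointer octet such as 0xC0 (192) fails this check, since 192 is greater than 63.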
Comment 1 Vinay Y S 2007-08-08 09:53:32 EDT
Created attachment 160901 [details]
bind-abort-bugreport.tgz - has config files, and python code that can repro the issue.
Comment 2 Vinay Y S 2007-08-08 09:59:25 EDT
Let me know if you need any help/info in reproducing the bug.

The difference between the python-dns update and the nsupdate command-line tool
is that nsupdate does the delete with class ANY and python-dns does it with
class NONE. A cursory examination of the BIND code tells me that these take
different code paths.
Comment 3 Vinay Y S 2007-08-08 10:45:34 EDT
Submitted the same bug report to bind9-bugs@isc.org. The ticket there has been
assigned an ID of [ISC-Bugs #17074].
Comment 4 Vinay Y S 2007-08-08 11:12:07 EDT
This bug is reproduced in BIND 9.5.0a6 too.

The log message seen is:
name.c:1695: INSIST(count <= 63) failed
exiting (due to assertion failure)

Same as earlier.
Comment 5 Vinay Y S 2007-08-10 15:49:06 EDT
I tried the same query with nsupdate by using the following line:
update delete _store._net.user.dummy.com. 300 SRV 0 0 8091 dbserver.dummy.com.

The server didn't assert and quit. The update worked.

So I captured both transactions using ethereal, and here's the diff:
diff -u nsupdate.log dnspython.log
--- nsupdate.log        2007-08-10 16:57:22.000000000 +0530
+++ dnspython.log       2007-08-10 16:58:03.000000000 +0530
@@ -1,6 +1,6 @@
 Domain Name System (query)
-    Length: 82
-    Transaction ID: 0x9a27
+    Length: 73
+    Transaction ID: 0xeefd
     Flags: 0x2800 (Dynamic update)
         0... .... .... .... = Response: Message is a query
         .010 1... .... .... = Opcode: Dynamic update (5)
@@ -23,7 +23,7 @@
             Type: SRV (Service location)
             Class: NONE (0x00fe)
             Time to live: 0 time
-            Data length: 26
+            Data length: 17
             Priority: 0
             Weight: 0
             Port: 8091

As you can see the Data length is different. I don't know which is correct. But
in any case, the server shouldn't terminate if the request is bad.

Hope this helps.
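
One reading that fits both numbers, offered as an assumption rather than something the capture proves: SRV RDATA is 6 octets of priority, weight and port followed by the target name. Written out in full, dbserver.dummy.com. takes 20 octets, for 26 in total; the label dbserver followed by a 2-octet compression pointer takes 11, for 17. A minimal Python sketch of that arithmetic (helper names are hypothetical):

```python
import struct

def encode_name(name):
    """Encode an absolute domain name in uncompressed wire format."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"  # root label

def srv_rdata(priority, weight, port, target_wire):
    """SRV RDATA: three 16-bit fields followed by the target name."""
    return struct.pack("!HHH", priority, weight, port) + target_wire

# Target spelled out in full.
full = srv_rdata(0, 0, 8091, encode_name("dbserver.dummy.com."))

# Assumption: target compressed as one label plus a pointer to offset 12,
# where the zone name would start right after the 12-octet header.
pointer = struct.pack("!H", 0xC000 | 12)
compressed = srv_rdata(0, 0, 8091, b"\x08dbserver" + pointer)

print(len(full), len(compressed))  # 26 17
```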
Comment 6 Vinay Y S 2007-08-10 16:23:43 EDT
Adam, have you started working on this issue? If you can confirm the problem and
give a hint as to whether it is in BIND's nsupdate or in dnspython, that would be
helpful. (There is a problem in the BIND server for sure, as a crash can be
caused by network input.)

If the problem is in dnspython, I would like to raise it on the dnspython forum.
But that would mean this problem with bind server would become public. Would
that be ok?
Comment 7 Adam Tkac 2007-08-10 17:15:07 EDT
I've started working on this. Please don't post this to any forum yet. It looks
like dnspython sends some corrupted data. This issue also has to be discussed
upstream before we can mark it as public.

Thanks, Adam
Comment 8 Adam Tkac 2007-08-13 08:08:50 EDT
The problem is that named can't handle non-absolute domain names in the "Target"
label of the update section. Let me discuss with upstream how to solve this problem.
Comment 9 Adam Tkac 2007-08-13 08:16:17 EDT
Btw, I don't think this will be marked as a security issue, because the affected
query has to come from a trusted server/user. But please still don't mention this
on any public forum.
Comment 10 Vinay Y S 2007-08-13 08:25:05 EDT
I don't think dnspython is sending a non-absolute domain name in the Target label.
See the ethereal capture here:
Domain Name System (query)
    Length: 73
    Transaction ID: 0xeefd
    Flags: 0x2800 (Dynamic update)
        0... .... .... .... = Response: Message is a query
        .010 1... .... .... = Opcode: Dynamic update (5)
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...0 .... .... = Recursion desired: Don't do query recursively
        .... .... .0.. .... = Z: reserved (0)
        .... .... ...0 .... = Non-authenticated data OK: Non-authenticated data is unacceptable
    Zones: 1
    Prerequisites: 0
    Updates: 1
    Additional RRs: 0
    Zone
        dummy.com: type SOA, class IN
            Name: dummy.com
            Type: SOA (Start of zone of authority)
            Class: IN (0x0001)
    Updates
        _store._net.user.dummy.com: type SRV, class NONE, priority 0, weight 0, port 8091, target dbserver.dummy.com
            Name: _store._net.user.dummy.com
            Type: SRV (Service location)
            Class: NONE (0x00fe)
            Time to live: 0 time
            Data length: 17
            Priority: 0
            Weight: 0
            Port: 8091
            Target: dbserver.dummy.com

The domain is a fully qualified domain name. But here the "Data length" is 17,
whereas for the same query from nsupdate, the ethereal capture looks like this:
Domain Name System (query)
    Length: 82
    Transaction ID: 0x9a27
    Flags: 0x2800 (Dynamic update)
        0... .... .... .... = Response: Message is a query
        .010 1... .... .... = Opcode: Dynamic update (5)
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...0 .... .... = Recursion desired: Don't do query recursively
        .... .... .0.. .... = Z: reserved (0)
        .... .... ...0 .... = Non-authenticated data OK: Non-authenticated data is unacceptable
    Zones: 1
    Prerequisites: 0
    Updates: 1
    Additional RRs: 0
    Zone
        dummy.com: type SOA, class IN
            Name: dummy.com
            Type: SOA (Start of zone of authority)
            Class: IN (0x0001)
    Updates
        _store._net.user.dummy.com: type SRV, class NONE, priority 0, weight 0, port 8091, target dbserver.dummy.com
            Name: _store._net.user.dummy.com
            Type: SRV (Service location)
            Class: NONE (0x00fe)
            Time to live: 0 time
            Data length: 26
            Priority: 0
            Weight: 0
            Port: 8091
            Target: dbserver.dummy.com

Here the "Data length" is 26. Everything else is the same.
The assert happens only in the first case and not in the second.
So I expect one of the above lengths (most likely the one from dnspython) to be
wrong.
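
Both message totals can be reproduced by hand, which suggests that neither length is arbitrary. The sketch below rebuilds the message layout shown in the captures using only Python's struct module. It assumes that the owner name is compressed against the zone name in both messages, and that the 17-octet RDATA ends in a 2-octet compression pointer to dummy.com at offset 12 while the 26-octet RDATA spells the target out in full. build_update and its helpers are hypothetical names, and none of this is taken from dnspython or nsupdate source.

```python
import struct

def labels(*parts):
    """Encode labels with their length octets (no terminating root)."""
    return b"".join(bytes([len(p)]) + p.encode("ascii") for p in parts)

def pointer(offset):
    """A 2-octet compression pointer to the given message offset."""
    return struct.pack("!H", 0xC000 | offset)

def build_update(compress_target, txid=0):
    # Header: ID, flags 0x2800 (opcode 5), 1 zone, 0 prereqs, 1 update, 0 additional
    header = struct.pack("!HHHHHH", txid, 0x2800, 1, 0, 1, 0)
    # Zone section: dummy.com, type SOA (6), class IN (1); name sits at offset 12
    zone = labels("dummy", "com") + b"\x00" + struct.pack("!HH", 6, 1)
    # Owner name, assumed compressed against the zone name
    owner = labels("_store", "_net", "user") + pointer(12)
    if compress_target:
        target = labels("dbserver") + pointer(12)
    else:
        target = labels("dbserver", "dummy", "com") + b"\x00"
    rdata = struct.pack("!HHH", 0, 0, 8091) + target
    # Type SRV (33), class NONE (254), TTL 0, RDLENGTH
    rr = owner + struct.pack("!HHIH", 33, 254, 0, len(rdata)) + rdata
    return header + zone + rr

print(len(build_update(True)), len(build_update(False)))  # 73 82
```

Under these assumptions the totals match the "Length" values of 73 and 82 in the two captures.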


Comment 11 Adam Tkac 2007-08-13 10:45:57 EDT
Which utility exactly did you use to capture the packets? I used dnscap (it shows
fine-grained output, and is only in rawhide). If you analyze the update query from
python-dns, the target is not a fully qualified domain name.

Adam
Comment 12 Vinay Y S 2007-08-13 12:47:09 EDT
I used tshark (tethereal).
tshark -i lo -V

Can I get dnscap on Fedora 7?
Comment 13 Vinay Y S 2007-08-13 16:30:40 EDT
I tried another experiment.
With nsupdate I ran the following:
server 127.0.0.1
zone dummy.com.
update delete _store._net.user.dummy.com. 300 SRV 0 0 8091 dbserver
send

As you can see, the target isn't an FQDN. But this works. The packets captured by
ethereal confirm the same.

But the data lengths are different again. This time it is off by just 1. So I
still feel that it has something to do with the data length.

What is the definition of data length expected by BIND?
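
On the question of definition: in RFC 1035, RDLENGTH is a 16-bit count of the octets in the RDATA field exactly as they appear on the wire. Below is a minimal Python sketch of how a parser might read it, assuming the bytes start right after the owner name. split_rr_tail is a hypothetical helper, and the sample RDATA (the target encoded as the single label dbserver followed by the root label) is an assumption about what nsupdate sent, not something taken from the capture.

```python
import struct

def split_rr_tail(tail):
    """Split the part of a resource record that follows the owner name."""
    rtype, rclass, ttl, rdlength = struct.unpack("!HHIH", tail[:10])
    rdata = tail[10:10 + rdlength]
    if len(rdata) != rdlength:
        raise ValueError("RDATA shorter than RDLENGTH")
    return rtype, rclass, ttl, rdlength, rdata

# Assumed RDATA: priority 0, weight 0, port 8091, target "dbserver."
sample_rdata = struct.pack("!HHH", 0, 0, 8091) + b"\x08dbserver\x00"
sample_tail = struct.pack("!HHIH", 33, 254, 0, len(sample_rdata)) + sample_rdata

rtype, rclass, ttl, rdlength, rdata = split_rr_tail(sample_tail)
print(rdlength)  # 16
```

Under that assumption RDLENGTH is 16, which would be one less than the 17 reported for dnspython.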
Comment 14 Vinay Y S 2007-08-14 02:47:25 EDT
Hi Adam,
Bug #17074 at isc.org has been marked as resolved by Mark Andrews. Do you agree? I
didn't quite get what the resolution is. If I apply the patch given there to
message.c, will BIND no longer assert on the same request?

Also, what should we communicate to dnspython upstream? I assume there is a bug 
in there too?

Thanks,
Comment 15 Adam Tkac 2007-08-14 06:02:58 EDT
Hi,
please see http://people.redhat.com/atkac/bind/ . There's a patched bind
(9.4.1-9.P1.fc7) there, and also dnscap if you're interested. In the end, python-dns
sends a correct query. This bug will be marked as public later today because
upstream doesn't think it is security sensitive. An update will be available very soon.

Adam
Comment 16 Mark J. Cox (Product Security) 2007-08-14 06:40:51 EDT
Agreed, not a security issue. Opening the bug now that upstream has dealt with this.
Comment 17 Fedora Update System 2007-08-15 15:46:27 EDT
bind-9.4.1-9.P1.fc7 has been pushed to the Fedora 7 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 18 Vinay Y S 2007-08-15 15:51:24 EDT
Thanks a lot!
Verified the package with the dnspython code. It works well.
