Bug 216349 - Bind crashes with socket.c:1674: INSIST(!sock->pending_send)
Summary: Bind crashes with socket.c:1674: INSIST(!sock->pending_send)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: bind
Version: 6
Hardware: i686
OS: Linux
medium
high
Target Milestone: ---
Assignee: Adam Tkac
QA Contact: Ben Levenson
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2006-11-19 20:03 UTC by Bojan Smojver
Modified: 2013-04-30 23:34 UTC (History)
1 user (show)

Fixed In Version: 9.4.1-6.1.fc7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-07-11 10:55:52 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
Configuration file (3.14 KB, text/plain)
2006-11-30 21:25 UTC, Bojan Smojver
no flags Details
Last 10000 lines of the log before crash (38.51 KB, application/x-bzip2)
2006-12-14 20:25 UTC, Bojan Smojver
no flags Details

Description Bojan Smojver 2006-11-19 20:03:40 UTC
Description of problem:
Intermittently, bind crashes with the following in the log file:

--------------------------------
Nov 20 06:48:03 beauty named[14897]: socket.c:1674: INSIST(!sock->pending_send) 
failed
Nov 20 06:48:03 beauty named[14897]: exiting (due to assertion failure)
--------------------------------

Version-Release number of selected component (if applicable):
9.3.3-6.fc6

How reproducible:
Intermittent.

Steps to Reproduce:
1. Run bind.
2. Wait for it to crash?
  
Actual results:
Bind crashes intermittently.

Expected results:
Shouldn't crash.

Additional info:
This is bind in chroot environment under SELinux (targeted). It is running two
views.

Comment 1 Bojan Smojver 2006-11-21 10:06:10 UTC
Just had another one. It happened right after I brought up an IPSec tunnel on
this box. Somehow, I think previous crashes may have been around the same time.
Not sure if it's related...

Comment 2 Martin Stransky 2006-11-22 13:04:32 UTC
Could you please attach an output of named when it starts? from /var/log/messages.

Comment 3 Bojan Smojver 2006-11-22 19:43:32 UTC
Nov 21 20:50:36 beauty named[3775]: starting BIND 9.3.3rc3 -u named -t /var/name
d/chroot
Nov 21 20:50:36 beauty named[3775]: found 1 CPU, using 1 worker thread
Nov 21 20:50:36 beauty named[3775]: loading configuration from '/etc/named.conf'
Nov 21 20:50:36 beauty named[3775]: listening on IPv4 interface lo, 127.0.0.1#53
Nov 21 20:50:36 beauty named[3775]: listening on IPv4 interface lo:0, 203.171.74
.242#53
Nov 21 20:50:36 beauty named[3775]: listening on IPv4 interface bond0, 172.27.0.
1#53
Nov 21 20:50:36 beauty named[3775]: command channel listening on 127.0.0.1#953
Nov 21 20:50:36 beauty named[3775]: zone 0.in-addr.arpa/IN/internal: loaded seri
al 42
Nov 21 20:50:36 beauty named[3775]: zone 0.0.127.in-addr.arpa/IN/internal: loade
d serial 1997022700
Nov 21 20:50:36 beauty named[3775]: zone 27.172.in-addr.arpa/IN/internal: loaded
 serial 2006071500
Nov 21 20:50:36 beauty named[3775]: zone 255.in-addr.arpa/IN/internal: loaded se
rial 42
Nov 21 20:50:36 beauty named[3775]: zone 0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0
.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa/IN/internal: loaded serial 1997022700
Nov 21 20:50:36 beauty named[3775]: zone rexursive.com/IN/internal: loaded seria
l 2006083000
Nov 21 20:50:36 beauty named[3775]: zone dev/IN/internal: loaded serial 20060315
00
Nov 21 20:50:36 beauty named[3775]: zone localdomain/IN/internal: loaded serial 
42
Nov 21 20:50:36 beauty named[3775]: zone localhost/IN/internal: loaded serial 42
Nov 21 20:50:36 beauty named[3775]: zone rexursive.com.au/IN/external: loaded se
rial 2006083001
Nov 21 20:50:36 beauty named[3775]: zone rexursive.com/IN/external: loaded seria
l 2006083001
Nov 21 20:50:36 beauty named[3775]: zone rexursive.net/IN/external: loaded seria
l 2006083001
Nov 21 20:50:36 beauty named[3775]: zone rexursive.org/IN/external: loaded seria
l 2006083001
Nov 21 20:50:36 beauty named[3775]: running
Nov 21 20:50:36 beauty named[3775]: zone 27.172.in-addr.arpa/IN/internal: sendin
g notifies (serial 2006071500)
Nov 21 20:50:36 beauty named[3775]: zone rexursive.com/IN/internal: sending noti
fies (serial 2006083000)
Nov 21 20:50:36 beauty named[3775]: zone rexursive.org/IN/external: sending noti
fies (serial 2006083001)
Nov 21 20:50:36 beauty named[3775]: zone rexursive.com.au/IN/external: sending n
otifies (serial 2006083001)
Nov 21 20:50:36 beauty named[3775]: zone dev/IN/internal: sending notifies (seri
al 2006031500)
Nov 21 20:50:36 beauty named[3775]: zone rexursive.com/IN/external: sending noti
fies (serial 2006083001)
Nov 21 20:50:36 beauty named[3775]: zone rexursive.net/IN/external: sending noti
fies (serial 2006083001)
Nov 21 20:50:36 beauty named[3775]: client 172.27.0.1#32824: view internal: rece
ived notify for zone '27.172.in-addr.arpa'
Nov 21 20:50:37 beauty named[3775]: client 172.27.0.1#32824: view internal: rece
ived notify for zone 'rexursive.com'
Nov 21 20:50:37 beauty named[3775]: client 172.27.0.1#32824: view internal: rece
ived notify for zone 'dev'

Comment 4 Bojan Smojver 2006-11-29 21:02:26 UTC
Had a few more crashes and it seems it is related somehow to bringing up that
IPSec tunnel. More on the tunnel itself in bug #217306.

Comment 5 Martin Stransky 2006-11-30 10:30:32 UTC
Could you please attach a log from named? I need to figure out what happened
before the crash.

You can add this line to /etc/sysconfig/named:

OPTIONS="-d 10"

and when bind crash attach /var/named/data/named.run here. Caution, this file
can be quite big.

btw. your bind configuration could be quite helpful, too.

Comment 6 Bojan Smojver 2006-11-30 21:24:45 UTC
Did the OPTIONS="-d 10", but that produced nothing. Maybe because all logging
goes to syslog and debugging messages are ignored? Bind did crash in the middle
of bringing up IPSec tunnel this morning.

I also tried this:

/usr/sbin/named -fg -d 10 -u named -t /var/named/chroot 2>/root/named.log

I couldn't reproduce the problem any more when bringing tunnel down/up. I'll try
again tomorrow.

I'm attaching the config file.

Comment 7 Bojan Smojver 2006-11-30 21:25:35 UTC
Created attachment 142529 [details]
Configuration file

Comment 8 Bojan Smojver 2006-12-01 03:15:58 UTC
A similar thing, seen before:

http://bugzilla.mandriva.com/show_bug.cgi?id=8319

It is marked as FIXED, but reading the bug report would suggest that the bug
isn't triggered because the users stopped running that setup.

Comment 9 Bojan Smojver 2006-12-02 03:05:32 UTC
I'll be away for about 10 days now, so I'll try to collect a bit more
information when I come back.

Comment 10 Bojan Smojver 2006-12-12 22:30:51 UTC
Tried to replicate with -d 10 turned on. Even with *.debug going to
/var/log/messages, there is nothing more in the system logs. Weird. There is no
trace of suggested named.run file anywhere. Any other suggestions?

And, I cannot replicate this at will. It appears to crash when the IPSec tunnel
comes up for the first time that day. At that point named has been running for a
while (overnight).

Comment 11 Bojan Smojver 2006-12-12 22:32:55 UTC
Oh, and forgot to mention - named did crash once more:

--------------------------------------------
Dec 13 08:19:06 beauty named[9884]: socket.c:1674: INSIST(!sock->pending_send)
failed
Dec 13 08:19:06 beauty named[9884]: exiting (due to assertion failure)
--------------------------------------------

This was again in the middle of IPSec negotiation.

Comment 12 Bojan Smojver 2006-12-12 22:53:59 UTC
Another report of the same thing, also IPSec related:

http://forums.sw-soft.com/showthread.php?s=33fa21348c331c7b14175e310c5bcc43&threadid=21484&goto=nextnewest

Comment 13 Martin Stransky 2006-12-13 10:58:09 UTC
Okay, thanks, I'll check it.

Comment 14 Bojan Smojver 2006-12-13 22:41:26 UTC
I'm now running named in foreground mode, like this:

/usr/sbin/named -fg -d 10 -u named -t /var/named/chroot

Already have 95 MB of logs. When it crashes next (unless it's some kind
condition that doesn't get hit when logging is increased), we should know what
happened. This morning when I brought the tunnel up, it continued working. Maybe
tomorrow...

Comment 15 Bojan Smojver 2006-12-14 20:25:02 UTC
Created attachment 143691 [details]
Last 10000 lines of the log before crash

The whole log file is 169 MB and if you need it, let me know.

Comment 16 Bojan Smojver 2007-02-07 21:04:11 UTC
Ditto bind-9.3.4-2.fc6.

Comment 17 Adam Tkac 2007-04-18 12:13:06 UTC
I'm not sure if this is bind's problem. Crash is only reproducible during IPSec
negotiation? If yes, it seems like problem on IPSec layer, not in bind.

-A-

Comment 18 Bojan Smojver 2007-04-18 22:37:37 UTC
You are correct. Crash is only reproducible during IPSec negotition and then
intermittently. I'm seeing it still with the latest FC6 kernel
(2.6.20-1.2944.fc6). May as well be a kernel bug.

Comment 22 Bojan Smojver 2007-06-11 21:52:09 UTC
Since switching this box to Fedora 7, I have not had a crash yet. I'll leave it
for another month or so and if there are no crashes, I'll close.

Comment 23 Adam Tkac 2007-06-12 09:01:13 UTC
Interesting. I read changelog and there's nothing related to this bug. Looks one
of the mysterious fixes...

Adam

Comment 24 Bojan Smojver 2007-06-12 10:27:07 UTC
It is also possible that:

- this was really a kernel bug that got fixed
- this bug is still there but other code has been reworked so it didn't get
triggered yet (i.e. this bug was intermittent before - maybe it became even less
frequent with the new code)

Anyhow, I use this stuff almost every day and I'll report back if this failure
happens again.

Comment 25 Bojan Smojver 2007-06-12 21:37:09 UTC
Aha, different error now, it would seem - a non fatal one, but still right in
the middle of IPSec negotiation:

--------------------------------------------
Jun 13 07:31:45 beauty racoon: INFO: initiate new phase 2 negotiation:
<source_ip>[500]<=><dest_ip>[500] 
Jun 13 07:32:14 beauty named[1880]: socket.c:1178: unexpected error:
Jun 13 07:32:14 beauty named[1880]: internal_send: 10.0.0.35#53: No such process
Jun 13 07:32:14 beauty named[1880]: errno2result.c:111: unexpected error:
Jun 13 07:32:14 beauty named[1880]: unable to convert errno to isc_result: 3: No
 such process
Jun 13 07:32:15 beauty racoon: INFO: IPsec-SA expired: ESP/Tunnel <dest_ip>[0
]-><source_ip>[0] spi=[...]
--------------------------------------------

So, maybe the error handling got a bit smarter after all?

Comment 26 Adam Tkac 2007-06-13 13:01:03 UTC
Could you please reproduce this problem with
http://people.redhat.com/atkac/test_srpms/bind-9.4.1-5.1.fc7.src.rpm and attach
output from log? It could print what error occurs.

Thanks, Adam

Comment 27 Bojan Smojver 2007-07-10 08:45:49 UTC
I can't see this any more: bind-9.4.1-6.1.fc7, kernel-2.6.21-1.3228.fc7.

Comment 28 Adam Tkac 2007-07-11 10:55:52 UTC
Chm, named doesn't have any fix related to this one. Closing like one of
interesting fixes... Please reopen if occurs again

Adam

Comment 29 Bojan Smojver 2007-07-12 00:06:43 UTC
Chalk one up for cosmic rays :-)

PS. This may have been a kernel issue that got fixed somewhere along the way.


Note You need to log in before you can comment on or make changes to this bug.