From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

Description of problem:
After several hours, ftpd refuses connections. A ps shows only one in.ftpd process, in defunct status. An xinetd restart solves the problem temporarily.

Version-Release number of selected component (if applicable):
xinetd-2.3.9-0.73
wu-ftpd-2.6.2-5

How reproducible:
Sometimes

Steps to Reproduce:
1. Have a standard, heavily loaded, up-to-date ftp server (wu-ftpd enabled in xinetd).
2. Wait 1 hour, 2 hours, or 2 days...
3. When ftp connections are refused, do a "ps | grep ftpd" and a "/etc/rc.d/init.d/xinetd restart".

Additional info:

[root@triton root]# ps awux | grep ftp
root      9012  0.0  0.0     0    0 ?  ZN  19:11  0:00 [in.ftpd <defunct>]
[root@triton root]# /etc/rc.d/init.d/xinetd restart
Arrêt de xinetd :       [ OK ]     [Stopping xinetd]
Démarrage de xinetd :   [ OK ]     [Starting xinetd]
[root@triton root]# ps awux | grep ftp
ftp      11066  0.0  0.2  3620 2220 ?  SN  20:22  0:00 ftpd: 132.248.173.40: anonymous/mozilla@: RETR /ge/languages/babylon_
ftp      11212  0.0  0.2  3548 2144 ?  SN  20:28  0:00 ftpd: marseille-4-a7-62-147-63-23.dial.proxad.net: anonymous/anonymou
ftp      11218  0.0  0.2  3624 2228 ?  SN  20:29  0:00 ftpd: line-74-226.dial.freestart.hu: anonymous/mozilla@: RETR /mirror
ftp      11343  0.0  0.2  3616 2192 ?  SN  20:32  0:00 ftpd: line-74-226.dial.freestart.hu: anonymous/mozilla@: IDLE
ftp      11344  0.0  0.2  3616 2192 ?  SN  20:32  0:00 ftpd: line-74-226.dial.freestart.hu: anonymous/mozilla@: IDLE
ftp      11347  0.0  0.2  3548 2144 ?  SN  20:33  0:00 ftpd: marseille-4-a7-62-147-63-23.dial.proxad.net: anonymous/anonymou
ftp      11348  0.0  0.2  3548 2144 ?  SN  20:33  0:00 ftpd: marseille-4-a7-62-147-63-23.dial.proxad.net: anonymous/anonymou
ftp      11349  0.0  0.2  3548 2144 ?  SN  20:33  0:00 ftpd: marseille-4-a7-62-147-63-23.dial.proxad.net: anonymous/anonymou
ftp      11350  0.0  0.2  3548 2144 ?  SN  20:33  0:00 ftpd: marseille-4-a7-62-147-63-23.dial.proxad.net: anonymous/anonymou
[root@triton root]# rpm -q xinetd
xinetd-2.3.9-0.73
[root@triton root]# rpm -q wu-ftpd
wu-ftpd-2.6.2-5
I think that this is related. In the logs I get:

Nov  1 00:41:55 shell2 xinetd[15914]: Deactivating service ftp due to excessive incoming connections.  Restarting in 30 seconds.
Nov  1 00:42:25 shell2 xinetd[15914]: Activating service ftp
Nov  1 00:42:31 shell2 xinetd[15914]: file descriptor of service ftp has been closed
Nov  1 00:42:31 shell2 xinetd[15914]: select reported EBADF but no bad file descriptors were found
Nov  1 00:46:57 shell2 xinetd[15914]: Service ftp: server exit with 0 running servers
Sorry. I should have indicated that this was with xinetd-2.3.9-0.73.
No, it's not the same problem. I know that issue, and I do not get the same messages in syslog. I didn't mention it, but when ftpd stops responding, there are no more logs either. It's really a freeze: xinetd continues to serve other services, but no more ftpd, and that without any explanation. It's really painful, like a DOS, and for the moment, as a workaround, I installed proftpd in standalone mode...
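In case anyone else wants the same workaround, it amounts to roughly this (a sketch; the proftpd init-script name and config path are assumptions from the stock package, so check your own installation):

# 1. Tell xinetd to stop handling ftp: set "disable = yes" in /etc/xinetd.d/wu-ftpd
/etc/rc.d/init.d/xinetd restart
# 2. Make sure /etc/proftpd.conf contains:  ServerType standalone
# 3. Start proftpd as its own daemon, outside xinetd entirely
chkconfig proftpd on
/etc/rc.d/init.d/proftpd start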
I have this same problem (not the Deactivating service ftp problem, which I fixed) but on RH 7.2, xinetd-2.3.9-0.72, with the same [in.ftpd <defunct>] error on three ftp load balance servers at my site. I am restarting xinetd every 1/2 hour in cron right now so when the problem happens (which it still does), it will be reset automatically. Any way to get this priority raised?
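For reference, the /etc/crontab entry I'm using is essentially this (a sketch; adjust the interval and the init-script path for your own setup):

# restart xinetd on the hour and half hour as a stopgap
0,30 * * * * root /etc/rc.d/init.d/xinetd restart >/dev/null 2>&1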
Yes, please, raise the priority. xinetd did the same thing, on another server, with ipop3d... it's really critical! I also need to restart xinetd from cron.
I have the exact same problem, but in my case it is with imapd. xinetd needs to be restarted before imapd works again.
It's been 18 days since I first reported the bug. It seems that xinetd is broken and that servers all around the world have the same problem. It reminds me of bug #75128: mysqld crashed all over the world and it took a long time for RedHat to consider the problem... Is it because this isn't security-related that nobody at RedHat cares about this xinetd bug?
It's not a general bug. It appears to be a DOS attack. Somebody connects and creates a large number of partial connections. From what I can see, the connection gets torn down before xinetd actually calls ftpd. It seems to take about 30 connections to cause the problem.

The attack consists of a storm of SYN packets. When the server responds with a SYN-ACK, the attacker responds with an ACK followed by a FIN-ACK, without necessarily waiting for a response (see below). In some cases, ftpd doesn't even get around to transmitting a welcome banner. The tcpdump from an isolated stream is included below (using delta timestamps). I just noticed that the RST-ACK doesn't arrive until 5 hours later. (attachment: single1)

1 0.000000     66.139.79.23  -> 66.51.123.178 TCP 59351 > 21 [SYN]      Seq=533978424 Ack=0         Win=5840 Len=0
2 0.000021     66.51.123.178 -> 66.139.79.23  TCP 21 > 59351 [SYN, ACK] Seq=224836778 Ack=533978425 Win=5792 Len=0
3 0.091872     66.139.79.23  -> 66.51.123.178 TCP 59351 > 21 [ACK]      Seq=533978425 Ack=224836779 Win=5840 Len=0
4 0.000867     66.139.79.23  -> 66.51.123.178 TCP 59351 > 21 [FIN, ACK] Seq=533978425 Ack=224836779 Win=5840 Len=0
5 0.000676     66.51.123.178 -> 66.139.79.23  TCP 21 > 59351 [ACK]      Seq=224836779 Ack=533978426 Win=5792 Len=0
6 17440.498890 66.51.123.178 -> 66.139.79.23  TCP 21 > 59351 [RST, ACK] Seq=224836779 Ack=533978426 Win=5792 Len=0

Once the attack is successful, xinetd starts behaving weirdly. There will be a bunch of ftp servers in <defunct> state, and xinetd will respond to an incoming connection as follows: it responds to a SYN with a SYN-ACK, but then it acts like it is receiving no other packets, only responding with SYN-ACKs until the remote system gives up. (attachment: attack-failure)

The attackers appear to be spraying netblocks with their attacks. The attachment attack-sets includes a tcpdump of two sample attack sets. Note that the attacks go to multiple addresses.
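If anyone wants to check whether they're seeing the same traffic pattern, a capture along these lines should isolate it (a sketch; the interface name is an assumption for your setup):

# show only SYN/FIN/RST segments on the ftp control port, with delta timestamps
# (0x07 = FIN|SYN|RST bits of the TCP flags byte)
tcpdump -n -ttt -i eth0 'tcp port 21 and (tcp[13] & 0x07) != 0'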
Created attachment 85524 [details]
A single attack sequence -- ethereal/tcpdump file. Fast ACK / FIN-ACK timing.
Created attachment 85525 [details]
After an attack has succeeded, xinetd will respond as in this tcpdump.
Created attachment 85526 [details]
This contains 2 different attacks. Note that each one attacks addresses on one subnet at a time. -- ethereal/tcpdump file
It's looking like this attack works against the default settings of xinetd. The compiled-in defaults appear to be 25 maximum instances and 27 connections per source address(!). The standard RPM setup increases the maximum instances to 60, which means that 3 attacking IPs can lock out a system. It appears that xinetd responds to the overage by using the SYN-ACKs as a keep-alive, holding the connections open while waiting for an open connection slot.

I changed the numbers to 100 concurrent and 10 per source. When I activated the changes, it looks like xinetd cleaned up some backlog, so this looks hopeful so far. With this setup, a DOS attack is still possible, but it would require more machines to mount successfully. This should handle most situations, but could cause problems if you have customers with large source-NATed networks.

Settings (either in the defaults section of /etc/xinetd.conf or in /etc/xinetd.d/wu-ftpd):

        instances  = 100
        per_source = 10

Someone else try this and see if it helps.
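For anyone who wants to see it in context, a complete /etc/xinetd.d/wu-ftpd with those limits would look roughly like this (a sketch; everything except the last two lines is my recollection of the stock Red Hat file, so check it against your own before copying):

service ftp
{
        disable         = no
        socket_type     = stream
        wait            = no
        user            = root
        server          = /usr/sbin/in.ftpd
        server_args     = -l -a
        log_on_success  += DURATION
        log_on_failure  += USERID
        nice            = 10
        instances       = 100
        per_source      = 10
}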
I'm not sure if this is different from the attack, but when xinetd fails to start an ftp daemon I see the following:

$ lsof -p 6933
...
xinetd  6933 root  2r  CHR   1,3         66678  /dev/null
xinetd  6933 root  3r  FIFO  0,5        173349  pipe
xinetd  6933 root  4w  FIFO  0,5        173349  pipe
xinetd  6933 root  5u  IPv4            173354   TCP *:ftp (LISTEN)
xinetd  6933 root  6u  unix  0xd9731a40 426588  socket
(END)

$ strace -p 6933
recv(6,

Note that xinetd is _only_ listening on ftp. I'm not sure what the unix domain socket is, but that's where it's stuck. I'll check on the state of any ftpd's next time we hang up.
Never mind... the "fix" came because my signals to reload xinetd's config caused it to reap its dead children (one child process reaped per signal sent). Once all the children are reaped, xinetd starts accepting connections again. I realized this using USR1 signals: no state dumps were done until all of the children were reaped... then all 5 (in this case) state dumps were done at once. File available for attachment: kill-log.txt
I am seeing the same thing happen on a private server. I have wu-ftpd and ipop3d configured in xinetd. I have a message server that connects to the pop3 daemon every 4 or 5 minutes and checks 4 or 5 accounts each time. After it runs for a little while, I end up with one ipop3d in a defunct state and nobody can connect until I restart xinetd. This is a test server that is not yet open to attack, so this should not be happening because of a DOS attack. When the mail is checked by the message server there is a flurry of connections to the pop3 port. I do need this fixed pretty quickly. Any ideas?
I would like to try the fix (instances = 100, per_source = 10) on my ftp server, but it is in production and now works well with proftpd in standalone mode, and I don't want to change anything until I'm sure it will work. Meanwhile, I activated those options on my pop/imap server and deactivated my hourly cron restart of xinetd, so I'll see if it works and report back...
I thought it was clear from my message above: the settings don't seem to be the source of the problem. Rather than randomly restarting xinetd, I think I've come up with a (rather unusual) hack: sending signals to xinetd when it has defunct processes.

#!/bin/sh
while sleep 97; do
    if ps -auxww | grep 'in.ftpd <defunct>' | grep -v grep ; then
        echo date `date` found defunct
        # allow for races...
        sleep 2
        if XINETD=`cat /var/run/xinetd.pid` ; then
            while ps -auxw | grep 'in.ftpd <defunct>' | grep -v grep ; do
                echo killing
                kill -ALRM $XINETD
                sleep 3
            done
        else
            echo no xinetd service running????
        fi
    fi
done
Two notes:

1) In the script above, the "grep -v" should be changed to "grep -qv" -- otherwise it'll produce spurious output on a match (OK for debugging purposes).

2) I'm looking at the question of whether this is a kernel-related bug. I'm seeing something vaguely similar occurring with Interchange. For people having this problem, I have two questions:

A) Are you running an SMP system?
B) Are you running a bigmem kernel?

The problems that we've had with this are on a dual-processor system. The problems with Interchange have only (and consistently) occurred on dual-processor systems. I'm including the query about bigmem simply for completeness (we're running on a 6GB system... users running bigmem are reasonably rare). If you email me directly (samuel) I'll summarize the responses that I receive.
Kernel: 2.4.9 / 2.4.18
RH OS: RedHat 7.3
xinetd version: xinetd-2.3.9-0.73

Hi,

I have the same problem on several internal/external servers, affecting every service xinetd is managing. After a few hours, xinetd stops accepting any new incoming connections. It then starts to behave exactly as if it were refusing the connections because of TCP wrappers. All services are impacted.

The only way to fix this is to restart the xinetd service, or to downgrade to the version originally shipped with RedHat 7.3 (xinetd-2.3.4-0.8.i386.rpm). Right now, we are downgrading all xinetd servers to fix this problem.

Regards,
Vincent Jaussaud
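For anyone else doing the downgrade, it boils down to this (a sketch; point rpm at wherever the 7.3 package lives on your media or mirror):

# --oldpackage lets rpm install a version older than the one present
rpm -Uvh --oldpackage xinetd-2.3.4-0.8.i386.rpm
/etc/rc.d/init.d/xinetd restart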
Well, at least this is not a RedHat-rpm-specific bug: I got the same thing to reproduce with an rpm compiled from the original source from xinetd.org and a cut-down spec file from the RedHat .src.rpm (removed the extra patches, and the configure line doesn't include lib_tcpwrappers). xinetd-managed services die within hours on my server, and the only solution is to restart xinetd.

However (strange that no one has mentioned it), with both rpms (RedHat's and the one I compiled), the services remain accessible from localhost for some time after they stop responding from other systems. If I wait long enough, it also stops responding from localhost. The server is 7.32, with all errata applied.

By the way, this doesn't happen if I install the 2.3.7 xinetd from the RedHat 8.0 CD - all is working well, except for the fact that that xinetd doesn't honor my "redirect" services - it's as if the config files for them don't exist!
Sorry, in the previous comment there is a typo: the server is 7.3, not 7.32.
Given the responses that I've gotten from people, it would appear that multi-processor has nothing to do with it. People with single-processor boxes have this problem too.

Whatever the bug is, going back to xinetd-2.3.7-2 (as released for RedHat 8.0) resolves the problem. Interestingly enough, installing xinetd-2.3.9-0.73 (from RedHat 7.3) on RedHat 8.0 (on the one machine I've tried it on) does NOT seem to trigger the bug... (kernel-2.4.18-14)

In the meantime, I've managed to build a script to test the bug (attached later); it seems to do the job. What I've found is interesting. I have been able to put together a properly working version of the kill-defunct script. It turns out that kill -ALRM causes xinetd to reap dead children, but doesn't get xinetd back to a responding state. My script required at least one USR1 signal to "fix" xinetd; if I only use SIGALRMs, it seems to leave xinetd in a (different) locked-up state. Once xinetd receives the last USR1, it appears to be fixed *permanently* -- despite repeated attempts, I'm unable to lock it up again until I restart xinetd. My script could use USR1 for all signals, but when it does that, xinetd will queue the dump requests and do multiple dumps once it starts moving again.

If someone else is willing to try installing 2.3.9 on another 8.0 box and see if my script can trigger a lockup there, that would be nice (would someone from RedHat be willing to get involved?).
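For anyone experimenting, the signal sequence reduces to this (a minimal sketch of what the attached script does; the pid-file path is the stock one):

XPID=`cat /var/run/xinetd.pid`
kill -ALRM $XPID    # reaps the defunct children, but leaves xinetd wedged
kill -USR1 $XPID    # internal state dump; this is what gets it accepting again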
Created attachment 86625 [details]
Shell script which seems to trigger the bug on RedHat 7.3 boxes. Requires setup in .netrc.
Created attachment 86627 [details]
(Tested) shell script which hunts and kills defunct children of xinetd, then issues a kill -USR1 to restart it (dunno why that works yet).
Yesterday I had the same problem on a RedHat 7.3 box. xinetd was configured to run the telnet and pop3 services. Because the box was in use by the users, I only had time to do a strace on xinetd; it was blocked on the "recv(7, " system call, as found by jch.

With the ftp test script presented above, I managed to duplicate the problem and produce an explanation: the unix domain socket from which xinetd tries to receive the datagram, and on which it is stalled, is syslogd's socket, /dev/log.

xinetd is stalling on /dev/log due to a race condition. After xinetd accepts a connection, it forks, closes the file descriptor (but saves the fd in its internal data structures) and proceeds to listen for new connections and watch its children. When a child exits, the xinetd daemon tries to free the connection. To free the connection, it tries to receive a datagram on the file descriptor of the connection and to close it again. Usually the attempt to receive a datagram on the file descriptor fails because the descriptor is already closed.

The problem is that sometimes the connection to syslogd is closed during the lifetime of a process spawned by xinetd, and the new unix socket receives the same fd number as the one saved for the child's connection. When xinetd then tries to free the connection, it calls recv() not on an already-closed file descriptor but on /dev/log, and blocks there forever (or until it receives a sufficient number of SIGUSR1 signals, as proved by the "hunting" shell script above).

This happens in version 2.3.9 due to a change between 2.3.7 and 2.3.9, and is partially fixed in the latest development version. The faulty code is in conn_free() from xinetd/connection.c. 2.3.7 correctly called drain() (the routine that calls recv()) only when the connection was datagram-oriented and xinetd had failed to launch the server for the service. The code in 2.3.9 always calls drain(), even if the connection is not datagram-oriented. The latest development version fixes the problem for stream-oriented connections, but may leave it open for datagram-oriented connections.

I'll attach an strace log for parties interested in analyzing the problem.
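If you want to confirm this diagnosis on a wedged box before restarting it, the two commands used earlier in this report are enough (a sketch; the fd number in the recv() will vary):

XPID=`cat /var/run/xinetd.pid`
strace -p $XPID    # in one terminal: expect it parked in recv(N, ... with no progress
lsof -p $XPID      # in another: fd N should show up as a plain unix domain socket
                   # (syslogd's /dev/log) rather than the TCP connection xinetd
                   # thinks it is draining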
Created attachment 86644 [details]
lsof and strace output
GRR: after all of the work that we did on this, it turns out that RedHat was already working on it as bug #76146. Dunno why nobody ever bothered to mark this one as a duplicate. Makes me wonder just how closely RedHat looks at these bugzilla reports.
*** This bug has been marked as a duplicate of 76146 ***