On test systems, /etc/hosts.allow contains this line:

    all:all: rfc931 5

which causes an ident lookup to be performed for all inbound TCP connections that go through tcpd.

This is a program called alarm.c, used with time to test whether alarm works:

    #include <stdio.h>
    #include <unistd.h>

    int main () {
        printf("starting\n");
        alarm(3);                  /* ask for SIGALRM in 3 seconds */
        while (1) { /* noop */ }   /* spin until the signal kills us */
        printf("stopping\n");      /* never reached */
        return(1);
    }

This is what normally happens:

    $ /usr/bin/time ./alarm
    starting
    Command terminated by signal 14
    3.01user 0.00system 0:03.00elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (81major+11minor)pagefaults 0swaps
    $

This is what happens when alarm() fails (program terminated with ^C after ~8 seconds):

    $ /usr/bin/time ./alarm
    starting
    Command terminated by signal 2
    8.76user 0.00system 0:08.76elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (81major+11minor)pagefaults 0swaps
    $

This behavior happens iff two conditions hold. First, the program that uses alarm is a child of a process that was either invoked via tcp_wrappers with rfc931 in use (such as in.telnetd through inetd) or that uses libwrap (such as sshd). Second, the remote host (the client) ignores connections on port 113 (in my test case, the client is behind a Cisco 675 ADSL router doing NAT).

This is a tcpdump of traffic generated when a user behind a 675 (63.228.194.87) tries to telnet in to a host (raven) using tcp_wrappers:

    14:13:59.934475 > raven.2726 > 63.228.194.87.auth: S 489162589:489162589(0) win 32120 <mss 1460,sackOK,timestamp 111563722 0,nop,wscale 0> (DF)
    14:14:02.929819 > raven.2726 > 63.228.194.87.auth: S 489162589:489162589(0) win 32120 <mss 1460,sackOK,timestamp 111564022 0,nop,wscale 0> (DF)

Note that nothing is sent in response to these packets; they are silently ignored. On receipt of packets trying to set up a connection, the client does not refuse the connection; it simply drops the packets on the floor.
According to an ltrace, this is the last time an alarm is set successfully:

    [pid 13442] alarm(5) = 0
    [pid 13442] htons(0, 5, 14, 0x0804a640, 0xbffffb1c) = 0
    [pid 13442] htons(113, 0, 5, 14, 0x0804a640) = 28928
    [pid 13442] fileno(0x0804ebd0) = 3
    [pid 13442] bind(3, 0xbfffb24c, 16, 113, 0) = 0
    [pid 13442] fileno(0x0804ebd0) = 3
    [pid 13442] connect(3, 0xbfffb25c, 16, 0xbffffb1c, 0xbffffb20 <unfinished ...>
    [pid 529] --- SIGCHLD (Child exited) ---
    [pid 529] wait3(0xbffffbc8, 1, 0, 19, 0) = 13433
    [pid 529] snprintf("/usr/sbin/tcpd (pid 13433)", 64, "%s (pid %d)", "/usr/sbin/tcpd", 13433) = 26
    [pid 529] wait3(0xbffffbc8, 1, 0, 19, 0) = -1
    [pid 529] breakpointed at 0x400c817d (?)
    [pid 529] sigprocmask(0, 0x0804e2c0, 0, 0xbffffda8, 0x08049cb2) = 0
    [pid 529] __errno_location() = 0x4010bd60
    [pid 529] sigprocmask(2, 0x0804e340, 0, 0xbffffda8, 0x08049c93) = 0
    [pid 529] select(19, 0xbffffd28, 0, 0, 0 <unfinished ...>
    [pid 13442] --- SIGALRM (Alarm clock) ---

This activity occurs when tcpd uses alarm to make its ident attempt time out (inetd is pid 529).

alarm() works fine for users coming in via ssh or telnet from hosts that have identd running or from hosts that reject connections to the ident port. It only fails when hosts ignore the inbound connection attempt, causing the connect to time out.

This behavior was first noticed on a RedHat 6.2 system with tcp_wrappers 7.6-10 when a daemon failed after being restarted remotely; it has also been observed on two RedHat 5.1-derived systems (upgraded with individual packages from later distributions and with custom packages) with tcp_wrappers 7.6-4.
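The pattern visible in the ltrace (set an alarm, then start a call that may stall) can be sketched in C as follows. This is not tcpd's actual code: run_with_timeout, stalled_call and quick_call are hypothetical names, and stalled_call stands in for a connect() to a host that silently drops packets. The sketch leaves the handler with sigsetjmp/siglongjmp and a nonzero savemask argument, which makes the signal mask that was in effect before the handler ran come back after the jump. POSIX leaves it unspecified whether plain setjmp/longjmp do the same. Whether that difference plays any part in this bug is an assumption, not something this report establishes.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <unistd.h>

static sigjmp_buf timeout_env;

/* SIGALRM handler: abandon the stalled call by jumping back to the caller. */
static void on_alarm(int signo)
{
    (void)signo;
    siglongjmp(timeout_env, 1);
}

/* Hypothetical stand-in for a connect() that never completes. */
void stalled_call(void)
{
    for (;;)
        pause();
}

/* Hypothetical stand-in for a call that completes at once. */
void quick_call(void)
{
}

/* Runs blocking_call() for at most `seconds` seconds.
 * Returns 0 if it finished, 1 if it timed out, -1 if the handler
 * could not be installed. */
int run_with_timeout(void (*blocking_call)(void), unsigned int seconds)
{
    struct sigaction sa, old_sa;
    int timed_out;

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGALRM, &sa, &old_sa) != 0)
        return -1;

    /* savemask = 1: the current signal mask is saved here and restored
     * by siglongjmp(), so SIGALRM is not left blocked after the jump. */
    if (sigsetjmp(timeout_env, 1) == 0) {
        alarm(seconds);
        blocking_call();
        timed_out = 0;
    } else {
        timed_out = 1;
    }

    alarm(0);                          /* cancel any pending alarm */
    sigaction(SIGALRM, &old_sa, NULL); /* put the old handler back */
    return timed_out;
}
```

Calling run_with_timeout(stalled_call, 5) mirrors the alarm(5) in the trace. After it returns, a later alarm() in the same process or in its children should still be delivered, because SIGALRM is not left in the signal mask.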
To help replicate the bug when such beasts as a 675 are unavailable, the commands:

    ipchains -A input -j DENY -p tcp --destination-port 113
    ipchains -A output -j DENY -p tcp --source-port 113

will cause a system to ignore inbound connections to and outbound connections from port 113 and can be used on a client prior to telnetting to a server to trigger the bug.
Looks like something is masking SIGALRM in the caller. Is this still the case in newer kernels?
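One way to check the masking hypothesis from inside an affected login session is a small diagnostic like the sketch below. It is not part of the original report and the function names are hypothetical. It relies on the fact that a process inherits its signal mask across fork and exec, and that an ignored signal stays ignored across exec, so a shell started by in.telnetd or sshd would pass either condition on to everything run from it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>

/* Returns 1 if SIGALRM is blocked in the caller's signal mask,
 * 0 if it is not, -1 on error. */
int sigalrm_is_blocked(void)
{
    sigset_t current;

    /* With a NULL "set" argument, sigprocmask() only reports the mask. */
    if (sigprocmask(SIG_BLOCK, NULL, &current) != 0)
        return -1;
    return sigismember(&current, SIGALRM);
}

/* Returns 1 if SIGALRM's disposition is SIG_IGN, 0 otherwise, -1 on error. */
int sigalrm_is_ignored(void)
{
    struct sigaction sa;

    /* With a NULL "act" argument, sigaction() only reports the disposition. */
    if (sigaction(SIGALRM, NULL, &sa) != 0)
        return -1;
    return sa.sa_handler == SIG_IGN;
}

/* Removes SIGALRM from the signal mask. Returns 0 on success, -1 on error. */
int unblock_sigalrm(void)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    return sigprocmask(SIG_UNBLOCK, &set, NULL);
}

/* Prints both findings in a form that can be pasted into a bug report. */
void report_sigalrm(FILE *out)
{
    fprintf(out, "SIGALRM blocked: %d\n", sigalrm_is_blocked());
    fprintf(out, "SIGALRM ignored: %d\n", sigalrm_is_ignored());
}
```

A main() that calls report_sigalrm(stdout) and is run once from a normal console login and once from an affected telnet or ssh session would show whether the two environments differ. If SIGALRM turns out to be blocked, calling unblock_sigalrm() before alarm(3) in alarm.c would be a quick way to confirm that the mask is the whole story.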
It's still occurring as of 2.2.19. I don't have the opportunity to upgrade any of my remaining Redhat machines to 2.4, so I can't test with anything newer. I don't want to touch the kernels on two of them, but I could probably schedule downtime on the other to upgrade elsewhere in the 2.2 series.

It doesn't happen on my 2.4.x Debian boxes, but it also doesn't happen on Debian boxes running 2.2.x, so I can't chalk it up to the kernel version alone.

I've not thought about this bug for nearly two years, but looking back at it, I'd start by comparing how tcp-wrappers and the kernel are built on the two kinds of machine, and I would tend to suspect tcp_wrappers more than the kernel.

The Debian tcp-wrappers package applies the patch http://http.us.debian.org/debian/pool/main/t/tcp-wrappers/tcp-wrappers_7.6-9.diff.gz and then says 'make linux' to build. I assume that you have details about how the Redhat tcp_wrappers is built. The machine is currently running tcp_wrappers-7.6-4.

On one of the Redhat machines, the kernel was configured as per http://www.umnh.utah.edu/temp-files/raven-config-2.2.19 . On one of the Debian machines, the kernel was configured as per http://www.g6net.com/tmp/oxygen-config-2.2.20 . On the other, the kernel was configured as per http://www.umnh.utah.edu/temp-files/phoenix-config-2.2.19pre17 . I can dig up more configs, but it'll take time.

If I get a chance, I'll try to hunt down the options that RedHat builds with and then try to tickle the bug on the Debian boxes by building with those config options.
Please verify this with a newer version of Red Hat Enterprise Linux or Fedora Core and reopen it against the new version if it still occurs. Closing as "not a bug" for now.