hey,

it seems as though there is a race condition in xinetd. my theory follows:

i noticed that after a seemingly random number of tftp requests, the tftp-server (via xinetd) would no longer respond to requests. so, i wrote a script that would repeatedly retrieve the same file via tftp from the xinetd/in.tftpd running on the local machine. after a random period of time, the script can no longer retrieve files (not a script problem, i can't do it manually either).

while running xinetd with the -d option, i see the following at fail-time:

1. 00/10/18@17:38:52: DEBUG: {main_loop} active_services = 6
2. 00/10/18@17:38:53: DEBUG: {main_loop} select returned 1
3. 00/10/18@17:38:53: DEBUG: {svc_suspend} Suspended service tftp
4. 00/10/18@17:38:53: DEBUG: {main_loop} active_services = 5
5. 00/10/18@17:38:53: DEBUG: {exec_server} duping 7
6. 00/10/18@17:38:53: DEBUG: {child_exit} waitpid returned = 27316
7. 00/10/18@17:38:53: DEBUG: {server_end} tftp server 27316 exited
8. 00/10/18@17:38:53: DEBUG: {svc_resume} Resumed service tftp
9. 00/10/18@17:38:53: DEBUG: {child_exit} waitpid returned = -1
10. 00/10/18@17:38:53: DEBUG: {main_loop} active_services = 6
11. 00/10/18@17:38:53: DEBUG: {main_loop} select returned 1
12. 00/10/18@17:38:53: DEBUG: {svc_suspend} Suspended service tftp
13. 00/10/18@17:38:53: DEBUG: {main_loop} active_services = 5
14. 00/10/18@17:38:53: DEBUG: {exec_server} duping 7
15. 00/10/18@17:38:53: DEBUG: {child_exit} waitpid returned = 27320
16. 00/10/18@17:38:53: DEBUG: {server_end} tftp server 27320 exited
17. 00/10/18@17:38:53: DEBUG: {svc_resume} Resumed service tftp
18. 00/10/18@17:38:53: DEBUG: {child_exit} waitpid returned = -1
19. 00/10/18@17:38:53: DEBUG: {main_loop} active_services = 6
20. 00/10/18@17:38:53: DEBUG: {main_loop} select returned 1
21. 00/10/18@17:38:53: DEBUG: {svc_suspend} Suspended service tftp
22. 00/10/18@17:38:53: DEBUG: {exec_server} duping 7
23. 00/10/18@17:38:53: DEBUG: {main_loop} active_services = 5

there is always a new (see pid) zombie in.tftpd process at this point:

nobody 27324 0.0 0.0 0 0 ? Z 17:51 0:00 [in.tftpd <defunc

now, if i finger a user, the zombie will disappear and the tftp transfers will continue. sometimes i have to run finger a few times before i get that effect. (this likely has the side-effect of also wait()ing for outstanding pids.)

i.e., xinetd doesn't seem to always wait() for a process to finish before it sleeps. so, when the next tftp request comes in, xinetd looks and sees that there is already an in.tftpd process running, and it will deny service (wait = yes in the tftp configuration for xinetd). as you can see in line 21, the tftp service is placed in a suspended state, but it is never resumed (there is no matching svc_resume).

a possible solution (if this is indeed the problem) would be to have xinetd waitpid() for outstanding pids when a new request comes in. or maybe just for requests of the same kind as the service being requested that are configured to wait (i.e. only 1 process can exist at a time).

the problem i have with this theory is that i had seemingly the same problem with inetd in redhat6.2. however, i didn't have the level of debugging information that xinetd gives me. it is unlikely that xinetd and inetd both have the same bug, unless of course xinetd was developed from the inetd codebase.

<machine info>
[root@zumpano xinetd.d]# uname -a
Linux zumpano.fc.hp.com 2.2.16-22enterprise #1 SMP Tue Aug 22 16:29:32 EDT 2000 i686 unknown
[root@zumpano xinetd.d]# rpm -q xinetd
xinetd-2.1.8.9pre9-6
[root@zumpano xinetd.d]# rpm -q tftp-server
tftp-server-0.17-5
[root@zumpano xinetd.d]# rpm -q tftp
tftp-0.17-5
[root@zumpano xinetd.d]# cat /etc/xinetd.d/tftp
# default: off
# description: The tftp server serves files using the trivial file transfer \
# protocol. The tftp protocol is often used to boot diskless \
# workstations, download configuration files to network-aware printers, \
# and to start the installation process for some operating systems.
# server_args = -s -r blksize /tftpboot
service tftp
{
        socket_type = dgram
        wait = yes
        user = nobody
        log_on_success += USERID
        log_on_failure += USERID
        server = /usr/sbin/in.tftpd
        server_args = /tftpboot
        disable = no
}
</machine_info>

i am running an SMP kernel, but there is only 1 processor in the machine. the version of inetd i was running was the latest available for 6.2 as of 2 weeks ago... probably still the latest one.

-dann
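
The workaround suggested in the report (collecting any outstanding child pids before deciding whether a wait = yes service is still busy) can be illustrated with a small sketch. This is not xinetd's code, and it says nothing about how the actual fix was implemented; it only shows the general non-blocking reap loop on a POSIX system, written in Python for brevity. The names reap_children and spawn_short_lived_children are hypothetical helpers made up for this example. The "waitpid returned = -1" lines in the debug log would correspond to the point where no children remain, which Python reports as ChildProcessError.

```python
import os


def reap_children():
    """Collect every child that has already exited, without blocking.

    Returns a list of (pid, exit_code) tuples; exit_code is None when the
    child did not terminate through a normal exit (e.g. it was killed).
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG makes the call return at once.
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            # No children exist at all (the C call returns -1 with ECHILD).
            break
        if pid == 0:
            # Children exist, but none of them has exited yet.
            break
        code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
        reaped.append((pid, code))
    return reaped


def spawn_short_lived_children(count):
    """Fork `count` children that exit immediately; return their pids."""
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            # Child process: exit straight away, leaving a zombie behind
            # until the parent calls waitpid() for it.
            os._exit(0)
        pids.append(pid)
    return pids
```

A server loop following the report's suggestion would call something like reap_children() each time a request arrives, and mark a service as free again whenever the pid of its server shows up in the returned list. Looping until nothing more is reported matters because several children can exit before the parent gets around to a single SIGCHLD.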
The program author is looking at it.
This should be fixed in 2.1.8.9pre13-1, which should show up in Rawhide soonish.