Description of problem: I am having intermittent problems with whois lookups hanging, and using large amounts of CPU time when they do. This seems to affect lookups against different servers, but doesn't happen all of the time - but when it is happening, it affects a high proportion of the lookups I do. The problem is not reproducible at will but happens to me on a fairly regular basis - perhaps every few weeks for a period of a few days. My machine makes several tens of whois lookups on IP addresses per day (I'm using Fail2ban with an action to complain to the ISPs of offending IPs - http://www.gloomytrousers.co.uk/open_source/fail2ban.shtml) so I'm guessing I may occasionally be falling foul of rate-limiting for whois lookups against particular servers - but I would not expect whois to hang and consume excessive CPU in this case. I'm guessing it may be busy-waiting for a response, and this is only noticeable when a response does not come quickly. During periods when it's happening, I can reproduce it by running the same lookups manually, and I can also perform other, apparently unrelated, lookups, some of which also fail and some succeed - against domain names as well as IP addresses.
Here are some results from top showing hung lookups from yesterday and today (the IP addresses are sources of spam, SSH brute force attacks, and the like; the last is a manual lookup):

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
21068 root      20   0 89860  904  676 R 72.1  0.0 164:20.51 whois 125.167.105.130

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
  700 root      20   0 89860 1048  820 R 25.1  0.1  31:35.02 whois 211.214.161.93
26946 root      20   0 89860 1048  820 R 23.4  0.1 244:51.37 whois 122.167.13.13
30168 root      20   0 89860 1048  820 R 21.7  0.1 100:07.98 whois 222.208.183.218

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 5268 root      20   0 89860 1040  796 R 80.6  0.1   1:50.47 whois gloomytrousers.co.uk

I can perform the same lookups on another host (running CentOS4 and jwhois-3.2.2-6.EL4.1.i386) with no problems, and this host has never experienced the same problem of hung lookups despite running fail2ban in an identical config.

Version-Release number of selected component (if applicable):
[root@detritus ~]# rpm -q jwhois
jwhois-4.0-4.fc8.x86_64
[root@detritus ~]# uname -a
Linux detritus.local 2.6.25.11-60.fc8 #1 SMP Mon Jul 21 01:40:51 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

How reproducible: Intermittent, periodically easy to reproduce, but long periods with no occurrence.

Steps to Reproduce:
1. (unable to determine the conditions which cause this)

Actual results: whois lookups hang using high CPU

Expected results: Lookup should either succeed, or fail and exit, and not use excessive CPU while running/waiting for a response.

Additional info: I'd like to collect some additional info to aid tracking this one down - has anyone got any suggestions for how to collect anything which might help? I've tried setting "connect-timeout" in /etc/jwhois.conf but it made no difference.
OK, it appears to have started behaving again - the above lookups are working once again. By way of a test, I tried this...

[root@detritus ~]# host whois.nic.uk
whois.nic.uk has address 213.248.210.12
[root@detritus ~]# iptables -I OUTPUT --dst 213.248.210.12 -j DROP
[root@detritus ~]# whois gloomytrousers.co.uk
[Querying whois.nic.uk]
^C

...and although the lookup hung, as expected, CPU usage was 0.
This message is a reminder that Fedora 8 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 8. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '8'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 8's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 8 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.
Have seen this since I upgraded to F10. Reopening.
Created attachment 341781 [details] make jwhois use select ... because it sits in a do/while loop calling read() on a non-blocking socket. This appears to fix it for me.
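For readers without access to the attachment, here is a minimal sketch of the general technique the comment describes: sleep in select() until the socket is readable instead of spinning on read(). This is not the contents of attachment 341781; the helper name read_with_select and its timeout parameter are made up for illustration, and it assumes fd is below FD_SETSIZE.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not taken from the jwhois patch): block in select()
 * until fd is readable or timeout_sec seconds pass, then read once.
 * Returns the byte count, 0 on EOF, or -1 on error; a timeout is reported
 * as -1 with errno set to ETIMEDOUT. */
ssize_t read_with_select(int fd, void *buf, size_t len, int timeout_sec)
{
    for (;;) {
        fd_set rfds;
        struct timeval tv;
        int ready;
        ssize_t n;

        /* select() may modify both arguments, so rebuild them each pass. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        ready = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (ready == -1) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: wait again */
            return -1;
        }
        if (ready == 0) {
            errno = ETIMEDOUT;      /* nothing arrived within the timeout */
            return -1;
        }

        n = read(fd, buf, len);
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR))
            continue;               /* spurious wakeup: sleep in select() again */
        return n;
    }
}
```

With something like this in place of the bare read() in the do/while loop, the process sleeps in the kernel while the server is slow, so a delayed response costs no CPU. Note that in this sketch the timeout restarts on every pass through the loop.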
Created attachment 341783 [details] trivial spec patch to use listed patch Because I can and hopefully it makes it easier for whoever may need to apply it.
Created attachment 341787 [details] specfile patch ... but actually update the release! Sigh. :)
This message is a reminder that Fedora 10 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 10. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '10'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 10's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 10 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.
This has a patch to fix - could it be released still? Presumably this is applicable to F11/12 too. I did experience the bug last night (still on F10, haven't upgraded yet).
Just verified that this still exists on f-12:

write(3, "austnet.au\r\n", 12) = 12
read(3, 0x7fff4dd9e470, 1023) = -1 EAGAIN (Resource temporarily unavailable)
read(3, 0x7fff4dd9e470, 1023) = -1 EAGAIN (Resource temporarily unavailable)
... hundreds/thousands more as cpu spikes and jwhois loops ...
read(3, 0x7fff4dd9e470, 1023) = -1 EAGAIN (Resource temporarily unavailable)
read(3, "No Data Found\r\n", 1023) = 15
read(3, "", 1023) = 0
write(1, "[whois.aunic.net]\nNo Data Found\r"..., 33[whois.aunic.net]
No Data Found
) = 33
exit_group(0) = ?
FWIW, I'm seeing this too - but only when IPv6 is involved. From strace whois 74.220.121.126:

<normal stuff I suppose>
connect(3, {sa_family=AF_INET, sin_port=htons(43), sin_addr=inet_addr("199.212.0.43")}, 16) = 0
getsockname(3, {sa_family=AF_INET6, sin6_port=htons(40891), inet_pton(AF_INET6, "::ffff:68.192.13.200", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
close(3) = 0
socket(PF_INET6, SOCK_STREAM, IPPROTO_TCP) = 3
fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(3, {sa_family=AF_INET6, sin6_port=htons(43), inet_pton(AF_INET6, "2001:500:4:1::81", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EINPROGRESS (Operation now in progress)
select(1024, NULL, [3], NULL, {75, 0}) = 1 (out [3], left {74, 979698})
getsockopt(3, SOL_SOCKET, SO_ERROR, [-4985675827644465152], [4]) = 0
write(3, "74.220.121.126\r\n", 16) = 16
read(3, 0x7fffbacf4d30, 1023) = -1 EAGAIN (Resource temporarily unavailable)
read(3, 0x7fffbacf4d30, 1023) = -1 EAGAIN (Resource temporarily unavailable)
<continues forever>
jwhois-4.0-19.fc12 has been submitted as an update for Fedora 12. http://admin.fedoraproject.org/updates/jwhois-4.0-19.fc12
jwhois-4.0-19.fc12 has been pushed to the Fedora 12 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update jwhois'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F12/FEDORA-2010-1172
Fedora 11 has this bug in its latest jwhois package. I believe the best fix is simply to turn off non-blocking mode after connecting. I suggest the following patch. (I will try to contact the jwhois maintainers too.)

--- jwhois-4.0/src/utils_old.c 2010-01-29 16:00:21.261869369 -0800
+++ jwhois-4.0/src/utils.c 2010-01-29 16:00:43.007869124 -0800
@@ -298,6 +298,11 @@ make_connect(const char *host, int port)
       break;
     }
 #endif
+  flags = fcntl(sockfd, F_GETFL, 0);
+  if (fcntl(sockfd, F_SETFL, flags&~O_NONBLOCK) == -1)
+    {
+      return -1;
+    }
 
   return sockfd;
 }
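As a standalone illustration of the same idea (switching the socket back to blocking mode once the connect has completed), here is a minimal sketch. The helper name set_blocking is hypothetical and not part of jwhois; unlike the patch above, it also checks the F_GETFL call for failure before using its result.

```c
#include <fcntl.h>

/* Hypothetical helper (not part of jwhois): clear O_NONBLOCK on fd so that
 * later read() calls sleep until data arrives instead of returning EAGAIN.
 * Returns 0 on success, -1 if either fcntl() call fails. */
int set_blocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;              /* e.g. fd is not an open descriptor */
    if (fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) == -1)
        return -1;
    return 0;
}
```

The trade-off compared with a select()-based read loop is that a blocking read() has no timeout of its own, so a server that accepts the connection but never answers would leave the lookup waiting (without using CPU) until the connection is closed or the user interrupts it.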
(In reply to comment #15)
> Fedora 11 has this bug in its latest jwhois package. I believe the best fix is
> simply to turn off non-blocking mode after connecting. I suggest the following
> patch. (I will try to contact the jwhois maintainers too.)
>
> --- jwhois-4.0/src/utils_old.c 2010-01-29 16:00:21.261869369 -0800
> +++ jwhois-4.0/src/utils.c 2010-01-29 16:00:43.007869124 -0800
> @@ -298,6 +298,11 @@ make_connect(const char *host, int port)
> break;
> }
> #endif
> + flags = fcntl(sockfd, F_GETFL, 0);
> + if (fcntl(sockfd, F_SETFL, flags&~O_NONBLOCK) == -1)
> + {
> + return -1;
> + }
>
> return sockfd;
> }

Did you try jwhois-4.0-14.fc11 from the testing repository? It should be fixed there...
jwhois-4.0-19.fc12 has been pushed to the Fedora 12 stable repository. If problems still persist, please make note of it in this bug report.
Created attachment 667375 [details] strace whois google.com
Comment on attachment 667375 [details] strace whois google.com I'm having the very same issue on CentOS 6.3 with jwhois 4.0-19.el6, attached strace output.