Bug 469412

Summary: whois lookups hanging with high CPU usage
Product: [Fedora] Fedora
Component: jwhois
Version: 12
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Russell Odom <russ+bugzilla-redhat>
Assignee: Vitezslav Crhonek <vcrhonek>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: davids, mbreuer, roysjosh, vcrhonek
Keywords: Reopened, Triaged
Target Milestone: ---
Target Release: ---
Fixed In Version: 4.0-19.fc12
Doc Type: Bug Fix
Last Closed: 2010-02-11 14:46:22 UTC
Bug Blocks: 1174469
Attachments:
Description                                            Flags
make jwhois use select                                 none
trivial spec patch to use listed patch                 none
specfile patch ... but actually update the release!    none
strace whois google.com                                none

Description Russell Odom 2008-10-31 18:15:10 UTC
Description of problem:
I am having intermittent problems with whois lookups hanging, and using large amounts of CPU time when they do. This seems to affect looking up against different servers, but doesn't happen all of the time - but when it is happening, it affects a high proportion of the lookups I do. The problem is not reproducible at will but happens to me on a fairly regular basis - perhaps every few weeks for a period of a few days.

My machine makes several tens of whois lookups on IP addresses per day (I'm using Fail2ban with an action to complain to the ISPs of offending IPs - http://www.gloomytrousers.co.uk/open_source/fail2ban.shtml), so I'm guessing I may occasionally be falling foul of rate-limiting for whois lookups against particular servers. Even so, I would not expect whois to hang and consume excessive CPU in this case. My guess is that it is busy-waiting for a response, and that this is only noticeable when a response does not come quickly.

During periods when it's happening, I can reproduce it by running the same lookups manually, and I can also perform other, apparently unrelated, lookups, some of which also fail and some succeed - against domain names as well as IP addresses.

Here are some results from top showing hung lookups from yesterday and today (the IP addresses are sources of spam, SSH brute force attacks, and the like; the last is a manual lookup):
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                 
21068 root      20   0 89860  904  676 R 72.1  0.0 164:20.51 whois 125.167.105.130

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                 
  700 root      20   0 89860 1048  820 R 25.1  0.1  31:35.02 whois 211.214.161.93
26946 root      20   0 89860 1048  820 R 23.4  0.1 244:51.37 whois 122.167.13.13
30168 root      20   0 89860 1048  820 R 21.7  0.1 100:07.98 whois 222.208.183.218

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                           
 5268 root      20   0 89860 1040  796 R 80.6  0.1   1:50.47 whois gloomytrousers.co.uk
 
I can perform the same lookups on another host (running CentOS4 and jwhois-3.2.2-6.EL4.1.i386) with no problems, and this host has never experienced the same problem of hung lookups despite running fail2ban in an identical config.

Version-Release number of selected component (if applicable):
[root@detritus ~]# rpm -q jwhois
jwhois-4.0-4.fc8.x86_64
[root@detritus ~]# uname -a
Linux detritus.local 2.6.25.11-60.fc8 #1 SMP Mon Jul 21 01:40:51 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
Intermittent, periodically easy to reproduce, but long periods with no occurrence.

Steps to Reproduce:
1. (unable to determine the conditions which cause this)
  
Actual results:
whois lookups hang using high CPU

Expected results:
Lookup should either succeed, or fail and exit, and not use excessive CPU while running/waiting for a response.

Additional info:
I'd like to collect some additional info to aid tracking this one down - anyone got any suggestions for how to collect anything which might help?

I've tried setting "connect-timeout" in /etc/jwhois.conf but it made no difference.

Comment 1 Russell Odom 2008-11-01 09:47:06 UTC
OK, it appears to have started behaving again - the above lookups are working once again.

By way of a test, I tried this...

[root@detritus ~]# host whois.nic.uk
whois.nic.uk has address 213.248.210.12
[root@detritus ~]# iptables -I OUTPUT --dst 213.248.210.12 -j DROP
[root@detritus ~]# whois gloomytrousers.co.uk
[Querying whois.nic.uk]
^C

...and although the lookup hung, as expected, CPU usage was 0.

Comment 2 Bug Zapper 2008-11-26 11:16:25 UTC
This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 8 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 3 Bug Zapper 2009-01-09 07:53:54 UTC
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 4 Russell Odom 2009-01-09 08:34:11 UTC
Have seen this since I upgraded to F10. Reopening.

Comment 5 Joshua Roys 2009-04-29 15:57:56 UTC
Created attachment 341781 [details]
make jwhois use select

... because it sits in a do/while loop calling read() on a non-blocking socket.

This appears to fix it for me.
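
The approach described above (waiting in select() rather than retrying read() in a tight loop) can be sketched roughly as follows. This is a minimal illustration, not the attached patch: the helper name read_when_ready, its return codes, and the timeout handling are all assumptions made for the example.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not taken from jwhois: read from a non-blocking
 * descriptor, sleeping in select() until data arrives instead of
 * spinning on EAGAIN.
 * Returns the number of bytes read, 0 on EOF, -1 on error, or -2 if
 * timeout_sec elapses with nothing to read. The timeout restarts after
 * each wakeup, so it is a per-wait limit, not a total deadline. */
ssize_t read_when_ready(int fd, void *buf, size_t len, int timeout_sec)
{
    /* select() can only watch descriptors below FD_SETSIZE. */
    if (fd < 0 || fd >= FD_SETSIZE) {
        errno = EINVAL;
        return -1;
    }

    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;                 /* data, or 0 for end of file */
        if (errno == EINTR)
            continue;                 /* interrupted by a signal: retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                /* genuine read error */

        /* No data yet: block in the kernel until fd becomes readable. */
        fd_set readfds;
        struct timeval tv;
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        int rc = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (rc == 0)
            return -2;                /* timed out */
        if (rc < 0 && errno != EINTR)
            return -1;                /* select() itself failed */
    }
}
```

While the process is asleep in select() it uses no CPU, which is the difference from a loop that calls read() again immediately after every EAGAIN.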

Comment 6 Joshua Roys 2009-04-29 15:59:24 UTC
Created attachment 341783 [details]
trivial spec patch to use listed patch

Because I can and hopefully it makes it easier for whoever may need to apply it.

Comment 7 Joshua Roys 2009-04-29 16:22:39 UTC
Created attachment 341787 [details]
specfile patch ... but actually update the release!

Sigh. :)

Comment 8 Bug Zapper 2009-11-18 08:43:15 UTC
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 10 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Comment 9 Bug Zapper 2009-12-18 06:42:55 UTC
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 10 Russell Odom 2009-12-18 10:52:20 UTC
This has a patch to fix it - could it still be released? Presumably this is applicable to F11/12 too.

I did experience the bug last night (still on F10, haven't upgraded yet).

Comment 11 Joshua Roys 2009-12-18 13:36:15 UTC
Just verified that this still exists on f-12:

write(3, "austnet.au\r\n", 12)          = 12
read(3, 0x7fff4dd9e470, 1023)           = -1 EAGAIN (Resource temporarily unavailable)
read(3, 0x7fff4dd9e470, 1023)           = -1 EAGAIN (Resource temporarily unavailable)

... hundreds/thousands more as cpu spikes and jwhois loops ...

read(3, 0x7fff4dd9e470, 1023)           = -1 EAGAIN (Resource temporarily unavailable)
read(3, "No Data Found\r\n", 1023)      = 15
read(3, "", 1023)                       = 0
write(1, "[whois.aunic.net]\nNo Data Found\r"..., 33[whois.aunic.net]
No Data Found
) = 33
exit_group(0)                           = ?

Comment 12 Michael Breuer 2010-01-08 07:32:52 UTC
FWIW, I'm seeing this too - but only when IPv6 is involved.
From strace whois 74.220.121.126:
<normal stuff I suppose>
connect(3, {sa_family=AF_INET, sin_port=htons(43), sin_addr=inet_addr("199.212.0.43")}, 16) = 0
getsockname(3, {sa_family=AF_INET6, sin6_port=htons(40891), inet_pton(AF_INET6, "::ffff:68.192.13.200", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
close(3)                                = 0
socket(PF_INET6, SOCK_STREAM, IPPROTO_TCP) = 3
fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
connect(3, {sa_family=AF_INET6, sin6_port=htons(43), inet_pton(AF_INET6, "2001:500:4:1::81", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EINPROGRESS (Operation now in progress)
select(1024, NULL, [3], NULL, {75, 0})  = 1 (out [3], left {74, 979698})
getsockopt(3, SOL_SOCKET, SO_ERROR, [-4985675827644465152], [4]) = 0
write(3, "74.220.121.126\r\n", 16)      = 16
read(3, 0x7fffbacf4d30, 1023)           = -1 EAGAIN (Resource temporarily unavailable)
read(3, 0x7fffbacf4d30, 1023)           = -1 EAGAIN (Resource temporarily unavailable)
<continues forever>

Comment 13 Fedora Update System 2010-01-26 15:03:56 UTC
jwhois-4.0-19.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/jwhois-4.0-19.fc12

Comment 14 Fedora Update System 2010-01-28 01:03:40 UTC
jwhois-4.0-19.fc12 has been pushed to the Fedora 12 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update jwhois'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F12/FEDORA-2010-1172

Comment 15 David J. Schwartz 2010-01-30 00:08:46 UTC
Fedora 11 has this bug in its latest jwhois package. I believe the best fix is simply to turn off non-blocking mode after connecting. I suggest the following patch. (I will try to contact the jwhois maintainers too.)

--- jwhois-4.0/src/utils_old.c  2010-01-29 16:00:21.261869369 -0800
+++ jwhois-4.0/src/utils.c      2010-01-29 16:00:43.007869124 -0800
@@ -298,6 +298,11 @@ make_connect(const char *host, int port)
       break;
     }
 #endif
+  flags = fcntl(sockfd, F_GETFL, 0);
+  if (fcntl(sockfd, F_SETFL, flags&~O_NONBLOCK) == -1)
+    {
+      return -1;
+    }
 
   return sockfd;
 }
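
As a standalone illustration of the idea in the patch above, the following sketch clears O_NONBLOCK on a descriptor and also checks the F_GETFL result, which the patch uses unchecked. The helper name set_blocking is hypothetical and does not appear in jwhois.

```c
#include <fcntl.h>

/* Hypothetical helper: put fd back into blocking mode.
 * Returns 0 on success, -1 on failure with errno set by fcntl(). */
int set_blocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;                    /* could not read the status flags */
    if ((flags & O_NONBLOCK) == 0)
        return 0;                     /* already blocking, nothing to do */
    return fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) == -1 ? -1 : 0;
}
```

Once the socket is blocking again, read() sleeps until data arrives rather than returning EAGAIN, so a retry loop no longer spins. A blocking read() has no timeout of its own, however, so a server that never answers would presumably still leave the lookup waiting, only without the CPU load.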

Comment 16 Vitezslav Crhonek 2010-02-01 11:55:53 UTC
(In reply to comment #15)
> Fedora 11 has this bug in its latest jwhois package. I believe the best fix is
> simply to turn off non-blocking mode after connecting. I suggest the following
> patch. (I will try to contact the jwhois maintainers too.)
> 
> --- jwhois-4.0/src/utils_old.c  2010-01-29 16:00:21.261869369 -0800
> +++ jwhois-4.0/src/utils.c      2010-01-29 16:00:43.007869124 -0800
> @@ -298,6 +298,11 @@ make_connect(const char *host, int port)
>        break;
>      }
>  #endif
> +  flags = fcntl(sockfd, F_GETFL, 0);
> +  if (fcntl(sockfd, F_SETFL, flags&~O_NONBLOCK) == -1)
> +    {
> +      return -1;
> +    }
> 
>    return sockfd;
>  }    

Did you try jwhois-4.0-14.fc11 from the testing repository? It should be fixed there...

Comment 17 Fedora Update System 2010-02-11 14:46:15 UTC
jwhois-4.0-19.fc12 has been pushed to the Fedora 12 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 18 Kai 2012-12-21 19:15:08 UTC
Created attachment 667375 [details]
strace whois google.com

Comment 19 Kai 2012-12-21 19:18:39 UTC
Comment on attachment 667375 [details]
strace whois google.com

I'm having the very same issue on CentOS 6.3 with jwhois 4.0-19.el6; strace output attached.