Bug 1174469

Summary: whois lookups hanging with high CPU usage
Product: Red Hat Enterprise Linux 6
Reporter: Orion Poplawski <orion>
Component: jwhois
Assignee: Vitezslav Crhonek <vcrhonek>
Status: CLOSED WONTFIX
QA Contact: BaseOS QE - Apps <qe-baseos-apps>
Severity: medium
Priority: medium
Version: 6.6
CC: dancy, davids, extras-qa, isenfeld, mbreuer, redhat-bugzilla, robert.scheck, roysjosh, russ+bugzilla-redhat, suren, vcrhonek
Target Milestone: rc
Keywords: Reopened, Triaged
Hardware: x86_64
OS: Linux
Clone Of: 469412
Last Closed: 2017-12-06 10:35:45 UTC
Bug Depends On: 469412

Description Orion Poplawski 2014-12-15 21:56:08 UTC
+++ This bug was initially created as a clone of Bug #469412 +++

Description of problem:
I am having intermittent problems with whois lookups hanging and using large amounts of CPU time when they do. This seems to affect lookups against different servers. It doesn't happen all of the time, but when it is happening, it affects a high proportion of the lookups I do. The problem is not reproducible at will but happens to me on a fairly regular basis, perhaps every few weeks for a period of a few days.

My machine makes several tens of whois lookups on IP addresses per day (I'm using Fail2ban with an action to complain to the ISPs of offending IPs - http://www.gloomytrousers.co.uk/open_source/fail2ban.shtml), so I'm guessing I may occasionally be falling foul of rate limiting on particular whois servers, but I would not expect whois to hang and consume excessive CPU in that case. My guess is that it is busy-waiting for a response, which is only noticeable when a response does not come quickly.

During periods when it's happening, I can reproduce it by running the same lookups manually, and I can also perform other, apparently unrelated, lookups, some of which also fail while others succeed, against domain names as well as IP addresses.

Here are some results from top showing hung lookups from yesterday and today (the IP addresses are sources of spam, SSH brute force attacks, and the like; the last is a manual lookup):
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                 
21068 root      20   0 89860  904  676 R 72.1  0.0 164:20.51 whois 125.167.105.130

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                 
  700 root      20   0 89860 1048  820 R 25.1  0.1  31:35.02 whois 211.214.161.93
26946 root      20   0 89860 1048  820 R 23.4  0.1 244:51.37 whois 122.167.13.13
30168 root      20   0 89860 1048  820 R 21.7  0.1 100:07.98 whois 222.208.183.218

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                           
 5268 root      20   0 89860 1040  796 R 80.6  0.1   1:50.47 whois gloomytrousers.co.uk
 
I can perform the same lookups on another host (running CentOS4 and jwhois-3.2.2-6.EL4.1.i386) with no problems, and this host has never experienced the same problem of hung lookups despite running fail2ban in an identical config.

Version-Release number of selected component (if applicable):
[root@detritus ~]# rpm -q jwhois
jwhois-4.0-4.fc8.x86_64
[root@detritus ~]# uname -a
Linux detritus.local 2.6.25.11-60.fc8 #1 SMP Mon Jul 21 01:40:51 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
Intermittent, periodically easy to reproduce, but long periods with no occurrence.

Steps to Reproduce:
1. (unable to determine the conditions which cause this)
  
Actual results:
whois lookups hang using high CPU

Expected results:
Lookup should either succeed, or fail and exit, and not use excessive CPU while running/waiting for a response.

Additional info:
I'd like to collect some additional info to help track this one down. Does anyone have suggestions for how to collect anything that might help?

I've tried setting "connect-timeout" in /etc/jwhois.conf but it made no difference.

--- Additional comment from Russell Odom on 2008-11-01 05:47:06 EDT ---

OK, it appears to have started behaving again - the above lookups are working once again.

By way of a test, I tried this...

[root@detritus ~]# host whois.nic.uk
whois.nic.uk has address 213.248.210.12
[root@detritus ~]# iptables -I OUTPUT --dst 213.248.210.12 -j DROP
[root@detritus ~]# whois gloomytrousers.co.uk
[Querying whois.nic.uk]
^C

...and although the lookup hung, as expected, CPU usage was 0.

--- Additional comment from Joshua Roys on 2009-04-29 11:57:56 EDT ---

... because it sits in a do/while loop calling read() on a non-blocking socket.

This appears to fix it for me.
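
The busy-wait described above can be reproduced in isolation. The sketch below is not jwhois's code: the helper names set_nonblocking and count_spinning_reads are hypothetical, and max_tries exists only so the demo terminates. On an empty non-blocking descriptor, every read() returns -1 with errno EAGAIN immediately, so a do/while loop that retries on EAGAIN never sleeps and keeps a CPU busy until data arrives.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put fd into non-blocking mode (the mode jwhois sets before connect()). */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Mimic the problematic loop: keep calling read() while it fails with
 * EAGAIN. max_tries is a hypothetical cap added only so this demo ends;
 * without it the loop spins until data arrives, never sleeping.
 * Returns the number of read() calls made. */
long count_spinning_reads(int fd, long max_tries)
{
    char buf[1024];
    long tries = 0;
    ssize_t n;

    do {
        n = read(fd, buf, sizeof buf);
        tries++;
    } while (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)
             && tries < max_tries);

    return tries;
}
```

Called on the read end of an empty pipe whose write end is still open, count_spinning_reads(fd, 1000) performs all 1000 calls without ever blocking, which is the pattern visible as repeated read() = -1 EAGAIN lines in an strace of a hung lookup.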

--- Additional comment from Joshua Roys on 2009-04-29 11:59:24 EDT ---

Because I can and hopefully it makes it easier for whoever may need to apply it.

--- Additional comment from Michael Breuer on 2010-01-08 02:32:52 EST ---

FWIW, I'm seeing this too, but only when IPv6 is involved.
From strace whois 74.220.121.126:
<normal stuff I suppose>
connect(3, {sa_family=AF_INET, sin_port=htons(43), sin_addr=inet_addr("199.212.0.43")}, 16) = 0
getsockname(3, {sa_family=AF_INET6, sin6_port=htons(40891), inet_pton(AF_INET6, "::ffff:68.192.13.200", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
close(3)                                = 0
socket(PF_INET6, SOCK_STREAM, IPPROTO_TCP) = 3
fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
connect(3, {sa_family=AF_INET6, sin6_port=htons(43), inet_pton(AF_INET6, "2001:500:4:1::81", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EINPROGRESS (Operation now in progress)
select(1024, NULL, [3], NULL, {75, 0})  = 1 (out [3], left {74, 979698})
getsockopt(3, SOL_SOCKET, SO_ERROR, [-4985675827644465152], [4]) = 0
write(3, "74.220.121.126\r\n", 16)      = 16
read(3, 0x7fffbacf4d30, 1023)           = -1 EAGAIN (Resource temporarily unavailable)
read(3, 0x7fffbacf4d30, 1023)           = -1 EAGAIN (Resource temporarily unavailable)
<continues forever>
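
The trace above follows the usual non-blocking connect sequence: set O_NONBLOCK, connect() returns EINPROGRESS, select() waits for writability, then getsockopt(SO_ERROR) fetches the deferred result. A minimal sketch of that sequence follows, assuming IPv4 only and a millisecond timeout; connect_ipv4_with_timeout is a hypothetical name, not a jwhois function.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect to ip:port, waiting at most timeout_ms.
 * Returns a connected socket that is STILL non-blocking, or -1 with
 * errno set (ETIMEDOUT if the handshake did not finish in time). */
int connect_ipv4_with_timeout(const char *ip, unsigned short port, int timeout_ms)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        goto fail;

    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) == -1) {
        if (errno != EINPROGRESS)
            goto fail;

        /* Sleep until the socket is writable, i.e. the handshake ended. */
        fd_set wfds;
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };
        int r = select(fd + 1, NULL, &wfds, NULL, &tv);
        if (r == 0) {
            errno = ETIMEDOUT;
            goto fail;
        }
        if (r == -1)
            goto fail;

        /* Writable does not mean success: read the deferred connect() result. */
        int err = 0;
        socklen_t len = sizeof err;
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
            goto fail;
        if (err != 0) {
            errno = err;
            goto fail;
        }
    }
    return fd;

fail:
    {
        int saved = errno;
        close(fd);
        errno = saved;
    }
    return -1;
}
```

Note that the descriptor comes back in non-blocking mode, which is exactly the state in which the trace's subsequent read() calls spin; a caller must either clear O_NONBLOCK or wait for readability before each read().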

--- Additional comment from Fedora Update System on 2010-01-26 10:03:56 EST ---

jwhois-4.0-19.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/jwhois-4.0-19.fc12

--- Additional comment from Fedora Update System on 2010-01-27 20:03:40 EST ---

jwhois-4.0-19.fc12 has been pushed to the Fedora 12 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update jwhois'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F12/FEDORA-2010-1172

--- Additional comment from David J. Schwartz on 2010-01-29 19:08:46 EST ---

Fedora 11 has this bug in its latest jwhois package. I believe the best fix is simply to turn off non-blocking mode after connecting. I suggest the following patch. (I will try to contact the jwhois maintainers too.)

--- jwhois-4.0/src/utils_old.c  2010-01-29 16:00:21.261869369 -0800
+++ jwhois-4.0/src/utils.c      2010-01-29 16:00:43.007869124 -0800
@@ -298,6 +298,11 @@ make_connect(const char *host, int port)
       break;
     }
 #endif
+  flags = fcntl(sockfd, F_GETFL, 0);
+  if (fcntl(sockfd, F_SETFL, flags&~O_NONBLOCK) == -1)
+    {
+      return -1;
+    }
 
   return sockfd;
 }
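
The approach in the patch above can also be written as a small standalone helper. This is a sketch, and make_blocking_or_close is a hypothetical name: unlike the quoted hunk, it checks the F_GETFL result and closes the descriptor on failure (the hunk appears to return -1 without closing sockfd).

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: after the non-blocking connect() has completed,
 * clear O_NONBLOCK so later read() calls sleep until data arrives
 * instead of returning EAGAIN. Returns fd, or -1 after closing fd. */
int make_blocking_or_close(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) == -1) {
        close(fd);   /* don't leak the descriptor on the error path */
        return -1;
    }
    return fd;
}
```

The trade-off of this design is that a blocking read() on a server that never answers waits indefinitely (with no CPU use, as in the iptables test earlier in this bug) unless a separate read timeout is applied.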

--- Additional comment from Vitezslav Crhonek on 2010-02-01 06:55:53 EST ---

(In reply to comment #15)
> Fedora 11 has this bug in its latest jwhois package. I believe the best fix is
> simply to turn off non-blocking mode after connecting. I suggest the following
> patch. (I will try to contact the jwhois maintainers too.)
> 
> --- jwhois-4.0/src/utils_old.c  2010-01-29 16:00:21.261869369 -0800
> +++ jwhois-4.0/src/utils.c      2010-01-29 16:00:43.007869124 -0800
> @@ -298,6 +298,11 @@ make_connect(const char *host, int port)
>        break;
>      }
>  #endif
> +  flags = fcntl(sockfd, F_GETFL, 0);
> +  if (fcntl(sockfd, F_SETFL, flags&~O_NONBLOCK) == -1)
> +    {
> +      return -1;
> +    }
> 
>    return sockfd;
>  }    

Did you try jwhois-4.0-14.fc11 from the testing repository? It should be fixed there...

--- Additional comment from Fedora Update System on 2010-02-11 09:46:15 EST ---

jwhois-4.0-19.fc12 has been pushed to the Fedora 12 stable repository.  If problems still persist, please make note of it in this bug report.

--- Additional comment from Kai on 2012-12-21 14:18:39 EST ---

I'm having the very same issue on CentOS 6.3 with jwhois 4.0-19.el6; strace output attached.

Comment 1 Orion Poplawski 2014-12-15 21:57:40 UTC
Looks like jwhois-4.0-19.el6.x86_64 is missing this fix, and possibly others. I'm seeing it effectively shut down fail2ban while it waits for whois to complete.

Comment 3 Vitezslav Crhonek 2015-01-19 15:09:26 UTC
(In reply to Orion Poplawski from comment #1)
> Looks like jwhois-4.0-19.el6.x86_64 is missing this fix, and possibly
> others.  I'm seeing it effectively shutdown fail2ban while it waits for
> whois to complete.

Yes, it's missing this patch (named 'jwhois-4.0-select.patch' in newer releases) - this one should suffice to fix the issue.
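
The contents of jwhois-4.0-select.patch are not shown in this bug; judging only by its name, it presumably waits with select() before reading. A minimal sketch of that technique, under that assumption and with a hypothetical helper name and a millisecond timeout, is:

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for fd to become readable,
 * then read once. Returns the byte count (0 means the server closed the
 * connection) or -1 with errno set; errno is ETIMEDOUT if nothing arrived. */
ssize_t read_with_timeout(int fd, char *buf, size_t len, int timeout_ms)
{
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };

    /* The process sleeps in select() until data arrives or the timeout
     * expires, instead of calling read() in a tight loop. */
    int r = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (r == -1)
        return -1;
    if (r == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    return read(fd, buf, len);
}
```

A client could call this repeatedly while collecting the response and stop on 0 or -1. Unlike clearing O_NONBLOCK, this bounds how long a silent server can stall the lookup, which matters when a caller such as fail2ban waits for whois to exit.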

Comment 5 Jan Kurik 2017-12-06 10:35:45 UTC
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/