Bug 1085971

Summary: Arping utility hangs if SIGALRM blocked
Product: [Fedora] Fedora Reporter: Rui Prior <rprior>
Component: iputils Assignee: Jan Synacek <jsynacek>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 20 CC: averma00, jsynacek, rprior
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: iputils-20121221-6.fc20 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 144882 Environment:
Last Closed: 2014-04-23 04:31:34 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Test program none

Description Rui Prior 2014-04-09 18:49:28 UTC
Created attachment 884619
Test program

+++ This bug was initially created as a clone of Bug #144882 +++

From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Texas 
Instruments; .NET CLR 1.1.4322)

Description of problem:
If the "arping" utility is spawned (using the system() library function)
from a parent process in which the SIGALRM signal is blocked, then the
system() call never returns.

The problem is that the arping utility installs a SIGALRM handler but
never unblocks the signal, so its timeout logic does not work if it
inherits a blocked SIGALRM.

Because of this, under certain circumstances it is impossible to bring up an interface configured by dhcp with ifup (since the /sbin/dhclient-script invokes arping for duplicate address detection).

Version-Release number of selected component (if applicable):
iputils-20020927-11 (original)
iputils-20121221-5 (mine)

How reproducible:
Always

Steps to Reproduce:
1. Block the SIGALRM signal
2. Call the arping utility using the system() library function
3. Observe that the system() call never returns
    

Actual Results:  The process hangs forever

Expected Results:  The process should have exited

Additional info:

The original bug was closed by Atul Verma on 2005-05-12 ("WORKSFORME") because the original poster of the bug forgot one line in the testing program he submitted as an attachment.  Please test with my (corrected) testing program.

I have fixed the bug myself with the following simple patch:

--- iputils-s20121221-orig/arping.c	2013-02-01 08:28:29.836191171 +0100
+++ iputils-s20121221-new/arping.c	2013-02-01 08:28:11.013152725 +0100
@@ -1215,16 +1215,18 @@
 		socklen_t alen = sizeof(from);
 		int cc;
 
+		sigemptyset(&sset);
+		sigaddset(&sset, SIGALRM);
+		sigaddset(&sset, SIGINT);
+		sigprocmask(SIG_UNBLOCK, &sset, &osset);
+
 		if ((cc = recvfrom(s, packet, sizeof(packet), 0,
 				   (struct sockaddr *)&from, &alen)) < 0) {
 			perror("arping: recvfrom");
 			continue;
 		}
 
-		sigemptyset(&sset);
-		sigaddset(&sset, SIGALRM);
-		sigaddset(&sset, SIGINT);
-		sigprocmask(SIG_BLOCK, &sset, &osset);
+		sigprocmask(SIG_BLOCK, &sset, NULL);
 		recv_pack(packet, cc, (struct sockaddr_ll *)&from);
 		sigprocmask(SIG_SETMASK, &osset, NULL);
 	}

Comment 1 Jan Synacek 2014-04-10 08:01:43 UTC
I wonder what "certain circumstances" might be? Something doesn't feel right about this bug. Yes, it does block the process, but the example looks pretty artificial. It also feels like a "hey, I passed an invalid pointer to free() and it segfaulted" kind of problem.

Care to elaborate?

Comment 2 Rui Prior 2014-04-10 12:37:24 UTC
I found this problem in a lab (used for a Networking Lab course) with PCs running Fedora, where ifup was hanging on about 1/3 of them because of dhclient, so the problem definitely occurs in practice and is not some "I passed an invalid pointer to free() and it segfaulted" kind of problem.

After closer inspection, I found out that this was due to /sbin/dhclient-script invoking arping for duplicate address detection and arping not exiting (as it is supposed to do after a couple of seconds).  Running arping on the command line yielded similar results (it hangs on about 1/3 of the PCs in the lab).

Using strace, I found out that arping was calling alarm(1) before calling recvfrom(), in order to interrupt the latter after 1 s in case nothing is received (the normal case), but it had no effect -- recvfrom() was blocking ad aeternum.

I searched bugzilla for a similar report and found the bug I ended up cloning.  Admittedly, the example is quite artificial, but it does illustrate, in a 100% reproducible way, a problem that occurs in practice, as explained above.

I am still unsure of what is causing the system to block SIGALRM (especially since it does not affect all PCs, only about 1/3, all of them running a similar installation of Fedora 20), so further investigation will be required.  Anyway, I hacked a temporary fix by commenting out the duplicate address detection part in /sbin/dhclient-script, but the real fix needs to be done in arping.c.

Comment 3 Rui Prior 2014-04-10 12:43:49 UTC
By the way, as you can see, the patch is both trivial and harmless.  It also does solve a problem.  I could have contacted the author of arping or the maintainer of iputils directly, but figured that posting the bug report here along with the patch would be the fastest way to get it fixed in the Linux distribution I am using, and it would end up getting to them anyway.

Comment 4 Jan Synacek 2014-04-10 13:51:45 UTC
Thank you for the explanation. Would you mind changing your patch and putting a comment above the unblocking part of it explaining why it is unblocking stuff first? I will apply the patch, but I think it needs to be explained, since it's pretty surprising if you don't know the background. I'll also mention the bugzilla in the header.

Comment 5 Rui Prior 2014-04-10 15:04:56 UTC
Thank you very much for your interest, Jan.  Here is the patch with the explanation:

--- iputils-s20121221-orig/arping.c     2014-04-10 15:41:29.158243387 +0100
+++ iputils-s20121221-new/arping.c      2014-04-10 16:02:06.000000000 +0100
@@ -1215,16 +1215,22 @@
                socklen_t alen = sizeof(from);
                int cc;

+               sigemptyset(&sset);
+               sigaddset(&sset, SIGALRM);
+               sigaddset(&sset, SIGINT);
+               /* Unblock SIGALRM so that the previously called alarm()
+                * can prevent recvfrom from blocking forever in case the
+                * inherited procmask is blocking SIGALRM and no packet
+                * is received. */
+               sigprocmask(SIG_UNBLOCK, &sset, &osset);
+
                if ((cc = recvfrom(s, packet, sizeof(packet), 0,
                                   (struct sockaddr *)&from, &alen)) < 0) {
                        perror("arping: recvfrom");
                        continue;
                }

-               sigemptyset(&sset);
-               sigaddset(&sset, SIGALRM);
-               sigaddset(&sset, SIGINT);
-               sigprocmask(SIG_BLOCK, &sset, &osset);
+               sigprocmask(SIG_BLOCK, &sset, NULL);
                recv_pack(packet, cc, (struct sockaddr_ll *)&from);
                sigprocmask(SIG_SETMASK, &osset, NULL);
        }

Comment 7 Fedora Update System 2014-04-11 08:00:41 UTC
iputils-20121221-6.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/iputils-20121221-6.fc20

Comment 8 Fedora Update System 2014-04-15 15:33:23 UTC
Package iputils-20121221-6.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing iputils-20121221-6.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-5072/iputils-20121221-6.fc20
then log in and leave karma (feedback).

Comment 9 Fedora Update System 2014-04-23 04:31:34 UTC
iputils-20121221-6.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.