Bug 1416014 - rlWaitForSocket --close now waits for incorrect socket
Summary: rlWaitForSocket --close now waits for incorrect socket
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: beakerlib
Version: 30
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Jakub Heger
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1416018
Depends On: 1496120
Blocks: 1388422
 
Reported: 2017-01-24 11:10 UTC by Frantisek Sumsal
Modified: 2019-05-24 14:51 UTC
CC List: 7 users

Fixed In Version: beakerlib-1.18-3
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-24 14:51:17 UTC
Type: Bug
Embargoed:




Links
System            ID       Private  Priority  Status  Summary                                                           Last Updated
Red Hat Bugzilla  1388422  0        low       CLOSED  rlWaitForSocket --close should wait for socket to actually close  2021-02-22 00:41:40 UTC

Internal Links: 1388422

Description Frantisek Sumsal 2017-01-24 11:10:09 UTC
Description of problem:
A patch in the latest testing version of beakerlib (BZ#1388422) causes rlWaitForSocket --close to deadlock or introduce an unwanted delay, because it greps for the wrong socket.

The current implementation waits for a socket using this snippet:

local cmd="netstat -nla | grep -E '$grep_opt' >/dev/null"

Consider a real-world scenario: a client-server test runs (the server listens on port 4433 and the client connects to it), then the server process is simply killed and rlWaitForSocket --close 4433 is called. netstat then shows the following:

# netstat -nla
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State   
...
tcp6       0      0 ::1:56226               ::1:4433                TIME_WAIT

The problem is easy to spot: this is a *client* socket, which is irrelevant here, but rlWaitForSocket still waits for it to close because the grep matches it. In our case this takes around 50 seconds, so a phase that originally took about 3 seconds now takes about 53 seconds. This hugely increases the runtime of some of our tests (from ~30 minutes to (dozens of) hours).
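To illustrate the difference, here is a minimal sketch that runs two checks against the netstat line quoted above. The first is a naive whole-line match (a deliberate simplification for illustration, not beakerlib's actual `$grep_opt`, which is not shown here); the second looks only at the Local Address column, which is field 4 of `netstat -nla` output. The function names `naive_match` and `local_port_in_use` are hypothetical.

```shell
# Sample netstat -nla line from the scenario above: a client socket whose
# *foreign* address is ::1:4433, lingering in TIME_WAIT.
sample='tcp6       0      0 ::1:56226               ::1:4433                TIME_WAIT'

# Hypothetical, simplified check: the port appears anywhere in the line.
naive_match() {
    printf '%s\n' "$2" | grep -q ":$1"
}

# Hypothetical helper: succeed only if $1 is the *local* port, i.e. the text
# after the last colon of field 4; the foreign address (field 5) is ignored.
local_port_in_use() {
    printf '%s\n' "$2" | awk -v port="$1" '
        $1 ~ /^(tcp|udp)/ {
            addr = $4
            sub(/.*:/, "", addr)    # keep only what follows the last colon
            if (addr == port) found = 1
        }
        END { exit !found }'
}

naive_match 4433 "$sample" && echo "naive: matched"
local_port_in_use 4433 "$sample" || echo "local-only: not matched"
```

The naive check matches because `:4433` occurs in the foreign address, while the local-only check correctly reports that nothing local uses port 4433.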

There are a few possible solutions; essentially, the given socket must be compared with local sockets only, not with foreign ones. I would also split the checks for Unix sockets and for TCP/UDP sockets into two separate calls, to prevent other odd issues such as a socket name containing the checked port. I'm not sure how popular awk is among beakerlib developers, but it could be done along these lines:

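ss -natu state all | awk '
NR > 1 {
    # $5 is the "Local Address:Port" column; the port is whatever follows
    # the last colon (this also covers IPv6 addresses such as [::1]:4433)
    port = $5
    sub(/.*:/, "", port)
    print port
}'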

Something similar can be done for unix sockets as well:

ss -nax state all | awk '
BEGIN {
    FS=" "
}
{
    print $5;
}'
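Combining these pieces, a --close wait could poll the list of local ports until the port disappears or a time limit is reached. The following is a minimal sketch under stated assumptions: `list_local_ports` and `wait_port_closed` are hypothetical names (not beakerlib functions), `ss` is assumed to be installed, the extraction mirrors the TCP/UDP `ss` snippet above, and membership is tested with `grep -Fxq` (fixed-string, whole-line match) so that 22 never matches 2222.

```shell
# Hypothetical helper: print local TCP/UDP ports, one per line, taking the
# text after the last colon of field 5 (Local Address:Port) of ss output.
list_local_ports() {
    ss -natu state all | awk 'NR > 1 { p = $5; sub(/.*:/, "", p); print p }'
}

# Hypothetical: wait until $1 is no longer a local port, polling once per
# second for at most $2 seconds. Returns 0 once closed, 1 on timeout.
wait_port_closed() {
    port=$1 limit=$2 elapsed=0
    while list_local_ports | grep -Fxq -- "$port"; do
        if [ "$elapsed" -ge "$limit" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

For example, `wait_port_closed 4433 60 || echo 'port 4433 still open'` gives up after roughly a minute. Because only the local-address column is inspected, a client socket whose *foreign* port is 4433 (like the TIME_WAIT entry in the description) is ignored.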

The matching itself must be exact: the current solution with grep -E matches 22 even if port 2222 is open instead of 22.
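A minimal sketch of exact matching, assuming the ports (or Unix socket paths) have already been printed one entry per line by commands like the ones above. `grep -F` treats the pattern as a fixed string and `-x` requires it to match the whole line, so `22` does not match `2222` and regex metacharacters in socket paths are taken literally. The function name `socket_listed` and the sample data are hypothetical.

```shell
# Hypothetical helper: succeed only if $1 appears as a complete line on stdin.
# -F: fixed string (no regex), -x: whole-line match, -q: quiet.
socket_listed() {
    grep -Fxq -- "$1"
}

# Sample list in the one-entry-per-line form produced by the awk snippets.
ports='2222
8080'

printf '%s\n' "$ports" | socket_listed 22 || echo "22: not open"
printf '%s\n' "$ports" | socket_listed 2222 && echo "2222: open"

# The same applies to Unix socket paths: a prefix is not a match.
paths='/run/app.sock.old'
printf '%s\n' "$paths" | socket_listed /run/app.sock || echo "/run/app.sock: not open"
```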

Everything written here is just a rough idea of what should be checked and compared against what; any other ideas and improvements are more than welcome.

Version-Release number of selected component (if applicable):
beakerlib-1.12-1.fc25.noarch

Comment 1 Dalibor Pospíšil 2017-01-24 11:15:29 UTC
*** Bug 1416018 has been marked as a duplicate of this bug. ***

Comment 2 Dalibor Pospíšil 2017-01-24 13:39:20 UTC
(In reply to Frantisek Sumsal from comment #0)
> The matching itself must be exact - current solution with grep -E causes 22
> being matched even if a port 2222 is opened instead of 22.
Fortunately, this cannot happen, as the regexp is not that dumb. But it may happen in the case of Unix sockets.

Comment 3 Jan Kurik 2017-08-15 08:40:13 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 27 development cycle.
Changing version to '27'.

Comment 4 Dalibor Pospíšil 2019-02-07 13:39:29 UTC
I propose limiting monitoring to local ports and adding a --remote option to monitor a remote port instead. This would be a backward-incompatible change, but it is a more natural approach.
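The proposed semantics can be sketched as follows: by default, only the local port is compared; with --remote, the peer port is compared instead. This is a hypothetical illustration, not the committed beakerlib code. The function name `port_present` is made up, it reads `ss -natu`-style output on stdin (fields 5 and 6 being the local and peer Address:Port columns), and the sample text only approximates real `ss` formatting, which varies between iproute2 versions.

```shell
# Hypothetical sketch: compare the local port (field 5) by default, or the
# peer port (field 6) when called with --remote. Reads ss output on stdin.
port_present() {
    field=5
    if [ "$1" = "--remote" ]; then
        field=6
        shift
    fi
    awk -v port="$1" -v f="$field" '
        NR > 1 {                   # skip the ss header line
            addr = $f
            sub(/.*:/, "", addr)   # the port is whatever follows the last colon
            if (addr == port) found = 1
        }
        END { exit !found }'
}

# Illustrative sample: the client socket from the description above.
sample='Netid State     Recv-Q Send-Q Local Address:Port Peer Address:Port
tcp   TIME-WAIT 0      0      [::1]:56226        [::1]:4433'

printf '%s\n' "$sample" | port_present 4433 || echo "local 4433: absent"
printf '%s\n' "$sample" | port_present --remote 4433 && echo "remote 4433: present"
```

With this split, the default check no longer waits for client sockets that merely point at the monitored port, while --remote keeps the old behavior available when it is really wanted.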

Comment 5 Ben Cotton 2019-02-19 17:11:47 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 30 development cycle.
Changing version to '30'.

Comment 6 Jakub Heger 2019-02-22 14:27:29 UTC
Should be fixed by commit https://github.com/beakerlib/beakerlib/commit/4ac8297be3d6bca718269dab1d5a1c8f4ebfb257
Dalibor, could you please review and test it?

