Description of problem:
Since a patch in the latest testing version of beakerlib (BZ#1388422), rlWaitForSocket --close can deadlock or introduce an unwanted delay, because its grep matches the wrong socket.
Waiting for a socket is currently implemented by this snippet:
local cmd="netstat -nla | grep -E '$grep_opt' >/dev/null"
Consider a real-world scenario: a client-server test runs (the server listens on port 4433 and the client connects to it), then the server process is simply killed and rlWaitForSocket --close 4433 is called. At that point we get the following netstat output:
# netstat -nla
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp6       0      0 ::1:56226               ::1:4433                TIME_WAIT
Now the problem is easy to spot: this socket is a *client* socket, which is irrelevant for us here, yet rlWaitForSocket waits for it to close because the grep matches it. In our case that usually takes around 50 seconds, so a phase that originally took around 3 seconds now takes around 53 seconds. This makes a huge difference in the runtime of some of our tests (from ~30 minutes to (dozens of) hours).
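To illustrate the mismatch, here is a minimal sketch using the TIME_WAIT line from the netstat output above. The grep pattern is a simplified stand-in and not beakerlib's actual $grep_opt; the variable names are made up for the example. It assumes the netstat column order shown above, where column 4 is the local address.

```shell
# The TIME_WAIT line from the report (a client-side socket).
line='tcp6 0 0 ::1:56226 ::1:4433 TIME_WAIT'

# Simplified stand-in for a whole-line grep check (NOT beakerlib's real
# pattern): it finds ":4433" anywhere on the line, including the foreign
# address column.
if printf '%s\n' "$line" | grep -E ':4433( |$)' >/dev/null; then
    whole_line_match=yes
else
    whole_line_match=no
fi

# Looking only at column 4 (Local Address) and comparing the part after the
# last colon does not treat this client socket as port 4433.
local_match=$(printf '%s\n' "$line" |
    awk '{ n = split($4, p, ":"); print ((p[n] == "4433") ? "yes" : "no") }')

echo "whole line: $whole_line_match, local column only: $local_match"
```

With this input the whole-line check reports a match while the local-column check does not, which is the difference between waiting for the client socket and ignoring it.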
There are a few possible solutions; basically, the given socket just needs to be compared with local sockets only, not with foreign ones. I would also split the checking of unix sockets and TCP/UDP sockets into two separate calls, to prevent other odd issues, such as a socket name containing the checked port. I'm not sure how popular awk is among beakerlib developers, but it could be done with something along these lines:
ss -natu state all | awk '
match($5, "^.*?:(.+)$", a);
Something similar can be done for unix sockets as well:
ss -nax state all | awk '
The matching itself must be exact - the current solution with grep -E matches 22 even when port 2222 is open instead of 22.
Everything written here is just a general idea of what should be compared with what - any other ideas and improvements are more than welcome.
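One way to get an exact comparison on the local port only is sketched below. This is an illustration under assumptions, not the code that went into beakerlib: the function name local_port_open is hypothetical, and it assumes the local address:port sits in column 5, as in the `ss -natu state all` call proposed above. It reads ss-style output on stdin, so it can be tried on canned input.

```shell
# Sketch: succeed if PORT ($1) appears as a *local* port in ss-style output
# read from stdin. Assumes column 5 holds the local "address:port".
# The function name is hypothetical.
local_port_open() {
    awk -v want="$1" '
        {
            # The port is whatever follows the last colon of the local
            # address; this also covers IPv6 forms such as [::1]:4433.
            n = split($5, parts, ":")
            # Force a string comparison so that 22 never equals 2222.
            if (n > 1 && (parts[n] "") == (want "")) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Possible usage: poll until no local socket on the port remains.
# while ss -natu state all | local_port_open 4433; do sleep 1; done
```

Because only the last colon-separated component of the local column is compared, a peer address on the same port or a longer port number that merely contains the digits is not reported.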
Version-Release number of selected component (if applicable):
*** Bug 1416018 has been marked as a duplicate of this bug. ***
(In reply to Frantisek Sumsal from comment #0)
> The matching itself must be exact - current solution with grep -E causes 22
> being matched even if a port 2222 is opened instead of 22.
Fortunately, this cannot happen, as the regexp is not that dumb. But it may happen in the case of a unix socket.
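For the unix socket case, an exact comparison of the whole path avoids partial matches. The following is only a sketch: the function name unix_socket_open is hypothetical, and it assumes that column 5 of `ss -nax state all` output holds the socket path and that the path contains no whitespace.

```shell
# Sketch: succeed if the unix socket path ($1) appears exactly in column 5 of
# ss-style output read from stdin. Column position and the function name are
# assumptions made for this example.
unix_socket_open() {
    awk -v want="$1" '
        $5 == want { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Possible usage:
# ss -nax state all | unix_socket_open /run/example.sock
```

Comparing the whole field means that /tmp/test.sock does not match a socket named /tmp/test.sock2.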
This bug appears to have been reported against 'rawhide' during the Fedora 27 development cycle.
Changing version to '27'.
I propose limiting monitoring to local ports and adding --remote to request monitoring of a remote port. This would be a backward-incompatible change, but it is the more natural approach.
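The proposed split between local and remote monitoring could look roughly like the sketch below. It only illustrates the idea and is not the actual rlWaitForSocket implementation: the function name port_on_side and its local/remote argument are hypothetical, and the column numbers assume `ss -natu state all` output (5 = local address, 6 = peer address).

```shell
# Sketch: check a port on either the local or the remote side of ss-style
# output read from stdin. Names and column numbers are assumptions.
port_on_side() {
    side=$1
    port=$2
    case $side in
        local)  col=5 ;;   # local address:port column
        remote) col=6 ;;   # peer address:port column
        *)      return 2 ;;
    esac
    awk -v col="$col" -v want="$port" '
        {
            # Compare only the part after the last colon, as a string.
            n = split($col, p, ":")
            if (n > 1 && (p[n] "") == (want "")) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Possible usage:
# ss -natu state all | port_on_side local 4433
# ss -natu state all | port_on_side remote 4433
```

With local as the default, the client socket from the original report would no longer hold up a wait for port 4433, while callers that really want the peer side could opt in explicitly.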
This bug appears to have been reported against 'rawhide' during the Fedora 30 development cycle.
Changing version to '30'.
Should be fixed by commit https://github.com/beakerlib/beakerlib/commit/4ac8297be3d6bca718269dab1d5a1c8f4ebfb257
Dalibor, could you please review and test it?