Bug 1885841 - pcsd leaves stalled connections in CLOSE_WAIT state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: pcs
Version: 8.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: rc
Target Release: 8.4
Assignee: Ondrej Mular
QA Contact: cluster-qe@redhat.com
Docs Contact: Steven J. Levine
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-10-07 07:03 UTC by Ondrej Mular
Modified: 2021-05-18 15:12 UTC
CC List: 12 users

Fixed In Version: pcs-0.10.7-1.el8
Doc Type: Bug Fix
Doc Text:
Clone Of: 1870551
Environment:
Last Closed: 2021-05-18 15:12:08 UTC
Type: Bug
Target Upstream Version:



Description Ondrej Mular 2020-10-07 07:03:04 UTC
+++ This bug was initially created as a clone of Bug #1870551 +++

Description of problem:

While pcsd is running, it leaves stalled connections in the CLOSE_WAIT state, which indicates that the other end of the connection has been closed while the local end is still waiting for the application to close its socket. This does not affect pcsd or cluster operation as such, but it can have external effects, e.g. flooding the system log with notifications about INVALID packets if iptables/firewalld is configured to log packets with ctstate INVALID. [There is a situation where one side sends a closure request (a FIN packet) and the remote side (the client) acks that FIN but does not send its own FIN, which would be expected for a proper connection closure, for a long time, keeping the socket on the client side in the CLOSE_WAIT state.
After some time the client finally sends its FIN, but because that took too long, the remote end has already closed the connection based on a timeout, so the last (late) packets are considered not legitimate and are therefore logged.]


#  netstat -laputen | grep 2224
tcp       32      0 10.20.30.40:37372     10.20.30.40:2224      CLOSE_WAIT  0          318652188  1781/ruby
tcp       32      0 10.20.30.40:31713     10.20.30.41:2224      CLOSE_WAIT  0          318659641  1781/ruby
tcp       32      0 10.20.30.40:30813     10.20.30.41:2224      CLOSE_WAIT  0          318546897  1781/ruby
tcp       32      0 10.20.30.40:36472     10.20.30.40:2224      CLOSE_WAIT  0          318542722  1781/ruby
tcp       32      0 10.20.30.40:29931     10.20.30.41:2224      CLOSE_WAIT  0          318441750  1781/ruby
tcp       32      0 10.20.30.40:35594     10.20.30.40:2224      CLOSE_WAIT  0          318438254  1781/ruby
tcp6       0      0 :::2224                 :::*                    LISTEN      0          23943      1781/ruby



Version-Release number of selected component (if applicable):
pcs-0.9.168-4.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. pcsd.service running
2. netstat -laputen | grep 2224
3. A few minutes after pcsd starts, the stalled connections begin to pile up

Actual results:
stalled connections in CLOSE_WAIT on port 2224

Expected results:
connections are closed properly

Additional info:


--- Additional comment from Ondrej Mular on 2020-10-07 06:58:05 UTC ---

After further investigation (together with Tomas, thank you) we've been able to understand the issue.

Pcsd uses libcurl for communication. Libcurl is designed to reuse open sockets, so there may be some TCP connections in the ESTABLISHED state even though no communication is happening at the moment. We assumed that those connections would eventually be closed by the GC mechanism, but it seems that is not the case.

To solve this issue, we have to explicitly close all connections right after we use them rather than rely on GC to do so.

Comment 3 Miroslav Lisik 2020-10-14 14:42:10 UTC
Test:

[root@r8-node-01 ~]# rpm -q pcs
pcs-0.10.7-1.el8.x86_64

Start pcsd and after a while check connections:

[root@r8-node-01 ~]# netstat -laputen | grep 2224
tcp        0      0 0.0.0.0:2224            0.0.0.0:*               LISTEN      0          23719      1109/platform-pytho
tcp6       0      0 :::2224                 :::*                    LISTEN      0          23720      1109/platform-pytho

Comment 6 Michal Mazourek 2020-11-02 08:31:32 UTC
BEFORE:
=======

[root@virt-041 ~]# rpm -q pcs
pcs-0.10.6-4.el8.x86_64


## after cluster setup, pcsd is running

[root@virt-041 ~]# systemctl status pcsd
● pcsd.service - PCS GUI and remote configuration interface
   Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-10-06 18:16:46 CEST; 3 weeks 5 days ago
     Docs: man:pcsd(8)
           man:pcs(8)
 Main PID: 48979 (pcsd)
    Tasks: 1 (limit: 12666)
   Memory: 39.7M
   CGroup: /system.slice/pcsd.service
           └─48979 /usr/libexec/platform-python -Es /usr/sbin/pcsd
{...}


## checking the connections after a while

[root@virt-041 ~]# netstat -laputen | grep 2224
tcp        0      0 0.0.0.0:2224            0.0.0.0:*               LISTEN      0          94462      48979/platform-pyth 
tcp6       0      0 :::2224                 :::*                    LISTEN      0          94463      48979/platform-pyth 
tcp6       0      0 2620:52:0:25a4:180:2224 2620:52:0:25a4:18:32924 ESTABLISHED 0          12421032   48979/platform-pyth 
tcp6       0      0 2620:52:0:25a4:18:44098 2620:52:0:25a4:180:2224 ESTABLISHED 0          12421030   48976/ruby          
tcp6       0      0 2620:52:0:25a4:180:2224 2620:52:0:25a4:18:52704 ESTABLISHED 0          12421033   48979/platform-pyth 
tcp6       0      0 2620:52:0:25a4:18:52704 2620:52:0:25a4:180:2224 ESTABLISHED 0          12421031   48976/ruby   

> several connections in ESTABLISHED state


## waiting another couple of minutes

[root@virt-041 ~]# netstat -laputen | grep 2224
tcp        0      0 0.0.0.0:2224            0.0.0.0:*               LISTEN      0          94462      48979/platform-pyth 
tcp6       0      0 :::2224                 :::*                    LISTEN      0          94463      48979/platform-pyth 
tcp6       0      0 2620:52:0:25a4:180:2224 2620:52:0:25a4:18:32924 ESTABLISHED 0          12421032   48979/platform-pyth 
tcp6       0      0 2620:52:0:25a4:18:44098 2620:52:0:25a4:180:2224 ESTABLISHED 0          12421030   48976/ruby          
tcp6       0      0 2620:52:0:25a4:180:2224 2620:52:0:25a4:18:52704 ESTABLISHED 0          12421033   48979/platform-pyth 
tcp6       0      0 2620:52:0:25a4:18:52778 2620:52:0:25a4:180:2224 ESTABLISHED 0          12424618   48976/ruby          
tcp6       0      0 2620:52:0:25a4:18:44176 2620:52:0:25a4:180:2224 ESTABLISHED 0          12424621   48976/ruby          
tcp6       0      0 2620:52:0:25a4:180:2224 2620:52:0:25a4:18:52778 ESTABLISHED 0          12424619   48979/platform-pyth 
tcp6       0      0 2620:52:0:25a4:180:2224 2620:52:0:25a4:18:33000 ESTABLISHED 0          12424608   48979/platform-pyth 
tcp6       0      0 2620:52:0:25a4:18:52704 2620:52:0:25a4:180:2224 ESTABLISHED 0          12421031   48976/ruby        

> connections are starting to pile up


## after longer time

[root@virt-041 ~]# netstat -laputen | grep 2224
tcp        0      0 0.0.0.0:2224            0.0.0.0:*               LISTEN      0          94462      48979/platform-pyth 
tcp6       0      0 :::2224                 :::*                    LISTEN      0          94463      48979/platform-pyth 
tcp6       0      0 2620:52:0:25a4:180:2224 2620:52:0:25a4:18:33072 ESTABLISHED 0          12427451   48979/platform-pyth 
tcp6       0      0 2620:52:0:25a4:180:2224 2620:52:0:25a4:18:32924 ESTABLISHED 0          12421032   48979/platform-pyth 
tcp6       0      0 2620:52:0:25a4:18:44098 2620:52:0:25a4:180:2224 ESTABLISHED 0          12421030   48976/ruby          
tcp6       0      0 2620:52:0:25a4:180:2224 2620:52:0:25a4:18:52704 ESTABLISHED 0          12421033   48979/platform-pyth 
tcp6       0      0 2620:52:0:25a4:18:52778 2620:52:0:25a4:180:2224 ESTABLISHED 0          12424618   48976/ruby          
tcp6       0      0 2620:52:0:25a4:180:2224 2620:52:0:25a4:18:52850 ESTABLISHED 0          12427450   48979/platform-pyth 
tcp6       0      0 2620:52:0:25a4:180:2224 2620:52:0:25a4:18:33150 ESTABLISHED 0          12430034   48979/platform-pyth 
tcp6       0      0 2620:52:0:25a4:18:44176 2620:52:0:25a4:180:2224 ESTABLISHED 0          12424621   48976/ruby          
tcp6       0      0 2620:52:0:25a4:180:2224 2620:52:0:25a4:18:52778 ESTABLISHED 0          12424619   48979/platform-pyth 
tcp6       0      0 2620:52:0:25a4:180:2224 2620:52:0:25a4:18:33000 ESTABLISHED 0          12424608   48979/platform-pyth 
tcp6       0      0 2620:52:0:25a4:18:52850 2620:52:0:25a4:180:2224 ESTABLISHED 0          12427449   48976/ruby          
tcp6       0      0 2620:52:0:25a4:18:44248 2620:52:0:25a4:180:2224 ESTABLISHED 0          12427453   48976/ruby          
tcp6       0      0 2620:52:0:25a4:18:52704 2620:52:0:25a4:180:2224 ESTABLISHED 0          12421031   48976/ruby          
tcp6       0      0 2620:52:0:25a4:180:2224 2620:52:0:25a4:18:52928 ESTABLISHED 0          12430033   48979/platform-pyth 
tcp6       0      0 2620:52:0:25a4:18:44322 2620:52:0:25a4:180:2224 ESTABLISHED 0          12430031   48976/ruby          
tcp6       0      0 2620:52:0:25a4:18:52928 2620:52:0:25a4:180:2224 ESTABLISHED 0          12430032   48976/ruby  

> connections keep piling up


AFTER:
======

[root@virt-134 ~]# rpm -q pcs
pcs-0.10.7-1.el8.x86_64


## after cluster setup, pcsd is running

[root@virt-134 ~]# systemctl status pcsd
● pcsd.service - PCS GUI and remote configuration interface
   Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-10-19 10:28:53 CEST; 1 weeks 6 days ago
     Docs: man:pcsd(8)
           man:pcs(8)
 Main PID: 49061 (pcsd)
    Tasks: 1 (limit: 12666)
   Memory: 39.8M
   CGroup: /system.slice/pcsd.service
           └─49061 /usr/libexec/platform-python -Es /usr/sbin/pcsd
{...}


## checking the connections after a while

[root@virt-134 ~]# netstat -laputen | grep 2224
tcp        0      0 0.0.0.0:2224            0.0.0.0:*               LISTEN      0          94465      49061/platform-pyth 
tcp6       0      0 :::2224                 :::*                    LISTEN      0          94466      49061/platform-pyth 

> no connections in ESTABLISHED state


## waiting another couple of minutes

[root@virt-134 ~]# netstat -laputen | grep 2224
tcp        0      0 0.0.0.0:2224            0.0.0.0:*               LISTEN      0          94465      49061/platform-pyth 
tcp6       0      0 :::2224                 :::*                    LISTEN      0          94466      49061/platform-pyth 

> OK: connections are not piling up


## for a brief moment, there were connections in TIME_WAIT state, but they were closed almost immediately

[root@virt-134 ~]# netstat -laputen | grep 2224
tcp        0      0 0.0.0.0:2224            0.0.0.0:*               LISTEN      0          94465      49061/platform-pyth 
tcp6       0      0 :::2224                 :::*                    LISTEN      0          94466      49061/platform-pyth 
tcp6       0      0 2620:52:0:25a4:180:2224 2620:52:0:25a4:18:52958 TIME_WAIT   0          0          -                   
tcp6       0      0 2620:52:0:25a4:18:41924 2620:52:0:25a4:180:2224 TIME_WAIT   0          0          -                   
 
[root@virt-134 ~]# netstat -laputen | grep 2224
tcp        0      0 0.0.0.0:2224            0.0.0.0:*               LISTEN      0          94465      49061/platform-pyth 
tcp6       0      0 :::2224                 :::*                    LISTEN      0          94466      49061/platform-pyth 


## after longer time

[root@virt-134 ~]# netstat -laputen | grep 2224
tcp        0      0 0.0.0.0:2224            0.0.0.0:*               LISTEN      0          94465      49061/platform-pyth 
tcp6       0      0 :::2224                 :::*                    LISTEN      0          94466      49061/platform-pyth 

> OK: connections are not piling up


Marking as VERIFIED for pcs-0.10.7-1.el8

Comment 8 errata-xmlrpc 2021-05-18 15:12:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pcs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:1737

