Bug 1870551 - pcsd leaves stalled connections in CLOSE_WAIT state [rhel-7.9.z]
Summary: pcsd leaves stalled connections in CLOSE_WAIT state [rhel-7.9.z]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcs
Version: 7.9
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
: ---
Assignee: Ondrej Mular
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-08-20 10:36 UTC by Josef Zimek
Modified: 2024-03-25 16:20 UTC (History)
11 users (show)

Fixed In Version: pcs-0.9.167-3.el7_7.2
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1885841 (view as bug list)
Environment:
Last Closed: 2020-12-15 11:20:00 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 5647101 0 None None None 2020-12-14 16:04:14 UTC

Description Josef Zimek 2020-08-20 10:36:27 UTC
Description of problem:

While pcsd is running, it leaves stalled connections in the CLOSE_WAIT state, which indicates that the other end of the connection has been closed while the local end is still waiting for the application to close its socket. This does not affect pcsd or cluster operation as such, but it can have external effects, e.g. flooding the system log with notifications about INVALID packets if iptables/firewalld is configured to log packets with ctstate INVALID. [One side sends a closure request (FIN packet); the remote side (client) ACKs that FIN but does not send its own FIN (which would be expected for a proper connection closure) for a long time, keeping the socket on the client side in CLOSE_WAIT.
When the client finally sends its FIN, the remote end has already closed the connection due to a timeout, so the late packets are considered illegitimate and are therefore logged.]


#  netstat -laputen | grep 2224
tcp       32      0 10.20.30.40:37372     10.20.30.40:2224      CLOSE_WAIT  0          318652188  1781/ruby
tcp       32      0 10.20.30.40:31713     10.20.30.41:2224      CLOSE_WAIT  0          318659641  1781/ruby
tcp       32      0 10.20.30.40:30813     10.20.30.41:2224      CLOSE_WAIT  0          318546897  1781/ruby
tcp       32      0 10.20.30.40:36472     10.20.30.40:2224      CLOSE_WAIT  0          318542722  1781/ruby
tcp       32      0 10.20.30.40:29931     10.20.30.41:2224      CLOSE_WAIT  0          318441750  1781/ruby
tcp       32      0 10.20.30.40:35594     10.20.30.40:2224      CLOSE_WAIT  0          318438254  1781/ruby
tcp6       0      0 :::2224                 :::*                    LISTEN      0          23943      1781/ruby
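The mechanism behind these CLOSE_WAIT entries can be reproduced in miniature with Ruby's standard socket library (a hypothetical standalone sketch, not pcsd code): the peer closes its end and sends a FIN, the local side sees EOF but never calls close, and the kernel keeps the local socket in CLOSE_WAIT until the application finally closes it.

```ruby
require 'socket'

# Hypothetical sketch of the CLOSE_WAIT mechanism, not pcsd code.
server = TCPServer.new('127.0.0.1', 0)            # ephemeral port
client = TCPSocket.new('127.0.0.1', server.addr[1])
conn   = server.accept

conn.close                    # peer closes: FIN is sent, client ACKs it
puts client.read(1).inspect   # => nil: EOF seen on the client side
puts client.closed?           # => false: the application has not closed it,
                              # so the kernel parks the socket in CLOSE_WAIT
client.close                  # only now is the client's FIN sent
```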



Version-Release number of selected component (if applicable):
pcs-0.9.168-4.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. pcsd.service running
2. netstat -laputen | grep 2224
3. A few minutes after pcsd starts, the stalled connections begin to pile up

Actual results:
stalled connections in CLOSE_WAIT on port 2224

Expected results:
connections are closed properly

Additional info:

Comment 7 Ondrej Mular 2020-10-07 06:58:05 UTC
After further investigation (together with Tomas, thank you) we've been able to understand the issue.

Pcsd uses libcurl for communication. Libcurl is designed to reuse open sockets, so there may be some TCP connections in the ESTABLISHED state even though no communication is happening at the moment. We assumed that those connections would eventually be closed by the GC mechanism, but it seems that is not the case.

To solve this issue, we have to explicitly close all connections right after we use them rather than rely on GC to do so.
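The fix pattern described above can be sketched as follows (illustrative Ruby, not the actual pcsd/libcurl code): close the connection handle deterministically right after use, e.g. in an ensure block, instead of dropping the reference and waiting for garbage collection.

```ruby
require 'socket'

# Illustrative sketch, not pcsd code: close connections deterministically
# instead of relying on GC to finalize them eventually.
server = TCPServer.new('127.0.0.1', 0)

# Leaky pattern: drop the reference and hope GC finalizes the socket.
# TCPSocket.new('127.0.0.1', server.addr[1])   # may linger until GC runs

# Fixed pattern: close right after use, even if the request raises.
sock = TCPSocket.new('127.0.0.1', server.addr[1])
begin
  # ... perform the request over sock ...
ensure
  sock.close
end

puts sock.closed?  # => true
```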

Comment 9 Ondrej Mular 2020-10-07 07:56:43 UTC
Upstream fix: https://github.com/ClusterLabs/pcs/commit/14d0f7b20bf79fa01cc6f28611f716900e3b3233

After applying this patch on all nodes, we were unable to reproduce this issue anymore.

Comment 19 errata-xmlrpc 2020-12-15 11:20:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pcs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5449

