Bug 1812185 - fix client ECONNRESET caused by closing wrong fd
Summary: fix client ECONNRESET caused by closing wrong fd
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: sanlock
Version: 8.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.0
Assignee: David Teigland
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1664159 1821042
 
Reported: 2020-03-10 17:54 UTC by David Teigland
Modified: 2021-09-07 12:04 UTC
CC List: 5 users

Fixed In Version: sanlock-3.8.1-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-04 02:14:39 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments: none


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2020:4595  0        None      None    None     2020-11-04 02:14:49 UTC

Description David Teigland 2020-03-10 17:54:07 UTC
Description of problem:

This sanlock bug was found and fixed in RHV bug 1664159.

commit 42f7f8f2d924eb8abe52b1c118ee89871d9112f1
Author: David Teigland <teigland>
Date:   Fri Mar 6 16:03:01 2020 -0600

    sanlock: fix closing wrong client fd
    
    The symptoms of this bug were inq_lockspace returning
    ECONNRESET.  It was caused by a previous client closing
    the fd of a newer client doing inq_lockspace (when both
    clients were running at roughly the same time.)
    
    First client ci1, second client ci2.
    
    ci1 in call_cmd_daemon() is finished, and close(fd)
    is called (and client[ci].fd is *not* set to -1).
    
    ci2 is a new client at about the same time and gets the
    same fd that had been used by ci1.
    
    ci1 being finished triggers a poll error, which results
    in client_free(ci1).  client_free looks at client[ci1].fd
    and finds it is not -1, so it calls close() on it, but
    this fd is now being used by ci2.  This breaks the sanlock
    daemon connection for ci2 and the client gets ECONNRESET.
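
For illustration only, here is a minimal C sketch of the double-close race the commit message describes, together with the kind of guard that prevents it. The names client[], call_cmd_daemon() and client_free() are taken from the commit message; the mutex, the array size and the function bodies are assumptions for this sketch and do not reproduce the actual sanlock source.

/* Illustrative sketch of the fd-reuse race; not the actual sanlock code. */
#include <pthread.h>
#include <unistd.h>

#define CLIENT_NR 32

struct client {
	int fd;                 /* -1 means this slot has no open connection */
	pthread_mutex_t mutex;
};

static struct client client[CLIENT_NR];

void clients_init(void)
{
	for (int i = 0; i < CLIENT_NR; i++) {
		client[i].fd = -1;
		pthread_mutex_init(&client[i].mutex, NULL);
	}
}

/* Buggy pattern: the fd is closed but the slot still records its number,
 * so the kernel can hand the same number to a new client (ci2), and a
 * later client_free(ci1) closes ci2's connection, producing ECONNRESET. */
void call_cmd_daemon_buggy(int ci)
{
	close(client[ci].fd);
	/* client[ci].fd still holds the stale number */
}

/* Fixed pattern: close the fd and clear the slot in one step, under the
 * slot's mutex, so client_free() later sees -1 and leaves the re-used
 * descriptor alone. */
void call_cmd_daemon_fixed(int ci)
{
	pthread_mutex_lock(&client[ci].mutex);
	if (client[ci].fd != -1) {
		close(client[ci].fd);
		client[ci].fd = -1;
	}
	pthread_mutex_unlock(&client[ci].mutex);
}

/* Called on a poll error for ci; with the fixed pattern above this is a
 * no-op for a client whose fd was already closed. */
void client_free(int ci)
{
	pthread_mutex_lock(&client[ci].mutex);
	if (client[ci].fd != -1) {
		close(client[ci].fd);
		client[ci].fd = -1;
	}
	pthread_mutex_unlock(&client[ci].mutex);
}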



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Nir Soffer 2020-05-01 22:37:29 UTC
David, this should be in POST or MODIFIED, no?

Comment 5 Corey Marthaler 2020-09-15 15:05:22 UTC
Fix verified in the latest rpms. I ran the mentioned scenario in a loop and never saw the ECONNRESET error.


sanlock-3.8.2-1.el8    BUILT: Mon Aug 10 12:12:49 CDT 2020
sanlock-lib-3.8.2-1.el8    BUILT: Mon Aug 10 12:12:49 CDT 2020

kernel-4.18.0-234.el8    BUILT: Thu Aug 20 12:01:26 CDT 2020
lvm2-2.03.09-5.el8    BUILT: Wed Aug 12 15:51:50 CDT 2020
lvm2-libs-2.03.09-5.el8    BUILT: Wed Aug 12 15:51:50 CDT 2020


[root@hayes-01 ~]# systemctl status sanlock
● sanlock.service - Shared Storage Lease Manager
   Loaded: loaded (/usr/lib/systemd/system/sanlock.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-09-14 13:41:40 CDT; 2h 52min ago
  Process: 72782 ExecStart=/usr/sbin/sanlock daemon (code=exited, status=0/SUCCESS)
 Main PID: 72783 (sanlock)
    Tasks: 6 (limit: 1647453)
   Memory: 21.5M
   CGroup: /system.slice/sanlock.service
           ├─72783 /usr/sbin/sanlock daemon
           └─72784 /usr/sbin/sanlock daemon

Sep 14 13:41:40 hayes-01.lab.msp.redhat.com systemd[1]: Starting Shared Storage Lease Manager...
Sep 14 13:41:40 hayes-01.lab.msp.redhat.com systemd[1]: Started Shared Storage Lease Manager.

[root@hayes-01 ~]# lvs -a -o +devices
  LV    VG    Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices     
  lock0 sanlk -wi-a----- 4.00m                                                     /dev/sdd1(2)
  lock1 sanlk -wi-a----- 4.00m                                                     /dev/sdd1(3)
  lock2 sanlk -wi-a----- 4.00m                                                     /dev/sdd1(4)

[root@hayes-01 ~]# sanlock client init -s LS0:0:/dev/sanlk/lock0:0 -o 2
init
init done 0
[root@hayes-01 ~]# sanlock client init -s LS1:0:/dev/sanlk/lock1:0 -o 2
init
init done 0
[root@hayes-01 ~]# sanlock client init -s LS2:0:/dev/sanlk/lock2:0 -o 2
init
init done 0

[root@hayes-01 ~]# sanlock client add_lockspace -s LS0:1:/dev/sanlk/lock0:0 -o 2
add_lockspace_timeout 2
add_lockspace_timeout done 0
[root@hayes-01 ~]# sanlock client add_lockspace -s LS1:1:/dev/sanlk/lock1:0 -o 2
add_lockspace_timeout 2
add_lockspace_timeout done 0
[root@hayes-01 ~]# sanlock client add_lockspace -s LS2:1:/dev/sanlk/lock2:0 -o 2
add_lockspace_timeout 2
add_lockspace_timeout done 0

[root@hayes-01 ~]# sanlock status
daemon de5af774-c1b3-4202-b94e-5d0bfa9250cb.hayes-01.la
p -1 helper
p -1 listener
p -1 status
s LS2:1:/dev/sanlk/lock2:0
s LS1:1:/dev/sanlk/lock1:0
s LS0:1:/dev/sanlk/lock0:0

[root@hayes-01 ~]# sanlock status > /dev/null & sanlock client inq_lockspace -s LS2:1:/dev/sanlk/lock2:0 >>inq & sanlock client inq_lockspace -s LS1:1:/dev/sanlk/lock1:0 >> inq & sanlock client inq_lockspace -s LS0:1:/dev/sanlk/lock0:0 >>inq & sanlock status > /dev/null
[1] 89910
[2] 89911
[3] 89912
[4] 89913
[1]   Done                    sanlock status > /dev/null
[2]   Done                    sanlock client inq_lockspace -s LS2:1:/dev/sanlk/lock2:0 >> inq
[3]-  Done                    sanlock client inq_lockspace -s LS1:1:/dev/sanlk/lock1:0 >> inq
[4]+  Done                    sanlock client inq_lockspace -s LS0:1:/dev/sanlk/lock0:0 >> inq


[root@hayes-01 ~]# cat inq
inq_lockspace
inq_lockspace done -2
inq_lockspace
inq_lockspace done -2
inq_lockspace
inq_lockspace done -2
inq_lockspace
inq_lockspace done 0
inq_lockspace
inq_lockspace done 0
inq_lockspace
inq_lockspace done 0

Comment 8 errata-xmlrpc 2020-11-04 02:14:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (sanlock bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4595

