Bug 1356667 - libsanlock does not handle EINTR, causing failures in client
Summary: libsanlock does not handle EINTR, causing failures in client
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: sanlock
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: David Teigland
QA Contact: Aharon Canan
URL:
Whiteboard:
Depends On:
Blocks: 1356676 1357883 1416336
 
Reported: 2016-07-14 15:59 UTC by Nir Soffer
Modified: 2021-09-03 12:06 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1357883 (view as bug list)
Environment:
Last Closed: 2016-09-29 13:58:52 UTC
Target Upstream Version:
Embargoed:


Attachments
sanlock test script (256 bytes, text/plain), 2016-07-14 19:45 UTC, Nir Soffer
sanlock test script (918 bytes, text/plain), 2016-07-17 12:52 UTC, Nir Soffer


Links
Red Hat Product Errata RHBA-2016:2356 (normal, SHIPPED_LIVE): sanlock bug fix and enhancement update, last updated 2016-11-03 13:48:46 UTC

Description Nir Soffer 2016-07-14 15:59:14 UTC
Description of problem:

If a signal is received while a libsanlock API call is blocking, the call
fails with EINTR.

Examples from vdsm (using libsanlock via the python bindings):

jsonrpc.Executor/3::ERROR::2016-06-08 00:03:48,803::task::868::Storage.TaskManager.Task::(_setError) Task=`1bb2fee0-9a56-49e4-890b-3c1b051683f5`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 875, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 606, in getSpmStatus
    status = self._getSpmStatusInfo(pool)
  File "/usr/share/vdsm/storage/hsm.py", line 600, in _getSpmStatusInfo
    (pool.spmRole,) + pool.getSpmStatus()))
  File "/usr/share/vdsm/storage/sp.py", line 114, in getSpmStatus
    return self._backend.getSpmStatus()
  File "/usr/share/vdsm/storage/spbackends.py", line 430, in getSpmStatus
    lVer, spmId = self.masterDomain.inquireClusterLock()
  File "/usr/share/vdsm/storage/sd.py", line 688, in inquireClusterLock
    return self._manifest.inquireDomainLock()
  File "/usr/share/vdsm/storage/sd.py", line 436, in inquireDomainLock
    return self._domainLock.inquire()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 327, in inquire
    resource = sanlock.read_resource(self._leasesPath, SDM_LEASE_OFFSET)
SanlockException: (4, 'Sanlock resource read failure', 'Interrupted system call')


jsonrpc.Executor/1::ERROR::2016-07-12 07:40:52,131::task::868::Storage.TaskManager.Task::(_setError) Task=`7cca3210-568f-4449-8890-34a3ee104952`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 875, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1161, in attachStorageDomain
    pool.attachSD(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 951, in attachSD
    dom.releaseHostId(self.id)
  File "/usr/share/vdsm/storage/sd.py", line 664, in releaseHostId
    self._manifest.releaseHostId(hostId, async)
  File "/usr/share/vdsm/storage/sd.py", line 403, in releaseHostId
    self._domainLock.releaseHostId(hostId, async, False)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 266, in releaseHostId
    raise se.ReleaseHostIdFailure(self._sdUUID, e)
ReleaseHostIdFailure: Cannot release host id: (u'3755b6ce-4e6a-405b-b529-c01e25bac03e', SanlockException(4, 'Sanlock lockspace remove failure', 'Interrupted system call'))

Vdsm receives a lot of signals, since it runs many short-lived child
processes. Each time a child process finishes, Vdsm receives a SIGCHLD
signal.
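The signal pressure is easy to demonstrate in isolation. The sketch below (illustrative, not vdsm code) counts SIGCHLD deliveries while a parent spawns short-lived children; any libsanlock call that happens to be blocking in send() or recv() when one of these signals lands can fail with EINTR:

```python
import signal
import subprocess

# Count SIGCHLD deliveries; a daemon like vdsm that constantly spawns
# short-lived children receives one for every child that exits.
received = {"sigchld": 0}

def on_sigchld(signo, frame):
    received["sigchld"] += 1

signal.signal(signal.SIGCHLD, on_sigchld)

for _ in range(3):
    subprocess.call(["true"])  # each exiting child delivers SIGCHLD

print("SIGCHLD signals received:", received["sigchld"])
```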

Version-Release number of selected component (if applicable):
Current sanlock version on 7.2
(I got these reports by mail; no info is available on the specific version)

How reproducible:
Rare.

Steps to Reproduce:
1. Any flow that includes sanlock operations (acquire, release, inquire)

Actual results:
Sanlock call fails, failing the operation in vdsm.

Expected results:
Sanlock should handle EINTR for the client.

Additional info:

This was fixed upstream in:

commit f520991e83d0a05d1670abba4561c3de86a09c5f
Author: David Teigland <teigland>
Date:   Wed Jun 8 12:06:22 2016 -0500

    libsanlock: ignore EINTR
    
    libsanlock calls were returning EINTR if there was a
    signal during send() or recv().  Now just restart the
    syscall on EINTR.
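The commit applies the standard EINTR-retry idiom: if the syscall was interrupted by a signal, restart it instead of surfacing the error to the caller. A minimal Python sketch of the same pattern (retry_on_eintr and flaky_recv are hypothetical names, not libsanlock API):

```python
import errno

def retry_on_eintr(call, *args):
    # Restart the call whenever it fails with EINTR, the same way the
    # fixed libsanlock restarts send()/recv() after a signal.
    while True:
        try:
            return call(*args)
        except OSError as e:
            if e.errno != errno.EINTR:
                raise

# Simulated syscall: interrupted twice by a signal, then succeeds.
attempts = {"n": 0}

def flaky_recv():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise OSError(errno.EINTR, "Interrupted system call")
    return b"data"

result = retry_on_eintr(flaky_recv)
```

With the pre-fix behavior, the first EINTR would instead propagate to the client, surfacing as the SanlockException(4, ..., 'Interrupted system call') seen in the tracebacks above.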

We would like a backport to 7.2.z if possible.

Comment 1 Allon Mureinik 2016-07-14 16:13:02 UTC
Bronce - can you assist with pushing this for 7.2.z?

Comment 2 David Teigland 2016-07-14 16:14:00 UTC
This fix is already in 7.3, but there was no previous 7.3 bz for it. This bz was created after the fact as a formality en route to 7.2.z.

Comment 4 Nir Soffer 2016-07-14 16:36:54 UTC
David, can you explain how this can be tested?

Comment 5 Nir Soffer 2016-07-14 16:38:31 UTC
Aharon, can you ack this?

Comment 7 David Teigland 2016-07-14 18:59:13 UTC
Here's how I verified the libsanlock EINTR patch:

1. fallocate -l 1048576 /tmp/foo

2. losetup -f /tmp/foo

3. sanlock direct init -s foo:0:/dev/loop0:0

4. python ./addls

addls code that Nir sent me:

import os
import signal
import threading
 
import sanlock
 
def handle(signo, frame):
    print "received signal: %d" % signo
 
def interrupt():
    os.kill(os.getpid(), signal.SIGUSR1)
 
signal.signal(signal.SIGUSR1, handle)
 
path = "/dev/loop0"
open(path, "w").close()
 
threading.Timer(0.5, interrupt).start()
 
sanlock.add_lockspace("foo", 1, path)


After addls completes successfully, it should print: "received signal: 10",
and 'sanlock status' should show the line: "s foo:1:/dev/loop0:0"
Run 'sanlock rem_lockspace -s foo:1:/dev/loop0:0' to clear it.

If addls returns quickly with "Interrupted system call", then the patch is missing.

Comment 8 Nir Soffer 2016-07-14 19:45:56 UTC
Created attachment 1179943 [details]
sanlock test script

This is a test script for testing the fix.

With sanlock version from git (commit cf37f526de8b74ae41695f1d5f68b208ddac3a89),
the output should be:

$ sudo python sanlock-eintr.py 
Creating lockspace file...
Initializing lockspace...
Adding lockspace...
(received signal: 10)
Checking lockspace...
[{'flags': 0,
  'host_id': 1,
  'lockspace': 'test',
  'offset': 0,
  'path': '/var/tmp/lockspace'}]
Removing lockspace...
(received signal: 10)
OK

With sanlock version before this fix, you will get the
exception seen in the bug description.

Comment 10 Nir Soffer 2016-07-17 12:50:17 UTC
Comment on attachment 1179943 [details]
sanlock test script

Wrong script

Comment 11 Nir Soffer 2016-07-17 12:52:57 UTC
Created attachment 1180756 [details]
sanlock test script

This is the correct test script, see comment 8 for instructions.

Comment 18 Aharon Canan 2016-09-29 13:58:52 UTC
We (RHEV storage QE) executed our automation using the sanlock packages below and did not hit any issues.

[root@lynx09 ~]# rpm -qa |grep sanlock
sanlock-lib-3.4.0-1.el7.x86_64
sanlock-3.4.0-1.el7.x86_64
libvirt-lock-sanlock-2.0.0-10.el7.x86_64
sanlock-python-3.4.0-1.el7.x86_64


Closing.

