Note: This bug is displayed in read-only format because
the product is no longer active in Red Hat Bugzilla.
Description of problem:
If a signal is received while a libsanlock API call is blocking, the call
fails with EINTR.
Examples from vdsm (using libsanlock via the Python bindings):
jsonrpc.Executor/3::ERROR::2016-06-08 00:03:48,803::task::868::Storage.TaskManager.Task::(_setError) Task=`1bb2fee0-9a56-49e4-890b-3c1b051683f5`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 875, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 606, in getSpmStatus
    status = self._getSpmStatusInfo(pool)
  File "/usr/share/vdsm/storage/hsm.py", line 600, in _getSpmStatusInfo
    (pool.spmRole,) + pool.getSpmStatus()))
  File "/usr/share/vdsm/storage/sp.py", line 114, in getSpmStatus
    return self._backend.getSpmStatus()
  File "/usr/share/vdsm/storage/spbackends.py", line 430, in getSpmStatus
    lVer, spmId = self.masterDomain.inquireClusterLock()
  File "/usr/share/vdsm/storage/sd.py", line 688, in inquireClusterLock
    return self._manifest.inquireDomainLock()
  File "/usr/share/vdsm/storage/sd.py", line 436, in inquireDomainLock
    return self._domainLock.inquire()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 327, in inquire
    resource = sanlock.read_resource(self._leasesPath, SDM_LEASE_OFFSET)
SanlockException: (4, 'Sanlock resource read failure', 'Interrupted system call')
jsonrpc.Executor/1::ERROR::2016-07-12 07:40:52,131::task::868::Storage.TaskManager.Task::(_setError) Task=`7cca3210-568f-4449-8890-34a3ee104952`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 875, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1161, in attachStorageDomain
    pool.attachSD(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 951, in attachSD
    dom.releaseHostId(self.id)
  File "/usr/share/vdsm/storage/sd.py", line 664, in releaseHostId
    self._manifest.releaseHostId(hostId, async)
  File "/usr/share/vdsm/storage/sd.py", line 403, in releaseHostId
    self._domainLock.releaseHostId(hostId, async, False)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 266, in releaseHostId
    raise se.ReleaseHostIdFailure(self._sdUUID, e)
ReleaseHostIdFailure: Cannot release host id: (u'3755b6ce-4e6a-405b-b529-c01e25bac03e', SanlockException(4, 'Sanlock lockspace remove failure', 'Interrupted system call'))
Vdsm receives many signals, since it runs many short-lived child
processes. Each time a child process finishes, Vdsm receives a SIGCHLD
signal.
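The SIGCHLD behavior described above can be demonstrated with a short
sketch (the handler and the child command here are illustrative, not
vdsm code):

```python
import signal
import subprocess

received = []

def on_sigchld(signo, frame):
    # Record that the parent was interrupted by a child exiting.
    received.append(signo)

# Install a SIGCHLD handler, as a long-running daemon that manages
# child processes effectively has.
signal.signal(signal.SIGCHLD, on_sigchld)

# Run a short-lived child; its exit delivers SIGCHLD to this process,
# which can interrupt whatever blocking call the parent is in.
subprocess.call(["true"])
```

Any blocking syscall in flight when such a signal arrives can fail with
EINTR unless the library restarts it.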
Version-Release number of selected component (if applicable):
Current sanlock version on 7.2
(I got these reports by mail, no info available on specific version)
How reproducible:
Rare.
Steps to Reproduce:
1. Any flow that includes sanlock operations (acquire, release, inquire)
Actual results:
Sanlock call fails, failing the operation in vdsm.
Expected results:
Sanlock should handle EINTR for the client.
Additional info:
This was fixed upstream in:
commit f520991e83d0a05d1670abba4561c3de86a09c5f
Author: David Teigland <teigland>
Date: Wed Jun 8 12:06:22 2016 -0500
libsanlock: ignore EINTR
libsanlock calls were returning EINTR if there was a
signal during send() or recv(). Now just restart the
syscall on EINTR.
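The pattern the commit describes can be sketched in Python (the actual
fix wraps send() and recv() in libsanlock's C code; retry_eintr is a
hypothetical helper, not part of the sanlock bindings):

```python
import errno

def retry_eintr(call, *args):
    # Hypothetical helper illustrating the upstream approach: if the
    # blocking call is interrupted by a signal, restart it instead of
    # surfacing EINTR to the caller.
    while True:
        try:
            return call(*args)
        except OSError as e:
            if e.errno != errno.EINTR:
                raise
            # EINTR: a signal arrived mid-call; simply retry.
```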
We would like a backport to 7.2.z if possible.
Here's how I verified the libsanlock EINTR patch:
1. fallocate -l 1048576 /tmp/foo
2. losetup -f /tmp/foo
3. sanlock direct init -s foo:0:/dev/loop0:0
4. python ./addls
addls code that Nir sent me:

import os
import signal
import threading

import sanlock

def handle(signo, frame):
    print "received signal: %d" % signo

def interrupt():
    os.kill(os.getpid(), signal.SIGUSR1)

signal.signal(signal.SIGUSR1, handle)

path = "/dev/loop0"
open(path, "w").close()

threading.Timer(0.5, interrupt).start()
sanlock.add_lockspace("foo", 1, path)
After addls completes successfully, it should print: "received signal: 10",
and 'sanlock status' should show the line: "s foo:1:/dev/loop0:0"
Run 'sanlock rem_lockspace -s foo:1:/dev/loop0:0' to clear it.
If addls returns quickly with "Interrupted system call", then the patch is missing.
Created attachment 1179943
sanlock test script
This is a test script for testing the fix.
With sanlock version from git (commit cf37f526de8b74ae41695f1d5f68b208ddac3a89),
the output should be:
$ sudo python sanlock-eintr.py
Creating lockspace file...
Initializing lockspace...
Adding lockspace...
(received signal: 10)
Checking lockspace...
[{'flags': 0,
'host_id': 1,
'lockspace': 'test',
'offset': 0,
'path': '/var/tmp/lockspace'}]
Removing lockspace...
(received signal: 10)
OK
With sanlock version before this fix, you will get the
exception seen in the bug description.