Bug 1356676

Summary: consume fix for "Bug 1356667: libsanlock does not handle EINTR, causing failures in client" to be released 2016-Sep-13
Product: [oVirt] vdsm Reporter: Nir Soffer <nsoffer>
Component: Core Assignee: Allon Mureinik <amureini>
Status: CLOSED CURRENTRELEASE QA Contact: Elad <ebenahar>
Severity: high Docs Contact:
Priority: high    
Version: 4.18.0 CC: amureini, bugs, mkalinin, nsoffer, ratamir, tnisan, ylavi
Target Milestone: ovirt-4.1.0-beta Flags: rule-engine: ovirt-4.1+
rule-engine: exception+
rule-engine: planning_ack+
amureini: devel_ack+
acanan: testing_ack+
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1416336 Environment:
Last Closed: 2017-02-15 14:50:22 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1356667    
Bug Blocks: 1416336    

Description Nir Soffer 2016-07-14 16:29:00 UTC
Description of problem:

The sanlock version that fixes this issue eliminates rare failures in vdsm that
occur when vdsm receives a signal during a sanlock operation.

When this happens, the vdsm operation fails with this error:

jsonrpc.Executor/3::ERROR::2016-06-08 00:03:48,803::task::868::Storage.TaskManager.Task::(_setError) Task=`1bb2fee0-9a56-49e4-890b-3c1b051683f5`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 875, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 606, in getSpmStatus
    status = self._getSpmStatusInfo(pool)
  File "/usr/share/vdsm/storage/hsm.py", line 600, in _getSpmStatusInfo
    (pool.spmRole,) + pool.getSpmStatus()))
  File "/usr/share/vdsm/storage/sp.py", line 114, in getSpmStatus
    return self._backend.getSpmStatus()
  File "/usr/share/vdsm/storage/spbackends.py", line 430, in getSpmStatus
    lVer, spmId = self.masterDomain.inquireClusterLock()
  File "/usr/share/vdsm/storage/sd.py", line 688, in inquireClusterLock
    return self._manifest.inquireDomainLock()
  File "/usr/share/vdsm/storage/sd.py", line 436, in inquireDomainLock
    return self._domainLock.inquire()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 327, in inquire
    resource = sanlock.read_resource(self._leasesPath, SDM_LEASE_OFFSET)
SanlockException: (4, 'Sanlock resource read failure', 'Interrupted system call')
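The first element of the exception, 4, is errno EINTR ("Interrupted system call"): a signal arrived while the call was blocked, and the call returned early instead of being resumed. The actual fix lives in libsanlock's client code (bug 1356667), and vdsm only needs to require the fixed version. Purely to illustrate the general retry-on-EINTR pattern, here is a minimal Python sketch of what a caller-side workaround could look like. It assumes the exception carries the errno as its first argument, matching the shape in the traceback above; FakeSanlockError, flaky_read and the lease path are hypothetical stand-ins, not sanlock or vdsm code.

```python
import errno


class FakeSanlockError(Exception):
    """Hypothetical stand-in for sanlock.SanlockException (args[0] is the errno)."""


def retry_on_eintr(func, *args, **kwargs):
    """Call func(*args, **kwargs), retrying while it fails with EINTR.

    Assumption: the exception's first argument is the errno, as in the
    traceback: (4, 'Sanlock resource read failure', 'Interrupted system call').
    """
    while True:
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            if exc.args and exc.args[0] == errno.EINTR:
                continue  # a signal interrupted the call; simply try again
            raise  # any other failure is a real error and is propagated


# Usage: a hypothetical read that is interrupted twice before succeeding.
calls = {"n": 0}


def flaky_read(path, offset):
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeSanlockError(errno.EINTR, "Sanlock resource read failure",
                               "Interrupted system call")
    return {"path": path, "offset": offset}


result = retry_on_eintr(flaky_read, "/tmp/hypothetical-leases", 0)
print(calls["n"], result)  # prints: 3 {'path': '/tmp/hypothetical-leases', 'offset': 0}
```

Retrying only on EINTR, and re-raising everything else, keeps genuine I/O or lease errors visible. With a sanlock build that handles EINTR internally, callers should not need this wrapper at all.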

Version-Release number of selected component (if applicable):
any

How reproducible:
rare

Fixed upstream in sanlock; we are waiting for a backport.

Comment 2 Yaniv Kaul 2016-07-21 13:29:51 UTC
sanlock-3.2.4-3.el7_2 is available for testing, ON_QA already, for 7.2.z.

Comment 3 Nir Soffer 2016-07-21 14:15:49 UTC
(In reply to Yaniv Kaul from comment #2)
> sanlock-3.2.4-3.el7_2 is available for testing, ON_QA already, for 7.2.z.

The package has not been released yet, so we are still blocked on it, but I sent
a patch to require it. We will merge the patch as soon as the package is
available to users.

Comment 4 Yaniv Kaul 2016-07-25 12:52:41 UTC
I think the bug for 7.2.z is bug 1357883.

Comment 5 Allon Mureinik 2016-11-07 08:53:36 UTC
With RHEL 7.3 released, the 7.2.z backport is a moot point. Sanlock 3.4.0-1 delivers a fix for this issue via bug 1356667.

We'll wait a couple of days till it hits the CentOS repository too and require it in vdsm.spec.in in time for oVirt 4.0.6.
RHEL users can already run "yum update sanlock*" to receive the newest sanlock package that resolves the issue regardless of VDSM's requirements.

Comment 6 Allon Mureinik 2016-11-22 17:33:01 UTC
Reducing both priority and severity, as a fix is already available via "yum upgrade sanlock". When the CentOS package becomes available, we'll consume it.

Comment 7 Allon Mureinik 2016-12-27 15:26:25 UTC
RHEL 7.2.z has become a moot point by now. We should require the sanlock version provided in RHEL 7.3 and QAed by bug 1356667.

Comment 8 Sandro Bonazzola 2017-01-25 07:55:20 UTC
4.0.6 has been the last oVirt 4.0 release, please re-target this bug.

Comment 9 Raz Tamir 2017-01-26 08:35:12 UTC
Nir,
Can you provide clear steps to reproduce?

Comment 10 Nir Soffer 2017-01-31 11:43:25 UTC
This is a random error; there is no way to reproduce it. The fix is in sanlock,
and in vdsm we only require the version with the fix.

Comment 11 Elad 2017-01-31 11:57:46 UTC
vdsm requires the correct sanlock version, which delivers a fix for this issue via bug 1356667 (according to comment #5):

[root@storage-ge2-vdsm1 ~]# yum deplist vdsm

  dependency: sanlock >= 3.4.0-1
   provider: sanlock.x86_64 3.4.0-1.el7


Tested using:
vdsm-4.19.2-2.el7ev.x86_64