Created attachment 1676415
vdsm.log

Description of problem:
While running a flow of CreateVM, CreateSnapshot(*30), DeleteVM, the error below appears in vdsm.log.

Version-Release number of selected component (if applicable):
rhv-release-4.3.5-1-001
vdsm-4.30.24-2.el7ev.x86_64

How reproducible:

Steps to Reproduce:
1. Run 50 concurrent users
2. CreateVM, CreateSnapshots(*30), DeleteVM
3. Run ~5 cycles (~20 hours)

Actual results:
ERROR messages

Expected results:
No errors

ERROR MESSAGE:
2020-04-02 04:25:05,584+0000 ERROR (jsonrpc/3) [storage.TaskManager.Task] (Task='e4444821-3794-4ed6-a017-e77faf59c881') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in getVolumeInfo
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3107, in getVolumeInfo
    info = vol.getInfo()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 259, in getInfo
    leasestatus = self.getLeaseStatus()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 204, in getLeaseStatus
    self.volUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 531, in inquireVolumeLease
    return self._domainLock.inquire(lease)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 460, in inquire
    sector=self._block_size)
SanlockException: (104, 'Unable to read resource owners', 'Connection reset by peer')
2020-04-02 04:25:05,584+0000 INFO (jsonrpc/3) [storage.TaskManager.Task] (Task='e4444821-3794-4ed6-a017-e77faf59c881') aborting: Task is aborted: u"(104, 'Unable to read resource owners', 'Connection reset by peer')" - code 100 (task:1181)
2020-04-02 04:25:05,585+0000 ERROR (jsonrpc/3) [storage.Dispatcher] FINISH getVolumeInfo error=(104, 'Unable to read resource owners', 'Connection reset by peer') (dispatcher:87)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/dispatcher.py", line 74, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 108, in wrapper
    return m(self, *a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 1189, in prepare
    raise self.error
SanlockException: (104, 'Unable to read resource owners', 'Connection reset by peer')

Additional info:
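The steps to reproduce can be sketched roughly as the load loop below. This is only an illustration of the flow's shape, not the actual test harness: create_vm, create_snapshot and delete_vm are hypothetical placeholders that would need to be replaced with real calls to the management API, and only the numbers (50 users, 30 snapshots per VM) come from the report.

```python
from concurrent.futures import ThreadPoolExecutor

USERS = 50             # concurrent users, from the report
SNAPSHOTS_PER_VM = 30  # CreateSnapshot(*30), from the report


# Hypothetical placeholders: replace with real calls to the management API.
def create_vm(user_id):
    return "vm-%d" % user_id


def create_snapshot(vm, index):
    return "%s-snap-%d" % (vm, index)


def delete_vm(vm):
    return True


def user_flow(user_id):
    """One user's flow: create a VM, take 30 snapshots, delete the VM."""
    vm = create_vm(user_id)
    snapshots = [create_snapshot(vm, i) for i in range(SNAPSHOTS_PER_VM)]
    delete_vm(vm)
    return len(snapshots)


def run_cycle(users=USERS):
    """Run the flow for all users concurrently; return snapshots per user."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        return list(pool.map(user_flow, range(users)))
```

Calling run_cycle() once corresponds to one cycle of step 2; the report repeats this for about 5 cycles.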
David, is this a regression? Is it reproducible?
It is reproducible.
Regression: it is the first time we ran this scenario.

Related:
RHV bug: https://bugzilla.redhat.com/1664159
Sanlock bug: https://bugzilla.redhat.com/1812185
Targeting 4.5 for now, as the Sanlock fix will only be released in RHEL 8.3. If needed, please escalate to have it in 8.2.z so we can consume it earlier.
We cannot defer this to 4.5. Sanlock is a critical component and we must use the latest version to ensure that we consume all fixes.

Moving back to 4.4.3. This version requires 8.3 so it must use sanlock from 8.3 (8.3.2), fixing this issue.
(In reply to Nir Soffer from comment #4)
> We cannot defer this to 4.5, sanlock is a critical component and we
> must use the latest version to ensure that we consume all fixes.
>
> Moving back to 4.4.3. This version requires 8.3 so it must use sanlock
> from 8.3 (8.3.2), fixing this issue.

According to the release notes, 4.4.3 is for EL 8.2 [1]. RHEL 8.3 is currently in beta AFAICT, and CentOS 8.3 is not yet available on the CentOS downloads page.

So if this ticket is for adding a spec requirement for sanlock-3.8.2 or later, wouldn't it break users' installations on a missing dependency?

[1] https://www.ovirt.org/release/4.4.3/
We are already requiring sanlock >= 3.8.2-4. Does this solve this bug?
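For checking a host against the sanlock >= 3.8.2-4 requirement, a minimal sketch of the comparison follows. It assumes the installed version-release string (for example as reported by the package manager) is purely numeric apart from a dist suffix such as ".el8". It is a simplification and not the real RPM comparison algorithm, and parse_evr and meets_requirement are names made up for this illustration.

```python
import re

REQUIRED = "3.8.2-4"  # minimum sanlock version-release named in this bug


def parse_evr(version_release):
    """Split 'X.Y.Z-R[.dist]' into ((X, Y, Z), R) for simple comparison.

    Simplified: assumes numeric version fields and a release that starts
    with digits. Real RPM version comparison handles many more cases.
    """
    version, _, release = version_release.partition("-")
    version_parts = tuple(int(p) for p in version.split("."))
    match = re.match(r"\d+", release)
    release_number = int(match.group()) if match else 0
    return (version_parts, release_number)


def meets_requirement(installed, required=REQUIRED):
    """Return True if the installed version-release is at least required."""
    return parse_evr(installed) >= parse_evr(required)
```

For example, meets_requirement("3.8.2-4.el8") is true, while meets_requirement("3.8.1-1.el8") is false.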
This is a random failure that is not reproducible. Vdsm has required the sanlock version containing the fix for a while, since 4.4.5.

Moving to ONQA.