Bug 1459855 - [RFE] vdsm sanlock exception on top gluster with 4K disks
Status: CLOSED DUPLICATE of bug 798947
Product: vdsm
Classification: oVirt
Component: Gluster
Version: 4.19.15
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.2.0
Target Release: ---
Assigned To: Nir Soffer
QA Contact: SATHEESARAN
Depends On:
Blocks:
Reported: 2017-06-08 07:17 EDT by Alex Kaouris
Modified: 2017-07-17 07:45 EDT
CC List: 5 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-06-12 05:14:08 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt-4.2+


Attachments
strace log on gluster mount point (5.40 KB, text/plain)
2017-06-08 07:17 EDT, Alex Kaouris

Description Alex Kaouris 2017-06-08 07:17:57 EDT
Created attachment 1286119 [details]
strace log on gluster mount point

Description of problem:
Storage pool creation fails during the initial setup of hosted-engine (self-hosted) on top of a 3-node Gluster storage domain, raising a sanlock exception:

Failed to execute stage 'Misc configuration': Cannot acquire host id: (u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))

This issue is seen only on disk drives with 4K physical/logical sector size. 

Version-Release number of selected component (if applicable):

Package versions:
vdsm-4.19.15-1.el7.centos.x86_64
ovirt-release41-4.1.2-1.el7.centos.noarch
ovirt-imageio-common-1.0.0-1.el7.noarch
cockpit-ovirt-dashboard-0.10.7-0.0.18.el7.centos.noarch
ovirt-imageio-daemon-1.0.0-1.el7.noarch
ovirt-host-deploy-1.6.5-1.el7.centos.noarch
ovirt-vmconsole-host-1.0.4-1.el7.centos.noarch
ovirt-hosted-engine-ha-2.1.0.6-1.el7.centos.noarch
ovirt-vmconsole-1.0.4-1.el7.centos.noarch
ovirt-hosted-engine-setup-2.1.0.6-1.el7.centos.noarch
ovirt-engine-appliance-4.1-20170523.1.el7.centos.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7.centos.noarch
ovirt-setup-lib-1.1.0-1.el7.centos.noarch

OS: CentOS Linux release 7.3.1611 (Core) 64 bit
kernel: 3.10.0-514.21.1.el7.x86_64

How reproducible: always


Steps to Reproduce:
1. setup gluster volume with 3 nodes
2. run hosted-engine --deploy --config-append=/root/storage.conf


Actual results:
Installation fails

Expected results:
The storage pool should be created and the installation should succeed.

Additional info:

The /var/log/sanlock.log has the following lines:

2017-06-03 19:21:15+0200 23450 [943]: s9 lockspace ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:250:/rhev/data-center/mnt/_var_lib_ovirt-hosted-engin-setup_tmptjkIDI/ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047/dom_md/ids:0
2017-06-03 19:21:36+0200 23471 [944]: s9:r5 resource ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:SDM:/rhev/data-center/mnt/_var_lib_ovirt-hosted-engine-setup_tmptjkIDI/ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047/dom_md/leases:1048576 for 2,9,23040
2017-06-03 19:21:36+0200 23471 [943]: s10 lockspace a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922:250:/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922/dom_md/ids:0
2017-06-03 19:21:36+0200 23471 [23522]: a5a6b0e7 aio collect RD 0x7f59b00008c0:0x7f59b00008d0:0x7f59b0101000 result -22:0 match res
2017-06-03 19:21:36+0200 23471 [23522]: read_sectors delta_leader offset 127488 rv -22 /rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922/dom_md/ids
2017-06-03 19:21:37+0200 23472 [930]: s9 host 250 1 23450 88c2244c-a782-40ed-9560-6cfa4d46f853.v0.neptune
2017-06-03 19:21:37+0200 23472 [943]: s10 add_lockspace fail result -22

And /var/log/vdsm/vdsm.log says:

2017-06-03 19:19:38,176+0200 WARN  (jsonrpc/3) [storage.StorageServer.MountConnection] Using user specified backup-volfile-servers option (storageServer:253)
2017-06-03 19:21:12,379+0200 WARN  (periodic/1) [throttled] MOM not available. (throttledlog:105)
2017-06-03 19:21:12,380+0200 WARN  (periodic/1) [throttled] MOM not available, KSM stats will be missing. (throttledlog:105)
2017-06-03 19:21:14,714+0200 WARN  (jsonrpc/1) [storage.StorageServer.MountConnection] Using user specified backup-volfile-servers option (storageServer:253)
2017-06-03 19:21:15,515+0200 ERROR (jsonrpc/4) [storage.initSANLock] Cannot initialize SANLock for domain a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922 (clusterlock:238)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 234, in initSANLock
    sanlock.init_lockspace(sdUUID, idsPath)
SanlockException: (107, 'Sanlock lockspace init failure', 'Transport endpoint is not connected')
2017-06-03 19:21:15,515+0200 WARN  (jsonrpc/4) [storage.StorageDomainManifest] lease did not initialize successfully (sd:557)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sd.py", line 552, in initDomainLock
    self._domainLock.initLock(self.getDomainLease())
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 271, in initLock
    initSANLock(self._sdUUID, self._idsPath, lease)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 239, in initSANLock
    raise se.ClusterLockInitError()
ClusterLockInitError: Could not initialize cluster lock: ()
2017-06-03 19:21:37,867+0200 ERROR (jsonrpc/2) [storage.StoragePool] Create pool hosted_datacenter canceled  (sp:655)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sp.py", line 652, in create
    self.attachSD(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 971, in attachSD
    dom.acquireHostId(self.id)
  File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
    self._manifest.acquireHostId(hostId, async)
  File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId
    self._domainLock.acquireHostId(hostId, async)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 297, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: (u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
2017-06-03 19:21:37,870+0200 ERROR (jsonrpc/2) [storage.StoragePool] Domain ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047 detach from MSD ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047 Ver 1 failed. (sp:528)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sp.py", line 525, in __cleanupDomains
    self.detachSD(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1046, in detachSD
    raise se.CannotDetachMasterStorageDomain(sdUUID)
CannotDetachMasterStorageDomain: Illegal action: (u'ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047',)
2017-06-03 19:21:37,872+0200 ERROR (jsonrpc/2) [storage.StoragePool] Domain a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922 detach from MSD ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047 Ver 1 failed. (sp:528)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sp.py", line 525, in __cleanupDomains
    self.detachSD(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1043, in detachSD
    self.validateAttachedDomain(dom)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 542, in validateAttachedDomain
    self.validatePoolSD(dom.sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 535, in validatePoolSD
    raise se.StorageDomainNotMemberOfPool(self.spUUID, sdUUID)
StorageDomainNotMemberOfPool: Domain is not member in pool: u'pool=a1e7e9dd-0cf4-41ae-ba13-36297ed66309, domain=a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922'
2017-06-03 19:21:40,063+0200 ERROR (jsonrpc/2) [storage.TaskManager.Task] (Task='a2476a33-26f8-4ebd-876d-02fe5d13ef78') Unexpected error (task:870)
Traceback (most recent call last):
 File "/usr/share/vdsm/storage/task.py", line 877, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 52, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 959, in createStoragePool
    leaseParams)
  File "/usr/share/vdsm/storage/sp.py", line 652, in create
    self.attachSD(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 971, in attachSD
    dom.acquireHostId(self.id)
  File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
    self._manifest.acquireHostId(hostId, async)
  File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId
    self._domainLock.acquireHostId(hostId, async)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 297, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: (u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
2017-06-03 19:21:40,067+0200 ERROR (jsonrpc/2) [storage.Dispatcher] {'status': {'message': "Cannot acquire host id: (u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))", 'code': 661}} (dispatcher:77)


It seems that vdsm has a hard-coded 512-byte block size and fails with 4K disks.

Also attached is the strace of the following command:

strace -y -ff -o /root/512-trace-on-root.log dd if=/dev/zero of=/mnt/test2.img oflag=direct bs=512 count=1

The Gluster volume was mounted at /mnt for the strace.
Comment 1 Allon Mureinik 2017-06-08 11:29:16 EDT
Nir, AFAIK, this is something we don't support?
Comment 2 Nir Soffer 2017-06-08 11:39:13 EDT
(In reply to Allon Mureinik from comment #1)
> Nir, AFAIK, this is something we don't support?

We do not support 4K disks, but sanlock does. I am not sure the system can work
in this setup.

Maybe the issue is glusterfs not exposing the fact that the underlying storage
uses 4K sectors, so sanlock fails when trying to write 512-byte sectors?

David, can you suggest how to debug this?
Comment 3 David Teigland 2017-06-08 12:43:04 EDT
sanlock gets EINVAL (-22) from the read at offset 127488 on /mnt

a5a6b0e7 aio collect RD 0x7f59b00008c0:0x7f59b00008d0:0x7f59b0101000 result -22:0 match res
[23522]: read_sectors delta_leader offset 127488 rv -22
/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922/dom_md/ids

dd gets ENOTCONN from the write at offset 0 on /mnt

write(1</mnt/test2.img>, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = -1 ENOTCONN (Transport endpoint is not connected)

The different errors raise the question of whether the state of /mnt was the same in both cases.

When sanlock uses files on a file system, it always uses 512 byte direct i/o.  It doesn't have a way of knowing what the disk sector size is under nfs or gluster.  It's also not clear what restrictions the fs imposes on file i/o based on the underlying disk sector size.
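The constraint David describes can be probed directly. The sketch below is an editor's illustration, not vdsm or sanlock code: the helper name and probing approach are assumptions. It attempts a 512-byte O_DIRECT write and falls back to 4096, which is the same granularity mismatch behind the EINVAL above.

```python
import errno
import mmap
import os
import tempfile

def min_direct_write_size(dirpath):
    """Probe the smallest O_DIRECT write the filesystem under `dirpath`
    accepts. On 4K-sector storage a 512-byte direct write fails with
    EINVAL -- the error sanlock hits above. (Hypothetical helper, not
    part of vdsm or sanlock.)"""
    fd, path = tempfile.mkstemp(dir=dirpath)
    os.close(fd)
    try:
        try:
            fd = os.open(path, os.O_WRONLY | os.O_DIRECT)
        except OSError:
            return None  # filesystem (e.g. tmpfs) rejects O_DIRECT outright
        try:
            buf = mmap.mmap(-1, 4096)  # page-aligned, as O_DIRECT requires
            for size in (512, 4096):
                try:
                    os.write(fd, memoryview(buf)[:size])
                    return size  # this write granularity is accepted
                except OSError as e:
                    if e.errno != errno.EINVAL:
                        raise
            return None
        finally:
            os.close(fd)
    finally:
        os.unlink(path)
```

On the reporter's XFS-on-4K setup this would presumably return 4096, consistent with the failing bs=512 dd runs shown in comment 4.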
Comment 4 Alex Kaouris 2017-06-09 06:05:51 EDT
It seems that every 512-byte operation fails on the file system.
The file system is XFS and the alignment is OK.

I was using 5 disks (with 4K physical sectors) in RAID5. The RAID had a 64K stripe size. I prepared the storage as follows:

pvcreate --dataalignment 256K /dev/sda4
vgcreate --physicalextentsize 256K gluster /dev/sda4

lvcreate -n engine --size 120G gluster
mkfs.xfs -f -i size=512 /dev/gluster/engine
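The 256K alignment above follows from the RAID geometry: with 5 disks in RAID5, each stripe carries data on 4 of them, so a full stripe is 4 x 64K. A quick check of the arithmetic:

```python
disks = 5
parity_disks = 1            # RAID5 dedicates one disk's worth to parity
stripe_unit_kib = 64        # per-disk stripe size
full_stripe_kib = (disks - parity_disks) * stripe_unit_kib
print(full_stripe_kib)      # 256, matching --dataalignment 256K above
```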


Some dd tests below: 

Testing on the gluster mount:
dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/test2.img oflag=direct bs=512 count=1
dd: error writing '/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/test2.img': Transport endpoint is not connected
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00336755 s, 0.0 kB/s

Testing on the /root directory (XFS):
dd if=/dev/zero of=/test2.img oflag=direct bs=512 count=1
dd: error writing '/test2.img': Invalid argument
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000321239 s, 0.0 kB/s

Maybe, instead of having 512 hard-coded, vdsm could get the sector size with blockdev --getss.
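As a sketch of that suggestion: the value blockdev --getss reports can also be read from sysfs without root. The helper below and its sysfs parameter are assumptions for illustration, not an existing vdsm API.

```python
import pathlib

def logical_sector_size(devname, sysfs="/sys/block"):
    """Return the logical sector size of /dev/<devname>, as
    `blockdev --getss` would report it, read from sysfs instead.
    (Hypothetical helper, not part of vdsm.)"""
    path = pathlib.Path(sysfs) / devname / "queue" / "logical_block_size"
    return int(path.read_text())
```

A sector-size-aware vdsm could then pick a 4096-byte lease block size whenever this reports a 4K device, instead of assuming 512.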
Comment 5 Nir Soffer 2017-06-09 11:02:18 EDT
Since neither vdsm nor sanlock supports 4K disks when using a network file system,
I think this can be an RFE for a future version.
Comment 6 Allon Mureinik 2017-06-12 05:14:08 EDT
(In reply to Nir Soffer from comment #5)
> Since neither vdsm nor sanlock supports 4K disks when using a network file
> system, I think this can be an RFE for a future version.

Agreed.
Yaniv L. - we can discuss priorities on this, but ATM, I don't see this happening in 4.2.

*** This bug has been marked as a duplicate of bug 798947 ***
