Bug 1674360

Summary: oVirt Error creating a storage domain (On Cisco UCS Only)
Product: [oVirt] vdsm
Reporter: Nico Kruger <nico.kruger>
Component: General
Assignee: Nir Soffer <nsoffer>
Status: CLOSED DUPLICATE
QA Contact: Lukas Svaty <lsvaty>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.20.17
CC: bugs, dholler, frolland, nsoffer, tnisan
Target Milestone: ovirt-4.3.5
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-19 12:45:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Attachments:
full system logs from var/logs (Flags: none)

Description Nico Kruger 2019-02-11 06:28:19 UTC
Created attachment 1528880 [details]
full system logs from var/logs

Description of problem: oVirt Error creating a storage domain (On Cisco UCS Only)


Version-Release number of selected component (if applicable): oVirt 4.2


How reproducible: Reproducible every time.


Steps to Reproduce:
1. Install oVirt Node using node-ng iso: https://resources.ovirt.org/pub/ovirt-4.3/iso/ovirt-node-ng-installer/

2. Follow guide to deploy hyperconverged cluster: https://www.ovirt.org/documentation/gluster-hyperconverged/chap-Single_node_hyperconverged.html

3. Proceed to deploying the hosted engine. The hosted-engine deployment fails when creating the shared storage domain.

Actual results: [ ERROR ] Error: Fault reason is "Operation Failed". Fault detail is
"[Error creating a storage domain]". HTTP response code is 400.
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg":
"Fault reason is \"Operation Failed\". Fault detail is \"[Error
creating a storage domain]\". HTTP response code is 400."}


Expected results: Deployment does not fail; the storage domain is created successfully.


Additional info: I have created a thread on the oVirt users mailing list with more details and findings from Red Hat software engineers:
https://lists.ovirt.org/archives/list/users@ovirt.org/thread/4TDNVOXEA6B4YXGHGZAGB4AT53OKZ7FG/

Full system logs are attached.

Please note this only happens on Cisco UCS servers; I have had success when using other hardware brands.

Comment 1 Sandro Bonazzola 2019-02-12 09:12:09 UTC
Dominik, can you please review this bug?

Comment 2 Dominik Holler 2019-02-12 09:33:28 UTC
According to vdsm.log, dd fails with 'Invalid argument' while formatting the index for the lockspace (see the failing command in the log excerpt below, followed by a reproduction sketch).
Nir, do you have an idea why this is failing?

2019-01-27 10:55:10,968+0400 INFO  (jsonrpc/6) [vdsm.api] FINISH connectStorageServer return={'statuslist': [{'status': 0, 'id': u'7af1e739-db3a-46f0-b5b5-62abb9da02fb'}]} from=::ffff:192.168.124.101,34866, flow_id=717fdcaf, task_id=2d93cc27-f77a-46d3-bfe9-72c6d4582d56 (api:52)
2019-01-27 10:55:10,968+0400 INFO  (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call StoragePool.connectStorageServer succeeded in 0.00 seconds (__init__:573)
2019-01-27 10:55:10,975+0400 INFO  (jsonrpc/1) [vdsm.api] START createStorageDomain(storageType=7, sdUUID=u'e06a2edb-3b14-4459-b185-e8cd2db93b2c', domainName=u'hosted_storage', typeSpecificArg=u'z1ppvh05.pki.darkmatter.ae:/engine', domClass=1, domVersion=u'4', options=None) from=::ffff:192.168.124.101,34866, flow_id=717fdcaf, task_id=4331548a-3158-4a9b-a146-763db3551484 (api:46)
2019-01-27 10:55:11,226+0400 INFO  (jsonrpc/1) [storage.StorageDomain] sdUUID=e06a2edb-3b14-4459-b185-e8cd2db93b2c domainName=hosted_storage remotePath=z1ppvh05.pki.darkmatter.ae:/engine domClass=1 (nfsSD:70)
2019-01-27 10:55:11,238+0400 INFO  (jsonrpc/1) [IOProcessClient] (e06a2edb-3b14-4459-b185-e8cd2db93b2c) Starting client (__init__:308)
2019-01-27 10:55:11,246+0400 INFO  (ioprocess/33031) [IOProcess] (e06a2edb-3b14-4459-b185-e8cd2db93b2c) Starting ioprocess (__init__:434)
2019-01-27 10:55:11,273+0400 INFO  (jsonrpc/1) [storage.StorageDomain] Creating domain metadata directory u'/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md' (fileSD:450)
2019-01-27 10:55:11,292+0400 INFO  (jsonrpc/1) [storage.xlease] Formatting index for lockspace u'e06a2edb-3b14-4459-b185-e8cd2db93b2c' (version=1) (xlease:653)
2019-01-27 10:55:11,318+0400 INFO  (jsonrpc/1) [vdsm.api] FINISH createStorageDomain error=Command ['/usr/bin/dd', 'iflag=fullblock', u'of=/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md/xleases', 'oflag=direct,seek_bytes', 'seek=1048576', 'bs=256512', 'count=1', 'conv=notrunc,nocreat,fsync'] failed with rc=1 out='[suppressed]' err="/usr/bin/dd: error writing '/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md/xleases': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes (0 B) copied, 0.00666646 s, 0.0 kB/s\n" from=::ffff:192.168.124.101,34866, flow_id=717fdcaf, task_id=4331548a-3158-4a9b-a146-763db3551484 (api:50)
2019-01-27 10:55:11,318+0400 ERROR (jsonrpc/1) [storage.TaskManager.Task] (Task='4331548a-3158-4a9b-a146-763db3551484') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in createStorageDomain
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2608, in createStorageDomain
    storageType, domVersion)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/nfsSD.py", line 87, in create
    remotePath, storageType, version)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 466, in _prepareMetadata
    cls.format_external_leases(sdUUID, xleases_path)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 1219, in format_external_leases
    xlease.format_index(lockspace, backend)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/xlease.py", line 661, in format_index
    index.dump(file)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/xlease.py", line 761, in dump
    file.pwrite(INDEX_BASE, self._buf)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/xlease.py", line 994, in pwrite
    self._run(args, data=buf[:])
  File "/usr/lib/python2.7/site-packages/vdsm/storage/xlease.py", line 1011, in _run
    raise cmdutils.Error(args, rc, "[suppressed]", err)
Error: Command ['/usr/bin/dd', 'iflag=fullblock', u'of=/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md/xleases', 'oflag=direct,seek_bytes', 'seek=1048576', 'bs=256512', 'count=1', 'conv=notrunc,nocreat,fsync'] failed with rc=1 out='[suppressed]' err="/usr/bin/dd: error writing '/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md/xleases': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes (0 B) copied, 0.00666646 s, 0.0 kB/s\n"
2019-01-27 10:55:11,318+0400 INFO  (jsonrpc/1) [storage.TaskManager.Task] (Task='4331548a-3158-4a9b-a146-763db3551484') aborting: Task is aborted: u'Command [\'/usr/bin/dd\', \'iflag=fullblock\', u\'of=/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md/xleases\', \'oflag=direct,seek_bytes\', \'seek=1048576\', \'bs=256512\', \'count=1\', \'conv=notrunc,nocreat,fsync\'] failed with rc=1 out=\'[suppressed]\' err="/usr/bin/dd: error writing \'/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md/xleases\': Invalid argument\\n1+0 records in\\n0+0 records out\\n0 bytes (0 B) copied, 0.00666646 s, 0.0 kB/s\\n"' - code 100 (task:1181)
2019-01-27 10:55:11,318+0400 ERROR (jsonrpc/1) [storage.Dispatcher] FINISH createStorageDomain error=Command ['/usr/bin/dd', 'iflag=fullblock', u'of=/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md/xleases', 'oflag=direct,seek_bytes', 'seek=1048576', 'bs=256512', 'count=1', 'conv=notrunc,nocreat,fsync'] failed with rc=1 out='[suppressed]' err="/usr/bin/dd: error writing '/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md/xleases': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes (0 B) copied, 0.00666646 s, 0.0 kB/s\n" (dispatcher:86)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/dispatcher.py", line 73, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 108, in wrapper
    return m(self, *a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 1189, in prepare
    raise self.error
Error: Command ['/usr/bin/dd', 'iflag=fullblock', u'of=/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md/xleases', 'oflag=direct,seek_bytes', 'seek=1048576', 'bs=256512', 'count=1', 'conv=notrunc,nocreat,fsync'] failed with rc=1 out='[suppressed]' err="/usr/bin/dd: error writing '/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md/xleases': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes (0 B) copied, 0.00666646 s, 0.0 kB/s\n"
2019-01-27 10:55:11,319+0400 INFO  (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call StorageDomain.create failed (error 351) in 0.34 seconds (__init__:573)
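
For reference, the failing write can be reproduced outside vdsm with the same parameters shown in the log (a 256512-byte direct write at byte offset 1048576). A minimal sketch, assuming a scratch file on the affected Gluster mount; the target path below is a placeholder, not taken from the logs:

    import subprocess

    # Placeholder: point this at a scratch file on the affected Gluster mount.
    TARGET = "/rhev/data-center/mnt/glusterSD/SERVER:_engine/xleases_probe"

    # Same geometry as the failing vdsm command: a single 256512-byte direct
    # write at byte offset 1048576. conv=nocreat is dropped here so the probe
    # file is created if missing. On storage that only accepts 4K-aligned
    # direct I/O, dd exits non-zero with "Invalid argument", as in the log.
    cmd = [
        "/usr/bin/dd", "if=/dev/zero", "of=" + TARGET,
        "iflag=fullblock", "oflag=direct,seek_bytes",
        "seek=1048576", "bs=256512", "count=1", "conv=notrunc,fsync",
    ]
    print(subprocess.call(cmd))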

Comment 3 Nir Soffer 2019-02-12 16:24:25 UTC
(In reply to Dominik Holler from comment #2)
> Error: Command ['/usr/bin/dd', 'iflag=fullblock',
> u'of=/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/
> e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md/xleases',
> 'oflag=direct,seek_bytes', 'seek=1048576', 'bs=256512', 'count=1',
> 'conv=notrunc,nocreat,fsync'] failed with rc=1 out='[suppressed]'
> err="/usr/bin/dd: error writing
> '/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-
> 3b14-4459-b185-e8cd2db93b2c/dom_md/xleases': Invalid argument\n1+0 records
> in\n0+0 records out\n0 bytes (0 B) copied, 0.00666646 s, 0.0 kB/s\n"

We are writing 256512 bytes at offset 1048576. This write is aligned
to 512 bytes, but the underlying storage may use 4K drives, or VDO, which
require 4K alignment.

We don't support 4K yet. Support is expected in 4.3.z, and only on specific
setups (e.g. RHHI) and only with certain storage (Gluster).
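
To spell out the alignment claim (an illustration, not vdsm code): both the offset and the length of this write are multiples of 512, but the length is not a multiple of 4096, which is why storage that only accepts 4K-aligned direct I/O returns EINVAL ("Invalid argument"):

    # Alignment of the failing dd write (offset and length from comment 2).
    offset, length = 1048576, 256512
    assert offset % 512 == 0 and length % 512 == 0   # fine for 512-byte blocks
    assert offset % 4096 == 0                        # the offset is 4K-aligned as well
    assert length % 4096 == 2560                     # the length is NOT 4K-aligned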

Denis, can you help with detecting the underlying block size of this Gluster-based setup?

Comment 4 Denis Chaplygin 2019-02-18 10:44:27 UTC
Something like dd if=/dev/zero of=<file on the affected Gluster mount> oflag=direct iflag=fullblock bs=512 count=1 should fail on a 4K-only device, and changing bs to 4096 should work (the write needs oflag=direct to hit the device's alignment requirement; a buffered write would succeed either way).
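
A minimal sketch of such a probe, assuming it is pointed at the mount backing the storage domain (the probe_block_size helper and the example path are illustrative, not vdsm code):

    import os
    import subprocess
    import tempfile

    def probe_block_size(mount_point):
        """Return the smallest direct-I/O write size (512 or 4096) that the
        storage under mount_point accepts, or None if both fail."""
        fd, path = tempfile.mkstemp(dir=mount_point, prefix="block_size_probe.")
        os.close(fd)
        try:
            with open(os.devnull, "w") as devnull:
                for bs in (512, 4096):
                    rc = subprocess.call(
                        ["/usr/bin/dd", "if=/dev/zero", "of=" + path,
                         "oflag=direct", "bs=%d" % bs, "count=1",
                         "conv=notrunc,nocreat,fsync"],
                        stdout=devnull, stderr=devnull)
                    if rc == 0:
                        return bs
        finally:
            os.unlink(path)
        return None

    # Example (placeholder path): expected to return 4096 on the affected
    # setup, and 512 on storage that accepts 512-byte direct writes.
    # print(probe_block_size("/rhev/data-center/mnt/glusterSD/SERVER:_engine"))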

Comment 5 Tal Nisan 2019-06-19 12:45:55 UTC

*** This bug has been marked as a duplicate of bug 798947 ***