Bug 1674360 - oVirt Error creating a storage domain (On Cisco UCS Only)
Summary: oVirt Error creating a storage domain (On Cisco UCS Only)
Keywords:
Status: CLOSED DUPLICATE of bug 798947
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.20.17
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high (1 vote)
Target Milestone: ovirt-4.3.5
Assignee: Nir Soffer
QA Contact: Lukas Svaty
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-02-11 06:28 UTC by Nico Kruger
Modified: 2019-06-19 12:45 UTC
CC: 5 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-06-19 12:45:55 UTC
oVirt Team: Storage
Embargoed:


Attachments
full system logs from var/logs (777.33 KB, application/zip)
2019-02-11 06:28 UTC, Nico Kruger

Description Nico Kruger 2019-02-11 06:28:19 UTC
Created attachment 1528880 [details]
full system logs from var/logs

Description of problem: oVirt Error creating a storage domain (On Cisco UCS Only)


Version-Release number of selected component (if applicable): oVirt 4.2


How reproducible: Reproducible every time.


Steps to Reproduce:
1. Install oVirt Node using node-ng iso: https://resources.ovirt.org/pub/ovirt-4.3/iso/ovirt-node-ng-installer/

2. Follow guide to deploy hyperconverged cluster: https://www.ovirt.org/documentation/gluster-hyperconverged/chap-Single_node_hyperconverged.html

3. Proceed to creating hosted-engine. Hosted-engine deployment will fail when creating shared storage

Actual results:[ ERROR ] Error: Fault reason is "Operation Failed". Fault detail is
"[Error creating a storage domain]". HTTP response code is 400.
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg":
"Fault reason is \"Operation Failed\". Fault detail is \"[Error
creating a storage domain]\". HTTP response code is 400."}


Expected results: Storage domain creation succeeds and hosted-engine deployment completes.


Additional info: I have started a thread on the oVirt users mailing list with more details and findings from Red Hat software engineers:
https://lists.ovirt.org/archives/list/users@ovirt.org/thread/4TDNVOXEA6B4YXGHGZAGB4AT53OKZ7FG/

Full system logs are attached.

Please note this only happens on Cisco UCS servers; I have had success with other hardware brands.

Comment 1 Sandro Bonazzola 2019-02-12 09:12:09 UTC
Dominic, can you please review this bug?

Comment 2 Dominik Holler 2019-02-12 09:33:28 UTC
According to vdsm.log, dd fails with 'Invalid argument' while formatting the index for the lockspace.
Nir, do you have an idea why this is failing?

2019-01-27 10:55:10,968+0400 INFO  (jsonrpc/6) [vdsm.api] FINISH connectStorageServer return={'statuslist': [{'status': 0, 'id': u'7af1e739-db3a-46f0-b5b5-62abb9da02fb'}]} from=::ffff:192.168.124.101,34866, flow_id=717fdcaf, task_id=2d93cc27-f77a-46d3-bfe9-72c6d4582d56 (api:52)
2019-01-27 10:55:10,968+0400 INFO  (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call StoragePool.connectStorageServer succeeded in 0.00 seconds (__init__:573)
2019-01-27 10:55:10,975+0400 INFO  (jsonrpc/1) [vdsm.api] START createStorageDomain(storageType=7, sdUUID=u'e06a2edb-3b14-4459-b185-e8cd2db93b2c', domainName=u'hosted_storage', typeSpecificArg=u'z1ppvh05.pki.darkmatter.ae:/engine', domClass=1, domVersion=u'4', options=None) from=::ffff:192.168.124.101,34866, flow_id=717fdcaf, task_id=4331548a-3158-4a9b-a146-763db3551484 (api:46)
2019-01-27 10:55:11,226+0400 INFO  (jsonrpc/1) [storage.StorageDomain] sdUUID=e06a2edb-3b14-4459-b185-e8cd2db93b2c domainName=hosted_storage remotePath=z1ppvh05.pki.darkmatter.ae:/engine domClass=1 (nfsSD:70)
2019-01-27 10:55:11,238+0400 INFO  (jsonrpc/1) [IOProcessClient] (e06a2edb-3b14-4459-b185-e8cd2db93b2c) Starting client (__init__:308)
2019-01-27 10:55:11,246+0400 INFO  (ioprocess/33031) [IOProcess] (e06a2edb-3b14-4459-b185-e8cd2db93b2c) Starting ioprocess (__init__:434)
2019-01-27 10:55:11,273+0400 INFO  (jsonrpc/1) [storage.StorageDomain] Creating domain metadata directory u'/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md' (fileSD:450)
2019-01-27 10:55:11,292+0400 INFO  (jsonrpc/1) [storage.xlease] Formatting index for lockspace u'e06a2edb-3b14-4459-b185-e8cd2db93b2c' (version=1) (xlease:653)
2019-01-27 10:55:11,318+0400 INFO  (jsonrpc/1) [vdsm.api] FINISH createStorageDomain error=Command ['/usr/bin/dd', 'iflag=fullblock', u'of=/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md/xleases', 'oflag=direct,seek_bytes', 'seek=1048576', 'bs=256512', 'count=1', 'conv=notrunc,nocreat,fsync'] failed with rc=1 out='[suppressed]' err="/usr/bin/dd: error writing '/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md/xleases': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes (0 B) copied, 0.00666646 s, 0.0 kB/s\n" from=::ffff:192.168.124.101,34866, flow_id=717fdcaf, task_id=4331548a-3158-4a9b-a146-763db3551484 (api:50)
2019-01-27 10:55:11,318+0400 ERROR (jsonrpc/1) [storage.TaskManager.Task] (Task='4331548a-3158-4a9b-a146-763db3551484') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in createStorageDomain
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2608, in createStorageDomain
    storageType, domVersion)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/nfsSD.py", line 87, in create
    remotePath, storageType, version)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 466, in _prepareMetadata
    cls.format_external_leases(sdUUID, xleases_path)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 1219, in format_external_leases
    xlease.format_index(lockspace, backend)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/xlease.py", line 661, in format_index
    index.dump(file)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/xlease.py", line 761, in dump
    file.pwrite(INDEX_BASE, self._buf)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/xlease.py", line 994, in pwrite
    self._run(args, data=buf[:])
  File "/usr/lib/python2.7/site-packages/vdsm/storage/xlease.py", line 1011, in _run
    raise cmdutils.Error(args, rc, "[suppressed]", err)
Error: Command ['/usr/bin/dd', 'iflag=fullblock', u'of=/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md/xleases', 'oflag=direct,seek_bytes', 'seek=1048576', 'bs=256512', 'count=1', 'conv=notrunc,nocreat,fsync'] failed with rc=1 out='[suppressed]' err="/usr/bin/dd: error writing '/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md/xleases': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes (0 B) copied, 0.00666646 s, 0.0 kB/s\n"
2019-01-27 10:55:11,318+0400 INFO  (jsonrpc/1) [storage.TaskManager.Task] (Task='4331548a-3158-4a9b-a146-763db3551484') aborting: Task is aborted: u'Command [\'/usr/bin/dd\', \'iflag=fullblock\', u\'of=/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md/xleases\', \'oflag=direct,seek_bytes\', \'seek=1048576\', \'bs=256512\', \'count=1\', \'conv=notrunc,nocreat,fsync\'] failed with rc=1 out=\'[suppressed]\' err="/usr/bin/dd: error writing \'/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md/xleases\': Invalid argument\\n1+0 records in\\n0+0 records out\\n0 bytes (0 B) copied, 0.00666646 s, 0.0 kB/s\\n"' - code 100 (task:1181)
2019-01-27 10:55:11,318+0400 ERROR (jsonrpc/1) [storage.Dispatcher] FINISH createStorageDomain error=Command ['/usr/bin/dd', 'iflag=fullblock', u'of=/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md/xleases', 'oflag=direct,seek_bytes', 'seek=1048576', 'bs=256512', 'count=1', 'conv=notrunc,nocreat,fsync'] failed with rc=1 out='[suppressed]' err="/usr/bin/dd: error writing '/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md/xleases': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes (0 B) copied, 0.00666646 s, 0.0 kB/s\n" (dispatcher:86)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/dispatcher.py", line 73, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 108, in wrapper
    return m(self, *a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 1189, in prepare
    raise self.error
Error: Command ['/usr/bin/dd', 'iflag=fullblock', u'of=/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md/xleases', 'oflag=direct,seek_bytes', 'seek=1048576', 'bs=256512', 'count=1', 'conv=notrunc,nocreat,fsync'] failed with rc=1 out='[suppressed]' err="/usr/bin/dd: error writing '/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md/xleases': Invalid argument\n1+0 records in\n0+0 records out\n0 bytes (0 B) copied, 0.00666646 s, 0.0 kB/s\n"
2019-01-27 10:55:11,319+0400 INFO  (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call StorageDomain.create failed (error 351) in 0.34 seconds (__init__:573)

Comment 3 Nir Soffer 2019-02-12 16:24:25 UTC
(In reply to Dominik Holler from comment #2)
> Error: Command ['/usr/bin/dd', 'iflag=fullblock',
> u'of=/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/
> e06a2edb-3b14-4459-b185-e8cd2db93b2c/dom_md/xleases',
> 'oflag=direct,seek_bytes', 'seek=1048576', 'bs=256512', 'count=1',
> 'conv=notrunc,nocreat,fsync'] failed with rc=1 out='[suppressed]'
> err="/usr/bin/dd: error writing
> '/rhev/data-center/mnt/glusterSD/z1ppvh05.pki.darkmatter.ae:_engine/e06a2edb-
> 3b14-4459-b185-e8cd2db93b2c/dom_md/xleases': Invalid argument\n1+0 records
> in\n0+0 records out\n0 bytes (0 B) copied, 0.00666646 s, 0.0 kB/s\n"

We are writing 256512 bytes at offset 1048576. This write is aligned
to 512 bytes, but the underlying storage may use 4k drives, or VDO, which
require 4k alignment.
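
For reference, the arithmetic behind the EINVAL (plain Python, using the numbers
from the dd command above). With direct I/O, both the offset and the length of
the transfer must be multiples of the device's logical block size:

# Numbers taken from the failing dd command above.
offset = 1048576   # seek=1048576 bytes
length = 256512    # bs=256512 = 501 * 512 bytes

for block_size in (512, 4096):
    aligned = offset % block_size == 0 and length % block_size == 0
    print("bs %4d: offset %% bs = %d, length %% bs = %d -> %s"
          % (block_size, offset % block_size, length % block_size,
             "aligned" if aligned else "not aligned"))

# bs  512: offset % bs = 0, length % bs = 0 -> aligned
# bs 4096: offset % bs = 0, length % bs = 2560 -> not aligned

So a 512b device accepts the write, while a 4k-only device rejects it with
EINVAL, which is the "Invalid argument" seen in the dd error above.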

We don't support 4k storage yet. Support is expected in 4.3.z, and only on
specific setups (e.g. RHHI) with certain storage (glusterfs).

Denis, can you help with detecting the underlying block size of this
gluster-based setup?

Comment 4 Denis Chaplygin 2019-02-18 10:44:27 UTC
Something like dd if=/dev/zero of=<file on the gluster mount> oflag=direct bs=512 count=1 should fail with 'Invalid argument' on a 4k-only device, and changing bs to 4096 should work.
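
For completeness, a rough sketch of the same probing idea in Python (illustration
only, not vdsm code; the probe path is a placeholder and must point to a file on
the gluster mount under test):

import errno
import mmap
import os

def probe_block_size(path):
    # Return the smallest block size the storage accepts for O_DIRECT writes.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_DIRECT, 0o644)
    try:
        for block_size in (512, 4096):
            # O_DIRECT needs an aligned buffer; an anonymous mmap is page-aligned.
            buf = mmap.mmap(-1, block_size)
            try:
                os.write(fd, buf)
                return block_size
            except OSError as e:
                if e.errno != errno.EINVAL:
                    raise
            finally:
                buf.close()
        raise RuntimeError("could not detect block size for %s" % path)
    finally:
        os.close(fd)
        os.unlink(path)

# Hypothetical usage, with the probe file placed on the mount from the logs above:
# probe_block_size("/rhev/data-center/mnt/glusterSD/<server>:_engine/block_size_probe")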

Comment 5 Tal Nisan 2019-06-19 12:45:55 UTC

*** This bug has been marked as a duplicate of bug 798947 ***

