Bug 1820283
Summary:          Error creating storage domain via vdsm
Product:          [oVirt] vdsm
Component:        Gluster
Status:           CLOSED CURRENTRELEASE
Severity:         high
Priority:         high
Version:          4.40.11
Target Milestone: ovirt-4.4.0
Keywords:         Regression
Reporter:         Kaustav Majumder <kmajumde>
Assignee:         Nir Soffer <nsoffer>
QA Contact:       SATHEESARAN <sasundar>
CC:               bugs, eshenitz, godas, michal.skrivanek, nsoffer, sasundar
Flags:            sasundar: ovirt-4.4?, blocker?, planning_ack?, devel_ack?, testing_ack?
Hardware:         Unspecified
OS:               Unspecified
Doc Type:         If docs needed, set a value
Type:             Bug
oVirt Team:       Storage
Clones:           1821288 (view as bug list)
Bug Blocks:       1821288
Last Closed:      2020-05-20 20:01:41 UTC
Description (Kaustav Majumder, 2020-04-02 16:30:23 UTC)
Can you describe how ovirt-ansible-hosted-engine-setup uses engine or vdsm? Does it use the vdsm API directly or via engine? If via engine, what is the minimal call sequence that reproduces this issue? Is this reproducible via the engine UI? If it is, what are the steps to reproduce? If you cannot answer these questions, what are the steps to reproduce using the mentioned ansible script? We need something that a developer can run to reproduce the issue locally.

The error is encountered while deploying oVirt + Gluster via the cockpit UI, specifically during hosted engine setup. This is a call to the ansible playbook at
https://github.com/gluster/gluster-ansible/tree/master/playbooks/hc-ansible-deployment,
which internally calls the ovirt-ansible-hosted-engine-setup role. The playbook errors at
https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/blob/master/tasks/create_storage_domain.yml#L56

The role calls ovirt engine, which is deployed as a temporary VM (HostedEngineLocal). Checking the engine logs points to a failure creating the storage domain (https://pastebin.com/35JQyuGE):

[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-1) [6ac6e81f] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM tendrl25.lab.eng.blr.redhat.com command CreateStorageDomainVDS failed: Error creating a storage domain: ('storageType=7, sdUUID=508909e2-8a8d-4e3f-a1ea-f3b0c8dcc4f8, domainName=hosted_storage, domClass=1, typeSpecificArg=tendrl25.lab.eng.blr.redhat.com:/engine domVersion=5 block_size=0, max_hosts=250',)

The corresponding VDSM verb errors out at "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2644, in createStorageDomain (max_hosts=max_hosts).

I have not tested it via the engine UI. I will update the bug if it is reproducible there.

(In reply to Kaustav Majumder from comment #2)
> The error is encountered while deploying ovirt + gluster via cockpit ui,
> exactly during hosted engine setup.
> [...]

A Gluster storage domain can be created via the engine UI with no errors.

Gobinda, are you familiar with such a failure? How is our CI doing?

Hi Yaniv,
We found this issue during oVirt 4.4.0 Beta testing. We are still struggling to make the OST HC master suite pass because of various issues (infra, the move from CentOS 7 to CentOS 8, etc.). OST is not green yet for 4.4.0, but we are working hard to make it pass.

I'm not aware of any issue with regular HE deployment (in this area at least), and since it works later on... could it be that not all of Gluster is set up and running correctly at that point?

Hi Michal,
The Gluster setup was as expected; we tried the cockpit-based deployment and the Gluster volumes were up and running.
This issue is the blocker for HC deployment.

Nir, I have run the script that you attached in bug 1751722 that probes for the block size of the storage domain.

<snip>
import sys
import atexit
from ioprocess import IOProcess

dir_path = sys.argv[1]

iop = IOProcess(timeout=10)
atexit.register(iop.close)

try:
    while True:
        block_size = iop.probe_block_size(dir_path)
        sys.stdout.write("{}\n".format(block_size))
        sys.stdout.flush()
except KeyboardInterrupt:
    pass
</snip>

When I run this on RHV 4.3.9 on a gluster fuse mount, it works fine:

[root@ ~]# python probe.py /mnt/test
512
512

When I run this on a RHV 4.4 build on a gluster fuse mount, I get an error:

[root@ ~]# python3 probe.py /mnt/test
Traceback (most recent call last):
  File "probe.py", line 26, in <module>
    block_size = iop.probe_block_size(dir_path)
  File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 602, in probe_block_size
    "probe_block_size", {"dir": dir_path}, self.timeout)
  File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 448, in _sendCommand
    raise OSError(errcode, errstr)
FileNotFoundError: [Errno 2] No such file or directory

@Nir, what's the change in behavior?

SATHEESARAN, it looks like you are using an old ioprocess.
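For context, the kind of check probe_block_size performs can be approximated in plain Python: attempt O_DIRECT writes with 512- and 4096-byte page-aligned buffers and report the first size the filesystem accepts. The sketch below is our own simplification under that assumption, not the actual ioprocess C implementation; the function name is reused only for illustration.

```python
# Rough sketch (our own, not the ioprocess implementation) of probing the
# I/O alignment a filesystem requires: try O_DIRECT writes of increasing
# size and return the first one the filesystem accepts.
import errno
import mmap
import os
import tempfile

def probe_block_size(dir_path):
    """Return 512 or 4096, the smallest alignment accepted by O_DIRECT
    writes on the filesystem backing dir_path."""
    fd, path = tempfile.mkstemp(dir=dir_path)
    os.close(fd)
    try:
        for size in (512, 4096):
            # An anonymous mmap gives page-aligned memory, as O_DIRECT requires.
            buf = mmap.mmap(-1, size)
            try:
                wfd = os.open(path, os.O_WRONLY | os.O_DIRECT)
                try:
                    os.write(wfd, buf)
                    return size
                except OSError as e:
                    # EINVAL means this alignment was rejected; try the next.
                    if e.errno != errno.EINVAL:
                        raise
                finally:
                    os.close(wfd)
            finally:
                buf.close()
        raise OSError(errno.EINVAL, "could not probe block size")
    finally:
        os.unlink(path)
```

Running this against a gluster fuse mountpoint would be expected to behave like the probe script above: 512 on a healthy mount, an OSError where the bug bites.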
On my RHEL 8.2 with RHV 4.4.0 I have:

$ rpm -q python3-ioprocess ioprocess vdsm
python3-ioprocess-1.4.0-1.el8ev.x86_64
ioprocess-1.4.0-1.el8ev.x86_64
vdsm-4.40.11-8.gitd903d0944.el8.x86_64

Looking in the ioprocess git:

master:
38022f4 (HEAD -> master, origin/master, origin/HEAD, gerrit/master, gluster-shard) ioprocess: Fix compatibility with Gluster shard
d31a211 (tag: v1.4.0) build: Don't add githash and tag for tagged build

ovirt-4.3:
10e8d1b (HEAD -> ovirt-4.3, tag: v1.3.1, origin/ovirt-4.3) Bump version to 1.3.1
da79694 build: Don't add githash and tag for tagged build
d3587de (gerrit/ovirt-4.3, backport/4.3/gluster-shard) ioprocess: Fix compatibility with Gluster shard
d2d9272 (backport/4.3/automation) automation: Use ovirt-4.3 branch for 4.3
4f2b839 build: update build documentation
37425d6 automation: Drop check-merged scripts
9556594 (tag: v1.3.0) spec: Bump version to 1.3.0

So it looks like the fix was never released for ioprocess: on master it landed after the v1.4.0 tag. This explains the issue.

Eyal, can you release ioprocess 1.4.1 from current master? This is a vdsm bug, and vdsm should require ioprocess 1.4.1. Moving back to POST.

The root cause is bug 1753901. For some reason we don't have a 4.4 version of this bug, and ioprocess was not released for el8.

(In reply to Nir Soffer from comment #14)
> The root cause is bug 1753901. For some reason we don't have a 4.4 version
> of this bug, and ioprocess was not released for el8.

Nir, let us know when the corresponding vdsm and ioprocess packages are available; I can do a quick test to make sure they work.

(In reply to SATHEESARAN from comment #15)
The vdsm patch was merged today, so the fix should be available in the next 4.4 build.
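Until vdsm carries the versioned dependency, a host can be checked by hand by comparing the installed ioprocess rpm version against 1.4.1, the first release that contains the Gluster shard fix per the comments above. A minimal sketch; the version_ge and installed_version helpers are ours, not part of vdsm or ioprocess:

```python
# Hedged sketch: report whether the installed ioprocess rpm is new enough
# (>= 1.4.1) to contain the Gluster shard compatibility fix.
import subprocess

def version_ge(a, b):
    """True when dotted numeric version a >= b, e.g. '1.4.1' >= '1.4.0'.
    Assumes purely numeric dotted versions, which rpm VERSION fields
    for ioprocess are."""
    return tuple(int(x) for x in a.split(".")) >= tuple(int(x) for x in b.split("."))

def installed_version(package):
    """Return the rpm VERSION of a package, or None if rpm is missing
    or the package is not installed."""
    try:
        out = subprocess.run(
            ["rpm", "-q", "--qf", "%{VERSION}", package],
            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
            universal_newlines=True)
    except FileNotFoundError:  # no rpm on this system
        return None
    return out.stdout.strip() if out.returncode == 0 else None

if __name__ == "__main__":
    v = installed_version("ioprocess")
    if v is None:
        print("ioprocess not installed (or rpm unavailable)")
    elif version_ge(v, "1.4.1"):
        print("ioprocess %s contains the fix" % v)
    else:
        print("ioprocess %s is too old, upgrade to >= 1.4.1" % v)
```

With the 1.4.0 build from the transcript above this would report "too old"; after the 1.4.1 release it reports that the fix is present.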
Tested with the following components:

python3-ioprocess-1.4.1-1.el8ev.x86_64
ioprocess-1.4.1-1.el8ev.x86_64
vdsm-client-4.40.13-1.el8ev.noarch
vdsm-common-4.40.13-1.el8ev.noarch
vdsm-hook-fcoe-4.40.13-1.el8ev.noarch
vdsm-api-4.40.13-1.el8ev.noarch
vdsm-hook-openstacknet-4.40.13-1.el8ev.noarch
vdsm-network-4.40.13-1.el8ev.x86_64
vdsm-jsonrpc-4.40.13-1.el8ev.noarch
vdsm-hook-vmfex-dev-4.40.13-1.el8ev.noarch
vdsm-yajsonrpc-4.40.13-1.el8ev.noarch
vdsm-python-4.40.13-1.el8ev.noarch
vdsm-4.40.13-1.el8ev.x86_64
vdsm-hook-vhostmd-4.40.13-1.el8ev.noarch
vdsm-hook-ethtool-options-4.40.13-1.el8ev.noarch
vdsm-http-4.40.13-1.el8ev.noarch
vdsm-gluster-4.40.13-1.el8ev.x86_64

Now the glusterfs storage domain can be mounted successfully during HE deployment. There is still one more issue, where the Hosted Engine VM is unable to boot; I will raise a separate bug for that.

This bugzilla is included in the oVirt 4.4.0 release, published on May 20th 2020. Since the problem described in this bug report should be resolved in oVirt 4.4.0, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.