Bug 1820283

Summary: Error creating storage domain via vdsm
Product: [oVirt] vdsm
Component: Gluster
Version: 4.40.11
Hardware: Unspecified
OS: Unspecified
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Reporter: Kaustav Majumder <kmajumde>
Assignee: Nir Soffer <nsoffer>
QA Contact: SATHEESARAN <sasundar>
Docs Contact:
CC: bugs, eshenitz, godas, michal.skrivanek, nsoffer, sasundar
Target Milestone: ovirt-4.4.0
Target Release: ---
Keywords: Regression
Flags: sasundar: ovirt-4.4?
       sasundar: blocker?
       sasundar: planning_ack?
       sasundar: devel_ack?
       sasundar: testing_ack?
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1821288 (view as bug list)
Environment:
Last Closed: 2020-05-20 20:01:41 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1821288

Attachments:
vdsm.log (flags: none)

Description Kaustav Majumder 2020-04-02 16:30:23 UTC
Created attachment 1675826 [details]
vdsm.log

Description of problem:
VDSM errors out while creating a new Gluster Storage Domain.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. From ovirt-ansible-hosted-engine-setup, create a new Gluster storage domain.

Actual results:
VDSM errors out and the storage domain is not created.
ERROR (jsonrpc/3) [storage.Dispatcher] FINISH createStorageDomain error=[Errno 2] No such file or directory (dispatcher:87)

Expected results:
The Gluster storage domain should be created with no VDSM errors.

Additional info:

No directory is created at /rhev/data-center/mnt/glusterSD, and hence the Gluster volume is not mounted, although the log shows otherwise.
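
For reference, a quick way to check this symptom on the host is to look at the vdsm mount root directly. Below is a minimal sketch (assuming the default /rhev/data-center/mnt/glusterSD mount root mentioned above; adjust the path if the setup differs):

<snip>
# Sketch: report whether any Gluster storage domain is actually mounted
# under the default vdsm mount root.
import os

MOUNT_ROOT = "/rhev/data-center/mnt/glusterSD"  # default vdsm location (assumption)

if not os.path.isdir(MOUNT_ROOT):
    print("mount root does not exist: {}".format(MOUNT_ROOT))
else:
    for name in sorted(os.listdir(MOUNT_ROOT)):
        path = os.path.join(MOUNT_ROOT, name)
        # ismount() is True only if the gluster volume is really mounted here.
        print("{}: mounted={}".format(path, os.path.ismount(path)))
</snip>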

Comment 1 Nir Soffer 2020-04-02 17:00:32 UTC
Can you describe how ovirt-ansible-hosted-engine-setup uses engine 
or vdsm?

Does it use the vdsm API directly or via engine? If via engine, what is
the minimal call sequence that reproduces this issue?

Is this reproducible via the engine UI? If it is, what are the steps
to reproduce?

If you cannot answer these questions, what are the steps to reproduce
using the mentioned ansible script? We need something that a developer
can run to reproduce the issue locally.

Comment 2 Kaustav Majumder 2020-04-02 17:13:33 UTC
The error is encountered while deploying oVirt + Gluster via the Cockpit UI, specifically during hosted engine setup.
This is a call to an Ansible playbook: https://github.com/gluster/gluster-ansible/tree/master/playbooks/hc-ansible-deployment.
This internally calls the ovirt-ansible-hosted-engine-setup role.
The playbook errors at https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/blob/master/tasks/create_storage_domain.yml#L56

The role calls oVirt Engine, which is deployed as a temporary VM, HostedEngineLocal.
Checking the engine logs points to a failure creating the storage domain -> https://pastebin.com/35JQyuGE
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-1) [6ac6e81f] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM tendrl25.lab.eng.blr.redhat.com command CreateStorageDomainVDS failed: Error creating a storage domain: ('storageType=7, sdUUID=508909e2-8a8d-4e3f-a1ea-f3b0c8dcc4f8, domainName=hosted_storage, domClass=1, typeSpecificArg=tendrl25.lab.eng.blr.redhat.com:/engine domVersion=5block_size=0, max_hosts=250',)

The corresponding VDSM verb errors out at "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2644, in createStorageDomain
    max_hosts=max_hosts)
I have not tested it via the engine UI. Will update the bug if it is reproducible.

Comment 3 Kaustav Majumder 2020-04-02 17:23:04 UTC
(In reply to Kaustav Majumder from comment #2)
The Gluster storage domain can be created via the engine UI with no errors.

Comment 4 Yaniv Kaul 2020-04-02 18:58:39 UTC
Gobinda, are you familiar with such a failure? How's our CI doing?

Comment 5 Gobinda Das 2020-04-03 06:55:56 UTC
Hi Yaniv,
 We found this issue during oVirt 4.4.0 Beta testing. We are still struggling to make the OST HC master suite pass because of various issues (infra, the move from CentOS 7 to CentOS 8, etc.). OST is not green yet for 4.4.0, but we are working hard to make it succeed.

Comment 6 Michal Skrivanek 2020-04-03 12:47:02 UTC
I'm not aware of any issue with regular HE deployment (in this area at least), and since it works later on... could it be that not all of Gluster is set up/running correctly at that point?

Comment 7 Gobinda Das 2020-04-06 13:51:23 UTC
Hi Michal,
 The Gluster setup was as expected; we tried the Cockpit-based deployment and the Gluster volumes were up and running.

Comment 9 SATHEESARAN 2020-04-08 02:53:55 UTC
This issue is a blocker for HC deployment.

Comment 10 SATHEESARAN 2020-04-08 03:28:17 UTC
Nir,

I have run the script that you attached in bug 1751722, which probes the block size of the storage domain.

<snip>
import sys
import atexit

from ioprocess import IOProcess

dir_path = sys.argv[1]

iop = IOProcess(timeout=10)
atexit.register(iop.close)

try:
    while True:
        block_size = iop.probe_block_size(dir_path)
        sys.stdout.write("{}\n".format(block_size))
        sys.stdout.flush()
except KeyboardInterrupt:
    pass
</snip>


When I run this on RHV 4.3.9 on a Gluster FUSE mount, it works fine:
[root@ ~]# python probe.py /mnt/test
512
512

When I run this on an RHV 4.4 build on a Gluster FUSE mount, I get an error:
[root@ ~]# python3 probe.py /mnt/test
Traceback (most recent call last):
  File "probe.py", line 26, in <module>
    block_size = iop.probe_block_size(dir_path)
  File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 602, in probe_block_size
    "probe_block_size", {"dir": dir_path}, self.timeout)
  File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 448, in _sendCommand
    raise OSError(errcode, errstr)
FileNotFoundError: [Errno 2] No such file or directory

@Nir, what's the change in behavior?

Comment 11 Nir Soffer 2020-04-08 07:02:09 UTC
SATHEESARAN, it looks like you are using an old ioprocess.

On my rhel 8.2 with RHV 4.4.0 I have:

$ rpm -q python3-ioprocess ioprocess vdsm
python3-ioprocess-1.4.0-1.el8ev.x86_64
ioprocess-1.4.0-1.el8ev.x86_64
vdsm-4.40.11-8.gitd903d0944.el8.x86_64

Looking in the ioprocess git:

master:
38022f4 (HEAD -> master, origin/master, origin/HEAD, gerrit/master, gluster-shard) ioprocess: Fix compatibility with Gluster shard
d31a211 (tag: v1.4.0) build: Don't add githash and tag for tagged build


ovirt-4.3:
10e8d1b (HEAD -> ovirt-4.3, tag: v1.3.1, origin/ovirt-4.3) Bump version to 1.3.1
da79694 build: Don't add githash and tag for tagged build
d3587de (gerrit/ovirt-4.3, backport/4.3/gluster-shard) ioprocess: Fix compatibility with Gluster shard
d2d9272 (backport/4.3/automation) automation: Use ovirt-4.3 branch for 4.3
4f2b839 build: update build documentation
37425d6 automation: Drop check-merged scripts
9556594 (tag: v1.3.0) spec: Bump version to 1.3.0

So it looks like the fix was never released for ioprocess. This explains the issue.

Eyal, can you release ioprocess 1.4.1 from current master?
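
A quick way to check whether a host already has a new enough ioprocess is to compare the installed rpm version with 1.4.1, where the Gluster shard fix is first released. A minimal sketch, shelling out to rpm as in the query above:

<snip>
# Sketch: warn if the installed ioprocess predates the Gluster shard fix.
import subprocess

REQUIRED = (1, 4, 1)  # first release containing the shard compatibility fix

def installed_version(package):
    try:
        out = subprocess.check_output(
            ["rpm", "-q", "--qf", "%{VERSION}", package],
            universal_newlines=True)
    except subprocess.CalledProcessError:
        return None  # package not installed
    return out.strip()

version = installed_version("ioprocess")
if version is None:
    print("ioprocess is not installed")
elif tuple(int(x) for x in version.split(".")) >= REQUIRED:
    print("ioprocess {} is new enough".format(version))
else:
    print("ioprocess {} is too old, need >= 1.4.1".format(version))
</snip>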

Comment 13 Nir Soffer 2020-04-08 10:10:49 UTC
This is a vdsm bug, and vdsm should require ioprocess 1.4.1.

Moving back to POST.

Comment 14 Nir Soffer 2020-04-08 10:25:11 UTC
The root cause is bug 1753901. For some reason we don't have a 4.4 version for
this bug and ioprocess was not released for el8.

Comment 15 SATHEESARAN 2020-04-08 13:24:13 UTC
(In reply to Nir Soffer from comment #14)
> The root cause is bug 1753901. For some reason we don't have a 4.4 version
> for
> this bug and ioprocess was not released for el8.

Nir,

Let us know when the corresponding vdsm and ioprocess packages are available,
and I can do a quick test to make sure it works fine.

Comment 16 Nir Soffer 2020-04-09 10:28:01 UTC
(In reply to SATHEESARAN from comment #15)
The vdsm patch was merged today, so the fix should be available
in the next 4.4 build.

Comment 22 SATHEESARAN 2020-04-13 16:14:24 UTC
Tested with following components:
python3-ioprocess-1.4.1-1.el8ev.x86_64
ioprocess-1.4.1-1.el8ev.x86_64

vdsm-client-4.40.13-1.el8ev.noarch
vdsm-common-4.40.13-1.el8ev.noarch
vdsm-hook-fcoe-4.40.13-1.el8ev.noarch
vdsm-api-4.40.13-1.el8ev.noarch
vdsm-hook-openstacknet-4.40.13-1.el8ev.noarch
vdsm-network-4.40.13-1.el8ev.x86_64
vdsm-jsonrpc-4.40.13-1.el8ev.noarch
vdsm-hook-vmfex-dev-4.40.13-1.el8ev.noarch
vdsm-yajsonrpc-4.40.13-1.el8ev.noarch
vdsm-python-4.40.13-1.el8ev.noarch
vdsm-4.40.13-1.el8ev.x86_64

vdsm-hook-vhostmd-4.40.13-1.el8ev.noarch
vdsm-hook-ethtool-options-4.40.13-1.el8ev.noarch
vdsm-http-4.40.13-1.el8ev.noarch
vdsm-gluster-4.40.13-1.el8ev.x86_64

Now the GlusterFS storage domain can be mounted successfully during HE deployment.

There is still another issue, where the Hosted Engine VM is unable to boot, for which
I will raise a separate bug.

Comment 23 Sandro Bonazzola 2020-05-20 20:01:41 UTC
This bugzilla is included in oVirt 4.4.0 release, published on May 20th 2020.

Since the problem described in this bug report should be
resolved in oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.