Bug 1823033

Summary: Storage domain does not exist when another folder with similar name exists
Product: [oVirt] ovirt-engine
Reporter: Strahil Nikolov <hunter86_bg>
Component: BLL.Storage
Assignee: Tal Nisan <tnisan>
Status: CLOSED NOTABUG
QA Contact: Avihai <aefrat>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.3.9.4
CC: bugs, nsoffer
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-04-11 09:50:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments: VDSM DEBUG Log from SPM (flags: none)

Description Strahil Nikolov 2020-04-11 03:43:57 UTC
Created attachment 1677968: VDSM DEBUG Log from SPM

Description of problem:
The engine fails to activate or detach a storage domain if the mount contains a second directory with a similar name:
[root@ovirt3 gluster1:_data__fast4]# ll
total 1
drwxrwxr-x. 5 vdsm kvm 48 10 Nov 13,42 578bca3d-6540-41cd-8e0e-9e3047026484
drwxrwxr-x. 5 vdsm kvm 48 10 Nov 13,42 578bca3d-6540-41cd-8e0e-9e3047026484-NEW

[root@ovirt3 gluster1:_data__fast]# ll
total 1
drwxrwxrwx. 6 vdsm kvm 59 11 Apr  6,30 396604d9-2a9e-49cd-9563-fdc79981f67b
drwxr-xr-x. 5 vdsm kvm 48 19 Nov 21,32 396604d9-2a9e-49cd-9563-fdc79981f67b-OLD


Version-Release number of selected component (if applicable):
vdsm-api-4.30.43-1.el7.noarch
vdsm-4.30.43-1.el7.x86_64
vdsm-http-4.30.43-1.el7.noarch
vdsm-hook-openstacknet-4.30.43-1.el7.noarch
vdsm-yajsonrpc-4.30.43-1.el7.noarch
vdsm-jsonrpc-4.30.43-1.el7.noarch
vdsm-hook-fcoe-4.30.43-1.el7.noarch
vdsm-hook-vhostmd-4.30.43-1.el7.noarch
vdsm-network-4.30.43-1.el7.x86_64
vdsm-hook-vmfex-dev-4.30.43-1.el7.noarch
vdsm-hook-ethtool-options-4.30.43-1.el7.noarch
vdsm-python-4.30.43-1.el7.noarch
vdsm-client-4.30.43-1.el7.noarch
vdsm-common-4.30.43-1.el7.noarch
vdsm-gluster-4.30.43-1.el7.x86_64


How reproducible:
Always - 4 storage domains were affected

Steps to Reproduce:
1. Put a storage domain into maintenance
2. Create a copy of the domain directory named <SD uuid>-NEW
3. Swap the old and the new directories:
mv <SD uuid> <SD uuid>-OLD
mv <SD uuid>-NEW <SD uuid>
4. Try to activate the domain

Actual results:
Activating the domain fails with this traceback:

2020-04-11 06:27:17,434+0300 ERROR (jsonrpc/2) [storage.TaskManager.Task] (Task='8ad73c21-cc9c-44ec-ab83-ced16d0bf748') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in activateStorageDomain
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1261, in activateStorageDomain
    pool.activateSD(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1138, in activateSD
    dom = sdCache.produce(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110, in produce
    domain.getRealDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
    return findMethod(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 176, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'396604d9-2a9e-49cd-9563-fdc79981f67b',)

Expected results:
The engine should get the <SD uuid> from the DB and look for a directory that matches it EXACTLY, not for any directory whose name contains the <SD uuid>.

Additional info:
Debug logs attached.

Comment 1 Nir Soffer 2020-04-11 09:50:53 UTC
This is caused by searching for storage domain directories using this
pattern:

    /rhev/data-center/mnt/glusterSD/*-*-*-*-*/dom_md

We expect to find exactly one item per mountpoint, since this is the
directory structure we create.
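
To see why the extra directory matters, here is a minimal Python 3 sketch
that applies the same glob pattern to a temporary directory laid out like
the mount from comment 0. It is not the vdsm code; the helper name
find_domain_dirs and the temporary mountpoint are assumptions made for
illustration. It shows only that the pattern matches both the real domain
directory and the renamed copy.

```python
import glob
import os
import tempfile

SD_UUID = "396604d9-2a9e-49cd-9563-fdc79981f67b"


def find_domain_dirs(mountpoint):
    """Return names of directories under mountpoint that match */dom_md
    with the five-part dash pattern (hypothetical helper, not vdsm API)."""
    pattern = os.path.join(mountpoint, "*-*-*-*-*", "dom_md")
    return sorted(
        os.path.basename(os.path.dirname(path)) for path in glob.glob(pattern)
    )


# Build a throwaway "mount" holding the domain and its renamed copy.
with tempfile.TemporaryDirectory() as mountpoint:
    for name in (SD_UUID, SD_UUID + "-OLD"):
        os.makedirs(os.path.join(mountpoint, name, "dom_md"))
    matches = find_domain_dirs(mountpoint)

# "*" also matches dashes, so the "-OLD" directory satisfies the pattern too.
print(matches)
```

Both names are printed, so the mount no longer contains exactly one
matching item.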

We don't support user-created files or directories in a storage domain
mount.

To perform the operations described in comment 0, you can use this directory
structure instead:

    578bca3d-6540-41cd-8e0e-9e3047026484
    new/578bca3d-6540-41cd-8e0e-9e3047026484

    396604d9-2a9e-49cd-9563-fdc79981f67b
    old/396604d9-2a9e-49cd-9563-fdc79981f67b

But I would avoid this. Instead, you can do the same on the server side. Assuming
that the directory structure on the NFS server is:

    /export/
        data_fast4/
            578bca3d-6540-41cd-8e0e-9e3047026484/
                dom_md/

Create the copy of the domain at:

    /export/
        data_fast4/
            578bca3d-6540-41cd-8e0e-9e3047026484/
                dom_md/
        data_fast4-new/
            578bca3d-6540-41cd-8e0e-9e3047026484/
                dom_md/

With this layout, oVirt cannot be affected, since it does not see
/export/data_fast4-new, but the operations on the file system on the server
side are the same.
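
The server-side layout can be sketched in Python 3 as follows. This is an
illustration under assumptions, not a supported procedure: the helper names
are hypothetical, a temporary directory stands in for /export, and
shutil.copytree does not preserve file ownership, which a real copy of a
domain owned by vdsm:kvm would need to keep.

```python
import glob
import os
import shutil
import tempfile

SD_UUID = "578bca3d-6540-41cd-8e0e-9e3047026484"


def copy_export(export_root, export_name, suffix="-new"):
    """Copy a whole export directory to a sibling directory
    (hypothetical helper; ownership is NOT preserved)."""
    src = os.path.join(export_root, export_name)
    dst = os.path.join(export_root, export_name + suffix)
    shutil.copytree(src, dst)
    return dst


def domains_in(export_dir):
    """List directories in export_dir that match the search pattern."""
    pattern = os.path.join(export_dir, "*-*-*-*-*", "dom_md")
    return [os.path.basename(os.path.dirname(p)) for p in glob.glob(pattern)]


# A temporary directory stands in for /export on the storage server.
with tempfile.TemporaryDirectory() as export_root:
    os.makedirs(os.path.join(export_root, "data_fast4", SD_UUID, "dom_md"))
    copy_dir = copy_export(export_root, "data_fast4")
    visible = domains_in(os.path.join(export_root, "data_fast4"))
    in_copy = domains_in(copy_dir)

# The mounted export still holds exactly one matching directory.
print(visible, in_copy)
```

Because the copy lives in a sibling export rather than inside the mounted
one, the mounted export keeps a single matching directory.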

Closing since this is not a bug.