Bug 1823033 - Storage domain does not exist when another folder with similar name exists
Summary: Storage domain does not exist when another folder with similar name exists
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.3.9.4
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Tal Nisan
QA Contact: Avihai
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-04-11 03:43 UTC by Strahil Nikolov
Modified: 2020-04-11 09:50 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2020-04-11 09:50:53 UTC
oVirt Team: Storage
Embargoed:


Attachments
VDSM DEBUG Log from SPM (50.14 KB, text/plain)
2020-04-11 03:43 UTC, Strahil Nikolov

Description Strahil Nikolov 2020-04-11 03:43:57 UTC
Created attachment 1677968
VDSM DEBUG Log from SPM

Description of problem:
The engine fails to activate or detach a storage domain if the mount contains a second directory with a similar name:
[root@ovirt3 gluster1:_data__fast4]# ll
total 1
drwxrwxr-x. 5 vdsm kvm 48 10 Nov 13,42 578bca3d-6540-41cd-8e0e-9e3047026484
drwxrwxr-x. 5 vdsm kvm 48 10 Nov 13,42 578bca3d-6540-41cd-8e0e-9e3047026484-NEW

[root@ovirt3 gluster1:_data__fast]# ll
total 1
drwxrwxrwx. 6 vdsm kvm 59 11 Apr  6,30 396604d9-2a9e-49cd-9563-fdc79981f67b
drwxr-xr-x. 5 vdsm kvm 48 19 Nov 21,32 396604d9-2a9e-49cd-9563-fdc79981f67b-OLD


Version-Release number of selected component (if applicable):
vdsm-api-4.30.43-1.el7.noarch
vdsm-4.30.43-1.el7.x86_64
vdsm-http-4.30.43-1.el7.noarch
vdsm-hook-openstacknet-4.30.43-1.el7.noarch
vdsm-yajsonrpc-4.30.43-1.el7.noarch
vdsm-jsonrpc-4.30.43-1.el7.noarch
vdsm-hook-fcoe-4.30.43-1.el7.noarch
vdsm-hook-vhostmd-4.30.43-1.el7.noarch
vdsm-network-4.30.43-1.el7.x86_64
vdsm-hook-vmfex-dev-4.30.43-1.el7.noarch
vdsm-hook-ethtool-options-4.30.43-1.el7.noarch
vdsm-python-4.30.43-1.el7.noarch
vdsm-client-4.30.43-1.el7.noarch
vdsm-common-4.30.43-1.el7.noarch
vdsm-gluster-4.30.43-1.el7.x86_64


How reproducible:
Always - 4 storage domains were affected

Steps to Reproduce:
1. Put a storage domain into maintenance
2. Create a copy of the domain directory named <SD uuid>-NEW
3. Swap the old directory and the new one:
mv <SD uuid> <SD uuid>-OLD
mv <SD uuid>-NEW <SD uuid>
4. Try to activate the domain

Actual results:
Activating the domain fails with this traceback:

2020-04-11 06:27:17,434+0300 ERROR (jsonrpc/2) [storage.TaskManager.Task] (Task='8ad73c21-cc9c-44ec-ab83-ced16d0bf748') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in activateStorageDomain
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1261, in activateStorageDomain
    pool.activateSD(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1138, in activateSD
    dom = sdCache.produce(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110, in produce
    domain.getRealDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
    return findMethod(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 176, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'396604d9-2a9e-49cd-9563-fdc79981f67b',)

Expected results:
The engine should take the <SD uuid> from the DB and look for a directory whose name matches it EXACTLY, not for any directory whose name contains the <SD uuid>

Additional info:
Debug logs attached.

Comment 1 Nir Soffer 2020-04-11 09:50:53 UTC
This is caused by searching for storage domain directories using this
pattern:

    /rhev/data-center/mnt/glusterSD/*-*-*-*-*/dom_md

We expect to find exactly one item per mountpoint, since this is the
directory structure we create.

We don't support user-created files or directories in a storage domain
mount.
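
As a minimal illustration (not vdsm's actual code), the directory part of the
pattern quoted above can be checked against the names from comment 0 with
Python's fnmatch module. The name UUID_GLOB is made up for this sketch:

```python
from fnmatch import fnmatch

# Directory-level glob taken from the pattern quoted above.
# UUID_GLOB is a name chosen for this sketch, not a vdsm identifier.
UUID_GLOB = "*-*-*-*-*"

# Directory names as listed in comment 0.
names = [
    "396604d9-2a9e-49cd-9563-fdc79981f67b",
    "396604d9-2a9e-49cd-9563-fdc79981f67b-OLD",
]

# Any name with at least four hyphens matches, so the renamed copy
# is picked up as well as the real domain directory.
matches = [name for name in names if fnmatch(name, UUID_GLOB)]
print(len(matches))  # prints 2
```

Both names match, so the mountpoint yields two candidates instead of the
single one that is expected.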

To perform operations described in comment 0, you can use this directory
structure instead:

    
    578bca3d-6540-41cd-8e0e-9e3047026484
    new/578bca3d-6540-41cd-8e0e-9e3047026484

    396604d9-2a9e-49cd-9563-fdc79981f67b
    old/396604d9-2a9e-49cd-9563-fdc79981f67b

But I would avoid this. Instead you can do this on the server side. Assuming
that the directory structure on the NFS server is:

    /export/
        data_fast4/
            578bca3d-6540-41cd-8e0e-9e3047026484/
                dom_md/

Create the copy of the domain at:

    /export/
        data_fast4/
            578bca3d-6540-41cd-8e0e-9e3047026484/
                dom_md/
        data_fast4-new/
            578bca3d-6540-41cd-8e0e-9e3047026484/
                dom_md/

With this layout oVirt is not affected, since it never sees /export/data_fast4-new,
while the operations on the file system on the server side stay the same.
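
The three layouts discussed in this bug can be compared with a small sketch
that builds them in a temporary directory and applies the same kind of glob.
This is only an approximation under stated assumptions: find_domains and
make_domain are hypothetical helpers written for this example, not vdsm
functions, and the real lookup in vdsm may differ in detail.

```python
import glob
import os
import tempfile

UUID_GLOB = "*-*-*-*-*"  # directory-level glob from the pattern above
SD_UUID = "578bca3d-6540-41cd-8e0e-9e3047026484"


def find_domains(mountpoint):
    """Return directories directly under mountpoint that match UUID_GLOB
    and contain dom_md (hypothetical helper for this sketch)."""
    pattern = os.path.join(glob.escape(mountpoint), UUID_GLOB, "dom_md")
    return sorted(os.path.dirname(p) for p in glob.glob(pattern))


def make_domain(*parts):
    """Create an empty <parts...>/dom_md directory tree."""
    os.makedirs(os.path.join(*parts, "dom_md"))


with tempfile.TemporaryDirectory() as root:
    # Layout from comment 0: the copy sits next to the domain in the mount.
    side_by_side = os.path.join(root, "side_by_side")
    make_domain(side_by_side, SD_UUID)
    make_domain(side_by_side, SD_UUID + "-NEW")

    # Nested layout: the copy lives under a plain "new" directory.
    nested = os.path.join(root, "nested")
    make_domain(nested, SD_UUID)
    make_domain(nested, "new", SD_UUID)

    # Server-side layout: the copy lives in a sibling export; only
    # data_fast4 plays the role of the mounted directory here.
    export = os.path.join(root, "export")
    make_domain(export, "data_fast4", SD_UUID)
    make_domain(export, "data_fast4-new", SD_UUID)

    side_by_side_count = len(find_domains(side_by_side))
    nested_count = len(find_domains(nested))
    export_count = len(find_domains(os.path.join(export, "data_fast4")))

print(side_by_side_count, nested_count, export_count)  # prints: 2 1 1
```

Only the side-by-side layout from comment 0 produces two matches. The "new"
directory contains no hyphens, so the glob skips it, and the sibling export is
outside the scanned mountpoint altogether.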

Closing since this is not a bug.
