Bug 1280225

Summary: A broken symlink in an ISO storage domain causes the whole domain to be listed as empty, hiding the other images
Product: [oVirt] vdsm Reporter: Simone Tiraboschi <stirabos>
Component: General Assignee: Idan Shaby <ishaby>
Status: CLOSED DUPLICATE QA Contact: Aharon Canan <acanan>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.17.10 CC: amureini, bugs, laravot, nsoffer, s.danzi, tnisan
Target Milestone: ovirt-3.6.2 Flags: amureini: ovirt-3.6.z?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?
Target Release: 4.17.10   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: storage
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-11-19 12:35:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Simone Tiraboschi 2015-11-11 09:30:16 UTC
Description of problem:
This was reported by an upstream user.
He was using an NFS ISO storage domain; by mistake, he manually created a symlink to an ISO file outside that storage domain, on the host that exports the NFS share for the ISO storage domain.
Of course this is wrong: the NFS server does not follow the symlink, so the NFS client (our VDSM host) simply tries to follow it locally, and if the same file does not exist at the same path on every host, we just get a broken symlink.

This leads to the subsequent issue, which is the subject of this bug:
If just one of the files is not accessible (the broken symlink in this case), StorageDomain.getFileStats raises an exception, the whole task is aborted, and the engine shows an empty storage domain even though other files are there.
Catching that exception and showing the other valid files would probably make the domain more usable.
Currently this appears in the events area: Refresh image list failed for domain(s): testiso (All file type). Please check domain activity.

Thread-40273::DEBUG::2015-11-11 10:03:44,309::__init__::481::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'StorageDomain.getFileStats' in bridge with {u'caseSensitive': False, u'pattern': u'*.iso', u'storagedomainID': u'1ffcd41b-b204-456f-be0e-1b22cd94da9f'}
Thread-40273::DEBUG::2015-11-11 10:03:44,310::task::595::Storage.TaskManager.Task::(_updateState) Task=`a1b68de1-2bdb-4aac-a40f-c980d3ecc5ea`::moving from state init -> state preparing
Thread-40273::INFO::2015-11-11 10:03:44,310::logUtils::44::dispatcher::(wrapper) Run and protect: getFileStats(sdUUID=u'1ffcd41b-b204-456f-be0e-1b22cd94da9f', pattern=u'*.iso', caseSensitive=False, options=None)
Thread-40273::DEBUG::2015-11-11 10:03:44,310::resourceManager::198::Storage.ResourceManager.Request::(__init__) ResName=`Storage.1ffcd41b-b204-456f-be0e-1b22cd94da9f`ReqID=`08685966-75bf-471f-b911-a5a43430f9f7`::Request was made in '/usr/share/vdsm/storage/hsm.py' line '2317' at 'getFileStats'
Thread-40273::DEBUG::2015-11-11 10:03:44,311::resourceManager::542::Storage.ResourceManager::(registerResource) Trying to register resource 'Storage.1ffcd41b-b204-456f-be0e-1b22cd94da9f' for lock type 'shared'
Thread-40273::DEBUG::2015-11-11 10:03:44,311::resourceManager::601::Storage.ResourceManager::(registerResource) Resource 'Storage.1ffcd41b-b204-456f-be0e-1b22cd94da9f' is free. Now locking as 'shared' (1 active user)
Thread-40273::DEBUG::2015-11-11 10:03:44,311::resourceManager::238::Storage.ResourceManager.Request::(grant) ResName=`Storage.1ffcd41b-b204-456f-be0e-1b22cd94da9f`ReqID=`08685966-75bf-471f-b911-a5a43430f9f7`::Granted request
Thread-40273::DEBUG::2015-11-11 10:03:44,311::task::827::Storage.TaskManager.Task::(resourceAcquired) Task=`a1b68de1-2bdb-4aac-a40f-c980d3ecc5ea`::_resourcesAcquired: Storage.1ffcd41b-b204-456f-be0e-1b22cd94da9f (shared)
Thread-40273::DEBUG::2015-11-11 10:03:44,311::task::993::Storage.TaskManager.Task::(_decref) Task=`a1b68de1-2bdb-4aac-a40f-c980d3ecc5ea`::ref 1 aborting False
Thread-40273::ERROR::2015-11-11 10:03:44,313::task::866::Storage.TaskManager.Task::(_setError) Task=`a1b68de1-2bdb-4aac-a40f-c980d3ecc5ea`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2324, in getFileStats
    caseSensitive=caseSensitive)
  File "/usr/share/vdsm/storage/fileSD.py", line 271, in getFileList
    filesList = self.oop.simpleWalk(basedir)
  File "/usr/share/vdsm/storage/outOfProcess.py", line 373, in simpleWalk
    if osPath.isdir(fullpath) and not osPath.islink(fullpath):
  File "/usr/share/vdsm/storage/outOfProcess.py", line 286, in isdir
    res = self._iop.stat(path)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 414, in stat
    resdict = self._sendCommand("stat", {"path": path}, self.timeout)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 391, in _sendCommand
    raise OSError(errcode, errstr)
OSError: [Errno 13] Permission denied
Thread-40273::DEBUG::2015-11-11 10:03:44,338::task::885::Storage.TaskManager.Task::(_run) Task=`a1b68de1-2bdb-4aac-a40f-c980d3ecc5ea`::Task._run: a1b68de1-2bdb-4aac-a40f-c980d3ecc5ea (u'1ffcd41b-b204-456f-be0e-1b22cd94da9f', u'*.iso', False) {} failed - stopping task
Thread-40273::DEBUG::2015-11-11 10:03:44,338::task::1217::Storage.TaskManager.Task::(stop) Task=`a1b68de1-2bdb-4aac-a40f-c980d3ecc5ea`::stopping in state preparing (force False)
Thread-40273::DEBUG::2015-11-11 10:03:44,338::task::993::Storage.TaskManager.Task::(_decref) Task=`a1b68de1-2bdb-4aac-a40f-c980d3ecc5ea`::ref 1 aborting True
Thread-40273::INFO::2015-11-11 10:03:44,338::task::1171::Storage.TaskManager.Task::(prepare) Task=`a1b68de1-2bdb-4aac-a40f-c980d3ecc5ea`::aborting: Task is aborted: u'[Errno 13] Permission denied' - code 100
Thread-40273::DEBUG::2015-11-11 10:03:44,338::task::1176::Storage.TaskManager.Task::(prepare) Task=`a1b68de1-2bdb-4aac-a40f-c980d3ecc5ea`::Prepare: aborted: [Errno 13] Permission denied
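
The traceback above shows the walk failing inside an isdir check when the underlying stat call raises OSError. The following is a minimal sketch of that failure mode, not the vdsm code: `strict_isdir`, `walk_strict` and `demo` are hypothetical names, and the stat call is the plain `os.stat` rather than the ioprocess one. Note that a locally dangling symlink yields ENOENT here, whereas the log above shows EACCES (Errno 13); in both cases a single failing entry aborts the whole listing.

```python
import errno
import os
import stat
import tempfile


def strict_isdir(path):
    # Hypothetical helper: unlike os.path.isdir, it lets OSError from stat propagate.
    return stat.S_ISDIR(os.stat(path).st_mode)


def walk_strict(top):
    # Simplified recursive listing that aborts on the first stat failure.
    files = []
    for name in sorted(os.listdir(top)):
        full = os.path.join(top, name)
        if strict_isdir(full) and not os.path.islink(full):
            files.extend(walk_strict(full))
        else:
            files.append(full)
    return files


def demo():
    # Build a directory with one valid file and one dangling symlink,
    # then report the error (if any) raised by the strict walk.
    with tempfile.TemporaryDirectory() as top:
        open(os.path.join(top, "valid.iso"), "w").close()
        os.symlink(os.path.join(top, "missing-target.iso"),
                   os.path.join(top, "broken.iso"))
        try:
            walk_strict(top)
        except OSError as e:
            return e.errno
    return None
```

Calling `demo()` returns the errno of the exception, and the valid file is never returned to the caller, which mirrors the empty list seen in the engine.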



Version-Release number of selected component (if applicable):
4.17.10

How reproducible:
100%

Steps to Reproduce:
1. Create an NFS ISO storage domain and add a valid ISO to it
2. Manually create a broken symlink within that NFS share
3. Try to refresh the ISO storage domain images list
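
For experimenting outside a real NFS share, the directory layout from steps 1 and 2 can be approximated locally. This is only a sketch under assumptions: the file names are made up, a temporary directory stands in for the exported share, and `make_iso_dir_with_broken_link` and `is_broken_symlink` are hypothetical helpers.

```python
import os
import tempfile


def make_iso_dir_with_broken_link(top):
    # Create a placeholder "ISO" file (content is irrelevant for the listing).
    valid = os.path.join(top, "valid.iso")
    with open(valid, "wb") as f:
        f.write(b"\0" * 16)
    # Create a symlink whose target does not exist.
    broken = os.path.join(top, "broken.iso")
    os.symlink(os.path.join(top, "no-such-target.iso"), broken)
    return valid, broken


def is_broken_symlink(path):
    # islink() inspects the link itself; exists() follows it and
    # returns False when the target cannot be reached.
    return os.path.islink(path) and not os.path.exists(path)
```

A check like `is_broken_symlink` could be used to find the offending entry in the share before refreshing the images list.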

Actual results:
The whole storage domain appears empty; the other valid images are hidden.

Expected results:
A specific error is reported about the broken image, specifying its name; the other valid images remain usable.
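
One way to get this behaviour is a listing that records per-file failures instead of aborting. The sketch below is an illustration only and not the actual fix: `list_images` and `_matches` are hypothetical names, the `pattern` and `case_sensitive` parameters merely echo the getFileStats arguments seen in the log, and plain `os.stat` stands in for the out-of-process call.

```python
import fnmatch
import os
import stat
import tempfile


def _matches(name, pattern, case_sensitive):
    # fnmatchcase never normalizes case, so lower both sides when needed.
    if case_sensitive:
        return fnmatch.fnmatchcase(name, pattern)
    return fnmatch.fnmatchcase(name.lower(), pattern.lower())


def list_images(top, pattern="*.iso", case_sensitive=False):
    """Return (matches, errors): matching files and a path -> message map."""
    matches, errors = [], {}
    pending = [top]
    while pending:
        current = pending.pop()
        for name in sorted(os.listdir(current)):
            full = os.path.join(current, name)
            try:
                mode = os.stat(full).st_mode
            except OSError as e:
                # Record the inaccessible entry by name and keep going.
                errors[full] = e.strerror or str(e)
                continue
            if stat.S_ISDIR(mode) and not os.path.islink(full):
                pending.append(full)
            elif _matches(name, pattern, case_sensitive):
                matches.append(full)
    return matches, errors
```

The caller can then show the matches as usual and surface the `errors` entries as a warning that names each broken image.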

Additional info:

Comment 1 Allon Mureinik 2015-11-11 11:51:47 UTC
Liron, haven't you looked into a similar case in the past?

Comment 2 Liron Aravot 2015-11-19 10:13:38 UTC
This is a duplicate of
https://bugzilla.redhat.com/show_bug.cgi?id=1130024

Comment 3 Liron Aravot 2015-11-19 10:17:53 UTC
We could consider improving the user experience in this case (for example, by displaying a meaningful message). On the other hand, this issue is fairly rare. Allon, up to you.

Comment 4 Allon Mureinik 2015-11-19 12:35:41 UTC

*** This bug has been marked as a duplicate of bug 1130024 ***