Previously, when attempting to activate five or six storage domains at the same time, refreshStoragePool attempted to create a folder that already existed, which caused an "OSError [Errno 17]" to be thrown. This issue was caused by a race condition in fileUtils.createdir(). An updated version of VDSM now creates the target directory; if the target directory already exists, VDSM ensures that the directory has the correct permissions, and "OSError [Errno 17]" is not thrown.
VDSM: refreshStoragePool attempted to mkdir an exists folder and failed with 'OSError: [Errno 17] File exists:'.
Description:
***********
I worked on scale environment with 40 hosts and 100 iSCSI Storage-Domains (EMC Machine), When attempted to activate several (5-6) Storage Domains at the same time, refreshStoragePool attempted to mkdir folder that was already exist It returned OSError [Errno 17].
Note:
*******
might be related - the SSL request of the refreshStoragePool command took long time (more than 4 minutes) it failed on timeout with "BrokenPipe" Error (same Thread-38098 at vdsm.log).
VDSM.log
************
Thread-38098::ERROR::2012-07-09 08:35:34,901::task::853::TaskManager.Task::(_setError) Task=`43caf957-5dda-46e0-932e-6699ba021304`::Unexpected error
Traceback (most recent call last):
File "/usr/share/vdsm/storage/task.py", line 861, in _run
return fn(*args, **kargs)
File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
res = f(*args, **kwargs)
File "/usr/share/vdsm/storage/hsm.py", line 699, in refreshStoragePool
pool.refresh(msdUUID, masterVersion)
File "/usr/share/vdsm/storage/sp.py", line 1178, in refresh
self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
File "/usr/share/vdsm/storage/sp.py", line 1139, in __rebuild
fileUtils.createdir(os.path.join(domaindir, sd.DOMAIN_META_DATA))
File "/usr/share/vdsm/storage/fileUtils.py", line 149, in createdir
os.makedirs(dirPath)
File "/usr/lib64/python2.6/os.py", line 157, in makedirs
mkdir(name, mode)
OSError: [Errno 17] File exists: '/rhev/data-center/mnt/blockSD/3191acd9-0fc6-4270-99b3-7775fa70ed15/dom_md'
', '35459266-07dd-4f15-92c1-f6e36e9b851a', 1) {} failed - stopping task
Thread-38098::DEBUG::2012-07-09 08:35:34,905::task::1199::TaskManager.Task::(stop) Task=`43caf957-5dda-46e0-932e-6699ba021304`::stopping in state preparing (force False)
Thread-38098::DEBUG::2012-07-09 08:35:34,905::task::978::TaskManager.Task::(_decref) Task=`43caf957-5dda-46e0-932e-6699ba021304`::ref 1 aborting True
Thread-38098::INFO::2012-07-09 08:35:34,906::task::1157::TaskManager.Task::(prepare) Task=`43caf957-5dda-46e0-932e-6699ba021304`::aborting: Task is aborted: u"[Errno 17] File exists: '/rhev/data-center/mnt/blockSD
/3191acd9-0fc6-4270-99b3-7775fa70ed15/dom_md'" - code 100
Thread-38098::DEBUG::2012-07-09 08:35:34,906::task::1162::TaskManager.Task::(prepare) Task=`43caf957-5dda-46e0-932e-6699ba021304`::Prepare: aborted: [Errno 17] File exists: '/rhev/data-center/mnt/blockSD/3191acd9-
0fc6-4270-99b3-7775fa70ed15/dom_md'
Thread-38098::DEBUG::2012-07-09 08:35:34,906::task::978::TaskManager.Task::(_decref) Task=`43caf957-5dda-46e0-932e-6699ba021304`::ref 0 aborting True
Thread-38098::DEBUG::2012-07-09 08:35:34,907::task::913::TaskManager.Task::(_doAbort) Task=`43caf957-5dda-46e0-932e-6699ba021304`::Task._doAbort: force False
Thread-38098::DEBUG::2012-07-09 08:35:34,907::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-38098::DEBUG::2012-07-09 08:35:34,907::task::588::TaskManager.Task::(_updateState) Task=`43caf957-5dda-46e0-932e-6699ba021304`::moving from state preparing -> state aborting
Thread-38098::DEBUG::2012-07-09 08:35:34,907::task::537::TaskManager.Task::(__state_aborting) Task=`43caf957-5dda-46e0-932e-6699ba021304`::_aborting: recover policy none
Thread-38098::DEBUG::2012-07-09 08:35:34,908::task::588::TaskManager.Task::(_updateState) Task=`43caf957-5dda-46e0-932e-6699ba021304`::moving from state aborting -> state failed
Thread-38098::DEBUG::2012-07-09 08:35:34,908::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {'Storage.50d275f3-2029-4fd9-bc86-20333f3cb24f': < ResourceRef 'Storag
e.50d275f3-2029-4fd9-bc86-20333f3cb24f', isValid: 'True' obj: 'None'>}
Thread-38098::DEBUG::2012-07-09 08:35:34,908::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-38098::DEBUG::2012-07-09 08:35:34,909::resourceManager::538::ResourceManager::(releaseResource) Trying to release resource 'Storage.50d275f3-2029-4fd9-bc86-20333f3cb24f'
Thread-38098::DEBUG::2012-07-09 08:35:34,909::resourceManager::553::ResourceManager::(releaseResource) Released resource 'Storage.50d275f3-2029-4fd9-bc86-20333f3cb24f' (0 active users)
Thread-38098::DEBUG::2012-07-09 08:35:34,909::resourceManager::558::ResourceManager::(releaseResource) Resource 'Storage.50d275f3-2029-4fd9-bc86-20333f3cb24f' is free, finding out if anyone is waiting for it.
Thread-38098::DEBUG::2012-07-09 08:35:34,910::resourceManager::565::ResourceManager::(releaseResource) No one is waiting for resource 'Storage.50d275f3-2029-4fd9-bc86-20333f3cb24f', Clearing records.
Thread-38098::ERROR::2012-07-09 08:35:34,910::dispatcher::69::Storage.Dispatcher.Protect::(run) [Errno 17] File exists: '/rhev/data-center/mnt/blockSD/3191acd9-0fc6-4270-99b3-7775fa70ed15/dom_md'
Traceback (most recent call last):
File "/usr/share/vdsm/storage/dispatcher.py", line 61, in run
result = ctask.prepare(self.func, *args, **kwargs)
File "/usr/share/vdsm/storage/task.py", line 1164, in prepare
raise self.error
OSError: [Errno 17] File exists: '/rhev/data-center/mnt/blockSD/3191acd9-0fc6-4270-99b3-7775fa70ed15/dom_md'
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHSA-2012-1508.html