(Cloning to handle uninformative messages; original bug will be for inconsistent behavior.
To verify, follow the directions below _except chmod to 700 instead of 777_. You will get an error that will now inform you of bad permissions instead of a general error about failing to create the domain.
+++ This bug was initially created as a clone of Bug #1112171 +++
Description of problem:
Creating a sd which is owned by root returns a general error 100 instead of a more informative error message the first try, and succeeds the second try.
Version-Release number of selected component (if applicable):
vdsm-4.14.6-0.el6.x86_64
How reproducible:
100%
Steps to Reproduce:
1. Create a directory:
[root@dhcp-2-166 storage]# mkdir sd_owned_by_root
2. chmod it:
[root@dhcp-2-166 storage]# chmod 777 sd_owned_by_root/
3. Create a local sd in the UI (gets the error).
4. Click OK again (works).
Actual results:
See above.
Expected results:
Should fail consistently with an informative error message.
Additional info:
--- Additional comment from Allon Mureinik on 2015-07-07 09:35:36 EDT ---
--- Additional comment from Allon Mureinik on 2015-07-13 04:35:25 EDT ---
Should also be relevant to NFS.
--- Additional comment from Greg Padgett on 2015-08-05 15:34:51 EDT ---
Proposed fix will cause the following error to be displayed in the UI when the file storage permissions are wrong (checked both local and nfs storage):
Error while executing action Add Storage Connection: Permission settings on the specified path do not allow access to the storage.
Verify permission settings on the specified storage path.
--- Additional comment from Nir Soffer on 2015-08-05 16:10:48 EDT ---
(In reply to Idan Shaby from comment #0)
> Created attachment 911366[details]
> VDSM log file
>
> Description of problem:
> Creating a sd which is owned by root returns a general error 100 instead of
> a more informative error message the first try, and succeeds the second try.
>
> Version-Release number of selected component (if applicable):
> vdsm-4.14.6-0.el6.x86_64
This is a rather old version, can you reproduce it with 4.17?
> 3. Create a local sd in the UI (gets the error).
What error?
> 4. Click OK again (works).> Expected results:
> Should fail consistently with an informative error message.
If this works in the second time, vdsm probably can cope with the
existing permissions, so it is not clear why it should fail.
Where is vdsm log showing the actual error?
--- Additional comment from Greg Padgett on 2015-08-05 16:37:02 EDT ---
(In reply to Nir Soffer from comment #4)
> (In reply to Idan Shaby from comment #0)
> > Created attachment 911366[details]
> > VDSM log file
> >
> > Description of problem:
> > Creating a sd which is owned by root returns a general error 100 instead of
> > a more informative error message the first try, and succeeds the second try.
> >
> > Version-Release number of selected component (if applicable):
> > vdsm-4.14.6-0.el6.x86_64
>
> This is a rather old version, can you reproduce it with 4.17?
>
I did
> > 3. Create a local sd in the UI (gets the error).
>
> What error?
>
I don't recall the wording, something useless about a general error. VDSM log of the error:
Thread-41103::ERROR::2015-08-04 18:09:04,223::hsm::2464::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
File "/usr/share/vdsm/storage/hsm.py", line 2461, in connectStorageServer
conObj.connect()
File "/usr/share/vdsm/storage/storageServer.py", line 591, in connect
os.chmod(lnPath, 0o775)
OSError: [Errno 1] Operation not permitted: '/rhev/data-center/mnt/_storage_local'
Thread-41103::DEBUG::2015-08-04 18:09:04,223::hsm::2483::Storage.HSM::(connectStorageServer) knownSDs: {20b5566c-6bc3-499c-9dcc-696888c21d7d: storage.nfsSD.findDomain, bf77b1a4-7bc5-41f7-9e11-2b76e884a2cb: storage.nfsSD.findDomain, 624eb942-111e-4b96-b508-b78a84df3957: storage.nfsSD.findDomain, a1b43220-a08f-4f7a-be4a-e3529448802c: storage.blockSD.findDomain, 09bb7379-e5b3-4400-9aa5-f43924c543fd: storage.nfsSD.findDomain}
Thread-41103::INFO::2015-08-04 18:09:04,223::logUtils::51::dispatcher::(wrapper) Run and protect: connectStorageServer, Return response: {'statuslist': [{'status': 100, 'id': u'00000000-0000-0000-0000-000000000000'}]}
> > 4. Click OK again (works).
>
> > Expected results:
> > Should fail consistently with an informative error message.
>
> If this works in the second time, vdsm probably can cope with the
> existing permissions, so it is not clear why it should fail.
>
> Where is vdsm log showing the actual error?
It didn't work the second time for me.
Instead, I received a different error... I guess because part of the SD data was present from the first attempt, having made it partially through the creation process before failing.
Here's my log of the second error:
Thread-41118::ERROR::2015-08-04 18:10:02,352::sdc::144::Storage.StorageDomainCache::(_findDomain) domain 25c961a4-9bf3-4e11-ba77-c0bd56e0710e not found
Traceback (most recent call last):
File "/usr/share/vdsm/storage/sdc.py", line 142, in _findDomain
dom = findMethod(sdUUID)
File "/usr/share/vdsm/storage/sdc.py", line 172, in _findUnfetchedDomain
raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'25c961a4-9bf3-4e11-ba77-c0bd56e0710e',)
Thread-41118::INFO::2015-08-04 18:10:02,353::localFsSD::73::Storage.StorageDomain::(create) sdUUID=25c961a4-9bf3-4e11-ba77-c0bd56e0710e domainName=local remotePath=/storage/local domClass=1
Thread-41118::DEBUG::2015-08-04 18:10:02,358::__init__::298::IOProcessClient::(_run) Starting IOProcess...
Thread-41118::ERROR::2015-08-04 18:10:02,364::task::866::Storage.TaskManager.Task::(_setError) Task=`ab7907ce-74a1-4329-9080-ca139db56eea`::Unexpected error
Traceback (most recent call last):
File "/usr/share/vdsm/storage/task.py", line 873, in _run
return fn(*args, **kargs)
File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
res = f(*args, **kwargs)
File "/usr/share/vdsm/storage/hsm.py", line 2695, in createStorageDomain
domVersion)
File "/usr/share/vdsm/storage/localFsSD.py", line 84, in create
cls._preCreateValidation(sdUUID, mntPoint, remotePath, version)
File "/usr/share/vdsm/storage/localFsSD.py", line 51, in _preCreateValidation
fileSD.validateFileSystemFeatures(sdUUID, domPath)
File "/usr/share/vdsm/storage/fileSD.py", line 89, in validateFileSystemFeatures
oop.getProcessPool(sdUUID).directTouch(testFilePath)
File "/usr/share/vdsm/storage/outOfProcess.py", line 350, in directTouch
ioproc.touch(path, flags, mode)
File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 507, in touch
self.timeout)
File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 391, in _sendCommand
raise OSError(errcode, errstr)
OSError: [Errno 13] Permission denied
Thread-41118::DEBUG::2015-08-04 18:10:02,364::task::885::Storage.TaskManager.Task::(_run) Task=`ab7907ce-74a1-4329-9080-ca139db56eea`::Task._run: ab7907ce-74a1-4329-9080-ca139db56eea (4, u'25c961a4-9bf3-4e11-ba77-c0bd56e0710e', u'local', u'/storage/local', 1, u'3') {} failed - stopping task
Thread-41118::DEBUG::2015-08-04 18:10:02,364::task::1246::Storage.TaskManager.Task::(stop) Task=`ab7907ce-74a1-4329-9080-ca139db56eea`::stopping in state preparing (force False)
Thread-41118::DEBUG::2015-08-04 18:10:02,364::task::993::Storage.TaskManager.Task::(_decref) Task=`ab7907ce-74a1-4329-9080-ca139db56eea`::ref 1 aborting True
Thread-41118::INFO::2015-08-04 18:10:02,364::task::1171::Storage.TaskManager.Task::(prepare) Task=`ab7907ce-74a1-4329-9080-ca139db56eea`::aborting: Task is aborted: u'[Errno 13] Permission denied' - code 100
Thread-41118::DEBUG::2015-08-04 18:10:02,364::task::1176::Storage.TaskManager.Task::(prepare) Task=`ab7907ce-74a1-4329-9080-ca139db56eea`::Prepare: aborted: [Errno 13] Permission denied
Thread-41118::DEBUG::2015-08-04 18:10:02,364::task::993::Storage.TaskManager.Task::(_decref) Task=`ab7907ce-74a1-4329-9080-ca139db56eea`::ref 0 aborting True
Thread-41118::DEBUG::2015-08-04 18:10:02,365::task::928::Storage.TaskManager.Task::(_doAbort) Task=`ab7907ce-74a1-4329-9080-ca139db56eea`::Task._doAbort: force False
Thread-41118::DEBUG::2015-08-04 18:10:02,365::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-41118::DEBUG::2015-08-04 18:10:02,365::task::595::Storage.TaskManager.Task::(_updateState) Task=`ab7907ce-74a1-4329-9080-ca139db56eea`::moving from state preparing -> state aborting
Thread-41118::DEBUG::2015-08-04 18:10:02,365::task::550::Storage.TaskManager.Task::(__state_aborting) Task=`ab7907ce-74a1-4329-9080-ca139db56eea`::_aborting: recover policy none
Thread-41118::DEBUG::2015-08-04 18:10:02,365::task::595::Storage.TaskManager.Task::(_updateState) Task=`ab7907ce-74a1-4329-9080-ca139db56eea`::moving from state aborting -> state failed
Thread-41118::DEBUG::2015-08-04 18:10:02,365::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-41118::DEBUG::2015-08-04 18:10:02,365::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-41118::ERROR::2015-08-04 18:10:02,365::dispatcher::79::Storage.Dispatcher::(wrapper) [Errno 13] Permission denied
Traceback (most recent call last):
File "/usr/share/vdsm/storage/dispatcher.py", line 71, in wrapper
result = ctask.prepare(func, *args, **kwargs)
File "/usr/share/vdsm/storage/task.py", line 104, in wrapper
return m(self, *a, **kw)
File "/usr/share/vdsm/storage/task.py", line 1179, in prepare
raise self.error
OSError: [Errno 13] Permission denied
--- Additional comment from Nir Soffer on 2015-08-05 16:55:57 EDT ---
(In reply to Greg Padgett from comment #5)
> > > 4. Click OK again (works).
> >
> > > Expected results:
> > > Should fail consistently with an informative error message.
> >
> > If this works in the second time, vdsm probably can cope with the
> > existing permissions, so it is not clear why it should fail.
> >
> > Where is vdsm log showing the actual error?
>
> It didn't work the second time for me.
Did you use the same permissions used by Idan?
The results you get makes sense, we fail with EPERM when permissions are wrong.
Can you verify that the unlikely flow (failing first time, succeeding in
the second time) described in the bug description does not happen?
--- Additional comment from Greg Padgett on 2015-08-05 17:29:37 EDT ---
(In reply to Nir Soffer from comment #6)
> Did you use the same permissions used by Idan?
>
> The results you get makes sense, we fail with EPERM when permissions are
> wrong.
>
> Can you verify that the unlikely flow (failing first time, succeeding in
> the second time) described in the bug description does not happen?
Actually I found out how to reproduce that behavior. The results above are applicable to directories having root:root ownership with 0755 permissions (or less).
If the permissions are 0777, it fails with OSError on the single os.chmod call in storageServer.py and then succeeds a second time. If I remove the call to chmod, it creates a storage domain without complaint.
The addition of that call was done in a huge commit 93390aab8381bff869bd029bf121f6a1f3761e8e where Saggi did some refactoring and, it seems, functional changes. If we check the permissions and they are adequate, do we need this chmod at all?
---
Also just for the record, here are the two UI errors for a directory with insufficient permissions:
1st: Error while executing action Add Storage Connection: General Exception
2nd: Error while executing action New Local Storage Domain: Error creating a storage domain
--- Additional comment from Nir Soffer on 2015-08-05 17:53:30 EDT ---
(In reply to Greg Padgett from comment #7)
> (In reply to Nir Soffer from comment #6)
> If the permissions are 0777, it fails with OSError on the single os.chmod
> call in storageServer.py and then succeeds a second time. If I remove the
> call to chmod, it creates a storage domain without complaint.
Sound very strange - if the chmod call failed, how this succeeds in the second
time?
Can you explain how this does not fail in the second call?
> The addition of that call was done in a huge commit
> 93390aab8381bff869bd029bf121f6a1f3761e8e where Saggi did some refactoring
> and, it seems, functional changes. If we check the permissions and they are
> adequate, do we need this chmod at all?
Removing bad behavior changes which were part of huge "refactoring" worked
well for us in the past. I would check first this direction.
--- Additional comment from Greg Padgett on 2015-08-05 18:11:30 EDT ---
(In reply to Nir Soffer from comment #8)
> (In reply to Greg Padgett from comment #7)
> > (In reply to Nir Soffer from comment #6)
> > If the permissions are 0777, it fails with OSError on the single os.chmod
> > call in storageServer.py and then succeeds a second time. If I remove the
> > call to chmod, it creates a storage domain without complaint.
>
> Sound very strange - if the chmod call failed, how this succeeds in the
> second
> time?
>
> Can you explain how this does not fail in the second call?
Sure. In storageServer.py, LocalDirectoryConnection.connect() checks for the symlink before trying to create and chmod the symlinkcalling symlink and chmod. After the first failure the link exists, so the second time it doesn't get to the chmod and happily carries on.
> > The addition of that call was done in a huge commit
> > 93390aab8381bff869bd029bf121f6a1f3761e8e where Saggi did some refactoring
> > and, it seems, functional changes. If we check the permissions and they are
> > adequate, do we need this chmod at all?
>
> Removing bad behavior changes which were part of huge "refactoring" worked
> well for us in the past. I would check first this direction.
Seems to work, but I should do more verification. I think the existing patch and this new change would resolve the issue altogether.
--- Additional comment from Nir Soffer on 2015-08-05 18:19:26 EDT ---
(In reply to Greg Padgett from comment #9)
Changing the mode of the file should not be done in connect, this is part
of domain initialization, so we can safely remove it. This is the minimal
fix for this bug, no need to add additional permission checking.
Adding permission checking makes sense but not required for this bug.
--- Additional comment from Greg Padgett on 2015-08-05 18:25:35 EDT ---
(In reply to Nir Soffer from comment #10)
> (In reply to Greg Padgett from comment #9)
>
> Changing the mode of the file should not be done in connect, this is part
> of domain initialization, so we can safely remove it. This is the minimal
> fix for this bug, no need to add additional permission checking.
Good, I'll verify the change and submit it. Thanks.
> Adding permission checking makes sense but not required for this bug.
(In reply to Idan Shaby from comment #0)
> Expected results:
> Should fail consistently with an informative error message.
Well I suppose the chmod removal fix satisfies inconsistent behavior, but I wouldn't consider either error in comment #7 an informative error message :)
--- Additional comment from Nir Soffer on 2015-08-05 18:35:38 EDT ---
Actually there are two bugs here - bad error message and in consistent failures. Lets clone the bug to handle the bad error messages, we already
have a patch for this :-)
Verified using 3.6.0-11
"Error while executing action Add Storage Connection: Permission settings on the specified path do not allow access to the storage.
Verify permission settings on the specified storage path"