Bug 1247475 - Permission error in export domain inhibit any of the host to become SPM
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.0
Hardware: All, OS: All
Priority: high, Severity: medium
Target Milestone: ovirt-4.0.0-alpha
Target Release: 4.0.0
Assigned To: Maor
QA Contact: Aharon Canan
Reported: 2015-07-28 02:09 EDT by nijin ashok
Modified: 2016-05-04 07:10 EDT

Doc Type: Bug Fix
Last Closed: 2016-05-04 07:10:33 EDT
Type: Bug
oVirt Team: Storage


Attachments
audit_log_error (21.16 KB, image/png)
2016-05-04 07:07 EDT, Maor

Description nijin ashok 2015-07-28 02:09:14 EDT
Description of problem:

Permission issues in the export domain in RHEV 3.5 prevent any of the hosts from becoming SPM. The hosts keep switching between contending and normal.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Virtualization 3.5
vdsm-4.16.13.1-1.el6ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. In a RHEV environment with an export domain, manually change the ownership of the NFS export domain:

chown -R root.root /exports

2. Put the active SPM host into maintenance mode so that SpmStart is triggered on another host.

3. The other hosts never become SPM and keep switching status between contending and normal.

Actual results:
A permission error in the export domain prevents any of the hosts from becoming SPM.

Expected results:
As the export domain is not necessary for a RHEV environment, issues in the export domain should not make the Data Center "Non Responsive".

Additional info:

engine log
2015-07-28 06:06:17,868 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler_Worker-63) [862ba2c] Command HSMGetAllTasksStatusesVDSCommand(HostName = 172.16.1.1, HostId = 9ab14fe3-fb62-4160-8588-021ecdd218b2) execution failed. Exception: IRSNonOperationalException: IRSGenericException: IRSErrorException: IRSNonOperationalException: Not SPM
2015-07-28 06:06:17,920 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData] (DefaultQuartzScheduler_Worker-63) [862ba2c] hostFromVds::selectedVds - 172.16.1.1, spmStatus Free, storage pool nijin-datacenter
2015-07-28 06:06:17,926 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData] (DefaultQuartzScheduler_Worker-63) [862ba2c] starting spm on vds 172.16.1.1, storage pool nijin-datacenter, prevId -1, LVER -1
2015-07-28 06:06:17,926 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-63) [862ba2c] START, SpmStartVDSCommand(HostName = 172.16.1.1, HostId = 9ab14fe3-fb62-4160-8588-021ecdd218b2, storagePoolId = 00ca6639-4986-4990-b529-a23b8054a745, prevId=-1, prevLVER=-1, storagePoolFormatType=V3, recoveryMode=Manual, SCSIFencing=false), log id: 7dab4e8a
2015-07-28 06:06:17,973 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-63) [862ba2c] spmStart polling started: taskId = c4f9ba41-1402-4a01-bdd0-bbe3cd56738b
2015-07-28 06:06:18,979 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetTaskStatusVDSCommand] (DefaultQuartzScheduler_Worker-63) [862ba2c] Failed in HSMGetTaskStatusVDS method
2015-07-28 06:06:18,979 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-63) [862ba2c] spmStart polling ended: taskId = c4f9ba41-1402-4a01-bdd0-bbe3cd56738b task status = finished
2015-07-28 06:06:18,979 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-63) [862ba2c] Start SPM Task failed - result: cleanSuccess, message: VDSGenericException: VDSErrorException: Failed to HSMGetTaskStatusVDS, error = [Errno 13] Permission denied, code = 100
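
The underlying cause only appears at the end of the last engine log line above. As a rough aid for finding it, here is a minimal Python sketch that pulls the error text and code out of "Start SPM Task failed" lines. The regular expression and the function name find_spm_start_failures are assumptions based solely on the line format shown in this report; they are not part of the engine or of VDSM.

```python
import re

# Pattern derived only from the "Start SPM Task failed" line quoted in this
# report; other engine versions may format the message differently.
SPM_FAIL_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+ERROR\s+"
    r".*Start SPM Task failed.*error = (?P<error>.*), code = (?P<code>\d+)"
)


def find_spm_start_failures(lines):
    """Return (timestamp, error, code) for each failed SPM start line."""
    failures = []
    for line in lines:
        match = SPM_FAIL_RE.search(line)
        if match:
            failures.append(
                (match.group("timestamp"),
                 match.group("error"),
                 int(match.group("code")))
            )
    return failures
```

Passing an open engine log file object to find_spm_start_failures() works, since it only iterates over lines.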
Comment 3 Allon Mureinik 2015-07-28 11:06:19 EDT
I don't understand the point of this BZ.
The export domain, like any other domain, should be owned by 36:36.

Why would you change it?
Why can't the customer just chown the directory to the appropriate owner/group?
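
For anyone who needs to check a domain before running chown, here is a minimal Python sketch that lists every path under a directory that is not owned by 36:36. The function name find_wrong_ownership is hypothetical, and the default IDs are taken only from the statement above that domains should be owned by 36:36.

```python
import os

EXPECTED_UID = 36  # owner ID stated in this comment (36:36)
EXPECTED_GID = 36  # group ID stated in this comment (36:36)


def find_wrong_ownership(root, uid=EXPECTED_UID, gid=EXPECTED_GID):
    """Return (path, uid, gid) for every entry under root not owned by uid:gid.

    The root directory itself is included in the check. Symbolic links are
    inspected with lstat(), so the link is checked rather than its target.
    """
    mismatches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        paths = [dirpath] + [os.path.join(dirpath, name) for name in filenames]
        for path in paths:
            st = os.lstat(path)
            if (st.st_uid, st.st_gid) != (uid, gid):
                mismatches.append((path, st.st_uid, st.st_gid))
    return mismatches
```

Calling find_wrong_ownership("/exports") on the NFS server returns an empty list when the whole tree has the expected owner and group.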
Comment 4 nijin ashok 2015-07-29 08:48:21 EDT
(In reply to Allon Mureinik from comment #3)
> I don't understand the point of this BZ.
> The export domain, like any other domain, should be owned by 36:36.
> 
> Why would you change it?
> Why can't the customer just chown the directory to the appropriate
> owner/group?

Unfortunately, I am not sure how the permissions got changed in the customer's environment. The permissions have already been fixed and the environment is up now.

However, the issue here is only with the export domain. It nevertheless causes the Data Center to go "Non Responsive" and sets the status of all storage domains to "Unknown", even though there is no issue with those domains. This makes the whole RHEV environment unmanageable from the portal. As the export domain is not necessary for running the RHEV environment, I think this is not the intended behavior.
Comment 6 Sandro Bonazzola 2015-10-26 08:38:03 EDT
This is an automated message. oVirt 3.6.0 RC3 has been released and GA is targeted for next week, Nov 4th 2015.
Please review this bug and if not a blocker, please postpone to a later release.
All bugs not postponed on GA release will be automatically re-targeted to

- 3.6.1 if severity >= high
- 4.0 if severity < high
Comment 7 Maor 2015-11-09 16:07:46 EST
Looks like the problem was resolved in the customer environment after he fixed the permissions on the Storage Domain.

I investigated the spmStart process in VDSM, and it looks like the SPM start operation iterates over all the active Storage Domains in the storage pool and tries to call _realProduce (see [1]) for each one.
One of those Storage Domains is the Export Storage Domain, and since there was a permission issue on it, the spmStart operation failed.

[1]
The Exception:

dcbefba5-f747-44bb-8418-c0aa561e9f01::DEBUG::2015-07-23 14:53:42,746::fileSD::152::Storage.StorageDomain::(__init__) Reading domain in path /rhev/data-center/mnt/10.0.12.90:_volume1_RHEVExport/df457e4d-536d-428b-87f1-79e3422325f5
dcbefba5-f747-44bb-8418-c0aa561e9f01::ERROR::2015-07-23 14:53:42,747::sp::294::Storage.StoragePool::(startSpm) Backup domain validation failed
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sp.py", line 291, in startSpm
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
  File "/usr/share/vdsm/storage/sp.py", line 1424, in checkBackupDomain
  File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
  File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomain
  File "/usr/share/vdsm/storage/fileSD.py", line 159, in __init__
  File "/usr/share/vdsm/storage/fileSD.py", line 88, in validateFileSystemFeatures
  File "/usr/share/vdsm/storage/outOfProcess.py", line 351, in directTouch
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 507, in touch
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 391, in _sendCommand
OSError: [Errno 13] Permission denied
dcbefba5-f747-44bb-8418-c0aa561e9f01::DEBUG::2015-07-23 14:53:42,757::fileSD::152::Storage.StorageDomain::(__init__) Reading domain in path /rhev/data-center/mnt/10.0.12.90:_volume1_RHEVExport/df457e4d-536d-428b-87f1-79e3422325f5
dcbefba5-f747-44bb-8418-c0aa561e9f01::ERROR::2015-07-23 14:53:42,758::sp::330::Storage.StoragePool::(startSpm) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sp.py", line 302, in startSpm
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
  File "/usr/share/vdsm/storage/sp.py", line 205, in _updateDomainsRole
  File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
  File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomain
  File "/usr/share/vdsm/storage/fileSD.py", line 159, in __init__
  File "/usr/share/vdsm/storage/fileSD.py", line 88, in validateFileSystemFeatures
  File "/usr/share/vdsm/storage/outOfProcess.py", line 351, in directTouch
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 507, in touch
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 391, in _sendCommand
OSError: [Errno 13] Permission denied
dcbefba5-f747-44bb-8418-c0aa561e9f01::ERROR::2015-07-23 14:53:42,758::sp::331::Storage.StoragePool::(startSpm) failed: [Errno 13] Permission denied
dcbefba5-f747-44bb-8418-c0aa561e9f01::DEBUG::2015-07-23 14:53:42,758::sp::337::Storage.StoragePool::(_shutDownUpgrade) Shutting down upgrade process
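
Both tracebacks above end in the same OSError raised while producing a domain, and the second one aborts startSpm. The following is only a sketch of the behaviour requested in this bug, written in Python. It is not VDSM code and not the fix that was applied. The names produce_domains, produce and optional_ids are hypothetical, and treating the export domain as "optional" is an assumption taken from the expected results in the description.

```python
import errno
import logging

log = logging.getLogger("spm_sketch")  # hypothetical logger name


def produce_domains(domain_ids, produce, optional_ids=()):
    """Call produce(sd_id) for each domain, tolerating EACCES on optional ones.

    Returns (produced, skipped): produced maps domain ID to the produced
    object, skipped maps domain ID to the OSError that was tolerated.
    Any other failure is re-raised unchanged.
    """
    produced = {}
    skipped = {}
    for sd_id in domain_ids:
        try:
            produced[sd_id] = produce(sd_id)
        except OSError as e:
            if sd_id in optional_ids and e.errno == errno.EACCES:
                # Record the problem loudly instead of failing the whole run.
                log.warning("Skipping optional domain %s: %s", sd_id, e)
                skipped[sd_id] = e
                continue
            raise
    return produced, skipped
```

The skipped mapping keeps the original exception, so a caller could still report the permission error to the user rather than hiding it.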
Comment 9 Allon Mureinik 2016-03-28 10:57:23 EDT
(In reply to Maor from comment #7)
> Looks like the problem was resolved in the customer environment after he
> fixed the permissions on the Storage Domain
> 
> I was trying to investigate the spmStart process in VDSM and it looks like
> the spm start operation iterates over all the active Storage Domains in the
> storage pool and tries to call _realProduce (see [1]) for each one.
> One of those Storage Domains is the Export Storage Domain, and since there
> was a permission issue the operation of spmStart failed.
So what's the remaining action item here?
Comment 10 Maor 2016-03-28 18:29:51 EDT
(In reply to Allon Mureinik from comment #9)
> (In reply to Maor from comment #7)
> > Looks like the problem was resolved in the customer environment after he
> > fixed the permissions on the Storage Domain
> > 
> > I was trying to investigate the spmStart process in VDSM and it looks like
> > the spm start operation iterates over all the active Storage Domains in the
> > storage pool and tries to call _realProduce (see [1]) for each one.
> > One of those Storage Domains is the Export Storage Domain, and since there
> > was a permission issue the operation of spmStart failed.
> So what's the remaining action item here?

It depends on what we plan to do with the SPM; this issue could be solved as part of the SPM removal.
Comment 11 Yaniv Lavi 2016-03-29 05:20:38 EDT
I would want a clear error in the manager saying that this is the issue. The main problem here is that you need to dig to find the permission error.
Comment 12 Maor 2016-05-04 07:07 EDT
Created attachment 1153789
audit_log_error
Comment 13 Maor 2016-05-04 07:10:33 EDT
Looks like we already have an audit log indicating that there was a permissions error (see Comment 12), probably due to the fix in https://gerrit.ovirt.org/#/c/54958/.
Moving this bug to CURRENTRELEASE status; please feel free to re-open it if there is an issue that still needs to be addressed.
