Bug 1247475

Summary: Permission error in export domain inhibit any of the host to become SPM
Product: Red Hat Enterprise Virtualization Manager
Reporter: nijin ashok <nashok>
Component: vdsm
Assignee: Maor <mlipchuk>
Status: CLOSED CURRENTRELEASE
QA Contact: Aharon Canan <acanan>
Severity: medium
Docs Contact:
Priority: high
Version: 3.5.0
CC: adevolder, amureini, bazulay, kgoldbla, lpeer, lsurette, mgoldboi, mlipchuk, nashok, srevivo, tnisan, ycui, ykaul, ylavi
Target Milestone: ovirt-4.0.0-alpha
Target Release: 4.0.0
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-05-04 11:10:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
audit_log_error none

Description nijin ashok 2015-07-28 06:09:14 UTC
Description of problem:

Permission issues on the export domain in RHEV 3.5 prevent any host from becoming SPM. The hosts keep switching between contending and normal.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Virtualization 3.5
vdsm-4.16.13.1-1.el6ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. In a RHEV environment with an export domain, manually change the ownership of the NFS export domain:

chown -R root.root /exports

2. Put the active SPM host into maintenance mode so that SpmStart is triggered on another host.

3. The other hosts never become SPM and keep changing status between contending and normal.

Actual results:
A permission error on the export domain prevents any host from becoming SPM.

Expected results:
Because the export domain is not required for a RHEV environment to run, problems with the export domain should not make the Data Center "Non Responsive".

Additional info:

engine log
2015-07-28 06:06:17,868 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler_Worker-63) [862ba2c] Command HSMGetAllTasksStatusesVDSCommand(HostName = 172.16.1.1, HostId = 9ab14fe3-fb62-4160-8588-021ecdd218b2) execution failed. Exception: IRSNonOperationalException: IRSGenericException: IRSErrorException: IRSNonOperationalException: Not SPM
2015-07-28 06:06:17,920 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData] (DefaultQuartzScheduler_Worker-63) [862ba2c] hostFromVds::selectedVds - 172.16.1.1, spmStatus Free, storage pool nijin-datacenter
2015-07-28 06:06:17,926 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData] (DefaultQuartzScheduler_Worker-63) [862ba2c] starting spm on vds 172.16.1.1, storage pool nijin-datacenter, prevId -1, LVER -1
2015-07-28 06:06:17,926 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-63) [862ba2c] START, SpmStartVDSCommand(HostName = 172.16.1.1, HostId = 9ab14fe3-fb62-4160-8588-021ecdd218b2, storagePoolId = 00ca6639-4986-4990-b529-a23b8054a745, prevId=-1, prevLVER=-1, storagePoolFormatType=V3, recoveryMode=Manual, SCSIFencing=false), log id: 7dab4e8a
2015-07-28 06:06:17,973 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-63) [862ba2c] spmStart polling started: taskId = c4f9ba41-1402-4a01-bdd0-bbe3cd56738b
2015-07-28 06:06:18,979 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetTaskStatusVDSCommand] (DefaultQuartzScheduler_Worker-63) [862ba2c] Failed in HSMGetTaskStatusVDS method
2015-07-28 06:06:18,979 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-63) [862ba2c] spmStart polling ended: taskId = c4f9ba41-1402-4a01-bdd0-bbe3cd56738b task status = finished
2015-07-28 06:06:18,979 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-63) [862ba2c] Start SPM Task failed - result: cleanSuccess, message: VDSGenericException: VDSErrorException: Failed to HSMGetTaskStatusVDS, error = [Errno 13] Permission denied, code = 100
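
The underlying cause is only visible at the tail of the last engine log line above. As a minimal sketch, assuming the message always ends in the "error = ..., code = N" form shown there, the error text and code could be pulled out like this (the function name and regular expression are illustrative, not part of the engine):

```python
import re

# Matches the trailing "error = <text>, code = <number>" part of the message.
# Assumption: the format is the one seen in the SpmStartVDSCommand line above.
ERROR_RE = re.compile(r"error = (?P<error>.+), code = (?P<code>\d+)\s*$")


def extract_vds_error(line):
    """Return (error_text, code) from an engine log line, or None if absent."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    return match.group("error"), int(match.group("code"))


if __name__ == "__main__":
    sample = (
        "Start SPM Task failed - result: cleanSuccess, message: "
        "VDSGenericException: VDSErrorException: Failed to HSMGetTaskStatusVDS, "
        "error = [Errno 13] Permission denied, code = 100"
    )
    print(extract_vds_error(sample))
```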

Comment 3 Allon Mureinik 2015-07-28 15:06:19 UTC
I don't understand the point of this BZ.
The export domain, like any other domain, should be owned by 36:36.

Why would you change it?
Why can't the customer just chown the directory to the appropriate owner/group?
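
For reference, a rough way to spot entries under an export path that are not owned by 36:36 is sketched below. The function name and the default path are placeholders for illustration; adjust them to the actual export directory.

```python
import os
import sys


def find_wrong_owner(root, uid=36, gid=36):
    """Return paths under root (root included) not owned by uid:gid."""
    wrong = []
    for dirpath, _dirnames, filenames in os.walk(root):
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            st = os.lstat(path)  # lstat: do not follow symlinks
            if (st.st_uid, st.st_gid) != (uid, gid):
                wrong.append(path)
    return wrong


if __name__ == "__main__":
    # "/exports" is the path used in the reproduction steps; pass another as argv[1].
    export_root = sys.argv[1] if len(sys.argv) > 1 else "/exports"
    for path in find_wrong_owner(export_root):
        print(path)
```

Anything it prints would need to be chowned back to 36:36.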

Comment 4 nijin ashok 2015-07-29 12:48:21 UTC
(In reply to Allon Mureinik from comment #3)
> I don't understand the point of this BZ.
> The export domain, like any other domain, should be owned by 36:36.
> 
> Why would you change it?
> Why can't the customer just chown the directory to the appropriate
> owner/group?

Unfortunately, I am not sure how the permissions got changed in the customer's environment. The permissions have already been fixed and the environment is up now.

However, the issue here is only with the export domain. Yet it causes the Data Center to go "Non Responsive" and sets the status of all storage domains to "Unknown", even though there is no problem with those domains. This makes the whole RHEV environment unmanageable from the portal. As the export domain is not necessary for running the RHEV environment, I think this is not the intended behavior.

Comment 6 Sandro Bonazzola 2015-10-26 12:38:03 UTC
This is an automated message. oVirt 3.6.0 RC3 has been released and GA is targeted to next week, Nov 4th 2015.
Please review this bug and if not a blocker, please postpone to a later release.
All bugs not postponed on GA release will be automatically re-targeted to

- 3.6.1 if severity >= high
- 4.0 if severity < high

Comment 7 Maor 2015-11-09 21:07:46 UTC
Looks like the problem was resolved in the customer environment after he fixed the permissions on the Storage Domain

I investigated the spmStart process in VDSM, and it looks like the SPM start operation iterates over all the active Storage Domains in the storage pool and tries to call _realProduce (see [1]) for each one.
One of those Storage Domains is the Export Storage Domain, and since it had a permission issue, the spmStart operation failed.

[1]
The Exception:

dcbefba5-f747-44bb-8418-c0aa561e9f01::DEBUG::2015-07-23 14:53:42,746::fileSD::152::Storage.StorageDomain::(__init__) Reading domain in path /rhev/data-center/mnt/10.0.12.90:_volume1_RHEVExport/df457e4d-536d-428b-87f1-79e3422325f5
dcbefba5-f747-44bb-8418-c0aa561e9f01::ERROR::2015-07-23 14:53:42,747::sp::294::Storage.StoragePool::(startSpm) Backup domain validation failed
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sp.py", line 291, in startSpm
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
  File "/usr/share/vdsm/storage/sp.py", line 1424, in checkBackupDomain
  File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
  File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomain
  File "/usr/share/vdsm/storage/fileSD.py", line 159, in __init__
  File "/usr/share/vdsm/storage/fileSD.py", line 88, in validateFileSystemFeatures
  File "/usr/share/vdsm/storage/outOfProcess.py", line 351, in directTouch
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 507, in touch
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 391, in _sendCommand
OSError: [Errno 13] Permission denied
dcbefba5-f747-44bb-8418-c0aa561e9f01::DEBUG::2015-07-23 14:53:42,757::fileSD::152::Storage.StorageDomain::(__init__) Reading domain in path /rhev/data-center/mnt/10.0.12.90:_volume1_RHEVExport/df457e4d-536d-428b-87f1-79e3422325f5
dcbefba5-f747-44bb-8418-c0aa561e9f01::ERROR::2015-07-23 14:53:42,758::sp::330::Storage.StoragePool::(startSpm) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sp.py", line 302, in startSpm
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
  File "/usr/share/vdsm/storage/sp.py", line 205, in _updateDomainsRole
  File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
  File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomain
  File "/usr/share/vdsm/storage/fileSD.py", line 159, in __init__
  File "/usr/share/vdsm/storage/fileSD.py", line 88, in validateFileSystemFeatures
  File "/usr/share/vdsm/storage/outOfProcess.py", line 351, in directTouch
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 507, in touch
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 391, in _sendCommand
OSError: [Errno 13] Permission denied
dcbefba5-f747-44bb-8418-c0aa561e9f01::ERROR::2015-07-23 14:53:42,758::sp::331::Storage.StoragePool::(startSpm) failed: [Errno 13] Permission denied
dcbefba5-f747-44bb-8418-c0aa561e9f01::DEBUG::2015-07-23 14:53:42,758::sp::337::Storage.StoragePool::(_shutDownUpgrade) Shutting down upgrade process
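
To illustrate the failure mode described in this comment, here is a simplified sketch. It is not VDSM code: produce, start_strict, start_tolerant, and the dictionary layout are made-up stand-ins. The strict variant mirrors what the tracebacks show (one inaccessible domain aborts the whole start), while the tolerant variant is only one hypothetical way the behavior requested in "Expected results" could look.

```python
import errno


def produce(domain):
    """Stand-in for the domain lookup; fails like the traceback above."""
    if not domain["accessible"]:
        raise OSError(errno.EACCES, "Permission denied")
    return domain["name"]


def start_strict(domains):
    """Any failing domain aborts the whole start, as observed in this bug."""
    return [produce(d) for d in domains]


def start_tolerant(domains):
    """Hypothetical alternative: skip failing non-master domains and report them."""
    produced, skipped = [], []
    for d in domains:
        try:
            produced.append(produce(d))
        except OSError as e:
            if d["role"] == "master":
                raise  # a broken master domain is still fatal
            skipped.append((d["name"], e.strerror))
    return produced, skipped


if __name__ == "__main__":
    pool = [
        {"name": "data", "role": "master", "accessible": True},
        {"name": "export", "role": "regular", "accessible": False},
    ]
    print(start_tolerant(pool))
```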

Comment 9 Allon Mureinik 2016-03-28 14:57:23 UTC
(In reply to Maor from comment #7)
> Looks like the problem was resolved in the customer environment after he
> fixed the permissions on the Storage Domain
> 
> I was trying to investigate the spmStart process in VDSM and it looks like
> the spm start operation iterates over all the active Storage Domains in the
> storage pool and tries to call _realProduce (see [1]) for each one.
> One of those Storage Domains is the Export Storage Domain, and since there
> was a permission issue the operation of spmStart failed.
So what's the remaining action item here?

Comment 10 Maor 2016-03-28 22:29:51 UTC
(In reply to Allon Mureinik from comment #9)
> (In reply to Maor from comment #7)
> > Looks like the problem was resolved in the customer environment after he
> > fixed the permissions on the Storage Domain
> > 
> > I was trying to investigate the spmStart process in VDSM and it looks like
> > the spm start operation iterates over all the active Storage Domains in the
> > storage pool and tries to call _realProduce (see [1]) for each one.
> > One of those Storage Domains is the Export Storage Domain, and since there
> > was a permission issue the operation of spmStart failed.
> So what's the remaining action item here?

It depends on what we plan to do with the SPM; this issue could be solved as part of the SPM removal.

Comment 11 Yaniv Lavi 2016-03-29 09:20:38 UTC
I would want a clear error in the manager stating that this is the issue. The main problem here is that you need to dig to find the permission error.
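
As a rough idea of what such a message could look like, the sketch below turns the low-level OSError into a sentence that names the domain path. The function name and wording are hypothetical and do not reflect the engine's actual audit log text.

```python
import errno


def describe_storage_error(exc, domain_path):
    """Build a human-readable message for an OSError raised on a domain path."""
    if exc.errno == errno.EACCES:
        return (
            "Permission denied on storage domain path %s; "
            "check that it is owned by 36:36" % domain_path
        )
    return "Storage error on %s: %s" % (domain_path, exc.strerror)


if __name__ == "__main__":
    err = OSError(errno.EACCES, "Permission denied")
    print(describe_storage_error(err, "/exports"))
```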

Comment 12 Maor 2016-05-04 11:07:14 UTC
Created attachment 1153789 [details]
audit_log_error

Comment 13 Maor 2016-05-04 11:10:33 UTC
Looks like we already have an audit log indicating there was a permissions error (see Comment 12), probably due to fix https://gerrit.ovirt.org/#/c/54958/.
Moving this bug to CURRENTRELEASE status; please feel free to re-open if there is an issue that needs to be addressed.