Bug 959149 - Cannot acquire host id: ('a459c008-9cee-4b00-9ff2-3189f1b595a4', SanlockException(19, 'Sanlock lockspace add failure', 'No such device'))
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 3.3.0
Assignee: Federico Simoncelli
QA Contact: Elad
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2013-05-03 08:49 UTC by Pablo Iranzo Gómez
Modified: 2019-06-13 07:57 UTC
CC List: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-06-10 15:19:33 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
Sosreport from host 2 (1.74 MB, application/x-xz), 2013-05-03 09:22 UTC, Pablo Iranzo Gómez
Sosreport from host 1 (6.60 MB, application/x-xz), 2013-05-03 09:24 UTC, Pablo Iranzo Gómez
engine.log when adding a FC SD (8.58 KB, text/x-log), 2013-06-07 15:03 UTC, Karim Boumedhel
vdsm.log when adding a FC SD (59.09 KB, text/x-log), 2013-06-07 15:03 UTC, Karim Boumedhel
additional vdsm.log when attaching a SD from rhvges2 (56.76 KB, text/x-log), 2013-06-07 15:12 UTC, Karim Boumedhel
correct vdsm.log (3.21 MB, text/x-log), 2013-06-07 15:17 UTC, Karim Boumedhel
sanlock log from rhvges1 (4.66 KB, text/x-log), 2013-06-07 15:23 UTC, Karim Boumedhel


Links
Red Hat Knowledge Base (Solution) 373643

Description Pablo Iranzo Gómez 2013-05-03 08:49:17 UTC
Description of problem:

When creating a new RHEV environment (rhevm-cleanup, rhevm-setup, etc.), after adding two RHEL hosts and trying to add an NFS Data Domain, the domain shows up in the UI (and some files are created under the mount point), but the operation fails.

Attaching the storage domain to the data center fails as well, and the RHEV-M logs (engine.log) show errors from VDSM while attempting the operation.


Version-Release number of selected component (if applicable):

vdsm-4.10.2-1.13.el6ev.x86_64

How reproducible:

Steps to Reproduce:
1. Add a Data Domain (NFS)
2. Some files are created on the filesystem
3. The operation then fails with the error above in the logs
  
Actual results:

No data domain, so no working RHEV environment

Expected results:

Data domain attached

Additional info:

### ENGINE.LOG ####
2013-05-02 12:55:52,872 INFO  [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (ajp-/127.0.0.1:8702-16) Lock Acquired to object EngineLock [exclusiveLocks= key: b4ce5912-b1d0-11e2-9c85-080027c6913a value: POOL
, sharedLocks= ]
2013-05-02 12:55:53,050 INFO  [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-49) [ddcd2ff] Running command: AddStoragePoolWithStoragesCommand internal: false. Entities affected :  ID: b4ce5912-b1d0-11e2-9c85-080027c6913a Type: StoragePool
2013-05-02 12:55:53,216 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ValidateStorageServerConnectionVDSCommand] (pool-3-thread-49) [7248988c] START, ValidateStorageServerConnectionVDSCommand(HostName = Virtualizador, HostId = 57062226-b1ec-11e2-b016-080027c6913a, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = NFS, connectionList = [{ id: ce655af9-4de6-4b76-a992-c6564b57e562, connection: 192.168.2.122:/mnt/data, iqn: null, vfsType: null, mountOptions: null, nfsVersion: AUTO, nfsRetrans: null, nfsTimeo: null };]), log id: 303ee650
2013-05-02 12:55:53,986 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ValidateStorageServerConnectionVDSCommand] (pool-3-thread-49) [7248988c] FINISH, ValidateStorageServerConnectionVDSCommand, return: {ce655af9-4de6-4b76-a992-c6564b57e562=0}, log id: 303ee650
2013-05-02 12:55:53,987 INFO  [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand] (pool-3-thread-49) [7248988c] Running command: ConnectStorageToVdsCommand internal: true. Entities affected :  ID: aaa00000-0000-0000-0000-123456789aaa Type: System
2013-05-02 12:55:53,993 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (pool-3-thread-49) [7248988c] START, ConnectStorageServerVDSCommand(HostName = Virtualizador, HostId = 57062226-b1ec-11e2-b016-080027c6913a, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = NFS, connectionList = [{ id: ce655af9-4de6-4b76-a992-c6564b57e562, connection: 192.168.2.122:/mnt/data, iqn: null, vfsType: null, mountOptions: null, nfsVersion: AUTO, nfsRetrans: null, nfsTimeo: null };]), log id: 19da0b66
2013-05-02 12:55:54,822 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (pool-3-thread-49) [7248988c] FINISH, ConnectStorageServerVDSCommand, return: {ce655af9-4de6-4b76-a992-c6564b57e562=0}, log id: 19da0b66
2013-05-02 12:55:54,986 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (pool-3-thread-49) [7248988c] START, CreateStoragePoolVDSCommand(HostName = Virtualizador, HostId = 57062226-b1ec-11e2-b016-080027c6913a, storagePoolId=b4ce5912-b1d0-11e2-9c85-080027c6913a, storageType=NFS, storagePoolName=Default, masterDomainId=a459c008-9cee-4b00-9ff2-3189f1b595a4, domainsIdList=[a459c008-9cee-4b00-9ff2-3189f1b595a4], masterVersion=21), log id: 4db21a0a
2013-05-02 12:55:56,900 WARN  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-49) [7248988c] Weird return value: Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         661
mMessage                      Cannot acquire host id: ('a459c008-9cee-4b00-9ff2-3189f1b595a4', SanlockException(19, 'Sanlock lockspace add failure', 'No such device'))

2013-05-02 12:55:56,902 WARN  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-49) [7248988c] Weird return value: Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         661
mMessage                      Cannot acquire host id: ('a459c008-9cee-4b00-9ff2-3189f1b595a4', SanlockException(19, 'Sanlock lockspace add failure', 'No such device'))

2013-05-02 12:55:56,902 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-49) [7248988c] Failed in CreateStoragePoolVDS method
2013-05-02 12:55:56,904 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-49) [7248988c] Error code unexpected and error message VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot acquire host id: ('a459c008-9cee-4b00-9ff2-3189f1b595a4', SanlockException(19, 'Sanlock lockspace add failure', 'No such device'))
2013-05-02 12:55:56,906 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-49) [7248988c] Command org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand return value 
 Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
mStatus                       Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         661
mMessage                      Cannot acquire host id: ('a459c008-9cee-4b00-9ff2-3189f1b595a4', SanlockException(19, 'Sanlock lockspace add failure', 'No such device'))


2013-05-02 12:55:56,910 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-49) [7248988c] HostName = Virtualizador
2013-05-02 12:55:56,910 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-3-thread-49) [7248988c] Command CreateStoragePoolVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot acquire host id: ('a459c008-9cee-4b00-9ff2-3189f1b595a4', SanlockException(19, 'Sanlock lockspace add failure', 'No such device'))
2013-05-02 12:55:56,910 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (pool-3-thread-49) [7248988c] FINISH, CreateStoragePoolVDSCommand, log id: 4db21a0a
2013-05-02 12:55:56,910 ERROR [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-49) [7248988c] Command org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot acquire host id: ('a459c008-9cee-4b00-9ff2-3189f1b595a4', SanlockException(19, 'Sanlock lockspace add failure', 'No such device'))
2013-05-02 12:55:56,951 INFO  [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-49) [7248988c] Command [id=9d602bf8-b72f-4bd1-8c03-10588655c885]: Compensating CHANGED_ENTITY of org.ovirt.engine.core.common.businessentities.storage_pool; snapshot: id=b4ce5912-b1d0-11e2-9c85-080027c6913a.
2013-05-02 12:55:56,964 INFO  [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-49) [7248988c] Command [id=9d602bf8-b72f-4bd1-8c03-10588655c885]: Compensating NEW_ENTITY_ID of org.ovirt.engine.core.common.businessentities.storage_pool_iso_map; snapshot: storagePoolId = b4ce5912-b1d0-11e2-9c85-080027c6913a, storageId = a459c008-9cee-4b00-9ff2-3189f1b595a4.
2013-05-02 12:55:56,988 INFO  [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-49) [7248988c] Command [id=9d602bf8-b72f-4bd1-8c03-10588655c885]: Compensating CHANGED_ENTITY of org.ovirt.engine.core.common.businessentities.storage_domain_static; snapshot: id=a459c008-9cee-4b00-9ff2-3189f1b595a4.
2013-05-02 12:55:57,095 INFO  [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (pool-3-thread-49) [7248988c] Lock freed to object EngineLock [exclusiveLocks= key: b4ce5912-b1d0-11e2-9c85-080027c6913a value: POOL
, sharedLocks= ]






############ VDSM.LOG ##############
Thread-1312::DEBUG::2013-05-03 10:45:12,946::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-1312::DEBUG::2013-05-03 10:45:12,947::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-1312::DEBUG::2013-05-03 10:45:12,947::task::957::TaskManager.Task::(_decref) Task=`e4a7722b-5734-4bbf-956c-599f9222dfa4`::ref 0 aborting False
Thread-1318::DEBUG::2013-05-03 10:45:28,238::task::568::TaskManager.Task::(_updateState) Task=`b9927342-8b91-4188-ba3a-1e2b697e08f9`::moving from state init -> state preparing
Thread-1318::INFO::2013-05-03 10:45:28,238::logUtils::37::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-1318::INFO::2013-05-03 10:45:28,240::logUtils::39::dispatcher::(wrapper) Run and protect: repoStats, Return response: {}
Thread-1318::DEBUG::2013-05-03 10:45:28,240::task::1151::TaskManager.Task::(prepare) Task=`b9927342-8b91-4188-ba3a-1e2b697e08f9`::finished: {}
Thread-1318::DEBUG::2013-05-03 10:45:28,240::task::568::TaskManager.Task::(_updateState) Task=`b9927342-8b91-4188-ba3a-1e2b697e08f9`::moving from state preparing -> state finished
Thread-1318::DEBUG::2013-05-03 10:45:28,240::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-1318::DEBUG::2013-05-03 10:45:28,240::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-1318::DEBUG::2013-05-03 10:45:28,241::task::957::TaskManager.Task::(_decref) Task=`b9927342-8b91-4188-ba3a-1e2b697e08f9`::ref 0 aborting False
Thread-1324::DEBUG::2013-05-03 10:45:43,409::task::568::TaskManager.Task::(_updateState) Task=`3e5e2e45-7b1a-4124-bc29-591fcda6d3a2`::moving from state init -> state preparing
Thread-1324::INFO::2013-05-03 10:45:43,409::logUtils::37::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-1324::INFO::2013-05-03 10:45:43,409::logUtils::39::dispatcher::(wrapper) Run and protect: repoStats, Return response: {}
Thread-1324::DEBUG::2013-05-03 10:45:43,409::task::1151::TaskManager.Task::(prepare) Task=`3e5e2e45-7b1a-4124-bc29-591fcda6d3a2`::finished: {}
Thread-1324::DEBUG::2013-05-03 10:45:43,409::task::568::TaskManager.Task::(_updateState) Task=`3e5e2e45-7b1a-4124-bc29-591fcda6d3a2`::moving from state preparing -> state finished
Thread-1324::DEBUG::2013-05-03 10:45:43,410::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-1324::DEBUG::2013-05-03 10:45:43,410::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-1324::DEBUG::2013-05-03 10:45:43,410::task::957::TaskManager.Task::(_decref) Task=`3e5e2e45-7b1a-4124-bc29-591fcda6d3a2`::ref 0 aborting False
Thread-1330::DEBUG::2013-05-03 10:45:58,873::task::568::TaskManager.Task::(_updateState) Task=`16893207-e770-42dc-8bce-bc4cf3aedc90`::moving from state init -> state preparing
Thread-1330::INFO::2013-05-03 10:45:58,873::logUtils::37::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-1330::INFO::2013-05-03 10:45:58,873::logUtils::39::dispatcher::(wrapper) Run and protect: repoStats, Return response: {}
Thread-1330::DEBUG::2013-05-03 10:45:58,874::task::1151::TaskManager.Task::(prepare) Task=`16893207-e770-42dc-8bce-bc4cf3aedc90`::finished: {}
Thread-1330::DEBUG::2013-05-03 10:45:58,874::task::568::TaskManager.Task::(_updateState) Task=`16893207-e770-42dc-8bce-bc4cf3aedc90`::moving from state preparing -> state finished
Thread-1330::DEBUG::2013-05-03 10:45:58,874::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-1330::DEBUG::2013-05-03 10:45:58,874::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-1330::DEBUG::2013-05-03 10:45:58,874::task::957::TaskManager.Task::(_decref) Task=`16893207-e770-42dc-8bce-bc4cf3aedc90`::ref 0 aborting False
Thread-1336::DEBUG::2013-05-03 10:46:14,051::task::568::TaskManager.Task::(_updateState) Task=`26c57057-914e-48f7-8440-d184512b0760`::moving from state init -> state preparing
Thread-1336::INFO::2013-05-03 10:46:14,051::logUtils::37::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-1336::INFO::2013-05-03 10:46:14,053::logUtils::39::dispatcher::(wrapper) Run and protect: repoStats, Return response: {}
Thread-1336::DEBUG::2013-05-03 10:46:14,053::task::1151::TaskManager.Task::(prepare) Task=`26c57057-914e-48f7-8440-d184512b0760`::finished: {}
Thread-1336::DEBUG::2013-05-03 10:46:14,053::task::568::TaskManager.Task::(_updateState) Task=`26c57057-914e-48f7-8440-d184512b0760`::moving from state preparing -> state finished
Thread-1336::DEBUG::2013-05-03 10:46:14,053::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-1336::DEBUG::2013-05-03 10:46:14,054::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-1336::DEBUG::2013-05-03 10:46:14,054::task::957::TaskManager.Task::(_decref) Task=`26c57057-914e-48f7-8440-d184512b0760`::ref 0 aborting False
Thread-1342::DEBUG::2013-05-03 10:46:29,334::task::568::TaskManager.Task::(_updateState) Task=`bc22ed03-4f02-4fe9-9886-35a5bf34eff0`::moving from state init -> state preparing
Thread-1342::INFO::2013-05-03 10:46:29,335::logUtils::37::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-1342::INFO::2013-05-03 10:46:29,335::logUtils::39::dispatcher::(wrapper) Run and protect: repoStats, Return response: {}
Thread-1342::DEBUG::2013-05-03 10:46:29,335::task::1151::TaskManager.Task::(prepare) Task=`bc22ed03-4f02-4fe9-9886-35a5bf34eff0`::finished: {}
Thread-1342::DEBUG::2013-05-03 10:46:29,335::task::568::TaskManager.Task::(_updateState) Task=`bc22ed03-4f02-4fe9-9886-35a5bf34eff0`::moving from state preparing -> state finished
Thread-1342::DEBUG::2013-05-03 10:46:29,335::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-1342::DEBUG::2013-05-03 10:46:29,335::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-1342::DEBUG::2013-05-03 10:46:29,335::task::957::TaskManager.Task::(_decref) Task=`bc22ed03-4f02-4fe9-9886-35a5bf34eff0`::ref 0 aborting False
Thread-1348::DEBUG::2013-05-03 10:46:44,719::task::568::TaskManager.Task::(_updateState) Task=`e7d58dee-f349-49eb-b2c0-4d42c556ce60`::moving from state init -> state preparing
Thread-1348::INFO::2013-05-03 10:46:44,719::logUtils::37::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-1348::INFO::2013-05-03 10:46:44,719::logUtils::39::dispatcher::(wrapper) Run and protect: repoStats, Return response: {}
Thread-1348::DEBUG::2013-05-03 10:46:44,719::task::1151::TaskManager.Task::(prepare) Task=`e7d58dee-f349-49eb-b2c0-4d42c556ce60`::finished: {}
Thread-1348::DEBUG::2013-05-03 10:46:44,720::task::568::TaskManager.Task::(_updateState) Task=`e7d58dee-f349-49eb-b2c0-4d42c556ce60`::moving from state preparing -> state finished
Thread-1348::DEBUG::2013-05-03 10:46:44,720::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-1348::DEBUG::2013-05-03 10:46:44,720::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-1348::DEBUG::2013-05-03 10:46:44,720::task::957::TaskManager.Task::(_decref) Task=`e7d58dee-f349-49eb-b2c0-4d42c556ce60`::ref 0 aborting False
Thread-1354::DEBUG::2013-05-03 10:46:59,911::task::568::TaskManager.Task::(_updateState) Task=`4128abb0-bd4c-43e2-ae87-c992ab025224`::moving from state init -> state preparing
Thread-1354::INFO::2013-05-03 10:46:59,911::logUtils::37::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-1354::INFO::2013-05-03 10:46:59,912::logUtils::39::dispatcher::(wrapper) Run and protect: repoStats, Return response: {}
Thread-1354::DEBUG::2013-05-03 10:46:59,912::task::1151::TaskManager.Task::(prepare) Task=`4128abb0-bd4c-43e2-ae87-c992ab025224`::finished: {}
Thread-1354::DEBUG::2013-05-03 10:46:59,912::task::568::TaskManager.Task::(_updateState) Task=`4128abb0-bd4c-43e2-ae87-c992ab025224`::moving from state preparing -> state finished
Thread-1354::DEBUG::2013-05-03 10:46:59,912::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-1354::DEBUG::2013-05-03 10:46:59,912::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-1354::DEBUG::2013-05-03 10:46:59,912::task::957::TaskManager.Task::(_decref) Task=`4128abb0-bd4c-43e2-ae87-c992ab025224`::ref 0 aborting False
Thread-1360::DEBUG::2013-05-03 10:47:15,276::task::568::TaskManager.Task::(_updateState) Task=`9a1e1ace-a461-43f7-a579-cb14bbf21903`::moving from state init -> state preparing
Thread-1360::INFO::2013-05-03 10:47:15,276::logUtils::37::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-1360::INFO::2013-05-03 10:47:15,278::logUtils::39::dispatcher::(wrapper) Run and protect: repoStats, Return response: {}
Thread-1360::DEBUG::2013-05-03 10:47:15,278::task::1151::TaskManager.Task::(prepare) Task=`9a1e1ace-a461-43f7-a579-cb14bbf21903`::finished: {}
Thread-1360::DEBUG::2013-05-03 10:47:15,278::task::568::TaskManager.Task::(_updateState) Task=`9a1e1ace-a461-43f7-a579-cb14bbf21903`::moving from state preparing -> state finished
Thread-1360::DEBUG::2013-05-03 10:47:15,278::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-1360::DEBUG::2013-05-03 10:47:15,278::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-1360::DEBUG::2013-05-03 10:47:15,278::task::957::TaskManager.Task::(_decref) Task=`9a1e1ace-a461-43f7-a579-cb14bbf21903`::ref 0 aborting False
Thread-1364::DEBUG::2013-05-03 10:47:23,196::BindingXMLRPC::161::vds::(wrapper) [25.97.225.61]
Thread-1364::DEBUG::2013-05-03 10:47:23,196::task::568::TaskManager.Task::(_updateState) Task=`62156094-319e-4223-9cb0-a24ac391fb70`::moving from state init -> state preparing
Thread-1364::INFO::2013-05-03 10:47:23,197::logUtils::37::dispatcher::(wrapper) Run and protect: validateStorageServerConnection(domType=1, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'port': '', 'connection': '192.168.2.122:/mnt/data', 'iqn': '', 'portal': '', 'user': '', 'protocol_version': 'auto', 'password': '******', 'id': 'ce655af9-4de6-4b76-a992-c6564b57e562'}], options=None)
Thread-1364::INFO::2013-05-03 10:47:23,197::logUtils::39::dispatcher::(wrapper) Run and protect: validateStorageServerConnection, Return response: {'statuslist': [{'status': 0, 'id': 'ce655af9-4de6-4b76-a992-c6564b57e562'}]}
Thread-1364::DEBUG::2013-05-03 10:47:23,197::task::1151::TaskManager.Task::(prepare) Task=`62156094-319e-4223-9cb0-a24ac391fb70`::finished: {'statuslist': [{'status': 0, 'id': 'ce655af9-4de6-4b76-a992-c6564b57e562'}]}
Thread-1364::DEBUG::2013-05-03 10:47:23,197::task::568::TaskManager.Task::(_updateState) Task=`62156094-319e-4223-9cb0-a24ac391fb70`::moving from state preparing -> state finished
Thread-1364::DEBUG::2013-05-03 10:47:23,198::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-1364::DEBUG::2013-05-03 10:47:23,198::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-1364::DEBUG::2013-05-03 10:47:23,198::task::957::TaskManager.Task::(_decref) Task=`62156094-319e-4223-9cb0-a24ac391fb70`::ref 0 aborting False
Thread-1365::DEBUG::2013-05-03 10:47:23,997::BindingXMLRPC::161::vds::(wrapper) [25.97.225.61]
Thread-1365::DEBUG::2013-05-03 10:47:23,998::task::568::TaskManager.Task::(_updateState) Task=`61e6d67d-54d9-4f99-abfd-c015e38f10ad`::moving from state init -> state preparing
Thread-1365::INFO::2013-05-03 10:47:23,998::logUtils::37::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=1, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'port': '', 'connection': '192.168.2.122:/mnt/data', 'iqn': '', 'portal': '', 'user': '', 'protocol_version': 'auto', 'password': '******', 'id': 'ce655af9-4de6-4b76-a992-c6564b57e562'}], options=None)
Thread-1365::INFO::2013-05-03 10:47:24,002::logUtils::39::dispatcher::(wrapper) Run and protect: connectStorageServer, Return response: {'statuslist': [{'status': 0, 'id': 'ce655af9-4de6-4b76-a992-c6564b57e562'}]}
Thread-1365::DEBUG::2013-05-03 10:47:24,002::task::1151::TaskManager.Task::(prepare) Task=`61e6d67d-54d9-4f99-abfd-c015e38f10ad`::finished: {'statuslist': [{'status': 0, 'id': 'ce655af9-4de6-4b76-a992-c6564b57e562'}]}
Thread-1365::DEBUG::2013-05-03 10:47:24,002::task::568::TaskManager.Task::(_updateState) Task=`61e6d67d-54d9-4f99-abfd-c015e38f10ad`::moving from state preparing -> state finished
Thread-1365::DEBUG::2013-05-03 10:47:24,002::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-1365::DEBUG::2013-05-03 10:47:24,002::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-1365::DEBUG::2013-05-03 10:47:24,002::task::957::TaskManager.Task::(_decref) Task=`61e6d67d-54d9-4f99-abfd-c015e38f10ad`::ref 0 aborting False
Thread-1366::DEBUG::2013-05-03 10:47:24,736::BindingXMLRPC::161::vds::(wrapper) [25.97.225.61]
Thread-1366::DEBUG::2013-05-03 10:47:24,736::task::568::TaskManager.Task::(_updateState) Task=`af1a2711-317a-4b71-aea9-22d430e91b66`::moving from state init -> state preparing
Thread-1366::INFO::2013-05-03 10:47:24,736::logUtils::37::dispatcher::(wrapper) Run and protect: createStoragePool(poolType=None, spUUID='b4ce5912-b1d0-11e2-9c85-080027c6913a', poolName='Default', masterDom='a459c008-9cee-4b00-9ff2-3189f1b595a4', domList=['a459c008-9cee-4b00-9ff2-3189f1b595a4'], masterVersion=22, lockPolicy=None, lockRenewalIntervalSec=5, leaseTimeSec=60, ioOpTimeoutSec=10, leaseRetries=3, options=None)
Thread-1366::INFO::2013-05-03 10:47:24,737::fileSD::310::Storage.StorageDomain::(validate) sdUUID=a459c008-9cee-4b00-9ff2-3189f1b595a4
Thread-1366::DEBUG::2013-05-03 10:47:24,751::persistentDict::234::Storage.PersistentDict::(refresh) read lines (FileMetadataRW)=['CLASS=Data', 'DESCRIPTION=rpi-data', 'IOOPTIMEOUTSEC=10', 'LEASERETRIES=3', 'LEASETIMESEC=60', 'LOCKPOLICY=', 'LOCKRENEWALINTERVALSEC=5', 'POOL_UUID=', 'REMOTE_PATH=192.168.2.122:/mnt/data', 'ROLE=Regular', 'SDUUID=a459c008-9cee-4b00-9ff2-3189f1b595a4', 'TYPE=NFS', 'VERSION=3', '_SHA_CKSUM=96a09f31a467960cab663ef912b87f3fd6e0ac12']
Thread-1366::DEBUG::2013-05-03 10:47:24,752::resourceManager::175::ResourceManager.Request::(__init__) ResName=`Storage.b4ce5912-b1d0-11e2-9c85-080027c6913a`ReqID=`46dbba87-bb13-461a-9d96-7d8fb2557f96`::Request was made in '/usr/share/vdsm/storage/resourceManager.py' line '174' at '__init__'
Thread-1366::DEBUG::2013-05-03 10:47:24,752::resourceManager::486::ResourceManager::(registerResource) Trying to register resource 'Storage.b4ce5912-b1d0-11e2-9c85-080027c6913a' for lock type 'exclusive'
Thread-1366::DEBUG::2013-05-03 10:47:24,752::resourceManager::528::ResourceManager::(registerResource) Resource 'Storage.b4ce5912-b1d0-11e2-9c85-080027c6913a' is free. Now locking as 'exclusive' (1 active user)
Thread-1366::DEBUG::2013-05-03 10:47:24,752::resourceManager::212::ResourceManager.Request::(grant) ResName=`Storage.b4ce5912-b1d0-11e2-9c85-080027c6913a`ReqID=`46dbba87-bb13-461a-9d96-7d8fb2557f96`::Granted request
Thread-1366::DEBUG::2013-05-03 10:47:24,753::task::794::TaskManager.Task::(resourceAcquired) Task=`af1a2711-317a-4b71-aea9-22d430e91b66`::_resourcesAcquired: Storage.b4ce5912-b1d0-11e2-9c85-080027c6913a (exclusive)
Thread-1366::DEBUG::2013-05-03 10:47:24,753::task::957::TaskManager.Task::(_decref) Task=`af1a2711-317a-4b71-aea9-22d430e91b66`::ref 1 aborting False
Thread-1366::DEBUG::2013-05-03 10:47:24,753::resourceManager::175::ResourceManager.Request::(__init__) ResName=`Storage.a459c008-9cee-4b00-9ff2-3189f1b595a4`ReqID=`73ad75b9-8193-41ec-9a0a-8a2ebffcaca3`::Request was made in '/usr/share/vdsm/storage/resourceManager.py' line '174' at '__init__'
Thread-1366::DEBUG::2013-05-03 10:47:24,753::resourceManager::486::ResourceManager::(registerResource) Trying to register resource 'Storage.a459c008-9cee-4b00-9ff2-3189f1b595a4' for lock type 'exclusive'
Thread-1366::DEBUG::2013-05-03 10:47:24,754::resourceManager::528::ResourceManager::(registerResource) Resource 'Storage.a459c008-9cee-4b00-9ff2-3189f1b595a4' is free. Now locking as 'exclusive' (1 active user)
Thread-1366::DEBUG::2013-05-03 10:47:24,754::resourceManager::212::ResourceManager.Request::(grant) ResName=`Storage.a459c008-9cee-4b00-9ff2-3189f1b595a4`ReqID=`73ad75b9-8193-41ec-9a0a-8a2ebffcaca3`::Granted request
Thread-1366::DEBUG::2013-05-03 10:47:24,754::task::794::TaskManager.Task::(resourceAcquired) Task=`af1a2711-317a-4b71-aea9-22d430e91b66`::_resourcesAcquired: Storage.a459c008-9cee-4b00-9ff2-3189f1b595a4 (exclusive)
Thread-1366::DEBUG::2013-05-03 10:47:24,754::task::957::TaskManager.Task::(_decref) Task=`af1a2711-317a-4b71-aea9-22d430e91b66`::ref 1 aborting False
Thread-1366::INFO::2013-05-03 10:47:24,754::sp::543::Storage.StoragePool::(create) spUUID=b4ce5912-b1d0-11e2-9c85-080027c6913a poolName=Default master_sd=a459c008-9cee-4b00-9ff2-3189f1b595a4 domList=['a459c008-9cee-4b00-9ff2-3189f1b595a4'] masterVersion=22 {'LEASETIMESEC': 60, 'IOOPTIMEOUTSEC': 10, 'LEASERETRIES': 3, 'LOCKRENEWALINTERVALSEC': 5}
Thread-1366::INFO::2013-05-03 10:47:24,755::fileSD::310::Storage.StorageDomain::(validate) sdUUID=a459c008-9cee-4b00-9ff2-3189f1b595a4
Thread-1366::DEBUG::2013-05-03 10:47:24,761::persistentDict::234::Storage.PersistentDict::(refresh) read lines (FileMetadataRW)=['CLASS=Data', 'DESCRIPTION=rpi-data', 'IOOPTIMEOUTSEC=10', 'LEASERETRIES=3', 'LEASETIMESEC=60', 'LOCKPOLICY=', 'LOCKRENEWALINTERVALSEC=5', 'POOL_UUID=', 'REMOTE_PATH=192.168.2.122:/mnt/data', 'ROLE=Regular', 'SDUUID=a459c008-9cee-4b00-9ff2-3189f1b595a4', 'TYPE=NFS', 'VERSION=3', '_SHA_CKSUM=96a09f31a467960cab663ef912b87f3fd6e0ac12']
Thread-1366::DEBUG::2013-05-03 10:47:24,767::persistentDict::234::Storage.PersistentDict::(refresh) read lines (FileMetadataRW)=['CLASS=Data', 'DESCRIPTION=rpi-data', 'IOOPTIMEOUTSEC=10', 'LEASERETRIES=3', 'LEASETIMESEC=60', 'LOCKPOLICY=', 'LOCKRENEWALINTERVALSEC=5', 'POOL_UUID=', 'REMOTE_PATH=192.168.2.122:/mnt/data', 'ROLE=Regular', 'SDUUID=a459c008-9cee-4b00-9ff2-3189f1b595a4', 'TYPE=NFS', 'VERSION=3', '_SHA_CKSUM=96a09f31a467960cab663ef912b87f3fd6e0ac12']
Thread-1366::WARNING::2013-05-03 10:47:24,767::fileUtils::184::Storage.fileUtils::(createdir) Dir /rhev/data-center/b4ce5912-b1d0-11e2-9c85-080027c6913a already exists
Thread-1366::DEBUG::2013-05-03 10:47:24,768::persistentDict::167::Storage.PersistentDict::(transaction) Starting transaction
Thread-1366::DEBUG::2013-05-03 10:47:24,768::persistentDict::175::Storage.PersistentDict::(transaction) Finished transaction
Thread-1366::INFO::2013-05-03 10:47:24,768::clusterlock::172::SANLock::(acquireHostId) Acquiring host id for domain a459c008-9cee-4b00-9ff2-3189f1b595a4 (id: 250)
Thread-1366::ERROR::2013-05-03 10:47:25,775::task::833::TaskManager.Task::(_setError) Task=`af1a2711-317a-4b71-aea9-22d430e91b66`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 840, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 893, in createStoragePool
    masterVersion, leaseParams)
  File "/usr/share/vdsm/storage/sp.py", line 568, in create
    self._acquireTemporaryClusterLock(msdUUID, leaseParams)
  File "/usr/share/vdsm/storage/sp.py", line 510, in _acquireTemporaryClusterLock
    msd.acquireHostId(self.id)
  File "/usr/share/vdsm/storage/sd.py", line 445, in acquireHostId
    self._clusterLock.acquireHostId(hostId, async)
  File "/usr/share/vdsm/storage/clusterlock.py", line 187, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: ('a459c008-9cee-4b00-9ff2-3189f1b595a4', SanlockException(19, 'Sanlock lockspace add failure', 'No such device'))
Thread-1366::DEBUG::2013-05-03 10:47:25,781::task::852::TaskManager.Task::(_run) Task=`af1a2711-317a-4b71-aea9-22d430e91b66`::Task._run: af1a2711-317a-4b71-aea9-22d430e91b66 (None, 'b4ce5912-b1d0-11e2-9c85-080027c6913a', 'Default', 'a459c008-9cee-4b00-9ff2-3189f1b595a4', ['a459c008-9cee-4b00-9ff2-3189f1b595a4'], 22, None, 5, 60, 10, 3) {} failed - stopping task
Thread-1366::DEBUG::2013-05-03 10:47:25,781::task::1177::TaskManager.Task::(stop) Task=`af1a2711-317a-4b71-aea9-22d430e91b66`::stopping in state preparing (force False)
Thread-1366::DEBUG::2013-05-03 10:47:25,781::task::957::TaskManager.Task::(_decref) Task=`af1a2711-317a-4b71-aea9-22d430e91b66`::ref 1 aborting True
Thread-1366::INFO::2013-05-03 10:47:25,781::task::1134::TaskManager.Task::(prepare) Task=`af1a2711-317a-4b71-aea9-22d430e91b66`::aborting: Task is aborted: 'Cannot acquire host id' - code 661
Thread-1366::DEBUG::2013-05-03 10:47:25,782::task::1139::TaskManager.Task::(prepare) Task=`af1a2711-317a-4b71-aea9-22d430e91b66`::Prepare: aborted: Cannot acquire host id
Thread-1366::DEBUG::2013-05-03 10:47:25,782::task::957::TaskManager.Task::(_decref) Task=`af1a2711-317a-4b71-aea9-22d430e91b66`::ref 0 aborting True
Thread-1366::DEBUG::2013-05-03 10:47:25,782::task::892::TaskManager.Task::(_doAbort) Task=`af1a2711-317a-4b71-aea9-22d430e91b66`::Task._doAbort: force False
Thread-1366::DEBUG::2013-05-03 10:47:25,782::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-1366::DEBUG::2013-05-03 10:47:25,782::task::568::TaskManager.Task::(_updateState) Task=`af1a2711-317a-4b71-aea9-22d430e91b66`::moving from state preparing -> state aborting
Thread-1366::DEBUG::2013-05-03 10:47:25,782::task::523::TaskManager.Task::(__state_aborting) Task=`af1a2711-317a-4b71-aea9-22d430e91b66`::_aborting: recover policy none
Thread-1366::DEBUG::2013-05-03 10:47:25,783::task::568::TaskManager.Task::(_updateState) Task=`af1a2711-317a-4b71-aea9-22d430e91b66`::moving from state aborting -> state failed
Thread-1366::DEBUG::2013-05-03 10:47:25,783::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {'Storage.a459c008-9cee-4b00-9ff2-3189f1b595a4': < ResourceRef 'Storage.a459c008-9cee-4b00-9ff2-3189f1b595a4', isValid: 'True' obj: 'None'>, 'Storage.b4ce5912-b1d0-11e2-9c85-080027c6913a': < ResourceRef 'Storage.b4ce5912-b1d0-11e2-9c85-080027c6913a', isValid: 'True' obj: 'None'>}
Thread-1366::DEBUG::2013-05-03 10:47:25,783::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-1366::DEBUG::2013-05-03 10:47:25,783::resourceManager::538::ResourceManager::(releaseResource) Trying to release resource 'Storage.a459c008-9cee-4b00-9ff2-3189f1b595a4'
Thread-1366::DEBUG::2013-05-03 10:47:25,783::resourceManager::553::ResourceManager::(releaseResource) Released resource 'Storage.a459c008-9cee-4b00-9ff2-3189f1b595a4' (0 active users)
Thread-1366::DEBUG::2013-05-03 10:47:25,783::resourceManager::558::ResourceManager::(releaseResource) Resource 'Storage.a459c008-9cee-4b00-9ff2-3189f1b595a4' is free, finding out if anyone is waiting for it.
Thread-1366::DEBUG::2013-05-03 10:47:25,784::resourceManager::565::ResourceManager::(releaseResource) No one is waiting for resource 'Storage.a459c008-9cee-4b00-9ff2-3189f1b595a4', Clearing records.
Thread-1366::DEBUG::2013-05-03 10:47:25,784::resourceManager::538::ResourceManager::(releaseResource) Trying to release resource 'Storage.b4ce5912-b1d0-11e2-9c85-080027c6913a'
Thread-1366::DEBUG::2013-05-03 10:47:25,784::resourceManager::553::ResourceManager::(releaseResource) Released resource 'Storage.b4ce5912-b1d0-11e2-9c85-080027c6913a' (0 active users)
Thread-1366::DEBUG::2013-05-03 10:47:25,784::resourceManager::558::ResourceManager::(releaseResource) Resource 'Storage.b4ce5912-b1d0-11e2-9c85-080027c6913a' is free, finding out if anyone is waiting for it.
Thread-1366::DEBUG::2013-05-03 10:47:25,784::resourceManager::565::ResourceManager::(releaseResource) No one is waiting for resource 'Storage.b4ce5912-b1d0-11e2-9c85-080027c6913a', Clearing records.
Thread-1366::ERROR::2013-05-03 10:47:25,785::dispatcher::66::Storage.Dispatcher.Protect::(run) {'status': {'message': "Cannot acquire host id: ('a459c008-9cee-4b00-9ff2-3189f1b595a4', SanlockException(19, 'Sanlock lockspace add failure', 'No such device'))", 'code': 661}}
Thread-1369::DEBUG::2013-05-03 10:47:30,589::task::568::TaskManager.Task::(_updateState) Task=`5baf4299-619a-4cbc-80dd-f72a9a68aaed`::moving from state init -> state preparing
Thread-1369::INFO::2013-05-03 10:47:30,589::logUtils::37::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-1369::INFO::2013-05-03 10:47:30,589::logUtils::39::dispatcher::(wrapper) Run and protect: repoStats, Return response: {}
Thread-1369::DEBUG::2013-05-03 10:47:30,589::task::1151::TaskManager.Task::(prepare) Task=`5baf4299-619a-4cbc-80dd-f72a9a68aaed`::finished: {}
Thread-1369::DEBUG::2013-05-03 10:47:30,590::task::568::TaskManager.Task::(_updateState) Task=`5baf4299-619a-4cbc-80dd-f72a9a68aaed`::moving from state preparing -> state finished

Comment 1 Haim 2013-05-03 09:14:53 UTC
Please attach VDSM logs (preferably an SOS report).
Can you list the permissions on the NFS storage server (directory)?
What type of NFS are you using (i.e., NFSv3 or NFSv4)?

Comment 2 Pablo Iranzo Gómez 2013-05-03 09:20:55 UTC
Haim,

NFS in default mode (NFSv3); I can see the share mounted when issuing 'mount' on the hypervisor.

Permissions are user and group 36:36 for the whole folder, and the following content is created:

(from the NFS server)

/mnt/data/a459c008-9cee-4b00-9ff2-3189f1b595a4
/mnt/data/a459c008-9cee-4b00-9ff2-3189f1b595a4/images
/mnt/data/a459c008-9cee-4b00-9ff2-3189f1b595a4/dom_md
/mnt/data/a459c008-9cee-4b00-9ff2-3189f1b595a4/dom_md/outbox
/mnt/data/a459c008-9cee-4b00-9ff2-3189f1b595a4/dom_md/inbox
/mnt/data/a459c008-9cee-4b00-9ff2-3189f1b595a4/dom_md/metadata
/mnt/data/a459c008-9cee-4b00-9ff2-3189f1b595a4/dom_md/ids
/mnt/data/a459c008-9cee-4b00-9ff2-3189f1b595a4/dom_md/leases


I'm attaching SOS reports for both hypervisors in a few minutes.

Thanks,
Pablo

Comment 3 Pablo Iranzo Gómez 2013-05-03 09:22:38 UTC
Created attachment 743127 [details]
Sosreport from host 2

Comment 4 Pablo Iranzo Gómez 2013-05-03 09:24:04 UTC
Created attachment 743128 [details]
Sosreport from host 1

Comment 5 Ayal Baron 2013-05-08 10:36:01 UTC
Fede, any idea?

Comment 6 Federico Simoncelli 2013-05-08 12:53:51 UTC
WRT:

Thread-1366::ERROR::2013-05-03 10:47:25,775::task::833::TaskManager.Task::(_setError) Task=`af1a2711-317a-4b71-aea9-22d430e91b66`::Unexpected error
Traceback (most recent call last):
AcquireHostIdFailure: Cannot acquire host id: ('a459c008-9cee-4b00-9ff2-3189f1b595a4', SanlockException(19, 'Sanlock lockspace add failure', 'No such device'))

The sanlock error is (found in /var/log/messages):

May  3 10:01:08 Virtualizador sanlock[2041]: 2013-05-03 10:01:08+0200 1939 [3939]: open error -13 /rhev/data-center/mnt/192.168.2.122:_mnt_data/a459c008-9cee-4b00-9ff2-3189f1b595a4/dom_md/ids
May  3 10:01:08 Virtualizador sanlock[2041]: 2013-05-03 10:01:08+0200 1939 [3939]: s4 open_disk /rhev/data-center/mnt/192.168.2.122:_mnt_data/a459c008-9cee-4b00-9ff2-3189f1b595a4/dom_md/ids error -13

The file cannot be opened because of errno 13: EACCES (Permission denied).

This is most likely a file permission problem.
Please double-check (from the VDSM host) the permissions (mode/owner/group) of the ids file and the sanlock process:

# ls -l /rhev/data-center/mnt/192.168.2.122:_mnt_data/a459c008-9cee-4b00-9ff2-3189f1b595a4/dom_md/ids

# ps aux | grep sanlock

# id sanlock

Also make sure that all the parent paths (/rhev, /rhev/data-center, etc.) are accessible to the sanlock user.
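For reference, these checks can be scripted. The following is only an illustrative Python sketch (not VDSM code): it walks the path down to the ids file and reports whether a given user, such as sanlock, can traverse each directory and read the file according to the local mode/owner/group bits. On NFS the server can still deny access even when the local bits look fine (which is what eventually turned out to be the case here), so this only validates the client-side view. The path and user name below are the ones from this bug.

# Illustrative sketch only: check local mode/owner/group bits for a user.
import os, pwd, grp

def user_ids(username):
    # uid plus primary and supplementary gids of the user
    pw = pwd.getpwnam(username)
    gids = set([pw.pw_gid])
    gids.update(g.gr_gid for g in grp.getgrall() if username in g.gr_mem)
    return pw.pw_uid, gids

def allowed(path, uid, gids, bit):
    # bit: 4 = read, 1 = execute/traverse
    st = os.stat(path)
    shift = 6 if uid == st.st_uid else (3 if st.st_gid in gids else 0)
    return bool((st.st_mode >> shift) & bit)

def check(path, username):
    uid, gids = user_ids(username)
    parts = path.rstrip('/').split('/')
    for i in range(2, len(parts)):
        parent = '/'.join(parts[:i])
        print('%-6s traverse %s' % ('OK' if allowed(parent, uid, gids, 1) else 'DENIED', parent))
    print('%-6s read     %s' % ('OK' if allowed(path, uid, gids, 4) else 'DENIED', path))

if __name__ == '__main__':
    check('/rhev/data-center/mnt/192.168.2.122:_mnt_data/'
          'a459c008-9cee-4b00-9ff2-3189f1b595a4/dom_md/ids', 'sanlock')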

Comment 7 Pablo Iranzo Gómez 2013-05-10 19:31:47 UTC
Tomorrow I'll try to mount it from the RHEV environment.

Regards,
Pablo

[iranzo@x201 d3382225-a3fd-44e0-9973-67d369c9b017]$ ls -lR
.:
total 24
drwxr-xr-x. 3   36 kvm   4096 may  3 09:59 data
drwxr-xr-x. 2   36 kvm   4096 abr 30 20:54 iso
drwx------. 2 root root 16384 abr 30 18:50 lost+found

./data:
total 4
drwxr-xr-x. 4 36 kvm 4096 may  3 09:59 a459c008-9cee-4b00-9ff2-3189f1b595a4

./data/a459c008-9cee-4b00-9ff2-3189f1b595a4:
total 8
drwxr-xr-x. 2 36 kvm 4096 may  3 09:59 dom_md
drwxr-xr-x. 2 36 kvm 4096 may  3 09:59 images

./data/a459c008-9cee-4b00-9ff2-3189f1b595a4/dom_md:
total 2052
-rw-rw----. 1 36 kvm 1048576 may  3 09:59 ids
-rw-rw----. 1 36 kvm       0 may  3 09:59 inbox
-rw-rw----. 1 36 kvm 2097152 may  3 09:59 leases
-rw-r--r--. 1 36 kvm     293 may  3 09:59 metadata
-rw-rw----. 1 36 kvm       0 may  3 09:59 outbox

./data/a459c008-9cee-4b00-9ff2-3189f1b595a4/images:
total 0

./iso:
total 0
ls: no se puede abrir el directorio ./lost+found: Permiso denegado
[iranzo@x201 d3382225-a3fd-44e0-9973-67d369c9b017]$ ls -lRZ
.:
drwxr-xr-x.   36 kvm  system_u:object_r:file_t:s0      data
drwxr-xr-x.   36 kvm  system_u:object_r:file_t:s0      iso
drwx------. root root system_u:object_r:file_t:s0      lost+found

./data:
drwxr-xr-x. 36 kvm system_u:object_r:file_t:s0      a459c008-9cee-4b00-9ff2-3189f1b595a4

./data/a459c008-9cee-4b00-9ff2-3189f1b595a4:
drwxr-xr-x. 36 kvm system_u:object_r:file_t:s0      dom_md
drwxr-xr-x. 36 kvm system_u:object_r:file_t:s0      images

./data/a459c008-9cee-4b00-9ff2-3189f1b595a4/dom_md:
-rw-rw----. 36 kvm system_u:object_r:file_t:s0      ids
-rw-rw----. 36 kvm system_u:object_r:file_t:s0      inbox
-rw-rw----. 36 kvm system_u:object_r:file_t:s0      leases
-rw-r--r--. 36 kvm system_u:object_r:file_t:s0      metadata
-rw-rw----. 36 kvm system_u:object_r:file_t:s0      outbox

./data/a459c008-9cee-4b00-9ff2-3189f1b595a4/images:

./iso:
ls: no se puede abrir el directorio ./lost+found: Permiso denegado

Comment 8 Pablo Iranzo Gómez 2013-05-11 20:17:48 UTC
[root@Virtualizador2 ~]# id sanlock
uid=179(sanlock) gid=179(sanlock) grupos=179(sanlock),6(disk),36(kvm),107(qemu)

[root@Virtualizador2 ~]# ps aux|grep sanlock
root      2087  0.0  0.0  13548  3316 ?        SLs  22:11   0:00 wdmd -G sanlock
sanlock   2142  0.0  0.4 263236 17516 ?        SLsl 22:11   0:00 sanlock daemon -U sanlock -G sanlock
root      2144  0.0  0.0  23076   288 ?        S    22:11   0:00 sanlock daemon -U sanlock -G sanlock
root      2738  0.0  0.0 105312   884 pts/0    S+   22:17   0:00 grep sanlock

Comment 9 Federico Simoncelli 2013-05-15 16:55:55 UTC
Try to access the ids file as the sanlock user:

# su -s /bin/bash sanlock
$ cd /rhev/data-center/mnt/192.168.2.122:_mnt_data/a459c008-9cee-4b00-9ff2-3189f1b595a4/dom_md
$ hexdump -C ids

Comment 10 Jordan Metzmeier 2013-05-16 15:20:08 UTC
bash-4.1$ id
uid=179(sanlock) gid=179(sanlock) groups=179(sanlock),6(disk),36(kvm),107(qemu) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
bash-4.1$ ls -l
total 2060
-rw-rw----. 1 vdsm kvm 1048576 2013-05-16 14:56 ids
-rw-rw----. 1 vdsm kvm       0 2013-05-16 14:56 inbox
-rw-rw----. 1 vdsm kvm 2097152 2013-05-16 14:56 leases
-rw-r--r--. 1 vdsm kvm     296 2013-05-16 14:56 metadata
-rw-rw----. 1 vdsm kvm       0 2013-05-16 14:56 outbox
bash-4.1$ hexdump -C ids
hexdump: ids: Permission denied
hexdump: ids: Bad file descriptor

The same issue occurs with root:
[root@revtest1 dom_md]# hexdump -C ids
hexdump: ids: Permission denied
hexdump: ids: Bad file descriptor

Comment 11 Pablo Iranzo Gómez 2013-05-18 10:26:45 UTC
[sanlock@Virtualizador2 dom_md]$ hexdump -C ids
hexdump: ids: Permission denied
hexdump: ids: Bad file descriptor
[sanlock@Virtualizador2 dom_md]$ ls -l
total 2052
-rw-rw----. 1 vdsm kvm 1048576 May  3 09:59 ids
-rw-rw----. 1 vdsm kvm       0 May  3 09:59 inbox
-rw-rw----. 1 vdsm kvm 2097152 May  3 09:59 leases
-rw-r--r--. 1 vdsm kvm     293 May  3 09:59 metadata
-rw-rw----. 1 vdsm kvm       0 May  3 09:59 outbox
[sanlock@Virtualizador2 dom_md]$ whoami 
sanlock
[sanlock@Virtualizador2 dom_md]$ id
uid=179(sanlock) gid=179(sanlock) groups=179(sanlock),6(disk),36(kvm),107(qemu) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[sanlock@Virtualizador2 dom_md]$

Comment 12 Pablo Iranzo Gómez 2013-05-18 10:35:59 UTC
I forgot to mention that I had to manually mount it under /mnt as RHEV couldn't mount it directly because of the sanlock failure.

Comment 13 Pablo Iranzo Gómez 2013-05-18 11:34:56 UTC
(In reply to comment #11)
> [sanlock@Virtualizador2 dom_md]$ hexdump -C ids
> hexdump: ids: Permission denied
> hexdump: ids: Bad file descriptor
> [sanlock@Virtualizador2 dom_md]$ ls -l
> total 2052
> -rw-rw----. 1 vdsm kvm 1048576 May  3 09:59 ids
> -rw-rw----. 1 vdsm kvm       0 May  3 09:59 inbox
> -rw-rw----. 1 vdsm kvm 2097152 May  3 09:59 leases
> -rw-r--r--. 1 vdsm kvm     293 May  3 09:59 metadata
> -rw-rw----. 1 vdsm kvm       0 May  3 09:59 outbox
> [sanlock@Virtualizador2 dom_md]$ whoami 
> sanlock
> [sanlock@Virtualizador2 dom_md]$ id
> uid=179(sanlock) gid=179(sanlock)
> groups=179(sanlock),6(disk),36(kvm),107(qemu)
> context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> [sanlock@Virtualizador2 dom_md]$

By the way, setting permissions to 777 on the whole export allowed me to mount it, which should not be necessary in a production environment.

After that, I was able to activate the data domain and create a new ISO domain and an export domain (also set to 777).

Even then, the files created for a VM have 660 permissions and cannot be used to boot the VM unless the permissions are changed again :/

Comment 14 Pablo Iranzo Gómez 2013-06-07 14:43:34 UTC
The NFS server is a Raspberry Pi running Raspbian, exporting a USB-mounted HDD.

Comment 15 Federico Simoncelli 2013-06-07 14:48:37 UTC
(In reply to Pablo Iranzo Gómez from comment #14)
> NFS server is a raspberry running raspbian exporting an USB mounted HDD

This configuration is largely untested; you will have to find out why your server is not allowing qemu and sanlock to access the files.

Given:

# id qemu
uid=107(qemu) gid=107(qemu) groups=107(qemu),36(kvm)

# id sanlock
uid=179(sanlock) gid=179(sanlock) groups=179(sanlock),6(disk),36(kvm),107(qemu)

The server should allow the users qemu/sanlock to access the files:

-rw-rw----. 1 vdsm kvm 1048576 May  3 09:59 ids
-rw-rw----. 1 vdsm kvm 2097152 May  3 09:59 leases
...
(disk images...)

If this does not happen, you will have to determine what is wrong with the NFS server shipped with Raspbian.

Quick tests to verify that the server is fixed:

# su -s /bin/sh qemu
# hexdump -C ids
...

# su -s /bin/sh sanlock
# hexdump -C ids
...
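These quick tests can also be run programmatically. The sketch below is illustrative only (not part of VDSM): run as root, it forks, drops privileges to the given user with that user's real supplementary groups, and tries to read the first bytes of the ids file, so it exercises the actual server-side check rather than just the local mode bits. The path is the one from this bug; adjust as needed.

# Illustrative sketch: as root, drop to a user (qemu or sanlock) and try to read the file.
import os, pwd, grp

def read_as(username, path, nbytes=16):
    pw = pwd.getpwnam(username)
    groups = [pw.pw_gid] + [g.gr_gid for g in grp.getgrall() if username in g.gr_mem]
    pid = os.fork()
    if pid == 0:
        # Child: drop privileges (order matters: groups, then gid, then uid).
        os.setgroups(groups)
        os.setgid(pw.pw_gid)
        os.setuid(pw.pw_uid)
        try:
            with open(path, 'rb') as f:
                f.read(nbytes)
            print('%s: read OK' % username)
            os._exit(0)
        except IOError as e:
            print('%s: read failed: %s' % (username, e))
            os._exit(1)
    os.waitpid(pid, 0)

if __name__ == '__main__':
    ids = ('/rhev/data-center/mnt/192.168.2.122:_mnt_data/'
           'a459c008-9cee-4b00-9ff2-3189f1b595a4/dom_md/ids')
    for user in ('qemu', 'sanlock'):
        read_as(user, ids)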

Comment 16 Karim Boumedhel 2013-06-07 15:01:49 UTC
Hello, I am getting a similar error message when trying to add a brand-new FC storage domain on RHEV 3.1.
Find attached engine.log and vdsm.log from the creation of storage domain GES-CP-000.

Comment 17 Karim Boumedhel 2013-06-07 15:03:13 UTC
Created attachment 758198 [details]
engine.log when adding a FC SD

Comment 18 Karim Boumedhel 2013-06-07 15:03:47 UTC
Created attachment 758199 [details]
vdsm.log when adding a FC SD

Comment 19 Karim Boumedhel 2013-06-07 15:12:10 UTC
Created attachment 758206 [details]
additional vdsm.log when attaching a SD from rhvges2

Comment 20 Karim Boumedhel 2013-06-07 15:17:21 UTC
Created attachment 758209 [details]
correct vdsm.log

Comment 21 Karim Boumedhel 2013-06-07 15:23:04 UTC
Created attachment 758220 [details]
sanlock log from rhvges1

Comment 22 Pablo Iranzo Gómez 2013-06-07 16:19:56 UTC
(In reply to Federico Simoncelli from comment #15)
> (In reply to Pablo Iranzo Gómez from comment #14)
> > NFS server is a raspberry running raspbian exporting an USB mounted HDD
> 
> This configuration is highly untested, I guess you have to discover why your
> server isn't allowing qemu and sanlock to access the files.
> 
> Given:
> 
> # id qemu
> uid=107(qemu) gid=107(qemu) groups=107(qemu),36(kvm)
> 
> # id sanlock
> uid=179(sanlock) gid=179(sanlock)
> groups=179(sanlock),6(disk),36(kvm),107(qemu)
> 
> The server should allow the users qemu/sanlock to access the files:
> 
> -rw-rw----. 1 vdsm kvm 1048576 May  3 09:59 ids
> -rw-rw----. 1 vdsm kvm 2097152 May  3 09:59 leases
> ...
> (disk images...)
> 
> if this doesn't happen it means that you have to discover what is wrong with
> the NFS server provided with raspbian.
> 
> Quick tests to verify that the server is fixed:
> 
> # su -s /bin/sh qemu
> # hexdump -C ids
> ...
> 
> # su -s /bin/sh sanlock
> # hexdump -C ids
> ...

[sanlock@Virtualizador data-center]$ cd mnt/192.168.2.122\:_mnt_data/
bash: cd: mnt/192.168.2.122:_mnt_data/: Permiso denegado
[sanlock@Virtualizador data-center]$ cd mnt
[sanlock@Virtualizador mnt]$ ls -l
total 8
drwxrwx---. 3 vdsm kvm 4096 may  3 09:59 192.168.2.122:_mnt_data
drwxr-xr-x. 2 vdsm kvm 4096 may 13 22:57 rhevm.example.com:_var_lib_exports_iso
[sanlock@Virtualizador mnt]$ id
uid=179(sanlock) gid=179(sanlock) grupos=179(sanlock),6(disk),36(kvm),107(qemu) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[sanlock@Virtualizador mnt]$ newgrp 
[sanlock@Virtualizador mnt]$ man newgrp
No hay ninguna página sobre newgrp
[sanlock@Virtualizador mnt]$ newgrp kvm
[sanlock@Virtualizador mnt]$ id
uid=179(sanlock) gid=36(kvm) grupos=179(sanlock),6(disk),36(kvm),107(qemu) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[sanlock@Virtualizador mnt]$ cd 192.168.2.122\:_mnt_data/
[sanlock@Virtualizador 192.168.2.122:_mnt_data]$ ls
a459c008-9cee-4b00-9ff2-3189f1b595a4
[sanlock@Virtualizador 192.168.2.122:_mnt_data]$ cd a459c008-9cee-4b00-9ff2-3189f1b595a4/
[sanlock@Virtualizador a459c008-9cee-4b00-9ff2-3189f1b595a4]$ ls
dom_md  images  master
[sanlock@Virtualizador a459c008-9cee-4b00-9ff2-3189f1b595a4]$ 


After running newgrp kvm I can now access the files without further problems.

Before this, I ran chmod -R 770 /mnt/data/ to revert the NFS server to its previous state.

Comment 23 Federico Simoncelli 2013-06-07 16:24:35 UTC
(In reply to Karim Boumedhel from comment #16)
> hello, i m getting a similar error message when trying to add a brand new
> storage FC domain on rhev 3.1
> find attached engine.log and vdsm.log when creating storage domain GES-CP-000

In this case I've determined that sanlock was started before being added to the correct groups.
Restarting the service fixed the issue.

# ps ax | grep sanlock
12402 ?        SLsl   0:00 sanlock daemon -U sanlock -G sanlock

# cat /proc/12402/status
Name:   sanlock
Uid:    179     179     179     179
Gid:    179     179     179     179
Groups: 0 6 179

# service sanlock restart

# ps ax | grep sanlock
15840 ?        SLsl   0:00 sanlock daemon -U sanlock -G sanlock

# cat /proc/15840/status                                                                                                                                       
Name:   sanlock
Uid:    179     179     179     179
Gid:    179     179     179     179
Groups: 0 6 36 107 179 

This happened because the packages were installed in an unusual order:

Jun 07 08:24:23 Installed: sanlock-2.6-2.el6.x86_64
...
Jun 07 10:28:54 Installed: vdsm-4.9-113.1.el6.x86_64

And the host was never rebooted since.

This is unrelated to the bug.
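To verify whether a running daemon picked up the expected supplementary groups, the Groups: line of its /proc/<pid>/status can be compared with the groups currently configured for its user. The following is only an illustrative sketch of that check (pid 12402 is the one from the output above):

# Illustrative sketch: compare a running process's groups with its user's configured groups.
import pwd, grp

def process_groups(pid):
    with open('/proc/%d/status' % pid) as f:
        for line in f:
            if line.startswith('Groups:'):
                return set(int(g) for g in line.split()[1:])
    return set()

def configured_groups(username):
    pw = pwd.getpwnam(username)
    gids = set([pw.pw_gid])
    gids.update(g.gr_gid for g in grp.getgrall() if username in g.gr_mem)
    return gids

def check(pid, username):
    missing = configured_groups(username) - process_groups(pid)
    if missing:
        print('pid %d is missing groups %s; restart the service' % (pid, sorted(missing)))
    else:
        print('pid %d has all configured groups for %s' % (pid, username))

if __name__ == '__main__':
    check(12402, 'sanlock')   # e.g. the pid reported by "ps ax | grep sanlock"

In the case above, the missing groups would have been 36 (kvm) and 107 (qemu), matching the Groups: line shown before the restart.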

Comment 24 Federico Simoncelli 2013-06-07 17:31:59 UTC
As described in comment 15, this was a server-side issue. After logging in to the Raspberry Pi I noticed:

# ps aux | grep mountd
root      2426  0.0  0.8   3320  1528 ?        Ss   16:05   0:00 /usr/sbin/rpc.mountd --manage-gids

I edited /etc/default/nfs-kernel-server commenting out RPCMOUNTDOPTS:

# grep RPCMOUNTDOPTS /etc/default/nfs-kernel-server
#RPCMOUNTDOPTS=--manage-gids

and I restarted the server:

# /etc/init.d/nfs-kernel-server restart

# ps aux | grep mount
root      2912  0.0  0.6   3188  1200 ?        Ss   17:03   0:00 /usr/sbin/rpc.mountd

After that, it was possible to use the NFS storage domain.

Some background on -g/--manage-gids:

"""
Accept requests from the kernel to map user id numbers into lists of group id numbers for use in access control. An NFS request will normally (except when using Kerberos or other cryptographic authentication) contain a user-id and a list of group-ids. Due to a limitation in the NFS protocol, at most 16 group ids can be listed. If you use the -g flag, then the list of group ids received from the client will be replaced by a list of group ids determined by an appropriate lookup on the server. Note that the 'primary' group id is not affected, so a newgroup command on the client will still be effective. This function requires a Linux kernel with version at least 2.6.21.
"""

Comment 25 Ayal Baron 2013-06-10 15:19:33 UTC
This issue was a configuration problem on the server side (Raspberry Pi).

