Description of problem:
Reproduces every time in RHV automation: storage pool removal fails and a TemporaryFailure appears in vdsm.log.

Version-Release number of selected component (if applicable):
vdsm-4.20.35-1.el7ev.x86_64
ovirt-engine-4.2.4.5-0.1.el7_3.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a data center DC1 for the test and a cluster for it with one active host
2. Create and attach an iSCSI domain to the DC
3. Deactivate the host
4. Remove the DC

Actual results:
Storage pool removal fails.

engine.log:

2018-08-05 04:04:55,049+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (default task-13) [7e8e7fb7] START, ConnectStoragePoolVDSCommand(HostName = host_mixed_3, ConnectStoragePoolVDSCommandParameters:{hostId='b9b0ad97-5ef1-48d1-9da8-e0f4ed67da6a', vdsId='b9b0ad97-5ef1-48d1-9da8-e0f4ed67da6a', storagePoolId='d8fa912c-17ff-43e7-a7b1-15e4c4e27739', masterVersion='2'}), log id: 78e7ec1f
2018-08-05 04:04:55,050+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (default task-13) [7e8e7fb7] Executing with domain map: {0034cd0d-ab48-451e-8277-ab5b9948cb6f=active}
2018-08-05 04:04:55,758+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (default task-13) [7e8e7fb7] FINISH, ConnectStoragePoolVDSCommand, log id: 78e7ec1f
2018-08-05 04:04:55,773+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (default task-13) [7e8e7fb7] hostFromVds::selectedVds - 'host_mixed_3', spmStatus 'SPM', storage pool 'dc_type_0504023528', storage pool version '4.2'
2018-08-05 04:04:55,776+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (default task-13) [7e8e7fb7] Initialize Irs proxy from vds: rose12.scl.lab.tlv.redhat.com
2018-08-05 04:04:55,780+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-13) [7e8e7fb7] EVENT_ID: IRS_HOSTED_ON_VDS(204), Storage Pool Manager runs on Host rose12.scl.lab.tlv.redhat.com (Address: rose12.scl.lab.tlv.redhat.com), Data Center dc_type_0504023528.
2018-08-05 04:04:55,782+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.ResetIrsVDSCommand] (default task-13) [7e8e7fb7] START, ResetIrsVDSCommand( ResetIrsVDSCommandParameters:{storagePoolId='d8fa912c-17ff-43e7-a7b1-15e4c4e27739', ignoreFailoverLimit='false', vdsId='b9b0ad97-5ef1-48d1-9da8-e0f4ed67da6a', ignoreStopFailed='false'}), log id: 218ef3d0
2018-08-05 04:04:55,785+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-13) [7e8e7fb7] START, SpmStopVDSCommand(HostName = host_mixed_3, SpmStopVDSCommandParameters:{hostId='b9b0ad97-5ef1-48d1-9da8-e0f4ed67da6a', storagePoolId='d8fa912c-17ff-43e7-a7b1-15e4c4e27739'}), log id: 5eae358f
2018-08-05 04:04:55,788+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-13) [7e8e7fb7] SpmStopVDSCommand::Stopping SPM on vds 'host_mixed_3', pool id 'd8fa912c-17ff-43e7-a7b1-15e4c4e27739'
2018-08-05 04:04:55,795+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-13) [7e8e7fb7] FINISH, SpmStopVDSCommand, log id: 5eae358f
2018-08-05 04:04:55,797+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.ResetIrsVDSCommand] (default task-13) [7e8e7fb7] FINISH, ResetIrsVDSCommand, log id: 218ef3d0
2018-08-05 04:04:55,797+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SpmStopOnIrsVDSCommand] (default task-13) [7e8e7fb7] FINISH, SpmStopOnIrsVDSCommand, log id: 2dd9355b
2018-08-05 04:04:55,799+03 ERROR [org.ovirt.engine.core.bll.storage.pool.RemoveStoragePoolCommand] (default task-13) [7e8e7fb7] Command 'org.ovirt.engine.core.bll.storage.pool.RemoveStoragePoolCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IrsSpmStartFailedException: IRSGenericException: IRSErrorException: SpmStart failed (Failed with error ENGINE and code 5001)
2018-08-05 04:04:55,802+03 INFO [org.ovirt.engine.core.bll.CommandCompensator] (default task-13) [7e8e7fb7] Command [id=f755e999-16d6-40df-a97f-44151ad9849b]: Compensating DELETED_OR_UPDATED_ENTITY of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: id=StoragePoolIsoMapId:{storagePoolId='d8fa912c-17ff-43e7-a7b1-15e4c4e27739', storageId='0034cd0d-ab48-451e-8277-ab5b9948cb6f'}.
2018-08-05 04:04:55,805+03 INFO [org.ovirt.engine.core.bll.CommandCompensator] (default task-13) [7e8e7fb7] Command [id=f755e999-16d6-40df-a97f-44151ad9849b]: Compensating DELETED_OR_UPDATED_ENTITY of org.ovirt.engine.core.common.businessentities.network.Network; snapshot: id=98fe5b6c-b7cd-4d11-a591-ba0a629830f0.
2018-08-05 04:04:55,808+03 INFO [org.ovirt.engine.core.bll.CommandCompensator] (default task-13) [7e8e7fb7] Command [id=f755e999-16d6-40df-a97f-44151ad9849b]: Compensating DELETED_OR_UPDATED_ENTITY of org.ovirt.engine.core.common.businessentities.network.VnicProfile; snapshot: id=eea2861e-714c-47e9-8826-5dbf9d4bb165.
2018-08-05 04:04:55,819+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-13) [7e8e7fb7] EVENT_ID: USER_REMOVE_STORAGE_POOL_FAILED(955), Failed to remove Data Center dc_type_0504023528. (User: admin@internal-authz)

vdsm.log:

2018-08-05 04:04:55,137+0300 INFO (jsonrpc/6) [vdsm.api] START getSpmStatus(spUUID='d8fa912c-17ff-43e7-a7b1-15e4c4e27739', options=None) from=::ffff:10.35.162.71,33552, task_id=7fd38f0f-538b-4dd5-81aa-792706e77895 (api:46)
2018-08-05 04:04:55,147+0300 INFO (jsonrpc/6) [vdsm.api] FINISH getSpmStatus error=Cannot inquire Lease(name='SDM', path=u'/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge12__nfs__3/0034cd0d-ab48-451e-8277-ab5b9948cb6f/dom_md/leases', offset=1048576): (11, 'Sanlock get hosts failure', 'Resource temporarily unavailable') from=::ffff:10.35.162.71,33552, task_id=7fd38f0f-538b-4dd5-81aa-792706e77895 (api:50)
2018-08-05 04:04:55,148+0300 ERROR (jsonrpc/6) [storage.TaskManager.Task] (Task='7fd38f0f-538b-4dd5-81aa-792706e77895') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in getSpmStatus
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 636, in getSpmStatus
    status = self._getSpmStatusInfo(pool)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 630, in _getSpmStatusInfo
    (pool.spmRole,) + pool.getSpmStatus()))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 141, in getSpmStatus
    return self._backend.getSpmStatus()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/spbackends.py", line 437, in getSpmStatus
    raise exception.expected(e)
TemporaryFailure: Cannot inquire Lease(name='SDM', path=u'/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge12__nfs__3/0034cd0d-ab48-451e-8277-ab5b9948cb6f/dom_md/leases', offset=1048576): (11, 'Sanlock get hosts failure', 'Resource temporarily unavailable')
2018-08-05 04:04:55,148+0300 DEBUG (jsonrpc/6) [storage.TaskManager.Task] (Task='7fd38f0f-538b-4dd5-81aa-792706e77895') Task._run: 7fd38f0f-538b-4dd5-81aa-792706e77895 ('d8fa912c-17ff-43e7-a7b1-15e4c4e27739',) {} failed - stopping task (task:894)

Expected results:
Storage pool removal should succeed.

Additional info:
logs
Created attachment 1474291 [details]
logs
Created attachment 1474353 [details]
REST calls log

I'm unable to reproduce this manually; so far it happens only in automation (using REST). Attached is the log we generate during test execution, which contains all the REST calls.

2018-08-05 04:04:31,701 - MainThread - datacenters - INFO - Using Correlation-Id: datacenters_delete_713677b2-e313-44e8
2018-08-05 04:04:31,701 - MainThread - datacenters - DEBUG - DELETE request content is -- url:/ovirt-engine/api/datacenters/d8fa912c-17ff-43e7-a7b1-15e4c4e27739
2018-08-05 04:04:55,937 - MainThread - core_api - DEBUG - Request DELETE response time: 0.110
2018-08-05 04:04:55,937 - MainThread - datacenters - DEBUG - Cleaning Correlation-Id: datacenters_delete_713677b2-e313-44e8
2018-08-05 04:04:55,938 - MainThread - api_utils - ERROR - Failed to delete element NOT as expected: Status: 400 Reason: Bad Request Detail: [Internal Engine Error]
2018-08-05 04:04:55,939 - MainThread - datacenters - ERROR - Response code is not valid, expected is: [200, 202, 204], actual is: 400
Hi, Was this test OK in 4.1? Thanks
Yes, this test was written in November 2016 and has been executed regularly since then
Benny, please take a look. Thanks
Hi, is there only one host in the DC? It doesn't seem possible to remove a DC without an active host:

Error while executing action: Cannot remove Data Center. There is no active Host in the Data Center.

What kind of setup is this?
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Hi Benny,

This is the only host in the DC. Sorry for the confusion: we first deactivate the only domain in the DC and then remove the DC (without deactivating the host).

From the REST calls log, we first deactivate the master domain:

2018-08-05 04:03:56,756 - MainThread - storagedomains - INFO - Deactivating domain sd_type_0504025897 in data center dc_type_0504023528
2018-08-05 04:03:56,756 - MainThread - storagedomains - DEBUG - Action request content is -- url:/ovirt-engine/api/datacenters/d8fa912c-17ff-43e7-a7b1-15e4c4e27739/storagedomains/0034cd0d-ab48-451e-8277-ab5b9948cb6f/deactivate body:<action> <async>false</async> <grace_period> <expiry>10</expiry> </grace_period> </action>
2018-08-05 04:03:56,757 - MainThread - storagedomains - INFO - Using Correlation-Id: storagedomains_syncAction_116d9a0b-bbc7-4dec
2018-08-05 04:03:59,897 - MainThread - core_api - DEBUG - Request POST response time: 0.020
2018-08-05 04:03:59,898 - MainThread - storagedomains - DEBUG - Cleaning Correlation-Id: storagedomains_syncAction_116d9a0b-bbc7-4dec
2018-08-05 04:03:59,898 - MainThread - storagedomains - DEBUG - Response code is valid: [200, 202]
2018-08-05 04:03:59,898 - MainThread - storagedomains - DEBUG - Response body for action request is:
<action>
  <async>false</async>
  <grace_period>
    <expiry>10</expiry>
  </grace_period>
  <status>complete</status>
  <storage_domain href="/ovirt-engine/api/datacenters/d8fa912c-17ff-43e7-a7b1-15e4c4e27739/storagedomains/0034cd0d-ab48-451e-8277-ab5b9948cb6f" id="0034cd0d-ab48-451e-8277-ab5b9948cb6f">
    <actions>
      <link href="/ovirt-engine/api/datacenters/d8fa912c-17ff-43e7-a7b1-15e4c4e27739/storagedomains/0034cd0d-ab48-451e-8277-ab5b9948cb6f/activate" rel="activate"/>
      <link href="/ovirt-engine/api/datacenters/d8fa912c-17ff-43e7-a7b1-15e4c4e27739/storagedomains/0034cd0d-ab48-451e-8277-ab5b9948cb6f/deactivate" rel="deactivate"/>
    </actions>
    <link href="/ovirt-engine/api/datacenters/d8fa912c-17ff-43e7-a7b1-15e4c4e27739/storagedomains/0034cd0d-ab48-451e-8277-ab5b9948cb6f/disks" rel="disks"/>
    <data_center href="/ovirt-engine/api/datacenters/d8fa912c-17ff-43e7-a7b1-15e4c4e27739" id="d8fa912c-17ff-43e7-a7b1-15e4c4e27739"/>
    <data_centers>
      <data_center href="/ovirt-engine/api/datacenters/d8fa912c-17ff-43e7-a7b1-15e4c4e27739" id="d8fa912c-17ff-43e7-a7b1-15e4c4e27739"/>
    </data_centers>
  </storage_domain>
</action>

And then we delete the data center:

2018-08-05 04:04:31,701 - MainThread - datacenters - INFO - Using Correlation-Id: datacenters_delete_713677b2-e313-44e8
2018-08-05 04:04:31,701 - MainThread - datacenters - DEBUG - DELETE request content is -- url:/ovirt-engine/api/datacenters/d8fa912c-17ff-43e7-a7b1-15e4c4e27739
2018-08-05 04:04:55,937 - MainThread - core_api - DEBUG - Request DELETE response time: 0.110
2018-08-05 04:04:55,937 - MainThread - datacenters - DEBUG - Cleaning Correlation-Id: datacenters_delete_713677b2-e313-44e8
2018-08-05 04:04:55,938 - MainThread - api_utils - ERROR - Failed to delete element NOT as expected: Status: 400 Reason: Bad Request Detail: [Internal Engine Error]
2018-08-05 04:04:55,939 - MainThread - datacenters - ERROR - Response code is not valid, expected is: [200, 202, 204], actual is: 400
2018-08-05 04:04:55,939 - MainThread - root - ERROR - Failed to remove datacenter dc_type_0504023528 from setup with {'force': false}
2018-08-05 04:04:55,939 - MainThread - art.hl_lib.dcs - ERROR - Remove data center dc_type_0504023528 failed
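The sequence above boils down to two REST calls: a POST to the domain's deactivate action, then a DELETE on the data center. A minimal sketch of reproducing it outside the automation framework (the engine base URL and credentials are placeholders; the UUIDs are the ones from the log):

```python
# Hypothetical reproduction of the automation's two REST calls.
# ENGINE, user and password are placeholders; adapt to your setup.
ENGINE = "https://engine.example.com/ovirt-engine/api"
DC_ID = "d8fa912c-17ff-43e7-a7b1-15e4c4e27739"
SD_ID = "0034cd0d-ab48-451e-8277-ab5b9948cb6f"

# Step 1: deactivate the (master) storage domain in the DC.
DEACTIVATE_URL = "%s/datacenters/%s/storagedomains/%s/deactivate" % (
    ENGINE, DC_ID, SD_ID)
DEACTIVATE_BODY = "<action><async>false</async></action>"

# Step 2: remove the data center.
REMOVE_URL = "%s/datacenters/%s" % (ENGINE, DC_ID)

# With python-requests this would be roughly:
#   requests.post(DEACTIVATE_URL, data=DEACTIVATE_BODY,
#                 headers={"Content-Type": "application/xml"},
#                 auth=("admin@internal", "password"), verify=False)
#   requests.delete(REMOVE_URL,
#                   auth=("admin@internal", "password"), verify=False)

print(DEACTIVATE_URL)
print(REMOVE_URL)
```

Per this report, the DELETE in step 2 comes back with HTTP 400 ([Internal Engine Error]) when the timing hits the sanlock window described above.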
Can you please attach sanlock.log as well?
Clearing needinfo, I can now reproduce. This is related specifically to NFS v3
Notes:

2018-08-15 18:37:29,559+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (default task-7) [439422a8] START, SpmStartVDSCommand(HostName = hosto-rhel, SpmStartVDSCommandParameters:{hostId='bb16fd26-d326-4531-b2f7-a0e1c23f1a76', storagePoolId='4a5902c1-266d-4c3c-8e81-b0043582d6dd', prevId='-1', prevLVER='-1', storagePoolFormatType='V4', recoveryMode='Manual', SCSIFencing='false'}), log id: 1886ffb3
2018-08-15 18:37:50,869+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (default task-7) [439422a8] spmStart polling ended: taskId '69205d06-b6d6-41b8-beb4-b94c2e6c85f2' task status 'finished'

SpmStatus Failed:

2018-08-15 18:37:29,565+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (default task-7) [439422a8] spmStart polling started: taskId '69205d06-b6d6-41b8-beb4-b94c2e6c85f2'
2018-08-15 18:37:50,869+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (default task-7) [439422a8] spmStart polling ended: taskId '69205d06-b6d6-41b8-beb4-b94c2e6c85f2' task status 'finished'
2018-08-15 18:37:50,881+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] (default task-7) [439422a8] Failed in 'SpmStatusVDS' method
2018-08-15 18:37:50,891+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-7) [439422a8] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM hosto-rhel command SpmStatusVDS failed: Cannot inquire Lease(name='SDM', path=u'/rhev/data-center/mnt/10.35.0.233:_root_storage__domains_sd1/e2b2720f-ffad-4517-b7c4-d09c11b6b717/dom_md/leases', offset=1048576): (11, 'Sanlock get hosts failure', 'Resource temporarily unavailable')

VDSM: startSpm

2018-08-15 18:37:29,559+0300 INFO (jsonrpc/3) [vdsm.api] START spmStart(spUUID=u'4a5902c1-266d-4c3c-8e81-b0043582d6dd', prevID=-1, prevLVER=u'-1', maxHostID=250, domVersion=u'4', options=None) from=::ffff:10.35.206.51,56932, flow_id=439422a8, task_id=69205d06-b6d6-41b8-beb4-b94c2e6c85f2 (api:47)

Lease acquisition:

2018-08-15 18:37:50,567+0300 INFO (tasks/8) [storage.SANLock] Acquiring Lease(name='SDM', path=u'/rhev/data-center/mnt/10.35.0.233:_root_storage__domains_sd1/e2b2720f-ffad-4517-b7c4-d09c11b6b717/dom_md/leases', offset=1048576) for host id 2 (clusterlock:386)
2018-08-15 18:37:50,669+0300 INFO (tasks/8) [storage.SANLock] Successfully acquired Lease(name='SDM', path=u'/rhev/data-center/mnt/10.35.0.233:_root_storage__domains_sd1/e2b2720f-ffad-4517-b7c4-d09c11b6b717/dom_md/leases', offset=1048576) for host id 2 (clusterlock:424)
2018-08-15 18:37:50,670+0300 DEBUG (tasks/8) [storage.StoragePool] spm lock acquired successfully (sp:320)
2018-08-15 18:37:50,670+0300 DEBUG (tasks/8) [storage.StoragePoolMemoryBackend] this storage pool implementation ignores the set spm status requests (lver=0, spmid=2) (spbackends:443)

Completed successfully:

2018-08-15 18:37:50,770+0300 DEBUG (tasks/8) [storage.TaskManager.Task] (Task='69205d06-b6d6-41b8-beb4-b94c2e6c85f2') moving from state finished -> state finished (task:602)

SpmStatus Failed:

2018-08-15 18:37:50,872+0300 INFO (jsonrpc/0) [vdsm.api] START getSpmStatus(spUUID=u'4a5902c1-266d-4c3c-8e81-b0043582d6dd', options=None) from=::ffff:10.35.206.51,56932, flow_id=439422a8, task_id=bb55e24a-750a-4d55-8115-591583e58fc4 (api:47)
2018-08-15 18:37:50,874+0300 INFO (jsonrpc/0) [vdsm.api] FINISH getSpmStatus error=Cannot inquire Lease(name='SDM', path=u'/rhev/data-center/mnt/10.35.0.233:_root_storage__domains_sd1/e2b2720f-ffad-4517-b7c4-d09c11b6b717/dom_md/leases', offset=1048576): (11, 'Sanlock get hosts failure', 'Resource temporarily unavailable') from=::ffff:10.35.206.51,56932, flow_id=439422a8, task_id=bb55e24a-750a-4d55-8115-591583e58fc4 (api:51)
2018-08-15 18:37:50,874+0300 ERROR (jsonrpc/0) [storage.TaskManager.Task] (Task='bb55e24a-750a-4d55-8115-591583e58fc4') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in getSpmStatus
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 49, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 634, in getSpmStatus
    status = self._getSpmStatusInfo(pool)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 628, in _getSpmStatusInfo
    (pool.spmRole,) + pool.getSpmStatus()))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 138, in getSpmStatus
    return self._backend.getSpmStatus()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/spbackends.py", line 437, in getSpmStatus
    raise exception.expected(e)
TemporaryFailure: Cannot inquire Lease(name='SDM', path=u'/rhev/data-center/mnt/10.35.0.233:_root_storage__domains_sd1/e2b2720f-ffad-4517-b7c4-d09c11b6b717/dom_md/leases', offset=1048576): (11, 'Sanlock get hosts failure', 'Resource temporarily unavailable')
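Note the `raise exception.expected(e)` at the bottom of the traceback: vdsm tags errors it anticipates (like this sanlock EAGAIN during lockspace startup) so the task framework can treat them as temporary conditions rather than internal bugs. A simplified, hypothetical sketch of that pattern (names mirror the idea, not vdsm's actual implementation):

```python
# Simplified sketch of the "expected exception" pattern seen in the
# traceback above. Class and helper names are illustrative only.

class TemporaryFailure(Exception):
    expected = False  # becomes True when the error is anticipated


def expected(exc):
    """Mark an error as anticipated so the caller can report it as a
    transient condition instead of an internal error."""
    exc.expected = True
    return exc


def run_task(fn):
    """Run a task body, distinguishing expected from unexpected errors."""
    try:
        return fn()
    except Exception as e:
        if getattr(e, "expected", False):
            return "logged quietly: %s" % e  # transient, caller may retry
        raise  # genuine unexpected failure, propagate loudly


err = expected(TemporaryFailure("Cannot inquire Lease: temporarily unavailable"))
print(run_task(lambda: (_ for _ in ()).throw(err)))
```

The open question in this report is on the consumer side: the engine receives this TemporaryFailure from getSpmStatus but fails RemoveStoragePool outright instead of retrying the inquiry.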
Storage pool removal succeeds using NFSv3 domain.

2018-09-01 21:08:48,542+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-66) [6abce38] START, DisconnectStorageServerVDSCommand(HostName = host_mixed_3, StorageServerConnectionManagementVDSParameters:{hostId='5c23a319-c917-49d1-9113-c02e6f83a147', storagePoolId='00000000-0000-0000-0000-000000000000', storageType='NFS', connectionList='[StorageServerConnections:{id='fc7c5ba2-421e-4b48-91dd-fd8a453da3d6', connection='mantis-nfs-lif2.lab.eng.tlv2.redhat.com:/nas01/ge_3_unused_nfs_1', iqn='null', vfsType='null', mountOptions='null', nfsVersion='V3', nfsRetrans='null', nfsTimeo='null', iface='null', netIfaceName='null'}]', sendNetworkEventOnFailure='true'}), log id: 9321f52
2018-09-01 21:08:57,779+03 INFO [org.ovirt.engine.core.bll.storage.pool.RemoveStoragePoolCommand] (EE-ManagedThreadFactory-engine-Thread-601) [5eeecb3c-74cf-40e6-aa09-4c7fdc5a3185] Running command: RemoveStoragePoolCommand internal: false. Entities affected : ID: 345e8f1a-45eb-4f30-8d84-eb596c00eedf Type: StoragePoolAction group DELETE_STORAGE_POOL with role type ADMIN

Used:
ovirt-engine-4.2.6.4-0.1.el7ev.noarch
vdsm-4.20.39-1.el7ev.x86_64