Created attachment 908019 [details]
vdsm.log from node1

Description of problem:
After upgrading to 3.4.2 RC I want to reinstall the nodes, but there are some benchmark VMs that I am unable to shut down in order to put the nodes into maintenance. I got into this situation because of a botched iSCSI failover that led to a split brain on the storage. After the iSCSI problems were resolved the VMs were still up and oVirt never noticed the disruption. I have since upgraded to 3.4.2 stable; the nodes are now down, but the VMs are still up.

Version-Release number of selected component (if applicable):
3.4.2

Event log:
2014-Jun-05, 12:24 Failed to switch Host node1.test.now to Maintenance mode.
2014-Jun-05, 12:24 Host node1.test.now cannot change into maintenance mode - not all Vms have been migrated successfully. Consider manual intervention: stopping/migrating Vms: Bench1 (User: admin).
2014-Jun-05, 12:05 Shutdown of VM Bench2 failed.
2014-Jun-05, 12:05 Shutdown of VM Bench4 failed.
2014-Jun-05, 12:05 Shutdown of VM Bench3 failed.
2014-Jun-05, 12:05 Shutdown of VM Bench1 failed.
2014-Jun-05, 12:00 VM shutdown initiated by admin on VM Bench2 (Host: node2.test.now).
2014-Jun-05, 12:00 VM shutdown initiated by admin on VM Bench4 (Host: node4.test.now).
2014-Jun-05, 12:00 VM shutdown initiated by admin on VM Bench3 (Host: node3.test.now).
2014-Jun-05, 12:00 VM shutdown initiated by admin on VM Bench1 (Host: node1.test.now).

/var/log/messages:
Jun 10 02:49:46 node1 vdsm scanDomains WARNING Metadata collection for domain path /rhev/data-center/mnt/xx.xxx.xxx.xxx:_var_lib_exports_iso timedout#012Traceback (most recent call last):#012 File "/usr/share/vdsm/storage/fileSD.py", line 662, in collectMetaFiles#012 sd.DOMAIN_META_DATA))#012 File "/usr/share/vdsm/storage/remoteFileHandler.py", line 297, in callCrabRPCFunction#012 *args, **kwargs)#012 File "/usr/share/vdsm/storage/remoteFileHandler.py", line 184, in callCrabRPCFunction#012 rawLength = self._recvAll(LENGTH_STRUCT_LENGTH, timeout)#012 File "/usr/share/vdsm/storage/remoteFileHandler.py", line 150, in _recvAll#012 raise Timeout()#012Timeout
Jun 10 02:50:56 node1 vdsm scanDomains WARNING Metadata collection for domain path /rhev/data-center/mnt/xx.xxx.xxx.xxx:_var_lib_exports_iso timedout#012Traceback (most recent call last):#012 File "/usr/share/vdsm/storage/fileSD.py", line 662, in collectMetaFiles#012 sd.DOMAIN_META_DATA))#012 File "/usr/share/vdsm/storage/remoteFileHandler.py", line 297, in callCrabRPCFunction#012 *args, **kwargs)#012 File "/usr/share/vdsm/storage/remoteFileHandler.py", line 184, in callCrabRPCFunction#012 rawLength = self._recvAll(LENGTH_STRUCT_LENGTH, timeout)#012 File "/usr/share/vdsm/storage/remoteFileHandler.py", line 150, in _recvAll#012 raise Timeout()#012Timeout
Jun 10 02:52:06 node1 vdsm scanDomains WARNING Metadata collection for domain path /rhev/data-center/mnt/xx.xxx.xxx.xxx:_var_lib_exports_iso timedout#012Traceback (most recent call last):#012 File "/usr/share/vdsm/storage/fileSD.py", line 662, in collectMetaFiles#012 sd.DOMAIN_META_DATA))#012 File "/usr/share/vdsm/storage/remoteFileHandler.py", line 297, in callCrabRPCFunction#012 *args, **kwargs)#012 File "/usr/share/vdsm/storage/remoteFileHandler.py", line 184, in callCrabRPCFunction#012 rawLength = self._recvAll(LENGTH_STRUCT_LENGTH, timeout)#012 File "/usr/share/vdsm/storage/remoteFileHandler.py", line 150, in _recvAll#012 raise Timeout()#012Timeout
Jun 10 02:53:16 node1 vdsm scanDomains WARNING Metadata collection for domain path
/rhev/data-center/mnt/xx.xxx.xxx.xxx:_var_lib_exports_iso timedout#012Traceback (most recent call last):#012 File "/usr/share/vdsm/storage/fileSD.py", line 662, in collectMetaFiles#012 sd.DOMAIN_META_DATA))#012 File "/usr/share/vdsm/storage/remoteFileHandler.py", line 297, in callCrabRPCFunction#012 *args, **kwargs)#012 File "/usr/share/vdsm/storage/remoteFileHandler.py", line 184, in callCrabRPCFunction#012 rawLength = self._recvAll(LENGTH_STRUCT_LENGTH, timeout)#012 File "/usr/share/vdsm/storage/remoteFileHandler.py", line 150, in _recvAll#012 raise Timeout()#012Timeout
Jun 10 02:53:32 node1 dhclient[18102]: DHCPREQUEST on bond0 to 10.100.100.106 port 67 (xid=0x2844d170)
Jun 10 02:53:32 node1 dhclient[18102]: DHCPACK from 10.100.100.106 (xid=0x2844d170)
Jun 10 02:53:34 node1 dhclient[18102]: bound to 10.100.100.53 -- renewal in 269 seconds.

/var/log/vdsm/vdsm.log:
Thread-15::DEBUG::2014-06-10 02:50:52,084::blockSD::600::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/0751cdb1-0ee3-494c-88e2-c0e7df58aa52/metadata bs=4096 count=1' (cwd None)
Thread-15::DEBUG::2014-06-10 02:50:52,094::blockSD::600::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.000820898 s, 5.0 MB/s\n'; <rc> = 0
Thread-5167::DEBUG::2014-06-10 02:50:52,191::task::595::TaskManager.Task::(_updateState) Task=`77271c2c-b1c7-4fed-a992-da7cd677c09a`::moving from state init -> state preparing
Thread-5167::INFO::2014-06-10 02:50:52,192::logUtils::44::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-5167::INFO::2014-06-10 02:50:52,192::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {u'9923a5a1-61e0-4edb-a04d-22c962190c53': {'code': 358, 'version': -1, 'acquired': False, 'delay': '0', 'lastCheck': '65.2', 'valid': False}, '0751cdb1-0ee3-494c-88e2-c0e7df58aa52': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000820898', 'lastCheck': '0.1', 'valid': True}}
Thread-5167::DEBUG::2014-06-10 02:50:52,192::task::1185::TaskManager.Task::(prepare) Task=`77271c2c-b1c7-4fed-a992-da7cd677c09a`::finished: {u'9923a5a1-61e0-4edb-a04d-22c962190c53': {'code': 358, 'version': -1, 'acquired': False, 'delay': '0', 'lastCheck': '65.2', 'valid': False}, '0751cdb1-0ee3-494c-88e2-c0e7df58aa52': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000820898', 'lastCheck': '0.1', 'valid': True}}
Thread-5167::DEBUG::2014-06-10 02:50:52,192::task::595::TaskManager.Task::(_updateState) Task=`77271c2c-b1c7-4fed-a992-da7cd677c09a`::moving from state preparing -> state finished
Thread-5167::DEBUG::2014-06-10 02:50:52,193::resourceManager::940::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-5167::DEBUG::2014-06-10 02:50:52,193::resourceManager::977::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-5167::DEBUG::2014-06-10 02:50:52,193::task::990::TaskManager.Task::(_decref) Task=`77271c2c-b1c7-4fed-a992-da7cd677c09a`::ref 0 aborting False
VM Channels Listener::DEBUG::2014-06-10 02:50:54,321::vmChannels::91::vds::(_handle_timeouts) Timeout on fileno 46.
Thread-5167::DEBUG::2014-06-10 02:50:55,381::task::595::TaskManager.Task::(_updateState) Task=`458009da-0516-40d0-a02c-352377f4108f`::moving from state init -> state preparing
Thread-5167::INFO::2014-06-10 02:50:55,381::logUtils::44::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-5167::INFO::2014-06-10 02:50:55,381::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {u'9923a5a1-61e0-4edb-a04d-22c962190c53': {'code': 358, 'version': -1, 'acquired': False, 'delay': '0', 'lastCheck': '68.4', 'valid': False}, '0751cdb1-0ee3-494c-88e2-c0e7df58aa52': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000820898', 'lastCheck': '3.3', 'valid': True}}
Thread-5167::DEBUG::2014-06-10 02:50:55,381::task::1185::TaskManager.Task::(prepare) Task=`458009da-0516-40d0-a02c-352377f4108f`::finished: {u'9923a5a1-61e0-4edb-a04d-22c962190c53': {'code': 358, 'version': -1, 'acquired': False, 'delay': '0', 'lastCheck': '68.4', 'valid': False}, '0751cdb1-0ee3-494c-88e2-c0e7df58aa52': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000820898', 'lastCheck': '3.3', 'valid': True}}
Thread-5167::DEBUG::2014-06-10 02:50:55,382::task::595::TaskManager.Task::(_updateState) Task=`458009da-0516-40d0-a02c-352377f4108f`::moving from state preparing -> state finished
Thread-5167::DEBUG::2014-06-10 02:50:55,382::resourceManager::940::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-5167::DEBUG::2014-06-10 02:50:55,382::resourceManager::977::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-5167::DEBUG::2014-06-10 02:50:55,382::task::990::TaskManager.Task::(_decref) Task=`458009da-0516-40d0-a02c-352377f4108f`::ref 0 aborting False
Thread-12128::WARNING::2014-06-10 02:50:56,969::fileSD::673::scanDomains::(collectMetaFiles) Metadata collection for domain path /rhev/data-center/mnt/xx.xxx.xxx.xxx:_var_lib_exports_iso timedout
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/fileSD.py", line 662, in collectMetaFiles
    sd.DOMAIN_META_DATA))
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 297, in callCrabRPCFunction
    *args, **kwargs)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 184, in callCrabRPCFunction
    rawLength = self._recvAll(LENGTH_STRUCT_LENGTH, timeout)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 150, in _recvAll
    raise Timeout()
Timeout
Thread-40::ERROR::2014-06-10 02:50:56,970::sdc::143::Storage.StorageDomainCache::(_findDomain) domain 9923a5a1-61e0-4edb-a04d-22c962190c53 not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/nfsSD.py", line 132, in findDomain
    return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomainPath
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'9923a5a1-61e0-4edb-a04d-22c962190c53',)
Thread-40::ERROR::2014-06-10 02:50:56,971::domainMonitor::239::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain 9923a5a1-61e0-4edb-a04d-22c962190c53 monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 215, in _monitorDomain
    self.domain.selftest()
  File "/usr/share/vdsm/storage/sdc.py", line 49, in __getattr__
    return getattr(self.getRealDomain(), attrName)
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/nfsSD.py", line 132, in findDomain
    return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/nfsSD.py", line 122, in findDomainPath
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'9923a5a1-61e0-4edb-a04d-22c962190c53',)
Thread-5167::DEBUG::2014-06-10 02:50:58,602::task::595::TaskManager.Task::(_updateState) Task=`6a746109-728c-4029-b4a2-7b47981f3dba`::moving from state init -> state preparing
Thread-5167::INFO::2014-06-10 02:50:58,602::logUtils::44::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-5167::INFO::2014-06-10 02:50:58,603::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {u'9923a5a1-61e0-4edb-a04d-22c962190c53': {'code': 358, 'version': -1, 'acquired': False, 'delay': '0', 'lastCheck': '1.6', 'valid': False}, '0751cdb1-0ee3-494c-88e2-c0e7df58aa52': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000820898', 'lastCheck': '6.5', 'valid': True}}
Thread-5167::DEBUG::2014-06-10 02:50:58,603::task::1185::TaskManager.Task::(prepare) Task=`6a746109-728c-4029-b4a2-7b47981f3dba`::finished: {u'9923a5a1-61e0-4edb-a04d-22c962190c53': {'code': 358, 'version': -1, 'acquired': False, 'delay': '0', 'lastCheck': '1.6', 'valid': False}, '0751cdb1-0ee3-494c-88e2-c0e7df58aa52': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000820898', 'lastCheck': '6.5', 'valid': True}}
Thread-5167::DEBUG::2014-06-10 02:50:58,603::task::595::TaskManager.Task::(_updateState) Task=`6a746109-728c-4029-b4a2-7b47981f3dba`::moving from state preparing -> state finished
Thread-5167::DEBUG::2014-06-10 02:50:58,603::resourceManager::940::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-5167::DEBUG::2014-06-10 02:50:58,603::resourceManager::977::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-5167::DEBUG::2014-06-10 02:50:58,604::task::990::TaskManager.Task::(_decref) Task=`6a746109-728c-4029-b4a2-7b47981f3dba`::ref 0 aborting False
Thread-5167::DEBUG::2014-06-10 02:51:01,969::task::595::TaskManager.Task::(_updateState) Task=`0f9bf6e2-ea86-4b4a-8fe2-d7634705429c`::moving from state init -> state preparing
Thread-5167::INFO::2014-06-10 02:51:01,969::logUtils::44::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-5167::INFO::2014-06-10 02:51:01,969::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {u'9923a5a1-61e0-4edb-a04d-22c962190c53': {'code': 358, 'version': -1, 'acquired': False, 'delay': '0', 'lastCheck': '5.0', 'valid': False}, '0751cdb1-0ee3-494c-88e2-c0e7df58aa52': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000820898', 'lastCheck': '9.8', 'valid': True}}
Thread-5167::DEBUG::2014-06-10 02:51:01,970::task::1185::TaskManager.Task::(prepare) Task=`0f9bf6e2-ea86-4b4a-8fe2-d7634705429c`::finished: {u'9923a5a1-61e0-4edb-a04d-22c962190c53': {'code': 358, 'version': -1, 'acquired': False, 'delay': '0', 'lastCheck': '5.0', 'valid': False}, '0751cdb1-0ee3-494c-88e2-c0e7df58aa52': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000820898', 'lastCheck': '9.8', 'valid': True}}
Thread-5167::DEBUG::2014-06-10 02:51:01,970::task::595::TaskManager.Task::(_updateState)
Task=`0f9bf6e2-ea86-4b4a-8fe2-d7634705429c`::moving from state preparing -> state finished
Thread-5167::DEBUG::2014-06-10 02:51:01,970::resourceManager::940::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-5167::DEBUG::2014-06-10 02:51:01,970::resourceManager::977::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-5167::DEBUG::2014-06-10 02:51:01,970::task::990::TaskManager.Task::(_decref) Task=`0f9bf6e2-ea86-4b4a-8fe2-d7634705429c`::ref 0 aborting False

/var/log/sanlock.log:
2014-05-19 09:49:00+0000 38 [10499]: sanlock daemon started 2.8 host d185aced-10ef-4d82-9b9f-74ccfa0c4f9c.localhost.
2014-05-19 09:54:54+0000 64 [13495]: sanlock daemon started 2.8 host cf089115-ac2c-4598-b6a2-e7db8c9368c5.node1.n
2014-05-19 09:56:49+0000 178 [14390]: sanlock daemon started 2.8 host 968a9da8-76ef-4440-aad7-d4cd48b9d1f9.node1.n
2014-05-19 10:02:36+0000 525 [14396]: s1 lockspace 0751cdb1-0ee3-494c-88e2-c0e7df58aa52:4:/dev/0751cdb1-0ee3-494c-88e2-c0e7df58aa52/ids:0
2014-05-22 11:53:26+0000 266375 [19131]: 0751cdb1 aio timeout 0 0x7f4fbc0008c0:0x7f4fbc0008d0:0x7f4fceff3000 ioto 10 to_count 1
2014-05-22 11:53:26+0000 266375 [19131]: s1 delta_renew read rv -202 offset 0 /dev/0751cdb1-0ee3-494c-88e2-c0e7df58aa52/ids
2014-05-22 11:53:26+0000 266375 [19131]: s1 renewal error -202 delta_length 10 last_success 266344
2014-05-22 11:53:37+0000 266386 [19131]: 0751cdb1 aio timeout 0 0x7f4fbc000910:0x7f4fbc000920:0x7f4fcedf0000 ioto 10 to_count 2
2014-05-22 11:53:37+0000 266386 [19131]: s1 delta_renew read rv -202 offset 0 /dev/0751cdb1-0ee3-494c-88e2-c0e7df58aa52/ids
2014-05-22 11:53:37+0000 266386 [19131]: s1 renewal error -202 delta_length 11 last_success 266344
2014-05-22 11:53:48+0000 266397 [19131]: 0751cdb1 aio timeout 0 0x7f4fbc000960:0x7f4fbc000970:0x7f4fcecee000 ioto 10 to_count 3
2014-05-22 11:53:48+0000 266397 [19131]: s1 delta_renew read rv -202 offset 0 /dev/0751cdb1-0ee3-494c-88e2-c0e7df58aa52/ids
2014-05-22 11:53:48+0000 266397 [19131]: s1 renewal error -202 delta_length 11 last_success 266344
2014-05-22 11:53:55+0000 266404 [14390]: s1 check_our_lease warning 60 last_success 266344
2014-05-22 11:53:56+0000 266405 [14390]: s1 check_our_lease warning 61 last_success 266344
2014-05-22 11:53:57+0000 266406 [14390]: s1 check_our_lease warning 62 last_success 266344
2014-05-22 11:53:58+0000 266407 [14390]: s1 check_our_lease warning 63 last_success 266344
2014-05-22 11:53:59+0000 266408 [14390]: s1 check_our_lease warning 64 last_success 266344
2014-05-22 11:53:59+0000 266408 [19131]: 0751cdb1 aio timeout 0 0x7f4fbc0009b0:0x7f4fbc0009c0:0x7f4fcebec000 ioto 10 to_count 4
2014-05-22 11:53:59+0000 266408 [19131]: s1 delta_renew read rv -202 offset 0 /dev/0751cdb1-0ee3-494c-88e2-c0e7df58aa52/ids
2014-05-22 11:53:59+0000 266408 [19131]: s1 renewal error -202 delta_length 11 last_success 266344
2014-05-22 11:54:00+0000 266409 [14390]: s1 check_our_lease warning 65 last_success 266344
2014-05-22 11:54:01+0000 266410 [14390]: s1 check_our_lease warning 66 last_success 266344
2014-05-22 11:54:02+0000 266411 [14390]: s1 check_our_lease warning 67 last_success 266344
2014-05-22 11:54:03+0000 266412 [14390]: s1 check_our_lease warning 68 last_success 266344
2014-05-22 11:54:04+0000 266413 [14390]: s1 check_our_lease warning 69 last_success 266344
2014-05-22 11:54:05+0000 266414 [14390]: s1 check_our_lease warning 70 last_success 266344
2014-05-22 11:54:06+0000 266415 [14390]: s1 check_our_lease warning 71 last_success 266344
2014-05-22 11:54:07+0000 266416 [14390]: s1 check_our_lease warning 72 last_success 266344
2014-05-22 11:54:08+0000 266417 [14390]: s1 check_our_lease warning 73 last_success 266344
2014-05-22 11:54:09+0000 266418 [14390]: s1 check_our_lease warning 74 last_success 266344
2014-05-22 11:54:10+0000 266419 [14390]: s1 check_our_lease warning 75 last_success 266344
2014-05-22 11:54:10+0000 266419 [19131]: s1 delta_renew read rv -2 offset 0 /dev/0751cdb1-0ee3-494c-88e2-c0e7df58aa52/ids
2014-05-22 11:54:10+0000 266419 [19131]: s1 renewal error -2 delta_length 10 last_success 266344
2014-05-22 11:54:11+0000 266420 [14390]: s1 check_our_lease warning 76 last_success 266344
2014-05-22 11:54:12+0000 266421 [14390]: s1 check_our_lease warning 77 last_success 266344
2014-05-22 11:54:13+0000 266422 [14390]: s1 check_our_lease warning 78 last_success 266344
2014-05-22 11:54:14+0000 266423 [14390]: s1 check_our_lease warning 79 last_success 266344
2014-05-22 11:54:15+0000 266424 [14390]: s1 check_our_lease failed 80
2014-05-22 11:54:15+0000 266424 [14390]: s1 all pids clear
2014-05-22 11:54:20+0000 266430 [19131]: s1 delta_renew read rv -2 offset 0 /dev/0751cdb1-0ee3-494c-88e2-c0e7df58aa52/ids
2014-05-22 11:54:20+0000 266430 [19131]: s1 renewal error -2 delta_length 10 last_success 266344
2014-05-22 11:54:30+0000 266440 [19131]: 0751cdb1 close_task_aio 0 0x7f4fbc0008c0 busy
2014-05-22 11:54:30+0000 266440 [19131]: 0751cdb1 close_task_aio 1 0x7f4fbc000910 busy
2014-05-22 11:54:30+0000 266440 [19131]: 0751cdb1 close_task_aio 2 0x7f4fbc000960 busy
2014-05-22 11:54:30+0000 266440 [19131]: 0751cdb1 close_task_aio 3 0x7f4fbc0009b0 busy
2014-05-22 11:54:40+0000 266450 [19131]: 0751cdb1 close_task_aio 0 0x7f4fbc0008c0 busy
2014-05-22 11:54:40+0000 266450 [19131]: 0751cdb1 close_task_aio 1 0x7f4fbc000910 busy
2014-05-22 11:54:40+0000 266450 [19131]: 0751cdb1 close_task_aio 2 0x7f4fbc000960 busy
2014-05-22 11:54:40+0000 266450 [19131]: 0751cdb1 close_task_aio 3 0x7f4fbc0009b0 busy
2014-05-22 11:54:50+0000 266460 [19131]: 0751cdb1 close_task_aio 0 0x7f4fbc0008c0 busy
2014-05-22 11:54:50+0000 266460 [19131]: 0751cdb1 close_task_aio 1 0x7f4fbc000910 busy
2014-05-22 11:54:50+0000 266460 [19131]: 0751cdb1 close_task_aio 2 0x7f4fbc000960 busy
2014-05-22 11:54:50+0000 266460 [19131]: 0751cdb1 close_task_aio 3 0x7f4fbc0009b0 busy
2014-05-22 11:55:00+0000 266470 [19131]: 0751cdb1 close_task_aio 0 0x7f4fbc0008c0 busy
2014-05-22 11:55:00+0000 266470 [19131]: 0751cdb1 close_task_aio 1 0x7f4fbc000910 busy
2014-05-22 11:55:00+0000 266470 [19131]: 0751cdb1 close_task_aio 2 0x7f4fbc000960 busy
2014-05-22 11:55:00+0000 266470 [19131]: 0751cdb1 close_task_aio 3 0x7f4fbc0009b0 busy
2014-05-22 11:55:10+0000 266480 [19131]: 0751cdb1 close_task_aio 0 0x7f4fbc0008c0 busy
2014-05-22 11:55:10+0000 266480 [19131]: 0751cdb1 close_task_aio 1 0x7f4fbc000910 busy
2014-05-22 11:55:10+0000 266480 [19131]: 0751cdb1 close_task_aio 2 0x7f4fbc000960 busy
2014-05-22 11:55:10+0000 266480 [19131]: 0751cdb1 close_task_aio 3 0x7f4fbc0009b0 busy
2014-05-22 11:55:16+0000 266485 [19131]: 0751cdb1 aio collect 0x7f4fbc0008c0:0x7f4fbc0008d0:0x7f4fceff3000 result -5:0 close free
2014-05-22 11:55:16+0000 266485 [19131]: 0751cdb1 aio collect 0x7f4fbc000910:0x7f4fbc000920:0x7f4fcedf0000 result -5:0 close free
2014-05-22 11:55:16+0000 266485 [19131]: 0751cdb1 aio collect 0x7f4fbc000960:0x7f4fbc000970:0x7f4fcecee000 result -5:0 close free
2014-05-22 11:55:16+0000 266485 [19131]: 0751cdb1 aio collect 0x7f4fbc0009b0:0x7f4fbc0009c0:0x7f4fcebec000 result -5:0 close free
2014-05-22 12:57:13+0000 270202 [14395]: s2 lockspace 0751cdb1-0ee3-494c-88e2-c0e7df58aa52:4:/dev/0751cdb1-0ee3-494c-88e2-c0e7df58aa52/ids:0
2014-06-10 08:11:20+0000 81 [15201]: sanlock daemon started 2.8 host 195b1c86-5d28-4b6a-a31c-924495d6a908.node1.n
2014-06-10 08:13:49+0000 230 [15207]: s1 lockspace 0751cdb1-0ee3-494c-88e2-c0e7df58aa52:4:/dev/0751cdb1-0ee3-494c-88e2-c0e7df58aa52/ids:0

/var/log/ovirt-engine.log:
2014-06-10 10:49:27,492 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-42) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue.
2014-06-10 10:49:27,514 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-19) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue.
2014-06-10 10:50:00,265 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (DefaultQuartzScheduler_Worker-57) [da9c949] Autorecovering 1 storage domains
2014-06-10 10:50:00,265 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (DefaultQuartzScheduler_Worker-57) [da9c949] Autorecovering storage domains id: 9923a5a1-61e0-4edb-a04d-22c962190c53
2014-06-10 10:50:00,266 INFO [org.ovirt.engine.core.bll.storage.ConnectDomainToStorageCommand] (DefaultQuartzScheduler_Worker-57) [300d83fb] Running command: ConnectDomainToStorageCommand internal: true. Entities affected : ID: 9923a5a1-61e0-4edb-a04d-22c962190c53 Type: Storage
2014-06-10 10:50:00,266 INFO [org.ovirt.engine.core.bll.storage.ConnectDomainToStorageCommand] (DefaultQuartzScheduler_Worker-57) [300d83fb] ConnectDomainToStorage. Before Connect all hosts to pool. Time:6/10/14 10:50 AM
2014-06-10 10:50:00,466 INFO [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand] (org.ovirt.thread.pool-6-thread-47) [358da518] Running command: ConnectStorageToVdsCommand internal: true. Entities affected : ID: aaa00000-0000-0000-0000-123456789aaa Type: System
2014-06-10 10:50:00,470 INFO [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand] (org.ovirt.thread.pool-6-thread-49) [2aa454] Running command: ConnectStorageToVdsCommand internal: true. Entities affected : ID: aaa00000-0000-0000-0000-123456789aaa Type: System
2014-06-10 10:50:00,474 INFO [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand] (org.ovirt.thread.pool-6-thread-38) [6156f20d] Running command: ConnectStorageToVdsCommand internal: true.
Entities affected : ID: aaa00000-0000-0000-0000-123456789aaa Type: System 2014-06-10 10:50:00,570 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-38) [6156f20d] START, ConnectStorageServerVDSCommand(HostName = node2.test.me, HostId = 28fdcb5d-7acd-410e-8b65-0b4f483cb575, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = NFS, connectionList = [{ id: f47e9f99-9989-4297-a84f-4f75338eeced, connection: xx.xxx.xxx.xxx:/var/lib/exports/iso, iqn: null, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 5ac9e75d 2014-06-10 10:50:00,573 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-49) [2aa454] START, ConnectStorageServerVDSCommand(HostName = node3.test.me, HostId = 7415506c-cda7-4018-804d-5f6d3beddbfb, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = NFS, connectionList = [{ id: f47e9f99-9989-4297-a84f-4f75338eeced, connection: xx.xxx.xxx.xxx:/var/lib/exports/iso, iqn: null, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 11873507 2014-06-10 10:50:00,572 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-47) [358da518] START, ConnectStorageServerVDSCommand(HostName = node4.test.me, HostId = bb13752e-85cb-4945-822b-48ab2a7b1329, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = NFS, connectionList = [{ id: f47e9f99-9989-4297-a84f-4f75338eeced, connection: xx.xxx.xxx.xxx:/var/lib/exports/iso, iqn: null, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 487c35ad 2014-06-10 10:50:04,526 INFO [org.ovirt.engine.core.bll.storage.ActivateStorageDomainCommand] (ajp--127.0.0.1-8702-1) [52284e89] Lock Acquired to object EngineLock [exclusiveLocks= key: 9923a5a1-61e0-4edb-a04d-22c962190c53 value: STORAGE , sharedLocks= ] 2014-06-10 10:50:05,865 INFO [org.ovirt.engine.core.bll.storage.ActivateStorageDomainCommand] (org.ovirt.thread.pool-6-thread-20) [52284e89] Running command: ActivateStorageDomainCommand internal: false. Entities affected : ID: 9923a5a1-61e0-4edb-a04d-22c962190c53 Type: Storage 2014-06-10 10:50:06,010 INFO [org.ovirt.engine.core.bll.storage.ActivateStorageDomainCommand] (org.ovirt.thread.pool-6-thread-20) [52284e89] Lock freed to object EngineLock [exclusiveLocks= key: 9923a5a1-61e0-4edb-a04d-22c962190c53 value: STORAGE , sharedLocks= ] 2014-06-10 10:50:06,010 INFO [org.ovirt.engine.core.bll.storage.ActivateStorageDomainCommand] (org.ovirt.thread.pool-6-thread-20) [52284e89] ActivateStorage Domain. Before Connect all hosts to pool. Time:6/10/14 10:50 AM 2014-06-10 10:50:06,062 INFO [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand] (org.ovirt.thread.pool-6-thread-13) [649a51c3] Running command: ConnectStorageToVdsCommand internal: true. Entities affected : ID: aaa00000-0000-0000-0000-123456789aaa Type: System 2014-06-10 10:50:06,066 INFO [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand] (org.ovirt.thread.pool-6-thread-14) [27b7ad5a] Running command: ConnectStorageToVdsCommand internal: true. Entities affected : ID: aaa00000-0000-0000-0000-123456789aaa Type: System 2014-06-10 10:50:06,070 INFO [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand] (org.ovirt.thread.pool-6-thread-11) [65c2858f] Running command: ConnectStorageToVdsCommand internal: true. 
Entities affected : ID: aaa00000-0000-0000-0000-123456789aaa Type: System 2014-06-10 10:50:06,176 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-13) [649a51c3] START, ConnectStorageServerVDSCommand(HostName = node4.test.me, HostId = bb13752e-85cb-4945-822b-48ab2a7b1329, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = NFS, connectionList = [{ id: f47e9f99-9989-4297-a84f-4f75338eeced, connection: xx.xxx.xxx.xxx:/var/lib/exports/iso, iqn: null, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 1561dfc 2014-06-10 10:50:06,199 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-14) [27b7ad5a] START, ConnectStorageServerVDSCommand(HostName = node3.test.me, HostId = 7415506c-cda7-4018-804d-5f6d3beddbfb, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = NFS, connectionList = [{ id: f47e9f99-9989-4297-a84f-4f75338eeced, connection: xx.xxx.xxx.xxx:/var/lib/exports/iso, iqn: null, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: d51a4d5 2014-06-10 10:50:06,206 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-11) [65c2858f] START, ConnectStorageServerVDSCommand(HostName = node2.test.me, HostId = 28fdcb5d-7acd-410e-8b65-0b4f483cb575, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = NFS, connectionList = [{ id: f47e9f99-9989-4297-a84f-4f75338eeced, connection: xx.xxx.xxx.xxx:/var/lib/exports/iso, iqn: null, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 48278314 2014-06-10 10:51:00,615 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-47) [358da518] FINISH, ConnectStorageServerVDSCommand, return: {f47e9f99-9989-4297-a84f-4f75338eeced=0}, log id: 487c35ad 2014-06-10 10:51:00,620 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-49) [2aa454] FINISH, ConnectStorageServerVDSCommand, return: {f47e9f99-9989-4297-a84f-4f75338eeced=0}, log id: 11873507 2014-06-10 10:51:00,623 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-38) [6156f20d] FINISH, ConnectStorageServerVDSCommand, return: {f47e9f99-9989-4297-a84f-4f75338eeced=0}, log id: 5ac9e75d 2014-06-10 10:51:00,625 INFO [org.ovirt.engine.core.bll.storage.ConnectDomainToStorageCommand] (DefaultQuartzScheduler_Worker-57) [300d83fb] ConnectDomainToStorage. After Connect all hosts to pool. Time:6/10/14 10:51 AM 2014-06-10 10:51:03,733 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-45) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 
2014-06-10 10:51:06,229 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-13) [649a51c3] FINISH, ConnectStorageServerVDSCommand, return: {f47e9f99-9989-4297-a84f-4f75338eeced=0}, log id: 1561dfc 2014-06-10 10:51:06,271 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-14) [27b7ad5a] FINISH, ConnectStorageServerVDSCommand, return: {f47e9f99-9989-4297-a84f-4f75338eeced=0}, log id: d51a4d5 2014-06-10 10:51:06,273 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-11) [65c2858f] FINISH, ConnectStorageServerVDSCommand, return: {f47e9f99-9989-4297-a84f-4f75338eeced=0}, log id: 48278314 2014-06-10 10:51:06,275 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.ActivateStorageDomainVDSCommand] (org.ovirt.thread.pool-6-thread-20) [52284e89] START, ActivateStorageDomainVDSCommand( storagePoolId = 00000002-0002-0002-0002-000000000390, ignoreFailoverLimit = false, storageDomainId = 9923a5a1-61e0-4edb-a04d-22c962190c53), log id: 23981a1b 2014-06-10 10:51:13,278 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-50) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 10:51:22,474 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-6) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 10:51:35,071 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-35) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 10:52:34,068 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.ActivateStorageDomainVDSCommand] (org.ovirt.thread.pool-6-thread-20) [52284e89] Failed in ActivateStorageDomainVDS method 2014-06-10 10:52:34,069 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-6-thread-20) [52284e89] IrsBroker::Failed::ActivateStorageDomainVDS due to: IRSErrorException: IRSGenericException: IRSErrorException: Failed to ActivateStorageDomainVDS, error = Storage domain does not exist: ('9923a5a1-61e0-4edb-a04d-22c962190c53',), code = 358 2014-06-10 10:52:34,146 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (org.ovirt.thread.pool-6-thread-20) [52284e89] START, SpmStopVDSCommand(HostName = node2.test.me, HostId = 28fdcb5d-7acd-410e-8b65-0b4f483cb575, storagePoolId = 00000002-0002-0002-0002-000000000390), log id: 2c173efc 2014-06-10 10:52:34,202 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (org.ovirt.thread.pool-6-thread-20) [52284e89] SpmStopVDSCommand::Stopping SPM on vds node2.test.me, pool id 00000002-0002-0002-0002-000000000390 2014-06-10 10:52:38,260 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (org.ovirt.thread.pool-6-thread-20) [52284e89] FINISH, SpmStopVDSCommand, log id: 2c173efc 2014-06-10 10:52:38,261 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-6-thread-20) [52284e89] Irs placed on server 28fdcb5d-7acd-410e-8b65-0b4f483cb575 failed. Proceed Failover 2014-06-10 10:52:38,327 INFO [org.ovirt.engine.core.bll.storage.SetStoragePoolStatusCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] Running command: SetStoragePoolStatusCommand internal: true. 
Entities affected : ID: 00000002-0002-0002-0002-000000000390 Type: StoragePool 2014-06-10 10:52:38,329 INFO [org.ovirt.engine.core.vdsbroker.storage.StoragePoolDomainHelper] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] Storage Pool 00000002-0002-0002-0002-000000000390 - Updating Storage Domain 0751cdb1-0ee3-494c-88e2-c0e7df58aa52 status from Active to Unknown, reason : null 2014-06-10 10:52:38,402 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] Correlation ID: 6808e2ec, Call Stack: null, Custom Event ID: -1, Message: Data Center is being initialized, please wait for initialization to complete. 2014-06-10 10:52:38,409 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] hostFromVds::selectedVds - node3.test.me, spmStatus Free, storage pool Default 2014-06-10 10:52:38,412 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] SPM Init: could not find reported vds or not up - pool:Default vds_spm_id: 3 2014-06-10 10:52:38,452 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] SPM selection - vds seems as spm node2.test.me 2014-06-10 10:52:38,482 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] START, SpmStopVDSCommand(HostName = node2.test.me, HostId = 28fdcb5d-7acd-410e-8b65-0b4f483cb575, storagePoolId = 00000002-0002-0002-0002-000000000390), log id: 4c424d7c 2014-06-10 10:52:38,491 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] Command org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand return value TaskStatusListReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=654, mMessage=Not SPM]] 2014-06-10 10:52:38,509 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] HostName = node2.test.me 2014-06-10 10:52:38,509 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] Command HSMGetAllTasksStatusesVDSCommand(HostName = node2.test.me, HostId = 28fdcb5d-7acd-410e-8b65-0b4f483cb575) execution failed. 
Exception: IRSNonOperationalException: IRSGenericException: IRSErrorException: IRSNonOperationalException: Not SPM 2014-06-10 10:52:38,510 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] FINISH, SpmStopVDSCommand, log id: 4c424d7c 2014-06-10 10:52:38,510 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] spm stop succeeded, continuing with spm selection 2014-06-10 10:52:38,524 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] starting spm on vds node3.test.me, storage pool Default, prevId 3, LVER 2 2014-06-10 10:52:38,542 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] START, SpmStartVDSCommand(HostName = node3.test.me, HostId = 7415506c-cda7-4018-804d-5f6d3beddbfb, storagePoolId = 00000002-0002-0002-0002-000000000390, prevId=3, prevLVER=2, storagePoolFormatType=V3, recoveryMode=Manual, SCSIFencing=true), log id: 3056d822 2014-06-10 10:52:38,555 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] spmStart polling started: taskId = 34c9a402-a886-41e3-bf64-52f554669aba 2014-06-10 10:52:39,301 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-22) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 10:52:39,324 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-42) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 10:52:52,175 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-10) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 10:53:41,630 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-13) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 10:54:00,852 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-36) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 10:54:06,936 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-43) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 10:54:09,979 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-22) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 10:54:43,966 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-6) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 10:54:53,204 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-43) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 10:54:56,248 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-22) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 
2014-06-10 10:55:26,994 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-47) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 10:55:30,037 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-38) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 10:55:30,058 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-10) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 10:56:50,452 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] spmStart polling ended: taskId = 34c9a402-a886-41e3-bf64-52f554669aba task status = finished 2014-06-10 10:56:50,498 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] spmStart polling ended, spm status: SPM 2014-06-10 10:56:50,517 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] START, HSMClearTaskVDSCommand(HostName = node3.test.me, HostId = 7415506c-cda7-4018-804d-5f6d3beddbfb, taskId=34c9a402-a886-41e3-bf64-52f554669aba), log id: 2e6d491a 2014-06-10 10:56:50,573 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] FINISH, HSMClearTaskVDSCommand, log id: 2e6d491a 2014-06-10 10:56:50,574 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] FINISH, SpmStartVDSCommand, return: org.ovirt.engine.core.common.businessentities.SpmStatusResult@6e67a4c7, log id: 3056d822 2014-06-10 10:56:50,597 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] Initialize Irs proxy from vds: 10.100.100.48 2014-06-10 10:56:50,622 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Storage Pool Manager runs on Host node3.test.me (Address: 10.100.100.48). 2014-06-10 10:56:50,644 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SPMGetAllTasksInfoVDSCommand] (org.ovirt.thread.pool-6-thread-31) [6808e2ec] START, SPMGetAllTasksInfoVDSCommand( storagePoolId = 00000002-0002-0002-0002-000000000390, ignoreFailoverLimit = false), log id: 7eba1136 2014-06-10 10:57:48,685 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-40) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 10:58:28,613 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-27) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 10:58:34,731 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-39) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 
2014-06-10 10:58:49,587 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.ActivateStorageDomainVDSCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] Failed in ActivateStorageDomainVDS method 2014-06-10 10:58:49,588 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] IrsBroker::Failed::ActivateStorageDomainVDS due to: IRSErrorException: IRSGenericException: IRSErrorException: Failed to ActivateStorageDomainVDS, error = Storage domain does not exist: ('9923a5a1-61e0-4edb-a04d-22c962190c53',), code = 358 2014-06-10 10:58:49,666 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] START, SpmStopVDSCommand(HostName = node3.test.me, HostId = 7415506c-cda7-4018-804d-5f6d3beddbfb, storagePoolId = 00000002-0002-0002-0002-000000000390), log id: 3a60caf5 2014-06-10 10:58:49,711 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] SpmStopVDSCommand::Stopping SPM on vds node3.test.me, pool id 00000002-0002-0002-0002-000000000390 2014-06-10 10:58:52,611 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] FINISH, SpmStopVDSCommand, log id: 3a60caf5 2014-06-10 10:58:52,611 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-6-thread-20) [6808e2ec] Irs placed on server 7415506c-cda7-4018-804d-5f6d3beddbfb failed. Proceed Failover 2014-06-10 10:58:52,668 INFO [org.ovirt.engine.core.bll.storage.SetStoragePoolStatusCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] Running command: SetStoragePoolStatusCommand internal: true. Entities affected : ID: 00000002-0002-0002-0002-000000000390 Type: StoragePool 2014-06-10 10:58:52,991 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] Correlation ID: 69ff8cdd, Call Stack: null, Custom Event ID: -1, Message: Data Center is being initialized, please wait for initialization to complete. 
2014-06-10 10:58:52,997 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] hostFromVds::selectedVds - node4.test.me, spmStatus Free, storage pool Default 2014-06-10 10:58:53,000 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] SPM Init: could not find reported vds or not up - pool:Default vds_spm_id: 1 2014-06-10 10:58:53,017 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] SPM selection - vds seems as spm node3.test.me 2014-06-10 10:58:53,035 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] START, SpmStopVDSCommand(HostName = node3.test.me, HostId = 7415506c-cda7-4018-804d-5f6d3beddbfb, storagePoolId = 00000002-0002-0002-0002-000000000390), log id: 26ccb709 2014-06-10 10:58:53,043 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] Command org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand return value TaskStatusListReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=654, mMessage=Not SPM]] 2014-06-10 10:58:53,060 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] HostName = node3.test.me 2014-06-10 10:58:53,061 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] Command HSMGetAllTasksStatusesVDSCommand(HostName = node3.test.me, HostId = 7415506c-cda7-4018-804d-5f6d3beddbfb) execution failed. Exception: IRSNonOperationalException: IRSGenericException: IRSErrorException: IRSNonOperationalException: Not SPM 2014-06-10 10:58:53,061 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] FINISH, SpmStopVDSCommand, log id: 26ccb709 2014-06-10 10:58:53,062 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] spm stop succeeded, continuing with spm selection 2014-06-10 10:58:53,091 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] starting spm on vds node4.test.me, storage pool Default, prevId 1, LVER 3 2014-06-10 10:58:53,108 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] START, SpmStartVDSCommand(HostName = node4.test.me, HostId = bb13752e-85cb-4945-822b-48ab2a7b1329, storagePoolId = 00000002-0002-0002-0002-000000000390, prevId=1, prevLVER=3, storagePoolFormatType=V3, recoveryMode=Manual, SCSIFencing=true), log id: cf0e42f 2014-06-10 10:58:53,117 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] spmStart polling started: taskId = 95b6b948-b7b8-46cb-8cc1-cf9bde431874 2014-06-10 11:01:46,832 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-8) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 11:01:55,979 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-25) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 
2014-06-10 11:02:51,420 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-42) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 11:02:54,477 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-23) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 11:03:06,978 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-15) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 11:03:10,033 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-41) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 11:03:50,450 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] spmStart polling ended: taskId = 95b6b948-b7b8-46cb-8cc1-cf9bde431874 task status = finished 2014-06-10 11:03:50,511 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] spmStart polling ended, spm status: SPM 2014-06-10 11:03:50,528 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] START, HSMClearTaskVDSCommand(HostName = node4.test.me, HostId = bb13752e-85cb-4945-822b-48ab2a7b1329, taskId=95b6b948-b7b8-46cb-8cc1-cf9bde431874), log id: 2d9cb3dc 2014-06-10 11:03:50,588 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] FINISH, HSMClearTaskVDSCommand, log id: 2d9cb3dc 2014-06-10 11:03:50,588 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] FINISH, SpmStartVDSCommand, return: org.ovirt.engine.core.common.businessentities.SpmStatusResult@6b9b5783, log id: cf0e42f 2014-06-10 11:03:50,890 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] Initialize Irs proxy from vds: 10.100.100.59 2014-06-10 11:03:50,935 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Storage Pool Manager runs on Host node4.test.me (Address: 10.100.100.59). 2014-06-10 11:03:50,980 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SPMGetAllTasksInfoVDSCommand] (org.ovirt.thread.pool-6-thread-11) [69ff8cdd] START, SPMGetAllTasksInfoVDSCommand( storagePoolId = 00000002-0002-0002-0002-000000000390, ignoreFailoverLimit = false), log id: 6ff3e3ea 2014-06-10 11:04:29,664 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-23) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 11:04:32,809 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-17) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 11:04:54,324 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-50) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 
2014-06-10 11:04:57,368 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-37) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 11:05:00,433 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-19) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 11:05:49,611 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.ActivateStorageDomainVDSCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] Failed in ActivateStorageDomainVDS method 2014-06-10 11:05:49,612 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] IrsBroker::Failed::ActivateStorageDomainVDS due to: IRSErrorException: IRSGenericException: IRSErrorException: Failed to ActivateStorageDomainVDS, error = Storage domain does not exist: ('9923a5a1-61e0-4edb-a04d-22c962190c53',), code = 358 2014-06-10 11:05:49,612 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SPMGetAllTasksInfoVDSCommand] (org.ovirt.thread.pool-6-thread-11) [69ff8cdd] -- executeIrsBrokerCommand: Attempting on storage pool 00000002-0002-0002-0002-000000000390 2014-06-10 11:05:49,616 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.ActivateStorageDomainVDSCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] FINISH, ActivateStorageDomainVDSCommand, log id: 23981a1b 2014-06-10 11:05:49,617 ERROR [org.ovirt.engine.core.bll.storage.ActivateStorageDomainCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] Command org.ovirt.engine.core.bll.storage.ActivateStorageDomainCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSErrorException: IRSGenericException: IRSErrorException: Failed to ActivateStorageDomainVDS, error = Storage domain does not exist: ('9923a5a1-61e0-4edb-a04d-22c962190c53',), code = 358 (Failed with error StorageDomainDoesNotExist and code 358) 2014-06-10 11:05:49,659 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksInfoVDSCommand] (org.ovirt.thread.pool-6-thread-11) [69ff8cdd] START, HSMGetAllTasksInfoVDSCommand(HostName = node4.test.me, HostId = bb13752e-85cb-4945-822b-48ab2a7b1329), log id: 6229449d 2014-06-10 11:05:49,665 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksInfoVDSCommand] (org.ovirt.thread.pool-6-thread-11) [69ff8cdd] FINISH, HSMGetAllTasksInfoVDSCommand, return: [], log id: 6229449d 2014-06-10 11:05:49,666 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SPMGetAllTasksInfoVDSCommand] (org.ovirt.thread.pool-6-thread-11) [69ff8cdd] FINISH, SPMGetAllTasksInfoVDSCommand, return: [], log id: 6ff3e3ea 2014-06-10 11:05:49,666 INFO [org.ovirt.engine.core.bll.AsyncTaskManager] (org.ovirt.thread.pool-6-thread-11) [69ff8cdd] Discovered no tasks on Storage Pool Default 2014-06-10 11:05:49,667 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SPMGetAllTasksInfoVDSCommand] (org.ovirt.thread.pool-6-thread-31) [6808e2ec] -- executeIrsBrokerCommand: Attempting on storage pool 00000002-0002-0002-0002-000000000390 2014-06-10 11:05:49,678 INFO [org.ovirt.engine.core.bll.storage.ActivateStorageDomainCommand] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] Command [id=8d0ac1fe-ed12-48ac-b8a1-13b8a5f17377]: Compensating CHANGED_STATUS_ONLY of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: EntityStatusSnapshot [id=storagePoolId = 00000002-0002-0002-0002-000000000390, storageId = 
9923a5a1-61e0-4edb-a04d-22c962190c53, status=InActive]. 2014-06-10 11:05:49,692 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksInfoVDSCommand] (org.ovirt.thread.pool-6-thread-31) [6808e2ec] START, HSMGetAllTasksInfoVDSCommand(HostName = node4.test.me, HostId = bb13752e-85cb-4945-822b-48ab2a7b1329), log id: 58f9a1ab 2014-06-10 11:05:49,733 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-20) [69ff8cdd] Correlation ID: 52284e89, Job ID: f4061f81-ea0d-479a-bd89-b67acae8ec1a, Call Stack: null, Custom Event ID: -1, Message: Failed to activate Storage Domain ISO (Data Center Default) by admin 2014-06-10 11:05:49,738 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksInfoVDSCommand] (org.ovirt.thread.pool-6-thread-31) [6808e2ec] FINISH, HSMGetAllTasksInfoVDSCommand, return: [], log id: 58f9a1ab 2014-06-10 11:05:49,744 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SPMGetAllTasksInfoVDSCommand] (org.ovirt.thread.pool-6-thread-31) [6808e2ec] FINISH, SPMGetAllTasksInfoVDSCommand, return: [], log id: 7eba1136 2014-06-10 11:05:49,745 INFO [org.ovirt.engine.core.bll.AsyncTaskManager] (org.ovirt.thread.pool-6-thread-31) [6808e2ec] Discovered no tasks on Storage Pool Default 2014-06-10 11:07:41,006 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-4) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 11:07:44,068 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-20) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 11:07:47,128 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-44) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 11:07:50,190 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-37) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 11:07:50,221 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-42) Executing a command: java.util.concurrent.FutureTask , but note that there are 2 tasks in the queue. 2014-06-10 11:07:50,222 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-19) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 11:09:38,306 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-41) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 11:09:38,328 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-20) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 11:09:50,816 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-17) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 2014-06-10 11:09:53,859 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-47) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue. 
2014-06-10 11:09:53,881 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-16) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue.
2014-06-10 11:10:00,118 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (DefaultQuartzScheduler_Worker-31) [17ab55ad] Autorecovering 1 storage domains
2014-06-10 11:10:00,119 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (DefaultQuartzScheduler_Worker-31) [17ab55ad] Autorecovering storage domains id: 9923a5a1-61e0-4edb-a04d-22c962190c53
2014-06-10 11:10:00,119 INFO [org.ovirt.engine.core.bll.storage.ConnectDomainToStorageCommand] (DefaultQuartzScheduler_Worker-31) [336d7eb4] Running command: ConnectDomainToStorageCommand internal: true. Entities affected : ID: 9923a5a1-61e0-4edb-a04d-22c962190c53 Type: Storage
2014-06-10 11:10:00,120 INFO [org.ovirt.engine.core.bll.storage.ConnectDomainToStorageCommand] (DefaultQuartzScheduler_Worker-31) [336d7eb4] ConnectDomainToStorage. Before Connect all hosts to pool. Time:6/10/14 11:10 AM
2014-06-10 11:10:00,196 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-36) Executing a command: java.util.concurrent.FutureTask , but note that there are 2 tasks in the queue.
2014-06-10 11:10:00,196 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-32) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue.
2014-06-10 11:10:00,198 INFO [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand] (org.ovirt.thread.pool-6-thread-15) [5f9a3237] Running command: ConnectStorageToVdsCommand internal: true. Entities affected : ID: aaa00000-0000-0000-0000-123456789aaa Type: System
2014-06-10 11:10:00,199 INFO [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand] (org.ovirt.thread.pool-6-thread-32) [31a8ddb2] Running command: ConnectStorageToVdsCommand internal: true. Entities affected : ID: aaa00000-0000-0000-0000-123456789aaa Type: System
2014-06-10 11:10:00,200 INFO [org.ovirt.engine.core.bll.storage.ConnectStorageToVdsCommand] (org.ovirt.thread.pool-6-thread-36) [3fc1d201] Running command: ConnectStorageToVdsCommand internal: true. Entities affected : ID: aaa00000-0000-0000-0000-123456789aaa Type: System
2014-06-10 11:10:00,321 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-15) [5f9a3237] START, ConnectStorageServerVDSCommand(HostName = node2.test.me, HostId = 28fdcb5d-7acd-410e-8b65-0b4f483cb575, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = NFS, connectionList = [{ id: f47e9f99-9989-4297-a84f-4f75338eeced, connection: xx.xxx.xxx.xxx:/var/lib/exports/iso, iqn: null, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 7d344fac
2014-06-10 11:10:00,330 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-32) [31a8ddb2] START, ConnectStorageServerVDSCommand(HostName = node3.test.me, HostId = 7415506c-cda7-4018-804d-5f6d3beddbfb, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = NFS, connectionList = [{ id: f47e9f99-9989-4297-a84f-4f75338eeced, connection: xx.xxx.xxx.xxx:/var/lib/exports/iso, iqn: null, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 7c77503a
2014-06-10 11:10:00,334 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-36) [3fc1d201] START, ConnectStorageServerVDSCommand(HostName = node4.test.me, HostId = bb13752e-85cb-4945-822b-48ab2a7b1329, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = NFS, connectionList = [{ id: f47e9f99-9989-4297-a84f-4f75338eeced, connection: xx.xxx.xxx.xxx:/var/lib/exports/iso, iqn: null, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 449484a0
2014-06-10 11:10:21,875 WARN [org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil] (org.ovirt.thread.pool-6-thread-48) Executing a command: java.util.concurrent.FutureTask , but note that there are 1 tasks in the queue.
2014-06-10 11:11:00,372 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-32) [31a8ddb2] FINISH, ConnectStorageServerVDSCommand, return: {f47e9f99-9989-4297-a84f-4f75338eeced=0}, log id: 7c77503a
2014-06-10 11:11:00,382 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-15) [5f9a3237] FINISH, ConnectStorageServerVDSCommand, return: {f47e9f99-9989-4297-a84f-4f75338eeced=0}, log id: 7d344fac
2014-06-10 11:11:00,409 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (org.ovirt.thread.pool-6-thread-36) [3fc1d201] FINISH, ConnectStorageServerVDSCommand, return: {f47e9f99-9989-4297-a84f-4f75338eeced=0}, log id: 449484a0
2014-06-10 11:11:00,410 INFO [org.ovirt.engine.core.bll.storage.ConnectDomainToStorageCommand] (DefaultQuartzScheduler_Worker-31) [336d7eb4] ConnectDomainToStorage. After Connect all hosts to pool. Time:6/10/14 11:11 AM
Created attachment 908024 [details] nodes down
Created attachment 908025 [details] vm's still up
Rebooting the nodes restores the VMs to working order, so I can put the nodes into maintenance again.
Hm, I'm not sure I understand what the sequence of events was or what you did. I don't see any errors about the VM shutdown… are you sure the problem is/was not in the guest OS? (If the guest fails to shut down, we revert the VM back to Up after 5 minutes and report that the shutdown failed.)
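For illustration only, a minimal sketch (not taken from this setup) of driving that same shutdown-and-wait flow through the oVirt REST API, so the revert-to-Up behaviour can be watched from outside the admin portal. The engine URL, credentials, VM id and the v3 XML layout are assumptions, not values from this report:

#!/usr/bin/env python
# Sketch: ask the engine to shut a VM down, then poll its reported state.
import time
import requests

ENGINE = "https://engine.example.com/ovirt-engine/api"   # assumed engine URL
AUTH = ("admin@internal", "password")                     # placeholder credentials
VM_ID = "00000000-0000-0000-0000-000000000000"            # hypothetical VM id

# Issue the guest shutdown (the REST equivalent of the portal "Shutdown" button).
requests.post("%s/vms/%s/shutdown" % (ENGINE, VM_ID),
              data="<action/>", headers={"Content-Type": "application/xml"},
              auth=AUTH, verify=False)

# Poll for ~6 minutes; with an unresponsive guest the state is expected to go
# "powering_down" and then flip back to "up" once the engine gives up.
for _ in range(12):
    time.sleep(30)
    vm_xml = requests.get("%s/vms/%s" % (ENGINE, VM_ID), auth=AUTH, verify=False).text
    state = vm_xml.split("<state>")[1].split("</state>")[0] if "<state>" in vm_xml else "unknown"
    print("reported state: %s" % state)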
Sequence of events:
- Running 4 nodes with an HA iSCSI cluster datacenter.
- Installed a benchmark VM on each node.
- Started running the Phoronix benchmark on a VM on each node.
- Killed the primary iSCSI server by cutting the power.
- iSCSI failover went wrong because I had misconfigured something in the cluster.
- The nodes report they are still up and the VMs are all up.
- Disk errors occur on the benchmark VMs.
- Everything in ovirt is still green and UP.
- I start up the iSCSI target that was shut down, fix a DRBD split-brain situation and restart the target.
- Everything is still up in ovirt.
- I want to shut down the benchmark VM but it doesn't shut down.
- I want to put the node in maintenance mode but it won't, because there are still VMs running.
- The VMs have now been up for 2 weeks without responding.
- I upgrade ovirt-engine to 3.4.2 and the nodes go down.
- The VMs still report as UP and won't shut down.
- I reboot a node and the VM goes UP again, so I'm finally able to shut it down.

So I had zombie VMs that were running without iSCSI storage and, later on, nodes that were unavailable. I still have two nodes unavailable with running VMs on them according to ovirt (I haven't restarted these yet). Why do the VMs stay UP and give me a nice green "everything is OK" status?
I'm no storage expert so I can't really comment on that too much, but I'm interested in the reason your VM refuses to die… So what's going on inside the guest after you issue shutdown? After poweroff? Are you running ovirt-guest-agent? Was there even the "Powering Down" intermediate state for 5 minutes after you issued the shutdown, or did it just stay "Up" all the time?
I cannot access the guest, as the host is unavailable. I'm not running the ovirt guest agent. As I said, the storage went offline, so it's understandable that things went wrong, but in my opinion the VMs should not keep telling me they are still up. However, I have to reinstall the dev setup to start testing 3.5, so I cannot spend more time on it.
Things are fully broken, which is logical as I have messed up my iSCSI big time now ;-) But the ghost VMs remain. After reinstalling node1 and node2, I now get some more errors, so I'm posting them for reference:

2014-06-16 12:19:02,737 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] (DefaultQuartzScheduler_Worker-81) [655cf4a6] Command SpmStatusVDSCommand(HostName = node2.test.now, HostId = 2e3ccbdf-650d-4bd4-97d5-713878b397db, storagePoolId = 00000002-0002-0002-0002-000000000390) execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Error validating master storage domain: ('MD read error',)
2014-06-16 12:19:02,738 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-81) [655cf4a6] hostFromVds::selectedVds - vh02en01.netbulae.mgmt, spmStatus returned null!
2014-06-16 12:19:02,738 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-81) [655cf4a6] IrsBroker::Failed::GetStoragePoolInfoVDS
2014-06-16 12:19:02,739 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-81) [655cf4a6] Exception: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Error validating master storage domain: ('MD read error',)

This is referenced here: https://bugzilla.redhat.com/show_bug.cgi?id=974991

And some info from node2:

multipath -ll
3600304800613dd001b074aa7122a934e dm-0 SMC,SMC2108
size=232G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 0:2:0:0 sda 8:0 active ready running
36001405d4a402200a5846a9982d5b908 dm-10 LIO-ORG,IBLOCK
size=3.6T features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 5:0:0:0 sdb 8:16 failed faulty running

vdsm.log:
Thread-13::DEBUG::2014-06-16 10:37:21,438::task::990::TaskManager.Task::(_decref) Task=`bdd8c0e5-355d-46a3-a45e-b2dd7668ec39`::ref 0 aborting False
Thread-15::ERROR::2014-06-16 10:37:28,525::sdc::137::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain 9923a5a1-61e0-4edb-a04d-22c962190c53
Thread-15::ERROR::2014-06-16 10:37:28,525::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain) looking for domain 9923a5a1-61e0-4edb-a04d-22c962190c53
Thread-15::DEBUG::2014-06-16 10:37:28,526::lvm::373::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
Thread-15::DEBUG::2014-06-16 10:37:28,526::lvm::296::Storage.Misc.excCmd::(cmd) u'/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ \'a|/dev/mapper/36001405d4a402200a5846a9982d5b908|/dev/mapper/3600304800613dd001b074aa7122a934e|\', \'r|.*|\' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name 9923a5a1-61e0-4edb-a04d-22c962190c53' (cwd None)
Thread-15::DEBUG::2014-06-16 10:37:28,824::lvm::296::Storage.Misc.excCmd::(cmd) FAILED: <err> = ' /dev/mapper/36001405d4a402200a5846a9982d5b908: read failed after 0 of 4096 at 3942189760512: Input/output error\n /dev/mapper/36001405d4a402200a5846a9982d5b908: read failed after 0 of 4096 at 3942189858816: Input/output error\n /dev/mapper/36001405d4a402200a5846a9982d5b908: read failed after 0 of 4096 at 0: Input/output error\n WARNING: Error counts reached a limit of 3. Device /dev/mapper/36001405d4a402200a5846a9982d5b908 was disabled\n Volume group "9923a5a1-61e0-4edb-a04d-22c962190c53" not found\n Skipping volume group 9923a5a1-61e0-4edb-a04d-22c962190c53\n'; <rc> = 5
Thread-15::WARNING::2014-06-16 10:37:28,826::lvm::378::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] [' /dev/mapper/36001405d4a402200a5846a9982d5b908: read failed after 0 of 4096 at 3942189760512: Input/output error', ' /dev/mapper/36001405d4a402200a5846a9982d5b908: read failed after 0 of 4096 at 3942189858816: Input/output error', ' /dev/mapper/36001405d4a402200a5846a9982d5b908: read failed after 0 of 4096 at 0: Input/output error', ' WARNING: Error counts reached a limit of 3. Device /dev/mapper/36001405d4a402200a5846a9982d5b908 was disabled', ' Volume group "9923a5a1-61e0-4edb-a04d-22c962190c53" not found', ' Skipping volume group 9923a5a1-61e0-4edb-a04d-22c962190c53']
Thread-15::DEBUG::2014-06-16 10:37:28,827::lvm::415::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
Thread-15::ERROR::2014-06-16 10:37:28,828::sdc::143::Storage.StorageDomainCache::(_findDomain) domain 9923a5a1-61e0-4edb-a04d-22c962190c53 not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 171, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'9923a5a1-61e0-4edb-a04d-22c962190c53',)
Thread-15::ERROR::2014-06-16 10:37:28,828::domainMonitor::239::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain 9923a5a1-61e0-4edb-a04d-22c962190c53 monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 204, in _monitorDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 98, in produce
    domain.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 171, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'9923a5a1-61e0-4edb-a04d-22c962190c53',)
Thread-16::DEBUG::2014-06-16 10:37:29,293::lvm::373::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
Thread-16::DEBUG::2014-06-16 10:37:29,294::lvm::296::Storage.Misc.excCmd::(cmd) u'/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ \'a|/dev/mapper/36001405d4a402200a5846a9982d5b908|/dev/mapper/3600304800613dd001b074aa7122a934e|\', \'r|.*|\' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name 0751cdb1-0ee3-494c-88e2-c0e7df58aa52' (cwd None)
Thread-16::DEBUG::2014-06-16 10:37:29,549::lvm::296::Storage.Misc.excCmd::(cmd) FAILED: <err> = ' /dev/mapper/36001405d4a402200a5846a9982d5b908: read failed after 0 of 4096 at 3942189760512: Input/output error\n /dev/mapper/36001405d4a402200a5846a9982d5b908: read failed after 0 of 4096 at 3942189858816: Input/output error\n /dev/mapper/36001405d4a402200a5846a9982d5b908: read failed after 0 of 4096 at 0: Input/output error\n WARNING: Error counts reached a limit of 3. Device /dev/mapper/36001405d4a402200a5846a9982d5b908 was disabled\n Volume group "0751cdb1-0ee3-494c-88e2-c0e7df58aa52" not found\n Skipping volume group 0751cdb1-0ee3-494c-88e2-c0e7df58aa52\n'; <rc> = 5
Thread-16::WARNING::2014-06-16 10:37:29,551::lvm::378::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] [' /dev/mapper/36001405d4a402200a5846a9982d5b908: read failed after 0 of 4096 at 3942189760512: Input/output error', ' /dev/mapper/36001405d4a402200a5846a9982d5b908: read failed after 0 of 4096 at 3942189858816: Input/output error', ' /dev/mapper/36001405d4a402200a5846a9982d5b908: read failed after 0 of 4096 at 0: Input/output error', ' WARNING: Error counts reached a limit of 3. Device /dev/mapper/36001405d4a402200a5846a9982d5b908 was disabled', ' Volume group "0751cdb1-0ee3-494c-88e2-c0e7df58aa52" not found', ' Skipping volume group 0751cdb1-0ee3-494c-88e2-c0e7df58aa52']
Thread-16::DEBUG::2014-06-16 10:37:29,551::lvm::415::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
Thread-16::ERROR::2014-06-16 10:37:29,551::sdc::143::Storage.StorageDomainCache::(_findDomain) domain 0751cdb1-0ee3-494c-88e2-c0e7df58aa52 not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 1365, in findDomain
    return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/blockSD.py", line 1334, in findDomainPath
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'0751cdb1-0ee3-494c-88e2-c0e7df58aa52',)
Thread-16::ERROR::2014-06-16 10:37:29,551::domainMonitor::239::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain 0751cdb1-0ee3-494c-88e2-c0e7df58aa52 monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 215, in _monitorDomain
    self.domain.selftest()
  File "/usr/share/vdsm/storage/sdc.py", line 49, in __getattr__
    return getattr(self.getRealDomain(), attrName)
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 1365, in findDomain
    return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/blockSD.py", line 1334, in findDomainPath
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'0751cdb1-0ee3-494c-88e2-c0e7df58aa52',)
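Side note (not part of the original report): the failing probe above is essentially vdsm asking LVM whether the storage domain's volume group is still readable, and treating the non-zero exit code (rc = 5) as StorageDomainDoesNotExist. A stripped-down sketch of the same check, omitting vdsm's --config device filter and assuming the lvm2 tools and passwordless sudo are available on the host:

#!/usr/bin/env python
# Sketch: probe whether the storage domain's VG is reachable, like vdsm does above.
import subprocess

SD_UUID = "9923a5a1-61e0-4edb-a04d-22c962190c53"  # domain UUID from the log

proc = subprocess.Popen(
    ["sudo", "-n", "lvm", "vgs", "--noheadings", "-o", "vg_name,vg_size", SD_UUID],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
out, err = proc.communicate()

# A non-zero return code is what vdsm's domain monitor turns into
# StorageDomainDoesNotExist in the tracebacks above.
if proc.returncode != 0:
    print("volume group %s not reachable:\n%s" % (SD_UUID, err))
else:
    print("volume group found: %s" % out.strip())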
"ghost vm remains"…so you mean another occurence of the same thing or you're talking still about the same VM. I'm afraid we won't be able to do much without a clear reproducer. The storage errors are indeed understandable, the question is about the impact. It shouldn't lead to a state when you can't shut down a VM unless there is a real storage access error and the guest is paused
I'm still talking about the same thing. The VMs never got paused. To be clear, I know I have a storage problem, as I caused it myself. The problem is that ovirt told me the nodes were still up (green) and the VMs were still up (also green). Even the storage domain was still up until I manually started rebooting some of the nodes. After restarting ovirt-engine all the nodes were down, but the VMs were still up according to ovirt-engine. Only after reinstalling two of the nodes did more errors show up about "Reconstructing master domain on Data Center Default". As an admin, I'd like things to go down and ovirt to tell me there are big problems when such a critical storage error occurs, not pretend everything is fine.
Allon, it seems the issue is more about the storage domain status not being reflected correctly. Can you please take a look?
Liron, can you take a look please?
Hi, can you please provide the full engine/vdsm logs? Thanks.
These logs are long gone; I have reinstalled multiple times already and got rid of the iSCSI storage.
Created attachment 934079 [details] vdsm.log.1 to vdsm.log.9

I found the vdsm logs, as I had copied them locally. The engine log is missing though.
Created attachment 934081 [details] vdsm.log.10 to vdsm.log.17
Re-targeting to 3.5.3 since this bug has not been marked as blocker for 3.5.2 and we have already released 3.5.2 Release Candidate.
Hi Netbulae, when all the hosts lose access to a storage domain, the storage domain status should change to "Inactive". If only some of the hosts can't access the domain while others can, the hosts that can't access the domain should move to status "Not Operational" while the domain should remain Active. The status change happens after 5 minutes (by default) of the domain being monitored as problematic. In general, it's hard to diagnose this issue without the engine log, as the engine decides on the action based on the reports from all the hosts, so I can't review the engine's decisions here - but I'll take a look. In https://bugzilla.redhat.com/show_bug.cgi?id=1108576#c15 you attached a number of vdsm logs; which of your 4 hosts are those logs from? Also, what time frame is relevant for https://bugzilla.redhat.com/show_bug.cgi?id=1108576#c10 ?
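For reference, a minimal sketch of watching the Active -> Inactive transition described above from outside the UI, by polling the domain status as attached to the data center. It only assumes the oVirt 3.x REST API layout; the engine URL and credentials are placeholders, and the pool/domain ids are taken from the logs in this report:

#!/usr/bin/env python
# Sketch: poll the data-center-attached status of the problematic storage domain.
import time
import requests

ENGINE = "https://engine.example.com/ovirt-engine/api"    # assumed engine URL
AUTH = ("admin@internal", "password")                      # placeholder credentials
DC_ID = "00000002-0002-0002-0002-000000000390"             # storage pool id from the logs
SD_ID = "9923a5a1-61e0-4edb-a04d-22c962190c53"             # ISO domain id from the logs

# Print the reported state once a minute; stop with Ctrl-C.
while True:
    xml = requests.get("%s/datacenters/%s/storagedomains/%s" % (ENGINE, DC_ID, SD_ID),
                       auth=AUTH, verify=False).text
    state = xml.split("<state>")[1].split("</state>")[0] if "<state>" in xml else "unknown"
    print("%s domain status: %s" % (time.strftime("%H:%M:%S"), state))
    time.sleep(60)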
Hi Liron, it's been a while and we redeploy our test environment every couple of weeks. I saved the logs for debugging purposes, but I forgot to save the engine logs :-( There is about a month between the failure and the reinstall, because I left the setup running degraded while I was occupied with other work. After a whole month the status was still green. The logs are from host1; I would attach the other node logs, but they are too large and all the hostnames, IPs and MAC addresses would have to be anonymized. Also, I've repurposed the test iSCSI in the last couple of months, so I'm not able to reproduce at the moment.
(In reply to Netbulae from comment #19)
> Also, I've repurposed the test iSCSI in the last couple of months, so I'm not
> able to reproduce at the moment.

I'm closing this BZ, as we don't have any way to proceed here. If this recurs, please reopen the BZ.