Created attachment 783318 [details]
engine, new spm, former spm logs

Description of problem:
There are errors caused by monitoring tasks running on the former SPM, which is already disconnected from the storage pool. These errors increase the time needed to bring the DC back up.

Backend - times when the former SPM was stopped and when the new one was selected:
2013-08-06 13:54:05,045 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (ajp-/127.0.0.1:8702-6) START, SpmStopVDSCommand(HostName = 10.34.63.135, HostId = 10ab4708-16f7-4ff7-bae6-b4f8d48cf3f8, storagePoolId = afa055e0-94c4-477d-b793-2d0927f13341), log id: 2ed6510a
2013-08-06 13:54:10,830 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-71) hostFromVds::selectedVds - 10.34.63.210, spmStatus Free, storage pool datacenter_storage_spm_priority_sanity

Former SPM:
Thread-1158::INFO::2013-08-06 13:54:05,067::logUtils::44::dispatcher::(wrapper) Run and protect: spmStop(spUUID='afa055e0-94c4-477d-b793-2d0927f13341', options=None)
Thread-1158::DEBUG::2013-08-06 13:54:05,068::resourceManager::197::ResourceManager.Request::(__init__) ResName=`Storage.afa055e0-94c4-477d-b793-2d0927f13341`ReqID=`029d4cfa-404f-43cc-9917-c55e7c18420e`::Request was made in '/usr/share/vdsm/storage/hsm.py' line '594' at 'spmStop'
Thread-1158::INFO::2013-08-06 13:54:05,143::logUtils::47::dispatcher::(wrapper) Run and protect: spmStop, Return response: None
Thread-1161::INFO::2013-08-06 13:54:06,711::logUtils::44::dispatcher::(wrapper) Run and protect: disconnectStoragePool(spUUID='afa055e0-94c4-477d-b793-2d0927f13341', hostID=3, scsiKey='afa055e0-94c4-477d-b793-2d0927f13341', remove=False, options=None)
Thread-1161::INFO::2013-08-06 13:54:08,717::logUtils::47::dispatcher::(wrapper) Run and protect: disconnectStoragePool, Return response: True

New SPM:
Thread-1487::INFO::2013-08-06 13:57:32,690::logUtils::44::dispatcher::(wrapper) Run and protect: spmStart(spUUID='afa055e0-94c4-477d-b793-2d0927f13341', prevID=3, prevLVER='13', recoveryMode=None, scsiFencing='false', maxHostID=250, domVersion='3', options=None)

From the logs above it is obvious that spmStart on the new SPM was issued more than 3 minutes after the new SPM host was selected. More information can be found in the attached logs.

Version-Release number of selected component (if applicable):
rhevm 3.3.0-0.13.master.el6ev
vdsm-4.12.0-rc3.13.git06ed3cc.el6ev

How reproducible:
Always

Steps to Reproduce:
1. Have at least two hosts and at least one storage domain in the data center
2. Move the SPM host to maintenance
3.

Actual results:
It takes a long time until the DC is up again

Expected results:
spmStart on the new SPM is called sooner than 3 minutes after the new SPM host is selected

Additional info:
Logs attached.
I'm not sure whether this is a regression.
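
For reference, a minimal sketch (not part of the product) that measures the gap between the engine selecting the new SPM host ("hostFromVds::selectedVds" in the engine log) and spmStart actually being invoked on that host (vdsm log on the new SPM). The file names engine.log and vdsm_new_spm.log are assumptions about how the attachment is unpacked; adjust them to the real paths.

#!/usr/bin/env python
# Sketch only: compute the delay between SPM selection and spmStart
# using the two log lines quoted in the description.
# File names are assumptions, not the actual attachment layout.
import re
from datetime import datetime

TS = re.compile(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}')

def first_timestamp(path, needle):
    """Return the timestamp of the first line in `path` containing `needle`."""
    with open(path) as f:
        for line in f:
            if needle in line:
                m = TS.search(line)
                if m:
                    return datetime.strptime(m.group(), '%Y-%m-%d %H:%M:%S,%f')
    raise ValueError('%r not found in %s' % (needle, path))

selected = first_timestamp('engine.log', 'hostFromVds::selectedVds')
started = first_timestamp('vdsm_new_spm.log', 'spmStart(spUUID=')

print('new SPM selected at', selected)
print('spmStart issued at ', started)
print('gap:', started - selected)
# With the excerpts above (13:54:10,830 -> 13:57:32,690) the gap is
# roughly 3 minutes 22 seconds.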
*** This bug has been marked as a duplicate of bug 986961 ***