Bug 993902 - A lot of time is needed after SPM is selected to bring DC up
Status: CLOSED DUPLICATE of bug 986961
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
3.3.0
x86_64 Linux
unspecified Severity medium
: ---
: 3.3.0
Assigned To: Nobody's working on this, feel free to take it
storage
: Triaged
Depends On:
Blocks:
 
Reported: 2013-08-06 09:01 EDT by Jakub Libosvar
Modified: 2016-02-10 15:35 EST (History)
5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-08-06 09:58:23 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
engine, new spm, former spm logs (166.96 KB, application/gzip)
2013-08-06 09:01 EDT, Jakub Libosvar

Description Jakub Libosvar 2013-08-06 09:01:52 EDT
Created attachment 783318 [details]
engine, new spm, former spm logs

Description of problem:
Errors occur because monitoring tasks keep running on the former SPM, which is already disconnected from the storage pool. These errors increase the amount of time needed to bring the DC up.

Backend - timestamps showing when the former SPM was stopped and when the new one was selected
2013-08-06 13:54:05,045 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (ajp-/127.0.0.1:8702-6) START, SpmStopVDSCommand(HostName = 10.34.63.135, HostId = 10ab4708-16f7-4ff7-bae6-b4f8d48cf3f8, storagePoolId = afa055e0-94c4-477d-b793-2d0927f13341), log id: 2ed6510a
2013-08-06 13:54:10,830 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-71) hostFromVds::selectedVds - 10.34.63.210, spmStatus Free, storage pool datacenter_storage_spm_priority_sanity

Former SPM:
Thread-1158::INFO::2013-08-06 13:54:05,067::logUtils::44::dispatcher::(wrapper) Run and protect: spmStop(spUUID='afa055e0-94c4-477d-b793-2d0927f13341', options=None)
Thread-1158::DEBUG::2013-08-06 13:54:05,068::resourceManager::197::ResourceManager.Request::(__init__) ResName=`Storage.afa055e0-94c4-477d-b793-2d0927f13341`ReqID=`029d4cfa-404f-43cc-9917-c55e7c18420e`::Request was made in '/usr/share/vdsm/storage/hsm.py' line '594' at 'spmStop'
Thread-1158::INFO::2013-08-06 13:54:05,143::logUtils::47::dispatcher::(wrapper) Run and protect: spmStop, Return response: None
Thread-1161::INFO::2013-08-06 13:54:06,711::logUtils::44::dispatcher::(wrapper) Run and protect: disconnectStoragePool(spUUID='afa055e0-94c4-477d-b793-2d0927f13341', hostID=3, scsiKey='afa055e0-94c4-477d-b793-2d0927f13341', remove=False, options=None)
Thread-1161::INFO::2013-08-06 13:54:08,717::logUtils::47::dispatcher::(wrapper) Run and protect: disconnectStoragePool, Return response: True


New SPM:
Thread-1487::INFO::2013-08-06 13:57:32,690::logUtils::44::dispatcher::(wrapper) Run and protect: spmStart(spUUID='afa055e0-94c4-477d-b793-2d0927f13341', prevID=3, prevLVER='13', recoveryMode=None, scsiFencing='false', maxHostID=250, domVersion='3', options=None)


From the logs above it is obvious that the new SPM was started more than 3 minutes after selection. More details can be found in the attached logs.
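The gap is easy to quantify from the timestamps quoted above. A minimal Python sketch (assuming the standard vdsm/engine log timestamp format `YYYY-MM-DD HH:MM:SS,mmm`) computes the delay between spmStop returning on the former SPM and spmStart on the new one:

```python
from datetime import datetime

# Timestamp format used in the engine and vdsm logs above.
LOG_TS_FMT = "%Y-%m-%d %H:%M:%S,%f"

def delay_seconds(stop_ts: str, start_ts: str) -> float:
    """Seconds between spmStop returning and spmStart being called."""
    stop = datetime.strptime(stop_ts, LOG_TS_FMT)
    start = datetime.strptime(start_ts, LOG_TS_FMT)
    return (start - stop).total_seconds()

if __name__ == "__main__":
    # Timestamps taken from the former-SPM and new-SPM logs above.
    gap = delay_seconds("2013-08-06 13:54:05,143", "2013-08-06 13:57:32,690")
    print(f"spmStart lagged spmStop by {gap:.1f} s")  # 207.5 s, over 3 minutes
```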

Version-Release number of selected component (if applicable):
rhevm 3.3.0-0.13.master.el6ev
vdsm-4.12.0-rc3.13.git06ed3cc.el6ev

How reproducible:
Always

Steps to Reproduce:
1. Have at least two hosts in the data center and at least one storage domain
2. Move the SPM host to maintenance
3.
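When reproducing, the relevant SPM transitions can be pulled straight out of the vdsm logs. A sketch (the regex and the truncated log excerpt below are illustrative, matching the log lines quoted in this report) that extracts spmStop/spmStart events with their timestamps:

```python
import re

# Matches vdsm dispatcher lines such as:
#   Thread-1158::INFO::2013-08-06 13:54:05,067::logUtils::44::dispatcher::
#   (wrapper) Run and protect: spmStop(...)
SPM_EVENT = re.compile(
    r"::(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::.*"
    r"Run and protect: (?P<event>spmStop|spmStart)\("
)

def spm_events(log_text: str):
    """Yield (timestamp, event) pairs for spmStop/spmStart calls."""
    for m in SPM_EVENT.finditer(log_text):
        yield m.group("ts"), m.group("event")

# Illustrative excerpt based on the logs in this report (UUIDs truncated).
sample = (
    "Thread-1158::INFO::2013-08-06 13:54:05,067::logUtils::44::dispatcher::"
    "(wrapper) Run and protect: spmStop(spUUID='afa055e0-...', options=None)\n"
    "Thread-1487::INFO::2013-08-06 13:57:32,690::logUtils::44::dispatcher::"
    "(wrapper) Run and protect: spmStart(spUUID='afa055e0-...', prevID=3)\n"
)

if __name__ == "__main__":
    for ts, event in spm_events(sample):
        print(ts, event)
```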

Actual results:
It takes a long time until the DC is up.

Expected results:
spmStart is called on the new SPM much sooner than 3 minutes after selection.


Additional info:
Logs attached
I'm not sure if this is a regression.
Comment 1 Jakub Libosvar 2013-08-06 09:58:23 EDT

*** This bug has been marked as a duplicate of bug 986961 ***
