Bug 993902 - A lot of time is needed after SPM is selected to bring DC up
Summary: A lot of time is needed after SPM is selected to bring DC up
Keywords:
Status: CLOSED DUPLICATE of bug 986961
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.3.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.3.0
Assignee: Nobody's working on this, feel free to take it
QA Contact:
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2013-08-06 13:01 UTC by Jakub Libosvar
Modified: 2016-02-10 20:35 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-08-06 13:58:23 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
engine, new spm, former spm logs (166.96 KB, application/gzip)
2013-08-06 13:01 UTC, Jakub Libosvar

Description Jakub Libosvar 2013-08-06 13:01:52 UTC
Created attachment 783318 [details]
engine, new spm, former spm logs

Description of problem:
There are errors caused by monitoring tasks that keep running on the former SPM, which is already disconnected from the storage pool. These errors increase the time needed to bring the DC up.
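
To illustrate the behaviour described above, here is a minimal Python sketch (all names such as PoolMonitor are hypothetical and not taken from the vdsm code) of a monitoring loop that stops scheduling pool-monitoring work once the host has disconnected from the storage pool; without such a guard the former SPM keeps producing the errors seen in the logs:

    import threading

    class PoolMonitor(object):
        """Hypothetical periodic monitor bound to one storage pool."""

        def __init__(self, pool_id, interval=10):
            self.pool_id = pool_id
            self.interval = interval
            self._connected = True
            self._stop = threading.Event()

        def disconnect(self):
            # Hypothetical hook for disconnectStoragePool: no further
            # monitoring work is scheduled for a pool the host has left.
            self._connected = False
            self._stop.set()

        def _run_once(self):
            # Placeholder for the real monitoring work.
            print("monitoring pool %s" % self.pool_id)

        def run(self):
            # wait() returns True once the stop event is set.
            while not self._stop.wait(self.interval):
                if not self._connected:
                    break
                self._run_once()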

Backend - times when the former SPM was stopped and when the new one was selected:
2013-08-06 13:54:05,045 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (ajp-/127.0.0.1:8702-6) START, SpmStopVDSCommand(HostName = 10.34.63.135, HostId = 10ab4708-16f7-4ff7-bae6-b4f8d48cf3f8, storagePoolId = afa055e0-94c4-477d-b793-2d0927f13341), log id: 2ed6510a
2013-08-06 13:54:10,830 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-71) hostFromVds::selectedVds - 10.34.63.210, spmStatus Free, storage pool datacenter_storage_spm_priority_sanity

Former SPM:
Thread-1158::INFO::2013-08-06 13:54:05,067::logUtils::44::dispatcher::(wrapper) Run and protect: spmStop(spUUID='afa055e0-94c4-477d-b793-2d0927f13341', options=None)
Thread-1158::DEBUG::2013-08-06 13:54:05,068::resourceManager::197::ResourceManager.Request::(__init__) ResName=`Storage.afa055e0-94c4-477d-b793-2d0927f13341`ReqID=`029d4cfa-404f-43cc-9917-c55e7c18420e`::Request was made in '/usr/share/vdsm/storage/hsm.py' line '594' at 'spmStop'
Thread-1158::INFO::2013-08-06 13:54:05,143::logUtils::47::dispatcher::(wrapper) Run and protect: spmStop, Return response: None
Thread-1161::INFO::2013-08-06 13:54:06,711::logUtils::44::dispatcher::(wrapper) Run and protect: disconnectStoragePool(spUUID='afa055e0-94c4-477d-b793-2d0927f13341', hostID=3, scsiKey='afa055e0-94c4-477d-b793-2d0927f13341', remove=False, options=None)
Thread-1161::INFO::2013-08-06 13:54:08,717::logUtils::47::dispatcher::(wrapper) Run and protect: disconnectStoragePool, Return response: True


New SPM:
Thread-1487::INFO::2013-08-06 13:57:32,690::logUtils::44::dispatcher::(wrapper) Run and protect: spmStart(spUUID='afa055e0-94c4-477d-b793-2d0927f13341', prevID=3, prevLVER='13', recoveryMode=None, scsiFencing='false', maxHostID=250, domVersion='3', options=None)


From the logs above it is obvious that the new SPM was started more than 3 minutes later. More information can be found in the attached logs.
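
For reference, the delay can be computed directly from the timestamps quoted above (SPM selected at 13:54:10,830 in the engine log, spmStart on the new SPM at 13:57:32,690); a quick Python check:

    from datetime import datetime

    fmt = "%H:%M:%S,%f"
    selected  = datetime.strptime("13:54:10,830", fmt)
    spm_start = datetime.strptime("13:57:32,690", fmt)
    print(spm_start - selected)   # 0:03:21.860000, i.e. well over 3 minutes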

Version-Release number of selected component (if applicable):
rhevm 3.3.0-0.13.master.el6ev
vdsm-4.12.0-rc3.13.git06ed3cc.el6ev

How reproducible:
Always

Steps to Reproduce:
1. Have at least two hosts in the data center and at least one storage domain
2. Move the SPM host to maintenance (see the sketch after this list)
3.
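
For step 2, a minimal sketch of deactivating the SPM host through the REST API (the engine URL and credentials are hypothetical placeholders, the host id is the one from the engine log above, and the /api/hosts/{id}/deactivate action is assumed to be available in this 3.x API):

    import requests

    ENGINE = "https://rhevm.example.com"              # hypothetical engine URL
    HOST_ID = "10ab4708-16f7-4ff7-bae6-b4f8d48cf3f8"  # current SPM host (from the log)

    # Ask the engine to move the host to maintenance; SPM re-selection
    # starts once the host is deactivated.
    resp = requests.post(
        "%s/api/hosts/%s/deactivate" % (ENGINE, HOST_ID),
        data="<action/>",
        headers={"Content-Type": "application/xml"},
        auth=("admin@internal", "password"),          # hypothetical credentials
        verify=False,
    )
    print(resp.status_code)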

Actual results:
It takes a long time until the DC is up

Expected results:
spmStart on the new SPM is called sooner than 3 minutes after it is selected


Additional info:
Logs attached
I'm not sure if this is a regression

Comment 1 Jakub Libosvar 2013-08-06 13:58:23 UTC

*** This bug has been marked as a duplicate of bug 986961 ***

