Bug 1393295

Summary: [scale] - setVdsStatus command failed
Product: [oVirt] ovirt-engine
Reporter: Eldad Marciano <emarcian>
Component: Backend.Core
Assignee: Nobody <nobody>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Eldad Marciano <emarcian>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.0.5
CC: bugs, emarcian, mperina, oourfali, rgolan
Target Milestone: ---
Keywords: Performance
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-28 12:43:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
Description: Thread dumps
Flags: none

Comment 2 Eldad Marciano 2016-11-09 09:58:27 UTC
(public comment)
Some of the VMs failed to migrate, and the host has been in a recovering state for a long time.
I also tried to stop all the running VMs on the host manually via the 'kill' command.

The engine didn't refresh the VM monitoring and still shows 54 VMs on the host.
I tried to push the host into "Confirm Host Rebooted" and then into maintenance, but it fails.

It seems like the host state cannot be released.
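For anyone trying to reproduce or verify this, below is a minimal sketch (assuming the ovirt-engine-sdk4 Python package; the engine URL, credentials, and host name are placeholders) of checking a host's status and engine-reported VM count, and then requesting maintenance through the REST API:

# Minimal sketch, assuming ovirt-engine-sdk4 is installed.
# The engine URL, credentials, and host name below are placeholders.
import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,  # skip TLS verification, for the sketch only
)
try:
    hosts_service = connection.system_service().hosts_service()
    host = hosts_service.list(search='name=myhost')[0]

    # Host status and the engine-side VM summary (the count that
    # stayed at 54 VMs in this report).
    print('status:', host.status)
    if host.summary is not None:
        print('active VMs reported by the engine:', host.summary.active)

    # Request maintenance; engine-side this corresponds to the
    # SetVdsStatusVDSCommand / PreparingForMaintenance lines in the
    # log excerpt below.
    hosts_service.host_service(host.id).deactivate()
finally:
    connection.close()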

2016-11-09 09:37:48,084 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (org.ovirt.thread.pool-6-thread-33) [3ce4c3ec] START, SetVdsStatusVDSCommand(HostName = , SetVdsStatusVDSCommandParameters:{runAsync='true', hostId='7c2a9b0b-4fca-4cd1-8950-01ad5af9ea68', status='PreparingForMaintenance', nonOperationalReason='NONE', stopSpmFailureLogged='true', maintenanceReason='null'}), log id: 5b20d170
2016-11-09 09:37:49,940 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-33) [7bb7776] Correlation ID: 3ce4c3ec, Job ID: 86167ccb-08a7-4706-be4a-956c84b00df1, Call Stack: null, Custom Event ID: -1, Message: Host  cannot change into maintenance mode - not all Vms have been migrated successfully. Consider manual intervention: stopping/migrating Vms: 

It seems like this problem depends on BZ https://bugzilla.redhat.com/show_bug.cgi?id=1390296

Comment 3 Eldad Marciano 2016-11-09 10:04:06 UTC
Created attachment 1218871 [details]
Thread dumps

Comment 4 Martin Perina 2016-11-10 14:37:24 UTC
Please provide VDSM logs; it's hard to say what happened and why the host is recovering for such a long time. It would also help to have the engine.log from the time the host recovering issue started to appear.

Comment 5 Eldad Marciano 2016-11-15 13:15:52 UTC
(In reply to Martin Perina from comment #4)
> Please provide VDSM logs; it's hard to say what happened and why the host
> is recovering for such a long time. It would also help to have the
> engine.log from the time the host recovering issue started to appear.

It is hard to say when it appears; we have a lot of hosts and the problem occurs sporadically.
Is there another method we can take?

Comment 6 Oved Ourfali 2016-11-17 13:16:47 UTC
The host ID should be part of the log, so please provide the logs for that host once this happens.
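As a side note, a small sketch in plain Python for pulling the engine.log lines that mention a given host ID, to narrow down when the issue started (the log path is the standard engine location; the host ID is the one from the excerpt in comment 2, so adjust both as needed):

# Small sketch: filter engine.log lines that mention a given host ID.
# Path and host ID are examples taken from this report; adjust as needed.
LOG_PATH = '/var/log/ovirt-engine/engine.log'
HOST_ID = '7c2a9b0b-4fca-4cd1-8950-01ad5af9ea68'

with open(LOG_PATH, encoding='utf-8', errors='replace') as log:
    for line in log:
        if HOST_ID in line:
            print(line.rstrip())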

Comment 7 Oved Ourfali 2016-11-28 12:43:53 UTC
Please re-open if it happens again and provide all details.

Comment 8 Eldad Marciano 2017-01-04 10:23:50 UTC
This bug might be related to the monitoring lock issue...
AFAIK, we agreed to put it on hold until https://bugzilla.redhat.com/show_bug.cgi?id=1364791 is fixed.


That bug, https://bugzilla.redhat.com/show_bug.cgi?id=1364791,
generates lots of side effects, mainly around VDS.