Bug 1660837

Summary: Host with HP VMs that have no suitable host for migration is stuck forever in the PreparingForMaintenance state
Product: [oVirt] ovirt-engine
Reporter: Polina <pagranat>
Component: BLL.Virt
Assignee: Michal Skrivanek <michal.skrivanek>
Status: CLOSED DEFERRED
QA Contact: meital avital <mavital>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.3.0
CC: bugs, pagranat, rbarry
Target Milestone: ---
Flags: rbarry: ovirt-4.5?
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-04-01 14:48:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  logs and screenshots (Flags: none)
  qemu logs (Flags: none)

Description Polina 2018-12-19 10:33:25 UTC
Created attachment 1515550 [details]
logs and screenshots

Description of problem: A host running high-performance (HP) VMs that have no suitable host for migration gets stuck forever in the PreparingForMaintenance state.

Version-Release number of selected component (if applicable):
ovirt-engine-4.3.0-0.6.alpha2.el7.noarch

How reproducible: 100%


Steps to Reproduce:
1. Three HP VMs were initially started on host3 in an environment with three hosts. The VMs could not migrate from host3 to the other hosts because of an incompatible CPU level:
   Host host_mixed_1 can't host the VM, flags are missing: model_Haswell-noTSX, bmi2, bmi1, sdbg, movbe, invpcid, cqm_llc, abm, fma, avx2, tsc_adjust, cqm, cqm_occup_llc
   Host host_mixed_2 can't host the VM, flags are missing: model_Haswell-noTSX, bmi2, bmi1, sdbg, movbe, invpcid, cqm_llc, abm, fma, avx2, tsc_adjust, cqm, cqm_occup_llc

2. Set host3 into maintenance mode. => A warning appears in the engine log (a hedged SDK sketch of this maintenance flow follows the steps):

2018-12-19 11:33:38,679+02 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-5) [4034961b] EVENT_ID: USER_VDS_MAINTENANCE_MIGRATION_FAILED(602), Host host_mixed_3 cannot change into maintenance mode - not all Vms have been migrated successfully. Consider manual intervention: stopping/migrating Vms: golden_env_mixed_virtio_1_0, golden_env_mixed_virtio_1_1, golden_env_mixed_virtio_2_0 (User: admin@internal-authz).
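For reference, a minimal sketch of this maintenance flow using python3 with ovirt-engine-sdk4 (not part of the original report; the engine URL, credentials, and CA file are placeholders, the host name is taken from the report). It requests maintenance and polls the host status, which in this bug never reaches MAINTENANCE:

    import time
    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    # Placeholder connection details -- adjust to your engine.
    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',
        username='admin@internal',
        password='password',
        ca_file='ca.pem',
    )
    hosts_service = connection.system_service().hosts_service()
    host = hosts_service.list(search='name=host_mixed_3')[0]
    host_service = hosts_service.host_service(host.id)

    host_service.deactivate()  # ask the engine to move the host to maintenance

    # Poll the host status; with no migration target for the HP VMs the host
    # stays in PREPARING_FOR_MAINTENANCE indefinitely instead of reaching MAINTENANCE.
    while host_service.get().status != types.HostStatus.MAINTENANCE:
        print(host_service.get().status)
        time.sleep(10)

    connection.close()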

Actual results: the host is stuck in the PreparingForMaintenance state forever. vdsm.log shows the following error:

2018-12-19 12:00:53,284+0200 INFO  (jsonrpc/2) [vdsm.api] FINISH stopMonitoringDomain error=Storage domain is member of pool: u'domain=e3c3813f-c8bb-4a20-9a65-57c45ecb7d92' from=::1,46140, task_id=efcd961b-adea-4cbb-8dad-b416ad759482 (api:52)
2018-12-19 12:00:53,284+0200 ERROR (jsonrpc/2) [storage.TaskManager.Task] (Task='efcd961b-adea-4cbb-8dad-b416ad759482') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in stopMonitoringDomain
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3547, in stopMonitoringDomain
    raise se.StorageDomainIsMemberOfPool(sdUUID)
StorageDomainIsMemberOfPool: Storage domain is member of pool: u'domain=e3c3813f-c8bb-4a20-9a65-57c45ecb7d92'
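The StorageDomainIsMemberOfPool error above comes from a guard in vdsm's HSM: monitoring can only be stopped for a domain that is not attached to the connected storage pool. A minimal illustrative sketch of that check (simplified Python, not the actual vdsm source):

    class StorageDomainIsMemberOfPool(Exception):
        pass

    def stop_monitoring_domain(sd_uuid, pool_domain_uuids, domain_monitor):
        # Simplified illustration: refuse to stop monitoring a storage domain
        # that is still a member of the connected pool, which is why the call
        # issued during PreparingForMaintenance fails with the error above.
        if sd_uuid in pool_domain_uuids:
            raise StorageDomainIsMemberOfPool(sd_uuid)
        domain_monitor.stopMonitoring([sd_uuid])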
Expected results:


Additional info: for QE: the scenario is reproducible on hosted-engine-05.lab.eng.tlv2.redhat.com

Comment 1 Polina 2018-12-19 10:52:39 UTC
Created attachment 1515563 [details]
qemu logs

Comment 2 Ryan Barry 2018-12-19 12:31:03 UTC
If a suitable host is found (affinity or scheduling rules relaxed), does it finish preparing?

Comment 3 Polina 2019-01-06 08:48:29 UTC
If a suitable host is found, the host still remains in the PreparingForMaintenance state.
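As manual intervention (suggested by the engine warning in the description, not by this comment), the remaining VMs can be migrated or shut down explicitly, or the maintenance request can be cancelled by re-activating the host. A hedged sketch with ovirt-engine-sdk4 (placeholder engine URL/credentials; host and target names follow the report):

    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',
        username='admin@internal',
        password='password',
        ca_file='ca.pem',
    )
    system_service = connection.system_service()
    vms_service = system_service.vms_service()
    hosts_service = system_service.hosts_service()

    # Option 1: migrate (or stop) the VMs still running on the stuck host.
    for vm in vms_service.list(search='host=host_mixed_3'):
        vm_service = vms_service.vm_service(vm.id)
        try:
            vm_service.migrate(host=types.Host(name='host_mixed_1'))
        except sdk.Error:
            vm_service.shutdown()  # no compatible target: stop the VM instead

    # Option 2: cancel the maintenance request by re-activating the host.
    host = hosts_service.list(search='name=host_mixed_3')[0]
    hosts_service.host_service(host.id).activate()

    connection.close()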

Comment 4 Michal Skrivanek 2020-03-18 15:50:37 UTC
This bug didn't get any attention for a while; we didn't have the capacity to make any progress. If you deeply care about it or want to work on it, please assign/target it accordingly.

Comment 5 Michal Skrivanek 2020-03-18 15:55:06 UTC
This bug didn't get any attention for a while; we didn't have the capacity to make any progress. If you deeply care about it or want to work on it, please assign/target it accordingly.

Comment 6 Michal Skrivanek 2020-04-01 14:48:54 UTC
Closing this old bug. Please reopen if it is still relevant or you want to work on it.