Bug 1660837 - Host with HP VMs which have no suitable host for migration is stuck forever in PreparingForMaintenance state
Summary: Host with HP VMs which have no suitable host for migration is stuck forever in PreparingForMaintenance state
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.3.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Michal Skrivanek
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-12-19 10:33 UTC by Polina
Modified: 2020-06-26 16:38 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-01 14:48:54 UTC
oVirt Team: Virt
Embargoed:
rbarry: ovirt-4.5?


Attachments
logs and screenshots (3.29 MB, application/gzip), 2018-12-19 10:33 UTC, Polina
qemu logs (3.28 KB, application/gzip), 2018-12-19 10:52 UTC, Polina

Description Polina 2018-12-19 10:33:25 UTC
Created attachment 1515550 [details]
logs and screenshots

Description of problem: A host running high-performance (HP) VMs that have no suitable host for migration gets stuck forever in the PreparingForMaintenance state.

Version-Release number of selected component (if applicable):
ovirt-engine-4.3.0-0.6.alpha2.el7.noarch

How reproducible: 100%


Steps to Reproduce:
1. Three HP VMs were initially started on host3 in an environment with three hosts. From host3 the VMs could not migrate to the other hosts because of an incompatible CPU level (a flag-check sketch follows the steps below):
   Host host_mixed_1 can't host the VM, flags are missing: model_Haswell-noTSX, bmi2, bmi1, sdbg, movbe, invpcid, cqm_llc, abm, fma, avx2, tsc_adjust, cqm, cqm_occup_llc
   Host host_mixed_2 can't host the VM, flags are missing: model_Haswell-noTSX, bmi2, bmi1, sdbg, movbe, invpcid, cqm_llc, abm, fma, avx2, tsc_adjust, cqm, cqm_occup_llc

2. Set host3 to maintenance. => A warning appears in the engine log:

2018-12-19 11:33:38,679+02 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-5) [4034961b] EVENT_ID: USER_VDS_MAINTENANCE_MIGRATION_FAILED(602), Host host_mixed_3 cannot change into maintenance mode - not all Vms have been migrated successfully. Consider manual intervention: stopping/migrating Vms: golden_env_mixed_virtio_1_0, golden_env_mixed_virtio_1_1, golden_env_mixed_virtio_2_0 (User: admin@internal-authz).
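
The flag mismatch reported in step 1 can be double-checked directly on a candidate host. Below is a minimal sketch (my addition, not part of the original report) that compares the flags from the scheduler message against /proc/cpuinfo; note that model_Haswell-noTSX is a vdsm/libvirt CPU model name rather than a kernel flag, so it is not checked here.

#!/usr/bin/env python
# Sketch: verify which of the CPU feature flags from the scheduler message
# are actually missing on this host. Run locally on host_mixed_1 or host_mixed_2.
REQUIRED_FLAGS = {
    "bmi2", "bmi1", "sdbg", "movbe", "invpcid", "cqm_llc", "abm",
    "fma", "avx2", "tsc_adjust", "cqm", "cqm_occup_llc",
}

def host_cpu_flags(path="/proc/cpuinfo"):
    """Return the set of CPU feature flags reported by the kernel."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

if __name__ == "__main__":
    missing = REQUIRED_FLAGS - host_cpu_flags()
    if missing:
        print("Missing flags: %s" % ", ".join(sorted(missing)))
    else:
        print("All required feature flags are present on this host.")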

Actual results: the host is stuck in the PreparingForMaintenance state forever (a status-polling sketch is included at the end of this description). The vdsm log shows:

2018-12-19 12:00:53,284+0200 INFO  (jsonrpc/2) [vdsm.api] FINISH stopMonitoringDomain error=Storage domain is member of pool: u'domain=e3c3813f-c8bb-4a20-9a65-57c45ecb7d92' from=::1,46140, task_id=efcd961b-adea-4cbb-8dad-b416ad759482 (api:52)
2018-12-19 12:00:53,284+0200 ERROR (jsonrpc/2) [storage.TaskManager.Task] (Task='efcd961b-adea-4cbb-8dad-b416ad759482') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in stopMonitoringDomain
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 3547, in stopMonitoringDomain
    raise se.StorageDomainIsMemberOfPool(sdUUID)
StorageDomainIsMemberOfPool: Storage domain is member of pool: u'domain=e3c3813f-c8bb-4a20-9a65-57c45ecb7d92'
Expected results: the host either completes the transition to Maintenance or returns to the Up state with a clear error, instead of remaining in PreparingForMaintenance indefinitely.


Additional info (for QE): the scenario is reproducible on hosted-engine-05.lab.eng.tlv2.redhat.com
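
The "stuck forever" behaviour described above can be watched from a script. Below is a minimal polling sketch (my addition, not from the report) using the ovirt-engine Python SDK (ovirtsdk4); the engine URL, credentials and the 30-minute timeout are placeholders, while the host name host_mixed_3 comes from the reproduction steps.

import time
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details; adjust for the environment under test.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,
)
hosts_service = connection.system_service().hosts_service()

deadline = time.time() + 30 * 60   # give the maintenance flow 30 minutes
while time.time() < deadline:
    host = hosts_service.list(search='name=host_mixed_3')[0]
    if host.status == types.HostStatus.MAINTENANCE:
        print('Host reached Maintenance.')
        break
    print('Current status: %s' % host.status)
    time.sleep(30)
else:
    print('Host never reached Maintenance - likely stuck in PreparingForMaintenance.')

connection.close()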

Comment 1 Polina 2018-12-19 10:52:39 UTC
Created attachment 1515563 [details]
qemu logs

Comment 2 Ryan Barry 2018-12-19 12:31:03 UTC
If a suitable host is found (affinity or scheduling rules relaxed), does it finish preparing?

Comment 3 Polina 2019-01-06 08:48:29 UTC
Even if a suitable host is found, the host still remains in the PreparingForMaintenance state.
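
The engine warning quoted in the description suggests manual intervention (stopping or migrating the affected VMs). A minimal sketch of that workaround with ovirtsdk4 follows (not part of the original comment); the connection details are placeholders, the VM names are the ones listed in the warning, and a graceful shutdown() is used here as one possible choice.

import ovirtsdk4 as sdk

# Placeholder connection details; adjust for the environment under test.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,
)
vms_service = connection.system_service().vms_service()

for name in ('golden_env_mixed_virtio_1_0',
             'golden_env_mixed_virtio_1_1',
             'golden_env_mixed_virtio_2_0'):
    vm = vms_service.list(search='name=%s' % name)[0]
    # Graceful guest shutdown; vms_service.vm_service(vm.id).stop() would power off instead.
    vms_service.vm_service(vm.id).shutdown()
    print('Requested shutdown of %s' % name)

connection.close()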

Comment 4 Michal Skrivanek 2020-03-18 15:50:37 UTC
This bug didn't get any attention for a while; we didn't have the capacity to make any progress. If you deeply care about it or want to work on it, please assign/target it accordingly.

Comment 5 Michal Skrivanek 2020-03-18 15:55:06 UTC
This bug didn't get any attention for a while; we didn't have the capacity to make any progress. If you deeply care about it or want to work on it, please assign/target it accordingly.

Comment 6 Michal Skrivanek 2020-04-01 14:48:54 UTC
Closing this old bug. Please reopen if it is still relevant or if you want to work on it.

