Bug 1691244
Summary: [scale] HE - unable to migrate HE vm by putting the host to maintenance

| Field | Value |
|---|---|
| Product | [oVirt] ovirt-hosted-engine-ha |
| Reporter | Ilan Zuckerman <izuckerm> |
| Component | Broker |
| Assignee | Simone Tiraboschi <stirabos> |
| Status | CLOSED CURRENTRELEASE |
| QA Contact | David Vaanunu <dvaanunu> |
| Severity | high |
| Priority | high |
| Docs Contact | |
| Version | --- |
| CC | adevolder, ascerra, bugs, dvaanunu, jroberts, lrotenbe, michal.skrivanek, mkalinin, mlehrer, mperina, nsednev, rbarry, sabose |
| Target Milestone | ovirt-4.3.5 |
| Target Release | --- |
| Keywords | Performance, Regression |
| Flags | sbonazzo: ovirt-4.3?, pm-rhel: blocker?, sbonazzo: planning_ack?, sbonazzo: devel_ack+, sbonazzo: testing_ack? |
| Hardware | x86_64 |
| OS | Linux |
| Whiteboard | |
| Fixed In Version | ovirt-hosted-engine-setup-2.3.11 |
| Doc Type | Bug Fix |
| Story Points | --- |
| Clone Of | |
| Environment | |
| Last Closed | 2019-07-30 14:08:33 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | Integration |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 1724792 |
| Attachments | |

Doc Text:

> Previously, when the host running the engine VM was set into maintenance mode from the engine, the engine VM was migrated by ovirt-ha-agent as an indirect action caused by the maintenance mode. This process was not fully controlled by the engine and could potentially lead to side effects in specific cases.
Description

Ilan Zuckerman, 2019-03-21 08:58:40 UTC

One more addition to this BZ: when you trigger migration of the HE VM manually from the WebAdmin, the migration works fine, meaning that the HE VM is migrated successfully to the second available host.

Looks like a virt team field. Ryan, can you please have a look?

Migration is invoked by the HE broker (most likely) because local maintenance is being set:

```
2019-03-21 07:15:29,281Z INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-10) [] VM 'a9a6eaa8-cd51-4d86-89b0-3f8b64dd9f21'(HostedEngine) moved from 'Up' --> 'MigratingFrom'
```

In vdsm:

```
2019-03-21 07:15:29,273+0000 INFO (jsonrpc/2) [api.virt] START migrate(params={u'tunneled': False, u'autoConverge': True, u'dstqemu': u'b01-h03-r620.rhev.openstack.engineering.redhat.com', u'src': u'localhost', u'enableGuestEvents': True, u'dst': u'b01-h03-r620.rhev.openstack.engineering.redhat.com', u'convergenceSchedule': {u'init': [{u'params': [u'100'], u'name': u'setDowntime'}], u'stalling': [{u'action': {u'params': [u'150'], u'name': u'setDowntime'}, u'limit': 1}, {u'action': {u'params': [u'200'], u'name': u'setDowntime'}, u'limit': 2}, {u'action': {u'params': [u'300'], u'name': u'setDowntime'}, u'limit': 3}, {u'action': {u'params': [u'400'], u'name': u'setDowntime'}, u'limit': 4}, {u'action': {u'params': [u'500'], u'name': u'setDowntime'}, u'limit': 6}, {u'action': {u'params': [u'5000'], u'name': u'setDowntime'}, u'limit': -1}, {u'action': {u'params': [], u'name': u'abort'}, u'limit': -1}]}, u'vmId': u'a9a6eaa8-cd51-4d86-89b0-3f8b64dd9f21', u'abortOnError': True, u'compressed': False, u'method': u'online'}) from=::1,34144, vmId=a9a6eaa8-cd51-4d86-89b0-3f8b64dd9f21 (api:48)
```

It then progresses indefinitely, e.g.:

```
2019-03-21 07:27:13,832+0000 WARN (migmon/a9a6eaa8) [virt.vm] (vmId='a9a6eaa8-cd51-4d86-89b0-3f8b64dd9f21') Migration stalling: remaining (12253MiB) > lowmark (357MiB). (migration:849)
2019-03-21 07:27:13,832+0000 INFO (migmon/a9a6eaa8) [virt.vm] (vmId='a9a6eaa8-cd51-4d86-89b0-3f8b64dd9f21') Migration Progress: 700.357 seconds elapsed, 63% of data processed, total data: 32864MB, processed data: 36313MB, remaining data: 12253MB, transfer speed 52MBps, zero pages: 263841MB, compressed: 0MB, dirty rate: 20630, memory iteration: 2 (migration:879)
```

The last migration monitor update is:

```
2019-03-21 09:10:52,176+0000 INFO (migmon/a9a6eaa8) [virt.vm] (vmId='a9a6eaa8-cd51-4d86-89b0-3f8b64dd9f21') Migration Progress: 2500.89 seconds elapsed, 99% of data processed, total data: 32864MB, processed data: 273298MB, remaining data: 558MB, transfer speed 52MBps, zero pages: 757443MB, compressed: 0MB, dirty rate: 14328, memory iteration: 19 (migration:879)
```

Then it is aborted at:

```
2019-03-21 09:11:02,184+0000 WARN (migmon/a9a6eaa8) [virt.vm] (vmId='a9a6eaa8-cd51-4d86-89b0-3f8b64dd9f21') Aborting migration (migration:914)
```

The main difference between the manual attempt and the HE-broker-invoked migration is that the manual one uses the cluster's Legacy profile with a 625 Mbps bandwidth limit, vs. the Suspend policy (with 5 s downtime) capped at 52 MBps, migrating indefinitely.

It seems the migration is actually aborted due to miscalculated iterations, which Milan is fixing in https://gerrit.ovirt.org/#/c/98792/. But that's unrelated to this problem.

Moving to Integration for a possible improvement in the HE broker, to either:

1) use the engine API to invoke the migration (which would use the same parameters as a manual migration in the UI),
2) send a better convergence schedule (use more bandwidth than the default of 52 MBps when nothing is set; the engine tries to guess the link speed), or
3) let the engine migrate itself, i.e. do not initiate the migration from the broker when the engine is responsive.

This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
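To make the convergence failure above concrete, here is a minimal sketch (plain Python, not vdsm code) of the two conditions visible in the quoted vdsm log: the stalling warning and whether a pre-copy migration can converge at all. It assumes the reported "dirty rate" is in 4 KiB pages per second (as in libvirt's memory migration statistics); treat the numbers as illustrative only.

```python
# Minimal sketch (not vdsm code) of the checks behind the vdsm log lines above.
# Assumption: the reported "dirty rate" counts 4 KiB guest pages per second.

PAGE_MIB = 4096 / (1024 * 1024)  # one 4 KiB guest page expressed in MiB

def is_stalling(remaining_mib: float, lowmark_mib: float) -> bool:
    """vdsm's warning fires while 'remaining (...) > lowmark (...)'."""
    return remaining_mib > lowmark_mib

def can_converge(transfer_mibps: float, dirty_rate_pages: float) -> bool:
    """Pre-copy migration only completes if memory is copied out faster
    than the guest dirties it."""
    return transfer_mibps > dirty_rate_pages * PAGE_MIB

# Figures from the 07:27:13 progress line above:
print(is_stalling(12253, 357))   # True: the stalling warning fires
print(can_converge(52, 20630))   # False: ~80 MiB/s dirtied vs 52 MBps copied
```

Under that assumption, 20630 pages/s is roughly 80 MiB/s of dirtied memory, so a 52 MBps cap can never catch up without more bandwidth or a larger allowed downtime, which is consistent with the improvement options listed above.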
Moving to HA broker: the plan is to remove the functionality from the HA broker and let the engine do the migration.

*** Bug 1691508 has been marked as a duplicate of this bug. ***

```
rpm -qa | grep rhv-release
rhv-release-4.3.5-6-001.noarch
```

Software Version: 4.3.5.4-0.1.el7

Test case verified twice successfully.

1. Migrate the HE VM manually from Host1 to Host2:

```
2019-07-15 12:58:15,128+03 INFO [org.ovirt.engine.core.bll.MigrateVmToServerCommand] (default task-230) [89932e62-aecd-4f68-800c-9dede34bb692] Running command: MigrateVmToServerCommand internal: false. Entities affected : ID: 81415768-ae4e-4b6b-9888-379656c5c9dd Type: VMAction group MIGRATE_VM with role type USER
2019-07-15 12:58:15,293+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-230) [89932e62-aecd-4f68-800c-9dede34bb692] EVENT_ID: VM_MIGRATION_START(62), Migration started (VM: HostedEngine, Source: host_mixed_1, Destination: host_mixed_2, User: admin@internal-authz).
2019-07-15 12:58:39,789+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-44) [] VM '81415768-ae4e-4b6b-9888-379656c5c9dd' is migrating to VDS '724fe6fb-2a51-44d8-a07b-484c9c9f6cc1'(host_mixed_2) ignoring it in the refresh until migration is done
2019-07-15 12:58:55,301+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-21) [] EVENT_ID: VM_MIGRATION_DONE(63), Migration completed (VM: HostedEngine, Source: host_mixed_1, Destination: host_mixed_2, Duration: 28 seconds, Total: 40 seconds, Actual downtime: 244ms)
```

2. Change Host2 to maintenance; the HE VM migrates automatically to Host1:

```
2019-07-15 14:04:09,832+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-248) [6a0c4466] EVENT_ID: VM_MIGRATION_START_SYSTEM_INITIATED(67), Migration initiated by system (VM: HostedEngine, Source: host_mixed_2, Destination: host_mixed_1, Reason: ).
2019-07-15 14:04:09,840+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-248) [6a0c4466] EVENT_ID: USER_VDS_MAINTENANCE_WITHOUT_REASON(620), Host host_mixed_2 was switched to Maintenance mode by admin@internal-authz.
2019-07-15 14:04:10,398+03 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-25) [17d16dc4-32bb-4db9-a674-f05b49ef8ec1] Command 'MaintenanceNumberOfVdss' id: 'f27fc39e-d916-4cef-8c47-9de559ce9774' child commands '[]' executions were completed, status 'SUCCEEDED'
2019-07-15 14:04:13,886+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-40) [] VM '81415768-ae4e-4b6b-9888-379656c5c9dd' is migrating to VDS 'af584899-03a4-40d6-9cf8-60a74c66085e'(host_mixed_1) ignoring it in the refresh until migration is done
2019-07-15 14:04:49,837+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-8) [] EVENT_ID: VM_MIGRATION_DONE(63), Migration completed (VM: HostedEngine, Source: host_mixed_2, Destination: host_mixed_1, Duration: 23 seconds, Total: 40 seconds, Actual downtime: 339ms)
```

This bug seems like a flavor of https://bugzilla.redhat.com/show_bug.cgi?id=1665934, which is fixed.

This bugzilla is included in oVirt 4.3.5 release, published on July 30th 2019. Since the problem described in this bug report should be resolved in oVirt 4.3.5 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
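As a side note for anyone repeating the verification above, the timing figures can be pulled out of the engine's VM_MIGRATION_DONE audit line programmatically. The helper below is hypothetical (not part of oVirt), shown only to illustrate the audit-line format quoted in the verification logs:

```python
import re

# Hypothetical helper (not part of oVirt): extract the timing figures from
# an engine VM_MIGRATION_DONE audit log line, as quoted in the verification
# steps above.

LINE = ("EVENT_ID: VM_MIGRATION_DONE(63), Migration completed "
        "(VM: HostedEngine, Source: host_mixed_2, Destination: host_mixed_1, "
        "Duration: 23 seconds, Total: 40 seconds, Actual downtime: 339ms)")

PATTERN = re.compile(
    r"Duration: (?P<duration>\d+) seconds, "
    r"Total: (?P<total>\d+) seconds, "
    r"Actual downtime: (?P<downtime>\d+)ms"
)

def parse_migration_done(line: str):
    """Return {'duration': s, 'total': s, 'downtime': ms} or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {k: int(v) for k, v in m.groupdict().items()}

print(parse_migration_done(LINE))
# {'duration': 23, 'total': 40, 'downtime': 339}
```

The 339 ms actual downtime from the maintenance-triggered migration is in line with the 244 ms seen for the manual migration, supporting the verification result.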