Created attachment 1425630 [details]
local_maintenance_first_host

Description of problem:
The HE-VM did not migrate to the additional host when the first host was put into local maintenance.

Version-Release number of selected component (if applicable):
rhvh-4.2.2.1-0.20180420.0+1
cockpit-ovirt-dashboard-0.11.22-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.10-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.18-1.el7ev.noarch
rhvm-appliance-4.2-20180420.0.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install the latest RHVH 4.1.11
2. Deploy HE on the first host via cockpit
3. Add another host to the cluster
4. Put the first host into local maintenance
5. Check the HE-VM status on the engine and in cockpit

Actual results:
After step 5, the HE-VM did not migrate to the additional host while the first host was in local maintenance.

Expected results:
After step 5, the HE-VM should migrate to the additional host, and its status should be Up on the additional host.

Additional info:
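For reference, steps 4 and 5 can also be driven from a shell on the first host with the hosted-engine CLI; a minimal sketch (the polling loop and timings are illustrative only, not the exact procedure used in this report):

import subprocess
import time

# Step 4: ask the local ovirt-ha-agent to enter local maintenance.
subprocess.check_call(["hosted-engine", "--set-maintenance", "--mode=local"])

# Step 5: poll the HA status and watch where the engine VM is reported.
for _ in range(30):
    print(subprocess.check_output(["hosted-engine", "--vm-status"]).decode())
    time.sleep(10)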
(In reply to Yihui Zhao from comment #0)
> Created attachment 1425630 [details]
> local_maintenance_first_host
>
> Description of problem:
> The HE-VM did not migrate to the additional host when the first host was put
> into local maintenance.
>
> Version-Release number of selected component (if applicable):
> rhvh-4.2.2.1-0.20180420.0+1
> cockpit-ovirt-dashboard-0.11.22-1.el7ev.noarch
> ovirt-hosted-engine-ha-2.2.10-1.el7ev.noarch
> ovirt-hosted-engine-setup-2.2.18-1.el7ev.noarch
> rhvm-appliance-4.2-20180420.0.el7.noarch
>
> How reproducible:
> 100%
>
> Steps to Reproduce:
> 1. Install the latest RHVH 4.1.11

Should be "Install the latest RHVH 4.2.2".

> 2. Deploy HE on the first host via cockpit
> 3. Add another host to the cluster
> 4. Put the first host into local maintenance
> 5. Check the HE-VM status on the engine and in cockpit
>
> Actual results:
> After step 5, the HE-VM did not migrate to the additional host while the
> first host was in local maintenance.
>
> Expected results:
> After step 5, the HE-VM should migrate to the additional host, and its
> status should be Up on the additional host.
>
> Additional info:
Created attachment 1425631 [details]
local_maintenance_terminal
Please ensure the host is also in local maintenance on the host side (though it should be, because we pull this from "hosted-engine --vm-status --json").
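For reference, the same data can be checked on the host itself by parsing that JSON output; a minimal sketch, assuming the output maps host IDs to per-host dicts with a boolean "maintenance" field (the exact field names can differ between versions, so treat them as an assumption):

import json
import subprocess

# Read the same status that cockpit consumes, in JSON form.
raw = subprocess.check_output(["hosted-engine", "--vm-status", "--json"])
status = json.loads(raw)

for key, host in status.items():
    # Skip non-host entries such as a global maintenance flag.
    if not isinstance(host, dict):
        continue
    print(key, host.get("hostname"), "local maintenance:", host.get("maintenance"))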
Created attachment 1425632 [details]
from_engine
Created attachment 1425634 [details]
after_manual_migrate
Created attachment 1425635 [details]
logs
(In reply to Ryan Barry from comment #3)
> Please ensure the host is also in local maintenance on the host side (though
> it should be, because we pull this from "hosted-engine --vm-status --json").

See https://bugzilla.redhat.com/attachment.cgi?id=1425631
I see an almost normal migration attempt in the log:

2018-04-23 13:26:00,498 Local maintenance detected
2018-04-23 13:26:00,526 EngineUp-LocalMaintenanceMigrateVm
2018-04-23 13:26:00,766 Score is 0 due to local maintenance mode
2018-04-23 13:26:00,832 The VM is running locally or we have no data, keeping the domain monitor.
2018-04-23 13:26:10,865 Continuing to monitor migration ...
2018-04-23 13:26:50,954 Global maintenance detected
2018-04-23 13:26:50,984 EngineMigratingAway-GlobalMaintenance
2018-04-23 13:26:51,231 Current state GlobalMaintenance (score: 3400)
...
2018-04-23 13:29:10,312 GlobalMaintenance-ReinitializeFSM
2018-04-23 13:29:20,576 ReinitializeFSM-EngineDown
2018-04-23 13:29:30,848 Engine vm is running on host 10.73.73.106 (id 2)
Now this is interesting:

2018-04-23 17:46:55,061 EngineUp-LocalMaintenanceMigrateVm
2018-04-23 17:46:55,309 LocalMaintenanceMigrateVm-ReinitializeFSM
2018-04-23 17:46:55,309 The VM is running locally or we have no data, keeping the domain monitor.
2018-04-23 17:47:05,321 Local maintenance detected
2018-04-23 17:47:05,340 ReinitializeFSM-LocalMaintenance

This sequence generally means the migration failed. We do have logging there in 4.2, but we never backported the big change that contained the logging cleanups to 4.1.

Can we get the VDSM log?
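The signature above can be spotted mechanically; a minimal sketch that scans the agent log for an EngineUp-LocalMaintenanceMigrateVm transition immediately followed by LocalMaintenanceMigrateVm-ReinitializeFSM (the log path and transition strings are taken from the excerpts in this bug and are assumptions, not a stable interface):

# Flag the failed-migration signature described above in the HA agent log.
LOG = "/var/log/ovirt-hosted-engine-ha/agent.log"

prev = ""
with open(LOG) as f:
    for line in f:
        if ("LocalMaintenanceMigrateVm-ReinitializeFSM" in line
                and "EngineUp-LocalMaintenanceMigrateVm" in prev):
            print("Possible failed local-maintenance migration:")
            print(prev.rstrip())
            print(line.rstrip())
        prev = line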
(In reply to Martin Sivák from comment #9)
> Now this is interesting:
>
> 2018-04-23 17:46:55,061 EngineUp-LocalMaintenanceMigrateVm
> 2018-04-23 17:46:55,309 LocalMaintenanceMigrateVm-ReinitializeFSM
> 2018-04-23 17:46:55,309 The VM is running locally or we have no data,
> keeping the domain monitor.
> 2018-04-23 17:47:05,321 Local maintenance detected
> 2018-04-23 17:47:05,340 ReinitializeFSM-LocalMaintenance
>
> This sequence generally means the migration failed. We do have logging there
> in 4.2, but we never backported the big change that contained the logging
> cleanups to 4.1.
>
> Can we get the VDSM log?

The VDSM log is also here: https://bugzilla.redhat.com/attachment.cgi?id=1425635
Too bad I can't correlate the vdsm and hosted-engine logs... Which host is the vdsm.log from?
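To help with that correlation, a minimal sketch that interleaves the two logs by their timestamp prefix (it assumes both files come from the same host, the default log paths, and the "YYYY-MM-DD HH:MM:SS,mmm" prefix seen in the excerpts above):

from datetime import datetime

def parse(path, tag):
    entries = []
    with open(path) as f:
        for line in f:
            try:
                ts = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
            except ValueError:
                continue  # skip continuation lines and tracebacks
            entries.append((ts, tag, line.rstrip()))
    return entries

merged = parse("/var/log/ovirt-hosted-engine-ha/agent.log", "agent") + \
         parse("/var/log/vdsm/vdsm.log", "vdsm")
for ts, tag, line in sorted(merged):
    print(tag, line)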
Created attachment 1425891 [details]
vdsm_log
Update:

Tested with these versions, it works for me:
cockpit-ovirt-dashboard-0.11.23-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.19-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.11-1.el7ev.noarch
rhvm-appliance-4.2-20180420.0.el7.noarch

Tested steps:
1. Put one host into local maintenance (HE was deployed on the first host; the second host is the additional host).

#1. Put the first host into local maintenance:
The VM migration is OK; the completion messages are in the first host's vdsm.log:

"""
2018-04-26 16:04:06,215+0800 INFO (migmon/d14af27b) [virt.vm] (vmId='d14af27b-9859-4197-ac79-50ec9693bc1b') Migration Progress: 80 seconds elapsed, 99% of data processed, total data: 16444MB, processed data: 4039MB, remaining data: 70MB, transfer speed 52MBps, zero pages: 3336262MB, compressed: 0MB, dirty rate: 2697, memory iteration: 3 (migration:867)
2018-04-26 16:04:08,107+0800 INFO (libvirt/events) [virt.vm] (vmId='d14af27b-9859-4197-ac79-50ec9693bc1b') CPU stopped: onSuspend (vm:6104)
2018-04-26 16:04:09,216+0800 INFO (migsrc/d14af27b) [virt.vm] (vmId='d14af27b-9859-4197-ac79-50ec9693bc1b') migration took 83 seconds to complete (migration:514)
2018-04-26 16:04:09,216+0800 INFO (migsrc/d14af27b) [virt.vm] (vmId='d14af27b-9859-4197-ac79-50ec9693bc1b') Changed state to Down: Migration succeeded (code=4) (vm:1683)
"""

#2. Remove the first host from maintenance, then put the second host into local maintenance:
The VM migration is also OK; the completion messages are in the second host's vdsm.log:

"""
(migsrc/d14af27b) [virt.vm] (vmId='d14af27b-9859-4197-ac79-50ec9693bc1b') migration took 120 seconds to complete (migration:514)
2018-04-26 18:04:57,986+0800 INFO (migsrc/d14af27b) [virt.vm] (vmId='d14af27b-9859-4197-ac79-50ec9693bc1b') Changed state to Down: Migration succeeded (code=4) (vm:1683)
2018-04-26 18:04:58,037+0800 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:573)
2018-04-26 18:04:58,041+0800 INFO (jsonrpc/7) [api.virt] START getMigrationStatus() from=::1,47502, vmId=d14af27b-9859-4197-ac79-50ec9693bc1b (api:46)
2018-04-26 18:04:58,041+0800 INFO (jsonrpc/7) [virt.vm] (vmId='d14af27b-9859-4197-ac79-50ec9693bc1b') new computed progress 98 < than old value 100, discarded (migration:200)
2018-04-26 18:04:58,041+0800 INFO (jsonrpc/7) [api.virt] FINISH getMigrationStatus return={'status': {'message': 'Done', 'code': 0}, 'migrationStats': {'status': {'message': 'Migration in progress', 'code': 0}, 'progress': 100, 'downtime': 193L}} from=::1,47502, vmId=d14af27b-9859-4197-ac79-50ec9693bc1b (api:52)
2018-04-26 18:04:58,041+0800 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call VM.getMigrationStatus succeeded in 0.00 seconds (__init__:573)
"""

So, closing this as works for me.
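The same check can be scripted; a minimal sketch that looks for the "Migration succeeded" state change for the HE VM in vdsm.log (the vmId is the one from this comment, and the log path is the usual default on the source host):

# Confirm the migration completed on the source host.
VM_ID = "d14af27b-9859-4197-ac79-50ec9693bc1b"
LOG = "/var/log/vdsm/vdsm.log"

with open(LOG) as f:
    for line in f:
        if VM_ID in line and "Migration succeeded" in line:
            print(line.rstrip())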