Bug 1419557

Summary: Switching to post-copy should catch exceptions
Product: [oVirt] vdsm
Reporter: Milan Zamazal <mzamazal>
Component: Core
Assignee: Milan Zamazal <mzamazal>
Status: CLOSED CURRENTRELEASE
QA Contact: Israel Pinto <ipinto>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.18.15
CC: bugs, gklein
Target Milestone: ovirt-4.1.1
Flags: rule-engine: ovirt-4.1+, rule-engine: planning_ack+, rule-engine: devel_ack+, mavital: testing_ack+
Target Release: 4.19.5
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-04-21 09:31:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Milan Zamazal 2017-02-06 13:50:13 UTC
Description of problem:

When switching to post-copy migration, the result value of the corresponding libvirt call is examined. However, the call raises an exception rather than returning an error code on failure. That exception should be caught and handled appropriately.
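
A minimal sketch of the intended handling, assuming the libvirt call in question is `virDomain.migrateStartPostCopy` (the report does not name it). So that the sketch runs without libvirt installed, `FakeLibvirtError` and `FakeDomain` are hypothetical stand-ins for `libvirt.libvirtError` and a real domain object, and `switch_to_post_copy` is an illustrative name, not the actual Vdsm method.

```python
import logging

log = logging.getLogger("virt.vm")


class FakeLibvirtError(Exception):
    """Hypothetical stand-in for libvirt.libvirtError."""


class FakeDomain:
    """Hypothetical stand-in for a libvirt domain object."""

    def __init__(self, fail):
        self._fail = fail

    def migrateStartPostCopy(self, flags=0):
        # Like the libvirt Python bindings, report failure by raising
        # instead of returning an error code.
        if self._fail:
            raise FakeLibvirtError("switching to post-copy failed")
        return 0


def switch_to_post_copy(dom, error_cls=FakeLibvirtError):
    """Return True if the switch succeeded, False if the call raised."""
    try:
        dom.migrateStartPostCopy()
    except error_cls as e:
        # Checking a return value would never reach this branch;
        # the failure only shows up as an exception.
        log.warning("Failed to switch to post-copy migration: %s", e)
        return False
    return True
```

With a real domain, `error_cls` would be `libvirt.libvirtError`. The caller then only has to test the boolean result to decide whether to proceed with the next scheduled action.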

How reproducible:

I don't know of an easy way to reproduce the bug in a real environment, but it can be reproduced artificially with modified Vdsm sources.

Steps to Reproduce:
1. Modify the Vdsm sources: make the _post_copy_flag method in migration.py always return 0.
2. Start a busy migration with post-copy schedule.
3. Wait until the switch to post-copy happens.

Actual results:

A traceback appears in vdsm.log and the migration continues "wildly".

Expected results:

The failure is logged in vdsm.log and the next migration schedule (abort) is executed.
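
The expected behaviour can be pictured as a monitor that treats a failed switch as a logged, non-fatal event, so that the following action (abort) still gets its turn. The sketch below is only an illustration under stated assumptions: `run_schedule`, the action names, and the list-of-pairs schedule format are hypothetical and do not mirror Vdsm's actual data structures or timing logic.

```python
import logging

log = logging.getLogger("virt.vm")


def run_schedule(schedule):
    """Run (name, action) pairs in order; stop at the first that succeeds.

    Each action returns True on success and False on a handled failure.
    Returns the names of the actions that were attempted.
    """
    attempted = []
    for name, action in schedule:
        attempted.append(name)
        if action():
            break
        # A handled failure is logged and the next action is tried.
        log.warning("Action %r failed, continuing with the schedule", name)
    return attempted


def failing_post_copy():
    # Hypothetical: the switch failed and the exception was already caught.
    return False


def abort():
    # Hypothetical: aborting the migration succeeds.
    return True


attempted = run_schedule([("postcopy", failing_post_copy), ("abort", abort)])
print(attempted)
```

If the post-copy action instead let the exception propagate, the loop would be interrupted and the abort action would never run, which matches the problem described in this report.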

Additional info:

Comment 1 Israel Pinto 2017-02-13 15:30:53 UTC
Verified with:
Engine: 4.1.1-0.1.el7
Host:
OS Version: RHEL - 7.3 - 7.el7
Kernel Version: 3.10.0 - 550.el7.x86_64
KVM Version: 2.6.0 - 28.el7_3.3.1
LIBVIRT Version: libvirt-2.0.0-10.el7_3.4
VDSM Version: vdsm-4.19.5-1.el7ev

Steps to Reproduce:
1. Modify the Vdsm sources: make the _post_copy_flag method in migration.py always return 0.
2. Start a busy migration with post-copy schedule.
3. Wait until the switch to post-copy happens.

Results:
The error is written to the log, and the migration continues and finishes in post_copy mode.
From the log:
2017-02-13 13:21:05,781 INFO  (migmon/fe35b83e) [vdsm.api] START switch_migration_to_post_copy args=(<virt.vm.Vm object at 0x360dd90>,) kwargs={} (api:37)
2017-02-13 13:21:05,781 INFO  (migmon/fe35b83e) [virt.vm] (vmId='fe35b83e-62f5-4641-b0df-84bd4af2a10b') Switching to post-copy migration (vm:1578)
2017-02-13 13:21:05,781 INFO  (migmon/fe35b83e) [virt.vm] (vmId='fe35b83e-62f5-4641-b0df-84bd4af2a10b') Stopping connection (guestagent:430)
2017-02-13 13:21:05,782 INFO  (migmon/fe35b83e) [virt.vm] (vmId='fe35b83e-62f5-4641-b0df-84bd4af2a10b') Starting connection (guestagent:245)
2017-02-13 13:21:05,784 INFO  (migmon/fe35b83e) [vdsm.api] FINISH switch_migration_to_post_copy return=False (api:43)
2017-02-13 13:21:05,784 WARN  (migmon/fe35b83e) [virt.vm] (vmId='fe35b83e-62f5-4641-b0df-84bd4af2a10b') Failed to switch to post-copy migration (migration:820)
2017-02-13 13:21:05,784 INFO  (migmon/fe35b83e) [virt.vm] (vmId='fe35b83e-62f5-4641-b0df-84bd4af2a10b') Migration Progress: 280 seconds elapsed, 99% of data processed, total data: 1096MB, processed data: 565MB, remaining data: 10MB, transfer speed 2MBps, zero pages: 209008MB, compressed: 0MB, dirty rate: 1503, memory iteration: 57 (migration:787)