Created attachment 1795103 [details]
migration_vma_new.yaml

Description of problem:
After a VM is migrated, its ARP table is stale, which prevents the VM from having outside connectivity. Connectivity recovers only once the VM receives a ping from the outside, or after the ~5-minute ARP cache refresh. We should make sure that connectivity becomes available immediately after the migration.

Version-Release number of selected component (if applicable):
CNV - v4.8.0
OCP - v4.8.0-fc.5
Kubernetes Version: v1.21.0-rc.0+88a3e8c

How reproducible:
Every time the machine is migrated.

Steps to Reproduce:
1. Create a dedicated namespace for the resources that will be created in the next steps. Name it 'anat-test-migration-masquerade' to match the namespace defined in the files attached.
2. Create two VMs (vma and vmb) as single-interface VMs (masquerade), using the attached 'migration_vma_new.yaml' and 'migration_vmb_new.yaml' files (a sketch of such a definition appears under Additional info below).
3. Start both VMs:
$ virtctl start vma
$ virtctl start vmb
4. Migrate vmb, using the attached 'migration_virtualmachineinstancemigration.yaml' file (a sketch of such a manifest follows the attachment notices below).
5. Find the exact moment when the migration finishes. You can find that moment by watching for the VMI to be assigned a new IP address:
$ oc get vmi -w
6. As soon as the migration finishes, connect to the migrated VM (vmb). It is important to connect via the console and not SSH, because an incoming SSH connection can itself refresh the ARP entry and hide the bug:
$ virtctl console vmb
7. Ping from vmb to vma over the main (masquerade) interface:
$ ping 10.0.2.1

Actual results:
[fedora@vmb-1624797047-2293534 ~]$ ping 10.0.2.1
PING 10.0.2.1 (10.0.2.1) 56(84) bytes of data.
64 bytes from 10.0.2.1: icmp_seq=9 ttl=64 time=0.635 ms
64 bytes from 10.0.2.1: icmp_seq=10 ttl=64 time=0.289 ms
64 bytes from 10.0.2.1: icmp_seq=11 ttl=64 time=0.332 ms
64 bytes from 10.0.2.1: icmp_seq=12 ttl=64 time=0.765 ms

--- 10.0.2.1 ping statistics ---
12 packets transmitted, 4 received, 66.6667% packet loss, time 11183ms
rtt min/avg/max/mdev = 0.289/0.505/0.765/0.200 ms

Expected results:
No packet loss.

Additional info:
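For reference, a minimal sketch of what a single-interface masquerade VM definition along the lines of 'migration_vma_new.yaml' could look like. The actual attachment may differ; the labels and container disk image here are illustrative assumptions:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vma
  namespace: anat-test-migration-masquerade
spec:
  running: false              # started later with 'virtctl start vma'
  template:
    spec:
      domain:
        devices:
          disks:
          - name: containerdisk
            disk:
              bus: virtio
          interfaces:
          - name: default
            masquerade: {}    # single masquerade interface, as in the report
        resources:
          requests:
            memory: 1Gi
      networks:
      - name: default
        pod: {}               # masquerade binds to the pod network
      volumes:
      - name: containerdisk
        containerDisk:
          image: quay.io/kubevirt/fedora-cloud-container-disk-demo  # illustrative image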
Created attachment 1795104 [details]
migration_vmb_new.yaml
Created attachment 1795105 [details]
migration_virtualmachineinstancemigration.yaml
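For reference, a hedged sketch of a VirtualMachineInstanceMigration manifest along the lines of the attachment above; the metadata name is an illustrative assumption:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: migration-vmb      # hypothetical name; the attached file may use another
  namespace: anat-test-migration-masquerade
spec:
  vmiName: vmb             # the running VMI to live-migrate

Applying the attached file triggers the live migration of vmb:
$ oc apply -f migration_virtualmachineinstancemigration.yaml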
Should be fixed U/S (upstream). We are waiting for the 4.9 D/S (downstream) build to pick up the fix.
Petr, when should this fix show up in the downstream version?
IIUIC, we switched D/S for 4.9 to follow the main branch only two days ago, and it seems unstable. The work on getting the fix in is ongoing, but I can't tell when it will be available, as stabilizing a new D/S version is usually pretty difficult. I will let you know once it is available.
D/S builds of 4.9 are now available.
(In reply to Petr Horáček from comment #6)
> D/S builds of 4.9 are now available.

@phoracek The fix will be available only once KubeVirt v0.44 is in the D/S build. The current status of 'ON_QA' is problematic because we cannot test the fix yet.
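One way to check which KubeVirt version a given D/S build ships — a quick sketch, assuming the default openshift-cnv namespace and that the KubeVirt CR reports observedKubeVirtVersion in its status, as in recent releases:

$ virtctl version
$ oc get kubevirt -n openshift-cnv -o jsonpath='{.items[0].status.observedKubeVirtVersion}'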
The fix should already be on D/S - D/S follows HEAD of the main branch up until the feature freeze, when it switches to the stable branch. With this, the fix should be available as part of the current D/S build.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.9.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4104