Bug 1686208

Summary: When migration completed "oc get vmi" shows obsolete IP address for pod network
Product: Container Native Virtualization (CNV)
Component: Virtualization
Version: 2.0
Target Release: 2.5.0
Hardware: Unspecified
OS: Unspecified
Severity: low
Priority: medium
Status: CLOSED ERRATA
Reporter: Denys Shchedrivyi <dshchedr>
Assignee: Vatsal Parekh <vparekh>
QA Contact: zhe peng <zpeng>
CC: aspauldi, cnv-qe-bugs, fdeutsch, ipinto, kbidarka, ncredi, sgordon, sgott, vparekh, vromanso
Fixed In Version: 2.5.0
Last Closed: 2020-12-07 09:50:50 UTC
Type: Bug

Description Denys Shchedrivyi 2019-03-06 22:48:28 UTC
Description of problem:
 After a successful migration the new pod has a new IP address, but "oc get vmi" and "oc describe vmi" still show the old one:

# oc get pod -o wide
NAME                                    READY     STATUS      RESTARTS   AGE       IP            NODE                                        NOMINATED NODE
virt-launcher-vm-cirros-pvc-cdi-kr6nh   0/1       Completed   0          44m       10.129.0.21   cnv-executor-dshchedr-node1.example.com     <none>
virt-launcher-vm-cirros-pvc-cdi-rl6ww   1/1       Running     0          37m       10.130.0.23   cnv-executor-dshchedr-node2.example.com     <none>


# oc get vmi
NAME                AGE       PHASE     IP            NODENAME
vm-cirros-pvc-cdi   52m       Running   10.129.0.21   cnv-executor-dshchedr-node2.example.com

# oc describe vmi | grep Ip
    Ip Address:      10.129.0.21


Version-Release number of selected component (if applicable):
2.0

How reproducible:
100%

Steps to Reproduce:
1. create a VM
2. run the migration
3. check the IP address (a minimal command sketch follows below)
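
A minimal command sketch of these steps, assuming the virtctl client is installed and that a VM manifest named vm-cirros-pvc-cdi.yaml exists (the manifest file name is an assumption):

$ oc create -f vm-cirros-pvc-cdi.yaml    # define the VM from this report
$ virtctl start vm-cirros-pvc-cdi        # start it and wait for the VMI to be Running
$ virtctl migrate vm-cirros-pvc-cdi      # trigger the live migration
$ oc get vmi vm-cirros-pvc-cdi           # IP column reported by the VMI...
$ oc get pod -o wide                     # ...versus the IP of the running virt-launcher pod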

Actual results:
 "oc get vmi" shows obsolete IP address

Expected results:
 IP address should be updated


Additional info:

Comment 1 Fabian Deutsch 2019-03-26 19:48:22 UTC
It's only definitely a bug if the reported IP of a NIC on the default network wasn't updated.

IOW Only the IP of a NIC attached to the POD network must return the new IP (of the target pod) after migration.
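
A quick way to check whether a VMI's NIC is attached to the pod network (a sketch using the VM name from this report; a pod-network NIC appears under spec.networks as an entry with "pod: {}", while Multus-attached NICs show a "multus:" entry):

$ oc get vmi vm-cirros-pvc-cdi -o yaml | grep -A 3 'networks:'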

Comment 2 Fabian Deutsch 2019-05-07 07:10:36 UTC
I actually wonder if this bug is a dupe of bug 1693532

Comment 4 sgott 2019-08-09 19:45:37 UTC
https://github.com/kubevirt/kubevirt/pull/2405 is related to this BZ (it prevents the scenario that allowed this to happen).

Comment 7 sgott 2019-09-17 21:05:16 UTC
To verify: This scenario is no longer possible because migration via bridge interface is disabled.

Comment 8 Denys Shchedrivyi 2019-09-23 20:35:00 UTC
Verified on hco-bundle-registry:v2.1.0-62:

 Migration with Bridge - blocked

 Migration with Masquerade - allowed, but "oc get vmi" still shows the obsolete IP address:

$ oc get pod -o wide
NAME                                      READY     STATUS      RESTARTS   AGE       IP            NODE               NOMINATED NODE   READINESS GATES
virt-launcher-vm-fedora-cloudinit-g7dz9   2/2       Running     0          46s       10.130.0.65   host-172-16-0-34   <none>           <none>
virt-launcher-vm-fedora-cloudinit-x8lfb   0/2       Completed   0          11m       10.131.0.56   host-172-16-0-18   <none>           <none>

$ oc get vmi
NAME                  AGE       PHASE     IP            NODENAME
vm-fedora-cloudinit   11m       Running   10.131.0.56   host-172-16-0-34

 The VM is accessible through the new IP address, but "oc get vmi" and "oc describe vmi" show the old one. The only way to find the new IP address is to check the active pod's IP. Is it possible to update the IP? We are updating the NODENAME field; can we do the same with the IP?
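
Until that field is updated, a workaround sketch is to query the running virt-launcher pod directly (kubevirt.io=virt-launcher is the standard virt-launcher pod label; the vm.kubevirt.io/name label is an assumption, so verify with "oc get pod --show-labels" first):

$ oc get pod -l kubevirt.io=virt-launcher --field-selector=status.phase=Running -o wide
$ oc get pod -l vm.kubevirt.io/name=vm-fedora-cloudinit --field-selector=status.phase=Running \
      -o jsonpath='{.items[*].status.podIP}{"\n"}'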

Comment 9 sgott 2019-09-23 22:27:01 UTC
Denys,

Good catch! As you've observed, we actually don't make any effort to track/change the IP field. Sorry for the confusion, but https://github.com/kubevirt/kubevirt/pull/2405 doesn't address this scenario.

Comment 11 Fabian Deutsch 2019-09-24 12:39:05 UTC
Audrey, can we add a known issue for this bug?

Comment 12 sgott 2019-09-25 12:44:09 UTC
The question was raised as to whether a service connected to the VM's pod would still be able to route to/reach the new pod post migration. Yes, it will. Services generally use selectors to define a logical set of pods. Since the relevant metadata on the virt-launcher pod won't change, the service will still effectively point to the correct pod.

Thus in that sense, this bug can be considered visual only. For this IP to matter, somebody would need to be trying to connect directly to the (old) pod IP from within the cluster.
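
For illustration, a minimal Service of the kind described above might look like the sketch below, assuming the virt-launcher pod carries a label such as vm.kubevirt.io/name=vm-fedora-cloudinit (the label name and port are assumptions; check the pod's labels before relying on them):

$ oc apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: vm-fedora-cloudinit-ssh
spec:
  selector:
    vm.kubevirt.io/name: vm-fedora-cloudinit   # assumed label on the virt-launcher pod
  ports:
  - protocol: TCP
    port: 22
    targetPort: 22
EOF

Because the selector matches labels rather than a fixed pod IP, the Service endpoints are recomputed automatically when the backing pod changes after migration.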

Comment 13 Audrey Spaulding 2019-09-25 13:16:18 UTC
Fabian, if this is no longer something for the release notes, can you please remove it from the PR? https://github.com/openshift/openshift-docs/pull/16756

Comment 14 Nelly Credi 2019-09-25 13:19:49 UTC
@Audrey please keep it in the release notes for now, as we are not sure it will get fixed in 2.1 timeframe

Comment 15 Fabian Deutsch 2019-09-26 11:08:04 UTC
This is a release note for 2.1 as this bug will not be fixed in 2.1.

This falls into the category of bugs which require us to update the VMI after live migration (and maybe other events in the future, like suspend/resume).

Comment 16 Fabian Deutsch 2019-11-11 15:26:29 UTC
When a masquerade binding is used (which is the default in 2.2), this bug should be gone.

Comment 17 sgott 2019-11-11 20:04:39 UTC
Moving back to NEW. This issue in particular has not yet been addressed. As Denys explained in Comment #8, this bug is specifically about the apparent pod IP as seen from the rest of the cluster. We need to update this field in the VMI's status.
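
For reference, the status field in question can be inspected directly (a sketch, assuming the interface list in the VMI status is populated):

$ oc get vmi vm-fedora-cloudinit -o jsonpath='{.status.interfaces[*].ipAddress}{"\n"}'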

Comment 19 Vatsal Parekh 2020-01-06 14:02:08 UTC
Should be fixed by https://github.com/kubevirt/kubevirt/pull/2963

Comment 20 sgott 2020-02-19 17:18:36 UTC
Vatsal,

What's the status of that PR?

Comment 21 Vatsal Parekh 2020-02-20 10:49:18 UTC
While fixing this, we found a small corner-case race condition when updating the VMI status with masquerade binding and the guest agent present.
So this PR is somewhat blocked by https://github.com/kubevirt/kubevirt/pull/3063.

Both seem close to finished to me.

Comment 23 Vatsal Parekh 2020-06-26 11:09:46 UTC
Latest PR fixing this issue https://github.com/kubevirt/kubevirt/pull/3642

Comment 24 Vatsal Parekh 2020-07-23 16:34:15 UTC
This got merged upstream https://github.com/kubevirt/kubevirt/pull/3642

Comment 25 zhe peng 2020-09-25 02:03:46 UTC
Verified with build virt-operator-container-v2.5.0-29.

Steps:
1. create a VM
2. do the migration
3. check the IP of the VM
$ oc get pods -o wide
NAME                         READY   STATUS      RESTARTS   AGE   IP            NODE                               NOMINATED NODE   READINESS GATES
virt-launcher-fedora-8f5wv   2/2     Running     0          38s   10.129.2.73   zpeng-cnv25-j4jvz-worker-0-jtmwl   <none>           <none>
virt-launcher-fedora-vmkg9   0/2     Completed   0          16h   10.128.2.47   zpeng-cnv25-j4jvz-worker-0-z82fw   <none>           <none>

$ oc get vmi
NAME     AGE   PHASE     IP            NODENAME
fedora   17h   Running   10.129.2.73   zpeng-cnv25-j4jvz-worker-0-jtmwl

VMI IP updated, moving to VERIFIED.