Description of problem:
Connectivity issues after migrating an instance.

Version-Release number of selected component (if applicable):
RHEL OSP 8

# awk '/neutron/ {print $1}' installed-rpms
neutron-ml2-driver-apic-2015.2.5-95.el7.noarch
neutron-opflex-agent-2015.2.3-35.el7.noarch
openstack-neutron-7.2.0-5.el7ost.noarch
openstack-neutron-common-7.2.0-5.el7ost.noarch
openstack-neutron-ml2-7.2.0-5.el7ost.noarch
openstack-neutron-openvswitch-7.2.0-5.el7ost.noarch
python-neutron-7.2.0-5.el7ost.noarch
python-neutronclient-3.1.0-2.el7ost.noarch

How reproducible:
Intermittently for the customer. 55 compute nodes.

Steps to Reproduce:
- Before the live migration the instance was pingable.
- Traffic was lost during the migration.
- Once the instance spawned successfully on the destination compute node, it remained unreachable for 5 minutes, which equals the 300s default timeout for a linux bridge entry.
- Traffic was reaching the qvb interface of the linux bridge but was not reaching the tap interface.
- Security rules do not appear to be an issue here.
- The linux bridge MAC entries captured at the time of the issue indicate that a wrong port-to-MAC-address mapping was in place.
- The entry newly populated after the old entry timed out picked the correct port for the MAC address.
- After that the instance was reachable.

Actual results:
The compute node showed a wrong mapping between port and MAC address.

Expected results:
It should show the correct mapping right after the migration instead of waiting for the timeout value.

Additional info:
More information coming in next internal comments.
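The 5-minute delay matches the kernel's default bridge ageing time of 300 seconds. A minimal sketch of how to check it on a compute node (the qbr bridge name is an example taken from the logs later in this report; the live commands are commented out because they need root and an existing bridge):

```shell
BRIDGE="qbrcdd7b6e7-a1"   # example qbr bridge name; substitute the affected instance's bridge
# On the compute node one would inspect the learned MAC table and the ageing value with:
#   brctl showmacs "$BRIDGE"                        # learned MAC -> port mappings, with age
#   cat "/sys/class/net/$BRIDGE/bridge/ageing_time" # ageing value, in centiseconds
# The kernel's default sysfs value is 30000 centiseconds:
AGEING_CS=30000
echo "default bridge ageing: $((AGEING_CS / 100))s"
```

The 300s result is exactly the timeout after which the stale entry described above expired and connectivity returned.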
This sounds similar to: https://bugzilla.redhat.com/show_bug.cgi?id=1372384#c24 Is the customer using Emulex NICs?
Thanks Assaf.
I looked into this further and remembered the 'qbr' bridges are managed by the nova libvirt code. After searching through the nova and os-vif repos I could not find a change related to a bug like this, so I'd like to re-assign this to the nova team to get some help with further debugging.
It's a known issue between Neutron and Nova: RARP packets sent by QEMU are dropped because Nova starts the migration while the ports are not yet properly tagged by the Neutron agent. There is an initiative to fix the issue from the Neutron side by sending an event when the agent discovers the new port and finishes setting it up (tagging), but that only works for OVS [0]; the other mechanisms use a tap device which is created only when the migration occurs. The patch on the Nova side to wait for that event has been abandoned [1].

Two related issues:
- https://bugzilla.redhat.com/show_bug.cgi?id=1259749 (OVS)
- https://bugzilla.redhat.com/show_bug.cgi?id=1420587 (linuxbridge)

In bug 1420587 David proposed a workaround in QEMU by increasing the number of RARPs sent: https://bugzilla.redhat.com/show_bug.cgi?id=1420587#c22
And it has been proposed upstream: http://lists.gnu.org/archive/html/qemu-devel/2017-03/msg05586.html

[0] https://review.openstack.org/#/c/246898/
[1] https://review.openstack.org/#/c/246910/
Thanks Sahid, I skimmed through the Red Hat bugs you mentioned. Bug 1420587, which concerns the linuxbridge mechanism driver, does not seem to apply here because the customer is not using linuxbridge as the mechanism driver. I started reading the following bug and stumbled upon a comment; again, they are talking about ARP entries at the OVS bridge level, whereas in this case the customer is facing the issue on the linux bridge that is used for security groups. https://bugzilla.redhat.com/show_bug.cgi?id=1259749#c27 What should we do to make further progress on this bug?
Hi Sahid, can you please let me know what plan of action we should share with the customer next?
Hello Vikrant, I am not sure I understand why you think this issue is not related to bug 1420587 and bug 1259749. My thinking is that the problem is that the network is not set up by the time the guest CPUs start on the destination host. Basically virtio-net has a feature (see: FS_GUEST_ANNOUNCE) to attempt a self-announce after 1ms, 50ms, 150ms, 250ms and 350ms when the guest is starting, but at that point we are still setting up the network.

So the VM starts at 2017-04-26 11:30:49.643+0000:

2017-04-26 11:30:49.643+0000: starting up libvirt version: 2.0.0, package: 10.el7_3.2 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2016-11-10-04:43:57, x86-034.build.eng.bos.redhat.com), qemu version: 2.6.0 (qemu-kvm-rhev-2.6.0-27.el7), hostname: cfs1pnc50.infra.es.iaas.igrupobbva
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=instance-00034699,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-168-instance-00034699/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off -cpu Broadwell,+vme,+ds,+acpi,+ss,+ht,+tm,+pbe,+dtes64,+monitor,+ds_cpl,+vmx,+smx,+est,+tm2,+xtpr,+pdcm,+dca,+osxsave,+f16c,+rdrand,+arat,+tsc_adjust,+xsaveopt,+pdpe1gb,+abm,+rtm,+hle -m 8192 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid 0ad52b3b-bac8-4659-bf95-ed14e8cb3045 -smbios 'type=1,manufacturer=Red Hat,product=OpenStack Compute,version=12.0.5-9.el7ost,serial=c928e5a8-3f47-4fa1-934a-90b9c1a009cc,uuid=0ad52b3b-bac8-4659-bf95-ed14e8cb3045,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-168-instance-00034699/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
file=/var/lib/nova/instances/0ad52b3b-bac8-4659-bf95-ed14e8cb3045/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=39,id=hostnet0,vhost=on,vhostfd=40 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:b5:c4:a9,bus=pci.0,addr=0x3 -chardev file,id=charserial0,path=/var/lib/nova/instances/0ad52b3b-bac8-4659-bf95-ed14e8cb3045/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:10 -k es -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
char device redirected to /dev/pts/16 (label charserial1)

Looking at var/log/messages we can see how the network gets configured on the host:

sferdjao@collab-shell log]$ grep cdd7b6e7 messages
Apr 26 13:30:32 cfs1pnc50 kernel: qbrcdd7b6e7-a1: port 2(tapcdd7b6e7-a1) entered disabled state
Apr 26 13:30:32 cfs1pnc50 kernel: device tapcdd7b6e7-a1 left promiscuous mode
Apr 26 13:30:32 cfs1pnc50 kernel: qbrcdd7b6e7-a1: port 2(tapcdd7b6e7-a1) entered disabled state
Apr 26 13:30:32 cfs1pnc50 lldpd[39294]: error while receiving frame on tapcdd7b6e7-a1: Network is down
Apr 26 13:30:32 cfs1pnc50 lldpd[39294]: removal request for tapcdd7b6e7-a1, but no knowledge of it
Apr 26 13:30:32 cfs1pnc50 kernel: qbrcdd7b6e7-a1: port 1(qvbcdd7b6e7-a1) entered disabled state
Apr 26 13:30:33 cfs1pnc50 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=120 -- --if-exists del-port br-int qvocdd7b6e7-a1
Apr 26 13:30:33 cfs1pnc50 lldpd[39294]: error while receiving frame on qvocdd7b6e7-a1: Network is down
Apr 26 13:30:33 cfs1pnc50 agent-ovs[39404]: [src/FSEndpointSource.cpp:448:deleted] Removed endpoint cdd7b6e7-a15b-4b7f-bf84-9cc4987ce3aa|fa-16-3e-b5-c4-a9 at 
"/var/lib/opflex-agent-ovs/endpoints/cdd7b6e7-a15b-4b7f-bf84-9cc4987ce3aa_fa:16:3e:b5:c4:a9.ep" Apr 26 13:30:34 cfs1pnc50 ntpd[17859]: Deleting interface #150 qvbcdd7b6e7-a1, fe80::309d:91ff:fe4d:f1fc#123, interface stats: received=0, sent=0, dropped=0, active_time=1816762 secs Apr 26 13:30:34 cfs1pnc50 ntpd[17859]: Deleting interface #149 qvocdd7b6e7-a1, fe80::606c:bff:feed:e1e2#123, interface stats: received=0, sent=0, dropped=0, active_time=1816762 secs Apr 26 13:30:34 cfs1pnc50 ntpd[17859]: Deleting interface #148 tapcdd7b6e7-a1, fe80::fc16:3eff:feb5:c4a9#123, interface stats: received=0, sent=0, dropped=0, active_time=1816762 secs Apr 26 13:30:48 cfs1pnc50 kernel: IPv6: ADDRCONF(NETDEV_UP): qvbcdd7b6e7-a1: link is not ready VM STARTING and QEMU self-announce... Apr 26 13:30:49 cfs1pnc50 kernel: device qvbcdd7b6e7-a1 entered promiscuous mode Apr 26 13:30:49 cfs1pnc50 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): qvbcdd7b6e7-a1: link becomes ready Apr 26 13:30:49 cfs1pnc50 kernel: device qvocdd7b6e7-a1 entered promiscuous mode Apr 26 13:30:49 cfs1pnc50 kernel: qbrcdd7b6e7-a1: port 1(qvbcdd7b6e7-a1) entered forwarding state Apr 26 13:30:49 cfs1pnc50 kernel: qbrcdd7b6e7-a1: port 1(qvbcdd7b6e7-a1) entered forwarding state Apr 26 13:30:49 cfs1pnc50 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=120 -- --if-exists del-port qvocdd7b6e7-a1 -- add-port br-int qvocdd7b6e7-a1 -- set Interface qvocdd7b6e7-a1 external-ids:iface-id=cdd7b6e7-a15b-4b7f-bf84-9cc4987ce3aa external-ids:iface-status=active external-ids:attached-mac=fa:16:3e:b5:c4:a9 external-ids:vm-uuid=0ad52b3b-bac8-4659-bf95-ed14e8cb3045 FINISHING network configuration... At this point, 850 ms could have elapsed and so all the ARP requests to instruct the bridge lost... 
Apr 26 13:30:49 cfs1pnc50 kernel: device tapcdd7b6e7-a1 entered promiscuous mode
Apr 26 13:30:49 cfs1pnc50 kernel: qbrcdd7b6e7-a1: port 2(tapcdd7b6e7-a1) entered forwarding state
Apr 26 13:30:49 cfs1pnc50 kernel: qbrcdd7b6e7-a1: port 2(tapcdd7b6e7-a1) entered forwarding state
Apr 26 13:30:50 cfs1pnc50 agent-ovs[39404]: [src/FSEndpointSource.cpp:430:updated] Updated endpoint Endpoint[uuid=cdd7b6e7-a15b-4b7f-bf84-9cc4987ce3aa|fa-16-3e-b5-c4-a9,ips=[192.168.10.135],ipAddressMappings=[192.168.10.135->10.48.232.101],eg=/PolicyUniverse/PolicySpace/_IaaS_S1P_Compute/GbpEpGroup/IaaS_S1P%7clbaas/,mac=fa:16:3e:b5:c4:a9,iface=qvocdd7b6e7-a1,dhcpv4 from "/var/lib/opflex-agent-ovs/endpoints/cdd7b6e7-a15b-4b7f-bf84-9cc4987ce3aa_fa:16:3e:b5:c4:a9.ep"
Apr 26 13:30:52 cfs1pnc50 ntpd[17859]: Listen normally on 533 qvbcdd7b6e7-a1 fe80::1c63:1bff:fe30:8841 UDP 123
Apr 26 13:30:52 cfs1pnc50 ntpd[17859]: Listen normally on 534 tapcdd7b6e7-a1 fe80::fc16:3eff:feb5:c4a9 UDP 123
Apr 26 13:30:52 cfs1pnc50 ntpd[17859]: Listen normally on 535 qvocdd7b6e7-a1 fe80::9c4c:93ff:fe6c:2d01 UDP 123
Apr 26 13:30:54 cfs1pnc50 agent-ovs[39404]: [src/FSEndpointSource.cpp:430:updated] Updated endpoint Endpoint[uuid=cdd7b6e7-a15b-4b7f-bf84-9cc4987ce3aa|fa-16-3e-b5-c4-a9,ips=[192.168.10.135],ipAddressMappings=[192.168.10.135->10.48.232.101],eg=/PolicyUniverse/PolicySpace/_IaaS_S1P_Compute/GbpEpGroup/IaaS_S1P%7clbaas/,mac=fa:16:3e:b5:c4:a9,iface=qvocdd7b6e7-a1,dhcpv4 from "/var/lib/opflex-agent-ovs/endpoints/cdd7b6e7-a15b-4b7f-bf84-9cc4987ce3aa_fa:16:3e:b5:c4:a9.ep"

To conclude, I think 850ms is too short a window in which to consider the network well configured. Basically, all of that work should have been done before we start the migration. One solution is the QEMU hack discussed in comment 15; there is also work in QEMU to provide a guest announce API [0], which we could try to trigger at post-live-migration. I am continuing my investigation on the neutron/nova side to work around this issue without needing a QEMU change.
[0] http://lists.gnu.org/archive/html/qemu-devel/2017-05/msg00137.html
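The timing argument above can be checked numerically: with the self-announce schedule quoted in the previous comment and the ~850ms observed to finish plugging the port, every announce fires before the network is ready. A sketch using those figures (the numbers come from this report, not from a new measurement):

```shell
SETUP_MS=850                  # observed time to finish network setup on the destination
LOST=0
for t in 1 50 150 250 350; do # self-announce attempts, ms after guest CPUs start
  [ "$t" -lt "$SETUP_MS" ] && LOST=$((LOST + 1))
done
echo "self-announces lost: $LOST of 5"
```

Since even the last attempt at 350ms precedes the 850ms setup completion, every RARP burst hits a bridge whose tap port is not forwarding yet.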
I discussed this with David Gilbert, who suggests running tcpdump on the destination for ARPs/RARPs before the migration starts. That way we should be able to see the packets that QEMU emits, then trace them along the network and see where they disappear.
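A possible capture invocation, assuming the instance MAC from the qemu-kvm command line above; the interface name is a placeholder, and the capture itself needs root on the destination node, so this sketch only builds and prints the command:

```shell
MAC="fa:16:3e:b5:c4:a9"   # instance MAC (from the libvirt log in this report)
IFACE="qvbcdd7b6e7-a1"    # placeholder: watch the uplink, qvb or tap interface
# pcap-filter: frames to/from the instance MAC that are ARP or RARP
FILTER="ether host $MAC and (arp or rarp)"
echo "tcpdump -i $IFACE -e -n '$FILTER'"
```

Running the same capture on each hop (uplink, qvo, qvb, tap) should show where the self-announce frames disappear.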
Many thanks Sahid for the detailed description in C#18. I am just trying to reiterate what you have said to make sure that I have understood it correctly.

- After the migration, the VM will self-announce its MAC so the bridge can learn the port mapping, once the network is set. The VM will attempt the announce at different intervals: "1ms, 50ms, 150ms, 250ms and 350ms".

- But in this case, 850ms was spent on network setup, so all of the announce intervals (1ms, 50ms, 150ms, 250ms and 350ms) were missed, and hence the linux bridge was not able to refresh its port and MAC mapping. Only once the old linux bridge entry expired (5 min) did the correct MAC and port mapping appear.

- By increasing the announce time to 12s as mentioned in [1], we should be able to circumvent this issue, because by that time the VM will have its network configured.

Sure, we can take the tcpdump, but can you please let me know on which interfaces I need to capture on the destination node? I am just a bit cautious because the issue is not easily reproducible. The customer earlier tried to capture the ICMP traffic on the tap, qvb and qvo interfaces of the migrated instance while they were unable to ping it.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1420587#c22
(In reply to VIKRANT from comment #20)
> Many thanks Sahid for detailed description in C#18. I am just trying to
> reiterate what have you said to make sure that I have understood it
> correctly.
>
> - After the migration VM will self-announce about the MAC and port mapping
> when the network is set. VM will try to announce it at different intervals
> "1ms, 50ms, 150ms, 250ms and 350ms".
>
> - But in this case, VM is spending 850ms in network setup due to which all
> intervals (1ms, 50ms, 150ms, 250ms and 350ms) are missed to announce the MAC
> and port mapping to linux bridge hence linux bridge was not able to refresh
> the port and MAC mapping. Once the linux bridge entry lease expire (5min),
> it was able to pop-up the new MAC and port mapping.

Yes, except that the self-announce does not announce a port mapping. Basically it is a set of RARP requests broadcast on L2; the bridge receives those packets on only one port and so updates its learning table accordingly.

> - By increasing the announce time to 12s as mentioned in [1] we should be
> able to circumvent this issue because by that time VM will be having the
> network configured.
>
> Sure, we can take the tcpdump but can you please let me know on which
> interfaces I need to take the tcpdump on destination node?

Well, I'd say a tcpdump on any interface, filtering by the VM MAC address and arp.

> I am just bit
> cautions because it's not very easily reproducible. Cu. earlier tried to
> capture the ICMP traffic on interface tap, qvb and qvo of migrated instance
> when they were not able to ping the instance.

Yes, a loaded system is likely to introduce the delay that helps reproduce the issue. I captured the packets:

4 packets sent at 0ms
4 packets sent at 150ms
4 packets sent at 400ms
4 packets sent at 750ms

> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1420587#c22
Ok, after further analysis: at some point during the live migration, the same MAC address is present on the same L2 segment behind two different linux bridges (source and destination). As a result, the learning table of the destination bridge may conclude that this MAC is reachable via the uplink. Setting the ageing to zero makes the MAC entries in the learning table persistent and avoids any such override. The fix pushed upstream seems to be in good shape to be accepted. The fix is in os-vif, but it can easily be backported to OSP versions which do not use that lib. https://review.openstack.org/#/c/501132/ We will probably have to consider backporting it to OSP 6, 7, 9 and 10, and so clone this issue.
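For reference, the workaround amounts to pinning the bridge's learned MAC entries so a stale uplink entry can no longer shadow the tap port. A sketch, assuming the qbr bridge name from this report (the upstream change does this in os-vif at vif-plug time; note brctl setageing takes seconds, while the iproute2 ageing_time argument is in centiseconds):

```shell
BRIDGE="qbrcdd7b6e7-a1"   # example bridge from this report
# Either of these would make learned entries permanent (needs root on the compute node):
#   brctl setageing "$BRIDGE" 0
#   ip link set dev "$BRIDGE" type bridge ageing_time 0
CMD="brctl setageing $BRIDGE 0"
echo "$CMD"
```

With ageing at zero, the entry installed when the tap port is plugged stays authoritative and the 300s expiry wait described in this bug no longer occurs.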
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3068