Bug 1410076
Summary: [SR-IOV] - in-guest bond with virtio+passthrough slave loses connectivity after hotunplug/hotplug of the passthrough slave

| Field | Value | Field | Value |
|---|---|---|---|
| Product: | [oVirt] vdsm | Reporter: | Michael Burman <mburman> |
| Component: | Core | Assignee: | Leon Goldberg <lgoldber> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Michael Burman <mburman> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.19.1 | CC: | bugs, danken, gklein, lgoldber, mburman, myakove, yfu |
| Target Milestone: | ovirt-4.1.1 | Flags: | lgoldber: needinfo-, rule-engine: ovirt-4.1+, rule-engine: blocker+ |
| Target Release: | 4.19.5 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-04-21 09:51:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Network | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1341248 | | |
| Bug Blocks: | 868811 | | |
Description
Michael Burman
2017-01-04 12:07:12 UTC
I suspect that this is the same issue we see with hotunplug/hotplug (regardless of migration), and that it shows up - sometimes - even when NM is stopped and masked. Right?

Correct. This only happens when the host has more than one VF enabled, and it is not related to plugging/unplugging the vNIC. It seems that if the vNIC gets a different VF (slot), we lose connectivity.

Bond created with (ping works):

```xml
<interface type='hostdev'>
  <mac address='00:1a:4a:16:20:76'/>
  <driver name='vfio'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x05' slot='0x10' function='0x2'/>
  </source>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</interface>
```

After migration (no ping):

```xml
<interface type='hostdev'>
  <mac address='00:1a:4a:16:20:76'/>
  <driver name='vfio'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x05' slot='0x10' function='0x0'/>
  </source>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</interface>
```

After migrating back to the original host (no ping):

```xml
<interface type='hostdev'>
  <mac address='00:1a:4a:16:20:76'/>
  <driver name='vfio'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x05' slot='0x10' function='0x0'/>
  </source>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</interface>
```

(In reply to Meni Yakove from comment #3)
> This only happens when the hosts have more than one VFs enabled and it's not
> related to plug/unplug vNIC.

Not sure I understand. During migration we do plug/unplug, and that is the only way to change the VF connected to the guest.

> It seems that if the vNIC get different VF (slot) we lost connection.

Could it be that what kills traffic is having a stale VF on the host with the same MAC as the one in the guest?

Steps to reproduce:

1. Create 3 VMs.
2. Enable 2 VFs on the host.
3. Add a passthrough vNIC to VM 1 and start the VM.
4. Make sure that VM 1 got an IP and has connectivity.
5. Check which VF the VM got:
   virsh -r dumpxml <vm-name> | grep -A8 "<interface type='hostdev'>"
   <source>
     <address type='pci' domain='0x0000' bus='0x05' slot='0x10' function='0x2'/>
   </source>
   VM 1 got function='0x2'.
6. Unplug the vNIC from VM 1.
7. Add a passthrough vNIC to VM 2 and start the VM.
8. Add a passthrough vNIC to VM 3 and start the VM.
9. Check which VM (2 or 3) got the same source VF that was on VM 1.
10. Stop the VM that did not get the same source VF that was on VM 1.
11. Plug the vNIC back into VM 1.
12. VM 1 gets a different source VF and ends up with no connectivity.

I am guessing that our problem is due to the former VF staying on the host with the same MAC as the VF owned by the VM (bug 1341248). Can you try changing its MAC address, or taking the VF down? To verify my guess, could you repeat the steps with a different NIC type and driver?

Leon, could you try the tricky workaround suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1341248#c17 ?

Hi Michael and Dan,

I tested on the qemu side, according to comment 5.

Test versions:
qemu: qemu-kvm-rhev-2.6.0-27.el7.x86_64
kernel: kernel-3.10.0-514.el7.x86_64
NIC driver: qlcnic

Test steps:
1. Create 2 VFs and prepare 3 VMs.
2. Add VF 1 to VM 1 ---> VF 1 gets an IP in VM 1 and works well.
3. Hot unplug VF 1 from VM 1.
4. Add VF 1 to VM 2 ---> gets an IP, works well.
5. Add VF 2 to VM 3 ---> gets an IP, works well.
6. Hot unplug VF 2 from VM 3, then hot plug it into VM 1.
7. VM 1 works with VF 2: it can get an IP and works well.

In step 2: VM 1 + VF 1: IP 10.73.33.183, MAC 8a:ea:c2:7b:6e:f1
In step 4: VM 2 + VF 1: IP 10.73.33.183, MAC 8a:ea:c2:7b:6e:f1
In step 5: VM 3 + VF 2: IP 10.73.33.190, MAC 66:e3:ce:63:d7:64
In step 6: VM 1 + VF 2: IP 10.73.33.190, MAC 66:e3:ce:63:d7:64

Conclusion: the VF's MAC does not change when it is added to different VMs, and the VMs can get an IP normally. It seems this bug cannot be reproduced with the qlcnic driver. Thanks!

Let us verify this bug only when https://gerrit.ovirt.org/#/c/72135/ is in.

Verified on 4.1.1.2-0.1.el7
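The "check which VF the VM got" step and the suggested stale-VF workaround can be sketched as a small shell script. This is only an illustration, not part of the bug: the XML snippet is inlined from this report so the script is self-contained (normally it would come from `virsh -r dumpxml`), the PF name `ens1f0` and VF index `2` are hypothetical placeholders, and the `ip link` workaround commands are echoed as a dry run rather than executed, since they require an SR-IOV-capable host.

```shell
#!/bin/sh
# Inline the hostdev <interface> snippet from the bug report; on a real host
# this would be produced by:
#   virsh -r dumpxml <vm-name> | grep -A8 "<interface type='hostdev'>"
xml_snippet=$(cat <<'EOF'
<interface type='hostdev'>
  <mac address='00:1a:4a:16:20:76'/>
  <driver name='vfio'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x05' slot='0x10' function='0x2'/>
  </source>
</interface>
EOF
)

# The <source> element carries the host-side PCI address of the VF;
# extract the function attribute that identifies which VF the guest uses.
vf_addr=$(printf '%s\n' "$xml_snippet" | grep -A1 '<source>' | grep -o "function='[^']*'")
echo "VM is using VF with $vf_addr"

# Workaround discussed above: change the stale host-side VF's MAC, or take
# the VF down. PF name and VF index are assumptions for illustration; the
# commands are only printed here (dry run).
PF=ens1f0   # hypothetical PF netdev name
VF_IDX=2    # hypothetical VF index, matching function='0x2'
echo "would run: ip link set dev $PF vf $VF_IDX mac 02:00:00:00:00:01"
echo "would run: ip link set dev $PF vf $VF_IDX state disable"
```

Running it prints the VF function extracted from the XML, followed by the two `ip link` commands an admin would issue on the host to neutralize the stale VF.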