+++ This bug is a downstream clone. The original bug is: +++
+++ bug 1703112 +++
======================================================================

Description of problem:

Sometimes, when an SR-IOV NIC is hotplugged, the PCI device address is lost from the database. Because of this, the NIC name changes within the VM after a reboot: the engine does not pass the stored PCI address when starting the VM, so libvirt generates a new one.

In my test setup, I ran a while loop to list the vm_device rows during hotplugging, and also enabled the log_statement = 'all' setting for PostgreSQL.

====
while true; do
    date
    /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "select address,host_device from vm_device where vm_id='3d2859c2-31a3-4ce3-97fc-52b09e60ce8c' and type='interface';"
    sleep 1
done

Thu Apr 25 08:53:55 EDT 2019
                           address                            |   host_device
--------------------------------------------------------------+------------------
 {type=pci, slot=0x07, bus=0x00, domain=0x0000, function=0x0} | pci_0000_07_12_1
 {type=pci, slot=0x08, bus=0x00, domain=0x0000, function=0x0} | pci_0000_07_11_5
 {type=pci, slot=0x09, bus=0x00, domain=0x0000, function=0x0} | pci_0000_07_10_1
                                                              |

Thu Apr 25 08:53:56 EDT 2019
                           address                            |   host_device
--------------------------------------------------------------+------------------
 {type=pci, slot=0x07, bus=0x00, domain=0x0000, function=0x0} | pci_0000_07_12_1
 {type=pci, slot=0x08, bus=0x00, domain=0x0000, function=0x0} | pci_0000_07_11_5
 {type=pci, slot=0x09, bus=0x00, domain=0x0000, function=0x0} | pci_0000_07_10_1
 {type=pci, slot=0x0a, bus=0x00, domain=0x0000, function=0x0} | pci_0000_07_10_5   ====> Address was saved here.
(4 rows)

Thu Apr 25 08:53:58 EDT 2019
                           address                            |   host_device
--------------------------------------------------------------+------------------
 {type=pci, slot=0x07, bus=0x00, domain=0x0000, function=0x0} | pci_0000_07_12_1
 {type=pci, slot=0x08, bus=0x00, domain=0x0000, function=0x0} | pci_0000_07_11_5
 {type=pci, slot=0x09, bus=0x00, domain=0x0000, function=0x0} | pci_0000_07_10_1
                                                              | pci_0000_07_10_5   ====> Immediately after 2 seconds, the address was removed.
======

In the PostgreSQL logs, I can see that there were two updatevmdevice requests during the hotplug. The second one carried an _empty_ address.

===
updatevmdevice with address:

2019-04-25 12:53:56.211 UTC LOG:  execute S_81: select * from public.updatevmdevice(CAST ($1 AS uuid),CAST ($2 AS uuid),CAST ($3 AS varchar),CAST ($4 AS varchar),CAST ($5 AS varchar),CAST ($6 AS text),CAST ($7 AS bool),CAST ($8 AS bool),CAST ($9 AS bool),CAST ($10 AS varchar),CAST ($11 AS text),CAST ($12 AS uuid),CAST ($13 AS varchar),CAST ($14 AS varchar)) as result
2019-04-25 12:53:56.211 UTC DETAIL:  parameters: $1 = 'a644c267-368b-4193-8935-7d1c3daf265b', $2 = '3d2859c2-31a3-4ce3-97fc-52b09e60ce8c', $3 = 'hostdev', $4 = 'interface', $5 = '{type=pci, slot=0x0a, bus=0x00, domain=0x0000, function=0x0}', $6 = '{ }', $7 = 't', $8 = 't', $9 = 'f', $10 = 'ua-a644c267-368b-4193-8935-7d1c3daf265b', $11 = '{ }', $12 = NULL, $13 = NULL, $14 = 'pci_0000_07_10_5'

updatevmdevice with _empty_ address, 2 seconds later:
2019-04-25 12:53:58.747 UTC LOG:  execute <unnamed>: select * from updatevmdevice($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14) as result
2019-04-25 12:53:58.747 UTC DETAIL:  parameters: $1 = 'a644c267-368b-4193-8935-7d1c3daf265b', $2 = '3d2859c2-31a3-4ce3-97fc-52b09e60ce8c', $3 = 'hostdev', $4 = 'interface', $5 = '', $6 = '{ }', $7 = 't', $8 = 't', $9 = 'f', $10 = 'ua-a644c267-368b-4193-8935-7d1c3daf265b', $11 = '{ }', $12 = NULL, $13 = NULL, $14 = 'pci_0000_07_10_5'
===

The engine log doesn't show anything for the second request.

===
2019-04-25 08:53:56,183-04 DEBUG [org.ovirt.engine.core.common.di.interceptor.DebugLoggingInterceptor] (EE-ManagedThreadFactory-engineScheduled-Thread-1) [] method: runVdsCommand, params: [DumpXmls, Params:{hostId='bd982e52-cbea-4631-bfaf-31475f5683c7', vmIds='[3d2859c2-31a3-4ce3-97fc-52b09e60ce8c]'}], timeElapsed: 226ms
2019-04-25 08:53:56,210-04 DEBUG [org.ovirt.engine.core.dal.dbbroker.BatchProcedureExecutionConnectionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-1) [] Executing batch for procedure UpdateVmDevice
2019-04-25 08:53:58,740-04 DEBUG [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (default task-14) [49ff84ad] Before releasing a lock 'EngineLock:{exclusiveLocks='[bd982e52-cbea-4631-bfaf-31475f5683c7=HOST_DEVICES]', sharedLocks=''}'
2019-04-25 08:53:58,740-04 DEBUG [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (default task-14) [49ff84ad] The exclusive lock for key 'bd982e52-cbea-4631-bfaf-31475f5683c7HOST_DEVICES' is released and lock is removed from map
2019-04-25 08:53:58,769-04 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-14) [49ff84ad] EVENT_ID: NETWORK_ACTIVATE_VM_INTERFACE_SUCCESS(1,012), Network Interface nic9 (PCI Passthrough) was plugged to VM vm_nijin. (User: admin@internal-authz)
===

Now if I reboot this VM, the NIC name changes within the VM, because the engine does not pass the PCI address and libvirt generates another one.
===
2019-04-25 08:55:46,137-04 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (EE-ManagedThreadFactory-engine-Thread-2517) [1df55bc1-4bf5-47f8-a8e0-458cd234a72c] START, CreateBrokerVDSCommand(HostName = , CreateVDSCommandParameters:{hostId='bd982e52-cbea-4631-bfaf-31475f5683c7', vmId='3d2859c2-31a3-4ce3-97fc-52b09e60ce8c', vm='VM [vm_nijin]'}), log id: 4cb9e33
2019-04-25 08:55:46,154-04 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (EE-ManagedThreadFactory-engine-Thread-2517) [1df55bc1-4bf5-47f8-a8e0-458cd234a72c] VM <?xml version="1.0" encoding="UTF-8"?><domain type="kvm" xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
------
------
------
    <interface type="hostdev" managed="no">
      <driver name="vfio"/>
      <source>
        <address bus="7" domain="0" function="1" slot="17" type="pci"/>
      </source>
      <alias name="ua-a644c267-368b-4193-8935-7d1c3daf265b"/>
      <mac address="00:1a:4a:16:01:29"/>
      <bandwidth/>
    </interface>
=====

Since the PCI address changed, the network device name changes within the VM:

====
Apr 25 08:53:52 dhcp181-122 kernel: igbvf 0000:00:0a.0: Address: 00:1a:4a:16:01:29
Apr 25 08:53:52 dhcp181-122 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ens10: link becomes ready
Apr 25 08:53:52 dhcp181-122 kernel: igbvf 0000:00:0a.0: Link is Up 1000 Mbps Full Duplex

After reboot:

Apr 25 08:56:02 dhcp181-122 kernel: igbvf 0000:00:06.0: Address: 00:1a:4a:16:01:29
Apr 25 08:56:06 dhcp181-122 kernel: igbvf 0000:00:06.0: Link is Up 1000 Mbps Full Duplex
Apr 25 08:56:06 dhcp181-122 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ens6: link becomes ready
====

This can affect the network configuration inside the VM.

Version-Release number of selected component (if applicable):
rhvm-4.2.8.7-0.1.el7ev.noarch

How reproducible:
Roughly 10% of the time when hotplugging the NIC to the VM in the test environment.
It looks like this happens only with passthrough NICs; I was unable to reproduce it with virtio NICs.

Steps to Reproduce:
1. Hotplug a passthrough SR-IOV NIC to the VM and watch the address column in the vm_device table.

Actual results:
The PCI address of the NIC is not stored in the database after hotplugging a passthrough NIC, resulting in a changed network device name in the VM after a reboot.

Expected results:
The PCI address of the NIC should be persistent.

Additional info:

(Originally by Nijin Ashok)
Just from reading the source code, I expect that bug 1703112 will be reproducible by creating a before_get_caps hook on the host. The hook should just sleep for a short time, between 1 and 15 seconds. I hope this would change the timing of the threads in Engine in a way that triggers the bug.

(Originally by Dominik Holler)
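A minimal sketch of such a reproducer hook, assuming the standard VDSM hook layout where executables under /usr/libexec/vdsm/hooks/<hook-name>/ are run at the corresponding hook point; the file name 50_delay_caps is made up for illustration:

```shell
#!/bin/bash
# Hypothetical reproducer hook (sketch, not a fix): delay each host
# capabilities refresh by a random 1-15 seconds to perturb the timing
# of the Engine threads, as suggested above.
# Assumed install path: /usr/libexec/vdsm/hooks/before_get_caps/50_delay_caps
# (must be executable, e.g. chmod 0755).
DELAY=$(( (RANDOM % 15) + 1 ))
sleep "$DELAY"
```

Remember to remove the hook afterwards, since it slows down every host monitoring cycle.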
(In reply to Dominik Holler from comment #7)
> Just from reading the source code, I expect that bug 1703112 will be
> reproducible by creating a before_get_caps hook on the host.
> The hook should just sleep for a short time between 1 to 15 seconds.
> I hope this would change the timing of the threads in Engine in a way
> that triggers the bug.

Maybe a high VdsRefreshRate would work, too.

(Originally by Dominik Holler)
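The VdsRefreshRate option can be inspected and changed with engine-config; a sketch, where the value 2 is only an example interval and the best value for triggering the race is an open question:

```shell
# Show the current host refresh interval (seconds).
engine-config -g VdsRefreshRate

# Change it, then restart the engine for the change to take effect.
engine-config -s VdsRefreshRate=2
systemctl restart ovirt-engine
```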
4.3.4.1 is available for QE (Originally by Dominik Holler)
Verified on - 4.3.4.1-0.1.el7 (Originally by Michael Burman)
*** Bug 1717356 has been marked as a duplicate of this bug. ***
Copy verified from https://bugzilla.redhat.com/show_bug.cgi?id=1703112#c12
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:1566