Description of problem:

Even with the fix for BZ1905108, engine logs still show duplicate device addresses during simultaneous hotplug of virtio-scsi disks.

1. At 07:50, the VM had 6 disks:

2021-07-30 07:50:26,644+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DumpXmlsVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-96) [] FINISH, DumpXmlsVDSCommand, return: {863b9986-37cc-4f6a-a583-907774cd2f4f=<domain type='kvm' id='12' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>ost2-c6blc-worker-1-2vbpq</name>
  <uuid>863b9986-37cc-4f6a-a583-907774cd2f4f</uuid>
  ....
  <disk type='block' device='disk' snapshot='no'>
    <serial>7ece7407-754a-4d80-9733-b2dfb10ca692</serial>
    <address type='drive' controller='0' bus='0' target='0' unit='0'/>
  </disk>
  <disk type='block' device='disk' snapshot='no'>
    <serial>f3bf8cbf-4122-427f-9d3a-96883461c415</serial>
    <address type='drive' controller='0' bus='0' target='0' unit='1'/>
  </disk>
  <disk type='block' device='disk' snapshot='no'>
    <serial>eebf8ca0-c3cf-452e-8f01-200bddcb69d4</serial>
    <address type='drive' controller='0' bus='0' target='0' unit='5'/>
  </disk>
  <disk type='block' device='disk' snapshot='no'>
    <serial>1b43b270-1c5d-432e-80a1-c1ca07e3522b</serial>
    <address type='drive' controller='0' bus='0' target='0' unit='3'/>
  </disk>
  <disk type='block' device='disk' snapshot='no'>
    <serial>34f1d6e1-e9c0-493c-9009-340764473212</serial>
    <address type='drive' controller='0' bus='0' target='0' unit='2'/>
  </disk>
  <disk type='block' device='disk' snapshot='no'>
    <serial>94535d43-af59-411f-b831-6dca256a603f</serial>
    <address type='drive' controller='0' bus='0' target='0' unit='4'/>
  </disk>

Let's name them for clarity:

Disk A = 7ece7407-754a-4d80-9733-b2dfb10ca692 (unit 0)
Disk B = f3bf8cbf-4122-427f-9d3a-96883461c415 (unit 1)
Disk C = eebf8ca0-c3cf-452e-8f01-200bddcb69d4 (unit 5)
Disk D = 1b43b270-1c5d-432e-80a1-c1ca07e3522b (unit 3)
Disk E = 34f1d6e1-e9c0-493c-9009-340764473212 (unit 2)
Disk F = 94535d43-af59-411f-b831-6dca256a603f (unit 4)

2. Request to unplug 3 of them, disks D, E and F (the last 3):

2021-07-30 07:53:57,880+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HotUnPlugDiskVDSCommand] (default task-17773) [2e1bbf53-cf7e-4595-b701-6b9bac057365] Disk hot-unplug: <?xml version="1.0" encoding="UTF-8"?><hotunplug>
  <devices>
    <disk>
      <alias name="ua-94535d43-af59-411f-b831-6dca256a603f"/>
    </disk>
  </devices>
</hotunplug>

2021-07-30 07:53:57,886+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HotUnPlugDiskVDSCommand] (default task-17775) [531c6f8a-023e-42a0-aaaa-3ce4827e31d9] Disk hot-unplug: <?xml version="1.0" encoding="UTF-8"?><hotunplug>
  <devices>
    <disk>
      <alias name="ua-1b43b270-1c5d-432e-80a1-c1ca07e3522b"/>
    </disk>
  </devices>
</hotunplug>

2021-07-30 07:53:57,893+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HotUnPlugDiskVDSCommand] (default task-17765) [2756bc54-3dd9-495e-9f19-c13317c0b4ac] Disk hot-unplug: <?xml version="1.0" encoding="UTF-8"?><hotunplug>
  <devices>
    <disk>
      <alias name="ua-34f1d6e1-e9c0-493c-9009-340764473212"/>
    </disk>
  </devices>
</hotunplug>

3. All 3 unplugged OK:

2021-07-30 07:53:58,354+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-17773) [2e1bbf53-cf7e-4595-b701-6b9bac057365] EVENT_ID: USER_DETACH_DISK_FROM_VM(2,018), Disk pvc-e51fe98b-bce9-48e2-85b5-c5d057baef3e was successfully detached from VM ost2-c6blc-worker-1-2vbpq by ostadm@internal.
2021-07-30 07:53:58,673+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-17765) [2756bc54-3dd9-495e-9f19-c13317c0b4ac] EVENT_ID: USER_DETACH_DISK_FROM_VM(2,018), Disk pvc-f5d5e278-bcba-4a78-9743-6e82d6e07c5a was successfully detached from VM ost2-c6blc-worker-1-2vbpq by ostadm@internal.

2021-07-30 07:53:58,994+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-17775) [531c6f8a-023e-42a0-aaaa-3ce4827e31d9] EVENT_ID: USER_DETACH_DISK_FROM_VM(2,018), Disk pvc-88c1ab75-3416-4672-8ad8-35e2b36f0338 was successfully detached from VM ost2-c6blc-worker-1-2vbpq by ostadm@internal.

4. Hotplug them again: F, E and D:

2021-07-30 07:54:04,406+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-17775) [31ffe761-b28f-4285-975e-c58b325210a8] Disk hot-plug: <?xml version="1.0" encoding="UTF-8"?><hotplug>
  <address bus="0" controller="0" unit="2" type="drive" target="0"/>
  <serial>94535d43-af59-411f-b831-6dca256a603f</serial>

2021-07-30 07:54:04,442+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-17765) [99b7d303-d964-47df-9842-3ca7625479ef] Disk hot-plug: <?xml version="1.0" encoding="UTF-8"?><hotplug>
  <address bus="0" controller="0" unit="2" type="drive" target="0"/>
  <serial>34f1d6e1-e9c0-493c-9009-340764473212</serial>

2021-07-30 07:54:04,458+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-17776) [5e1976cb-467a-4bd6-a2a5-8d1fa23deeab] Disk hot-plug: <?xml version="1.0" encoding="UTF-8"?><hotplug>
  <address bus="0" controller="0" unit="3" type="drive" target="0"/>
  <serial>1b43b270-1c5d-432e-80a1-c1ca07e3522b</serial>

So within ~50ms we have 3 hotplug requests, and the engine generated the same address (unit="2") for two of them (F and E), while D got the unit bumped to 3.

5. The first and the last hotplugs (F and D) succeeded:

2021-07-30 07:54:05,567+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-17775) [31ffe761-b28f-4285-975e-c58b325210a8] EVENT_ID: USER_ATTACH_DISK_TO_VM(2,016), Disk pvc-e51fe98b-bce9-48e2-85b5-c5d057baef3e was successfully attached to VM ost2-c6blc-worker-1-2vbpq by ostadm@internal.

2021-07-30 07:54:05,580+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-17776) [5e1976cb-467a-4bd6-a2a5-8d1fa23deeab] EVENT_ID: USER_ATTACH_DISK_TO_VM(2,016), Disk pvc-88c1ab75-3416-4672-8ad8-35e2b36f0338 was successfully attached to VM ost2-c6blc-worker-1-2vbpq by ostadm@internal.

6. The one in the middle (E) used the same address as the first (F), and it failed as expected:

2021-07-30 07:54:05,795+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-17765) [99b7d303-d964-47df-9842-3ca7625479ef] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM s5cr060 command HotPlugDiskVDS failed: Requested operation is not valid: Domain already contains a disk with that address

7. Another request comes in to plug the failed disk (E), and it gets the unit bumped too (unit="4"):

2021-07-30 07:54:06,000+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-17776) [7861647f-3da9-45e2-ae37-8ab63adcb0d4] Disk hot-plug: <?xml version="1.0" encoding="UTF-8"?><hotplug>
  <address bus="0" controller="0" unit="4" type="drive" target="0"/>
  <serial>34f1d6e1-e9c0-493c-9009-340764473212</serial>

This works, as expected:
2021-07-30 07:54:06,702+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-17776) [7861647f-3da9-45e2-ae37-8ab63adcb0d4] EVENT_ID: USER_ATTACH_DISK_TO_VM(2,016), Disk pvc-f5d5e278-bcba-4a78-9743-6e82d6e07c5a was successfully attached to VM ost2-c6blc-worker-1-2vbpq by ostadm@internal.

Version-Release number of selected component (if applicable):
rhvm-4.4.5.11-0.1.el8ev.noarch

How reproducible:
0% on 4.4.7, tried 2 ways:
- In the admin portal, select multiple disks and activate/deactivate them simultaneously
- python sdk doing 20 hotplugs in a loop (sketched below)
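For reference, the SDK attempt was along these lines. This is a minimal sketch using ovirt-engine-sdk-python (ovirtsdk4); the engine URL, credentials and VM name are placeholders, not values from this case:

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details, adjust for the target engine.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='redacted',
    ca_file='ca.pem',
)

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=myvm')[0]
attachments_service = vms_service.vm_service(vm.id).disk_attachments_service()

# Deactivating a disk attachment triggers a hot-unplug and activating it
# triggers a hot-plug. In real use you would exclude the OS disk; the
# sketch simply cycles every attachment 20 times.
for _ in range(20):
    for attachment in attachments_service.list():
        svc = attachments_service.attachment_service(attachment.id)
        svc.update(types.DiskAttachment(active=False))
        svc.update(types.DiskAttachment(active=True))

connection.close()

Activating/deactivating attachments this way is the same operation the admin portal performs in the first method, just driven from the API.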
(In reply to Germano Veit Michel from comment #0)
> How reproducible:
> 0% on 4.4.7, tried 2 ways:
> - In admin portal, select multiple disks and activate/deactivate
> simultaneously
> - python sdk doing 20 hotplugs in a loop.

Just tried with connections=20, pipeline=20, wait=False to increase parallelism with async requests; it still doesn't reproduce.
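For the record, those parameters map to ovirtsdk4's asynchronous mode, roughly as in the sketch below (same placeholder connection details as the loop above; connections/pipeline are Connection options, and wait=False should make service calls return futures):

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# connections/pipeline spread requests over several HTTP connections and
# pipeline them, so the engine sees the hot-plugs nearly simultaneously.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='redacted',
    ca_file='ca.pem',
    connections=20,
    pipeline=20,
)

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=myvm')[0]
attachments_service = vms_service.vm_service(vm.id).disk_attachments_service()

# With wait=False each update returns immediately with a future, so all
# hot-plug requests are in flight before any of them completes.
futures = [
    attachments_service.attachment_service(a.id)
                       .update(types.DiskAttachment(active=True), wait=False)
    for a in attachments_service.list()
]
for future in futures:
    future.wait()  # collect the results; raises on failure

connection.close()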
Yeah, it should no longer happen since 4.4.6, which includes the fix for bz 1927243. There is no longer any parallel assignment of addresses, and the devices monitoring should not interfere with those assignments, so I hope this fixes the issue once and for all.
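To make the race concrete, here is a toy model in Python (an illustration only, not the engine's actual Java code): two concurrent hot-plugs that each compute the lowest free unit from the same snapshot can pick the same unit, which is exactly what produces libvirt's "Domain already contains a disk with that address" error; serializing the compute-and-reserve step closes the window:

import threading

used_units = {0, 1}            # units already taken on the controller
assign_lock = threading.Lock()

def next_free_unit_racy():
    # Two threads can both read the same snapshot of used_units here and
    # return the same unit before either one records its choice.
    unit = min(set(range(16)) - used_units)   # 16 is arbitrary for the demo
    used_units.add(unit)
    return unit

def next_free_unit_serialized():
    # Holding a lock across "compute + reserve" makes the assignment
    # atomic, so concurrent hot-plugs get distinct units.
    with assign_lock:
        unit = min(set(range(16)) - used_units)
        used_units.add(unit)
        return unit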
(In reply to Germano Veit Michel from comment #0)
> How reproducible:
> 0% on 4.4.7, tried 2 ways:
> - In admin portal, select multiple disks and activate/deactivate
> simultaneously
> - python sdk doing 20 hotplugs in a loop.

In light of the above, and considering that the fix for bz 1927243 is likely to solve it, I'm closing this bug as CURRENTRELEASE on 4.4.7.
(In reply to Arik from comment #4)
> (In reply to Germano Veit Michel from comment #0)
> > How reproducible:
> > 0% on 4.4.7, tried 2 ways:
> > - In admin portal, select multiple disks and activate/deactivate
> > simultaneously
> > - python sdk doing 20 hotplugs in a loop.
> 
> In light of the above and considering that the fix for bz 1927243 is likely
> to solve it, I'm closing it as CURRENTRELEASE on 4.4.7

Thanks, I'm putting together a KCS article collecting all the fixes so we don't raise this bug again :)
Arik, I'm going to re-open this. The customer upgraded to 4.4.7 and it still happens for them, and I still can't reproduce it. So I assume the bug is somehow still there, maybe just harder to hit.

See:

2021-08-04 19:07:53,870+02 INFO [org.ovirt.engine.core.bll.Backend] (ServerService Thread Pool -- 40) [] Running ovirt-engine 4.4.7.7-0.1.el8ev

2021-08-06 06:45:42,883+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-929) [146bf104-e8bb-4b21-8aa0-80958e77d9cb] Disk hot-plug: <?xml version="1.0" encoding="UTF-8"?><hotplug>
...
  <address bus="0" controller="0" unit="1" type="drive" target="0"/>
  <serial>34f1d6e1-e9c0-493c-9009-340764473212</serial>
...

2021-08-06 06:45:43,568+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-932) [f758b383-6534-45f1-9bba-d282d4576472] Disk hot-plug: <?xml version="1.0" encoding="UTF-8"?><hotplug>
...
  <address bus="0" controller="0" unit="1" type="drive" target="0"/>
  <serial>1b43b270-1c5d-432e-80a1-c1ca07e3522b</serial>
...

As above, the addresses are duplicated. The first hotplug succeeded, but the second, with correlation id f758b383-6534-45f1-9bba-d282d4576472, failed as expected:

2021-08-06 06:45:44,402+02 ERROR [org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand] (default task-932) [f758b383-6534-45f1-9bba-d282d4576472] Command 'org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to HotPlugDiskVDS, error = Requested operation is not valid: Domain already contains a disk with that address, code = 45 (Failed with error FailedToPlugDisk and code 45)
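For anyone triaging similar logs, a rough helper that flags the pattern above, i.e. two "Disk hot-plug" entries handing the same bus/controller/unit to different serials. The patterns are derived from the engine.log excerpts in this bug and may need adjusting for other versions; note that reusing an address after a successful unplug is legitimate, so hits still need manual review:

import re
import sys

HOTPLUG_MARK = "Disk hot-plug:"
ADDRESS_RE = re.compile(r'<address bus="(\d+)" controller="(\d+)" unit="(\d+)"')
SERIAL_RE = re.compile(r"<serial>([0-9a-f-]{36})</serial>")

def scan(path):
    """Print hot-plug addresses that were handed to more than one serial."""
    seen = {}            # (bus, controller, unit) -> set of disk serials
    in_hotplug = False
    address = None
    with open(path) as log:
        for line in log:
            if HOTPLUG_MARK in line:
                in_hotplug, address = True, None
            if not in_hotplug:
                continue
            m = ADDRESS_RE.search(line)
            if m:
                address = m.groups()
            m = SERIAL_RE.search(line)
            if m and address:
                seen.setdefault(address, set()).add(m.group(1))
                in_hotplug, address = False, None
    for (bus, ctrl, unit), serials in seen.items():
        if len(serials) > 1:
            print("bus=%s controller=%s unit=%s plugged for: %s"
                  % (bus, ctrl, unit, ", ".join(sorted(serials))))

if __name__ == "__main__":
    scan(sys.argv[1])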
Since this issue is really hard to reproduce, we decided to verify it by running our hotplug regression tests. All tests passed.

Verified on ovirt-engine-4.4.8.4-0.7.el8ev. Moving to 'VERIFIED'.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHV Manager (ovirt-engine) [ovirt-4.4.8]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3460