Bug 1989794 - engine still generates duplicate address for hotplug disks
Summary: engine still generates duplicate address for hotplug disks
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.4.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.4.8
: ---
Assignee: Arik
QA Contact: Evelina Shames
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-08-04 02:42 UTC by Germano Veit Michel
Modified: 2021-09-21 11:09 UTC (History)
CC: 4 users

Fixed In Version: ovirt-engine-4.4.8.4
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-08 14:12:12 UTC
oVirt Team: Virt
Target Upstream Version:
eshames: testing_plan_complete-




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 6240381 0 None None None 2021-08-04 22:15:54 UTC
Red Hat Product Errata RHBA-2021:3460 0 None None None 2021-09-08 14:12:16 UTC
oVirt gerrit 116135 0 master MERGED core: serialize executions of AttachDiskToVmCommand 2021-08-09 13:21:21 UTC

Description Germano Veit Michel 2021-08-04 02:42:57 UTC
Description of problem:

Even with the fix for BZ1905108, engine logs still show duplicate device address during simultaneous hotplug of virtio-scsi disks.

1. At 07:50, VM had 6 disks

2021-07-30 07:50:26,644+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DumpXmlsVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-96) [] FINISH, DumpXmlsVDSCommand, return: {863b9986-37cc-4f6a-a583-907774cd2f4f=<domain type='kvm' id='12' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>ost2-c6blc-worker-1-2vbpq</name>
  <uuid>863b9986-37cc-4f6a-a583-907774cd2f4f</uuid>
  ....
    <disk type='block' device='disk' snapshot='no'>
       <serial>7ece7407-754a-4d80-9733-b2dfb10ca692</serial>
       <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='block' device='disk' snapshot='no'>
      <serial>f3bf8cbf-4122-427f-9d3a-96883461c415</serial>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <disk type='block' device='disk' snapshot='no'>
      <serial>eebf8ca0-c3cf-452e-8f01-200bddcb69d4</serial>
      <address type='drive' controller='0' bus='0' target='0' unit='5'/>
    </disk>
    <disk type='block' device='disk' snapshot='no'>
      <serial>1b43b270-1c5d-432e-80a1-c1ca07e3522b</serial>
      <address type='drive' controller='0' bus='0' target='0' unit='3'/>
    </disk>
    <disk type='block' device='disk' snapshot='no'>
      <serial>34f1d6e1-e9c0-493c-9009-340764473212</serial>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>
    <disk type='block' device='disk' snapshot='no'>
      <serial>94535d43-af59-411f-b831-6dca256a603f</serial>
      <address type='drive' controller='0' bus='0' target='0' unit='4'/>
    </disk>
    
Let's name them for clarity purposes
Disk A = 7ece7407-754a-4d80-9733-b2dfb10ca692
Disk B = f3bf8cbf-4122-427f-9d3a-96883461c415
Disk C = eebf8ca0-c3cf-452e-8f01-200bddcb69d4
Disk D = 1b43b270-1c5d-432e-80a1-c1ca07e3522b
Disk E = 34f1d6e1-e9c0-493c-9009-340764473212
Disk F = 94535d43-af59-411f-b831-6dca256a603f

2. Request to unplug 3 of them, disks D, E and F (last 3)

2021-07-30 07:53:57,880+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotUnPlugDiskVDSCommand] (default task-17773) [2e1bbf53-cf7e-4595-b701-6b9bac057365] Disk hot-unplug: <?xml version="1.0" encoding="UTF-8"?><hotunplug>
  <devices>
    <disk>
      <alias name="ua-94535d43-af59-411f-b831-6dca256a603f"/>
    </disk>
  </devices>
</hotunplug>

2021-07-30 07:53:57,886+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotUnPlugDiskVDSCommand] (default task-17775) [531c6f8a-023e-42a0-aaaa-3ce4827e31d9] Disk hot-unplug: <?xml version="1.0" encoding="UTF-8"?><hotunplug>
  <devices>
    <disk>
      <alias name="ua-1b43b270-1c5d-432e-80a1-c1ca07e3522b"/>
    </disk>
  </devices>
</hotunplug>

2021-07-30 07:53:57,893+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotUnPlugDiskVDSCommand] (default task-17765) [2756bc54-3dd9-495e-9f19-c13317c0b4ac] Disk hot-unplug: <?xml version="1.0" encoding="UTF-8"?><hotunplug>
  <devices>
    <disk>
      <alias name="ua-34f1d6e1-e9c0-493c-9009-340764473212"/>
    </disk>
  </devices>
</hotunplug>

3. All 3 unplugged ok

2021-07-30 07:53:58,354+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-17773) [2e1bbf53-cf7e-4595-b701-6b9bac057365] EVENT_ID: USER_DETACH_DISK_FROM_VM(2,018), Disk pvc-e51fe98b-bce9-48e2-85b5-c5d057baef3e was successfully detached from VM ost2-c6blc-worker-1-2vbpq by ostadm@internal.
2021-07-30 07:53:58,673+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-17765) [2756bc54-3dd9-495e-9f19-c13317c0b4ac] EVENT_ID: USER_DETACH_DISK_FROM_VM(2,018), Disk pvc-f5d5e278-bcba-4a78-9743-6e82d6e07c5a was successfully detached from VM ost2-c6blc-worker-1-2vbpq by ostadm@internal.
2021-07-30 07:53:58,994+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-17775) [531c6f8a-023e-42a0-aaaa-3ce4827e31d9] EVENT_ID: USER_DETACH_DISK_FROM_VM(2,018), Disk pvc-88c1ab75-3416-4672-8ad8-35e2b36f0338 was successfully detached from VM ost2-c6blc-worker-1-2vbpq by ostadm@internal.

4. Hotplug again: F, E and D

2021-07-30 07:54:04,406+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-17775) [31ffe761-b28f-4285-975e-c58b325210a8] Disk hot-plug: <?xml version="1.0" encoding="UTF-8"?><hotplug>
      <address bus="0" controller="0" unit="2" type="drive" target="0"/>
      <serial>94535d43-af59-411f-b831-6dca256a603f</serial>
 
2021-07-30 07:54:04,442+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-17765) [99b7d303-d964-47df-9842-3ca7625479ef] Disk hot-plug: <?xml version="1.0" encoding="UTF-8"?><hotplug>
      <address bus="0" controller="0" unit="2" type="drive" target="0"/>
      <serial>34f1d6e1-e9c0-493c-9009-340764473212</serial>
      
2021-07-30 07:54:04,458+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-17776) [5e1976cb-467a-4bd6-a2a5-8d1fa23deeab] Disk hot-plug: <?xml version="1.0" encoding="UTF-8"?><hotplug>
      <address bus="0" controller="0" unit="3" type="drive" target="0"/>
      <serial>1b43b270-1c5d-432e-80a1-c1ca07e3522b</serial>
      
So within ~50ms we have 3 hotplug requests; the engine generated the same address for 2 of them (E and F), while only D got its unit bumped.
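An illustrative model of the race (not actual ovirt-engine code, which is Java): each hotplug picks the lowest free unit on the virtio-scsi controller, but if two commands both read the set of used units before either one persists its allocation, both pick the same unit. Replaying the interleaving from the logs above:

```python
# Illustrative model of the duplicate-address race; `lowest_free_unit`
# is a hypothetical stand-in for the engine's address allocation.

def lowest_free_unit(used):
    """Return the lowest unit number not already in `used`."""
    unit = 0
    while unit in used:
        unit += 1
    return unit

# After unplugging D, E and F, disks A, B, C occupy units 0, 1, 5.
used_units = {0, 1, 5}

# Interleaving seen in the logs: two commands read the used set
# before either records its allocation.
unit_for_f = lowest_free_unit(used_units)   # reads {0, 1, 5} -> 2
unit_for_e = lowest_free_unit(used_units)   # also reads {0, 1, 5} -> 2 (duplicate)
used_units.add(unit_for_f)

# The third command runs after F's allocation is visible, so it is bumped.
unit_for_d = lowest_free_unit(used_units)   # reads {0, 1, 2, 5} -> 3

print(unit_for_f, unit_for_e, unit_for_d)   # 2 2 3, matching the hotplug XML above
```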

5. The first and the last hotplugs succeeded

2021-07-30 07:54:05,567+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-17775) [31ffe761-b28f-4285-975e-c58b325210a8] EVENT_ID: USER_ATTACH_DISK_TO_VM(2,016), Disk pvc-e51fe98b-bce9-48e2-85b5-c5d057baef3e was successfully attached to VM ost2-c6blc-worker-1-2vbpq by ostadm@internal.

2021-07-30 07:54:05,580+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-17776) [5e1976cb-467a-4bd6-a2a5-8d1fa23deeab] EVENT_ID: USER_ATTACH_DISK_TO_VM(2,016), Disk pvc-88c1ab75-3416-4672-8ad8-35e2b36f0338 was successfully attached to VM ost2-c6blc-worker-1-2vbpq by ostadm@internal.

6. The one in the middle used the same address as the first, and it failed as expected:

2021-07-30 07:54:05,795+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-17765) [99b7d303-d964-47df-9842-3ca7625479ef] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM s5cr060 command HotPlugDiskVDS failed: Requested operation is not valid: Domain already contains a disk with that address

7. Another request comes in to plug the failed disk, and it gets the unit bumped too

2021-07-30 07:54:06,000+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-17776) [7861647f-3da9-45e2-ae37-8ab63adcb0d4] Disk hot-plug: <?xml version="1.0" encoding="UTF-8"?><hotplug>
      <address bus="0" controller="0" unit="4" type="drive" target="0"/>
      <serial>34f1d6e1-e9c0-493c-9009-340764473212</serial>

Works, as expected.
      
2021-07-30 07:54:06,702+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-17776) [7861647f-3da9-45e2-ae37-8ab63adcb0d4] EVENT_ID: USER_ATTACH_DISK_TO_VM(2,016), Disk pvc-f5d5e278-bcba-4a78-9743-6e82d6e07c5a was successfully attached to VM ost2-c6blc-worker-1-2vbpq by ostadm@internal.

Version-Release number of selected component (if applicable):
rhvm-4.4.5.11-0.1.el8ev.noarch

How reproducible:
0% on 4.4.7, tried 2 ways:
- In admin portal, select multiple disks and activate/deactivate simultaneously
- python sdk doing 20 hotplugs in a loop.
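A sketch of the second reproduction attempt, firing many hotplugs in parallel so the engine-side requests overlap. The harness below is hypothetical: `attach_disk` is a stub, and a real run would replace it with the oVirt Python SDK's disk-attachment call against a live engine.

```python
# Hypothetical parallel-hotplug reproducer harness; `attach_disk` is a
# stand-in for the real SDK hotplug call against a running engine.
from concurrent.futures import ThreadPoolExecutor

def attach_disk(disk_id):
    # Stub: a real reproducer would hot-plug the disk here via the SDK.
    return disk_id

disk_ids = [f"disk-{i}" for i in range(20)]

# Submit all 20 hotplugs at once so the requests overlap in the engine.
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(attach_disk, disk_ids))

print(len(results))  # 20
```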

Comment 1 Germano Veit Michel 2021-08-04 03:02:56 UTC
(In reply to Germano Veit Michel from comment #0)
> How reproducible:
> 0% on 4.4.7, tried 2 ways:
> - In admin portal, select multiple disks and activate/deactivate
> simultaneously
> - python sdk doing 20 hotplugs in a loop.

Just tried with connections=20, pipeline=20, wait=False to increase parallelism with async; it still doesn't reproduce.

Comment 3 Arik 2021-08-04 10:05:55 UTC
Yeah, it should no longer happen since 4.4.6, which includes the fix for bz 1927243
Now there is no parallel assignment of addresses and the devices monitoring should not interfere with those assignments, so I hope it fixes this issue once and for all
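The idea behind serializing the assignments can be sketched as follows (illustrative only; the actual change serializes AttachDiskToVmCommand executions inside ovirt-engine, in Java). Holding a per-VM lock across the read-and-allocate step makes each allocation visible to the next request:

```python
# Illustrative sketch of serialized per-VM address allocation; all names
# here are hypothetical, not ovirt-engine identifiers.
import threading
from collections import defaultdict

vm_locks = defaultdict(threading.Lock)
used_units = {"vm-1": {0, 1, 5}}

def allocate_unit(vm_id):
    with vm_locks[vm_id]:          # serialize allocations per VM
        used = used_units[vm_id]
        unit = 0
        while unit in used:
            unit += 1
        used.add(unit)             # allocation recorded before the lock is released
        return unit

# Even concurrent callers now see the previous allocation: no duplicates.
u1 = allocate_unit("vm-1")
u2 = allocate_unit("vm-1")
print(u1, u2)  # 2 3
```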

Comment 4 Arik 2021-08-04 10:10:14 UTC
(In reply to Germano Veit Michel from comment #0)
> How reproducible:
> 0% on 4.4.7, tried 2 ways:
> - In admin portal, select multiple disks and activate/deactivate
> simultaneously
> - python sdk doing 20 hotplugs in a loop.

In light of the above and considering that the fix for bz 1927243 is likely to solve it, I'm closing it as CURRENTRELEASE on 4.4.7

Comment 5 Germano Veit Michel 2021-08-04 22:15:55 UTC
(In reply to Arik from comment #4)
> (In reply to Germano Veit Michel from comment #0)
> > How reproducible:
> > 0% on 4.4.7, tried 2 ways:
> > - In admin portal, select multiple disks and activate/deactivate
> > simultaneously
> > - python sdk doing 20 hotplugs in a loop.
> 
> In light of the above and considering that the fix for bz 1927243 is likely
> to solve it, I'm closing it as CURRENTRELEASE on 4.4.7

Thanks, doing a KCS to put all the fixes together so we don't raise this bug again :)

Comment 6 Germano Veit Michel 2021-08-06 06:15:18 UTC
Arik, I'm going to re-open this. Customer upgraded to 4.4.7 and it still happens for them, and I still can't reproduce.

So I assume the bug is somehow still there, maybe just harder to hit. See:

2021-08-04 19:07:53,870+02 INFO  [org.ovirt.engine.core.bll.Backend] (ServerService Thread Pool -- 40) [] Running ovirt-engine 4.4.7.7-0.1.el8ev

2021-08-06 06:45:42,883+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-929) [146bf104-e8bb-4b21-8aa0-80958e77d9cb] Disk hot-plug: <?xml version="1.0" encoding="UTF-8"?><hotplug>
      ...
      <address bus="0" controller="0" unit="1" type="drive" target="0"/>
      <serial>34f1d6e1-e9c0-493c-9009-340764473212</serial>
      ...
      
      
2021-08-06 06:45:43,568+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-932) [f758b383-6534-45f1-9bba-d282d4576472] Disk hot-plug: <?xml version="1.0" encoding="UTF-8"?><hotplug>
      ...
      <address bus="0" controller="0" unit="1" type="drive" target="0"/>
      <serial>1b43b270-1c5d-432e-80a1-c1ca07e3522b</serial>
      ...

As above, the addresses are duplicated; the first succeeded but the second, with correlation id f758b383-6534-45f1-9bba-d282d4576472, failed as expected:

2021-08-06 06:45:44,402+02 ERROR [org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand] (default task-932) [f758b383-6534-45f1-9bba-d282d4576472] Command 'org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to HotPlugDiskVDS, error = Requested operation is not valid: Domain already contains a disk with that address, code = 45 (Failed with error FailedToPlugDisk and code 45)

Comment 17 Evelina Shames 2021-08-19 12:23:56 UTC
Since this issue is really hard to reproduce, we decided to verify it by running our hotplug regression tests: All tests passed.
Verified on ovirt-engine-4.4.8.4-0.7.el8ev.

Moving to 'VERIFIED'.

Comment 22 errata-xmlrpc 2021-09-08 14:12:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV Manager (ovirt-engine) [ovirt-4.4.8]), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3460

