Bug 1989794

Summary: engine still generates duplicate address for hotplug disks
Product: Red Hat Enterprise Virtualization Manager
Reporter: Germano Veit Michel <gveitmic>
Component: ovirt-engine
Assignee: Arik <ahadas>
Status: CLOSED ERRATA
QA Contact: Evelina Shames <eshames>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.4.5
CC: ahadas, eshames, mavital, unixbasis
Target Milestone: ovirt-4.4.8
Keywords: Reopened
Target Release: ---
Flags: eshames: testing_plan_complete-
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: ovirt-engine-4.4.8.4
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-09-08 14:12:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Germano Veit Michel 2021-08-04 02:42:57 UTC
Description of problem:

Even with the fix for BZ1905108, the engine logs still show duplicate device addresses during simultaneous hotplug of virtio-scsi disks.

1. At 07:50, VM had 6 disks

2021-07-30 07:50:26,644+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DumpXmlsVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-96) [] FINISH, DumpXmlsVDSCommand, return: {863b9986-37cc-4f6a-a583-907774cd2f4f=<domain type='kvm' id='12' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>ost2-c6blc-worker-1-2vbpq</name>
  <uuid>863b9986-37cc-4f6a-a583-907774cd2f4f</uuid>
  ....
    <disk type='block' device='disk' snapshot='no'>
       <serial>7ece7407-754a-4d80-9733-b2dfb10ca692</serial>
       <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='block' device='disk' snapshot='no'>
      <serial>f3bf8cbf-4122-427f-9d3a-96883461c415</serial>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <disk type='block' device='disk' snapshot='no'>
      <serial>eebf8ca0-c3cf-452e-8f01-200bddcb69d4</serial>
      <address type='drive' controller='0' bus='0' target='0' unit='5'/>
    </disk>
    <disk type='block' device='disk' snapshot='no'>
      <serial>1b43b270-1c5d-432e-80a1-c1ca07e3522b</serial>
      <address type='drive' controller='0' bus='0' target='0' unit='3'/>
    </disk>
    <disk type='block' device='disk' snapshot='no'>
      <serial>34f1d6e1-e9c0-493c-9009-340764473212</serial>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>
    <disk type='block' device='disk' snapshot='no'>
      <serial>94535d43-af59-411f-b831-6dca256a603f</serial>
      <address type='drive' controller='0' bus='0' target='0' unit='4'/>
    </disk>
    
Let's name them for clarity:
Disk A = 7ece7407-754a-4d80-9733-b2dfb10ca692
Disk B = f3bf8cbf-4122-427f-9d3a-96883461c415
Disk C = eebf8ca0-c3cf-452e-8f01-200bddcb69d4
Disk D = 1b43b270-1c5d-432e-80a1-c1ca07e3522b
Disk E = 34f1d6e1-e9c0-493c-9009-340764473212
Disk F = 94535d43-af59-411f-b831-6dca256a603f

2. Request to unplug 3 of them, disks D, E and F (last 3)

2021-07-30 07:53:57,880+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotUnPlugDiskVDSCommand] (default task-17773) [2e1bbf53-cf7e-4595-b701-6b9bac057365] Disk hot-unplug: <?xml version="1.0" encoding="UTF-8"?><hotunplug>
  <devices>
    <disk>
      <alias name="ua-94535d43-af59-411f-b831-6dca256a603f"/>
    </disk>
  </devices>
</hotunplug>

2021-07-30 07:53:57,886+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotUnPlugDiskVDSCommand] (default task-17775) [531c6f8a-023e-42a0-aaaa-3ce4827e31d9] Disk hot-unplug: <?xml version="1.0" encoding="UTF-8"?><hotunplug>
  <devices>
    <disk>
      <alias name="ua-1b43b270-1c5d-432e-80a1-c1ca07e3522b"/>
    </disk>
  </devices>
</hotunplug>

2021-07-30 07:53:57,893+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotUnPlugDiskVDSCommand] (default task-17765) [2756bc54-3dd9-495e-9f19-c13317c0b4ac] Disk hot-unplug: <?xml version="1.0" encoding="UTF-8"?><hotunplug>
  <devices>
    <disk>
      <alias name="ua-34f1d6e1-e9c0-493c-9009-340764473212"/>
    </disk>
  </devices>
</hotunplug>

3. All 3 unplugged ok

2021-07-30 07:53:58,354+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-17773) [2e1bbf53-cf7e-4595-b701-6b9bac057365] EVENT_ID: USER_DETACH_DISK_FROM_VM(2,018), Disk pvc-e51fe98b-bce9-48e2-85b5-c5d057baef3e was successfully detached from VM ost2-c6blc-worker-1-2vbpq by ostadm@internal.
2021-07-30 07:53:58,673+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-17765) [2756bc54-3dd9-495e-9f19-c13317c0b4ac] EVENT_ID: USER_DETACH_DISK_FROM_VM(2,018), Disk pvc-f5d5e278-bcba-4a78-9743-6e82d6e07c5a was successfully detached from VM ost2-c6blc-worker-1-2vbpq by ostadm@internal.
2021-07-30 07:53:58,994+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-17775) [531c6f8a-023e-42a0-aaaa-3ce4827e31d9] EVENT_ID: USER_DETACH_DISK_FROM_VM(2,018), Disk pvc-88c1ab75-3416-4672-8ad8-35e2b36f0338 was successfully detached from VM ost2-c6blc-worker-1-2vbpq by ostadm@internal.

4. Hotplug again, E, F and D

2021-07-30 07:54:04,406+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-17775) [31ffe761-b28f-4285-975e-c58b325210a8] Disk hot-plug: <?xml version="1.0" encoding="UTF-8"?><hotplug>
      <address bus="0" controller="0" unit="2" type="drive" target="0"/>
      <serial>94535d43-af59-411f-b831-6dca256a603f</serial>
 
2021-07-30 07:54:04,442+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-17765) [99b7d303-d964-47df-9842-3ca7625479ef] Disk hot-plug: <?xml version="1.0" encoding="UTF-8"?><hotplug>
      <address bus="0" controller="0" unit="2" type="drive" target="0"/>
      <serial>34f1d6e1-e9c0-493c-9009-340764473212</serial>
      
2021-07-30 07:54:04,458+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-17776) [5e1976cb-467a-4bd6-a2a5-8d1fa23deeab] Disk hot-plug: <?xml version="1.0" encoding="UTF-8"?><hotplug>
      <address bus="0" controller="0" unit="3" type="drive" target="0"/>
      <serial>1b43b270-1c5d-432e-80a1-c1ca07e3522b</serial>
      
So within ~50ms we have 3 hotplug requests; the engine generated the same address for 2 of them (E and F), while D's address got bumped.
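
To illustrate the race (this is not engine code, just a minimal Python sketch of the mechanism described above): each hotplug request reads the set of units currently in use and picks the lowest free one, so two requests running within the same window can both compute unit 2.

import threading
import time

used_units = {0, 1, 5}   # units still occupied after disks D, E and F were unplugged
assigned = []            # addresses handed out by the racy assignment

def assign_next_unit():
    snapshot = set(used_units)   # read the current state without any serialization
    unit = 0
    while unit in snapshot:      # pick the lowest free unit
        unit += 1
    time.sleep(0.05)             # widen the race window for the demo
    assigned.append(unit)
    used_units.add(unit)

threads = [threading.Thread(target=assign_next_unit) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(assigned)   # typically [2, 2]: both requests picked the same "free" unit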

5. The first and the last hotplugs succeeded

2021-07-30 07:54:05,567+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-17775) [31ffe761-b28f-4285-975e-c58b325210a8] EVENT_ID: USER_ATTACH_DISK_TO_VM(2,016), Disk pvc-e51fe98b-bce9-48e2-85b5-c5d057baef3e was successfully attached to VM ost2-c6blc-worker-1-2vbpq by ostadm@internal.

2021-07-30 07:54:05,580+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-17776) [5e1976cb-467a-4bd6-a2a5-8d1fa23deeab] EVENT_ID: USER_ATTACH_DISK_TO_VM(2,016), Disk pvc-88c1ab75-3416-4672-8ad8-35e2b36f0338 was successfully attached to VM ost2-c6blc-worker-1-2vbpq by ostadm@internal.

6. The one in the middle used the same address as the first, and it failed as expected:

2021-07-30 07:54:05,795+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-17765) [99b7d303-d964-47df-9842-3ca7625479ef] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM s5cr060 command HotPlugDiskVDS failed: Requested operation is not valid: Domain already contains a disk with that address

7. Another request comes in to plug the failed disk, and this time the unit gets bumped too

2021-07-30 07:54:06,000+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-17776) [7861647f-3da9-45e2-ae37-8ab63adcb0d4] Disk hot-plug: <?xml version="1.0" encoding="UTF-8"?><hotplug>
      <address bus="0" controller="0" unit="4" type="drive" target="0"/>
      <serial>34f1d6e1-e9c0-493c-9009-340764473212</serial>

Works, as expected.
      
2021-07-30 07:54:06,702+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-17776) [7861647f-3da9-45e2-ae37-8ab63adcb0d4] EVENT_ID: USER_ATTACH_DISK_TO_VM(2,016), Disk pvc-f5d5e278-bcba-4a78-9743-6e82d6e07c5a was successfully attached to VM ost2-c6blc-worker-1-2vbpq by ostadm@internal.

Version-Release number of selected component (if applicable):
rhvm-4.4.5.11-0.1.el8ev.noarch

How reproducible:
0% on 4.4.7, tried 2 ways:
- In the admin portal, select multiple disks and activate/deactivate them simultaneously
- Python SDK doing 20 hotplugs in a loop (sketched below)
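
For reference, a minimal sketch of such a reproduction attempt, assuming the Python SDK (ovirtsdk4); the engine URL, credentials and CA file below are placeholders, and the VM name is the one from the logs above:

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholders: adjust URL, credentials and CA file for the environment.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    ca_file='ca.pem',
)
try:
    vms_service = connection.system_service().vms_service()
    vm = vms_service.list(search='name=ost2-c6blc-worker-1-2vbpq')[0]
    attachments_service = vms_service.vm_service(vm.id).disk_attachments_service()

    # 20 hot-unplug/hot-plug cycles of all attached disks
    # (assumes the VM's disks can safely be deactivated).
    for _ in range(20):
        for attachment in attachments_service.list():
            attachments_service.attachment_service(attachment.id).update(
                types.DiskAttachment(active=False))
        for attachment in attachments_service.list():
            attachments_service.attachment_service(attachment.id).update(
                types.DiskAttachment(active=True))
finally:
    connection.close()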

Comment 1 Germano Veit Michel 2021-08-04 03:02:56 UTC
(In reply to Germano Veit Michel from comment #0)
> How reproducible:
> 0% on 4.4.7, tried 2 ways:
> - In admin portal, select multiple disks and activate/deactivate
> simultaneously
> - python sdk doing 20 hotplugs in a loop.

Just tried with connections=20, pipeline=20, wait=False to increase parallelism with async requests (variant sketched below); it still doesn't reproduce.
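
A variant of the previous sketch with those settings (same placeholders), assuming the SDK's async mode where wait=False returns a future; connections/pipeline widen the HTTP parallelism:

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    ca_file='ca.pem',
    connections=20,   # pool of parallel HTTP connections
    pipeline=20,      # pipelined requests per connection
)
try:
    vms_service = connection.system_service().vms_service()
    vm = vms_service.list(search='name=ost2-c6blc-worker-1-2vbpq')[0]
    attachments_service = vms_service.vm_service(vm.id).disk_attachments_service()

    # Deactivate everything first, then fire all the re-activations at once.
    for attachment in attachments_service.list():
        attachments_service.attachment_service(attachment.id).update(
            types.DiskAttachment(active=False))
    futures = [
        attachments_service.attachment_service(attachment.id).update(
            types.DiskAttachment(active=True), wait=False)
        for attachment in attachments_service.list()
    ]
    for future in futures:
        future.wait()   # collect the results (or errors) of the queued hot-plugs
finally:
    connection.close()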

Comment 3 Arik 2021-08-04 10:05:55 UTC
Yeah, it should no longer happen since 4.4.6, which includes the fix for bz 1927243.
There is now no parallel assignment of addresses, and the devices monitoring should not interfere with those assignments, so I hope this fixes the issue once and for all.
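
For illustration only (the engine is Java; this is not its code), a minimal Python sketch of what serializing the assignment means: take a per-VM lock around the pick-the-lowest-free-unit step so concurrent hot-plugs can no longer observe the same snapshot.

import threading

_vm_locks = {}
_vm_locks_guard = threading.Lock()
used_units_per_vm = {}   # vm_id -> set of drive units already assigned

def assign_unit(vm_id):
    with _vm_locks_guard:
        lock = _vm_locks.setdefault(vm_id, threading.Lock())
    with lock:                       # one address assignment at a time per VM
        used = used_units_per_vm.setdefault(vm_id, set())
        unit = 0
        while unit in used:
            unit += 1
        used.add(unit)               # reserve the unit before releasing the lock
        return unit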

Comment 4 Arik 2021-08-04 10:10:14 UTC
(In reply to Germano Veit Michel from comment #0)
> How reproducible:
> 0% on 4.4.7, tried 2 ways:
> - In admin portal, select multiple disks and activate/deactivate
> simultaneously
> - python sdk doing 20 hotplugs in a loop.

In light of the above and considering that the fix for bz 1927243 is likely to solve it, I'm closing it as CURRENTRELEASE on 4.4.7

Comment 5 Germano Veit Michel 2021-08-04 22:15:55 UTC
(In reply to Arik from comment #4)
> (In reply to Germano Veit Michel from comment #0)
> > How reproducible:
> > 0% on 4.4.7, tried 2 ways:
> > - In admin portal, select multiple disks and activate/deactivate
> > simultaneously
> > - python sdk doing 20 hotplugs in a loop.
> 
> In light of the above and considering that the fix for bz 1927243 is likely
> to solve it, I'm closing it as CURRENTRELEASE on 4.4.7

Thanks, I'm writing a KCS article to put all the fixes together so we don't raise this bug again :)

Comment 6 Germano Veit Michel 2021-08-06 06:15:18 UTC
Arik, I'm going to re-open this. The customer upgraded to 4.4.7 and it still happens for them, and I still can't reproduce.

So I assume the bug is somehow still there, maybe just harder to hit. See:

2021-08-04 19:07:53,870+02 INFO  [org.ovirt.engine.core.bll.Backend] (ServerService Thread Pool -- 40) [] Running ovirt-engine 4.4.7.7-0.1.el8ev

2021-08-06 06:45:42,883+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-929) [146bf104-e8bb-4b21-8aa0-80958e77d9cb] Disk hot-plug: <?xml version="1.0" encoding="UTF-8"?><hotplug>
      ...
      <address bus="0" controller="0" unit="1" type="drive" target="0"/>
      <serial>34f1d6e1-e9c0-493c-9009-340764473212</serial>
      ...
      
      
2021-08-06 06:45:43,568+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (default task-932) [f758b383-6534-45f1-9bba-d282d4576472] Disk hot-plug: <?xml version="1.0" encoding="UTF-8"?><hotplug>
      ...
      <address bus="0" controller="0" unit="1" type="drive" target="0"/>
      <serial>1b43b270-1c5d-432e-80a1-c1ca07e3522b</serial>
      ...

As above, the addresses are duplicated; the first hotplug succeeded, but the second, with correlation ID f758b383-6534-45f1-9bba-d282d4576472, failed as expected:

2021-08-06 06:45:44,402+02 ERROR [org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand] (default task-932) [f758b383-6534-45f1-9bba-d282d4576472] Command 'org.ovirt.engine.core.bll.storage.disk.AttachDiskToVmCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to HotPlugDiskVDS, error = Requested operation is not valid: Domain already contains a disk with that address, code = 45 (Failed with error FailedToPlugDisk and code 45)

Comment 17 Evelina Shames 2021-08-19 12:23:56 UTC
Since this issue is really hard to reproduce, we decided to verify it by running our hotplug regression tests; all tests passed.
Verified on ovirt-engine-4.4.8.4-0.7.el8ev.

Moving to 'VERIFIED'.

Comment 22 errata-xmlrpc 2021-09-08 14:12:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV Manager (ovirt-engine) [ovirt-4.4.8]), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3460