Bug 1446920 - [downstream clone - 4.1.2] ERROR: duplicate key value violates unique constraint "pk_unregistered_disks_to_vms"
Summary: [downstream clone - 4.1.2] ERROR: duplicate key value violates unique constraint "pk_unregistered_disks_to_vms"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.0.6
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.1.2
Target Release: ---
Assignee: Maor
QA Contact: Eyal Shenitzky
URL:
Whiteboard:
Depends On: 1430865
Blocks:
 
Reported: 2017-04-30 14:49 UTC by rhev-integ
Modified: 2023-09-07 18:52 UTC
CC List: 14 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, if snapshots of a virtual machine with attached disks were deleted and a disaster occurred before the OVF_STORE was updated with this change, reattaching the storage domain during disaster recovery failed because the virtual machine's OVF still indicated that the disks had snapshots. Now, the OVF XML parser collects disk IDs in a 'set' instead of a 'list', so a disk that appears more than once in the virtual machine's OVF is counted only once, and attaching the storage domain succeeds.
Clone Of: 1430865
Environment:
Last Closed: 2017-05-24 11:24:01 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:1280 0 normal SHIPPED_LIVE Red Hat Virtualization Manager (ovirt-engine) 4.1.2 2017-05-24 15:18:48 UTC
oVirt gerrit 74280 0 master MERGED core: Use set for disk ids fetched from VM's OVF. 2017-04-30 14:50:48 UTC
oVirt gerrit 74320 0 ovirt-engine-4.1 MERGED core: Use set for disk ids fetched from VM's OVF. 2017-04-30 14:50:48 UTC

Description rhev-integ 2017-04-30 14:49:11 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1430865 +++
======================================================================

Description of problem:

Unable to attach a data storage domain that was detached/removed

Version-Release number of selected component (if applicable):

4.0.6.3-0.1.el7ev

How reproducible:

Not always

Steps to Reproduce:

Unsure - the OVS network in another data center went offline while the data storage domain was being detached

Actual results:

Adding the storage domain back and attaching fails with the following errors:

[....]
2017-03-09 11:47:59,861 ERROR [org.ovirt.engine.core.bll.storage.disk.image.RegisterDiskCommand] (org.ovirt.thread.pool-6-thread-33) [2ffff1f4] Transaction rolled-back for command 'org.ovirt.engine.core.bll.storage.disk.image.RegisterDiskCommand'.
2017-03-09 11:47:59,861 ERROR [org.ovirt.engine.core.bll.storage.disk.image.RegisterDiskCommand] (org.ovirt.thread.pool-6-thread-33) [2ffff1f4] Transaction rolled-back for command 'org.ovirt.engine.core.bll.storage.disk.image.RegisterDiskCommand'.
2017-03-09 11:47:59,861 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (org.ovirt.thread.pool-6-thread-33) [2ffff1f4] transaction rolled back
2017-03-09 11:47:59,862 ERROR [org.ovirt.engine.core.bll.storage.domain.AttachStorageDomainToPoolCommand] (org.ovirt.thread.pool-6-thread-33) [2ffff1f4] Command 'org.ovirt.engine.core.bll.storage.domain.AttachStorageDomainToPoolCommand' failed: CallableStatementCallback; SQL [{call insertunregistereddiskstovms(?, ?, ?, ?)}]; ERROR: duplicate key value violates unique constraint "pk_unregistered_disks_to_vms"
  Detail: Key (disk_id, entity_id, storage_domain_id)=(f1f4cd1e-6bc7-4915-991c-49cb0abb9aef, a5e20a64-4160-4963-8dc0-68b86410c910, 9b161463-ce93-45e7-8f14-074d670dd32b) already exists.
  Where: SQL statement "INSERT INTO unregistered_disks_to_vms (
        disk_id,
        entity_id,
        entity_name,
        storage_domain_id
        )
    VALUES (
        v_disk_id,
        v_entity_id,
        v_entity_name,
        v_storage_domain_id
        )"
PL/pgSQL function insertunregistereddiskstovms(uuid,uuid,character varying,uuid) line 3 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "pk_unregistered_disks_to_vms"
[....]

Expected results:

Attaching a storage domain back to its original location should work properly 

Additional info:

All engine.log output is located here:

http://pastebin.test.redhat.com/463201 - see lines 409 through 426 for the messages above.

(Originally by Sam Yangsao)

Comment 5 rhev-integ 2017-04-30 14:49:36 UTC
This constraint was introduced in RHV 4.0.1 as part of the fix for bug 1302780. Maor - can you take a look please?

(Originally by Allon Mureinik)

Comment 6 rhev-integ 2017-04-30 14:49:41 UTC
Hi Sam,

Just wanted to clarify something.
You wrote that you have one DC with 2 clusters, Dell-cluster and HP-cluster.
By clusters, did you mean Linux clusters used to provide High Availability for RHEV-M?

(Originally by Maor Lipchuk)

Comment 7 rhev-integ 2017-04-30 14:49:47 UTC
(In reply to Maor from comment #5)
> Hi Sam,
> 
> Just wanted to clear something.
> You wrote that you have one DC with 2 clusters, Dell-cluster and HP-cluster.
> by clusters, did you mean linux clusters to get High Availability for RHEV-M?

Hey Maor,

It's 1 Data center with 2 clusters.  No HA for the RHV-M :)

Thanks!

(Originally by Sam Yangsao)

Comment 11 rhev-integ 2017-04-30 14:50:09 UTC
Hi,

I will take a look at it first thing tomorrow morning.
Thank you for the info

(Originally by Maor Lipchuk)

Comment 14 rhev-integ 2017-04-30 14:50:25 UTC
Hi Sam,

I think I found the issue. Thank you very much for your help and for access to your environment; that was very helpful and reduced the time it took to find the issue.

It looks like there were VMs with disks and snapshots, and some of the snapshots were deleted before the OVF in the OVF_STORE disk was updated. At that point in time the VM's OVF still indicated that the disks had snapshots, even though those disks no longer had any. Once the storage domain was attached, those disks were fetched as potential disks to register, together with the VMs they belonged to.

There seems to be a bug in the OVF XML parser that maps those disks to the VMs they are attached to: because the XML was not updated after the snapshots were removed, the same disk was mapped more than once to what was actually the same VM, and the resulting duplicate rows caused the SQL exception.

Steps to reproduce:
1. Create a VM with disks and a snapshot
2. Delete the snapshot
3. Force-remove the storage domain (do not deactivate it first, since deactivating would trigger an OVF_STORE update)
4. Try to attach the storage domain back to the Data Center

I will post a patch that fixes it.
Thank you again for your help

(Originally by Maor Lipchuk)

Comment 15 rhev-integ 2017-04-30 14:50:31 UTC
(In reply to Maor from comment #13)
> Hi Sam,
> 
> I think that I found the issue, thank you very much for your help and the
> access to your env, that was much helpful and reduced the time finding the
> issue.
> 
> It looks like there were VMs with disks and snapshots and some of the
> snapshots got deleted before there was an OVF update in the OVF_STORE disk.
> In that point of time the OVF of the VM indicated the disks contains
> snapshots.
> while those disks were without any snapshots.
> Once the storage domain got attached those disks were fetched as potential
> disks to register which were part of the VMs also.
> 
> There seem to be a bug in the xml parser of the OVF that add those disks the
> VMs which those are attached to, since the XML was not updated after the
> removal of the snapshots those disks were initialized with VMs there were
> attached to, although those VMs were actually the same VM and that caused
> the SQL exception.
> 
> Steps to reproduce:
> 1. Create a VM with disks and snapshot
> 2. Delete the snapshot
> 3. force remove the storage domain (do not deactivate it since the OVF_STORE
> will be updated this way)
> 4. Try to attach the storage domain back again to the Data Center
> 
> I will post a patch that fixes it.
> Thank you again for your help

Awesome, thanks for your hard work in finding this, Maor.

(Originally by Sam Yangsao)

Comment 16 rhev-integ 2017-04-30 14:50:37 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[FOUND CLONE FLAGS: ['rhevm-4.1.z', 'rhevm-4.2-ga'], ]

For more info please contact: rhv-devops

(Originally by rhev-integ)

Comment 17 Eyal Shenitzky 2017-05-01 05:51:00 UTC
Verified with the following versions:
---------------------------------
vdsm-4.19.11-1.el7ev.x86_64
rhevm-4.1.2-0.1.el7

Steps to reproduce:
------------------------------------------
1. Create a VM with disks and a snapshot
2. Delete the snapshot
3. Force-remove the storage domain (do not deactivate it first, since deactivating would trigger an OVF_STORE update)
4. Try to attach the storage domain back to the Data Center

Moving to VERIFIED

Comment 19 errata-xmlrpc 2017-05-24 11:24:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1280

