Bug 1446920

Summary: [downstream clone - 4.1.2] ERROR: duplicate key value violates unique constraint "pk_unregistered_disks_to_vms"
Product: Red Hat Enterprise Virtualization Manager
Reporter: rhev-integ
Component: ovirt-engine
Assignee: Maor <mlipchuk>
Status: CLOSED ERRATA
QA Contact: Eyal Shenitzky <eshenitz>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.0.6
CC: amureini, lsurette, mkalinin, mlipchuk, pstehlik, ratamir, rbalakri, Rhev-m-bugs, srevivo, syangsao, tnisan, trichard, ykaul, ylavi
Target Milestone: ovirt-4.1.2
Keywords: ZStream
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, if snapshots of a virtual machine with attached disks were deleted, and a disaster occurred before the OVF_STORE was updated with this change, reattaching the storage domain during disaster recovery would fail because the OVF of the virtual machine incorrectly indicated that its disks had snapshots. Now, the XML parser of the OVF uses a 'set' instead of a 'list', so even if snapshots still appear in the virtual machine's OVF they are counted only once, and attaching the storage domain succeeds.
Story Points: ---
Clone Of: 1430865
Environment:
Last Closed: 2017-05-24 11:24:01 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1430865    
Bug Blocks:    

Description rhev-integ 2017-04-30 14:49:11 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1430865 +++
======================================================================

Description of problem:

Unable to attach a data storage domain that was detached/removed

Version-Release number of selected component (if applicable):

4.0.6.3-0.1.el7ev

How reproducible:

Not always

Steps to Reproduce:

Unsure - while the data storage domain was being detached, the OVS network in another data center went offline.

Actual results:

Adding the storage domain back and attaching fails with the following errors:

[....]
2017-03-09 11:47:59,861 ERROR [org.ovirt.engine.core.bll.storage.disk.image.RegisterDiskCommand] (org.ovirt.thread.pool-6-thread-33) [2ffff1f4] Transaction rolled-back for command 'org.ovirt.engine.core.bll.storage.disk.image.RegisterDiskCommand'.
2017-03-09 11:47:59,861 ERROR [org.ovirt.engine.core.bll.storage.disk.image.RegisterDiskCommand] (org.ovirt.thread.pool-6-thread-33) [2ffff1f4] Transaction rolled-back for command 'org.ovirt.engine.core.bll.storage.disk.image.RegisterDiskCommand'.
2017-03-09 11:47:59,861 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (org.ovirt.thread.pool-6-thread-33) [2ffff1f4] transaction rolled back
2017-03-09 11:47:59,862 ERROR [org.ovirt.engine.core.bll.storage.domain.AttachStorageDomainToPoolCommand] (org.ovirt.thread.pool-6-thread-33) [2ffff1f4] Command 'org.ovirt.engine.core.bll.storage.domain.AttachStorageDomainToPoolCommand' failed: CallableStatementCallback; SQL [{call insertunregistereddiskstovms(?, ?, ?, ?)}]; ERROR: duplicate key value violates unique constraint "pk_unregistered_disks_to_vms"
  Detail: Key (disk_id, entity_id, storage_domain_id)=(f1f4cd1e-6bc7-4915-991c-49cb0abb9aef, a5e20a64-4160-4963-8dc0-68b86410c910, 9b161463-ce93-45e7-8f14-074d670dd32b) already exists.
  Where: SQL statement "INSERT INTO unregistered_disks_to_vms (
        disk_id,
        entity_id,
        entity_name,
        storage_domain_id
        )
    VALUES (
        v_disk_id,
        v_entity_id,
        v_entity_name,
        v_storage_domain_id
        )"
PL/pgSQL function insertunregistereddiskstovms(uuid,uuid,character varying,uuid) line 3 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "pk_unregistered_disks_to_vms"
[....]
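
For illustration only: the constraint fires because the same (disk_id, entity_id, storage_domain_id) triple is inserted twice. The following is a minimal JDBC sketch, not taken from the engine code, that would hit the same error by calling the stored procedure from the stack trace twice with the same key; the connection details and the VM name are placeholders, and it assumes the PostgreSQL JDBC driver is on the classpath:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.util.UUID;

    // Illustrative sketch only: inserting the same key triple twice through
    // insertunregistereddiskstovms(uuid, uuid, varchar, uuid) violates
    // pk_unregistered_disks_to_vms on the second call.
    public class DuplicateUnregisteredDiskInsert {
        public static void main(String[] args) throws SQLException {
            // Placeholder connection details -- not values from this environment.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/engine", "engine", "password")) {
                insert(conn); // first call: row inserted
                insert(conn); // second call: PSQLException, duplicate key value
                              // violates unique constraint "pk_unregistered_disks_to_vms"
            }
        }

        private static void insert(Connection conn) throws SQLException {
            try (CallableStatement cs = conn.prepareCall(
                    "{call insertunregistereddiskstovms(?, ?, ?, ?)}")) {
                cs.setObject(1, UUID.fromString("f1f4cd1e-6bc7-4915-991c-49cb0abb9aef")); // disk_id
                cs.setObject(2, UUID.fromString("a5e20a64-4160-4963-8dc0-68b86410c910")); // entity_id
                cs.setString(3, "example-vm");                                            // entity_name (placeholder)
                cs.setObject(4, UUID.fromString("9b161463-ce93-45e7-8f14-074d670dd32b")); // storage_domain_id
                cs.execute();
            }
        }
    }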

Expected results:

Attaching a storage domain back to its original location should work properly 

Additional info:

All engine.log output is located here:

http://pastebin.test.redhat.com/463201 - see lines 409 through 426 for the messages above.

(Originally by Sam Yangsao)

Comment 5 rhev-integ 2017-04-30 14:49:36 UTC
This constraint was introduced in RHV 4.0.1 as part of the fix for bug 1302780. Maor - can you take a look please?

(Originally by Allon Mureinik)

Comment 6 rhev-integ 2017-04-30 14:49:41 UTC
Hi Sam,

Just wanted to clarify something.
You wrote that you have one DC with 2 clusters, Dell-cluster and HP-cluster.
By clusters, did you mean Linux clusters used to provide high availability for RHEV-M?

(Originally by Maor Lipchuk)

Comment 7 rhev-integ 2017-04-30 14:49:47 UTC
(In reply to Maor from comment #5)
> Hi Sam,
> 
> Just wanted to clarify something.
> You wrote that you have one DC with 2 clusters, Dell-cluster and HP-cluster.
> By clusters, did you mean Linux clusters used to provide high availability for RHEV-M?

Hey Maor,

It's 1 Data center with 2 clusters.  No HA for the RHV-M :)

Thanks!

(Originally by Sam Yangsao)

Comment 11 rhev-integ 2017-04-30 14:50:09 UTC
Hi,

I will take a look at it first thing tomorrow morning.
Thank you for the info

(Originally by Maor Lipchuk)

Comment 14 rhev-integ 2017-04-30 14:50:25 UTC
Hi Sam,

I think I found the issue. Thank you very much for your help and for access to your env - it was very helpful and reduced the time needed to find the issue.

It looks like there were VMs with disks and snapshots, and some of the snapshots were deleted before an OVF update was written to the OVF_STORE disk.
At that point in time the OVF of the VM still indicated that the disks had snapshots, even though those disks no longer had any.
Once the storage domain was attached, those disks were fetched as potential disks to register, together with the VMs they were part of.

There seems to be a bug in the XML parser of the OVF that adds those disks to the VMs they are attached to: since the XML was not updated after the snapshots were removed, each disk was associated more than once with the VMs it was attached to, although those "VMs" were actually the same VM, and that caused the SQL exception.
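
As a rough sketch of the idea behind the fix (simplified, not the actual ovirt-engine parser code): collecting the VM associations read from the stale OVF into a set instead of a list means a VM id that appears more than once for the same disk is registered only once, so only a single row is inserted per disk/VM pair:

    import java.util.ArrayList;
    import java.util.LinkedHashSet;
    import java.util.List;
    import java.util.Set;

    // Simplified sketch only -- not the actual OVF parser code.
    // A stale OVF (snapshots deleted, OVF_STORE not yet updated) can yield the
    // same disk-to-VM association twice; a List keeps both entries (two inserts,
    // hence the duplicate key), while a Set keeps the association exactly once.
    public class OvfVmAssociationSketch {
        public static void main(String[] args) {
            // Pretend these VM ids were parsed from the OVF for a single disk;
            // the duplicate comes from the leftover snapshot entry.
            List<String> vmIdsFromOvf = List.of(
                    "a5e20a64-4160-4963-8dc0-68b86410c910",
                    "a5e20a64-4160-4963-8dc0-68b86410c910");

            List<String> asList = new ArrayList<>(vmIdsFromOvf);   // 2 associations -> second insert fails
            Set<String> asSet = new LinkedHashSet<>(vmIdsFromOvf); // 1 association -> single insert succeeds

            System.out.println("list: " + asList.size() + " association(s)");
            System.out.println("set:  " + asSet.size() + " association(s)");
        }
    }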

Steps to reproduce:
1. Create a VM with disks and a snapshot
2. Delete the snapshot
3. Force-remove the storage domain (do not deactivate it first, since deactivation would update the OVF_STORE)
4. Try to attach the storage domain back to the Data Center

I will post a patch that fixes it.
Thank you again for your help

(Originally by Maor Lipchuk)

Comment 15 rhev-integ 2017-04-30 14:50:31 UTC
(In reply to Maor from comment #13)

Awesome, thanks for your hard work in finding this, Maor.

(Originally by Sam Yangsao)

Comment 16 rhev-integ 2017-04-30 14:50:37 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[FOUND CLONE FLAGS: ['rhevm-4.1.z', 'rhevm-4.2-ga'], ]

For more info please contact: rhv-devops

(Originally by rhev-integ)

Comment 17 Eyal Shenitzky 2017-05-01 05:51:00 UTC
Verified with the following code:
---------------------------------
vdsm-4.19.11-1.el7ev.x86_64
rhevm-4.1.2-0.1.el7

Steps to reproduce:
------------------------------------------
1. Create a VM with disks and a snapshot
2. Delete the snapshot
3. Force-remove the storage domain (do not deactivate it first, since deactivation would update the OVF_STORE)
4. Try to attach the storage domain back to the Data Center

Moving to VERIFIED

Comment 19 errata-xmlrpc 2017-05-24 11:24:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1280