Bug 1214408 - Importing storage domains into an uninitialized datacenter leads to duplicate OVF_STORE disks being created, and can cause catastrophic loss of VM configuration data
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.5.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Assignee: Maor
QA Contact: lkuchlan
URL:
Whiteboard:
Depends On:
Blocks: 902971 1217339 1218733
Reported: 2015-04-22 16:21 UTC by James W. Mills
Modified: 2019-06-13 08:29 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, when importing an existing, clean storage domain that contains OVF_STORE disks from an old setup to an uninitialized data center, the OVF_STORE disks did not get registered after the data center was initialized and all virtual machine information was lost. With this update, when importing clean storage domains to an uninitialized data center, the OVF_STORE disks are registered correctly, and new unregistered entities are available in the Administration Portal under the Storage tab. In addition, storage domains with dirty metadata cannot be imported to uninitialized data centers.
Clone Of:
: 1217339
Environment:
Last Closed: 2016-03-09 21:05:04 UTC
oVirt Team: Storage
ylavi: Triaged+


Attachments
engine.log capturing new OVF_STORE creation (186.06 KB, text/plain)
2015-04-27 15:35 UTC, James W. Mills
vdsm log well before and during duplicate OVF_STORE creation (1.42 MB, text/plain)
2015-04-27 15:36 UTC, James W. Mills


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2016:0376 normal SHIPPED_LIVE Red Hat Enterprise Virtualization Manager 3.6.0 2016-03-10 01:20:52 UTC
oVirt gerrit 40349 master MERGED core: Refactor register of OVF_STORE disks Never
oVirt gerrit 40350 master MERGED core: Add OVF_STORE disks registration on Data Center initialization Never
oVirt gerrit 40368 master MERGED core: Check return value from getUnregisteredDisks Never
oVirt gerrit 40464 master MERGED core: Reset OVF_STORE disks list Never

Description James W. Mills 2015-04-22 16:21:06 UTC
Description of problem:

Cleanly detaching an SD from one RHEVM and importing it into an uninitialized DC in a new RHEVM ignores the existing OVF_STORE disks and creates new ones. This leads to the customer not seeing the "VM Import" tab. Any attempt by the customer to move the SD back to the original RHEVM in order to recover the missing VM definitions can result in ALL VM CONFIGURATION BEING OVERWRITTEN.


Version-Release number of selected component (if applicable):

rhevm 3.5.0-0.32.el6ev

How reproducible:

100%

Steps to Reproduce:

* Cleanly detach SD from original RHEVM

* Import into an uninitialized DC on new RHEVM

* Wait until the OVF update occurs

* Check for the "VM Import" tab - It will not exist

* Verify OVF_STORE contents (and number) - FAIL - Two new OVF_STORE disks were created.  The old OVF_STORE disks were ignored, but still contained the original VM information.

At this point, the customer has successfully imported the SD from one RHEVM to another, but cannot see any VM import information.  Behind the scenes, RHEV has ignored the original primary/secondary OVF_STORE disks and created two new ones.  Of course, the customer cannot see this via the UI.  At this point, the customer wants to revert back to the original configuration in order to recover the missing VM information.  This can be done several ways:

* Scenario 1 - Detach SD from new RHEVM - FAIL!  Cannot detach from a DC
  with only one SD!  So, force remove the DC.  Import back into an already initialized DC on the original RHEVM - FAIL - Although the import will succeed, the import process sees ALL FOUR OVF_STORE disks, and uses the newer ones to determine what VMs are available for import

* Scenario 2 - Do nothing to the new RHEVM, just try to import the SD again on the original RHEVM - FAIL - Unable to attach the SD back into the initialized DC on the original RHEVM.

* Scenario 3 - Power off the new RHEVM and RHEVH, re-import on the original RHEVM - FAIL - Although the import will succeed, the import process sees ALL FOUR OVF_STORE disks, and uses the newer ones to determine what VMs are available for import, and there was no "VM Import" tab.

* Scenario 4 - Add another SD to the new RHEVM DC and make it master, detach the SD cleanly, and re-import to original RHEVM DC - FAIL - Although the import will succeed, the import process sees ALL FOUR OVF_STORE disks, and uses the newer ones to determine what VMs are available for import, and there was no "VM Import" tab.

In all of the above scenarios where the re-import into the original RHEVM was successful (1, 3, and 4), all VM information was destroyed when the OVF update process kicked off, since it used the newer OVF_STORE information and propagated it to ALL OVF_STORE disks.
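
The overwrite described above can be illustrated with a toy model. This is only a sketch of the observed behaviour, not the engine's actual code: the `OvfStore` class, the `pick_newest` and `ovf_update` helpers, the disk names, the VM name and the timestamps are all hypothetical, and "newest wins" is assumed from the observations in this report.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class OvfStore:
    """Hypothetical stand-in for one OVF_STORE disk on the storage domain."""
    disk_id: str
    last_updated: datetime
    vm_names: list = field(default_factory=list)


def pick_newest(stores):
    """Return the store with the latest timestamp (assumed selection rule)."""
    return max(stores, key=lambda s: s.last_updated)


def ovf_update(stores, now):
    """Copy the newest store's contents onto every store (assumed behaviour)."""
    contents = list(pick_newest(stores).vm_names)
    for store in stores:
        store.vm_names = list(contents)
        store.last_updated = now
    return stores


# Two original stores still hold the VM definitions; the two stores created
# after the import into the uninitialized DC are newer and empty.
stores = [
    OvfStore("old-1", datetime(2015, 4, 22, 9, 0), ["vm-a"]),
    OvfStore("old-2", datetime(2015, 4, 22, 9, 0), ["vm-a"]),
    OvfStore("new-1", datetime(2015, 4, 22, 10, 0), []),
    OvfStore("new-2", datetime(2015, 4, 22, 10, 0), []),
]

# What the re-import would offer: taken from the newest (empty) store.
importable = list(pick_newest(stores).vm_names)

# The periodic OVF update then propagates the empty contents everywhere.
ovf_update(stores, datetime(2015, 4, 22, 10, 5))
surviving = sorted({vm for s in stores for vm in s.vm_names})

print("importable:", importable)
print("surviving after OVF update:", surviving)
```

In this model nothing is offered for import, and after a single update no store still contains the original VM definition, which matches the data loss seen in scenarios 1, 3, and 4.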



Actual results:

Import succeeds, but no VMs are available for import.  Any attempt to move back to the original RHEVM resulted in catastrophic data loss.

Expected results:

Import succeeds, the original OVF_STORE disks are found, and VM information is available for import.

Additional info:

To run this test, I lowered "OvfUpdateIntervalInMinutes" from 60 to 5 in order to see the results more quickly.

Comment 2 Allon Mureinik 2015-04-26 09:16:55 UTC
Maor - please take a look.
This looks very similar to something you already solved for 3.5.1, no?

Comment 3 Maor 2015-04-26 13:42:37 UTC
This is indeed a duplicate of bug https://bugzilla.redhat.com/1138114
which was fixed in version org.ovirt.engine-root-3.5.0-13

Comment 5 Allon Mureinik 2015-04-26 15:53:56 UTC
(In reply to Maor from comment #3)
> This is indeed a duplicate of bug https://bugzilla.redhat.com/1138114
> which was fixed in version org.ovirt.engine-root-3.5.0-13

I don't understand how this is possible.
Bug 1138114 was solved in 3.5.0 vt4, and the customer is using the GA release.

Is it possible we have a regression on our hands?

Comment 6 Maor 2015-04-26 17:58:16 UTC
(In reply to Allon Mureinik from comment #5)
> (In reply to Maor from comment #3)
> > This is indeed a duplicate of bug https://bugzilla.redhat.com/1138114
> > which was fixed in version org.ovirt.engine-root-3.5.0-13
> 
> I don't understand how this is possible.
> Bug 1138114 was solved in 3.5.0 vt4, and the customer is using the GA
> release.
> 
> Is it possible we have a regression on our hands?

No, or at least it doesn't look like that on my setup.
Maybe the problem is something different, maybe the OVF_STORE disks are not valid to read from.
James, can you please attach the engine and VDSM logs?

Comment 7 James W. Mills 2015-04-27 15:34:43 UTC
Maor, 

As you and I discussed via email, I can easily reproduce this, even on a single RHEVM instance.  The key is to detach/remove the SD (cleanly), and then import it into a RHEVM and attach it to an *uninitialized* DC.  I have recreated this scenario many times at this point, using two RHEVM instances and also just using a single RHEVM instance.  Here is the scenario I used:

* Create a single VM/disk on the SD

Created "TESTVM1" with a single 1GB drive

* Put the SD into maintenance

I can verify the OVF_STORE disks have been created at this point
directly in the FS:

# grep OVF */*meta
44e10311-f0f2-4cae-8dca-5e7a70e68684/d5f91f24-1cc0-41cc-a1d2-fe4add91dabf.meta:DESCRIPTION={"Updated":true,"Disk Description":"OVF_STORE","Storage Domains":[{"uuid":"1012643c-8407-4670-9bae-cd99e9fcd5ab"}],"Last Updated":"Mon Apr 27 09:31:25 CDT 2015","Size":10240}
57ecb640-c8c6-4c09-aec6-3e990cb2fcf6/677436b3-3f70-4f26-99e7-20d4f807fcaf.meta:DESCRIPTION={"Updated":true,"Disk Description":"OVF_STORE","Storage Domains":[{"uuid":"1012643c-8407-4670-9bae-cd99e9fcd5ab"}],"Last Updated":"Mon Apr 27 09:31:25 CDT 2015","Size":10240}

# strings  44e10311-f0f2-4cae-8dca-5e7a70e68684/d5f91f24-1cc0-41cc-a1d2-fe4add91dabf

...
<?xml version='1.0' encoding='UTF-8'?><ovf:Envelope
...
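
The same check can be scripted so that the number of OVF_STORE disks and their "Last Updated" values are easy to compare before and after an import. The following is a minimal sketch, not a supported tool: `parse_description` and `find_ovf_stores` are hypothetical helper names, and it assumes that each .meta file carries its `DESCRIPTION=` entry as JSON on a single line, however that line wraps in this report. The sample file written at the end only reuses the values shown in the grep output above so that the block runs on its own.

```python
import json
import tempfile
from pathlib import Path

PREFIX = "DESCRIPTION="


def parse_description(meta_text):
    """Return the parsed JSON description from a .meta file's text, or None."""
    for line in meta_text.splitlines():
        if line.startswith(PREFIX):
            try:
                return json.loads(line[len(PREFIX):])
            except json.JSONDecodeError:
                return None  # plain-text description, so not an OVF_STORE
    return None


def find_ovf_stores(images_dir):
    """Map each OVF_STORE .meta path (relative) to its 'Last Updated' value."""
    root = Path(images_dir)
    found = {}
    for meta in sorted(root.glob("*/*.meta")):
        desc = parse_description(meta.read_text())
        if isinstance(desc, dict) and desc.get("Disk Description") == "OVF_STORE":
            found[meta.relative_to(root).as_posix()] = desc.get("Last Updated")
    return found


# Self-contained demo: write one sample .meta file into a temporary directory.
sample = (
    'DESCRIPTION={"Updated":true,"Disk Description":"OVF_STORE",'
    '"Storage Domains":[{"uuid":"1012643c-8407-4670-9bae-cd99e9fcd5ab"}],'
    '"Last Updated":"Mon Apr 27 09:31:25 CDT 2015","Size":10240}\n'
)
with tempfile.TemporaryDirectory() as tmp:
    image = Path(tmp) / "44e10311-f0f2-4cae-8dca-5e7a70e68684"
    image.mkdir()
    (image / "d5f91f24-1cc0-41cc-a1d2-fe4add91dabf.meta").write_text(sample)
    stores = find_ovf_stores(tmp)

for path, updated in stores.items():
    print(path, "->", updated)
```

On a real domain, `find_ovf_stores` would be pointed at the directory the grep above was run from. More than two entries after an import would indicate the duplicate OVF_STORE disks described in this bug.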

* Detach and Remove the SD (without formatting)

At this point, I still see the OVF_STORE, there are still only two of
them, and the XML information is still intact.

* Create a new DC in the same RHEVM instance with no attached SD

* Import the SD back into RHEVM

* Attach the imported SD to the uninitialized DC

At this point, it is exactly the same as before.  The original OVF_STORE disks were ignored, and two new ones were created.

I've attached logs from engine/vdsm for the above scenario.

Just to be extra thorough, and also to verify that the original OVF_STORE images were still perfectly readable, I added a new SD to the DC, detached/removed the SD with the duplicate OVF_STORE disks, *removed* the newly added OVF_STORE images on the filesystem, and then re-imported the SD into the DC (which is now initialized as opposed to uninitialized).

The "TESTVM1" VM and disk were available for import.

To be perfectly clear, the problem appears to be with importing SD into uninitialized DC.

Thanks!
~james

Comment 8 James W. Mills 2015-04-27 15:35:43 UTC
Created attachment 1019358 [details]
engine.log capturing new OVF_STORE creation

Comment 9 James W. Mills 2015-04-27 15:36:32 UTC
Created attachment 1019359 [details]
vdsm log well before and during duplicate OVF_STORE creation

Comment 11 Maor 2015-04-27 18:42:12 UTC
Thanks for the logs James,
It looks like the problem is that the customer used an uninitialized Storage Pool to attach the imported Storage Domain (see [1]), although the engine did not block this operation.

I'm working on a fix so that the user will be able to attach a "detached" Storage Domain with existing OVF_STORE disks to an uninitialized Storage Pool.


[1]
http://www.ovirt.org/Features/ImportStorageDomain#Restrictions
"Attaching an imported Storage Domain can only be applied with an initialized Data Center."

Comment 12 Allon Mureinik 2015-04-28 08:09:59 UTC
(In reply to Maor from comment #11)
> Thanks for the logs James,
> It looks like the problem is that the customer used an uninitialized
> Storage Pool to attach the imported Storage Domain (see [1]), although the
> engine did not block this operation.
Note that in RHEV-M 3.5.1, this operation will be blocked with a user-friendly message (see bug 1178646), so the customer will get a clear indication of what is wrong, which eliminates the risk of potential data loss.

Reducing priority based on this analysis.
PM/GSS stakeholders, please chime in if you disagree with this move.

Comment 29 lkuchlan 2015-06-14 11:43:03 UTC
Tested using:
ovirt-engine-3.6.0-0.0.master.20150519172219.git9a2e2b3.el6.noarch
vdsm-4.17.0-749.git649f00a.el7.x86_64

Verification instructions:
1. Cleanly detach SD from original RHEVM
2. Import into an uninitialized DC on new RHEVM
3. Wait until the OVF update occurs
4. Check for the "VM Import" tab - It should exist

Results:
The import succeeds and the VMs are available for import.

Comment 31 errata-xmlrpc 2016-03-09 21:05:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0376.html

