Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1623467

Summary: Failure to remove backup VM snapshot with cluster levels < 4.2
Product: [oVirt] ovirt-engine
Reporter: Milan Zamazal <mzamazal>
Component: Backup-Restore.VMs
Assignee: Benny Zlotnik <bzlotnik>
Status: CLOSED CURRENTRELEASE
QA Contact: Roni <reliezer>
Severity: unspecified
Docs Contact: bugs <bugs>
Priority: unspecified
Version: 4.3.0
CC: aefrat, bugs, bzlotnik, eshenitz, fromani, reliezer, tnisan
Target Milestone: ovirt-4.3.1
Flags: rule-engine: ovirt-4.3+
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: VerificationWeek
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-03-13 16:39:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Milan Zamazal 2018-08-29 12:43:40 UTC
Description of problem:

When oVirt system tests basic-suite-master is run with a cluster level < 4.2, verify_backup_snapshot_removed fails on timeout. I can see an error like this in engine.log:

2018-08-28 09:37:13,621-04 ERROR [org.ovirt.engine.core.bll.MergeStatusCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-5) [3c61bbaa-f281-4131-8aee-f803d131df4c] Failed to live merge. Top volume 71c185c2-2813-414f-bca2-318146078153 is still in qemu chain [b8a303c6-6c2a-4adf-be7f-a997628b59a2, 71c185c2-2813-414f-bca2-318146078153]
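
For context, the condition behind that error is straightforward: after the merge, the engine checks the qemu volume chain reported by vdsm and fails the command if the merged ("top") volume is still present. A minimal illustrative sketch in Python (the real check lives in the engine's Java MergeStatusCommand; this only mirrors the logic, with values taken from the log line above):

    def merge_finished(top_volume_id, qemu_chain):
        # The live merge is only considered done once the top volume
        # has disappeared from the qemu chain reported by vdsm.
        return top_volume_id not in qemu_chain

    top = "71c185c2-2813-414f-bca2-318146078153"
    chain = ["b8a303c6-6c2a-4adf-be7f-a997628b59a2",
             "71c185c2-2813-414f-bca2-318146078153"]
    print(merge_finished(top, chain))  # False -> "Failed to live merge"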

Version-Release number of selected component (if applicable):

git master

How reproducible:

100%

Steps to Reproduce:

In oVirt system tests:

1. export OST_DC_VERSION=4.1
2. ./run_suite.sh basic-suite-master

Actual results:

verify_backup_snapshot_removed test fails on timeout.

Expected results:

verify_backup_snapshot_removed test passes.

Additional info:

See e.g. http://jenkins.ovirt.org/job/ovirt-system-tests_compat-3.6-suite-master/111/.

I'm going to disable the failing test for cluster versions < 4.2, so you may need to enable it again to reproduce the error.

Comment 1 Tal Nisan 2018-08-30 11:15:36 UTC
Ala, this issue rings a bell, I think you've handled something similar, isn't it?

Comment 2 Ala Hino 2018-08-30 11:23:59 UTC
(In reply to Tal Nisan from comment #1)
> Ala, this issue rings a bell, I think you've handled something similar,
> isn't it?

This might be in the same area as bug 1594890.

Comment 3 Ala Hino 2018-08-30 12:02:17 UTC
(In reply to Ala Hino from comment #2)
> (In reply to Tal Nisan from comment #1)
> > Ala, this issue rings a bell, I think you've handled something similar,
> > isn't it?
> 
> This might be in the same area as bug 1594890.

My bad. I meant bug 1554369.

Comment 4 Tal Nisan 2018-09-02 14:48:33 UTC
(In reply to Ala Hino from comment #3)
> (In reply to Ala Hino from comment #2)
> > (In reply to Tal Nisan from comment #1)
> > > Ala, this issue rings a bell, I think you've handled something similar,
> > > isn't it?
> > 
> > This might be in the same area as bug 1594890.
> 
> My bad. I meant bug 1554369.

Which is fixed in 4.3 only, 4.2.z still has this bug, right?
In that case is it a dup of bug 1554369?

Comment 5 Milan Zamazal 2018-09-03 07:40:25 UTC
Please note the reported bug (still) appears in OST runs on master, so any fixes of similar bugs in master haven't fixed this one. To make it clear: this bug happens on master (other versions haven't been checked) with cluster compatibility < 4.2.

Comment 6 Ala Hino 2018-09-16 10:36:23 UTC
(In reply to Tal Nisan from comment #4)
> (In reply to Ala Hino from comment #3)
> > (In reply to Ala Hino from comment #2)
> > > (In reply to Tal Nisan from comment #1)
> > > > Ala, this issue rings a bell, I think you've handled something similar,
> > > > isn't it?
> > > 
> > > This might be in the same area as bug 1594890.
> > 
> > My bad. I meant bug 1554369.
> 
> Which is fixed in 4.3 only, 4.2.z still has this bug, right?
> In that case is it a dup of bug 1554369?

Bug 1554369 wasn't related to cluster level.
I have to dig more and see what the root cause of this one is.

Comment 7 Benny Zlotnik 2018-10-13 14:32:17 UTC
It can be reproduced manually as well by simply performing live merge in a 4.1 cluster.

It seems there is a discrepancy between the VM XML libvirt holds and what vdsm reports to the engine upon execution of MERGE_STATUS.
The discrepancy at this cluster level seems to be related to the fact that DomainXML was introduced in 4.2.

After a failed live merge attempt, I run
$ virsh -r dumpxml vm41
<domain type='kvm' id='29'>
  <name>vm41</name>
  <uuid>e640d129-4d19-4d4b-bbae-829671fa7dd7</uuid>
  <metadata xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
    <ovirt-tune:qos/>
    <ovirt-vm:vm xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
    <ovirt-vm:block_jobs>{}</ovirt-vm:block_jobs>
    <ovirt-vm:destroy_on_reboot type="bool">False</ovirt-vm:destroy_on_reboot>
    <ovirt-vm:memGuaranteedSize type="int">2</ovirt-vm:memGuaranteedSize>
    <ovirt-vm:startTime type="float">1539428553.61</ovirt-vm:startTime>
    <ovirt-vm:device devtype="balloon" name="balloon0">

...
        <ovirt-vm:volumeChain>
            <ovirt-vm:volumeChainNode>
                <ovirt-vm:domainID>33b47b70-f979-4b67-bea9-005de84f3c39</ovirt-vm:domainID>
                <ovirt-vm:imageID>51aa4556-d26b-4482-8137-2979ce73a43b</ovirt-vm:imageID>
                <ovirt-vm:leaseOffset type="int">0</ovirt-vm:leaseOffset>
                <ovirt-vm:leasePath>/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-4b67-bea9-005de84f3c39/images/51aa4556-d26b-4482-8137-2979ce73a43b/12edadfa-7a34-4e3e-afdb-c96f46eac99a.lease</ovirt-vm:leasePath>
                <ovirt-vm:path>/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-4b67-bea9-005de84f3c39/images/51aa4556-d26b-4482-8137-2979ce73a43b/12edadfa-7a34-4e3e-afdb-c96f46eac99a</ovirt-vm:path>
                <ovirt-vm:volumeID>12edadfa-7a34-4e3e-afdb-c96f46eac99a</ovirt-vm:volumeID>
            </ovirt-vm:volumeChainNode>
        </ovirt-vm:volumeChain>
    </ovirt-vm:device>
</ovirt-vm:vm>
  </metadata>
...
</domain>

As can be seen there is only a single node in the volume chain.

However, if I run
$ vdsm-client Host getVMFullList
[
...
                "xml": ...
                "volumeInfo": {
                    "path": "/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-4b67-bea9-005de84f3c39/images/51aa4556-d26b-4482-8137-2979ce73a43b/23d071c7-1a37-481f-8d11-95b707653f74", 
                    "type": "file"
                }, 
                "serial": "51aa4556-d26b-4482-8137-2979ce73a43b", 
                "index": 0, 
                "iface": "scsi", 
                "apparentsize": "1073741824", 
                "alias": "scsi0-0-0-0", 
                "cache": "none", 
                "imageID": "51aa4556-d26b-4482-8137-2979ce73a43b", 
                "readonly": "False", 
                "shared": "false", 
                "truesize": "0", 
                "type": "disk", 
                "domainID": "33b47b70-f979-4b67-bea9-005de84f3c39", 
                "reqsize": "0", 
                "format": "cow", 
                "deviceId": "51aa4556-d26b-4482-8137-2979ce73a43b", 
                "poolID": "bc6ce3e4-07c8-4d98-858d-868dd8aa33e8", 
                "device": "disk", 
                "path": "/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-4b67-bea9-005de84f3c39/images/51aa4556-d26b-4482-8137-2979ce73a43b/23d071c7-1a37-481f-8d11-95b707653f74", 
                "propagateErrors": "off", 
                "name": "sda", 
                "vm_custom": {}, 
                "bootOrder": "1", 
                "vmid": "e640d129-4d19-4d4b-bbae-829671fa7dd7", 
                "volumeID": "23d071c7-1a37-481f-8d11-95b707653f74", 
                "diskType": "file", 
                "specParams": {}, 
                "discard": false, 
                "volumeChain": [
                    {
                        "domainID": "33b47b70-f979-4b67-bea9-005de84f3c39", 
                        "leaseOffset": 0, 
                        "volumeID": "12edadfa-7a34-4e3e-afdb-c96f46eac99a", 
                        "leasePath": "/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-4b67-bea9-005de84f3c39/images/51aa4556-d26b-4482-8137-2979ce73a43b/12edadfa-7a34-4e3e-afdb-c96f46eac99a.lease", 
                        "imageID": "51aa4556-d26b-4482-8137-2979ce73a43b", 
                        "path": "/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-4b67-bea9-005de84f3c39/images/51aa4556-d26b-4482-8137-2979ce73a43b/12edadfa-7a34-4e3e-afdb-c96f46eac99a"
                    }, 
                    {
                        "domainID": "33b47b70-f979-4b67-bea9-005de84f3c39", 
                        "leaseOffset": 0, 
                        "volumeID": "23d071c7-1a37-481f-8d11-95b707653f74", 
                        "leasePath": "/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-4b67-bea9-005de84f3c39/images/51aa4556-d26b-4482-8137-2979ce73a43b/23d071c7-1a37-481f-8d11-95b707653f74.lease", 
                        "imageID": "51aa4556-d26b-4482-8137-2979ce73a43b", 
                        "path": "/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-4b67-bea9-005de84f3c39/images/51aa4556-d26b-4482-8137-2979ce73a43b/23d071c7-1a37-481f-8d11-95b707653f74"
                    }
                ]
            }, 
]

We can see there are two nodes; this is what's reported to the engine when executing MERGE_STATUS.
The xml property inside the output, however, contains the correct libvirt XML without the extra volumeChainNode.

It seems self.conf is not synchronized with the XML, specifically the "devices" property.
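
A quick way to see the mismatch is to compare the volume chain kept in the libvirt domain metadata with the one reported via getVMFullList, using the two dumps above. A rough sketch, assuming vdsm-client prints JSON and that the top-level keys ("vmId", "devices") match the self.conf dump further below:

    import json
    import subprocess
    import xml.etree.ElementTree as ET

    OVIRT_VM_NS = "http://ovirt.org/vm/1.0"

    def libvirt_chain(vm_name):
        # volumeChainNode entries in the <metadata> section of the domain XML
        xml = subprocess.check_output(["virsh", "-r", "dumpxml", vm_name])
        root = ET.fromstring(xml)
        return [node.findtext("{%s}volumeID" % OVIRT_VM_NS)
                for node in root.iter("{%s}volumeChainNode" % OVIRT_VM_NS)]

    def reported_chain(vm_id):
        # volumeChain entries of the disk device returned by getVMFullList
        out = subprocess.check_output(["vdsm-client", "Host", "getVMFullList"])
        for vm in json.loads(out):
            if vm.get("vmId") == vm_id:
                for dev in vm.get("devices", []):
                    if dev.get("device") == "disk" and dev.get("volumeChain"):
                        return [n["volumeID"] for n in dev["volumeChain"]]
        return []

    print(libvirt_chain("vm41"))                                   # one node
    print(reported_chain("e640d129-4d19-4d4b-bbae-829671fa7dd7"))  # two nodes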

I edited vm.py#untrackBlockJob and added:
        disk_params = vmdevices.common.storage_device_params_from_domain_xml(self.id, self._domain, self._md_desc, self.log)
        self._override_disk_device_config(disk_params)
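
To illustrate the idea behind that change, here is a small self-contained toy (none of these names are real vdsm APIs; only the two calls quoted above are from the actual edit): once a block job is untracked, the disk device config is rebuilt from the domain XML, so the stale volumeChain node disappears from self.conf.

    # Toy model only -- mirrors the idea of the fix, not vdsm code.
    class ToyVm:
        def __init__(self, conf, domain_chain):
            self.conf = conf                  # stale copy, like self.conf in vdsm
            self.domain_chain = domain_chain  # source of truth, like the domain XML

        def untrack_block_job(self, job_id):
            self.conf["block_jobs"].pop(job_id, None)
            # the added step: resync the device config from the domain XML
            self.conf["devices"][0]["volumeChain"] = list(self.domain_chain)

    vm = ToyVm(
        conf={"block_jobs": {"j1": {}},
              "devices": [{"volumeChain": ["base-vol", "top-vol"]}]},
        domain_chain=["base-vol"],
    )
    vm.untrack_block_job("j1")
    print(vm.conf["devices"][0]["volumeChain"])  # ['base-vol'] -- stale node gone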

This seems to have resolved the issue.
I added logs of self.conf before and after:

Before:
{
   "devices": [
      {
         ....
      {
         "address": {
            "bus": "0",
            "controller": "0",
            "target": "0",
            "type": "drive",
            "unit": "0"
         },
         "alias": "scsi0-0-0-0",
         "apparentsize": "1073741824",
         "bootOrder": "1",
         "cache": "none",
         "device": "disk",
         "deviceId": "b16a5774-7507-4162-a8d1-7deaf82bedef",
         "discard": false,
         "diskType": "file",
         "domainID": "33b47b70-f979-4b67-bea9-005de84f3c39",
         "format": "cow",
         "iface": "scsi",
         "imageID": "b16a5774-7507-4162-a8d1-7deaf82bedef",
         "index": 0,
         "name": "sda",
         "path": "/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-4b67-bea9-005de84f3c39/images/b16a5774-7507-4162-a8d1-7deaf82bedef/6c7d1d8a-2566-4a25-a1d0-a93dfc690cad",
         "poolID": "bc6ce3e4-07c8-4d98-858d-868dd8aa33e8",
         "propagateErrors": "off",
         "readonly": "False",
         "reqsize": "0",
         "serial": "b16a5774-7507-4162-a8d1-7deaf82bedef",
         "shared": "false",
         "specParams": {},
         "truesize": "0",
         "type": "disk",
         "vm_custom": {},
         "vmid": "c231edc6-adbf-4956-9afe-98e754757803",
         "volumeChain": [
            {
               "domainID": "33b47b70-f979-4b67-bea9-005de84f3c39",
               "imageID": "b16a5774-7507-4162-a8d1-7deaf82bedef",
               "leaseOffset": 0,
               "leasePath": "/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-4b67-bea9-005de84f3c39/images/b16a5774-7507-4162-a8d1-7deaf82bedef/6c7d1d8a-2566-4a25-a1d0-a93dfc690cad.lease",
               "path": "/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-4b67-bea9-005de84f3c39/images/b16a5774-7507-4162-a8d1-7deaf82bedef/6c7d1d8a-2566-4a25-a1d0-a93dfc690cad",
               "volumeID": "6c7d1d8a-2566-4a25-a1d0-a93dfc690cad"
            },
            {
               "domainID": "33b47b70-f979-4b67-bea9-005de84f3c39",
               "imageID": "b16a5774-7507-4162-a8d1-7deaf82bedef",
               "leaseOffset": 0,
               "leasePath": "/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-4b67-bea9-005de84f3c39/images/b16a5774-7507-4162-a8d1-7deaf82bedef/cf49c67f-77db-49fb-b4cd-c21ef2e80386.lease",
               "path": "/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-4b67-bea9-005de84f3c39/images/b16a5774-7507-4162-a8d1-7deaf82bedef/cf49c67f-77db-49fb-b4cd-c21ef2e80386",
               "volumeID": "cf49c67f-77db-49fb-b4cd-c21ef2e80386"
            }
         ],
         "volumeID": "6c7d1d8a-2566-4a25-a1d0-a93dfc690cad",
         "volumeInfo": {
            "path": "/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-4b67-bea9-005de84f3c39/images/b16a5774-7507-4162-a8d1-7deaf82bedef/6c7d1d8a-2566-4a25-a1d0-a93dfc690cad",
            "type": "file"
         }
      },
      ...
}



After:
{
   "devices": [
      {
         "address": {
            "bus": "1",
            "controller": "0",
            "target": "0",
            "type": "drive",
            "unit": "0"
         },
         "alias": "ide0-1-0",
         "device": "cdrom",
         "deviceId": "e0f7896a-7055-46e5-8642-112df127999c",
         "discard": false,
         "diskType": "file",
         "format": "raw",
         "iface": "ide",
         "index": 2,
         "name": "hdc",
         "path": "",
         "propagateErrors": "off",
         "readonly": true,
         "shared": "false",
         "specParams": {
            "path": ""
         },
         "type": "disk",
         "vm_custom": {},
         "vmid": "c231edc6-adbf-4956-9afe-98e754757803"
      },
      {
         "address": {
            "bus": "0",
            "controller": "0",
            "target": "0",
            "type": "drive",
            "unit": "0"
         },
         "alias": "scsi0-0-0-0",
         "bootOrder": "1",
         "cache": "none",
         "device": "disk",
         "deviceId": "b16a5774-7507-4162-a8d1-7deaf82bedef",
         "discard": false,
         "diskType": "file",
         "domainID": "33b47b70-f979-4b67-bea9-005de84f3c39",
         "format": "raw",
         "iface": "scsi",
         "imageID": "b16a5774-7507-4162-a8d1-7deaf82bedef",
         "index": 0,
         "name": "sda",
         "path": "/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-4b67-bea9-005de84f3c39/images/b16a5774-7507-4162-a8d1-7deaf82bedef/cf49c67f-77db-49fb-b4cd-c21ef2e80386",
         "poolID": "bc6ce3e4-07c8-4d98-858d-868dd8aa33e8",
         "propagateErrors": "off",
         "serial": "b16a5774-7507-4162-a8d1-7deaf82bedef",
         "shared": "false",
         "specParams": {},
         "type": "disk",
         "vm_custom": {},
         "vmid": "c231edc6-adbf-4956-9afe-98e754757803",
         "volumeChain": [
            {
               "domainID": "33b47b70-f979-4b67-bea9-005de84f3c39",
               "imageID": "b16a5774-7507-4162-a8d1-7deaf82bedef",
               "leaseOffset": 0,
               "leasePath": "/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-4b67-bea9-005de84f3c39/images/b16a5774-7507-4162-a8d1-7deaf82bedef/cf49c67f-77db-49fb-b4cd-c21ef2e80386.lease",
               "path": "/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-4b67-bea9-005de84f3c39/images/b16a5774-7507-4162-a8d1-7deaf82bedef/cf49c67f-77db-49fb-b4cd-c21ef2e80386",
               "volumeID": "cf49c67f-77db-49fb-b4cd-c21ef2e80386"
            }
         ],
         "volumeID": "cf49c67f-77db-49fb-b4cd-c21ef2e80386",
         "volumeInfo": {
            "domainID": "33b47b70-f979-4b67-bea9-005de84f3c39",
            "imageID": "b16a5774-7507-4162-a8d1-7deaf82bedef",
            "leaseOffset": 0,
            "leasePath": "/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-4b67-bea9-005de84f3c39/images/b16a5774-7507-4162-a8d1-7deaf82bedef/cf49c67f-77db-49fb-b4cd-c21ef2e80386.lease",
            "path": "/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-4b67-bea9-005de84f3c39/images/b16a5774-7507-4162-a8d1-7deaf82bedef/cf49c67f-77db-49fb-b4cd-c21ef2e80386",
            "volumeID": "cf49c67f-77db-49fb-b4cd-c21ef2e80386"
         }
      }
   ],
   "external": false,
   "memSize": 1024,
   "smp": "16",
   "vmId": "c231edc6-adbf-4956-9afe-98e754757803",
   "vmName": "vm1",
   "vmType": "kvm",
   "xml": ""
}


Francesco, what do you think?

Comment 8 Francesco Romani 2018-10-15 15:49:04 UTC
(In reply to Benny Zlotnik from comment #7)
> It can be reproduced manually as well by simply performing live merge in a
> 4.1 cluster.

So it reproduces also in 4.1? Let's keep in mind that there is no domainXML in 4.1, so it is unlikely that bugs in domainXML code make this bug happen also in 4.1 :)


> After a failed live merge attempt, I run
> $ virsh -r dumpxml vm41
> <domain type='kvm' id='29'>
>   <name>vm41</name>
>   <uuid>e640d129-4d19-4d4b-bbae-829671fa7dd7</uuid>
>   <metadata xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0"
> xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
>     <ovirt-tune:qos/>
>     <ovirt-vm:vm xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
>     <ovirt-vm:block_jobs>{}</ovirt-vm:block_jobs>
>     <ovirt-vm:destroy_on_reboot
> type="bool">False</ovirt-vm:destroy_on_reboot>
>     <ovirt-vm:memGuaranteedSize type="int">2</ovirt-vm:memGuaranteedSize>
>     <ovirt-vm:startTime type="float">1539428553.61</ovirt-vm:startTime>
>     <ovirt-vm:device devtype="balloon" name="balloon0">
> 
> ...
>         <ovirt-vm:volumeChain>
>             <ovirt-vm:volumeChainNode>
>                
> <ovirt-vm:domainID>33b47b70-f979-4b67-bea9-005de84f3c39</ovirt-vm:domainID>
>                
> <ovirt-vm:imageID>51aa4556-d26b-4482-8137-2979ce73a43b</ovirt-vm:imageID>
>                 <ovirt-vm:leaseOffset type="int">0</ovirt-vm:leaseOffset>
>                
> <ovirt-vm:leasePath>/rhev/data-center/mnt/10.35.0.209:
> _root_storage__domains_sd1/33b47b70-f979-4b67-bea9-005de84f3c39/images/
> 51aa4556-d26b-4482-8137-2979ce73a43b/12edadfa-7a34-4e3e-afdb-c96f46eac99a.
> lease</ovirt-vm:leasePath>
>                
> <ovirt-vm:path>/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/
> 33b47b70-f979-4b67-bea9-005de84f3c39/images/51aa4556-d26b-4482-8137-
> 2979ce73a43b/12edadfa-7a34-4e3e-afdb-c96f46eac99a</ovirt-vm:path>
>                
> <ovirt-vm:volumeID>12edadfa-7a34-4e3e-afdb-c96f46eac99a</ovirt-vm:volumeID>
>             </ovirt-vm:volumeChainNode>
>         </ovirt-vm:volumeChain>
>     </ovirt-vm:device>
> </ovirt-vm:vm>
>   </metadata>
> ...
> </domain>
> 
> As can be seen there is only a single node in the volume chain.
> 
> However, if I run
> $ vdsm-client Host getVMFullList
> [
> ...
>                 "xml": ...
>                 "volumeInfo": {
>                     "path":
> "/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-
> 4b67-bea9-005de84f3c39/images/51aa4556-d26b-4482-8137-2979ce73a43b/23d071c7-
> 1a37-481f-8d11-95b707653f74", 
>                     "type": "file"
>                 }, 
>                 "serial": "51aa4556-d26b-4482-8137-2979ce73a43b", 
>                 "index": 0, 
>                 "iface": "scsi", 
>                 "apparentsize": "1073741824", 
>                 "alias": "scsi0-0-0-0", 
>                 "cache": "none", 
>                 "imageID": "51aa4556-d26b-4482-8137-2979ce73a43b", 
>                 "readonly": "False", 
>                 "shared": "false", 
>                 "truesize": "0", 
>                 "type": "disk", 
>                 "domainID": "33b47b70-f979-4b67-bea9-005de84f3c39", 
>                 "reqsize": "0", 
>                 "format": "cow", 
>                 "deviceId": "51aa4556-d26b-4482-8137-2979ce73a43b", 
>                 "poolID": "bc6ce3e4-07c8-4d98-858d-868dd8aa33e8", 
>                 "device": "disk", 
>                 "path":
> "/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-
> 4b67-bea9-005de84f3c39/images/51aa4556-d26b-4482-8137-2979ce73a43b/23d071c7-
> 1a37-481f-8d11-95b707653f74", 
>                 "propagateErrors": "off", 
>                 "name": "sda", 
>                 "vm_custom": {}, 
>                 "bootOrder": "1", 
>                 "vmid": "e640d129-4d19-4d4b-bbae-829671fa7dd7", 
>                 "volumeID": "23d071c7-1a37-481f-8d11-95b707653f74", 
>                 "diskType": "file", 
>                 "specParams": {}, 
>                 "discard": false, 
>                 "volumeChain": [
>                     {
>                         "domainID": "33b47b70-f979-4b67-bea9-005de84f3c39", 
>                         "leaseOffset": 0, 
>                         "volumeID": "12edadfa-7a34-4e3e-afdb-c96f46eac99a", 
>                         "leasePath":
> "/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-
> 4b67-bea9-005de84f3c39/images/51aa4556-d26b-4482-8137-2979ce73a43b/12edadfa-
> 7a34-4e3e-afdb-c96f46eac99a.lease", 
>                         "imageID": "51aa4556-d26b-4482-8137-2979ce73a43b", 
>                         "path":
> "/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-
> 4b67-bea9-005de84f3c39/images/51aa4556-d26b-4482-8137-2979ce73a43b/12edadfa-
> 7a34-4e3e-afdb-c96f46eac99a"
>                     }, 
>                     {
>                         "domainID": "33b47b70-f979-4b67-bea9-005de84f3c39", 
>                         "leaseOffset": 0, 
>                         "volumeID": "23d071c7-1a37-481f-8d11-95b707653f74", 
>                         "leasePath":
> "/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-
> 4b67-bea9-005de84f3c39/images/51aa4556-d26b-4482-8137-2979ce73a43b/23d071c7-
> 1a37-481f-8d11-95b707653f74.lease", 
>                         "imageID": "51aa4556-d26b-4482-8137-2979ce73a43b", 
>                         "path":
> "/rhev/data-center/mnt/10.35.0.209:_root_storage__domains_sd1/33b47b70-f979-
> 4b67-bea9-005de84f3c39/images/51aa4556-d26b-4482-8137-2979ce73a43b/23d071c7-
> 1a37-481f-8d11-95b707653f74"
>                     }
>                 ]
>             }, 
> ]
> 
> We can see there are two nodes, this is what's reported to the engine when
> executing MERGE_STATUS.
> The xml property inside the output, however, contains the correct libvirt XML
> without the extra volumeChainNode.

This could indeed be a bug in the compatibility code in 4.2.
But then: it happens in 4.2 clusters doing live merge on a VM created with 4.1?
Is this the scenario we are checking?


> It seems the self.conf is not synchronized with the xml, specifically the
> "devices" property
> 
> I edited vm.py#untrackBlockJob
> and added 
>         disk_params =
> vmdevices.common.storage_device_params_from_domain_xml(self.id,
> self._domain, self._md_desc, self.log)
>         self._override_disk_device_config(disk_params)
> 
> This seems to have resolved the issue.

That could be a side effect of Iceb73e068cd7e6c2280c68ff1a02f53f72cf4bd3.

But in this case, it should be relevant only in master (that change didn't land in the stable branch yet), and only if vdsm is being run alongside a 4.1 Engine.

Comment 9 Francesco Romani 2018-10-15 16:17:07 UTC
Benny, https://gerrit.ovirt.org/94936 is a partial, up-to-date revert of change Iceb73e068cd7e6c2280c68ff1a02f53f72cf4bd3. Could you please check that patch solves this issue?

Comment 10 Benny Zlotnik 2018-10-16 08:08:06 UTC
(In reply to Francesco Romani from comment #8)
> (In reply to Benny Zlotnik from comment #7)
> > It can be reproduced manually as well by simply performing live merge in a
> > 4.1 cluster.
> 
> So it reproduces also in 4.1? Let's keep in mind that there is no domainXML
> in 4.1, so it is unlikely that bugs in domainXML code make this bug happen
> also in 4.1 :)
> 
I meant domainXML fixes the issue :)
Haven't tried on 4.1

> This could indeed be a bug in the compatibility code in 4.2.
> But then: it happens in 4.2 clusters doing live merge on a VM created with
> 4.1?
> Is this the scenario we are checking?
> 
The scenario is running a live merge on a VM in a 4.1 cluster.

> But in this case, it should be relevant only in master (that change didn't
> land in the stable branch yet), and only if vdsm is being run alongside a 4.1
> Engine.


(In reply to Francesco Romani from comment #9)
> Benny, https://gerrit.ovirt.org/94936 is a partial, up-to-date revert of
> change Iceb73e068cd7e6c2280c68ff1a02f53f72cf4bd3. Could you please check
> that patch solves this issue?
It solves the issue when I try to reproduce manually; I'll run OST.

Comment 11 Milan Zamazal 2018-11-22 15:19:39 UTC
OST works for me when run locally on oVirt master with cluster level 4.0.

Comment 12 Sandro Bonazzola 2019-01-28 09:34:13 UTC
This bug has not been marked as a blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 13 Benny Zlotnik 2019-02-04 14:57:30 UTC
Moving to MODIFIED as this was already fixed in 4.3.

Comment 14 Roni 2019-03-06 13:37:27 UTC
Verified: 4.3.2-0.1.el7

Comment 15 Sandro Bonazzola 2019-03-13 16:39:32 UTC
This bugzilla is included in oVirt 4.3.1 release, published on February 28th 2019.

Since the problem described in this bug report should be
resolved in oVirt 4.3.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.