Bug 1840414

Summary: Live merge failure with libvirt error virDomainBlockCommit() failed
Product: [oVirt] vdsm
Component: Core
Version: 4.40.16
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Target Milestone: ovirt-4.4.2
Target Release: 4.40.24
Fixed In Version: vdsm-4.40.24
Reporter: Marco Fais <evilmf>
Assignee: Milan Zamazal <mzamazal>
QA Contact: Beni Pelled <bpelled>
CC: ahadas, bugs, mavital, mtessun, mzamazal
Keywords: EasyFix, Upstream, ZStream
Flags: mtessun: ovirt-4.4?, mtessun: planning_ack+, ahadas: devel_ack+, mavital: testing_ack+
oVirt Team: Virt
Type: Bug
Last Closed: 2020-09-18 07:13:03 UTC

Attachments:
VDSM extract with live merge failure

Description Marco Fais 2020-05-26 21:57:49 UTC
Created attachment 1692446 [details]
VDSM extract with live merge failure

Description of problem:

Live merge fails on ovirt-node 4.4.0 when deleting snapshots on running VMs. The error appears to be raised by libvirt.

Similar to issues described here: https://bugzilla.redhat.com/show_bug.cgi?id=1785939


Version-Release number of selected component (if applicable):
ovirt-node 4.4.0

How reproducible:
Always

Steps to Reproduce:
1. Create a snapshot on a running VM.
2. Try to delete the snapshot.
3. The snapshot is not removed and an error is raised.

Actual results:

[api.virt] FINISH merge return={'status': {'code': 52, 'message': 'Merge failed'}} from=::ffff:10.144.138.240,60914, flow_id=dbf9c831-e0cb-4891-a38c-d61136daf029, vmId=baaf6be8-dcf4-4f26-b0f1-435287eeed95 (api:54)

Full log of failed live merge included.

Expected results:
Live merge should complete successfully

Additional info:
Storage is glusterfs, running on ovirt-node 4.3.9
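
When triaging logs like the attached extract, the failed merges can be pulled out of vdsm.log with a simple filter. This is a hypothetical helper (not part of vdsm); it keys on the API code 52 / "Merge failed" line shown under "Actual results":

```shell
# Hypothetical triage helper: list failed live merges in a vdsm log.
# vdsm reports a failed live merge as a "FINISH merge" line carrying
# API status code 52 ('Merge failed'), as seen in this report.
find_merge_failures() {
    grep -E "FINISH merge.*'code': 52" "$1"
}
```

Typical use on a host would be `find_merge_failures /var/log/vdsm/vdsm.log` (the conventional vdsm log path).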

Comment 1 Ryan Barry 2020-05-27 04:28:46 UTC
Yet another block merge. Milan, Nir, dupe?

Comment 2 Milan Zamazal 2020-05-27 10:37:19 UTC
The Vdsm traceback and error message look the same as in bug 1785939. Marco, does it happen with libvirt >= 6.0.0-17.module+el8.2.0+6257+0d066c28?
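
A rough way to answer the version question in this comment is to compare the installed libvirt NVR against the 6.0.0-17 threshold. The sketch below uses GNU `sort -V` as an approximation of RPM version ordering; it is good enough for these particular NVRs, but `rpmdev-vercmp` is the authoritative tool:

```shell
# Rough check: is version-release $1 at least $2? Uses GNU `sort -V`
# as an approximation of RPM version ordering (sufficient for these NVRs;
# rpmdev-vercmp handles the full RPM comparison rules).
meets_threshold() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

On a host, something like `meets_threshold "$(rpm -q --qf '%{VERSION}-%{RELEASE}' libvirt-daemon)" 6.0.0-17` would report whether the fixed libvirt is installed.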

Comment 3 Marco Fais 2020-05-27 10:48:05 UTC
Milan, I'd like to test it -- is there an ovirt-node pre-release I can use?
Or do I need to start from CentOS 8.2?

I'm not sure how best to get libvirt 6 on the current setup (ovirt-node 4.4.0).

Thanks,
Marco

Comment 4 Milan Zamazal 2020-05-27 11:22:22 UTC
I don't know about ovirt-node releases; some pre-releases are listed at https://www.ovirt.org/download/node.html, and I guess "Latest master" could already include libvirt 6. CentOS Advanced Virt was updated with libvirt 6 etc. last week, so it should be available with the current ovirt-release-master package -- but beware, there may be other issues with the updated versions.

If you manage to test it with libvirt 6, please update this bug. Once libvirt 6 in CentOS Advanced Virt stabilizes, we should require it also for CentOS in Vdsm dependencies.

Comment 5 Arik 2020-06-08 14:31:15 UTC
Could you please check with CentOS Stream?

Comment 6 Marco Fais 2020-06-08 15:16:44 UTC
Hi Arik,

I have just checked CentOS Stream and I see it ships with libvirt 4.5.0:

libvirt-4.5.0-35.3.module_el8.1.0+297+df420408.x86_64

So I don't think it would be a useful test...
Can't find the advanced virtualization packages for CentOS Stream -- any suggestion?

In the meantime I can confirm the bug is present in three different environments (all based on oVirt Node 4.4.0) and is always 100% reproducible.

Regards,
Marco

Comment 7 Milan Zamazal 2020-06-09 08:27:46 UTC
(In reply to Marco Fais from comment #6)
>
> I have just checked CentOS Stream and I see it ships with libvirt 4.5.0:
> 
> libvirt-4.5.0-35.3.module_el8.1.0+297+df420408.x86_64
> 
> So I don't think it would be a useful test...
> Can't find the advanced virtualization packages for CentOS Stream -- any
> suggestion?

Hi Marco, if you are willing to check the latest oVirt Node Master from https://www.ovirt.org/download/node.html, it contains the new libvirt. The current ovirt-release-master package also sets up the following repo, which contains up-to-date versions of libvirt and QEMU:

  [ovirt-master-advanced-virtualization-testing]
  name=Advanced Virtualization testing packages for $basearch
  baseurl=https://buildlogs.centos.org/centos/8/virt/$basearch/advanced-virtualization/
  enabled=1
  gpgcheck=0
  module_hotfixes=1

You can try oVirt Node Master, the ovirt-release-master package, or adding the repo manually. It's all master, so you risk replacing old bugs with new ones, but you may want to try it on one of your hosts to see whether it fixes your snapshot problem. Please let us know.
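
Adding the repo manually could be sketched as below. The function takes the target directory as an argument so it can be exercised outside `/etc/yum.repos.d`; the repo body is copied verbatim from the comment above, and the helper name is hypothetical:

```shell
# Sketch of adding the testing repo by hand. The default target is the
# conventional yum repo directory; pass another directory for a dry run.
install_testing_repo() {
    dir=${1:-/etc/yum.repos.d}
    cat > "$dir/ovirt-master-advanced-virtualization-testing.repo" <<'EOF'
[ovirt-master-advanced-virtualization-testing]
name=Advanced Virtualization testing packages for $basearch
baseurl=https://buildlogs.centos.org/centos/8/virt/$basearch/advanced-virtualization/
enabled=1
gpgcheck=0
module_hotfixes=1
EOF
}
```

After adding the repo, something like `dnf upgrade 'libvirt*' 'qemu*'` would pull in the testing builds; as comment 8 later notes, the versionlock plugin may need disabling first (e.g. `dnf --disableplugin=versionlock upgrade ...`).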

Comment 8 Marco Fais 2020-06-13 14:47:24 UTC
> Hi Marco, if you are willing to check the latest oVirt Node Master from
> https://www.ovirt.org/download/node.html, it contains the new libvirt. The
> current ovirt-release-master package also sets up the following repo, which
> contains up to date versions of libvirt and QEMU:
> 
>   [ovirt-master-advanced-virtualization-testing]
>   name=Advanced Virtualization testing packages for $basearch
>   baseurl=https://buildlogs.centos.org/centos/8/virt/$basearch/advanced-virtualization/
>   enabled=1
>   gpgcheck=0
>   module_hotfixes=1
> 
> You can try either oVirt Node Master or the ovirt-release-master package or
> adding the repo manually. It's all master, so you risk replacing old bugs
> with new ones, but you may want to try it on one of your hosts whether it
> fixes your snapshot problem. Please let us know.

Hi Milan,

thanks a million -- that's exactly what I needed!

I have added the repository in one of my ovirt-node-4.4.0 hosts and installed the packages (I had to disable the versionlock plugin in order to upgrade the existing versions).
I can confirm everything works fine with the packages in the testing repository:

libvirt-daemon.x86_64                             6.0.0-17.el8
libguestfs.x86_64                                 1.40.2-22.el8
qemu-kvm.x86_64                                   4.2.0-19.el8
(and the other relevant packages)

The storage backend used is glusterfs replica 3 -- I have problems (qemu crashes) removing snapshots with a distributed-disperse backend, but looking at the logs it might be a problem with the gluster storage rather than qemu-kvm.

If you need any logs let me know. If I use the same VM/same backend on a standard ovirt-node-4.4.0 system I can replicate the issues again.

Should I roll out the repository to the rest of the cluster, or should I wait for the above to be integrated into a future ovirt-node release?

Thanks again for your help...
Marco

Comment 9 Arik 2020-06-15 19:54:45 UTC
(In reply to Marco Fais from comment #8)
> Storage backend used is glusterfs replica 3 -- I have problems (qemu
> crashes) removing snapshots with a distributed-disperse backend but looking
> at the logs it might be a problem with the gluster storage rather than
> qemu-kvm.
> 
> If you need any logs let me know. If I use the same VM/same backend on a
> standard ovirt-node-4.4.0 system I can replicate the issues again.

Sounds like a different issue, if you'd like us to take a look into that please file a separate bug.

> 
> Should I roll-out the repository to the rest of the cluster or should I wait
> for the above to be integrated in a future ovirt-node release?

As Milan wrote above, by using that repo "you risk replacing old bugs with new ones".
It really depends on how much risk you're willing to take in using possibly less mature tools that do address the reported issue.

Comment 10 Marco Fais 2020-06-16 14:00:29 UTC
(In reply to Arik from comment #9)

Arik, 

thanks for your comments

> > If you need any logs let me know. If I use the same VM/same backend on a
> > standard ovirt-node-4.4.0 system I can replicate the issues again.
> 
> Sounds like a different issue, if you'd like us to take a look into that
> please file a separate bug.

I have opened an issue with the gluster team -- see here: https://github.com/gluster/glusterfs/issues/1309
Not sure if/where to track it on the oVirt side...

> > Should I roll-out the repository to the rest of the cluster or should I wait
> > for the above to be integrated in a future ovirt-node release?
> 
> As Milan wrote above, by using that repo "you risk replacing old bugs with
> new ones".
> It really depends on the risk you're willing to take for using, possibly,
> less mature tools but that address the reported issue.

Thanks -- I will roll out the repo only on our test cluster for the moment, until I get more clarity on which ovirt-node release will contain the updated qemu/libvirt packages.

Regards,
Marco

Comment 11 Arik 2020-06-28 19:54:49 UTC
The only thing I see left to do here is to require a newer libvirt version as a dependency of VDSM once it is available on CentOS.

Comment 13 Sandro Bonazzola 2020-09-18 07:13:03 UTC
This bug is included in the oVirt 4.4.2 release, published on September 17th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.