Created attachment 1746031 [details] engine, vdsm, daemon engine, daemon of the host, qemu log of the VM Description of problem: During an Automation run, the engine fails to complete a full backup attempt on a VM with regular disks permutations [1]. This issue in potential can affect the customer by failing to complete his backup attempts. I am not marking this as a regression, because it wasnt reproduced by any means, but still this issue needs an attention just in case we are hitting something deeper here. The backup attempt fails with the following Java exception [2]. Need to mention that this failure occurred after a failed attempt to download a backed up disks of a previous test. The failure was most probably due to this BZ and had the following ERROR: https://bugzilla.redhat.com/show_bug.cgi?id=1854038 2021-01-07 13:57:35,771+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-54) [567adea9-94c5-43f6-b887-68ccd5480430] EVENT_ID: DOWNLOAD_IMAGE_CANCELED_TIMEOUT(1,072), Download was canceled by system. Reason: timeout due to transfer inactivity. VDSM shows only a failure to complete a backup of a previous test VM c35dd489-b9c7-4f79-9091-489103f58eb3 : 2021-01-07 13:57:30,751+0200 ERROR (libvirt/events) [virt.vm] (vmId='c35dd489-b9c7-4f79-9091-489103f58eb3') Block job (untracked) type BACKUP for drive sda was canceled (vm:5595) The VM which is relevant for this issue is the following: vm_TestCase27284_0713580682 c08b1e6b-b026-4119-84d6-3c115f509d72 Here is the invocation of backup: 2021-01-07 14:00:48,455 - MainThread - backups - DEBUG - CREATE request content is -- url:/ovirt-engine/api/vms/c08b1e6b-b026-4119-84d6-3c115f509d72/backups?require_consistency=True body:<backup> <disks> <disk id="63bea3a8-a8ed-41fc-9b40-5caffbb438bf"/> <disk id="cf37cf63-2eba-4745-acd5-2602837dc46c"/> <disk id="d34475c5-d5ef-4f6f-9129-85ea6edcd1d9"/> <disk id="3a1da83d-ba26-4597-ad1b-5ff0aa4b5b4c"/> </disks> The response we are getting is just fine: 2021-01-07 14:00:48,749 - MainThread - backups - DEBUG - Response body for GET request is: <backups> <backup href="/ovirt-engine/api/vms/c08b1e6b-b026-4119-84d6-3c115f509d72/backups/98a2bbac-71f0-4a3e-b3df-78616595fda2" id="98a2bbac-71f0-4a3e-b3df-78616595fda2"> <actions> <link href="/ovirt-engine/api/vms/c08b1e6b-b026-4119-84d6-3c115f509d72/backups/98a2bbac-71f0-4a3e-b3df-78616595fda2/finalize" rel="finalize"/> </actions> <link href="/ovirt-engine/api/vms/c08b1e6b-b026-4119-84d6-3c115f509d72/backups/98a2bbac-71f0-4a3e-b3df-78616595fda2/disks" rel="disks"/> <creation_date>2021-01-07T14:00:48.619+02:00</creation_date> <phase>starting</phase> <host href="/ovirt-engine/api/hosts/8b15e521-a3d5-4b7d-9cba-bff3249bed10" id="8b15e521-a3d5-4b7d-9cba-bff3249bed10"/> <vm href="/ovirt-engine/api/vms/c08b1e6b-b026-4119-84d6-3c115f509d72" id="c08b1e6b-b026-4119-84d6-3c115f509d72"/> </backup> </backups> [1]: Add a disk with {'provisioned_size': 1073741824, 'wipe_after_delete': False, 'storagedomain': 'nfs_2', 'bootable': False, 'shareable': False, 'active': True, 'format': 'raw', 'sparse': True, 'alias': 'disk_virtioraw_0713583472'} Add a disk with {'provisioned_size': 1073741824, 'wipe_after_delete': False, 'storagedomain': 'nfs_2', 'bootable': False, 'shareable': False, 'active': True, 'format': 'cow', 'sparse': True, 'alias': 'disk_virtio_scsicow_0713583472'} Add a disk with {'provisioned_size': 1073741824, 'wipe_after_delete': False, 'storagedomain': 'nfs_2', 'bootable': False, 'shareable': False, 'active': True, 'format': 'cow', 'sparse': True, 'alias': 'disk_virtiocow_0713583472'} Add a disk with {'provisioned_size': 1073741824, 'wipe_after_delete': False, 'storagedomain': 'nfs_2', 'bootable': False, 'shareable': False, 'active': True, 'format': 'raw', 'sparse': True, 'alias': 'disk_virtio_scsiraw_0713583472'} [2]: 2021-01-07 14:00:48,725+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-69) [backups_create_a5a5036c-b370-4a4b] Error invoking callback method 'doPolling' for 'ACTIVE' command '98a2bbac-71f0-4a3e-b3df-78616595fda2' 2021-01-07 14:00:48,725+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-69) [backups_create_a5a5036c-b370-4a4b] Exception: java.lang.ClassCastException: class org.ovirt.engine.core.bll.RedefineVmCheckpointCommand cannot be cast to class org.ovirt.engine.core.bll.SerialChildExecutingCommand (org.ovirt.engine.core.bll.RedefineVmCheckpointCommand and org.ovirt.engine.core.bll.SerialChildExecutingCommand are in unnamed module of loader 'deployment.engine.ear.bll.jar' @b6194c1) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback.childCommandsExecutionEnded(SerialChildCommandsExecutionCallback.java:29) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.ChildCommandsCallbackBase.doPolling(ChildCommandsCallbackBase.java:80) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethodsImpl(CommandCallbacksPoller.java:175) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:109) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) at org.glassfish.javax.enterprise.concurrent.0.redhat-1//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383) at org.glassfish.javax.enterprise.concurrent.0.redhat-1//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) at org.glassfish.javax.enterprise.concurrent.0.redhat-1//org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) 2021-01-07 14:00:49,797+02 ERROR [org.ovirt.engine.core.bll.StartVmBackupCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-41) [backups_create_a5a5036c-b370-4a4b] Ending command 'org.ovirt.engine.core.bll.StartVmBackupCommand' with failure. Version-Release number of selected component (if applicable): rhv-4.4.4-8 How reproducible: Couldnt reproduce for the second time. The reason for this failure might have a clue in a previously failed test due to failed download of a backed up disks (see description) Steps to Reproduce: 1. Clone VM from template 2. Attach to it disks permutations with FS on them 3. Attempt a backup for all of the disks Actual results: Backup fails Expected results: Backup should succeed. Additional info: The following logs are attached: engine, vdsm, daemon engine, daemon of the host, qemu log of the VM
Reproduced again on rhv-4.4.5-4 After making a second full backup for the same VM and the same Disks, then after a successful checkpoints redefinition, there is an engine error RedefineVmCheckpointCommand cannot be cast to class org.ovirt.engine.core.bll.SerialChildExecutingCommand [1] Attaching engine + VDSMs logs + libvirtd log for the relevant host (host_mixed_3) + qemu log for the relevant VM. [1]: 2021-02-06 02:50:09,552+02 INFO [org.ovirt.engine.core.bll.StartVmBackupCommand] (default task-2) [backups_create_b3715fb3-5aff-4ba1] Successfully redefined previous VM checkpoints for VM 'ae962689-a4e6-4a3d-9db9-b9c842abd62a' 2021-02-06 02:50:09,552+02 INFO [org.ovirt.engine.core.bll.StartVmBackupCommand] (default task-2) [backups_create_b3715fb3-5aff-4ba1] Creating VmCheckpoint entity for VM 'ae962689-a4e6-4a3d-9db9-b9c842abd62a' 2021-02-06 02:50:09,559+02 INFO [org.ovirt.engine.core.bll.StartVmBackupCommand] (default task-2) [backups_create_b3715fb3-5aff-4ba1] Created VmCheckpoint entity 'fd4a3fdb-9eef-4cba-bcb5-8e84604a9a3a' 2021-02-06 02:50:09,574+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-2) [backups_create_b3715fb3-5aff-4ba1] EVENT_ID: VM_BACKUP_STARTED(10,790), Backup accfd32a-2761-4067-b013-a9049f4ac4b7 for VM vm_TestCase27319_0602481529 started (User: admin@internal-authz). 2021-02-06 02:50:09,624+02 INFO [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-8) [backups_create_b3715fb3-5aff-4ba1] Exception in invoking callback of command RedefineVmCheckpoint (accfd32a-2761-4067-b013-a9049f4ac4b7): ClassCastException: class org.ovirt.engine.core.bll.RedefineVmCheckpointCommand cannot be cast to class org.ovirt.engine.core.bll.SerialChildExecutingCommand (org.ovirt.engine.core.bll.RedefineVmCheckpointCommand and org.ovirt.engine.core.bll.SerialChildExecutingCommand are in unnamed module of loader 'deployment.engine.ear.bll.jar' @79e8ef70) 2021-02-06 02:50:09,624+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-8) [backups_create_b3715fb3-5aff-4ba1] Error invoking callback method 'doPolling' for 'ACTIVE' command 'accfd32a-2761-4067-b013-a9049f4ac4b7' 2021-02-06 02:50:09,624+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-8) [backups_create_b3715fb3-5aff-4ba1] Exception: java.lang.ClassCastException: class org.ovirt.engine.core.bll.RedefineVmCheckpointCommand cannot be cast to class org.ovirt.engine.core.bll.SerialChildExecutingCommand (org.ovirt.engine.core.bll.RedefineVmCheckpointCommand and org.ovirt.engine.core.bll.SerialChildExecutingCommand are in unnamed module of loader 'deployment.engine.ear.bll.jar' @79e8ef70) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback.childCommandsExecutionEnded(SerialChildCommandsExecutionCallback.java:29) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.ChildCommandsCallbackBase.doPolling(ChildCommandsCallbackBase.java:80) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethodsImpl(CommandCallbacksPoller.java:175) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:109) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) at org.glassfish.javax.enterprise.concurrent.0.redhat-1//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383) at org.glassfish.javax.enterprise.concurrent.0.redhat-1//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) at org.glassfish.javax.enterprise.concurrent.0.redhat-1//org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) 2021-02-06 02:50:10,635+02 ERROR [org.ovirt.engine.core.bll.StartVmBackupCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-90) [backups_create_b3715fb3-5aff-4ba1] Ending command 'org.ovirt.engine.core.bll.StartVmBackupCommand' with failure.
Created attachment 1755683 [details] Second reproduction logs
I wasn't able to reproduce this bug but I am pretty sure that the fix will solve the problem. There are no steps to reproduce and we can verify it by running the regular tiers and make sure we don't see that error again.
This issue wasnt seen again from the point when I started using the 'consistency' flag on 'true' and the SE boolean was set correctly on the guest. Current version is rhv-release-4.4.5-7-001.noarch Moving this to verified, and if the issue is seen again, will re-open.
This bugzilla is included in oVirt 4.4.5 release, published on March 18th 2021. Since the problem described in this bug report should be resolved in oVirt 4.4.5 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.